.. _model_executor_python_api:

Python API
##########

**Module:** ``simaai_model_executor``

.. code-block:: python

   import simaai_model_executor as me

   executor = me.ModelExecutor()

Constants
---------

.. list-table::
   :widths: 40 15 45
   :header-rows: 1

   * - **Name**
     - **Value**
     - **Description**
   * - ``DEFAULT_MODEL_EXECUTOR_DURATION_SECONDS``
     - ``30``
     - Default profiling duration in seconds
   * - ``DEFAULT_MODEL_EXECUTOR_OUTPUT_DIR``
     - ``"/var/tmp"``
     - Default output directory for profiling results

Enumerations
------------

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - **Enum**
     - **Values**
   * - ``me.KernelType``
     - ``EV74``, ``A65``
   * - ``me.ColorFormat``
     - ``COLOR_FORMAT_RGB``, ``COLOR_FORMAT_BGR``, ``COLOR_FORMAT_IYUV``, ``COLOR_FORMAT_NV12``, ``COLOR_FORMAT_GRAY``

Input Format
------------

The Python binding accepts inputs as either:

- A single ``numpy.ndarray`` — for single-input models.
- A ``dict[str, numpy.ndarray]`` — for multi-input models, where keys are input tensor names.

The **dtype** must match initialization:

- ``float32`` — when ``init()`` was called **without** ``mean``/``stddev``.
- ``uint8`` — when ``init()`` was called **with** ``mean`` and ``stddev``; normalization is applied internally.

Arrays must be **C-contiguous** and in **native byte order**.

Methods
-------

init()
~~~~~~

.. code-block:: python

   executor.init(
       tarGzFilePath,                      # str — path to .tar.gz model archive
       kernelType=me.KernelType.EV74,      # me.KernelType.EV74 or me.KernelType.A65
       mean=[],                            # list[float] — per-channel means (empty = skip)
       stddev=[],                          # list[float] — per-channel std devs (empty = skip)
       interpolationType=1,                # int — 1=BILINEAR, 2=BICUBIC, 3=NEAREST, 4=AREA
       resizePreservingAspectRatio=False,  # bool
       paddingPosition=0,                  # int — 0=CENTER, 1=TOP, 2=BOTTOM
   )
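Whether ``init()`` receives ``mean``/``stddev`` determines the dtype the executor expects, per the input-format rules above. As a minimal sketch, a pre-dispatch helper could coerce arrays to the required dtype and layout (the name ``prepare_input`` and its flag are illustrative, not part of the API; only NumPy is assumed):

.. code-block:: python

   import numpy as np

   def prepare_input(arr, normalized_on_device=False):
       # normalized_on_device=True corresponds to init() called with
       # mean/stddev, in which case the executor expects uint8 input;
       # otherwise it expects float32. asarray + ascontiguousarray also
       # force C layout and native byte order, as the binding requires.
       dtype = np.uint8 if normalized_on_device else np.float32
       return np.ascontiguousarray(np.asarray(arr, dtype=dtype))

For multi-input models, the same coercion would apply to each entry of the input dict.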
initBoxdecode()
~~~~~~~~~~~~~~~

Use for models with on-device NMS and top-k post-processing.

.. code-block:: python

   executor.initBoxdecode(
       tarGzFilePath,                      # str
       kernelType=me.KernelType.EV74,
       mean=[],
       stddev=[],
       interpolationType=1,
       resizePreservingAspectRatio=False,
       paddingPosition=0,
       decodeType="",                      # str — "yolov5", "ssd", "" = auto-detect
       topk=0,                             # int — max detections after NMS (0 = no limit)
       numClasses=0,                       # int
       detectionThreshold=-1.0,            # float — negative = use model default
       nmsIouThreshold=-1.0,               # float — negative = use model default
       originalWidth=0,                    # int — 0 = use tensor width
       originalHeight=0,                   # int — 0 = use tensor height
       sigmoidOnProbabilities=-1,          # int — 1=yes, 0=no, -1=auto
   )

runSynchronous()
~~~~~~~~~~~~~~~~

Blocks until inference completes. Returns the first output tensor.

.. code-block:: python

   output = executor.runSynchronous(inputs)
   # inputs:  numpy.ndarray (float32 or uint8) or dict[str, numpy.ndarray]
   # returns: numpy.ndarray (float32) — first output tensor only

.. note::

   Only the first output tensor is returned. Use ``runAsynchronous()`` or the
   C++ API to retrieve all outputs from multi-output models.

Example:

.. code-block:: python

   import numpy as np

   frame = np.random.rand(1, 224, 224, 3).astype(np.float32)
   output = executor.runSynchronous(frame)

   # Multi-input model
   output = executor.runSynchronous({"input_image": frame, "mask": mask_array})

runAsynchronous()
~~~~~~~~~~~~~~~~~

Non-blocking. Returns immediately after enqueuing. The callback is invoked on a
dedicated worker thread.

.. code-block:: python

   pushed = executor.runAsynchronous(
       inputs,    # numpy.ndarray or dict[str, numpy.ndarray]
       metaData,  # None, bool, int, float, str, list, or dict
       callback,  # callable(output, metaData, ok) -> None
   )
   # returns: bool — True if enqueued, False if executor is stopping

The callback receives:

- ``output`` — ``numpy.ndarray`` (single output) or ``list[numpy.ndarray]`` (multiple outputs)
- ``metaData`` — the value passed to ``runAsynchronous()``, converted back to Python
- ``ok`` — ``bool``, ``False`` on failure
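Because ``output`` may arrive as a single array or as a list, callbacks often normalize it before handing results downstream. A sketch (the ``make_callback`` helper and the ``"frame_id"`` metadata key are illustrative, not part of the API):

.. code-block:: python

   def make_callback(results):
       # Build a callback that files results under a frame id carried
       # in metaData. (Helper and key names are illustrative.)
       def callback(output, meta, ok):
           if not ok:
               results[meta["frame_id"]] = None  # inference failed
               return
           # Single-output models deliver one ndarray, multi-output
           # models a list of ndarrays; normalize to a list either way.
           results[meta["frame_id"]] = output if isinstance(output, list) else [output]
       return callback

The returned ``callback`` would be passed as the third argument to ``runAsynchronous()``; note that it runs on the executor's worker thread, so shared state such as ``results`` should be accessed accordingly.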
Example:

.. code-block:: python

   import threading

   done = threading.Event()
   result = {}

   def callback(output, meta, ok):
       if ok:
           result["output"] = output
           result["meta"] = meta
       done.set()

   executor.runAsynchronous(frame, {"frame_id": 1}, callback)
   done.wait(timeout=10)

profileModel()
~~~~~~~~~~~~~~

Runs synthetic inference for a fixed duration and returns JSON-encoded KPI
metrics.

.. code-block:: python

   kpi_json_str = executor.profileModel(
       duration_seconds=30,          # int (default: 30)
       output_directory="/var/tmp",  # str (default: "/var/tmp")
       run_synchronous=False,        # bool — False = async mode (default)
   )
   # returns: str — JSON-encoded results

   import json
   kpi = json.loads(kpi_json_str)
   # keys: total_frames, throughput_fps,
   #       latency_min_ms, latency_max_ms, latency_avg_ms,
   #       latency_p50_ms, latency_p95_ms, latency_p99_ms
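The profiler's JSON payload can be reduced to a readable summary. A sketch with a fabricated payload (the ``summarize_kpis`` helper and the sample numbers are illustrative; only the key names come from the list above):

.. code-block:: python

   import json

   def summarize_kpis(kpi_json_str):
       # Reduce the profiler's JSON payload to a one-line summary.
       # (Helper name is illustrative; key names are as documented.)
       kpi = json.loads(kpi_json_str)
       return (f"{kpi['total_frames']} frames, "
               f"{kpi['throughput_fps']:.1f} FPS, "
               f"p95 latency {kpi['latency_p95_ms']:.2f} ms")

   # Illustrative payload with the documented keys (made-up values):
   sample = json.dumps({
       "total_frames": 900, "throughput_fps": 30.0,
       "latency_min_ms": 8.1, "latency_max_ms": 40.2, "latency_avg_ms": 12.5,
       "latency_p50_ms": 11.9, "latency_p95_ms": 18.3, "latency_p99_ms": 25.0,
   })
   print(summarize_kpis(sample))  # → 900 frames, 30.0 FPS, p95 latency 18.30 ms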