.. _model_executor:

ModelExecutor
#############

``ModelExecutor`` is an on-device neural network inference API for the SiMa.ai MLSoC. It abstracts GStreamer pipeline setup, model loading, input preprocessing, and output retrieval behind a simple C++ and Python interface.

.. contents:: On this page
   :local:
   :depth: 2

Overview
--------

Given a model packaged as a ``.tar.gz`` archive, ``ModelExecutor``:

1. Extracts and validates the archive
2. Reads the pipeline, preprocessing, and quantization configuration
3. Constructs and launches a GStreamer pipeline (``appsrc → preprocessing → MLA → postprocessing → appsink``)
4. Handles frame injection, output retrieval, and memory management

Both **synchronous** (blocking) and **asynchronous** (callback-based) inference modes are supported, along with an integrated profiling mode for KPI measurement.

Architecture
------------

.. code-block:: text

   ┌──────────────────────────────────────────────────────────────────┐
   │                          ModelExecutor                           │
   │                                                                  │
   │  init(model.tar.gz, options)                                     │
   │  ├─ Extract archive → /var/tmp/modelExecutor/models/             │
   │  ├─ Parse pipeline_sequence.json, preprocessing.json             │
   │  └─ Launch GStreamer pipeline:                                   │
   │       appsrc → [preproc] → MLA → [postproc] → appsink            │
   │                                                                  │
   │  runSynchronous(inputs) ──────────────────────► outputs          │
   │                                                                  │
   │  runAsynchronous(inputs, metadata, callback)                     │
   │  ├─ Pending queue (max 8)                                        │
   │  ├─ Buffer prep worker                                           │
   │  ├─ Async producer → appsrc                                      │
   │  ├─ Appsink consumer                                             │
   │  ├─ Copy workers (×4)                                            │
   │  └─ Callback worker ──────────────────────► callback(output)    │
   └──────────────────────────────────────────────────────────────────┘

Installation
------------

``ModelExecutor`` and its Python bindings come **preinstalled** on both eLxr and Yocto images. No separate installation step is needed.

Example applications are available on the device at ``/usr/local/simaai/examples/model_executor/``.

Threading Model
---------------
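
The callback guarantee described in this section (outputs may be produced by several internal threads, but user callbacks always arrive on a single callback worker thread, one at a time) can be sketched with Python's standard library. Everything below is illustrative; the queue, worker, and producer names are stand-ins and are **not** part of the ModelExecutor API:

.. code-block:: python

   import queue
   import threading

   # Illustrative sketch only: mimics the single callback worker draining
   # a queue of finished outputs. Because exactly one thread delivers
   # results, user callbacks never overlap, even when many producer
   # threads (like the copy workers) enqueue outputs concurrently.
   results = queue.Queue()
   delivered = []

   def callback_worker():
       while True:
           item = results.get()
           if item is None:          # sentinel, as stop() would send
               break
           delivered.append(item)    # user callback runs here, serialized

   worker = threading.Thread(target=callback_worker)
   worker.start()

   # Four producer threads enqueue outputs concurrently.
   producers = [threading.Thread(target=results.put, args=(i,)) for i in range(4)]
   for p in producers:
       p.start()
   for p in producers:
       p.join()

   results.put(None)   # signal shutdown and join the worker
   worker.join()

   print(sorted(delivered))   # every output delivered exactly once

Because ``delivered`` is only ever touched by the single worker thread, no locking is needed inside the callback itself; the same reasoning applies to user callbacks in asynchronous mode.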
.. list-table::
   :widths: 40 60
   :header-rows: 1

   * - **Operation**
     - **Thread Safety**
   * - ``init()`` / ``stop()``
     - Must be called from the **same thread**
   * - ``runSynchronous()``
     - Caller must **serialize** calls (single-threaded)
   * - ``runAsynchronous()``
     - **Thread-safe** — concurrent calls from multiple threads allowed
   * - Callbacks
     - Always invoked on the **callback worker thread**; serialized one at a time

Asynchronous mode spawns its internal worker threads on first use: a buffer-prep worker, an async producer, an appsink consumer, four copy workers, and a callback worker. All threads are joined on ``stop()``.

Model Archive Format
--------------------

The ``.tar.gz`` archive must contain:

.. code-block:: text

   {project_name}/
   ├── etc/
   │   ├── pipeline_sequence.json   GStreamer plugin chain definition
   │   ├── preprocessing.json       Input size, format, and normalization metadata
   │   └── quantization.json        Quantization parameters
   └── lib/
       ├── *.elf                    Model binary (EV74 kernel)
       └── *.so                     Model shared library (A65 kernel)

Archives are extracted to ``/var/tmp/modelExecutor/models/`` on the first ``init()`` call.

Known Limitations
-----------------

- **EV74 kernel: maximum tensor dimension of 4096.** Use ``KernelType::A65`` for larger tensors.
- **Maximum of 15 pipeline segments per model.** ``init()`` fails if this limit is exceeded.
- **No mixed-precision models.** All layers must use a single uniform precision (INT8, INT16, or BF16).
- **Python** ``runSynchronous()`` **returns only the first output tensor.** Use ``runAsynchronous()`` or the C++ API to retrieve all outputs from multi-output models.
- **No re-initialization without** ``stop()``. Calling ``init()`` again without first calling ``stop()`` is not supported.
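
The archive layout documented under *Model Archive Format* can be sanity-checked off-device before the first ``init()``. The following is a minimal sketch using Python's ``tarfile``; the ``demo/`` project name and file contents are placeholders standing in for ``{project_name}`` and a real model, not part of any SiMa.ai tooling:

.. code-block:: python

   import io
   import tarfile

   # Required configuration entries per the documented archive layout;
   # "demo" is a placeholder project name.
   REQUIRED = [
       "demo/etc/pipeline_sequence.json",
       "demo/etc/preprocessing.json",
       "demo/etc/quantization.json",
   ]

   # Build an in-memory archive matching the layout (stand-in contents).
   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w:gz") as tar:
       for name in REQUIRED + ["demo/lib/model.elf"]:
           data = b"{}" if name.endswith(".json") else b"\x00"
           info = tarfile.TarInfo(name)
           info.size = len(data)
           tar.addfile(info, io.BytesIO(data))

   # Re-open the archive and verify every required entry is present.
   buf.seek(0)
   with tarfile.open(fileobj=buf, mode="r:gz") as tar:
       names = set(tar.getnames())

   missing = [n for n in REQUIRED if n not in names]
   print("missing:", missing)   # an empty list means the layout checks out

A check like this catches a misnamed ``etc/`` file or a missing kernel binary at packaging time, rather than as an ``init()`` failure on the device.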