ModelExecutor

ModelExecutor is an on-device neural network inference API for the SiMa.ai MLSoC. It abstracts GStreamer pipeline setup, model loading, input preprocessing, and output retrieval behind simple C++ and Python interfaces.

Overview

Given a model packaged as a .tar.gz archive, ModelExecutor:

  1. Extracts and validates the archive

  2. Reads pipeline, preprocessing, and quantization configuration

  3. Constructs and launches a GStreamer pipeline (appsrc → preprocessing → MLA → postprocessing → appsink)

  4. Handles frame injection, output retrieval, and memory management

Both synchronous (blocking) and asynchronous (callback-based) inference modes are supported, along with an integrated profiling mode for KPI measurement.
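The difference between the two modes is the calling contract: synchronous calls block until outputs are ready, while asynchronous calls enqueue work (into a pending queue of at most 8 entries) and deliver outputs through a callback. The stand-in below mimics that contract only; StubExecutor and its doubling "inference" are illustrative, not the SiMa.ai API.

```python
import queue
import threading

class StubExecutor:
    """Minimal stand-in mimicking the ModelExecutor calling contract
    described above; NOT the real SiMa.ai implementation."""

    def __init__(self):
        self._jobs = queue.Queue(maxsize=8)  # mirrors the pending queue (max 8)
        threading.Thread(target=self._drain, daemon=True).start()

    def run_synchronous(self, inputs):
        # Blocking: outputs are returned directly to the caller.
        return [x * 2 for x in inputs]       # placeholder "inference"

    def run_asynchronous(self, inputs, metadata, callback):
        # Non-blocking: enqueue and return immediately; the worker
        # thread invokes the callback when the result is ready.
        self._jobs.put((inputs, metadata, callback))

    def _drain(self):
        while True:
            inputs, metadata, callback = self._jobs.get()
            callback([x * 2 for x in inputs], metadata)

ex = StubExecutor()
print(ex.run_synchronous([1, 2, 3]))   # [2, 4, 6]

done = threading.Event()
results = []
ex.run_asynchronous([4, 5], {"frame": 0},
                    lambda out, meta: (results.append((out, meta)), done.set()))
done.wait(timeout=2)
print(results)                         # [([8, 10], {'frame': 0})]
```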

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        ModelExecutor                             │
│                                                                  │
│  init(model.tar.gz, options)                                     │
│    ├─ Extract archive → /var/tmp/modelExecutor/models/           │
│    ├─ Parse pipeline_sequence.json, preprocessing.json           │
│    └─ Launch GStreamer pipeline:                                 │
│         appsrc → [preproc] → MLA → [postproc] → appsink          │
│                                                                  │
│  runSynchronous(inputs) ──────────────────────► outputs          │
│                                                                  │
│  runAsynchronous(inputs, metadata, callback)                     │
│    ├─ Pending queue (max 8)                                      │
│    ├─ Buffer prep worker                                         │
│    ├─ Async producer → appsrc                                    │
│    ├─ Appsink consumer                                           │
│    ├─ Copy workers (×4)                                          │
│    └─ Callback worker ──────────────────────► callback(output)   │
└──────────────────────────────────────────────────────────────────┘

Installation

ModelExecutor and its Python bindings come preinstalled on both eLxr and Yocto images. No separate installation step is needed. Example applications are available on the device at /usr/local/simaai/examples/model_executor/.

Threading Model

  Operation           Thread Safety
  ──────────────────  ──────────────────────────────────────────────────────
  init() / stop()     Must be called from the same thread
  runSynchronous()    Caller must serialize calls (single-threaded)
  runAsynchronous()   Thread-safe; concurrent calls from multiple threads
                      are allowed
  Callbacks           Always invoked on the callback worker thread;
                      serialized one at a time

Asynchronous mode spawns eight internal worker threads on first use: one buffer-prep worker, one async producer, one appsink consumer, four copy workers, and one callback worker. All threads are joined on stop().
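Because runSynchronous() requires single-threaded use, applications that share one executor across threads must serialize their calls themselves. A minimal sketch of that pattern (the executor here is a stand-in, not the real API):

```python
import threading

class FakeExecutor:
    """Stand-in for a ModelExecutor instance; real inference omitted."""
    def run_synchronous(self, inputs):
        return [x + 1 for x in inputs]

_sync_lock = threading.Lock()

def safe_run_synchronous(executor, inputs):
    # runSynchronous() is not thread-safe: hold a lock so only one
    # caller is inside the executor at a time.
    with _sync_lock:
        return executor.run_synchronous(inputs)

ex = FakeExecutor()
results = []
threads = [threading.Thread(
               target=lambda: results.append(safe_run_synchronous(ex, [1, 2])))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # four [2, 3] entries; the calls never overlapped
```

runAsynchronous() needs no such lock, since it is documented as thread-safe.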

Model Archive Format

The .tar.gz archive must contain:

{project_name}/
├── etc/
│   ├── pipeline_sequence.json   GStreamer plugin chain definition
│   ├── preprocessing.json       Input size, format, and normalization metadata
│   └── quantization.json        Quantization parameters
└── lib/
    ├── *.elf                    Model binary (EV74 kernel)
    └── *.so                     Model shared library (A65 kernel)

Archives are extracted to /var/tmp/modelExecutor/models/ on the first init() call.
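The layout above can be checked before handing an archive to init(). The helper below is a hypothetical pre-flight validator (validate_model_archive is not part of ModelExecutor); it inspects entry names only, using Python's standard tarfile module:

```python
import io
import os
import tarfile
import tempfile

REQUIRED_ETC = ("pipeline_sequence.json", "preprocessing.json", "quantization.json")

def validate_model_archive(path):
    """Return a list of layout problems with a model .tar.gz
    (an empty list means the layout matches the structure above).
    Checks entry names only, not file contents."""
    with tarfile.open(path, "r:gz") as tar:
        names = set(tar.getnames())
    roots = {n.split("/", 1)[0] for n in names}
    if len(roots) != 1:
        return [f"expected a single top-level project dir, found {sorted(roots)}"]
    root = roots.pop()
    problems = [f"missing {root}/etc/{f}"
                for f in REQUIRED_ETC if f"{root}/etc/{f}" not in names]
    if not any(n.startswith(f"{root}/lib/") and n.endswith((".elf", ".so"))
               for n in names):
        problems.append(f"missing {root}/lib/*.elf or *.so model binary")
    return problems

# Build a minimal archive with the documented layout and validate it.
demo = os.path.join(tempfile.mkdtemp(), "demo_model.tar.gz")
with tarfile.open(demo, "w:gz") as tar:
    for name in [f"demo/etc/{f}" for f in REQUIRED_ETC] + ["demo/lib/model.elf"]:
        data = b"{}"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
print(validate_model_archive(demo))  # []
```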

Known Limitations

  • EV74 kernel: maximum tensor dimension of 4096. Use KernelType::A65 for larger tensors.

  • Maximum of 15 pipeline segments per model. init() will fail if exceeded.

  • No mixed-precision models. All layers must use a single uniform precision (INT8, INT16, or BF16).

  • Python runSynchronous() returns only the first output tensor. Use runAsynchronous() or the C++ API to retrieve all outputs from multi-output models.

  • No re-initialization without stop(). Calling init() again without first calling stop() is not supported.
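The tensor-dimension and segment-count limits can be checked before init() is called. The helper below is a hypothetical sketch (check_limits is not part of the ModelExecutor API) that mirrors the documented limits:

```python
def check_limits(tensor_dims, num_segments, kernel="EV74"):
    """Hypothetical pre-flight check mirroring the documented limits;
    not part of the ModelExecutor API."""
    errors = []
    # EV74 kernels cap each tensor dimension at 4096.
    if kernel == "EV74" and any(d > 4096 for d in tensor_dims):
        errors.append("EV74 tensor dimension exceeds 4096; use KernelType::A65")
    # Models may contain at most 15 pipeline segments.
    if num_segments > 15:
        errors.append("more than 15 pipeline segments; init() will fail")
    return errors

print(check_limits([1, 3, 224, 224], num_segments=4))   # []
print(check_limits([1, 8192], num_segments=16))         # two errors
```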