ModelExecutor
ModelExecutor is an on-device neural network inference API for SiMa.ai MLSoC. It abstracts
GStreamer pipeline setup, model loading, input preprocessing, and output retrieval behind a simple
C++ and Python interface.
Overview
Given a model packaged as a .tar.gz archive, ModelExecutor:
- Extracts and validates the archive
- Reads pipeline, preprocessing, and quantization configuration
- Constructs and launches a GStreamer pipeline (appsrc → preprocessing → MLA → postprocessing → appsink)
- Handles frame injection, output retrieval, and memory management
Both synchronous (blocking) and asynchronous (callback-based) inference modes are supported, along with an integrated profiling mode for KPI measurement.
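The two modes can be sketched with a minimal stand-in class (not the real SiMa.ai binding) that mirrors the call shapes named in the diagram below — init(), runSynchronous(), runAsynchronous(inputs, metadata, callback), and stop(); the doubling "inference" is a placeholder:

```python
import threading
from queue import Queue

# Stand-in executor (NOT the real API); mirrors the documented call shapes.
class StubExecutor:
    def init(self, archive, options=None):
        self._queue = Queue()
        self._sentinel = object()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def runSynchronous(self, inputs):
        # Blocking mode: outputs are returned directly to the caller.
        return [x * 2 for x in inputs]

    def runAsynchronous(self, inputs, metadata, callback):
        # Non-blocking mode: the request is queued; the callback fires later
        # on a worker thread with the outputs and the caller's metadata.
        self._queue.put((inputs, metadata, callback))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                break
            inputs, metadata, callback = item
            callback({"metadata": metadata, "outputs": [x * 2 for x in inputs]})

    def stop(self):
        self._queue.put(self._sentinel)
        self._worker.join()

executor = StubExecutor()
executor.init("model.tar.gz")
sync_out = executor.runSynchronous([1, 2, 3])

done = threading.Event()
results = []

def on_output(out):
    results.append(out)
    done.set()

executor.runAsynchronous([4, 5], {"frame_id": 0}, on_output)
done.wait(timeout=5)
executor.stop()
print(sync_out, results[0]["outputs"])
```

The key contrast: the synchronous call holds the caller until outputs are ready, while the asynchronous call returns immediately and delivers outputs through the registered callback.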
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ ModelExecutor │
│ │
│ init(model.tar.gz, options) │
│ ├─ Extract archive → /var/tmp/modelExecutor/models/ │
│ ├─ Parse pipeline_sequence.json, preprocessing.json │
│ └─ Launch GStreamer pipeline: │
│ appsrc → [preproc] → MLA → [postproc] → appsink │
│ │
│ runSynchronous(inputs) ──────────────────────► outputs │
│ │
│ runAsynchronous(inputs, metadata, callback) │
│ ├─ Pending queue (max 8) │
│ ├─ Buffer prep worker │
│ ├─ Async producer → appsrc │
│ ├─ Appsink consumer │
│ ├─ Copy workers (×4) │
│ └─ Callback worker ──────────────────────► callback(output) │
└──────────────────────────────────────────────────────────────────┘
Installation
ModelExecutor and its Python bindings come preinstalled on both eLxr and Yocto images.
No separate installation step is needed. Example applications are available on the device at
/usr/local/simaai/examples/model_executor/.
Threading Model
| Operation | Thread Safety |
|---|---|
| init() / stop() | Must be called from the same thread |
| runSynchronous() | Caller must serialize calls (single-threaded) |
| runAsynchronous() | Thread-safe — concurrent calls from multiple threads allowed |
| Callbacks | Always invoked on the callback worker thread; serialized one at a time |
Asynchronous mode spawns eight internal worker threads on first use: one buffer-prep worker, one
async producer, one appsink consumer, four copy workers, and one callback worker. All threads are
joined on stop().
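The fan-out/fan-in shape of this topology can be illustrated in plain Python (an assumption-laden sketch — the real workers are C++ threads moving GStreamer buffers, not these toy queues): several copy workers drain results concurrently, but a single callback worker guarantees callbacks are delivered one at a time.

```python
import queue
import threading

copy_q = queue.Queue()   # stands in for the appsink consumer's output
cb_q = queue.Queue()     # feeds the single callback worker
delivered = []

def copy_worker():
    # Four of these run concurrently, simulating buffer copy-out.
    while True:
        item = copy_q.get()
        if item is None:
            break
        cb_q.put(item * 10)

def callback_worker():
    # Exactly one of these runs, so "callbacks" are serialized.
    while True:
        item = cb_q.get()
        if item is None:
            break
        delivered.append(item)

copies = [threading.Thread(target=copy_worker) for _ in range(4)]
cb = threading.Thread(target=callback_worker)
for t in copies:
    t.start()
cb.start()

for i in range(16):
    copy_q.put(i)
for _ in copies:
    copy_q.put(None)     # poison pill per copy worker
for t in copies:
    t.join()
cb_q.put(None)           # then drain and join the callback worker
cb.join()

print(sorted(delivered))
```

Joining the copy pool before poisoning the callback queue mirrors the ordered teardown implied by stop() joining all threads.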
Model Archive Format
The .tar.gz archive must contain:
{project_name}/
├── etc/
│ ├── pipeline_sequence.json GStreamer plugin chain definition
│ ├── preprocessing.json Input size, format, and normalization metadata
│ └── quantization.json Quantization parameters
└── lib/
├── *.elf Model binary (EV74 kernel)
└── *.so Model shared library (A65 kernel)
Archives are extracted to /var/tmp/modelExecutor/models/ on the first init() call.
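A lightweight pre-flight check of this layout can be done with the standard tarfile module before handing the archive to init(). This sketch validates only the paths documented above (file names under etc/ and lib/), not their contents; the helper names are illustrative, not part of the API:

```python
import io
import os
import tarfile
import tempfile

REQUIRED = [
    "etc/pipeline_sequence.json",
    "etc/preprocessing.json",
    "etc/quantization.json",
]

def validate_archive(path):
    with tarfile.open(path, "r:gz") as tar:
        names = tar.getnames()
    # The first path component is the project name; expect exactly one.
    roots = {n.split("/", 1)[0] for n in names if "/" in n}
    if len(roots) != 1:
        raise ValueError(f"expected one top-level project dir, got {roots}")
    project = roots.pop()
    missing = [f"{project}/{p}" for p in REQUIRED
               if f"{project}/{p}" not in names]
    has_kernel = any(n.startswith(f"{project}/lib/") and
                     (n.endswith(".elf") or n.endswith(".so"))
                     for n in names)
    if missing or not has_kernel:
        raise ValueError(f"missing: {missing}, kernel present: {has_kernel}")
    return project

# Build a tiny demo archive in a temp dir to exercise the check.
def make_demo(path):
    with tarfile.open(path, "w:gz") as tar:
        for p in REQUIRED + ["lib/model.elf"]:
            data = b"{}"
            info = tarfile.TarInfo(name=f"demo/{p}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

tmp = os.path.join(tempfile.mkdtemp(), "model.tar.gz")
make_demo(tmp)
print(validate_archive(tmp))
```

Failing fast on a malformed archive avoids a partially extracted tree under /var/tmp/modelExecutor/models/.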
Known Limitations
- EV74 kernel: maximum tensor dimension of 4096. Use KernelType::A65 for larger tensors.
- Maximum of 15 pipeline segments per model. init() will fail if this limit is exceeded.
- No mixed-precision models. All layers must use a single uniform precision (INT8, INT16, or BF16).
- Python runSynchronous() returns only the first output tensor. Use runAsynchronous() or the C++ API to retrieve all outputs from multi-output models.
- No re-initialization without stop(). Calling init() again without first calling stop() is not supported.
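A caller can guard against the EV74 dimension cap before init(). This is a hypothetical pre-check — the KernelType enum below is assumed for illustration (the source names only the C++ KernelType::A65) and is not a documented Python binding:

```python
from enum import Enum

# Hypothetical mirror of the C++ KernelType enum, for illustration only.
class KernelType(Enum):
    EV74 = "ev74"
    A65 = "a65"

EV74_MAX_DIM = 4096  # documented EV74 tensor-dimension limit

def pick_kernel(shape):
    # Fall back to the A65 kernel when any dimension exceeds the EV74 cap.
    return KernelType.A65 if max(shape) > EV74_MAX_DIM else KernelType.EV74

print(pick_kernel((1, 3, 224, 224)))
print(pick_kernel((1, 8192)))
```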