ModelExecutor
ModelExecutor is an on-device neural network inference API for SiMa.ai MLSoC. It abstracts
GStreamer pipeline setup, model loading, input preprocessing, and output retrieval behind a simple
C++ and Python interface.
Overview
Given a model packaged as a .tar.gz archive, ModelExecutor:
- Extracts and validates the archive
- Reads pipeline, preprocessing, and quantization configuration
- Constructs and launches a GStreamer pipeline (appsrc → preprocessing → MLA → postprocessing → appsink)
- Handles frame injection, output retrieval, and memory management
Both synchronous (blocking) and asynchronous (callback-based) inference modes are supported, along with an integrated profiling mode for KPI measurement.
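The two modes can be sketched with a minimal stand-in class (not the real SiMa.ai binding) that mirrors the call shapes named in the diagram below — init(), runSynchronous(), runAsynchronous(inputs, metadata, callback), and stop(); the doubling "inference" is a placeholder:

```python
import threading
from queue import Queue

# Stand-in executor (NOT the real API); mirrors the documented call shapes.
class StubExecutor:
    def init(self, archive, options=None):
        self._queue = Queue()
        self._sentinel = object()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def runSynchronous(self, inputs):
        # Blocking mode: outputs are returned directly to the caller.
        return [x * 2 for x in inputs]

    def runAsynchronous(self, inputs, metadata, callback):
        # Non-blocking mode: the request is queued; the callback fires later
        # on a worker thread with the outputs and the caller's metadata.
        self._queue.put((inputs, metadata, callback))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                break
            inputs, metadata, callback = item
            callback({"metadata": metadata, "outputs": [x * 2 for x in inputs]})

    def stop(self):
        self._queue.put(self._sentinel)
        self._worker.join()

executor = StubExecutor()
executor.init("model.tar.gz")
sync_out = executor.runSynchronous([1, 2, 3])

done = threading.Event()
results = []

def on_output(out):
    results.append(out)
    done.set()

executor.runAsynchronous([4, 5], {"frame_id": 0}, on_output)
done.wait(timeout=5)
executor.stop()
print(sync_out, results[0]["outputs"])
```

The key contrast: the synchronous call holds the caller until outputs are ready, while the asynchronous call returns immediately and delivers outputs through the registered callback.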
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ ModelExecutor │
│ │
│ init(model.tar.gz, options) │
│ ├─ Extract archive → /var/tmp/modelExecutor/models/ │
│ ├─ Parse pipeline_sequence.json, preprocessing.json │
│ └─ Launch GStreamer pipeline: │
│ appsrc → [preproc] → MLA → [postproc] → appsink │
│ │
│ runSynchronous(inputs) ──────────────────────► outputs │
│ │
│ runAsynchronous(inputs, metadata, callback) │
│ ├─ Pending queue (max 8) │
│ ├─ Buffer prep worker │
│ ├─ Async producer → appsrc │
│ ├─ Appsink consumer │
│ ├─ Copy workers (×4) │
│ └─ Callback worker ──────────────────────► callback(output) │
└──────────────────────────────────────────────────────────────────┘
Installation
ModelExecutor and its Python bindings come preinstalled on both eLxr and Yocto images.
No separate installation step is needed. Example applications are available on the device at
/usr/local/simaai/examples/model_executor/.
Threading Model
| Operation | Thread Safety |
|---|---|
| init() / stop() | Must be called from the same thread |
| runSynchronous() | Caller must serialize calls (single-threaded) |
| runAsynchronous() | Thread-safe — concurrent calls from multiple threads allowed |
| Callbacks | Always invoked on the callback worker thread; serialized one at a time |
Asynchronous mode spawns eight internal worker threads on first use: one buffer-prep worker, one
async producer, one appsink consumer, four copy workers, and one callback worker. All threads are
joined on stop().
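The fan-out/fan-in shape of this topology can be illustrated in plain Python (an assumption-laden sketch — the real workers are C++ threads moving GStreamer buffers, not these toy queues): several copy workers drain results concurrently, but a single callback worker guarantees callbacks are delivered one at a time.

```python
import queue
import threading

copy_q = queue.Queue()   # stands in for the appsink consumer's output
cb_q = queue.Queue()     # feeds the single callback worker
delivered = []

def copy_worker():
    # Four of these run concurrently, simulating buffer copy-out.
    while True:
        item = copy_q.get()
        if item is None:
            break
        cb_q.put(item * 10)

def callback_worker():
    # Exactly one of these runs, so "callbacks" are serialized.
    while True:
        item = cb_q.get()
        if item is None:
            break
        delivered.append(item)

copies = [threading.Thread(target=copy_worker) for _ in range(4)]
cb = threading.Thread(target=callback_worker)
for t in copies:
    t.start()
cb.start()

for i in range(16):
    copy_q.put(i)
for _ in copies:
    copy_q.put(None)     # poison pill per copy worker
for t in copies:
    t.join()
cb_q.put(None)           # then drain and join the callback worker
cb.join()

print(sorted(delivered))
```

Joining the copy pool before poisoning the callback queue mirrors the ordered teardown implied by stop() joining all threads.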
Model Archive Format
The .tar.gz archive must contain:
{project_name}/
├── etc/
│ ├── pipeline_sequence.json GStreamer plugin chain definition
│ ├── preprocessing.json Input size, format, and normalization metadata
│ └── quantization.json Quantization parameters
└── lib/
├── *.elf Model binary (EV74 kernel)
└── *.so Model shared library (A65 kernel)
Archives are extracted to /var/tmp/modelExecutor/models/ on the first init() call.
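A lightweight pre-flight check of this layout can be done with the standard tarfile module before handing the archive to init(). This sketch validates only the paths documented above (file names under etc/ and lib/), not their contents; the helper names are illustrative, not part of the API:

```python
import io
import os
import tarfile
import tempfile

REQUIRED = [
    "etc/pipeline_sequence.json",
    "etc/preprocessing.json",
    "etc/quantization.json",
]

def validate_archive(path):
    with tarfile.open(path, "r:gz") as tar:
        names = tar.getnames()
    # The first path component is the project name; expect exactly one.
    roots = {n.split("/", 1)[0] for n in names if "/" in n}
    if len(roots) != 1:
        raise ValueError(f"expected one top-level project dir, got {roots}")
    project = roots.pop()
    missing = [f"{project}/{p}" for p in REQUIRED
               if f"{project}/{p}" not in names]
    has_kernel = any(n.startswith(f"{project}/lib/") and
                     (n.endswith(".elf") or n.endswith(".so"))
                     for n in names)
    if missing or not has_kernel:
        raise ValueError(f"missing: {missing}, kernel present: {has_kernel}")
    return project

# Build a tiny demo archive in a temp dir to exercise the check.
def make_demo(path):
    with tarfile.open(path, "w:gz") as tar:
        for p in REQUIRED + ["lib/model.elf"]:
            data = b"{}"
            info = tarfile.TarInfo(name=f"demo/{p}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

tmp = os.path.join(tempfile.mkdtemp(), "model.tar.gz")
make_demo(tmp)
print(validate_archive(tmp))
```

Failing fast on a malformed archive avoids a partially extracted tree under /var/tmp/modelExecutor/models/.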
Known Limitations
- EV74 kernel: maximum tensor dimension of 4096. Use KernelType::A65 for larger tensors.
- Maximum of 15 pipeline segments per model. init() will fail if this limit is exceeded.
- No mixed-precision models. All layers must use a single uniform precision (INT8, INT16, or BF16).
- Python runSynchronous() returns only the first output tensor. Use runAsynchronous() or the C++ API to retrieve all outputs from multi-output models.
- No re-initialization without stop(). Calling init() again without first calling stop() is not supported.
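A caller can guard against the EV74 dimension cap before init(). This is a hypothetical pre-check — the KernelType enum below is assumed for illustration (the source names only the C++ KernelType::A65) and is not a documented Python binding:

```python
from enum import Enum

# Hypothetical mirror of the C++ KernelType enum, for illustration only.
class KernelType(Enum):
    EV74 = "ev74"
    A65 = "a65"

EV74_MAX_DIM = 4096  # documented EV74 tensor-dimension limit

def pick_kernel(shape):
    # Fall back to the A65 kernel when any dimension exceeds the EV74 cap.
    return KernelType.A65 if max(shape) > EV74_MAX_DIM else KernelType.EV74

print(pick_kernel((1, 3, 224, 224)))
print(pick_kernel((1, 8192)))
```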