C++ API
Header: #include "ModelExecutor.hpp"
Namespace: simaai::
Library: libsimaai_model_executor_api.so
ModelExecutor
-
class ModelExecutor
Manages and executes a GStreamer-based model inference pipeline on SiMa's MLSoC.
Encapsulates the full lifecycle of MLSoC-accelerated inference: model loading, input preprocessing, MLA execution, and output retrieval via custom GStreamer plugins.
Non-copyable, non-movable. Uses PIMPL for resource management.
- Lifecycle
Construct – allocates internal state; the pipeline is not yet usable.
init() / initBoxdecode() – extracts the model tarball, reads the preprocessing config, registers GStreamer plugins, and builds the pipeline. The pipeline may be built lazily on the first inference call when generic_preproc (kernel 200) is enabled (non-empty mean/stddev).
run*() – executes inference (synchronous or asynchronous). May be called repeatedly after a successful init().
stop() – tears down worker threads, transitions the pipeline to NULL, and releases all resources. Inference cannot run again unless init() is called. The destructor calls stop() automatically.
- Threading
Synchronous mode blocks the calling thread. No extra threads are created.
Asynchronous mode supports concurrent runAsynchronous() calls from multiple threads. A worker pool is spawned on the first call (buffer-prep, producer, consumer, 4 copy-workers, callback thread) and runs until stop() is called.
- Ownership
Input tensors are caller-owned; the executor copies data into GStreamer buffers. Output tensors are newly allocated: the caller owns the return value (sync) or receives ownership through the callback (async).
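The lifecycle, threading, and ownership rules above can be sketched end to end. This is an illustrative sketch only: the model path is a placeholder, and the nested option type spelling follows the declarations documented below.

```cpp
#include "ModelExecutor.hpp"

#include <vector>

int main() {
    simaai::ModelExecutor executor;  // constructed, but not usable until init()

    simaai::ModelExecutor::ModelExecutorInitOptions options;
    executor.init("/path/to/model.tar.gz", options);  // builds the pipeline

    // Inputs are caller-owned; the executor copies them into GStreamer buffers.
    std::vector<simaai::TensorFloat> inputs;  // fill to match getInputTensorInfo()

    // Outputs are newly allocated and owned by the caller.
    std::vector<simaai::TensorFloat> outputs = executor.runSynchronous(inputs);

    executor.stop();  // idempotent; also invoked by the destructor
    return 0;
}
```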
Public Types
-
enum InterpolationType
Interpolation method used when resizing input images to the model's expected dimensions.
Values:
-
enumerator INTERPOLATION_BILINEAR
Bilinear interpolation (default).
-
enumerator INTERPOLATION_BICUBIC
Bicubic interpolation.
-
enumerator INTERPOLATION_NEAREST
Nearest-neighbor interpolation.
-
enumerator INTERPOLATION_AREA
Area-based (decimation) interpolation.
-
enum PaddingPosition
Padding position when resizing with aspect-ratio preservation.
Values:
-
enumerator CENTER
Pad equally on both sides (centered).
-
enumerator TOP
Pad at the top edge.
-
enumerator BOTTOM
Pad at the bottom edge.
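The interaction between aspect-ratio-preserving resize and these padding positions can be illustrated with some self-contained placement math. This is our own sketch, not the library's implementation, and the reading of TOP/BOTTOM (padding goes at that edge, so the frame is pushed to the opposite edge) is an assumption based on the descriptions above.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Stand-in for ModelExecutor::PaddingPosition, for illustration only.
enum class Padding { CENTER, TOP, BOTTOM };

using UPair = std::pair<unsigned, unsigned>;

// Where does a srcW x srcH frame land inside a dstW x dstH tensor when the
// aspect ratio is preserved? Returns {{x, y}, {w, h}}.
std::pair<UPair, UPair> placeFrame(unsigned srcW, unsigned srcH,
                                   unsigned dstW, unsigned dstH, Padding pos) {
    // Uniform scale that fits the source inside the target.
    double scale = std::min(static_cast<double>(dstW) / srcW,
                            static_cast<double>(dstH) / srcH);
    unsigned w = static_cast<unsigned>(std::lround(srcW * scale));
    unsigned h = static_cast<unsigned>(std::lround(srcH * scale));
    unsigned x = (dstW - w) / 2;  // horizontal padding split evenly
    unsigned y = 0;
    switch (pos) {
        case Padding::CENTER: y = (dstH - h) / 2; break;  // pad split top/bottom
        case Padding::TOP:    y = dstH - h;       break;  // all padding above
        case Padding::BOTTOM: y = 0;              break;  // all padding below
    }
    return {{x, y}, {w, h}};
}
```

For example, a 320x180 frame mapped into a 640x640 tensor occupies 640x360; CENTER padding places it at y = 140, TOP at y = 280, BOTTOM at y = 0.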
-
enum InputType
Color format of the input image tensor.
Values:
-
enumerator IYUV
Planar YUV 4:2:0 (I420).
-
enumerator NV12
Semi-planar YUV 4:2:0.
-
enumerator RGB
Interleaved RGB.
-
enumerator BGR
Interleaved BGR.
-
enumerator GRAY
Single-channel grayscale.
-
enum class KernelType
Target kernel for pipeline execution on SiMa's MLSoC.
Values:
-
enumerator EV74
EV74 vision processor.
-
enumerator A65
A65 application processor.
-
using SizeOrPoint = std::pair<unsigned, unsigned>
A (width, height) or (x, y) pair used for position/size queries.
Public Functions
-
ModelExecutor()
Constructs a ModelExecutor object.
The pipeline is not ready to use until init() is successfully called.
-
void init(const std::string &tarGzFilePath, ModelExecutorInitOptions options)
Initializes the pipeline from a model tarball.
Extracts the tarball to a temporary directory, reads the preprocessing JSON configuration, registers GStreamer plugins, and builds the pipeline.
When generic_preproc (kernel 200) is enabled (non-empty mean_/stddev_ in options), pipeline construction is deferred until the first inference call.
- Parameters:
tarGzFilePath – Path to the .tar.gz file containing the model and config.
options – Initialization options to configure the pipeline.
- Throws:
std::invalid_argument – if mean_/stddev_ sizes are inconsistent.
std::runtime_error – on extraction or pipeline setup failure.
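A sketch of enabling generic_preproc through init(); the path is a placeholder and the mean/stddev values are the common ImageNet constants, used here only as an example:

```cpp
simaai::ModelExecutor executor;
simaai::ModelExecutor::ModelExecutorInitOptions options;
// Non-empty mean_/stddev_ enable generic_preproc (kernel 200); pipeline
// construction is then deferred until the first inference call.
options.mean_   = {123.675f, 116.28f, 103.53f};
options.stddev_ = {58.395f, 57.12f, 57.375f};
executor.init("/path/to/model.tar.gz", options);
```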
-
void initBoxdecode(const std::string &tarGzFilePath, BoxdecoderInitOptions options)
Initializes the pipeline with built-in box-decoding post-processing.
Same as init(), but additionally configures an on-device box-decoder stage (NMS, top-k filtering, sigmoid) in the pipeline for object-detection models.
- Parameters:
tarGzFilePath – Path to the .tar.gz file containing the model and config.
options – Box-decoder options (detection thresholds, NMS, etc.) plus base init options.
- Throws:
std::invalid_argument – if mean_/stddev_ sizes are inconsistent.
std::runtime_error – on extraction or pipeline setup failure.
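For an object-detection model the same flow goes through initBoxdecode(); a sketch with assumed placeholder values (the member names follow the BoxdecoderInitOptions section below):

```cpp
simaai::ModelExecutor executor;
simaai::ModelExecutor::BoxdecoderInitOptions options;
options.num_classes_         = 80;     // e.g. a COCO-style detector
options.topk_                = 100;    // keep at most 100 detections after NMS
options.detection_threshold_ = 0.25f;  // negative would mean "use model default"
options.nms_iou_threshold_   = 0.45f;
executor.initBoxdecode("/path/to/detector.tar.gz", options);
```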
-
std::vector<TensorInfo> getInputTensorInfo()
Retrieves information about the model's input tensors.
This includes the name, shape, and data type of each expected input.
- Throws:
std::runtime_error – if the ModelExecutor is not initialized.
- Returns:
A vector of TensorInfo objects describing the input tensors.
-
std::vector<TensorInfo> getOutputTensorInfo()
Retrieves information about the model's output tensors.
This includes the name, shape, and data type of each output tensor.
- Throws:
std::runtime_error – if the ModelExecutor is not initialized.
- Returns:
A vector of TensorInfo objects describing the output tensors.
-
std::vector<int> shape() const
Returns the input shape [height, width, channels] detected from the first frame.
Available after the first call to runSynchronous() with a uint8 image tensor. The order matches the HWC layout used by image tensors.
- Returns:
A std::vector<int> with elements [height, width, channels].
-
std::pair<SizeOrPoint, SizeOrPoint> getScaledPositionAndSize()
Gets the scaled position and size of the input frame within the target tensor.
This is useful for understanding how an input image is mapped to the model's input tensor, especially when resizing with aspect-ratio preservation and padding.
- Throws:
std::runtime_error – if the ModelExecutor is not initialized.
- Returns:
A pair of SizeOrPoint objects. The first represents the (x, y) position, and the second represents the (width, height) of the scaled frame.
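A typical use of this placement information is mapping model-space coordinates (for example, detection boxes) back to the original image. The helper below is illustrative math only; its name and parameters are ours, not the library's.

```cpp
#include <utility>

using SizeOrPoint = std::pair<unsigned, unsigned>;

// Map a point from model-input-tensor coordinates back to the original image,
// given the placement reported by getScaledPositionAndSize().
std::pair<double, double> tensorToOriginal(double tx, double ty,
                                           SizeOrPoint position,
                                           SizeOrPoint scaledSize,
                                           unsigned originalW,
                                           unsigned originalH) {
    // Subtract the padding offset, then undo the uniform resize scale.
    double scaleX = static_cast<double>(originalW) / scaledSize.first;
    double scaleY = static_cast<double>(originalH) / scaledSize.second;
    return {(tx - position.first) * scaleX, (ty - position.second) * scaleY};
}
```

For a 1280x720 image letterboxed into a 640x640 tensor at position (0, 140) with scaled size (640, 360), the tensor-space point (320, 320) maps back to (640, 360), the center of the original image.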
-
std::vector<TensorFloat> runSynchronous(const std::vector<TensorFloat> &inputTensorList)
Runs inference synchronously with float input tensors.
Blocks the calling thread until inference completes and returns newly allocated output tensors owned by the caller.
- Parameters:
inputTensorList – A vector of input tensors for the model.
- Throws:
std::runtime_error – if the pipeline is not initialized or if an error occurs during execution.
std::invalid_argument – if the input tensor list is empty or does not match the model's requirements.
- Returns:
A vector of output tensors from the model (caller owns).
-
std::vector<TensorFloat> runSynchronous(const std::vector<TensorUInt8> &inputTensorList)
Runs inference synchronously with uint8 input tensors.
This method is intended for raw image inputs (for example, HWC RGB bytes) when the preprocessing stage expects uint8 input.
On the first synchronous uint8 call, if the first tensor is marked as an image (markTensorAsImage()), the tensor's shape and color-format metadata are used to update runtime input bookkeeping and the preproc JSON metadata.
- Parameters:
inputTensorList – A vector of uint8 input tensors for the model.
- Throws:
std::runtime_error – if the pipeline is not initialized or if an error occurs during execution.
std::invalid_argument – if the input tensor list is empty or does not match the model's requirements.
- Returns:
A vector of output tensors from the model (caller owns).
-
bool runAsynchronous(const std::vector<TensorFloat> &inputTensorList, const nlohmann::json &metaData, std::function<void(const std::vector<TensorFloat> &tensors, const nlohmann::json &metaData, bool result)> callbackFunc)
Runs inference asynchronously.
Enqueues the input tensors into an internal pending queue and returns immediately. Worker threads handle buffer preparation, pipeline execution, output copying, and callback invocation.
- Queue Behavior
The pending queue holds at most 8 requests. If the queue is full, this call blocks (with a 5-second retry timeout) until space is available or stop() is called. Returns false if the executor is stopping.
- Callback Guarantees
The callback is always invoked on a dedicated callback worker thread, never on the caller's thread or the GStreamer pipeline thread.
Callbacks are invoked sequentially (one at a time) in completion order.
Exceptions thrown inside the callback are caught and silently discarded; they will not crash the executor.
On failure, the callback is invoked with an empty output vector and result=false.
- Fatal Timeout
If no output is received from the pipeline for 10 seconds while requests are in flight, the executor enters a fatal timeout state. All subsequent runAsynchronous() calls will throw std::runtime_error until stop()/init() resets the executor.
- Parameters:
inputTensorList – The vector of input tensors to be processed.
metaData – User-provided JSON metadata passed through to the callback.
callbackFunc – Completion callback: (output tensors, metadata, success).
- Throws:
std::runtime_error – if not initialized, or if a fatal timeout has occurred.
std::invalid_argument – if the callback is empty or the input tensor list is empty.
- Returns:
True if enqueued successfully; false if the executor is stopping.
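One way to wait for a single asynchronous request is a std::promise bridged from the callback; a sketch, assuming `executor` was initialized and `inputs` filled as in the synchronous case:

```cpp
#include <future>

std::promise<bool> done;
bool enqueued = executor.runAsynchronous(
    inputs,
    nlohmann::json{{"frame_id", 42}},  // passed through to the callback untouched
    [&done](const std::vector<simaai::TensorFloat> &outputs,
            const nlohmann::json &meta, bool ok) {
        // Invoked on the dedicated callback thread, one callback at a time.
        // On failure, outputs is empty and ok == false.
        done.set_value(ok);
    });
if (enqueued) {
    bool ok = done.get_future().get();  // block until this request completes
}
```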
-
bool runAsynchronous(const std::vector<TensorUInt8> &inputTensorList, const nlohmann::json &metaData, std::function<void(const std::vector<TensorFloat> &tensors, const nlohmann::json &metaData, bool result)> callbackFunc)
Runs inference asynchronously with uint8 input tensors.
Behaves identically to the float overload (same queue limits, callback guarantees, and timeout behavior) but accepts raw uint8 image data. See the float overload documentation for full details.
- Parameters:
inputTensorList – The vector of uint8 input tensors to be processed.
metaData – User-provided JSON metadata passed through to the callback.
callbackFunc – Completion callback: (output tensors, metadata, success).
- Throws:
std::runtime_error – if not initialized, or if a fatal timeout has occurred.
std::invalid_argument – if the callback is empty or the input tensor list is empty.
- Returns:
True if enqueued successfully; false if the executor is stopping.
-
void stop()
Stops the pipeline and releases all resources.
Shutdown sequence:
Signals all async worker threads to exit and joins them (blocks until done).
Releases GStreamer appsrc/appsink elements and buffer pools.
Transitions the GStreamer pipeline to the NULL state.
Releases the pipeline.
Idempotent – safe to call multiple times. After stop(), no further inference is possible unless init() is called again. Also called automatically by the destructor.
-
nlohmann::json profileModel(int duration_seconds = DEFAULT_MODEL_EXECUTOR_DURATION_SECONDS, std::string output_directory = DEFAULT_MODEL_EXECUTOR_OUTPUT_DIR, bool run_synchronous = false)
Runs the pipeline in a KPI/throughput measurement mode.
Continuously feeds synthetic (random) input frames into the pipeline by invoking run() in a loop until interrupted (SIGINT/SIGTERM) or until the specified duration is reached.
- Parameters:
duration_seconds – The number of seconds to run the profiling.
output_directory – The directory where the output files will be saved.
run_synchronous – If true, runs the synchronous profiling mode; otherwise, the asynchronous mode.
- Throws:
std::runtime_error – if the pipeline is not initialized or if an error occurs during profiling.
- Returns:
A JSON object containing the calculated KPI results.
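For example (assuming an initialized `executor`; the duration and output directory are placeholders):

```cpp
#include <iostream>

// Feed synthetic frames for 30 seconds in asynchronous mode, then inspect KPIs.
nlohmann::json kpis = executor.profileModel(/*duration_seconds=*/30,
                                            /*output_directory=*/"/tmp/kpi",
                                            /*run_synchronous=*/false);
std::cout << kpis.dump(2) << std::endl;
```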
-
ModelExecutor(const ModelExecutor&) = delete
-
ModelExecutor &operator=(const ModelExecutor&) = delete
-
ModelExecutor(ModelExecutor&&) = delete
-
ModelExecutor &operator=(ModelExecutor&&) = delete
-
class BoxdecoderInitOptions : public ModelExecutor::ModelExecutorInitOptions
Configuration for models with on-device box decoding (object detection).
Inherits all base preprocessing options and adds detection-specific parameters for NMS, top-k filtering, and sigmoid application. Zero or negative values for threshold/size fields indicate "use the model default".
Public Functions
-
inline BoxdecoderInitOptions()
Public Members
-
std::string decode_type_
Decoder type identifier (e.g. "ssd", "yolo"). Empty = auto-detect.
-
int topk_
Maximum number of detections to keep after NMS. 0 = no limit.
-
int num_classes_
Number of object classes the model predicts.
-
float detection_threshold_
Confidence threshold for detections. Negative = use model default.
-
float nms_iou_threshold_
IoU threshold for Non-Maximum Suppression. Negative = use model default.
-
int original_width_
Original input image width (for coordinate scaling). 0 = use tensor width.
-
int original_height_
Original input image height (for coordinate scaling). 0 = use tensor height.
-
int sigmoid_on_probabilities_
Apply sigmoid to class probabilities: 1 = yes, 0 = no, -1 = auto.
-
class ModelExecutorInitOptions
Configuration options for initializing the ModelExecutor.
Controls target kernel selection and image preprocessing behavior (normalization, resize interpolation, aspect-ratio handling).
Note
When both mean_ and stddev_ are non-empty, generic_preproc (kernel 200) is enabled and pipeline construction is deferred until the first inference call, so that image metadata from the first input tensor can be incorporated.
Subclassed by ModelExecutor::BoxdecoderInitOptions
Public Functions
-
inline ModelExecutorInitOptions()
Default constructor. Initializes options with default values: EV74 kernel, no normalization, bilinear interpolation, and aspect ratio not preserved on resize.
Public Members
-
KernelType kernelType_
The target kernel on SiMa's MLSoC. Defaults to KernelType::EV74.
-
std::vector<float> mean_
Mean values for input normalization. If non-empty, normalization is applied to image tensors.
-
std::vector<float> stddev_
Standard deviation values for input normalization. If non-empty, normalization is applied to image tensors.
-
InterpolationType interpolationType_
The interpolation method used when resizing input images.
-
bool resizePreservingAspectRatio_
If true, the aspect ratio of the input image is preserved during resize.
-
PaddingPosition paddingPosition_
The position at which to apply padding when the aspect ratio is preserved.
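A sketch of configuring these preprocessing options; the enumerator spellings follow the Public Types section above, and the exact qualification of the unscoped enums is an assumption:

```cpp
simaai::ModelExecutor::ModelExecutorInitOptions options;
options.kernelType_                  = simaai::ModelExecutor::KernelType::A65;
options.interpolationType_           = simaai::ModelExecutor::INTERPOLATION_BICUBIC;
options.resizePreservingAspectRatio_ = true;  // letterbox instead of stretching
options.paddingPosition_             = simaai::ModelExecutor::CENTER;
```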
TensorFloat
-
class TensorFloat : public Tensor<float>
TensorUInt8
-
class TensorUInt8 : public Tensor<std::uint8_t>
TensorInfo
-
class TensorInfo
Provides metadata about a tensor, such as its name and dimensions. This class is typically used to describe the expected input or output shape of a model.
Public Functions
-
inline TensorInfo(const std::string &name, const std::vector<std::size_t> &dimensions)
Constructs a TensorInfo object.
- Parameters:
name – The name of the tensor.
dimensions – A vector specifying the size of each dimension.