C++ API

Header: #include "ModelExecutor.hpp"

Namespace: simaai::

Library: libsimaai_model_executor_api.so

ModelExecutor

class ModelExecutor

Manages and executes a GStreamer-based model inference pipeline on SiMa’s MLSoC.

Encapsulates the full lifecycle of MLSoC-accelerated inference: model loading, input preprocessing, MLA execution, and output retrieval via custom GStreamer plugins.

Non-copyable, non-movable. Uses PIMPL for resource management.

Lifecycle

  1. Construct – allocates internal state; the pipeline is not yet usable.

  2. init() / initBoxdecode() – extracts the model tarball, reads preprocessing config, registers GStreamer plugins, and builds the pipeline. Pipeline may be built lazily on the first inference call when generic_preproc (kernel 200) is enabled (non-empty mean/stddev).

  3. run*() – execute inference (synchronous or asynchronous). May be called repeatedly after a successful init().

  4. stop() – tears down worker threads, transitions the pipeline to NULL, and releases all resources. Cannot run inference again unless init() is called. The destructor calls stop() automatically.
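The lifecycle above can be sketched as follows. This is a minimal, hedged sketch: the model path is a placeholder, and input-tensor construction is elided since it depends on the Tensor API.

```cpp
#include "ModelExecutor.hpp"  // from libsimaai_model_executor_api.so

int main() {
    simaai::ModelExecutor executor;  // 1. Construct: pipeline not yet usable

    simaai::ModelExecutor::ModelExecutorInitOptions options;
    options.kernelType_ = simaai::ModelExecutor::KernelType::EV74;

    // 2. Extract tarball, read config, register plugins, build pipeline
    //    (path is a placeholder for illustration)
    executor.init("/data/models/model.tar.gz", options);

    // 3. Run inference repeatedly after a successful init()
    // std::vector<simaai::TensorFloat> outputs = executor.runSynchronous(inputs);

    // 4. Tear down explicitly (the destructor would also call stop())
    executor.stop();
    return 0;
}
```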

Threading

  • Synchronous mode blocks the calling thread. No extra threads are created.

  • Asynchronous mode supports concurrent runAsynchronous() calls from multiple threads. A worker pool is spawned on the first call (buffer-prep, producer, consumer, 4 copy-workers, callback thread) and runs until stop() is called.

Ownership

Input tensors are caller-owned; the executor copies data into GStreamer buffers. Output tensors are newly allocated: caller owns the return value (sync) or receives ownership through the callback (async).

Public Types

enum InterpolationType

Interpolation method used when resizing input images to the model’s expected dimensions.

Values:

enumerator INTERPOLATION_BILINEAR

Bilinear interpolation (default).

enumerator INTERPOLATION_BICUBIC

Bicubic interpolation.

enumerator INTERPOLATION_NEAREST

Nearest-neighbor interpolation.

enumerator INTERPOLATION_AREA

Area-based (decimation) interpolation.

enum PaddingPosition

Padding position when resizing with aspect-ratio preservation.

Values:

enumerator CENTER

Pad equally on both sides (centered).

enumerator TOP

Pad at the top edge.

enumerator BOTTOM

Pad at the bottom edge.

enum InputType

Color format of the input image tensor.

Values:

enumerator IYUV

Planar YUV 4:2:0 (I420).

enumerator NV12

Semi-planar YUV 4:2:0.

enumerator RGB

Interleaved RGB.

enumerator BGR

Interleaved BGR.

enumerator GRAY

Single-channel grayscale.

enum class KernelType

Target kernel for pipeline execution on SiMa’s MLSoC.

Values:

enumerator EV74

EV74 vision processor.

enumerator A65

A65 application processor.

using SizeOrPoint = std::pair<unsigned, unsigned>

A (width, height) or (x, y) pair used for position/size queries.

Public Functions

ModelExecutor()

Constructs a ModelExecutor object.

The pipeline is not ready to use until init() is successfully called.

void init(const std::string &tarGzFilePath, ModelExecutorInitOptions options)

Initializes the pipeline from a model tarball.

Extracts the tarball to a temporary directory, reads preprocessing JSON configuration, registers GStreamer plugins, and builds the pipeline.

When generic_preproc (kernel 200) is enabled (non-empty mean_/stddev_ in options), pipeline construction is deferred until the first inference call.

Parameters:
  • tarGzFilePath – Path to the .tar.gz file containing the model and config.

  • options – Initialization options to configure the pipeline.

Throws:
  • std::invalid_argument – if mean_/stddev_ sizes are inconsistent.

  • std::runtime_error – on extraction or pipeline setup failure.

void initBoxdecode(const std::string &tarGzFilePath, BoxdecoderInitOptions options)

Initializes the pipeline with built-in box-decoding post-processing.

Same as init(), but additionally configures an on-device box-decoder stage (NMS, top-k filtering, sigmoid) in the pipeline for object-detection models.

Parameters:
  • tarGzFilePath – Path to the .tar.gz file containing the model and config.

  • options – Box-decoder options (detection thresholds, NMS, etc.) plus base init options.

Throws:
  • std::invalid_argument – if mean_/stddev_ sizes are inconsistent.

  • std::runtime_error – on extraction or pipeline setup failure.
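A hedged sketch of configuring the box-decoder stage for a detection model; the tarball path, decoder type, and threshold values are illustrative, not defaults.

```cpp
#include "ModelExecutor.hpp"

void initDetector(simaai::ModelExecutor &executor) {
    simaai::ModelExecutor::BoxdecoderInitOptions options;
    options.decode_type_         = "yolo";  // empty string = auto-detect
    options.num_classes_         = 80;
    options.topk_                = 100;     // keep at most 100 detections after NMS
    options.detection_threshold_ = 0.25f;   // negative = use model default
    options.nms_iou_threshold_   = 0.45f;   // negative = use model default

    // Same extraction/setup as init(), plus the on-device box-decoder stage
    executor.initBoxdecode("/data/models/detector.tar.gz", options);
}
```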

std::vector<TensorInfo> getInputTensorInfo()

Retrieves information about the model’s input tensors.

This includes the name, shape, and data type of each expected input.

Throws:

std::runtime_error – if the ModelExecutor is not initialized.

Returns:

A vector of TensorInfo objects describing the input tensors.

std::vector<TensorInfo> getOutputTensorInfo()

Retrieves information about the model’s output tensors.

This includes the name, shape, and data type of each output tensor.

Throws:

std::runtime_error – if the ModelExecutor is not initialized.

Returns:

A vector of TensorInfo objects describing the output tensors.

std::vector<int> shape() const

Returns the input shape [height, width, channels] detected from the first frame.

Available after the first call to runSynchronous() with a uint8 image tensor. The order matches the HWC layout used by image tensors.

Returns:

A std::vector<int> with elements [height, width, channels].

std::pair<SizeOrPoint, SizeOrPoint> getScaledPositionAndSize()

Gets the scaled position and size of the input frame within the target tensor.

This is useful for understanding how an input image is mapped to the model’s input tensor, especially when resizing with aspect ratio preservation and padding.

Throws:

std::runtime_error – if the ModelExecutor is not initialized.

Returns:

A pair of SizeOrPoint objects. The first represents the (x, y) position, and the second represents the (width, height) of the scaled frame.
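The reported (x, y)/(width, height) pair corresponds to the standard letterbox mapping. The helper below is not part of the ModelExecutor API; it is a self-contained sketch of how a centered, aspect-preserving resize (PaddingPosition CENTER) places a source frame inside a target tensor.

```cpp
#include <algorithm>
#include <utility>

using SizeOrPoint = std::pair<unsigned, unsigned>;

// Compute where a srcW x srcH frame lands inside a dstW x dstH tensor when
// the aspect ratio is preserved and padding is centered.
std::pair<SizeOrPoint, SizeOrPoint> letterbox(unsigned srcW, unsigned srcH,
                                              unsigned dstW, unsigned dstH) {
    const double scale = std::min(static_cast<double>(dstW) / srcW,
                                  static_cast<double>(dstH) / srcH);
    const unsigned w = static_cast<unsigned>(srcW * scale);
    const unsigned h = static_cast<unsigned>(srcH * scale);
    const unsigned x = (dstW - w) / 2;  // horizontal padding split evenly
    const unsigned y = (dstH - h) / 2;  // vertical padding split evenly
    return {{x, y}, {w, h}};
}

// Example: a 1280x720 frame mapped into a 640x640 tensor scales to 640x360
// and sits at (0, 140), i.e. 140 pixels of padding above and below.
```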

std::vector<TensorFloat> runSynchronous(const std::vector<TensorFloat> &inputTensorList)

Runs inference synchronously with float input tensors.

Blocks the calling thread until inference completes and returns newly allocated output tensors owned by the caller.

Parameters:

inputTensorList – A vector of input tensors for the model.

Throws:
  • std::runtime_error – if the pipeline is not initialized or if an error occurs during execution.

  • std::invalid_argument – if the input tensor list is empty or does not match the model’s requirements.

Returns:

A vector of output tensors from the model (caller owns).
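A minimal synchronous call might look like the sketch below. How TensorFloat instances are built is elided, since that depends on the Tensor API; the inputs are assumed to match the shapes reported by getInputTensorInfo().

```cpp
#include <stdexcept>
#include <vector>
#include "ModelExecutor.hpp"

// Run one synchronous inference on an already-initialized executor.
bool runOnce(simaai::ModelExecutor &executor,
             const std::vector<simaai::TensorFloat> &inputs) {
    try {
        // Blocks until inference completes; outputs are newly allocated
        // and owned by this caller.
        std::vector<simaai::TensorFloat> outputs =
            executor.runSynchronous(inputs);
        return !outputs.empty();
    } catch (const std::invalid_argument &) {
        return false;  // empty input list or model requirement mismatch
    } catch (const std::runtime_error &) {
        return false;  // pipeline not initialized, or execution failure
    }
}
```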

std::vector<TensorFloat> runSynchronous(const std::vector<TensorUInt8> &inputTensorList)

Runs inference synchronously with uint8 input tensors.

This method is intended for raw image inputs (for example HWC RGB bytes) when the preprocessing stage expects uint8 input.

On the first synchronous uint8 call, if the first tensor is marked as an image (markTensorAsImage()), the tensor shape and color format metadata are used to update runtime input bookkeeping and preproc JSON metadata.

Parameters:

inputTensorList – A vector of uint8 input tensors for the model.

Throws:
  • std::runtime_error – if the pipeline is not initialized or if an error occurs during execution.

  • std::invalid_argument – if the input tensor list is empty or does not match the model’s requirements.

Returns:

A vector of output tensors from the model (caller owns).

bool runAsynchronous(const std::vector<TensorFloat> &inputTensorList, const nlohmann::json &metaData, std::function<void(const std::vector<TensorFloat> &tensors, const nlohmann::json &metaData, bool result)> callbackFunc)

Runs inference asynchronously.

Enqueues the input tensors into an internal pending queue and returns immediately. Worker threads handle buffer preparation, pipeline execution, output copying, and callback invocation.

Queue Behavior

The pending queue holds at most 8 requests. If the queue is full, this call blocks (with a 5-second retry timeout) until space is available or stop() is called. Returns false if the executor is stopping.

Callback Guarantees

  • The callback is always invoked on a dedicated callback worker thread, never on the caller’s thread or the GStreamer pipeline thread.

  • Callbacks are invoked sequentially (one at a time) in completion order.

  • Exceptions thrown inside the callback are caught and silently discarded; they will not crash the executor.

  • On failure, the callback is invoked with an empty output vector and result=false.

Fatal Timeout

If no output is received from the pipeline for 10 seconds while requests are in-flight, the executor enters a fatal timeout state. All subsequent runAsynchronous() calls will throw std::runtime_error until stop()/init() resets the executor.

Parameters:
  • inputTensorList – The vector of input tensors to be processed.

  • metaData – User-provided JSON metadata passed through to the callback.

  • callbackFunc – Completion callback: (output tensors, metadata, success).

Throws:
  • std::runtime_error – if not initialized, or if a fatal timeout has occurred.

  • std::invalid_argument – if the callback is empty or the input tensor list is empty.

Returns:

True if enqueued successfully; false if the executor is stopping.
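A hedged sketch of enqueuing one asynchronous request; the frame-id metadata is illustrative, and input construction is assumed to have happened elsewhere.

```cpp
#include <vector>
#include <nlohmann/json.hpp>
#include "ModelExecutor.hpp"

void enqueueFrame(simaai::ModelExecutor &executor,
                  const std::vector<simaai::TensorFloat> &inputs) {
    nlohmann::json meta;
    meta["frame_id"] = 42;  // passed through unchanged to the callback

    bool enqueued = executor.runAsynchronous(
        inputs, meta,
        [](const std::vector<simaai::TensorFloat> &tensors,
           const nlohmann::json &metaData, bool result) {
            // Invoked on the dedicated callback thread, one at a time,
            // in completion order. On failure, `tensors` is empty and
            // `result` is false.
            if (result) {
                // consume tensors; ownership transfers to this callback
            }
        });

    if (!enqueued) {
        // The executor is stopping; the request was not accepted.
    }
}
```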

bool runAsynchronous(const std::vector<TensorUInt8> &inputTensorList, const nlohmann::json &metaData, std::function<void(const std::vector<TensorFloat> &tensors, const nlohmann::json &metaData, bool result)> callbackFunc)

Runs inference asynchronously with uint8 input tensors.

Behaves identically to the float overload (same queue limits, callback guarantees, and timeout behavior) but accepts raw uint8 image data. See the float overload documentation for full details.

Parameters:
  • inputTensorList – The vector of uint8 input tensors to be processed.

  • metaData – User-provided JSON metadata passed through to the callback.

  • callbackFunc – Completion callback: (output tensors, metadata, success).

Throws:
  • std::runtime_error – if not initialized, or if a fatal timeout has occurred.

  • std::invalid_argument – if the callback is empty or the input tensor list is empty.

Returns:

True if enqueued successfully; false if the executor is stopping.

void stop()

Stops the pipeline and releases all resources.

Shutdown sequence:

  1. Signals all async worker threads to exit and joins them (blocks until done).

  2. Releases GStreamer appsrc/appsink elements and buffer pools.

  3. Transitions the GStreamer pipeline to NULL state.

  4. Releases the pipeline.

Idempotent – safe to call multiple times. After stop(), no further inference is possible unless init() is called again. Also called automatically by the destructor.

nlohmann::json profileModel(int duration_seconds = DEFAULT_MODEL_EXECUTOR_DURATION_SECONDS, std::string output_directory = DEFAULT_MODEL_EXECUTOR_OUTPUT_DIR, bool run_synchronous = false)

Runs the pipeline in a KPI/throughput measurement mode.

Continuously feeds synthetic (random) input frames into the pipeline by invoking the run*() methods in a loop until interrupted (SIGINT/SIGTERM) or until the specified duration is reached.

Parameters:
  • duration_seconds – The number of seconds to run the profiling.

  • output_directory – The directory where the output files will be saved.

  • run_synchronous – If true, runs synchronous profiling mode; otherwise, asynchronous.

Throws:

std::runtime_error – if the pipeline is not initialized or if an error occurs during profiling.

Returns:

A JSON object containing the calculated KPI results.
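A hedged sketch of a profiling run on an already-initialized executor; the duration and output directory are caller-chosen, and the keys in the returned KPI JSON are not specified here, so the report is simply dumped.

```cpp
#include <iostream>
#include <nlohmann/json.hpp>
#include "ModelExecutor.hpp"

void profile(simaai::ModelExecutor &executor) {
    // Asynchronous profiling for 30 seconds; artifacts land in the
    // given directory (both values are illustrative).
    nlohmann::json kpis = executor.profileModel(
        /*duration_seconds=*/30,
        /*output_directory=*/"/tmp/model_kpis",
        /*run_synchronous=*/false);

    // Print whatever KPI results the executor measured.
    std::cout << kpis.dump(2) << std::endl;
}
```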

~ModelExecutor() noexcept

Destructor. Ensures that stop() is called to release resources.

ModelExecutor(const ModelExecutor&) = delete
ModelExecutor &operator=(const ModelExecutor&) = delete
ModelExecutor(ModelExecutor&&) = delete
ModelExecutor &operator=(ModelExecutor&&) = delete

Copy and move construction and assignment are deleted; the executor is non-copyable and non-movable.

class BoxdecoderInitOptions : public ModelExecutor::ModelExecutorInitOptions

Configuration for models with on-device box-decoding (object detection).

Inherits all base preprocessing options and adds detection-specific parameters for NMS, top-k filtering, and sigmoid application. Negative values or zero for threshold/size fields indicate “use model default”.

Public Functions

inline BoxdecoderInitOptions()

Public Members

std::string decode_type_

Decoder type identifier (e.g. “ssd”, “yolo”). Empty = auto-detect.

int topk_

Maximum number of detections to keep after NMS. 0 = no limit.

int num_classes_

Number of object classes the model predicts.

float detection_threshold_

Confidence threshold for detections. Negative = use model default.

float nms_iou_threshold_

IoU threshold for Non-Maximum Suppression. Negative = use model default.

int original_width_

Original input image width (for coordinate scaling). 0 = use tensor width.

int original_height_

Original input image height (for coordinate scaling). 0 = use tensor height.

int sigmoid_on_probabilities_

Apply sigmoid to class probabilities: 1 = yes, 0 = no, -1 = auto.

class ModelExecutorInitOptions

Configuration options for initializing the ModelExecutor.

Controls target kernel selection and image preprocessing behavior (normalization, resize interpolation, aspect-ratio handling).

Note

When both mean_ and stddev_ are non-empty, generic_preproc (kernel 200) is enabled and pipeline construction is deferred until the first inference call, so that image metadata from the first input tensor can be incorporated.

Subclassed by ModelExecutor::BoxdecoderInitOptions

Public Functions

inline ModelExecutorInitOptions()

Default constructor. Initializes options with default values: “EV74” kernel, no normalization, bilinear interpolation, and aspect ratio is not preserved on resize.

Public Members

KernelType kernelType_

The target kernel on SiMa’s MLSoC. Defaults to KernelType::EV74.

std::vector<float> mean_

Mean values for input normalization. If non-empty, normalization is applied to image tensors.

std::vector<float> stddev_

Standard deviation values for input normalization. If non-empty, normalization is applied to image tensors.

InterpolationType interpolationType_

The interpolation method to use for resizing input images.

bool resizePreservingAspectRatio_

If true, the aspect ratio of the input image is preserved during resize.

PaddingPosition paddingPosition_

The position to apply padding if aspect ratio is preserved.
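The members above can be combined as in the sketch below. The mean/stddev values shown are the common ImageNet normalization constants, used purely for illustration, and the enumerators are assumed to be scoped to ModelExecutor as listed under Public Types.

```cpp
#include "ModelExecutor.hpp"

void configure(simaai::ModelExecutor &executor) {
    simaai::ModelExecutor::ModelExecutorInitOptions options;
    options.kernelType_        = simaai::ModelExecutor::KernelType::EV74;
    options.mean_              = {0.485f, 0.456f, 0.406f};  // illustrative
    options.stddev_            = {0.229f, 0.224f, 0.225f};  // illustrative
    options.interpolationType_ = simaai::ModelExecutor::INTERPOLATION_BILINEAR;
    options.resizePreservingAspectRatio_ = true;
    options.paddingPosition_   = simaai::ModelExecutor::CENTER;

    // Non-empty mean_ and stddev_ enable generic_preproc (kernel 200), so
    // pipeline construction is deferred to the first inference call.
    executor.init("/data/models/model.tar.gz", options);  // placeholder path
}
```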

TensorFloat

class TensorFloat : public Tensor<float>

TensorUInt8

class TensorUInt8 : public Tensor<std::uint8_t>

TensorInfo

class TensorInfo

Provides metadata about a tensor, such as its name and dimensions. This class is typically used to describe the expected input or output shape of a model.

Public Functions

inline TensorInfo(const std::string &name, const std::vector<std::size_t> &dimensions)

Constructs a TensorInfo object.

Parameters:
  • name – The name of the tensor.

  • dimensions – A vector specifying the size of each dimension.