Python API

Module: simaai_model_executor

import simaai_model_executor as me
executor = me.ModelExecutor()

Constants

  • DEFAULT_MODEL_EXECUTOR_DURATION_SECONDS = 30 — default profiling duration in seconds

  • DEFAULT_MODEL_EXECUTOR_OUTPUT_DIR = "/var/tmp" — default output directory for profiling results

Enumerations

  • me.KernelType — EV74, A65

  • me.ColorFormat — COLOR_FORMAT_RGB, COLOR_FORMAT_BGR, COLOR_FORMAT_IYUV, COLOR_FORMAT_NV12, COLOR_FORMAT_GRAY

Input Format

The Python binding accepts inputs as either:

  • A single numpy.ndarray — for single-input models.

  • A dict[str, numpy.ndarray] — for multi-input models, where keys are input tensor names.

The input dtype must match how the executor was initialized:

  • float32 — when init() was called without mean/stddev.

  • uint8 — when init() was called with mean and stddev; normalization is applied internally.

Arrays must be C-contiguous and in native byte order.
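A minimal preparation sketch, using only NumPy, that converts an arbitrary array into a form satisfying these requirements. The (1, 224, 224, 3) shape is illustrative, not something the binding requires:

```python
import numpy as np

# Illustrative input preparation; the shape is an example only.
raw = np.random.rand(1, 224, 224, 3)            # float64 by default

# Case 1: executor initialized WITHOUT mean/stddev -> float32 input expected.
frame_f32 = np.ascontiguousarray(raw, dtype=np.float32)

# Case 2: executor initialized WITH mean/stddev -> uint8 input expected;
# normalization is applied internally by the executor.
frame_u8 = np.ascontiguousarray((raw * 255).round(), dtype=np.uint8)

# Sanity checks mirroring the binding's requirements.
for arr in (frame_f32, frame_u8):
    assert arr.flags["C_CONTIGUOUS"]
    assert arr.dtype.byteorder in ("=", "|")    # native (or single-byte) order
```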

Methods

init()

executor.init(
    tarGzFilePath,                         # str — path to .tar.gz model archive
    kernelType=me.KernelType.EV74,         # me.KernelType.EV74 or me.KernelType.A65
    mean=[],                               # list[float] — per-channel means (empty = skip)
    stddev=[],                             # list[float] — per-channel std devs (empty = skip)
    interpolationType=1,                   # int — 1=BILINEAR, 2=BICUBIC, 3=NEAREST, 4=AREA
    resizePreservingAspectRatio=False,      # bool
    paddingPosition=0,                     # int — 0=CENTER, 1=TOP, 2=BOTTOM
)
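As an illustration, a call with uint8 input and per-channel normalization might be assembled as below. The mean/stddev values are the common ImageNet conventions, used purely as an example; they are not library defaults:

```python
# Illustrative keyword arguments for init(); values are examples, not defaults.
init_kwargs = dict(
    mean=[123.675, 116.28, 103.53],    # ImageNet-style per-channel means (example)
    stddev=[58.395, 57.12, 57.375],    # ImageNet-style per-channel std devs (example)
    interpolationType=1,               # BILINEAR
    resizePreservingAspectRatio=True,
    paddingPosition=0,                 # CENTER
)
# On the device:
# executor.init("model.tar.gz", kernelType=me.KernelType.EV74, **init_kwargs)
```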

initBoxdecode()

Use this variant for models that include on-device NMS and top-k post-processing.

executor.initBoxdecode(
    tarGzFilePath,                         # str
    kernelType=me.KernelType.EV74,
    mean=[],
    stddev=[],
    interpolationType=1,
    resizePreservingAspectRatio=False,
    paddingPosition=0,
    decodeType="",                         # str — "yolov5", "ssd", "" = auto-detect
    topk=0,                                # int — max detections after NMS (0 = no limit)
    numClasses=0,                          # int
    detectionThreshold=-1.0,               # float — negative = use model default
    nmsIouThreshold=-1.0,                  # float — negative = use model default
    originalWidth=0,                       # int — 0 = use tensor width
    originalHeight=0,                      # int — 0 = use tensor height
    sigmoidOnProbabilities=-1,             # int — 1=yes, 0=no, -1=auto
)
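As a sketch, a YOLOv5-style detector with 80 classes (e.g. COCO) might be configured as follows. The threshold values are typical detection settings chosen for illustration, not library defaults:

```python
# Illustrative initBoxdecode() arguments; thresholds are example values.
boxdecode_kwargs = dict(
    decodeType="yolov5",
    topk=100,                  # keep at most 100 detections after NMS
    numClasses=80,             # e.g. COCO
    detectionThreshold=0.25,   # example confidence threshold
    nmsIouThreshold=0.45,      # example IoU threshold
)
# On the device:
# executor.initBoxdecode("yolov5.tar.gz", kernelType=me.KernelType.EV74,
#                        **boxdecode_kwargs)
```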

runSynchronous()

Blocks until inference completes. Returns the first output tensor.

output = executor.runSynchronous(inputs)
# inputs:  numpy.ndarray (float32 or uint8) or dict[str, numpy.ndarray]
# returns: numpy.ndarray (float32) — first output tensor only

Note

Only the first output tensor is returned. Use runAsynchronous() or the C++ API to retrieve all outputs from multi-output models.

Example:

import numpy as np

frame = np.random.rand(1, 224, 224, 3).astype(np.float32)
output = executor.runSynchronous(frame)

# Multi-input model; the second input's shape depends on the model
mask_array = np.zeros((1, 224, 224, 1), dtype=np.float32)  # illustrative shape
output = executor.runSynchronous({"input_image": frame, "mask": mask_array})

runAsynchronous()

Non-blocking. Returns immediately after enqueuing. The callback is invoked on a dedicated worker thread.

pushed = executor.runAsynchronous(
    inputs,      # numpy.ndarray or dict[str, numpy.ndarray]
    metaData,    # None, bool, int, float, str, list, or dict
    callback,    # callable(output, metaData, ok) -> None
)
# returns: bool — True if enqueued, False if executor is stopping

The callback receives:

  • output — numpy.ndarray (single output) or list[numpy.ndarray] (multiple outputs)

  • metaData — the value passed to runAsynchronous(), converted back to Python

  • ok — bool, False on failure

Example:

import threading

done = threading.Event()
result = {}

def callback(output, meta, ok):
    if ok:
        result["output"] = output
        result["meta"] = meta
    done.set()

executor.runAsynchronous(frame, {"frame_id": 1}, callback)
done.wait(timeout=10)

profileModel()

Runs synthetic inference for a fixed duration and returns JSON-encoded KPI metrics.

kpi_json_str = executor.profileModel(
    duration_seconds=30,          # int (default: 30)
    output_directory="/var/tmp",  # str (default: "/var/tmp")
    run_synchronous=False,        # bool — False = async mode (default)
)
# returns: str — JSON-encoded results

import json
kpi = json.loads(kpi_json_str)
# keys: total_frames, throughput_fps,
#       latency_min_ms, latency_max_ms, latency_avg_ms,
#       latency_p50_ms, latency_p95_ms, latency_p99_ms
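Parsing and summarizing the returned KPIs might look like the following. The JSON string here is a clearly labeled synthetic sample built from the documented key set, not real device output:

```python
import json

# Synthetic sample using the documented keys, for illustration only.
kpi_json_str = json.dumps({
    "total_frames": 900,
    "throughput_fps": 30.0,
    "latency_min_ms": 8.1,
    "latency_max_ms": 21.4,
    "latency_avg_ms": 10.2,
    "latency_p50_ms": 9.8,
    "latency_p95_ms": 14.7,
    "latency_p99_ms": 19.3,
})

kpi = json.loads(kpi_json_str)
summary = (f"{kpi['total_frames']} frames @ {kpi['throughput_fps']:.1f} FPS, "
           f"p95 latency {kpi['latency_p95_ms']:.1f} ms")
print(summary)
```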