# Python API

## Module: `simaai_model_executor`

```python
import simaai_model_executor as me

executor = me.ModelExecutor()
```
## Constants

| Name | Value | Description |
|---|---|---|
|  | 30 | Default profiling duration in seconds |
|  | `/var/tmp` | Default output directory for profiling results |
## Enumerations

| Enum | Values |
|---|---|
| `KernelType` | `EV74`, `A65` |
|  |  |
## Input Format

The Python binding accepts inputs as either:

- A single `numpy.ndarray` — for single-input models.
- A `dict[str, numpy.ndarray]` — for multi-input models, where keys are input tensor names.

The dtype must match initialization:

- `float32` — when `init()` was called without `mean`/`stddev`.
- `uint8` — when `init()` was called with `mean` and `stddev`; normalization is applied internally.

Arrays must be C-contiguous and in native byte order.
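A small helper can enforce these requirements before handing data to the executor. This is a sketch, not part of the `simaai_model_executor` API; the `prepare_input` name and its arguments are illustrative:

```python
import numpy as np

def prepare_input(arr, expect_uint8=False):
    """Coerce an array into the form the binding accepts:
    matching dtype, native byte order, C-contiguous layout.
    (Helper sketch; not part of the simaai_model_executor API.)"""
    dtype = np.uint8 if expect_uint8 else np.float32
    arr = np.asarray(arr, dtype=dtype)   # casts dtype and byte order
    return np.ascontiguousarray(arr)     # forces C-contiguous memory

# A transposed view is not C-contiguous; the helper repairs it.
frame = prepare_input(np.random.rand(3, 224, 224).transpose(1, 2, 0))
```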
## Methods

### init()

```python
executor.init(
    tarGzFilePath,                      # str — path to .tar.gz model archive
    kernelType=me.KernelType.EV74,      # me.KernelType.EV74 or me.KernelType.A65
    mean=[],                            # list[float] — per-channel means (empty = skip)
    stddev=[],                          # list[float] — per-channel std devs (empty = skip)
    interpolationType=1,                # int — 1=BILINEAR, 2=BICUBIC, 3=NEAREST, 4=AREA
    resizePreservingAspectRatio=False,  # bool
    paddingPosition=0,                  # int — 0=CENTER, 1=TOP, 2=BOTTOM
)
```
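A usage sketch for a `uint8` input pipeline. The archive path and normalization constants below are hypothetical illustrations (ImageNet-style values), not values shipped with the library:

```python
# Assumed ImageNet-style per-channel normalization constants (hypothetical).
MEAN = [123.675, 116.28, 103.53]
STDDEV = [58.395, 57.12, 57.375]

def make_executor(me, archive="model.tar.gz"):
    """Initialize an executor that accepts uint8 frames (sketch)."""
    executor = me.ModelExecutor()
    executor.init(
        archive,
        kernelType=me.KernelType.EV74,
        mean=MEAN,                        # mean/stddev given -> inputs are uint8
        stddev=STDDEV,
        interpolationType=1,              # BILINEAR
        resizePreservingAspectRatio=True,
        paddingPosition=0,                # CENTER padding
    )
    return executor
```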
### initBoxdecode()

Use for models with on-device NMS and top-k post-processing.

```python
executor.initBoxdecode(
    tarGzFilePath,                 # str
    kernelType=me.KernelType.EV74,
    mean=[],
    stddev=[],
    interpolationType=1,
    resizePreservingAspectRatio=False,
    paddingPosition=0,
    decodeType="",                 # str — "yolov5", "ssd", "" = auto-detect
    topk=0,                        # int — max detections after NMS (0 = no limit)
    numClasses=0,                  # int
    detectionThreshold=-1.0,       # float — negative = use model default
    nmsIouThreshold=-1.0,          # float — negative = use model default
    originalWidth=0,               # int — 0 = use tensor width
    originalHeight=0,              # int — 0 = use tensor height
    sigmoidOnProbabilities=-1,     # int — 1=yes, 0=no, -1=auto
)
```
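A usage sketch for a YOLOv5-style detector archive. Every concrete value here (path, class count, thresholds, resolution) is an illustrative assumption, not a library default:

```python
def make_detector(me, archive="yolov5.tar.gz"):
    """Initialize a detector with on-device box decode (sketch)."""
    executor = me.ModelExecutor()
    executor.initBoxdecode(
        archive,
        kernelType=me.KernelType.EV74,
        mean=[0.0, 0.0, 0.0],        # scale raw uint8 pixels to [0, 1]
        stddev=[255.0, 255.0, 255.0],
        decodeType="yolov5",
        topk=100,                    # keep at most 100 detections after NMS
        numClasses=80,               # e.g. a COCO-style label set
        detectionThreshold=0.25,
        nmsIouThreshold=0.45,
        originalWidth=1920,          # map boxes back to the source resolution
        originalHeight=1080,
    )
    return executor
```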
### runSynchronous()

Blocks until inference completes. Returns the first output tensor.

```python
output = executor.runSynchronous(inputs)
# inputs:  numpy.ndarray (float32 or uint8) or dict[str, numpy.ndarray]
# returns: numpy.ndarray (float32) — first output tensor only
```

> **Note:** Only the first output tensor is returned. Use `runAsynchronous()` or the C++ API to retrieve all outputs from multi-output models.
Example:

```python
import numpy as np

frame = np.random.rand(1, 224, 224, 3).astype(np.float32)
output = executor.runSynchronous(frame)

# Multi-input model
output = executor.runSynchronous({"input_image": frame, "mask": mask_array})
```
### runAsynchronous()

Non-blocking. Returns immediately after enqueuing. The callback is invoked on a dedicated worker thread.

```python
pushed = executor.runAsynchronous(
    inputs,    # numpy.ndarray or dict[str, numpy.ndarray]
    metaData,  # None, bool, int, float, str, list, or dict
    callback,  # callable(output, metaData, ok) -> None
)
# returns: bool — True if enqueued, False if executor is stopping
```

The callback receives:

- `output` — `numpy.ndarray` (single output) or `list[numpy.ndarray]` (multiple outputs)
- `metaData` — the value passed to `runAsynchronous()`, converted back to Python
- `ok` — `bool`, `False` on failure
Example:

```python
import threading

done = threading.Event()
result = {}

def callback(output, meta, ok):
    if ok:
        result["output"] = output
        result["meta"] = meta
    done.set()  # signal even on failure so the waiter is not stuck

executor.runAsynchronous(frame, {"frame_id": 1}, callback)
done.wait(timeout=10)
```
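When many frames are in flight, a single `Event` no longer suffices; results can instead be gathered keyed by the `metaData`. A sketch, assuming each frame is enqueued with a `{"frame_id": ...}` dict as in the example above (the `ResultCollector` class is illustrative, not part of the API):

```python
import threading

class ResultCollector:
    """Gathers outputs from runAsynchronous callbacks by frame id.
    (Helper sketch; not part of the simaai_model_executor API.)"""

    def __init__(self, expected):
        self._remaining = expected
        self._results = {}
        self._lock = threading.Lock()
        self._done = threading.Event()

    def callback(self, output, metaData, ok):
        # Invoked on the binding's worker thread; guard shared state.
        with self._lock:
            if ok:
                self._results[metaData["frame_id"]] = output
            self._remaining -= 1
            if self._remaining == 0:
                self._done.set()

    def wait(self, timeout=None):
        """Block until every enqueued frame has reported, then return results."""
        self._done.wait(timeout)
        return dict(self._results)
```

Usage would then look like `executor.runAsynchronous(frame, {"frame_id": i}, collector.callback)` for each frame `i`, followed by one `collector.wait()`.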
### profileModel()

Runs synthetic inference for a fixed duration and returns JSON-encoded KPI metrics.

```python
kpi_json_str = executor.profileModel(
    duration_seconds=30,          # int (default: 30)
    output_directory="/var/tmp",  # str (default: "/var/tmp")
    run_synchronous=False,        # bool — False = async mode (default)
)
# returns: str — JSON-encoded results

import json
kpi = json.loads(kpi_json_str)
# keys: total_frames, throughput_fps,
#       latency_min_ms, latency_max_ms, latency_avg_ms,
#       latency_p50_ms, latency_p95_ms, latency_p99_ms
```
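The returned JSON can be reduced to a readable one-line summary. A sketch using the documented keys; the numbers in `sample` are made up to stand in for a real profiling run:

```python
import json

def summarize_kpis(kpi_json_str):
    """One-line summary of the documented KPI fields (helper sketch)."""
    kpi = json.loads(kpi_json_str)
    return (f"{kpi['total_frames']} frames @ {kpi['throughput_fps']:.1f} FPS, "
            f"p95 latency {kpi['latency_p95_ms']:.1f} ms")

# Hypothetical profiling result carrying the documented keys:
sample = json.dumps({
    "total_frames": 900, "throughput_fps": 30.0,
    "latency_min_ms": 8.1, "latency_max_ms": 21.4, "latency_avg_ms": 10.2,
    "latency_p50_ms": 9.8, "latency_p95_ms": 14.6, "latency_p99_ms": 19.9,
})
print(summarize_kpis(sample))  # -> 900 frames @ 30.0 FPS, p95 latency 14.6 ms
```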