afe.apis.loaded_net

Attributes

GroundTruth

Classes

LoadedNet

Functions

load_model(→ LoadedNet)

Load a machine learning model into the SiMa Model SDK for further processing such as quantization or compilation.

Module Contents

afe.apis.loaded_net.GroundTruth[source]
class afe.apis.loaded_net.LoadedNet(mod: afe._tvm._defines.TVMIRModule, layout: str, target: sima_utils.common.Platform, *, output_labels: list[str] | None, model_path: str | None)[source]
execute(inputs: afe.apis.defines.InputValues, *, log_level: int = logging.NOTSET) → list[numpy.ndarray][source]

Execute the loaded network using a software implementation of operators.

This method runs the network with a single set of input tensor values and returns the corresponding output tensor values. The execution does not simulate processor behavior but instead uses TVM operators for both FP32 and quantized models. Input and output tensors are automatically transposed if the model layout requires it.

Parameters:
  • inputs (InputValues) – A dictionary mapping input names to their corresponding tensor data. Input tensors must be in channel-last layout (e.g., NHWC or NDHWC).

  • log_level (int, optional) – Sets the logging level for this API call. Defaults to logging.NOTSET.

Returns:

A list of output tensors resulting from the model execution.

Return type:

list[np.ndarray]

Raises:

UserFacingException – If an error occurs during the execution process.

Execution Details:
  • Inputs are automatically transposed to match the model’s expected layout if necessary.

  • Outputs are also transposed back to channel-last layout for consistency with API requirements.

  • Supports 4D (NCHW/NHWC) and 5D (NCDHW/NDHWC) tensor formats.
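
Example

The following is a minimal sketch, assuming a LoadedNet instance named loaded_net with a single NHWC input called 'input_1'; substitute your model's actual input names and shapes.

import numpy as np

# One set of input values; tensors must be in channel-last layout (NHWC here)
inputs = {'input_1': np.random.rand(1, 224, 224, 3).astype(np.float32)}

# Run the network using TVM operator implementations and collect the outputs
outputs = loaded_net.execute(inputs)
print([out.shape for out in outputs])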

quantize(calibration_data: Iterable[afe.apis.defines.InputValues], quantization_config: afe.apis.defines.QuantizationParams, *, automatic_layout_conversion: bool = False, arm_only: bool = False, simulated_arm: bool = False, model_name: str | None = None, log_level: int = logging.NOTSET) → afe.apis.model.Model[source]

Quantize the loaded neural network model using the provided calibration data and quantization configuration.

If arm_only is False, the model is calibrated and quantized for efficient execution on the SiMa MLSoC.

If arm_only is True, quantization is skipped and the model is compiled for ARM execution, which is useful for testing.

Parameters:
  • calibration_data (Iterable[InputValues]) – Dataset for calibration. Each sample is a dictionary mapping input names to calibration data.

  • quantization_config (QuantizationParams) – Parameters controlling the calibration and quantization process.

  • automatic_layout_conversion (bool, optional) – Enable automatic layout conversion during processing. Defaults to False.

  • arm_only (bool, optional) – Skip quantization and compile for ARM. Useful for testing. Defaults to False.

  • simulated_arm (bool, optional) – Reserved for internal use. Simulates ARM backend behavior without compilation. Defaults to False.

  • model_name (Optional[str], optional) – Name for the returned quantized model. Defaults to None.

  • log_level (int, optional) – Logging level for this API call. Defaults to logging.NOTSET.

Returns:

The quantized model instance or an ARM-prepared model if arm_only is True.

Return type:

Model

Raises:
  • ValueError – If an invalid combination of parameters is provided (e.g., both arm_only and simulated_arm set to True).

  • UserFacingException – If an error occurs during calibration or quantization.

Example

import numpy as np

from afe.apis.defines import default_quantization

# Load pre-processed calibration data
dataset_f = np.load('preprocessed_data.npz')
data = dataset_f['x']

# Prepare calibration data as a list of dictionaries mapping
# input names to sample tensors
calib_images = 100
calib_data = [{'input_1': data[i]} for i in range(calib_images)]

# Quantize the model
quant_model = loaded_net.quantize(
    calibration_data=calib_data,
    quantization_config=default_quantization,
    model_name='quantized_model'
)

quantize_with_accuracy_feedback(calibration_data: Iterable[afe.apis.defines.InputValues], evaluation_data: Iterable[tuple[afe.apis.defines.InputValues, GroundTruth]], quantization_config: afe.apis.defines.QuantizationParams, *, accuracy_score: afe.driver.statistic.Statistic[tuple[list[numpy.ndarray], GroundTruth], float], target_accuracy: float, automatic_layout_conversion: bool = False, max_optimization_steps: int | None = None, model_name: str | None = None, log_level: int = logging.NOTSET) → afe.apis.model.Model[source]

Quantize the model with accuracy feedback using a mixed-precision approach.

This method performs quantization with iterative accuracy feedback to ensure the final model meets the specified target accuracy. The process involves calibrating the model, evaluating its accuracy, and adjusting precision through multiple optimization steps if necessary.

Parameters:
  • calibration_data (Iterable[InputValues]) – Required. The dataset used for model calibration. Each sample is a dictionary mapping input names to corresponding calibration data.

  • evaluation_data (Iterable[tuple[InputValues, GroundTruth]]) – Required. The dataset used to evaluate model accuracy, where each element is a tuple containing input data and corresponding ground truth.

  • quantization_config (QuantizationParams) – Required. Configuration parameters that define how the quantization process is performed.

  • accuracy_score (Statistic[tuple[list[np.ndarray], GroundTruth], float]) – Required. The evaluation metric used to calculate accuracy during the quantization process.

  • target_accuracy (float) – Required. The target accuracy value that the quantized model must achieve.

  • automatic_layout_conversion (bool, optional) – Enables automatic layout conversion during processing. Defaults to False.

  • max_optimization_steps (Optional[int], optional) – Maximum number of optimization steps for mixed-precision quantization. Must be greater than 1. Defaults to _MIXED_PRECISION_SEARCH_LIMIT if not specified.

  • model_name (Optional[str], optional) – The name for the resulting quantized model. Defaults to None.

  • log_level (int, optional) – Sets the logging level for the process. Defaults to logging.NOTSET.

Returns:

The quantized model along with its corresponding floating-point model.

Return type:

Model

Raises:

UserFacingException –

  • If activation quantization parameters are unsupported (only 8-bit precision is supported).

  • If max_optimization_steps is less than or equal to 1.

  • If an error occurs during the mixed-precision quantization process.
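
Example

A hedged sketch of the calling pattern. The input name 'input_1', the random data, and top1_accuracy (a user-supplied Statistic that computes accuracy from model outputs and ground truth) are all illustrative; default_quantization is imported from afe.apis.defines as in the quantize() example above.

import numpy as np

from afe.apis.defines import default_quantization

# Illustrative data: random images and integer class labels
images = np.random.rand(150, 224, 224, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=150)

# Calibration samples: dictionaries mapping input names to tensors
calib_data = [{'input_1': images[i]} for i in range(100)]

# Evaluation samples: (inputs, ground truth) tuples
eval_data = [({'input_1': images[i]}, labels[i]) for i in range(100, 150)]

quant_model = loaded_net.quantize_with_accuracy_feedback(
    calibration_data=calib_data,
    evaluation_data=eval_data,
    quantization_config=default_quantization,
    accuracy_score=top1_accuracy,  # illustrative Statistic instance
    target_accuracy=0.85,
    model_name='quantized_model'
)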

convert_to_sima_quantization(*, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, model_name: str | None = None, log_level: int = logging.NOTSET) → afe.apis.model.Model[source]
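
Example

A minimal calling sketch based only on the signature above; the model name is illustrative.

from afe.ir.defines import RequantizationMode

sima_model = loaded_net.convert_to_sima_quantization(
    requantization_mode=RequantizationMode.sima,
    model_name='sima_model'  # illustrative name
)
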
afe.apis.loaded_net.load_model(params: afe.load.importers.general_importer.ImporterParams, *, target: sima_utils.common.Platform = gen1_target, log_level: int = logging.NOTSET) → LoadedNet[source]

Load a machine learning model into the SiMa Model SDK for further processing such as quantization or compilation.

This function validates the input parameters, detects the model format from the provided file paths, and ensures that the required fields (such as input shapes, input names, and output names) are populated according to the model type. If the model is successfully validated and imported, a LoadedNet instance is returned for downstream use.

Parameters:
  • params (ImporterParams) – Import parameters including model file paths, input shapes, input types, names, and other configurations.

  • target (Platform, optional) – Target platform for which the model should be loaded. Defaults to gen1_target.

  • log_level (int, optional) – Logging level for the loading process. Defaults to logging.NOTSET.

Returns:

An object representing the successfully loaded model, ready for quantization, compilation, or other SDK operations.

Return type:

LoadedNet

Raises:

UserFacingException –

  • If no model file paths are provided.

  • If the detected model format does not match the expected format.

  • If required parameters for the detected model format are missing or invalid.

  • If the model format is unsupported.

Supported Model Formats and Required Parameters:
  • ONNX, TFLite, Caffe, Caffe2:

    Requires non-empty input_types and input_shapes.

  • PyTorch:

    Requires non-empty input_names and input_shapes.

  • TensorFlow (v1 & v2):

    Requires non-empty output_names and input_shapes.

  • Keras:

    Requires non-empty input_shapes.

Example

>>> from afe.load.importers.general_importer import ImporterParams
>>> from afe.apis.loaded_net import load_model
>>> params = ImporterParams(
...     file_paths=["model.onnx"],
...     input_shapes={"input_1": (1, 3, 224, 224)},
...     input_types={"input_1": "float32"}
... )
>>> loaded_model = load_model(params)