Compile Your Model

As a developer, you can use the ModelSDK to prepare machine learning models for deployment on the MLSoC. Preparing a model includes converting it to use lower-precision data types, on which the MLSoC computes much more efficiently. Developers have several options for this conversion, depending on the computational performance and numerical accuracy they want the model to attain. Post-Training Quantization (PTQ) is a straightforward, efficient method that reduces model size and improves inference speed while keeping accuracy loss minimal.

The PTQ workflow involves:

  • Loading a model

  • Quantizing it to int8 or int16

  • Evaluating its accuracy

  • Compiling it for execution

To achieve this, you write a Python script that performs these steps.
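Conceptually, the int8 step of this workflow maps each float32 tensor onto 256 integer levels using a scale and a zero point. The following NumPy sketch illustrates the standard affine quantization scheme; it is an illustration of the idea only, not the ModelSDK's internal implementation (which also performs calibration to pick the value range).

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values onto the int8 range [-128, 127] (affine scheme)."""
    scale = (x.max() - x.min()) / 255.0                 # width of one int8 step
    zero_point = int(np.round(-128 - x.min() / scale))  # int8 value representing 0.0
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(64).astype(np.float32)
q, scale, zp = quantize_int8(weights)
recovered = dequantize(q, scale, zp)

# The round-trip error is bounded by half a quantization step (scale / 2).
print("max abs error:", np.abs(weights - recovered).max())
```

Int16 quantization works the same way with 65536 levels, trading some of int8's speed and size savings for lower quantization error.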

The following example demonstrates step-by-step how to optimize a ResNet-50 model using PTQ.

Prerequisites

Download Palette

You will need a modern machine running Ubuntu 22.04+ or Windows 11 Pro to install and run Palette. For more information on system requirements and the installation procedure, refer to Software Installation.

Note

Please install or upgrade sima-cli before continuing. This guide is intended for use with the latest sima-cli version.

Download And Run Demo Script

Download The Demo

(.env)sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/
(.env)sima-user@docker-image-id:/home/$ sima-cli install assets/demos/compile-resnet50-model

Compile the Model

(.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model$ python3 resnet50_quant.py
    Model SDK version: 1.6.0
    Running Calibration ...DONE
    ... ... ...
    Inference on a happy golden retriever (class 207)  ..
    [5] --> 207: 'golden retriever', / 207  -> 98.82%
    Compiling the model ..

The quantized_resnet50_mpk.tar.gz file in the models/compiled_resnet50 directory is the output of the quantization and compilation process. Use this file with the mpk project create command to generate the skeleton of an MPK project. Refer to this article for a detailed explanation of the process.

If you have access to Edgematic, import this file directly into the Edgematic platform to create an application. For more information, refer to the Edgematic documentation.

To learn more about how the resnet50_quant.py script works, continue reading the following sections.

The first step of PTQ is to load the ONNX ResNet-50 model into Palette for further processing. The following code snippet demonstrates how to do this.

from afe.apis.loaded_net import load_model
from afe.load.importers.general_importer import onnx_source
from afe.ir.tensor_type import ScalarType

MODEL_PATH = "resnet50_model.onnx"

# Model information
input_name, input_shape, input_type = ("input", (1, 3, 224, 224), ScalarType.float32)
input_shapes_dict = {input_name: input_shape}
input_types_dict = {input_name: input_type}

# Load the ONNX model
importer_params = onnx_source(str(MODEL_PATH), input_shapes_dict, input_types_dict)
loaded_net = load_model(importer_params)

The script defines the model path and input metadata. The variable MODEL_PATH specifies the location of the ONNX model file. The input tensor is identified by the name "input" and is given a shape of (1, 3, 224, 224), representing a batch size of one, three color channels, and an image resolution of 224x224 pixels. The input type is set as ScalarType.float32, indicating that the model expects floating-point values.
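To satisfy that input signature, an image must be scaled, normalized, and laid out in NCHW order before it is fed to the model. The snippet below sketches a typical ResNet-50 preprocessing pipeline in NumPy; the mean/std values are the conventional ImageNet ones, and the demo's resnet50_quant.py may use a slightly different pipeline.

```python
import numpy as np

# Conventional ImageNet per-channel normalization constants (an assumption;
# check the demo script for the exact values it uses).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert a 224x224 RGB uint8 image (HWC) into a (1, 3, 224, 224) tensor."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    x = (x - MEAN) / STD                             # per-channel normalization
    x = np.transpose(x, (2, 0, 1))                   # HWC -> CHW
    return x[np.newaxis, ...]                        # add batch dim -> NCHW

# Example with a synthetic image standing in for a real photo.
dummy = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
tensor = preprocess(dummy)
print(tensor.shape, tensor.dtype)  # (1, 3, 224, 224) float32
```

The resulting array matches the shape and dtype declared in input_shapes_dict and input_types_dict above.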

A dictionary, input_shapes_dict, maps input names to their respective shapes, while input_types_dict associates input names with their data types. These dictionaries are passed to onnx_source, which creates a description of how to load the model from the ONNX file. The actual model file remains unchanged. The model is later loaded and converted into a format compatible with the SiMa.ai SDK by the load_model function.

Finally, load_model(importer_params) is called to load the prepared model into memory. This step ensures the model is ready for subsequent operations such as quantization, optimization, or inference on SiMa.ai’s MLSoC.