Compile Your Model

As a developer, you can use the ModelSDK to prepare machine learning models for deployment on the MLSoC. Preparing a model includes converting it to lower-precision data types, on which the MLSoC computes much more efficiently. Developers have several options for this conversion, depending on the computational performance and numerical accuracy they want their model to attain. Post-Training Quantization (PTQ) is a straightforward and efficient method that reduces model size and improves inference speed with minimal accuracy loss.
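At its core, PTQ maps floating-point tensors to low-precision integers using a scale and zero point derived from observed value ranges. The following standalone NumPy sketch illustrates the idea; it is not the ModelSDK API (which appears later in this article), just a minimal model of int8 affine quantization:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine-quantize a float tensor to int8 using its observed min/max."""
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # the range must contain zero
    scale = (hi - lo) / 255.0             # int8 spans 256 values
    zero_point = round(-128 - lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 values back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.uniform(-1.0, 1.0, size=(3, 224, 224)).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)
# Reconstruction error stays within about one quantization step.
assert float(np.abs(x - x_hat).max()) <= scale
```

The accuracy of the result depends on how well the calibration data captures the value ranges the model sees in practice, which is why the PTQ workflow below includes a calibration step.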

The PTQ workflow involves:

  • Loading a model

  • Quantizing it to int8 or int16

  • Evaluating its accuracy

  • Compiling it for execution

To achieve this, you will need to write a Python script to perform these steps.

The following example demonstrates step-by-step how to optimize a ResNet-50 model using PTQ.

Prerequisites

Download Palette

The developer will need a modern Ubuntu 22.04+ or Windows 11 Pro machine to install and run Palette. For more information on system requirements and the installation procedure, refer to Software Installation.

Start by running this sample script in Palette to convert a ResNet-50 model into an optimized version using PTQ.

Download Sample Script

First, uncompress the example in the Palette environment and install the necessary dependencies. The following steps assume the file was copied to the ~/workspace directory on the host, which maps to /home/docker/sima-cli inside the container by default.

Grant read/write permissions on the pybind11-2.13.6.dist-info and pybind11 packages to avoid permission errors:

sima-user@docker-image-id:/home/$ cd docker/sima-cli

    sudo chmod 755 -R /usr/local/lib/python3.10/site-packages/pybind11-2.13.6.dist-info
    sudo chmod 755 -R /usr/local/lib/python3.10/site-packages/pybind11

Set up the downloaded project and install the required Python packages in a virtual environment that has access to the system site packages:

sima-user@docker-image-id:/home/$ cd docker/sima-cli
sima-user@docker-image-id:/home/docker/sima-cli$ tar -xvf ptq-example.tar.gz
    ptq-example/
    ptq-example/README.md
    ptq-example/src/
    ptq-example/src/x86_reference_app/
    ptq-example/src/x86_reference_app/resnet50_reference_classification_app.py
    ptq-example/src/modelsdk_quantize_model/
    ptq-example/src/modelsdk_quantize_model/resnet50_quant.py
    ptq-example/models/
    ptq-example/models/download_resnet50.py
    ptq-example/data/
    ptq-example/data/openimages_v7_images_and_labels.pkl
    ptq-example/data/golden_retriever_207.jpg
    ptq-example/data/imagenet_labels.txt
    ptq-example/requirements.txt

sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/ptq-example
sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ python3 -m venv --system-site-packages .env
sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ source .env/bin/activate
(.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ pip3 install -r requirements.txt

Then, run the download_resnet50.py script to retrieve the official ResNet-50 ONNX model.

(.env)sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/ptq-example/models
(.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ python3 download_resnet50.py
    ... ... ...
    Model exported successfully to /home/docker/sima-cli/ptq-example/models/resnet50_export.onnx
    Simplified model saved to /home/docker/sima-cli/ptq-example/models/resnet50_model.onnx

Lastly, run the full model quantization script.

(.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model$ python3 resnet50_quant.py
    Model SDK version: 1.6.0
    Running Calibration ...DONE
    ... ... ...
    Inference on a happy golden retriever (class 207)  ..
    [5] --> 207: 'golden retriever', / 207  -> 98.82%
    Compiling the model ..

(.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ ls -ail compiled_resnet50/
total 21688
29626653 drwxr-xr-x 2 jim jim     4096 Feb 25 23:47 .
29529959 drwxr-xr-x 3 jim jim     4096 Feb 25 23:46 ..
29626663 -rw-r--r-- 1 jim jim 22198207 Feb 25 23:47 quantized_resnet50_mpk.tar.gz

The quantized_resnet50_mpk.tar.gz file in this folder is the result of the quantization process. You can use this file with the mpk project create command to generate the skeleton of an MPK project. Refer to this article for a detailed explanation of the process.

If you have access to Edgematic, import this file directly into the Edgematic platform to create an application. For more information, refer to the Edgematic documentation.

To learn more about how the resnet50_quant.py script works, continue reading the following sections.

The first step of PTQ is to load an ONNX ResNet-50 model into Palette for further processing. The following code snippet demonstrates how to do this.

from afe.apis.loaded_net import load_model
from afe.load.importers.general_importer import onnx_source
from afe.ir.tensor_type import ScalarType

MODEL_PATH = "resnet50_model.onnx"

# Model information
input_name, input_shape, input_type = ("input", (1, 3, 224, 224), ScalarType.float32)
input_shapes_dict = {input_name: input_shape}
input_types_dict = {input_name: input_type}

# Load the ONNX model
importer_params = onnx_source(str(MODEL_PATH), input_shapes_dict, input_types_dict)
loaded_net = load_model(importer_params)

The script defines the model path and input metadata. The variable MODEL_PATH specifies the location of the ONNX model file. The input tensor is identified by the name "input" and is given a shape of (1, 3, 224, 224), representing a batch size of one, three color channels, and an image resolution of 224x224 pixels. The input type is set as ScalarType.float32, indicating that the model expects floating-point values.
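A ResNet-50 exported from a standard ImageNet training pipeline expects its (1, 3, 224, 224) input to be normalized per channel. The sketch below shows one way to build such a tensor with NumPy; the mean/std constants are the standard ImageNet values and are an assumption here (the preprocessing actually used in this example lives in resnet50_quant.py):

```python
import numpy as np

# Standard ImageNet normalization constants (assumption: the sample
# script uses these same values).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_model_input(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert a 224x224x3 uint8 HWC image to the (1, 3, 224, 224)
    float32 layout declared in input_shapes_dict."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD           # per-channel normalize
    x = np.transpose(x, (2, 0, 1))                   # HWC -> CHW
    return x[np.newaxis, ...]                        # add batch dim -> NCHW

image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
tensor = to_model_input(image)
assert tensor.shape == (1, 3, 224, 224) and tensor.dtype == np.float32
```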

A dictionary, input_shapes_dict, maps input names to their respective shapes, while input_types_dict associates input names with their data types. These dictionaries are passed to onnx_source, which creates a description of how to load the model from the ONNX file. The actual model file remains unchanged. The model is later loaded and converted into a format compatible with the SiMa.ai SDK by the load_model function.

Finally, load_model(importer_params) is called to load the prepared model into memory. This step ensures the model is ready for subsequent operations such as quantization, optimization, or inference on SiMa.ai’s MLSoC.
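The quantization step that follows in resnet50_quant.py needs representative calibration inputs. As a hedged sketch, calibration samples can be assembled as dictionaries keyed by the same "input" name declared above; the exact container type the ModelSDK's quantization call expects may differ, so refer to the sample script for the authoritative version:

```python
import numpy as np

INPUT_NAME = "input"  # must match the name passed to onnx_source

def make_calibration_samples(num_samples: int = 8):
    """Build a list of {input_name: tensor} dicts as calibration data.
    Random data is used here purely for illustration; real calibration
    should use images drawn from the model's target distribution."""
    samples = []
    for _ in range(num_samples):
        tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
        samples.append({INPUT_NAME: tensor})
    return samples

calibration_data = make_calibration_samples()
assert len(calibration_data) == 8
assert calibration_data[0][INPUT_NAME].shape == (1, 3, 224, 224)
```

Using a diverse, realistic calibration set (such as the OpenImages samples bundled with this example) is what lets PTQ pick value ranges that preserve accuracy.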