Compile Your Model

As a developer, you can use the ModelSDK to prepare machine learning models for deployment on the MLSoC. Preparing a model includes converting it to use lower-precision data types, on which the MLSoC computes much more efficiently. Developers have several options for this conversion, depending on the computational performance and numerical accuracy they want their model to attain. Post-Training Quantization (PTQ) is an efficient and straightforward method to reduce model size and improve inference speed with minimal loss of accuracy.
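
To make the idea of lower-precision data types concrete, the toy snippet below uses plain NumPy (not the ModelSDK API) to show how floating-point values can be mapped to int8 through a scale and zero point, and mapped back with only a small reconstruction error. The ModelSDK performs this kind of conversion, together with calibration, for you.

import numpy as np

# Toy illustration of affine int8 quantization; not part of the ModelSDK.
weights = np.random.uniform(-1.0, 1.0, size=(4, 4)).astype(np.float32)

scale = (weights.max() - weights.min()) / 255.0       # step size between int8 levels
zero_point = np.round(-128 - weights.min() / scale)   # int8 value that real 0.0 maps to

# Quantize to int8, then reconstruct the floating-point values.
quantized = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

print(np.abs(weights - dequantized).max())  # reconstruction error is at most about scale / 2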

The PTQ workflow involves:

  • Loading a model

  • Quantizing it to int8 or int16

  • Evaluating its accuracy

  • Compiling it for execution

To achieve this, you will need to write a Python script to perform these steps.

The following example demonstrates step-by-step how to optimize a ResNet-50 model using PTQ.

Note

This example uses the ResNet-50 classification model, created by Microsoft. The model is distributed under the Apache 2.0 License. Please follow the same licensing guidelines when using this example.

Prerequisites

Note

Please install or upgrade sima-cli before continuing. This guide is intended for use with the latest sima-cli version.

Note

Please install or upgrade the SDK containers (see Software Installation) before continuing. This guide is intended for use with the latest SDK version of the containers.

Download And Run Demo Script

Access the ModelSDK Container

sima-user@sima-user-machine:~$ sima-cli sdk model

Download The Demo

sima-user@vdp-cli-modelsdk-2:/home/$ cd /home/docker/sima-cli/
sima-user@vdp-cli-modelsdk-2:/home/docker/sima-cli/$ sima-cli install assets/demos/compile-resnet50-model

Quantize & Compile the Model

sima-user@vdp-cli-modelsdk-2:/home/docker/sima-cli/$ source ptq-example/.env/bin/activate
(.env) sima-user@vdp-cli-modelsdk-2:/home/docker/sima-cli/$ cd ptq-example/src/modelsdk_quantize_model
(.env) sima-user@vdp-cli-modelsdk-2:/home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model$ python3 resnet50_quant.py --boardtype {mlsoc,modalix}
... ... ...
***** SiMa.ai Resnet50 Model Compilation Example *****
----------------------------------------
ModelSDK VERSION: 2.0.0
BOARD TYPE: {BOARD_TYPE}
----------------------------------------

***** Quantization & Calibration *****
Running Calibration ...DONE
... ... ...

***** Test Inference on a Golden Retriever (Class 207) *****
[5] --> 207: 'golden retriever', / 207  -> 98.82%

***** Compiling Model for {BOARD_TYPE} *****
... ... ...

***** Compiled Model at /home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model/../../models/compiled_resnet50 *****
(.env) sima-user@vdp-cli-modelsdk-2:/home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model$ ls ../../models/compiled_resnet50
quantized_resnet50_mpk.tar.gz

The quantized_resnet50_mpk.tar.gz file in the models/compiled_resnet50 directory is the result of the quantization and compilation process. Use this file with the mpk project create command to generate the skeleton of an MPK project. Refer to this article for a detailed explanation of the process.
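
For orientation, the commands below sketch how this file might be fed to the mpk tool. The exact flag names here are an assumption and may vary between Palette releases, so confirm them with the built-in help first.

# List the options supported by your Palette release.
mpk project create --help

# Hypothetical invocation (--model-path is assumed): point the new project at the compiled model archive.
mpk project create --model-path models/compiled_resnet50/quantized_resnet50_mpk.tar.gz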

If you have access to Edgematic, import this file directly into the Edgematic platform to create an application. For more information, refer to the Edgematic documentation.

To learn more about how the resnet50_quant.py script works, continue reading the following sections.

The first step of PTQ is to load an ONNX ResNet50 model into Palette for further processing. The following code snippet demonstrates how to do this.

from afe.apis.loaded_net import load_model
from afe.apis.defines import gen1_target, gen2_target
from afe.load.importers.general_importer import onnx_source
from afe.ir.tensor_type import ScalarType

MODEL_PATH = "resnet50_model.onnx"
TARGET = gen1_target  # use gen2_target to compile for Modalix

# Model information
input_name, input_shape, input_type = ("input", (1, 3, 224, 224), ScalarType.float32)
input_shapes_dict = {input_name: input_shape}
input_types_dict = {input_name: input_type}

# Load the ONNX model
importer_params = onnx_source(str(MODEL_PATH), input_shapes_dict, input_types_dict)
loaded_net = load_model(importer_params, target=TARGET)

The script defines the model path and input metadata. The variable MODEL_PATH specifies the location of the ONNX model file. The input tensor is identified by the name "input" and is given a shape of (1, 3, 224, 224), representing a batch size of one, three color channels, and an image resolution of 224x224 pixels. The input type is set as ScalarType.float32, indicating that the model expects floating-point values.

A dictionary, input_shapes_dict, maps input names to their respective shapes, while input_types_dict associates input names with their data types. These dictionaries are passed to onnx_source, which creates a description of how to load the model from the ONNX file. The actual model file remains unchanged. The model is later loaded and converted into a format compatible with the SiMa.ai SDK by the load_model function.

Finally, load_model(importer_params, target=TARGET) is called to load the prepared model into memory. This step ensures the model is ready for subsequent operations such as quantization, optimization, or inference on SiMa.ai’s MLSoC. The variable TARGET specifies which platform the model is loaded for: MLSoC (gen1_target) or Modalix (gen2_target).
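
The remaining steps of resnet50_quant.py follow the same pattern: the LoadedNet is quantized against calibration samples and then compiled into the archive shown earlier. The sketch below outlines that flow; it reuses input_name and input_shape from the snippet above, substitutes random arrays for real preprocessed images, and assumes the default_quantization scheme and the quantize/compile argument names, which may differ between ModelSDK versions, so treat it as an outline rather than a drop-in replacement for the demo script.

import numpy as np
from afe.apis.defines import default_quantization

# Calibration samples: an iterable of {input_name: array} dictionaries.
# The demo script uses preprocessed images; random data keeps this sketch self-contained.
calibration_data = [
    {input_name: np.random.rand(*input_shape).astype(np.float32)} for _ in range(10)
]

# Quantize the loaded network (int8 by default) using the calibration samples.
quantized_model = loaded_net.quantize(
    calibration_data,
    default_quantization,
    model_name="resnet50",
)

# Compile the quantized model; this produces the quantized_resnet50_mpk.tar.gz package.
quantized_model.compile(output_path="compiled_resnet50")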

After following this tutorial and compiling this model, you can use it to build your first pipeline with Palette.

Build Your First Pipeline with Palette