.. _optimize_model:

==================
Compile Your Model
==================

As a developer, you can use the :ref:`ModelSDK <model_sdk>` to prepare machine learning models for deployment on the MLSoC. Preparing a model includes converting it to use lower-precision data types, on which the MLSoC computes much more efficiently. Developers have several options for this conversion, depending on the computational performance and numerical accuracy they want their model to attain.

:ref:`Post-Training Quantization (PTQ)` is an efficient and straightforward method to reduce model size and improve inference speed while keeping accuracy loss minimal. The PTQ workflow involves:

- Loading a model
- Quantizing it to ``int8`` or ``int16``
- Evaluating its accuracy
- Compiling it for execution

To achieve this, you will write a Python script that performs these steps. The following example demonstrates, step by step, how to optimize a ResNet-50 model using PTQ.

.. dropdown:: Prerequisites
   :animate: fade-in
   :color: secondary

   .. button-link:: https://docs.sima.ai/pkg_downloads/SDK1.6.0/1.6.0_Palette_SDK_master_B202.zip
      :color: primary
      :shadow:

      Download Palette

   The developer will need a modern Ubuntu 22.04+ or Windows 11 Pro machine to install and run Palette. For more information on system requirements and the installation procedure, refer to :ref:`Palette installation`.

.. tabs::

   .. tab:: Run Sample Script

      Start by running this sample script in Palette to convert a ResNet-50 model into an optimized version using PTQ.

      .. button-link:: https://docs.sima.ai/assets/ptq-example.tar.gz
         :color: primary
         :shadow:

         Download Sample Script

      First, uncompress the example in the Palette environment and install the necessary dependencies. The steps below assume the file was copied to the ``~/workspace`` directory on the host, which maps to ``/home/docker/sima-cli`` by default.

      Grant read/write permissions to the ``pybind11-2.13.6.dist-info`` and ``pybind11`` packages to avoid permission issues:

      .. code-block:: console

         sima-user@docker-image-id:/home/$ cd docker/sima-cli
         sudo chmod 755 -R /usr/local/lib/python3.10/site-packages/pybind11-2.13.6.dist-info
         sudo chmod 755 -R /usr/local/lib/python3.10/site-packages/pybind11

      Set up the downloaded project and install the required Python packages in a virtual environment with access to the system site packages:

      .. code-block:: console

         sima-user@docker-image-id:/home/$ cd docker/sima-cli
         sima-user@docker-image-id:/home/docker/sima-cli$ tar -xvf ptq-example.tar.gz
         ptq-example/
         ptq-example/README.md
         ptq-example/src/
         ptq-example/src/x86_reference_app/
         ptq-example/src/x86_reference_app/resnet50_reference_classification_app.py
         ptq-example/src/modelsdk_quantize_model/
         ptq-example/src/modelsdk_quantize_model/resnet50_quant.py
         ptq-example/models/
         ptq-example/models/download_resnet50.py
         ptq-example/data/
         ptq-example/data/openimages_v7_images_and_labels.pkl
         ptq-example/data/golden_retriever_207.jpg
         ptq-example/data/imagenet_labels.txt
         ptq-example/requirements.txt
         sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/ptq-example
         sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ python3 -m venv --system-site-packages .env
         sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ source .env/bin/activate
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ pip3 install -r requirements.txt

      Then, run the ``download_resnet50.py`` script to retrieve the official ResNet-50 ONNX model.

      .. code-block:: console

         (.env)sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/ptq-example/models
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ python3 download_resnet50.py
         ...
         ...
         ...
         Model exported successfully to /home/docker/sima-cli/ptq-example/models/resnet50_export.onnx
         Simplified model saved to /home/docker/sima-cli/ptq-example/models/resnet50_model.onnx
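      Optionally, you can sanity-check the downloaded model before quantizing it. The following is a minimal sketch using the standard ``onnx`` package; it is not part of the sample project.

      .. code-block:: python

         import onnx

         # Load the exported model and run the ONNX structural validator.
         model = onnx.load("resnet50_model.onnx")
         onnx.checker.check_model(model)
         print("Model check passed")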
      Lastly, run the full model quantization script.

      .. code-block:: console

         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model$ python3 resnet50_quant.py
         Model SDK version: 1.6.0
         Running Calibration ...DONE
         ...
         ...
         ...
         Inference on a happy golden retriever (class 207) ..
         [5] --> 207: 'golden retriever', / 207 -> 98.82%
         Compiling the model ..
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ ls -ail compiled_resnet50/
         total 21688
         29626653 drwxr-xr-x 2 jim jim     4096 Feb 25 23:47 .
         29529959 drwxr-xr-x 3 jim jim     4096 Feb 25 23:46 ..
         29626663 -rw-r--r-- 1 jim jim 22198207 Feb 25 23:47 quantized_resnet50_mpk.tar.gz

      The ``quantized_resnet50_mpk.tar.gz`` file in this folder is the result of the quantization process. You can use this file with the ``mpk project create`` command to generate the skeleton of an MPK project. Refer to this :ref:`article` for a detailed explanation of the process.

      If you have access to Edgematic, import this file directly into the Edgematic platform to create an application. For more information, refer to the :ref:`Edgematic documentation`.

      To learn more about how the ``resnet50_quant.py`` script works, continue reading the following sections.

.. tabs::

   .. tab:: Load Model

      The first step of PTQ is to :ref:`load` an ONNX ResNet-50 model into Palette for further processing. The following code snippet demonstrates how to do this.

      .. code-block:: python

         from afe.apis.loaded_net import load_model
         from afe.load.importers.general_importer import onnx_source
         from afe.ir.tensor_type import ScalarType

         MODEL_PATH = "resnet50_model.onnx"

         # Model information
         input_name, input_shape, input_type = ("input", (1, 3, 224, 224), ScalarType.float32)
         input_shapes_dict = {input_name: input_shape}
         input_types_dict = {input_name: input_type}

         # Load the ONNX model
         importer_params = onnx_source(str(MODEL_PATH), input_shapes_dict, input_types_dict)
         loaded_net = load_model(importer_params)

      The script defines the model path and input metadata. The variable ``MODEL_PATH`` specifies the location of the ONNX model file. The input tensor is identified by the name ``"input"`` and is given a shape of ``(1, 3, 224, 224)``, representing a batch size of one, three color channels, and an image resolution of 224x224 pixels. The input type is set to ``ScalarType.float32``, indicating that the model expects floating-point values.

      A dictionary, ``input_shapes_dict``, maps input names to their respective shapes, while ``input_types_dict`` associates input names with their data types. These dictionaries are passed to ``onnx_source``, which creates a description of how to load the model from the ONNX file. The actual model file remains unchanged.

      Finally, ``load_model(importer_params)`` is called to load the prepared model into memory and convert it into a format compatible with the SiMa.ai SDK. This step ensures the model is ready for subsequent operations such as quantization, optimization, or inference on SiMa.ai's MLSoC.
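      If you are working with a different model and are unsure of its input name and shape, you can read them from the ONNX graph directly. The snippet below is a minimal sketch using the standard ``onnx`` package, not part of the sample project; dimension handling may differ for models with dynamic shapes.

      .. code-block:: python

         import onnx

         model = onnx.load("resnet50_model.onnx")
         for graph_input in model.graph.input:
             # Each dimension is either a fixed value or a symbolic name.
             dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
                     for d in graph_input.type.tensor_type.shape.dim]
             print(graph_input.name, dims)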
   .. tab:: Prepare Calibration Dataset

      A calibration dataset is needed for quantization to determine optimal scaling factors when converting model weights and activations from floating point (FP32) to lower precision (e.g., INT8). Since integer representations have a limited range, calibration helps map FP32 values efficiently, minimizing precision loss and avoiding issues like clipping or compression. By analyzing real input distributions, it enables per-layer adjustments, preserving important activations while reducing computational cost. Without calibration, direct quantization could degrade accuracy, making the model less reliable for inference.

      .. code-block:: python

         import pickle as pkl

         import cv2

         from afe import DataGenerator

         LABELS_PATH = "imagenet_labels.txt"
         CALIBRATION_SET_PATH = "openimages_v7_images_and_labels.pkl"
         MODEL_INPUT_NAME = "input"
         # Number of calibration samples to draw (illustrative value;
         # the sample script sets its own).
         MAX_DATA_SAMPLES = 100

         def preprocess(image, skip_transpose=True, input_shape: tuple = (224, 224), scale_factor: float = 255.0):
             '''
             Resizes an image to 224x224, normalizes pixel values to [0,1],
             and applies mean subtraction and standard deviation normalization
             '''
             mean = [0.485, 0.456, 0.406]
             stddv = [0.229, 0.224, 0.225]

             # val224 images come in CHW format, need to transpose to HWC format
             if not skip_transpose:
                 image = image.transpose(1, 2, 0)

             # Resize, color convert, scale, normalize
             image = cv2.resize(image, input_shape)
             image = image / scale_factor
             image = (image - mean) / stddv
             return image

         # Dataset and preprocessing
         def create_imagenet_dataset(num_samples: int = 1) -> dict:
             """
             Creates a dictionary with the structure
             {
                 'images': array of image arrays
                 'labels': array of labels
             }
             """
             dataset_path = CALIBRATION_SET_PATH
             with open(dataset_path, 'rb') as f:
                 dataset = pkl.load(f)
             images_and_labels = {'images': dataset['data'][:num_samples],
                                  'labels': dataset['target'][:num_samples]}
             return images_and_labels

         # Create the calibration dataset
         images_and_labels = create_imagenet_dataset(num_samples=MAX_DATA_SAMPLES)

         # Create a data generator from it and map the preprocessing function
         images_generator = DataGenerator({MODEL_INPUT_NAME: images_and_labels["images"]})
         images_generator.map({MODEL_INPUT_NAME: preprocess})

      The code loads a pre-saved dataset (``openimages_v7_images_and_labels.pkl``) that contains images and their corresponding labels. It selects a specific number of samples (``num_samples``) and returns them as a dictionary with two parts:

      - ``images``: a list of image arrays.
      - ``labels``: a list of corresponding labels.

      It then uses the ``DataGenerator`` utility to create a dataset from these images so they can be processed in batches. Before the data is used, a preprocessing function (``preprocess``) prepares the images in the right format for the model. This dataset is used to calibrate the model when converting it from high precision (FP32) to lower precision (INT8).
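      Before running calibration, it can help to confirm that preprocessing produces tensors of the expected shape and value range. The snippet below is a minimal sketch, not part of the sample project; it assumes the ``DataGenerator`` supports per-sample indexing, as used in the validation step later in this guide.

      .. code-block:: python

         import numpy as np

         # Fetch the first preprocessed sample from the generator.
         sample = images_generator[0][MODEL_INPUT_NAME]
         arr = np.asarray(sample)

         # Expect an HWC image roughly centered on zero after normalization.
         print("shape:", arr.shape)
         print("dtype:", arr.dtype)
         print("min/max:", arr.min(), arr.max())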
   .. tab:: Quantize

      After the model is loaded into the ``loaded_net`` object and the calibration dataset is prepared, the following snippet shows how to quantize the model.

      .. code-block:: python

         from afe.apis.defines import QuantizationParams, quantization_scheme, CalibrationMethod
         from afe.core.utils import convert_data_generator_to_iterable

         # Define quantization parameters
         quant_configs: QuantizationParams = QuantizationParams(
             calibration_method=CalibrationMethod.from_str('min_max'),
             activation_quantization_scheme=quantization_scheme(
                 asymmetric=True, per_channel=False, bits=8
             ),
             weight_quantization_scheme=quantization_scheme(
                 asymmetric=False, per_channel=True, bits=8
             )
         )

         # Perform quantization using min-max calibration and INT8 precision
         sdk_net = loaded_net.quantize(
             convert_data_generator_to_iterable(images_generator),
             quant_configs,
             model_name="quantized_resnet50",
             arm_only=False
         )

      This code first defines the quantization parameters and then applies quantization to the loaded neural network model.

      - The :py:meth:`afe.apis.defines.QuantizationParams` object is created to specify how the model should be quantized.
      - The :py:meth:`afe.apis.defines.CalibrationMethod` is set to ``min_max`` to compute the quantization parameters.
      - The activation :py:meth:`afe.apis.defines.quantization_scheme` is configured to be asymmetric with per-channel quantization disabled, using 8-bit precision.
      - The weight quantization scheme is configured to be symmetric with per-channel quantization enabled, also using 8-bit precision.

      After the quantization parameters are defined, the :py:meth:`afe.apis.loaded_net.LoadedNet.quantize` method is called on the :py:meth:`afe.apis.loaded_net.LoadedNet` object. It takes as input the calibration dataset generated in the previous step, converted to an iterable using ``convert_data_generator_to_iterable(images_generator)``. The method then applies the specified quantization configuration and returns the quantized model.

   .. tab:: Performance Validation

      Before compiling the model for deployment, it is crucial to validate its performance. This ensures that the model functions correctly after preprocessing and quantization. The :py:meth:`afe.apis.loaded_net.LoadedNet.execute` method allows quantized models to be executed in software with user-supplied input data so that the results can be evaluated.

      **Running Inference on Multiple Samples**

      - The following sample code implements a loop to run inference on a list of images, compares the predicted labels with the reference (ground truth) labels, and prints each result along with the model's confidence score, helping verify the model's output accuracy.
      - The ``postprocess_output`` function processes the raw output of the model and extracts the most likely class along with its confidence score.

      .. code-block:: python

         import cv2
         import numpy as np

         def postprocess_output(output: np.ndarray):
             probabilities = output[0][0]
             max_idx = np.argmax(probabilities)
             return max_idx, probabilities[max_idx]

         with open(LABELS_PATH, "r") as f:
             imagenet_labels = [line.strip() for line in f.readlines()]

         for idx in range(6):
             sdk_net_output = sdk_net.execute(inputs={"input": images_generator[idx]["input"]})
             inference_label, inference_result = postprocess_output(sdk_net_output)
             reference_label = images_and_labels["labels"][idx]
             print(f"[{idx}] --> {imagenet_labels[inference_label]} / {reference_label} -> {inference_result:.2%}")
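      To go beyond spot-checking individual predictions, you can aggregate the results into a top-1 accuracy figure over the evaluated samples. This is a minimal sketch building on the names defined above; it assumes the reference labels are integer class indices comparable to the model's output index.

      .. code-block:: python

         # Count how many predictions match the ground-truth labels.
         num_samples = 6
         correct = 0
         for idx in range(num_samples):
             sdk_net_output = sdk_net.execute(inputs={"input": images_generator[idx]["input"]})
             inference_label, _ = postprocess_output(sdk_net_output)
             if int(inference_label) == int(images_and_labels["labels"][idx]):
                 correct += 1

         print(f"Top-1 accuracy over {num_samples} samples: {correct / num_samples:.2%}")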
      **Validating with a Known Class (Golden Retriever - Class 207)**

      - The following code runs inference on a specific, well-known image, which helps confirm that the model correctly classifies familiar objects.
      - Any misclassification can signal preprocessing issues, quantization inaccuracies, or the need for model tuning.

      .. code-block:: python

         import cv2
         import numpy as np

         print("Inference on a happy golden retriever (class 207) ..")
         # DATA_PATH points to the example's data/ directory (defined in resnet50_quant.py)
         dog_image = cv2.imread(str(DATA_PATH/"golden_retriever_207.jpg"))
         dog_image = cv2.cvtColor(dog_image, cv2.COLOR_BGR2RGB)
         pp_dog_image = np.expand_dims(preprocess(dog_image), axis=0).astype(np.float32)
         sdk_net_output = sdk_net.execute(inputs={"input": pp_dog_image})
         inference_label, inference_result = postprocess_output(sdk_net_output)
         print(f"[{idx}] --> {imagenet_labels[inference_label]} / 207 -> {inference_result:.2%}")

      By performing these checks, we ensure the model maintains its expected performance before proceeding with deployment.

   .. tab:: Compile

      Once you are satisfied with the performance validation results, save and ``compile`` the model to make it ready for the SiMa MLA.

      .. code-block:: python

         # Save the model
         # MODELS_PATH points to the example's models/ directory (defined in resnet50_quant.py)
         sdk_net.save(model_name="quantized_resnet50", output_directory=str(MODELS_PATH))

         # Compile the quantized net and generate the LM file and MPK JSON file for deployment
         sdk_net.compile(output_path=str(MODELS_PATH/"compiled_resnet50"))

      The output of a compiled model in the ModelSDK is a ``tar.gz`` archive that contains the compiled model, metadata in the form of a ``_mpk.json`` file, and a stats file. Both the ``.lm`` compiled model and the ``*_mpk.json`` file will be used throughout this guide as you build the pipeline. For more information, please refer to the :ref:`model_sdk` section.

      .. image:: ../../building_apps/developing_gstreamer_apps/media/modelsdk_output_files.jpg
         :align: center
         :scale: 55%

      |

      To better understand the contents of the ``*_mpk.json`` file, click the button below. While editing this file is typically unnecessary, reviewing its content can provide insight into the internals of the inferencing pipeline.

      .. button-link:: ../../../_static/pipeline-visualizer/mpkjson.html?data=mpk.json
         :color: info
         :shadow:

         View
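      If you want to inspect the compiled artifacts locally, you can extract the archive in place. This is a quick sketch; the exact file names inside the archive depend on the model name used during compilation.

      .. code-block:: console

         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ cd compiled_resnet50
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models/compiled_resnet50$ tar -xzf quantized_resnet50_mpk.tar.gz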