.. _optimize_model:

==================
Compile Your Model
==================

As a developer, you can use the :ref:`ModelSDK <model_sdk>` to prepare machine learning models for deployment on the MLSoC. Preparing a model includes converting it to use lower-precision data types, on which the MLSoC computes much more efficiently. Developers have several options for this conversion, depending on the computational performance and numerical accuracy they want their model to attain.

:ref:`Post-Training Quantization (PTQ)` is an efficient and straightforward method to reduce model size and improve inference speed while keeping accuracy loss minimal. The PTQ workflow involves:

- Loading a model
- Quantizing it to ``int8`` or ``int16``
- Evaluating its accuracy
- Compiling it for execution

To achieve this, you will write a Python script that performs these steps. The following example demonstrates, step by step, how to optimize a ResNet-50 model using PTQ.

.. dropdown:: Prerequisites
   :animate: fade-in
   :color: secondary

   .. button-link:: https://docs.sima.ai/pkg_downloads/SDK1.6.0/1.6.0_Palette_SDK_master_B202.zip
      :color: primary
      :shadow:

      Download Palette

   The developer will need a modern Ubuntu 22.04+ or Windows 11 Pro machine to install and run Palette. For more information on system requirements and the installation procedure, refer to :ref:`Palette installation`.

.. tabs::

   .. tab:: Run Sample Script

      Start by running this sample script in Palette to convert a ResNet-50 model into an optimized version using PTQ.

      .. button-link:: https://docs.sima.ai/assets/ptq-example.tar.gz
         :color: primary
         :shadow:

         Download Sample Script

      First, uncompress the example in the Palette environment and install the necessary dependencies. The steps below assume the file was copied to the ``~/workspace`` directory on the host, which maps to ``/home/docker/sima-cli`` by default.

      Grant read/write permissions to the ``pybind11-2.13.6.dist-info`` and ``pybind11`` packages to avoid permission issues:

      .. code-block:: console

         sima-user@docker-image-id:/home/$ cd docker/sima-cli
         sudo chmod 755 -R /usr/local/lib/python3.10/site-packages/pybind11-2.13.6.dist-info
         sudo chmod 755 -R /usr/local/lib/python3.10/site-packages/pybind11

      Set up the downloaded project and install the required Python packages in a virtual environment with access to the system site packages:

      .. code-block:: console

         sima-user@docker-image-id:/home/$ cd docker/sima-cli
         sima-user@docker-image-id:/home/docker/sima-cli$ tar -xvf ptq-example.tar.gz
         ptq-example/
         ptq-example/README.md
         ptq-example/src/
         ptq-example/src/x86_reference_app/
         ptq-example/src/x86_reference_app/resnet50_reference_classification_app.py
         ptq-example/src/modelsdk_quantize_model/
         ptq-example/src/modelsdk_quantize_model/resnet50_quant.py
         ptq-example/models/
         ptq-example/models/download_resnet50.py
         ptq-example/data/
         ptq-example/data/openimages_v7_images_and_labels.pkl
         ptq-example/data/golden_retriever_207.jpg
         ptq-example/data/imagenet_labels.txt
         ptq-example/requirements.txt
         sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/ptq-example
         sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ python3 -m venv --system-site-packages .env
         sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ source .env/bin/activate
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example$ pip3 install -r requirements.txt

      Then, run the ``download_resnet50.py`` script to retrieve the official ResNet-50 ONNX model.

      .. code-block:: console

         (.env)sima-user@docker-image-id:/home/$ cd /home/docker/sima-cli/ptq-example/models
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ python3 download_resnet50.py
         ...
         ...
         ...
         Model exported successfully to /home/docker/sima-cli/ptq-example/models/resnet50_export.onnx
         Simplified model saved to /home/docker/sima-cli/ptq-example/models/resnet50_model.onnx
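      Optionally, you can sanity-check the downloaded model before quantizing it. The following is a minimal sketch using the standard ``onnx`` package; it is not part of the sample project.

      .. code-block:: python

         import onnx

         # Load the exported model and run the ONNX structural validator.
         model = onnx.load("resnet50_model.onnx")
         onnx.checker.check_model(model)
         print("Model check passed")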
      Lastly, run the full model quantization script.

      .. code-block:: console

         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/src/modelsdk_quantize_model$ python3 resnet50_quant.py
         Model SDK version: 1.6.0
         Running Calibration ...DONE
         ...
         ...
         ...
         Inference on a happy golden retriever (class 207) ..
         [5] --> 207: 'golden retriever', / 207 -> 98.82%
         Compiling the model ..
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ ls -ail compiled_resnet50/
         total 21688
         29626653 drwxr-xr-x 2 jim jim     4096 Feb 25 23:47 .
         29529959 drwxr-xr-x 3 jim jim     4096 Feb 25 23:46 ..
         29626663 -rw-r--r-- 1 jim jim 22198207 Feb 25 23:47 quantized_resnet50_mpk.tar.gz

      The ``quantized_resnet50_mpk.tar.gz`` file in this folder is the result of the quantization process. You can use this file with the ``mpk project create`` command to generate the skeleton of an MPK project. Refer to this :ref:`article` for a detailed explanation of the process.

      If you have access to Edgematic, import this file directly into the Edgematic platform to create an application. For more information, refer to the :ref:`Edgematic documentation`.

      To learn more about how the ``resnet50_quant.py`` script works, continue reading the following sections.

.. tabs::

   .. tab:: Load Model

      The first step of PTQ is to :ref:`load` an ONNX ResNet-50 model into Palette for further processing. The following code snippet demonstrates how to do this.

      .. code-block:: python

         from afe.apis.loaded_net import load_model
         from afe.load.importers.general_importer import onnx_source
         from afe.ir.tensor_type import ScalarType

         MODEL_PATH = "resnet50_model.onnx"

         # Model information
         input_name, input_shape, input_type = ("input", (1, 3, 224, 224), ScalarType.float32)
         input_shapes_dict = {input_name: input_shape}
         input_types_dict = {input_name: input_type}

         # Load the ONNX model
         importer_params = onnx_source(str(MODEL_PATH), input_shapes_dict, input_types_dict)
         loaded_net = load_model(importer_params)

      The script defines the model path and input metadata. The variable ``MODEL_PATH`` specifies the location of the ONNX model file. The input tensor is identified by the name ``"input"`` and is given a shape of ``(1, 3, 224, 224)``, representing a batch size of one, three color channels, and an image resolution of 224x224 pixels. The input type is set to ``ScalarType.float32``, indicating that the model expects floating-point values.

      A dictionary, ``input_shapes_dict``, maps input names to their respective shapes, while ``input_types_dict`` associates input names with their data types. These dictionaries are passed to ``onnx_source``, which creates a description of how to load the model from the ONNX file. The actual model file remains unchanged.

      Finally, ``load_model(importer_params)`` is called to load the prepared model into memory and convert it into a format compatible with the SiMa.ai SDK. This step ensures the model is ready for subsequent operations such as quantization, optimization, or inference on SiMa.ai's MLSoC.
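      If you are working with a different model and are unsure of its input name and shape, you can read them from the ONNX graph directly. The snippet below is a minimal sketch using the standard ``onnx`` package, not part of the sample project; dimension handling may differ for models with dynamic shapes.

      .. code-block:: python

         import onnx

         model = onnx.load("resnet50_model.onnx")
         for graph_input in model.graph.input:
             # Each dimension is either a fixed value or a symbolic name.
             dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
                     for d in graph_input.type.tensor_type.shape.dim]
             print(graph_input.name, dims)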
   .. tab:: Prepare Calibration Dataset

      A calibration dataset is needed for quantization to determine optimal scaling factors when converting model weights and activations from floating point (FP32) to lower precision (e.g., INT8). Since integer representations have a limited range, calibration helps map FP32 values efficiently, minimizing precision loss and avoiding issues like clipping or compression. By analyzing real input distributions, it enables per-layer adjustments, preserving important activations while reducing computational cost. Without calibration, direct quantization could degrade accuracy, making the model less reliable for inference.

      .. code-block:: python

         import pickle as pkl

         import cv2

         from afe import DataGenerator

         LABELS_PATH = "imagenet_labels.txt"
         CALIBRATION_SET_PATH = "openimages_v7_images_and_labels.pkl"
         MODEL_INPUT_NAME = "input"
         # Number of calibration samples to draw (illustrative value;
         # the sample script sets its own).
         MAX_DATA_SAMPLES = 100

         def preprocess(image, skip_transpose=True, input_shape: tuple = (224, 224), scale_factor: float = 255.0):
             '''
             Resizes an image to 224x224, normalizes pixel values to [0,1],
             and applies mean subtraction and standard deviation normalization
             '''
             mean = [0.485, 0.456, 0.406]
             stddv = [0.229, 0.224, 0.225]

             # val224 images come in CHW format, need to transpose to HWC format
             if not skip_transpose:
                 image = image.transpose(1, 2, 0)

             # Resize, color convert, scale, normalize
             image = cv2.resize(image, input_shape)
             image = image / scale_factor
             image = (image - mean) / stddv
             return image

         # Dataset and preprocessing
         def create_imagenet_dataset(num_samples: int = 1) -> dict:
             """
             Creates a dictionary with the structure
             {
                 'images': array of image arrays
                 'labels': array of labels
             }
             """
             dataset_path = CALIBRATION_SET_PATH
             with open(dataset_path, 'rb') as f:
                 dataset = pkl.load(f)
             images_and_labels = {'images': dataset['data'][:num_samples],
                                  'labels': dataset['target'][:num_samples]}
             return images_and_labels

         # Create the calibration dataset
         images_and_labels = create_imagenet_dataset(num_samples=MAX_DATA_SAMPLES)

         # Create a data generator from it and map the preprocessing function
         images_generator = DataGenerator({MODEL_INPUT_NAME: images_and_labels["images"]})
         images_generator.map({MODEL_INPUT_NAME: preprocess})

      The code loads a pre-saved dataset (``openimages_v7_images_and_labels.pkl``) that contains images and their corresponding labels. It selects a specific number of samples (``num_samples``) and returns them as a dictionary with two parts:

      - ``images``: a list of image arrays.
      - ``labels``: a list of corresponding labels.

      It then uses the ``DataGenerator`` utility to create a dataset from these images so they can be processed in batches. Before the data is used, a preprocessing function (``preprocess``) prepares the images in the right format for the model. This dataset is used to calibrate the model when converting it from high precision (FP32) to lower precision (INT8).
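      Before running calibration, it can help to confirm that preprocessing produces tensors of the expected shape and value range. The snippet below is a minimal sketch, not part of the sample project; it assumes the ``DataGenerator`` supports per-sample indexing, as used in the validation step later in this guide.

      .. code-block:: python

         import numpy as np

         # Fetch the first preprocessed sample from the generator.
         sample = images_generator[0][MODEL_INPUT_NAME]
         arr = np.asarray(sample)

         # Expect an HWC image roughly centered on zero after normalization.
         print("shape:", arr.shape)
         print("dtype:", arr.dtype)
         print("min/max:", arr.min(), arr.max())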
   .. tab:: Quantize

      After the model is loaded into the ``loaded_net`` object and the calibration dataset is prepared, the following snippet shows how to quantize the model.

      .. code-block:: python

         from afe.apis.defines import QuantizationParams, quantization_scheme, CalibrationMethod
         from afe.core.utils import convert_data_generator_to_iterable

         # Define quantization parameters
         quant_configs: QuantizationParams = QuantizationParams(
             calibration_method=CalibrationMethod.from_str('min_max'),
             activation_quantization_scheme=quantization_scheme(
                 asymmetric=True, per_channel=False, bits=8
             ),
             weight_quantization_scheme=quantization_scheme(
                 asymmetric=False, per_channel=True, bits=8
             )
         )

         # Perform quantization using min-max calibration and INT8 precision
         sdk_net = loaded_net.quantize(
             convert_data_generator_to_iterable(images_generator),
             quant_configs,
             model_name="quantized_resnet50",
             arm_only=False
         )

      This code first defines the quantization parameters and then applies quantization to the loaded neural network model.

      - The :py:meth:`afe.apis.defines.QuantizationParams` object is created to specify how the model should be quantized.
      - The :py:meth:`afe.apis.defines.CalibrationMethod` is set to ``min_max`` to compute the quantization parameters.
      - The activation :py:meth:`afe.apis.defines.quantization_scheme` is configured to be asymmetric with per-channel quantization disabled, using 8-bit precision.
      - The weight quantization scheme is configured to be symmetric with per-channel quantization enabled, also using 8-bit precision.

      After the quantization parameters are defined, the :py:meth:`afe.apis.loaded_net.LoadedNet.quantize` method is called on the :py:meth:`afe.apis.loaded_net.LoadedNet` object. It takes as input the calibration dataset generated in the previous step, converted to an iterable using ``convert_data_generator_to_iterable(images_generator)``. The method then applies the specified quantization configuration and returns the quantized model.

   .. tab:: Performance Validation

      Before compiling the model for deployment, it is crucial to validate its performance. This ensures that the model functions correctly after preprocessing and quantization. The :py:meth:`afe.apis.loaded_net.LoadedNet.execute` method allows quantized models to be executed in software with user-supplied input data so that the results can be evaluated.

      **Running Inference on Multiple Samples**

      - The following sample code implements a loop to run inference on a list of images, compares the predicted labels with the reference (ground truth) labels, and prints each result along with the model's confidence score, helping verify the model's output accuracy.
      - The ``postprocess_output`` function processes the raw output of the model and extracts the most likely class along with its confidence score.

      .. code-block:: python

         import cv2
         import numpy as np

         def postprocess_output(output: np.ndarray):
             probabilities = output[0][0]
             max_idx = np.argmax(probabilities)
             return max_idx, probabilities[max_idx]

         with open(LABELS_PATH, "r") as f:
             imagenet_labels = [line.strip() for line in f.readlines()]

         for idx in range(6):
             sdk_net_output = sdk_net.execute(inputs={"input": images_generator[idx]["input"]})
             inference_label, inference_result = postprocess_output(sdk_net_output)
             reference_label = images_and_labels["labels"][idx]
             print(f"[{idx}] --> {imagenet_labels[inference_label]} / {reference_label} -> {inference_result:.2%}")
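      To go beyond spot-checking individual predictions, you can aggregate the results into a top-1 accuracy figure over the evaluated samples. This is a minimal sketch building on the names defined above; it assumes the reference labels are integer class indices comparable to the model's output index.

      .. code-block:: python

         # Count how many predictions match the ground-truth labels.
         num_samples = 6
         correct = 0
         for idx in range(num_samples):
             sdk_net_output = sdk_net.execute(inputs={"input": images_generator[idx]["input"]})
             inference_label, _ = postprocess_output(sdk_net_output)
             if int(inference_label) == int(images_and_labels["labels"][idx]):
                 correct += 1

         print(f"Top-1 accuracy over {num_samples} samples: {correct / num_samples:.2%}")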
      **Validating with a Known Class (Golden Retriever - Class 207)**

      - The following code runs inference on a specific, well-known image, which helps confirm that the model correctly classifies familiar objects.
      - Any misclassification can signal preprocessing issues, quantization inaccuracies, or the need for model tuning.

      .. code-block:: python

         import cv2
         import numpy as np

         print("Inference on a happy golden retriever (class 207) ..")
         # DATA_PATH points to the example's data/ directory (defined in resnet50_quant.py)
         dog_image = cv2.imread(str(DATA_PATH/"golden_retriever_207.jpg"))
         dog_image = cv2.cvtColor(dog_image, cv2.COLOR_BGR2RGB)
         pp_dog_image = np.expand_dims(preprocess(dog_image), axis=0).astype(np.float32)
         sdk_net_output = sdk_net.execute(inputs={"input": pp_dog_image})
         inference_label, inference_result = postprocess_output(sdk_net_output)
         print(f"[{idx}] --> {imagenet_labels[inference_label]} / 207 -> {inference_result:.2%}")

      By performing these checks, we ensure the model maintains its expected performance before proceeding with deployment.

   .. tab:: Compile

      Once you are satisfied with the performance validation results, save and ``compile`` the model to make it ready for the SiMa MLA.

      .. code-block:: python

         # Save the model
         # MODELS_PATH points to the example's models/ directory (defined in resnet50_quant.py)
         sdk_net.save(model_name="quantized_resnet50", output_directory=str(MODELS_PATH))

         # Compile the quantized net and generate the LM file and MPK JSON file for deployment
         sdk_net.compile(output_path=str(MODELS_PATH/"compiled_resnet50"))

      The output of a compiled model in the ModelSDK is a ``tar.gz`` archive that contains the compiled model, metadata in the form of a ``_mpk.json`` file, and a stats file. Both the ``.lm`` compiled model and the ``*_mpk.json`` file will be used throughout this guide as you build the pipeline. For more information, please refer to the :ref:`model_sdk` section.

      .. image:: ../../building_apps/developing_gstreamer_apps/media/modelsdk_output_files.jpg
         :align: center
         :scale: 55%

      |

      To better understand the contents of the ``*_mpk.json`` file, click the button below. While editing this file is typically unnecessary, reviewing its content can provide insight into the internals of the inferencing pipeline.

      .. button-link:: ../../../_static/pipeline-visualizer/mpkjson.html?data=mpk.json
         :color: info
         :shadow:

         View
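      If you want to inspect the compiled artifacts locally, you can extract the archive in place. This is a quick sketch; the exact file names inside the archive depend on the model name used during compilation.

      .. code-block:: console

         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models$ cd compiled_resnet50
         (.env)sima-user@docker-image-id:/home/docker/sima-cli/ptq-example/models/compiled_resnet50$ tar -xzf quantized_resnet50_mpk.tar.gz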