.. _Get Accuracy:

Get Model Accuracy
==================

The fastest way to get accuracy numbers for your already compiled model is to use our **Model Accelerator** mode, following the steps below.

Architecture
------------

.. image:: media/ModelAccelerator.svg
    :align: center
    :alt: Model Accelerator Diagram

Overview
--------

#. On the host machine, load the dataset and run preprocessing on the data.
#. Run the compiled model on our board. We send the preprocessed data to the board over Ethernet or PCIe and then run the model. The data must be tesselated to fit our internal memory, and because our MLA runs its operations in int8 it must also be quantized. The inverse operations are then applied and the prediction is sent back to the host.
#. Run any postprocessing on the raw predictions so we can either display or classify them.

By doing this, you can simply copy the pre- and post-processing blocks from your existing pipeline and check that you get the same results with minimal effort.

The primary goal of this accelerator mode is to debug your application, not necessarily to get high FPS numbers. We are a System-on-Chip, not an accelerator, so we have not focused our efforts on maximizing the performance of this mode.

Before jumping into the SiMa-specific code, let us work with a well-known model, ResNet50, using a well-known framework, PyTorch.

Prepare Dataset
---------------

ResNet50 was trained on ``ImageNet1000``. However, this dataset is no longer publicly available, so for ease of use this tutorial ships a subset (500 images) of the validation set in a pickled file. Loading the dataset is simply a matter of reading it with ``pickle``.

.. code-block:: python

    def get_dataset(path):
        """Load the pickled dataset and return (images, labels)."""
        with open(path, 'rb') as f:
            dataset = pickle.load(f)
        return dataset['data'], dataset['target']
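If you want to sanity-check what you just loaded before wiring it into the rest of the pipeline, a minimal sketch is shown below. The file name ``imagenet_subset.pkl`` is only a placeholder; use the pickled file shipped with the example. It assumes the images are stored as numpy arrays.

.. code-block:: python

    # Quick sanity check of the pickled dataset. The file name below is a
    # placeholder; substitute the pickled file shipped with the example.
    images, labels = get_dataset("imagenet_subset.pkl")

    print(len(images), "images /", len(labels), "labels")
    print("first image shape:", images[0].shape)   # raw images are channels-first (CHW)
    print("first label:", labels[0])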
``PyTorch`` includes a ready-to-use preprocessing ``Transform`` with all the preprocessing the images need, so we will use that. We added a class variable to switch the output to ``numpy`` because ``PyTorch`` works with ``torch.Tensor`` objects while the SiMa APIs use ``numpy`` arrays, and we want to reuse the same ``Processing`` class for both. Also, our APIs expect ``NHWC`` format, so we need to transpose the array.

.. code-block:: python

    class Processing():
        """
        Handles image preprocessing and prediction postprocessing for ResNet50 model inference.
        Supports both PyTorch tensor and numpy array formats.
        """
        def __init__(self, numpy=False, resnet_type='IMAGENET1K_V2'):
            """
            Initialize the processing pipeline with appropriate transforms.

            Args:
                numpy (bool): If True, outputs numpy arrays; if False, outputs PyTorch tensors
                resnet_type (str): Version of ResNet weights to use for preprocessing transforms
            """
            # Set up preprocessing transforms based on the ResNet model version
            if resnet_type == 'IMAGENET1K_V2':
                self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V2.transforms()
            elif resnet_type == 'IMAGENET1K_V1':
                self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V1.transforms()
            else:
                raise ValueError(f"Unsupported resnet_type: {resnet_type}")
            self.numpy = numpy

        def set_numpy_outputs(self):
            """Switch output format to numpy arrays (used for hardware acceleration)."""
            self.numpy = True

        def set_torch_outputs(self):
            """Switch output format to PyTorch tensors (used for standard PyTorch inference)."""
            self.numpy = False

        def preprocessing(self, img):
            """
            Convert raw image data to the format expected by the ResNet50 model.

            Args:
                img: Raw image array in CHW format (Channels, Height, Width)

            Returns:
                Preprocessed image ready for model inference
            """
            # Convert numpy array to PIL Image, transpose from CHW to HWC, and resize to 224x224
            img = Image.fromarray(img.transpose((1, 2, 0)), "RGB").resize((224, 224))
            # Apply ResNet preprocessing transforms (normalization, etc.) and add batch dimension
            preprocessed_img = torch.unsqueeze(self.preprocessing_transforms(img), dim=0)
            # Convert to numpy NHWC format if requested (needed for the hardware accelerator)
            if self.numpy:
                preprocessed_img = preprocessed_img.detach().numpy().transpose(0, 2, 3, 1)
            return preprocessed_img

        def postprocessing(self, prediction):
            """
            Convert model output to predicted class index.

            Args:
                prediction: Raw model output (logits for each class)

            Returns:
                Predicted class index (integer)
            """
            # Find the class with the highest logit value
            if self.numpy:
                prediction = np.argmax(prediction)
            else:
                prediction = torch.argmax(prediction)
            return prediction
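The effect of the ``numpy`` flag is easiest to see on a throwaway input. The sketch below is a minimal illustration (not part of the example script) and assumes the imports used above (``torch``, ``torchvision.models``, ``PIL.Image``, ``numpy``) are in scope; note the layout difference between the two outputs.

.. code-block:: python

    import numpy as np

    proc = Processing(numpy=False)

    # Synthetic channels-first (CHW) RGB image, the layout preprocessing() expects
    img = np.random.randint(0, 256, size=(3, 256, 256), dtype=np.uint8)

    torch_input = proc.preprocessing(img)
    print(torch_input.shape)    # torch.Size([1, 3, 224, 224]) -- NCHW for PyTorch

    proc.set_numpy_outputs()
    numpy_input = proc.preprocessing(img)
    print(numpy_input.shape)    # (1, 224, 224, 3) -- NHWC for the SiMa APIs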
PyTorch Inference
-----------------

The inference code for ``PyTorch`` is quite simple: we load the model (the weights download automatically), set it to evaluation mode, and iterate over our images and labels. For each image we run preprocessing, model inference, and postprocessing, then analyze the accuracy. Finally, we save the model and record the input node name, because we will use it for our SiMa compilation.

.. code-block:: python

    def pytorch_example(dataset, processing, model_path, resnet_type):
        """
        Run inference using standard PyTorch on CPU/GPU to establish baseline accuracy.
        This serves as the reference implementation for comparison with hardware acceleration.

        Args:
            dataset: Tuple of (images, labels) for inference
            processing: Processing object for pre/post processing
            model_path: Path where to save the trained model
            resnet_type: Type of ResNet weights to load

        Returns:
            tuple: (input_name, accuracy) - model input layer name and achieved accuracy
        """
        # Load pre-trained ResNet50 model with specified weights
        model = models.resnet50(weights=resnet_type)
        model.eval()  # Set to evaluation mode (disables dropout, batch norm training mode)

        images, labels = dataset
        total_images = len(images)
        print("Inferencing on pytorch...")
        accurate_predictions = 0

        # Process each image and compare prediction with ground truth label
        for img, label in tqdm(zip(images, labels), total=total_images):
            # Convert raw image to model input format
            preprocessed_img = processing.preprocessing(img)
            # Run forward pass through the model
            prediction = model(preprocessed_img)
            # Convert model output to predicted class
            prediction = processing.postprocessing(prediction)
            # Count correct predictions
            accurate_predictions += int(prediction == label)

        # Calculate and display accuracy metrics
        accuracy = (accurate_predictions / total_images) * 100
        print("Correct predictions:", accurate_predictions)
        print("Accuracy:", accuracy, "%")

        # Save the model for later compilation to hardware format
        torch.save(model, model_path)

        # Get the input layer name (needed for hardware compilation)
        name, _ = next(model.named_children())
        input_name = name
        print("Model", model_path, "saved with input name", input_name)
        return input_name, accuracy

Compile Model
-------------

This step is required to convert the PyTorch model to a quantized model that can run on the SiMa board. Refer to :ref:`ModelSDK` for more information.

.. code-block:: python

    def compile_pytorch_resnet50(dataset, processing, model_path, input_name, target):
        """
        Compile PyTorch model to run on SiMa.ai hardware accelerator.
        This involves quantization (reducing precision) and compilation to hardware-specific format.

        Args:
            dataset: Calibration dataset for quantization
            processing: Processing object for data preparation
            model_path: Path to the saved PyTorch model
            input_name: Name of the model's input layer
            target: Hardware generation ("gen1" or "gen2")

        Returns:
            str: Path to the compiled model folder
        """
        # Import SiMa.ai specific compilation tools
        from afe.apis.defines import default_quantization, gen1_target, gen2_target
        from afe.apis.loaded_net import load_model
        from afe.core.utils import convert_data_generator_to_iterable
        from afe.load.importers.general_importer import pytorch_source
        from sima_utils.data.data_generator import DataGenerator

        # Select hardware target platform (different generations have different capabilities)
        assert target in ("gen1", "gen2")
        target = gen1_target if target == "gen1" else gen2_target
        print(f"Hardware target platform: {target}")

        # Define input shape: batch_size=1, channels=3 (RGB), height=224, width=224
        input_shape = (1, 3, 224, 224)
        # Set up model importer with PyTorch source and input specifications
        importer_params = pytorch_source(model_path, input_names=[input_name], input_shapes=[input_shape])
        # Load the model for hardware compilation
        loaded_net = load_model(importer_params, target=target)

        images, _ = dataset
        n_calib_samples = len(images)

        # Switch to numpy output format (required for hardware compilation)
        processing.set_numpy_outputs()

        # Prepare calibration samples for quantization; preprocessing already returns NHWC
        samples = np.empty((n_calib_samples, 224, 224, 3), dtype=np.float32)
        for i in range(n_calib_samples):
            preprocessed_img = processing.preprocessing(images[i])
            samples[i] = preprocessed_img

        # Create data generator for calibration process
        input_generator = DataGenerator({input_name: samples})
        calibration_data = convert_data_generator_to_iterable(input_generator)

        # Quantize the model using calibration samples (reduces precision for faster hardware execution)
        model_sdk_net = loaded_net.quantize(calibration_data,
                                            default_quantization,
                                            model_name=model_path,
                                            arm_only=False)

        # Create output directory and save/compile the quantized model
        compiled_folder = "compiled_model/"
        os.makedirs(compiled_folder, exist_ok=True)
        model_sdk_net.save(model_name=model_path, output_directory=compiled_folder)
        model_sdk_net.compile(output_path=compiled_folder, log_level=logging.INFO)

        # Extract the compiled model from the tar.gz archive
        import tarfile
        model = tarfile.open(compiled_folder + model_path + "_mpk.tar.gz")
        model.extractall(compiled_folder)
        model.close()
        print("Compilation done")
        return compiled_folder
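After compilation, the SiMa inference step needs two of the extracted artifacts: the model ``.elf`` and the MPK ``.json`` (passed as ``model_file_path`` and ``mpk_json_path`` below). Here is a hedged sketch for locating them; the exact file names depend on the model name given to the compiler, so the glob patterns are assumptions.

.. code-block:: python

    import glob

    # Locate the compiled artifacts inside the output folder. The patterns are
    # assumptions: actual names depend on the model name given to the compiler.
    compiled_folder = "compiled_model/"
    elf_candidates = glob.glob(compiled_folder + "**/*.elf", recursive=True)
    json_candidates = glob.glob(compiled_folder + "**/*.json", recursive=True)

    print("model .elf candidates:", elf_candidates)
    print("mpk .json candidates:", json_candidates)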
SiMa Inference
--------------

Follow the steps below to run the model on our board; this is specific to SiMa's Palette software.

**Steps**

#. Send the ``.elf`` file to the board.

   .. code-block:: python

      def send_model(args):
          """
          Transfer the compiled model file to the remote hardware device via SSH/SCP.
          Sets up an SSH tunnel if needed and copies the model files to the target device.

          Args:
              args: Command line arguments containing connection details
          """
          print("Sending model...")
          password = ''      # Password for SSH connection (empty means key-based auth)
          max_attempts = 10  # Maximum retry attempts for connection

          # Create SSH tunnel unless bypassed (tunnel allows secure connection to remote device)
          if not args.bypass_tunnel:
              ssh_connection, local_port = create_forward_tunnel(args, password, max_attempts)
              if ssh_connection is None:
                  logging.error(f'Failed to forward local port after {max_attempts} attempts')
                  sys.exit(-1)
              # Use the tunneled local port for subsequent connections
              args.dv_port = local_port

          # Copy the compiled model file (.elf or .tar.gz) to the remote device
          scp_file = scp_files_to_remote(args, args.model_file_path, password, "/home/sima", max_attempts)
          if scp_file is None:
              logging.error(f'Failed to scp the model file after {max_attempts} attempts')
              sys.exit(-1)

#. Set up your ``Pipeline``. As described above, the Accelerator runs the pre- and post-processing on the host and the model on the board. However, before the data reaches the quantized model it has to go through ``tesselation`` and ``quantization``, and before the postprocessing runs it has to go through ``detesselation`` and ``dequantization``. We must also specify the board's address on the network, since we will be using an Ethernet connection. The parameters for all these operations live in the ``.json`` file, and the ``Pipeline`` configures them. We then set the processing to ``numpy`` outputs and start our inference loop. The main difference from the PyTorch loop lies in calling ``pipeline.quantize_tesselate`` and ``pipeline.detesselate_dequantize``, for the reasons mentioned above.

   .. code-block:: python

      def model_accelerator_example(dataset, processing, args):
          """
          Run inference using the SiMa.ai hardware accelerator and measure accuracy.
          This tests the quantized model performance on actual hardware.

          Args:
              dataset: Tuple of (images, labels) for inference
              processing: Processing object for pre/post processing
              args: Command line arguments with hardware connection details

          Returns:
              float: Accuracy achieved on hardware accelerator
          """
          # Send compiled model to remote hardware device
          send_model(args)

          # Initialize hardware pipeline for inference
          pipeline = Pipeline(args.model_file_path, args.mpk_json_path,
                              devkit_ip=args.dv_host, local_port=args.dv_port,
                              mlsoc_lm_folder="/home/sima")
          print("Model sent and ready!")

          # Switch to numpy format for hardware compatibility
          processing.set_numpy_outputs()

          images, labels = dataset
          total_images = len(images)
          accurate_predictions, i = 0, 0
          print("Inferencing using model accelerator...")

          # Process each image through the hardware accelerator
          for img, label in tqdm(zip(images, labels), total=total_images):
              # Skip any corrupted/missing images
              if img is None:
                  continue
              # Convert image to model input format
              preprocessed_frame = processing.preprocessing(img)
              # Apply quantization and tessellation (splitting into tiles) for hardware processing
              preprocessed_frame = pipeline.quantize_tesselate(preprocessed_frame)
              # Run inference on the hardware accelerator
              prediction = pipeline.run_inference(preprocessed_frame=preprocessed_frame[0], fcounter=i)
              # Reverse tessellation and quantization to get the final result
              prediction = pipeline.detesselate_dequantize(prediction)
              # Convert output to predicted class
              prediction = processing.postprocessing(prediction)
              # Count correct predictions
              accurate_predictions += int(prediction == label)
              i += 1

          # Calculate and display accuracy metrics
          accuracy = (accurate_predictions / total_images) * 100
          print("Correct predictions:", accurate_predictions)
          print("Accuracy:", accuracy, "%")

          # Clean up hardware resources
          pipeline.release()
          return accuracy

As you can see, the code is what you would expect when running any machine learning framework. It should also be useful as a template for running your own models and getting accuracy numbers.
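Because both inference paths return an accuracy figure, comparing the two quantifies the cost of quantization; this is how the ``Accuracy lost due to quantization`` line in the example output below is produced. A minimal sketch, assuming the functions above are defined and ``dataset``, ``processing``, ``model_path``, ``resnet_type``, and ``args`` are set up as in the surrounding example:

.. code-block:: python

    # Compare floating-point (PyTorch) accuracy against the quantized model
    # running on the board. Assumes the surrounding example's setup is in place.
    input_name, pytorch_accuracy = pytorch_example(dataset, processing, model_path, resnet_type)
    accelerator_accuracy = model_accelerator_example(dataset, processing, args)

    print("Accuracy lost due to quantization:", pytorch_accuracy - accelerator_accuracy)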
Example
-------

.. button-link:: https://docs.sima.ai/pkg_downloads/SDK1.7.0/tools/get_accuracy.zip
    :color: primary
    :shadow:

    Download the Example

**Steps**

#. Unzip to a local directory and move the unzipped folder ``get_accuracy`` under your ``workspace`` directory:

   .. code-block:: console

      sima-user@sima-user-machine:~$ cd ~/Downloads
      sima-user@sima-user-machine:~/Downloads$ unzip get_accuracy.zip
      sima-user@sima-user-machine:~/Downloads$ mv get_accuracy ~/workspace/

#. Go to the directory ``/home/docker/sima-cli/get_accuracy/`` within the SDK container.

   .. code-block:: console

      sima-user@docker-image-id:/home# cd /home/docker/sima-cli/get_accuracy
      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# chown ../get_accuracy
      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get update
      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get install sshpass

#. Run the application.

   Compile the model:

   .. code-block:: console

      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host --run_pytorch_inference --compile_pytorch_model
      Inferencing on pytorch...
      100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:38<00:00, 12.21it/s]
      Correct predictions: 378
      Accuracy: 81.29032258064515 %
      Model resnet50.pt saved with input name conv1
      ...
      Inferencing using model accelerator...
      100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:34<00:00, 13.43it/s]
      Correct predictions: 379
      Accuracy: 81.50537634408602 %
      Accuracy lost due to quantization: -0.21505376344086358

   .. note::

      Target Type:

      - ``gen1``: the default option; compiles for the ``MLSoC`` target if you do not specify ``--target``.
      - ``gen2``: pass ``--target gen2`` to compile for the ``Modalix`` target.

      Example: ``python get_accuracy.py --model_file resnet50.pt --dv_host --run_pytorch_inference --compile_pytorch_model --target gen2``

   Run inference on the model:

   .. code-block:: console

      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host
      Sending model...
      Creating the Forwarding from host
      sima@192.168.135.30's password:
      Copying the model files to DevKit
      sima@192.168.135.30's password:
      Model sent and ready!
      Inferencing using model accelerator...
      100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:36<00:00, 12.87it/s]
      Correct predictions: 379
      Accuracy: 81.50537634408602 %

.. toctree::
    :maxdepth: 2
    :caption: GetAccuracy