Get Model Accuracy

The fastest way to get accuracy numbers from your already compiled model is to use our Model Accelerator mode and follow the steps below.

Architecture

Model Accelerator Diagram

Overview

  1. On the host machine, load the dataset and run preprocessing on the data.

  2. Run the compiled model on our board. The host sends the preprocessed data to the board over Ethernet or PCIe, and the model runs there. The data must be tessellated to fit our internal memory, and because our MLA runs operations in int8 it must also be quantized (a rough sketch of quantization follows this list). After inference we apply the inverse operations and send the prediction back to the host.

  3. Run any postprocessing on the raw predictions so we can display or classify them. This way you can simply copy the pre- and postprocessing blocks from your existing pipeline and check, with minimal effort, that you get the same results.
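
To make the quantization step concrete, here is a rough, generic sketch of int8 affine quantization. It is purely illustrative: the actual scales and zero points used on the MLA come from the calibration performed by the ModelSDK, not from this code.

import numpy as np

def quantize_int8(x, scale, zero_point=0):
    # Generic affine int8 quantization (illustrative only, not SiMa's exact scheme)
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, zero_point=0):
    # Approximate inverse: recover float32 values from the int8 representation
    return (q.astype(np.float32) - zero_point) * scale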

The primary goal of this accelerator mode is to debug your application, not necessarily to get high FPS numbers. We are a System-on-Chip, not an accelerator, so we have not focused our efforts on maximizing the performance of this mode.

Before jumping into the SiMa-specific code, let us work with a well-known model, ResNet50, using a well-known framework, PyTorch.

Prepare Dataset

ResNet50 was trained on ImageNet1000; however, that dataset is no longer publicly available. For ease of use, this tutorial therefore ships a subset of 500 images from the validation set in a pickled file, which we load with pickle.

import pickle

def get_dataset(path):
    """Load the pickled ImageNet validation subset and return (images, labels)."""
    with open(path, 'rb') as f:
        dataset = pickle.load(f)
    return dataset['data'], dataset['target']
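
For example, assuming the pickled subset was saved as resnet50_imagenet_subset.pkl (the file name here is purely illustrative), loading it looks like this:

images, labels = get_dataset("resnet50_imagenet_subset.pkl")  # hypothetical file name
print(len(images), "images,", len(labels), "labels")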

The pretrained torchvision weights come with a ready-to-use preprocessing transform that covers all the preprocessing the images need, so we will use that.

We added a class flag to switch the outputs to numpy because PyTorch works with torch.Tensor while the SiMa APIs work with numpy.ndarray, and we want to reuse the same Processing class for both. Our APIs also expect NHWC format, so we need to transpose the array.

import numpy as np
import torch
from PIL import Image
from torchvision import models

class Processing():
    """
    Handles image preprocessing and prediction postprocessing for ResNet50 model inference.
    Supports both PyTorch tensor and numpy array formats.
    """
    def __init__(self, numpy=False, resnet_type='IMAGENET1K_V2'):
        """
        Initialize the processing pipeline with appropriate transforms.

        Args:
            numpy (bool): If True, outputs numpy arrays; if False, outputs PyTorch tensors
            resnet_type (str): Version of ResNet weights to use for preprocessing transforms
        """
        # Set up preprocessing transforms based on the ResNet model version
        if resnet_type == 'IMAGENET1K_V2':
            self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V2.transforms()
        elif resnet_type == 'IMAGENET1K_V1':
            self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V1.transforms()
        else:
            raise ValueError(f"Unsupported resnet_type: {resnet_type}")
        self.numpy = numpy

    def set_numpy_outputs(self):
        """Switch output format to numpy arrays (used for hardware acceleration)."""
        self.numpy = True

    def set_torch_outputs(self):
        """Switch output format to PyTorch tensors (used for standard PyTorch inference)."""
        self.numpy = False

    def preprocessing(self, img):
        """
        Convert raw image data to the format expected by the ResNet50 model.

        Args:
            img: Raw image array in CHW format (Channels, Height, Width)

        Returns:
            Preprocessed image ready for model inference
        """
        # Transpose from CHW to HWC, convert to a PIL image, and resize to 224x224
        img = Image.fromarray(img.transpose((1, 2, 0)), "RGB").resize((224, 224))
        # Apply ResNet preprocessing transforms (normalization, etc.) and add batch dimension
        preprocessed_img = torch.unsqueeze(self.preprocessing_transforms(img), dim=0)

        # Convert to numpy and transpose NCHW -> NHWC if requested (needed for the hardware accelerator)
        if self.numpy:
            preprocessed_img = preprocessed_img.detach().numpy().transpose(0, 2, 3, 1)

        return preprocessed_img

    def postprocessing(self, prediction):
        """
        Convert model output to predicted class index.

        Args:
            prediction: Raw model output (logits for each class)

        Returns:
            Predicted class index (integer)
        """
        # Find the class with highest probability/logit value
        if self.numpy:
            prediction = np.argmax(prediction)
        else:
            prediction = torch.argmax(prediction)

        return prediction
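
As a quick sanity check of the two output modes, a minimal sketch (assuming images comes from the hypothetical dataset load shown earlier):

img = images[0]                                # one CHW image from the dataset

processing = Processing(numpy=False)           # PyTorch mode: torch.Tensor in NCHW
torch_input = processing.preprocessing(img)    # shape (1, 3, 224, 224)

processing.set_numpy_outputs()                 # SiMa mode: numpy array in NHWC
numpy_input = processing.preprocessing(img)    # shape (1, 224, 224, 3)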

PyTorch Inference

The PyTorch inference code is quite simple: we load the model (the pretrained weights download automatically), set it to evaluation mode, and iterate over our images and labels.

Then we simply run preprocessing, model inference, postprocessing, and the accuracy calculation. Finally, we save the model and record the input node name, because we will use it for the SiMa compilation.

from tqdm import tqdm

def pytorch_example(dataset, processing, model_path, resnet_type):
    """
    Run inference using standard PyTorch on CPU/GPU to establish baseline accuracy.
    This serves as the reference implementation for comparison with hardware acceleration.

    Args:
        dataset: Tuple of (images, labels) for inference
        processing: Processing object for pre/post processing
        model_path: Path where to save the trained model
        resnet_type: Type of ResNet weights to load

    Returns:
        tuple: (input_name, accuracy) - model input layer name and achieved accuracy
    """
    # Load pre-trained ResNet50 model with specified weights
    model = models.resnet50(weights=resnet_type)
    model.eval()  # Set to evaluation mode (disables dropout, batch norm training mode)

    images, labels = dataset
    total_images = len(images)

    print("Inferencing on pytorch...")
    accurate_predictions = 0
    # Process each image and compare prediction with ground truth label
    for img, label in tqdm(zip(images, labels), total=total_images):
        # Convert raw image to model input format
        preprocessed_img = processing.preprocessing(img)

        # Run forward pass through the model
        prediction = model(preprocessed_img)

        # Convert model output to predicted class
        prediction = processing.postprocessing(prediction)

        # Count correct predictions
        accurate_predictions += int(prediction == label)

    # Calculate and display accuracy metrics
    accuracy = (accurate_predictions/total_images) * 100
    print("Correct predictions:", accurate_predictions)
    print("Accuracy:", accuracy, "%")

    # Save the model for later compilation to hardware format
    torch.save(model, model_path)
    # Get the input layer name (needed for hardware compilation)
    name, _ = next(model.named_children())
    input_name = name
    print("Model", model_path, "saved with input name", input_name)

    return input_name, accuracy
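
Putting the host-side pieces together, a minimal driver for the baseline run could look like the sketch below (the dataset path is an assumption for illustration; the downloadable example script wires this up through command-line arguments instead):

dataset = get_dataset("resnet50_imagenet_subset.pkl")   # hypothetical dataset path
processing = Processing(numpy=False, resnet_type='IMAGENET1K_V2')
input_name, fp32_accuracy = pytorch_example(dataset, processing,
                                            model_path="resnet50.pt",
                                            resnet_type='IMAGENET1K_V2')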

Compile Model

This step converts the PyTorch model into a quantized model that can run on the SiMa board. Refer to the ModelSDK documentation for more information.

def compile_pytorch_resnet50(dataset, processing, model_path, input_name, target):
    """
    Compile PyTorch model to run on SiMa.ai hardware accelerator.
    This involves quantization (reducing precision) and compilation to hardware-specific format.

    Args:
        dataset: Calibration dataset for quantization
        processing: Processing object for data preparation
        model_path: Path to the saved PyTorch model
        input_name: Name of the model's input layer
        target: Hardware generation ("gen1" or "gen2")

    Returns:
        str: Path to the compiled model folder
    """
    # Import SiMa.ai specific compilation tools
    from afe.apis.defines import default_quantization, gen1_target, gen2_target
    from afe.apis.loaded_net import load_model
    from afe.core.utils import convert_data_generator_to_iterable
    from afe.load.importers.general_importer import pytorch_source
    from sima_utils.data.data_generator import DataGenerator


    # Select hardware target platform (different generations have different capabilities)
    assert target in ("gen1", "gen2")
    target = gen1_target if target == "gen1" else gen2_target
    print(f"Hardware target platform: {target}")

    # Define input shape: batch_size=1, channels=3 (RGB), height=224, width=224
    input_shape = (1, 3, 224, 224)
    # Set up model importer with PyTorch source and input specifications
    importer_params = pytorch_source(model_path, input_names=[input_name], input_shapes=[input_shape])

    # Load the model for hardware compilation
    loaded_net = load_model(importer_params, target=target)
    images, _ = dataset
    n_calib_samples = len(images)

    # Switch to numpy output format (required for hardware compilation)
    processing.set_numpy_outputs()

    # Prepare calibration samples for quantization (converting from float32 to lower precision)
    samples = np.empty((n_calib_samples, 224, 224, 3), dtype=np.float32)
    for i in range(n_calib_samples):
        preprocessed_img = processing.preprocessing(images[i])
        samples[i] = preprocessed_img  # already NHWC because set_numpy_outputs() was called above

    # Create data generator for calibration process
    input_generator = DataGenerator({input_name: samples})
    calibration_data = convert_data_generator_to_iterable(input_generator)

    # Quantize the model using calibration samples (reduces precision for faster hardware execution)
    model_sdk_net = loaded_net.quantize(calibration_data,
                                        default_quantization,
                                        model_name=model_path,
                                        arm_only=False)

    # Create output directory and save/compile the quantized model
    compiled_folder = "compiled_model/"
    os.makedirs(compiled_folder, exist_ok=True)
    model_sdk_net.save(model_name=model_path, output_directory=compiled_folder)
    model_sdk_net.compile(output_path=compiled_folder, log_level=logging.INFO)

    # Extract the compiled model from tar.gz archive
    import tarfile

    model = tarfile.open(compiled_folder + model_path + "_mpk.tar.gz")
    model.extractall(compiled_folder)
    model.close()

    print("Compilation done")

    return compiled_folder
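
Continuing the driver sketch above, the compilation step can then be invoked with the input name returned by pytorch_example; "gen1" targets the MLSoC board and "gen2" targets Modalix (see the note in the example section below):

compiled_folder = compile_pytorch_resnet50(dataset, processing,
                                           model_path="resnet50.pt",
                                           input_name=input_name,
                                           target="gen1")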

SiMa Inference

Follow the steps below to run the model on our board; this is specific to SiMa’s Palette software.

Steps

  1. Send the .elf file to the board.

    def send_model(args):
        """
        Transfer the compiled model file to the remote hardware device via SSH/SCP.
        Sets up SSH tunnel if needed and copies model files to the target device.
    
        Args:
            args: Command line arguments containing connection details
        """
        print("Sending model...")
        password = ''  # Password for SSH connection (empty means key-based auth)
        max_attempts = 10  # Maximum retry attempts for connection
    
        # Create SSH tunnel unless bypassed (tunnel allows secure connection to remote device)
        if not args.bypass_tunnel:
            ssh_connection, local_port = create_forward_tunnel(args, password, max_attempts)
    
            if ssh_connection is None:
                logging.error(f'Failed to forward the local port after {max_attempts} attempts')
                sys.exit(-1)
    
            # Use the tunneled local port for subsequent connections
            args.dv_port = local_port
    
        # Copy the compiled model file (.elf or .tar.gz) to the remote device
        scp_file = scp_files_to_remote(args, args.model_file_path, password, "/home/sima", max_attempts)
        if scp_file is None:
            logging.error(f'Failed to scp the model file after {max_attempts} attempts')
            sys.exit(-1)
    
  2. Set up your Pipeline. As described above, the Accelerator runs the pre- and postprocessing on the host and the model on the board.

    However, before the data reaches the quantized model it must go through tessellation and quantization, and before postprocessing the output must go through detessellation and dequantization. We also have to specify the board's address on the network, since we will be using an Ethernet connection.

    The parameters for these operations are stored in the .json file, and the Pipeline sets them up for all of these steps.

    Then we switch the processing to numpy outputs and start our inference loop. The main difference from the PyTorch loop lies in the calls to pipeline.quantize_tesselate and pipeline.detesselate_dequantize, for the reasons mentioned above.

    def model_accelerator_example(dataset, processing, args):
        """
        Run inference using SiMa.ai hardware accelerator and measure accuracy.
        This tests the quantized model performance on actual hardware.
    
        Args:
            dataset: Tuple of (images, labels) for inference
            processing: Processing object for pre/post processing
            args: Command line arguments with hardware connection details
    
        Returns:
            float: Accuracy achieved on hardware accelerator
        """
        # Send compiled model to remote hardware device
        send_model(args)
        # Initialize hardware pipeline for inference
        pipeline = Pipeline(args.model_file_path, args.mpk_json_path, devkit_ip=args.dv_host, local_port=args.dv_port, mlsoc_lm_folder="/home/sima")
        print("Model sent and ready!")
    
        # Switch to numpy format for hardware compatibility
        processing.set_numpy_outputs()
    
        images, labels = dataset
        total_images = len(images)
    
        accurate_predictions, i = 0, 0
        print("Inferencing using model accelerator...")
        # Process each image through the hardware accelerator
        for img, label in tqdm(zip(images, labels), total=total_images):
            # Skip any corrupted/missing images
            if img is None:
                continue
            # Convert image to model input format
            preprocessed_frame = processing.preprocessing(img)
    
            # Apply quantization and tessellation (splitting into tiles) for hardware processing
        preprocessed_frame = pipeline.quantize_tesselate(preprocessed_frame)
    
            # Run inference on the hardware accelerator
            prediction = pipeline.run_inference(preprocessed_frame=preprocessed_frame[0], fcounter=i)
    
            # Reverse tessellation and quantization to get final result
            prediction = pipeline.detesselate_dequantize(prediction)
    
            # Convert output to predicted class
            prediction = processing.postprocessing(prediction)
    
            # Count correct predictions
            accurate_predictions += int(prediction == label)
            i += 1
    
        # Calculate and display accuracy metrics
        accuracy = (accurate_predictions/total_images) * 100
        print("Correct predictions:", accurate_predictions)
        print("Accuracy:", accuracy, "%")
        # Clean up hardware resources
        pipeline.release()
        return accuracy
    

    As you can see, the code is what you would expect when running any machine learning framework. It should also serve as a template for running your own models and getting their accuracy numbers.
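
    Finally, comparing the accelerator run against the PyTorch baseline gives the accuracy lost to quantization, as reported in the sample output below. A minimal sketch, reusing fp32_accuracy from the earlier driver sketch:

    accel_accuracy = model_accelerator_example(dataset, processing, args)
    print("Accuracy lost due to quantization:", fp32_accuracy - accel_accuracy)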

Example

Download the Example

Steps

  1. Unzip to a local directory and move the unzipped folder get_accuracy under your workspace directory:

    sima-user@sima-user-machine:~$ cd ~/Downloads
    sima-user@sima-user-machine:~/Downloads$ unzip get_accuracy.zip
    sima-user@sima-user-machine:~/Downloads$ mv get_accuracy ~/workspace/
    
  2. Go to the directory /home/docker/sima-cli/get_accuracy/ within the SDK container, take ownership of the folder, and install sshpass.

    sima-user@docker-image-id:/home# cd /home/docker/sima-cli/get_accuracy
    sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# chown <YOUR_USERNAME> ../get_accuracy
    sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get update
    sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get install sshpass
    
  3. Run the application.

    Compile the model

    sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS> --run_pytorch_inference --compile_pytorch_model
        Inferencing on pytorch...
        100%|██████████| 465/465 [00:38<00:00, 12.21it/s]
        Correct predictions: 378
        Accuracy: 81.29032258064515 %
        Model resnet50.pt saved with input name conv1
        ...
        Inferencing using model accelerator...
        100%|██████████| 465/465 [00:34<00:00, 13.43it/s]
        Correct predictions: 379
        Accuracy: 81.50537634408602 %
        Accuracy lost due to quantization: -0.21505376344086358
    

    Note

    Target Type:

    • gen1: the default; compiles for the MLSoC target when --target is not specified

    • gen2: pass --target gen2 to compile for the Modalix target

    Example: python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS> --run_pytorch_inference --compile_pytorch_model --target gen2

    Run inference with the model

    sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS>
        Sending model...
        Creating the Forwarding from host
        sima@192.168.135.30's password:
        Copying the model files to DevKit
        sima@192.168.135.30's password:
        Model sent and ready!
        Inferencing using model accelerator...
        100%|██████████| 465/465 [00:36<00:00, 12.87it/s]
        Correct predictions: 379
        Accuracy: 81.50537634408602 %