.. _Get Accuracy:

Get Model Accuracy
==================

The fastest way to get accuracy numbers for your already compiled model is to use our **Model Accelerator** mode, following the steps below.

Architecture
------------

.. image:: media/ModelAccelerator.svg
    :align: center
    :alt: Model Accelerator Diagram

Overview
--------

#. On the host machine, load the dataset and run preprocessing on the data.
#. Run the compiled model on our board. We send the preprocessed data to the board over Ethernet or PCIe and then run the model. The data must be tesselated to fit our internal memory, and because our MLA runs its operations in int8 it must also be quantized. The inverse operations are then applied and the prediction is sent back to the host.
#. Run any postprocessing on the raw predictions so we can either display or classify them.

By doing this, you can simply copy the pre- and post-processing blocks from your existing pipeline and check that you get the same results with minimal effort.

The primary goal of this accelerator mode is to debug your application, not necessarily to get high FPS numbers. We are a System-on-Chip, not an accelerator, so we have not focused our efforts on maximizing the performance of this mode.

Before jumping into the SiMa-specific code, let us work with a well-known model, ResNet50, using a well-known framework, PyTorch.

Prepare Dataset
---------------

ResNet50 was trained on ``ImageNet1000``. However, this dataset is no longer publicly available, so for ease of use this tutorial ships a subset (500 images) of the validation set in a pickled file. Loading the dataset is simply a matter of reading it with ``pickle``.

.. code-block:: python

    def get_dataset(path):
        """Load the pickled dataset and return (images, labels)."""
        with open(path, 'rb') as f:
            dataset = pickle.load(f)
        return dataset['data'], dataset['target']
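If you want to sanity-check what you just loaded before wiring it into the rest of the pipeline, a minimal sketch is shown below. The file name ``imagenet_subset.pkl`` is only a placeholder; use the pickled file shipped with the example. It assumes the images are stored as numpy arrays.

.. code-block:: python

    # Quick sanity check of the pickled dataset. The file name below is a
    # placeholder; substitute the pickled file shipped with the example.
    images, labels = get_dataset("imagenet_subset.pkl")

    print(len(images), "images /", len(labels), "labels")
    print("first image shape:", images[0].shape)   # raw images are channels-first (CHW)
    print("first label:", labels[0])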
``PyTorch`` includes a ready-to-use preprocessing ``Transform`` with all the preprocessing the images need, so we will use that. We added a class variable to switch the output to ``numpy`` because ``PyTorch`` works with ``torch.Tensor`` objects while the SiMa APIs use ``numpy`` arrays, and we want to reuse the same ``Processing`` class for both. Also, our APIs expect ``NHWC`` format, so we need to transpose the array.

.. code-block:: python

    class Processing():
        """
        Handles image preprocessing and prediction postprocessing for ResNet50 model inference.
        Supports both PyTorch tensor and numpy array formats.
        """
        def __init__(self, numpy=False, resnet_type='IMAGENET1K_V2'):
            """
            Initialize the processing pipeline with appropriate transforms.

            Args:
                numpy (bool): If True, outputs numpy arrays; if False, outputs PyTorch tensors
                resnet_type (str): Version of ResNet weights to use for preprocessing transforms
            """
            # Set up preprocessing transforms based on the ResNet model version
            if resnet_type == 'IMAGENET1K_V2':
                self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V2.transforms()
            elif resnet_type == 'IMAGENET1K_V1':
                self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V1.transforms()
            else:
                raise ValueError(f"Unsupported resnet_type: {resnet_type}")
            self.numpy = numpy

        def set_numpy_outputs(self):
            """Switch output format to numpy arrays (used for hardware acceleration)."""
            self.numpy = True

        def set_torch_outputs(self):
            """Switch output format to PyTorch tensors (used for standard PyTorch inference)."""
            self.numpy = False

        def preprocessing(self, img):
            """
            Convert raw image data to the format expected by the ResNet50 model.

            Args:
                img: Raw image array in CHW format (Channels, Height, Width)

            Returns:
                Preprocessed image ready for model inference
            """
            # Convert numpy array to PIL Image, transpose from CHW to HWC, and resize to 224x224
            img = Image.fromarray(img.transpose((1, 2, 0)), "RGB").resize((224, 224))
            # Apply ResNet preprocessing transforms (normalization, etc.) and add batch dimension
            preprocessed_img = torch.unsqueeze(self.preprocessing_transforms(img), dim=0)
            # Convert to numpy NHWC format if requested (needed for the hardware accelerator)
            if self.numpy:
                preprocessed_img = preprocessed_img.detach().numpy().transpose(0, 2, 3, 1)
            return preprocessed_img

        def postprocessing(self, prediction):
            """
            Convert model output to predicted class index.

            Args:
                prediction: Raw model output (logits for each class)

            Returns:
                Predicted class index (integer)
            """
            # Find the class with the highest logit value
            if self.numpy:
                prediction = np.argmax(prediction)
            else:
                prediction = torch.argmax(prediction)
            return prediction
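The effect of the ``numpy`` flag is easiest to see on a throwaway input. The sketch below is a minimal illustration (not part of the example script) and assumes the imports used above (``torch``, ``torchvision.models``, ``PIL.Image``, ``numpy``) are in scope; note the layout difference between the two outputs.

.. code-block:: python

    import numpy as np

    proc = Processing(numpy=False)

    # Synthetic channels-first (CHW) RGB image, the layout preprocessing() expects
    img = np.random.randint(0, 256, size=(3, 256, 256), dtype=np.uint8)

    torch_input = proc.preprocessing(img)
    print(torch_input.shape)    # torch.Size([1, 3, 224, 224]) -- NCHW for PyTorch

    proc.set_numpy_outputs()
    numpy_input = proc.preprocessing(img)
    print(numpy_input.shape)    # (1, 224, 224, 3) -- NHWC for the SiMa APIs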
PyTorch Inference
-----------------

The inference code for ``PyTorch`` is quite simple: we load the model (the weights download automatically), set it to evaluation mode, and iterate over our images and labels. For each image we run preprocessing, model inference, and postprocessing, then analyze the accuracy. Finally, we save the model and record the input node name, because we will use it for our SiMa compilation.

.. code-block:: python

    def pytorch_example(dataset, processing, model_path, resnet_type):
        """
        Run inference using standard PyTorch on CPU/GPU to establish baseline accuracy.
        This serves as the reference implementation for comparison with hardware acceleration.

        Args:
            dataset: Tuple of (images, labels) for inference
            processing: Processing object for pre/post processing
            model_path: Path where to save the trained model
            resnet_type: Type of ResNet weights to load

        Returns:
            tuple: (input_name, accuracy) - model input layer name and achieved accuracy
        """
        # Load pre-trained ResNet50 model with specified weights
        model = models.resnet50(weights=resnet_type)
        model.eval()  # Set to evaluation mode (disables dropout, batch norm training mode)

        images, labels = dataset
        total_images = len(images)
        print("Inferencing on pytorch...")
        accurate_predictions = 0

        # Process each image and compare prediction with ground truth label
        for img, label in tqdm(zip(images, labels), total=total_images):
            # Convert raw image to model input format
            preprocessed_img = processing.preprocessing(img)
            # Run forward pass through the model
            prediction = model(preprocessed_img)
            # Convert model output to predicted class
            prediction = processing.postprocessing(prediction)
            # Count correct predictions
            accurate_predictions += int(prediction == label)

        # Calculate and display accuracy metrics
        accuracy = (accurate_predictions / total_images) * 100
        print("Correct predictions:", accurate_predictions)
        print("Accuracy:", accuracy, "%")

        # Save the model for later compilation to hardware format
        torch.save(model, model_path)

        # Get the input layer name (needed for hardware compilation)
        name, _ = next(model.named_children())
        input_name = name
        print("Model", model_path, "saved with input name", input_name)
        return input_name, accuracy

Compile Model
-------------

This step is required to convert the PyTorch model to a quantized model that can run on the SiMa board. Refer to :ref:`ModelSDK` for more information.

.. code-block:: python

    def compile_pytorch_resnet50(dataset, processing, model_path, input_name, target):
        """
        Compile PyTorch model to run on SiMa.ai hardware accelerator.
        This involves quantization (reducing precision) and compilation to hardware-specific format.

        Args:
            dataset: Calibration dataset for quantization
            processing: Processing object for data preparation
            model_path: Path to the saved PyTorch model
            input_name: Name of the model's input layer
            target: Hardware generation ("gen1" or "gen2")

        Returns:
            str: Path to the compiled model folder
        """
        # Import SiMa.ai specific compilation tools
        from afe.apis.defines import default_quantization, gen1_target, gen2_target
        from afe.apis.loaded_net import load_model
        from afe.core.utils import convert_data_generator_to_iterable
        from afe.load.importers.general_importer import pytorch_source
        from sima_utils.data.data_generator import DataGenerator

        # Select hardware target platform (different generations have different capabilities)
        assert target in ("gen1", "gen2")
        target = gen1_target if target == "gen1" else gen2_target
        print(f"Hardware target platform: {target}")

        # Define input shape: batch_size=1, channels=3 (RGB), height=224, width=224
        input_shape = (1, 3, 224, 224)
        # Set up model importer with PyTorch source and input specifications
        importer_params = pytorch_source(model_path, input_names=[input_name], input_shapes=[input_shape])
        # Load the model for hardware compilation
        loaded_net = load_model(importer_params, target=target)

        images, _ = dataset
        n_calib_samples = len(images)

        # Switch to numpy output format (required for hardware compilation)
        processing.set_numpy_outputs()

        # Prepare calibration samples for quantization; preprocessing already returns NHWC
        samples = np.empty((n_calib_samples, 224, 224, 3), dtype=np.float32)
        for i in range(n_calib_samples):
            preprocessed_img = processing.preprocessing(images[i])
            samples[i] = preprocessed_img

        # Create data generator for calibration process
        input_generator = DataGenerator({input_name: samples})
        calibration_data = convert_data_generator_to_iterable(input_generator)

        # Quantize the model using calibration samples (reduces precision for faster hardware execution)
        model_sdk_net = loaded_net.quantize(calibration_data,
                                            default_quantization,
                                            model_name=model_path,
                                            arm_only=False)

        # Create output directory and save/compile the quantized model
        compiled_folder = "compiled_model/"
        os.makedirs(compiled_folder, exist_ok=True)
        model_sdk_net.save(model_name=model_path, output_directory=compiled_folder)
        model_sdk_net.compile(output_path=compiled_folder, log_level=logging.INFO)

        # Extract the compiled model from the tar.gz archive
        import tarfile
        model = tarfile.open(compiled_folder + model_path + "_mpk.tar.gz")
        model.extractall(compiled_folder)
        model.close()
        print("Compilation done")
        return compiled_folder
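After compilation, the SiMa inference step needs two of the extracted artifacts: the model ``.elf`` and the MPK ``.json`` (passed as ``model_file_path`` and ``mpk_json_path`` below). Here is a hedged sketch for locating them; the exact file names depend on the model name given to the compiler, so the glob patterns are assumptions.

.. code-block:: python

    import glob

    # Locate the compiled artifacts inside the output folder. The patterns are
    # assumptions: actual names depend on the model name given to the compiler.
    compiled_folder = "compiled_model/"
    elf_candidates = glob.glob(compiled_folder + "**/*.elf", recursive=True)
    json_candidates = glob.glob(compiled_folder + "**/*.json", recursive=True)

    print("model .elf candidates:", elf_candidates)
    print("mpk .json candidates:", json_candidates)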
SiMa Inference
--------------

Follow the steps below to run the model on our board; this is specific to SiMa's Palette software.

**Steps**

#. Send the ``.elf`` file to the board.

   .. code-block:: python

      def send_model(args):
          """
          Transfer the compiled model file to the remote hardware device via SSH/SCP.
          Sets up an SSH tunnel if needed and copies the model files to the target device.

          Args:
              args: Command line arguments containing connection details
          """
          print("Sending model...")
          password = ''      # Password for SSH connection (empty means key-based auth)
          max_attempts = 10  # Maximum retry attempts for connection

          # Create SSH tunnel unless bypassed (tunnel allows secure connection to remote device)
          if not args.bypass_tunnel:
              ssh_connection, local_port = create_forward_tunnel(args, password, max_attempts)
              if ssh_connection is None:
                  logging.error(f'Failed to forward local port after {max_attempts} attempts')
                  sys.exit(-1)
              # Use the tunneled local port for subsequent connections
              args.dv_port = local_port

          # Copy the compiled model file (.elf or .tar.gz) to the remote device
          scp_file = scp_files_to_remote(args, args.model_file_path, password, "/home/sima", max_attempts)
          if scp_file is None:
              logging.error(f'Failed to scp the model file after {max_attempts} attempts')
              sys.exit(-1)

#. Set up your ``Pipeline``. As described above, the Accelerator runs the pre- and post-processing on the host and the model on the board. However, before the data reaches the quantized model it has to go through ``tesselation`` and ``quantization``, and before the postprocessing runs it has to go through ``detesselation`` and ``dequantization``. We must also specify the board's address on the network, since we will be using an Ethernet connection. The parameters for all these operations live in the ``.json`` file, and the ``Pipeline`` configures them. We then set the processing to ``numpy`` outputs and start our inference loop. The main difference from the PyTorch loop lies in calling ``pipeline.quantize_tesselate`` and ``pipeline.detesselate_dequantize``, for the reasons mentioned above.

   .. code-block:: python

      def model_accelerator_example(dataset, processing, args):
          """
          Run inference using the SiMa.ai hardware accelerator and measure accuracy.
          This tests the quantized model performance on actual hardware.

          Args:
              dataset: Tuple of (images, labels) for inference
              processing: Processing object for pre/post processing
              args: Command line arguments with hardware connection details

          Returns:
              float: Accuracy achieved on hardware accelerator
          """
          # Send compiled model to remote hardware device
          send_model(args)

          # Initialize hardware pipeline for inference
          pipeline = Pipeline(args.model_file_path, args.mpk_json_path,
                              devkit_ip=args.dv_host, local_port=args.dv_port,
                              mlsoc_lm_folder="/home/sima")
          print("Model sent and ready!")

          # Switch to numpy format for hardware compatibility
          processing.set_numpy_outputs()

          images, labels = dataset
          total_images = len(images)
          accurate_predictions, i = 0, 0
          print("Inferencing using model accelerator...")

          # Process each image through the hardware accelerator
          for img, label in tqdm(zip(images, labels), total=total_images):
              # Skip any corrupted/missing images
              if img is None:
                  continue
              # Convert image to model input format
              preprocessed_frame = processing.preprocessing(img)
              # Apply quantization and tessellation (splitting into tiles) for hardware processing
              preprocessed_frame = pipeline.quantize_tesselate(preprocessed_frame)
              # Run inference on the hardware accelerator
              prediction = pipeline.run_inference(preprocessed_frame=preprocessed_frame[0], fcounter=i)
              # Reverse tessellation and quantization to get the final result
              prediction = pipeline.detesselate_dequantize(prediction)
              # Convert output to predicted class
              prediction = processing.postprocessing(prediction)
              # Count correct predictions
              accurate_predictions += int(prediction == label)
              i += 1

          # Calculate and display accuracy metrics
          accuracy = (accurate_predictions / total_images) * 100
          print("Correct predictions:", accurate_predictions)
          print("Accuracy:", accuracy, "%")

          # Clean up hardware resources
          pipeline.release()
          return accuracy

As you can see, the code is what you would expect when running any machine learning framework. It should also be useful as a template for running your own models and getting accuracy numbers.
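Because both inference paths return an accuracy figure, comparing the two quantifies the cost of quantization; this is how the ``Accuracy lost due to quantization`` line in the example output below is produced. A minimal sketch, assuming the functions above are defined and ``dataset``, ``processing``, ``model_path``, ``resnet_type``, and ``args`` are set up as in the surrounding example:

.. code-block:: python

    # Compare floating-point (PyTorch) accuracy against the quantized model
    # running on the board. Assumes the surrounding example's setup is in place.
    input_name, pytorch_accuracy = pytorch_example(dataset, processing, model_path, resnet_type)
    accelerator_accuracy = model_accelerator_example(dataset, processing, args)

    print("Accuracy lost due to quantization:", pytorch_accuracy - accelerator_accuracy)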
Example
-------

.. button-link:: https://docs.sima.ai/pkg_downloads/SDK1.7.0/tools/get_accuracy.zip
    :color: primary
    :shadow:

    Download the Example

**Steps**

#. Unzip to a local directory and move the unzipped folder ``get_accuracy`` under your ``workspace`` directory:

   .. code-block:: console

      sima-user@sima-user-machine:~$ cd ~/Downloads
      sima-user@sima-user-machine:~/Downloads$ unzip get_accuracy.zip
      sima-user@sima-user-machine:~/Downloads$ mv get_accuracy ~/workspace/

#. Go to the directory ``/home/docker/sima-cli/get_accuracy/`` within the SDK container.

   .. code-block:: console

      sima-user@docker-image-id:/home# cd /home/docker/sima-cli/get_accuracy
      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# chown ../get_accuracy
      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get update
      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get install sshpass

#. Run the application.

   Compile the model:

   .. code-block:: console

      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host --run_pytorch_inference --compile_pytorch_model
      Inferencing on pytorch...
      100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:38<00:00, 12.21it/s]
      Correct predictions: 378
      Accuracy: 81.29032258064515 %
      Model resnet50.pt saved with input name conv1
      ...
      Inferencing using model accelerator...
      100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:34<00:00, 13.43it/s]
      Correct predictions: 379
      Accuracy: 81.50537634408602 %
      Accuracy lost due to quantization: -0.21505376344086358

   .. note::

      Target Type:

      - ``gen1``: the default option; compiles for the ``MLSoC`` target if you do not specify ``--target``.
      - ``gen2``: pass ``--target gen2`` to compile for the ``Modalix`` target.

      Example: ``python get_accuracy.py --model_file resnet50.pt --dv_host --run_pytorch_inference --compile_pytorch_model --target gen2``

   Run inference on the model:

   .. code-block:: console

      sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host
      Sending model...
      Creating the Forwarding from host
      sima@192.168.135.30's password:
      Copying the model files to DevKit
      sima@192.168.135.30's password:
      Model sent and ready!
      Inferencing using model accelerator...
      100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:36<00:00, 12.87it/s]
      Correct predictions: 379
      Accuracy: 81.50537634408602 %

.. toctree::
    :maxdepth: 2
    :caption: GetAccuracy