Get Model Accuracy
The fastest way to get accuracy numbers with your already compiled model is to use our Model Accelerator mode and follow the steps below.
Architecture
Overview
1. On the host machine, load the dataset and run the data preprocessing.
2. Run the compiled model on the board. The preprocessed data is sent to the board over Ethernet or PCIe, tesselated to fit the MLA's internal memory, and quantized because the MLA runs the operations in int8. After inference, the inverse operations are applied and the prediction is sent back to the host.
3. Run any postprocessing on the raw predictions so the result can be displayed or classified.
By doing this, you can simply copy the pre- and postprocessing blocks from your existing pipeline and check that you get the same results with minimal effort; a sketch of the resulting loop follows below.
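In code, the host-side loop in this mode boils down to the following simplified preview; processing and pipeline are the helpers built in the sections below, and the call names mirror the full example at the end of this page.

# Simplified preview of the accelerator-mode loop (the full version is shown later).
for i, (img, label) in enumerate(zip(images, labels)):
    frame = processing.preprocessing(img)                  # 1. host-side preprocessing
    frame = pipeline.quantize_tesselate(frame)             # 2. int8 quantization + tesselation for the MLA
    raw = pipeline.run_inference(preprocessed_frame=frame[0], fcounter=i)
    raw = pipeline.detesselate_dequantize(raw)              #    inverse operations on the way back
    prediction = processing.postprocessing(raw)             # 3. host-side postprocessing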
The primary goal of this accelerator mode is to debug your application, not to achieve high FPS numbers. We are a System-on-Chip, not an accelerator, so we have not focused our efforts on maximizing the performance of this mode.
Before jumping into the SiMa-specific code, let us work with a well-known model, ResNet50, using a well-known framework, PyTorch.
Prepare Dataset
ResNet50 was trained on ImageNet1000. However, this dataset is no longer publicly available, so for ease of use this tutorial ships a subset (500 images) of the validation set in a pickled file.
To load the dataset we simply read it with pickle.
import pickle

def get_dataset(path):
    with open(path, 'rb') as f:
        dataset = pickle.load(f)
    return dataset['data'], dataset['target']
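For example, assuming the pickled subset sits next to the script (the file name below is illustrative):

images, labels = get_dataset("imagenet_subset.pkl")
print(len(images), "images loaded")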
PyTorch includes a ready-to-go preprocessing Transform with all the preprocessing the images need, so we will be using that.
We have added a class variable to switch the output to numpy, because PyTorch works with torch.Tensor while the SiMa APIs use numpy.array, and we want to reuse the same Processing class for both.
Also, our APIs expect NHWC format, so we need to transpose the array.
import numpy as np
import torch
from PIL import Image
from torchvision import models

class Processing():
    """
    Handles image preprocessing and prediction postprocessing for ResNet50 model inference.
    Supports both PyTorch tensor and numpy array formats.
    """
    def __init__(self, numpy=False, resnet_type='IMAGENET1K_V2'):
        """
        Initialize the processing pipeline with appropriate transforms.

        Args:
            numpy (bool): If True, outputs numpy arrays; if False, outputs PyTorch tensors
            resnet_type (str): Version of ResNet weights to use for preprocessing transforms
        """
        # Set up preprocessing transforms based on the ResNet model version
        if resnet_type == 'IMAGENET1K_V2':
            self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V2.transforms()
        elif resnet_type == 'IMAGENET1K_V1':
            self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V1.transforms()
        self.numpy = numpy

    def set_numpy_outputs(self):
        """Switch output format to numpy arrays (used for hardware acceleration)."""
        self.numpy = True

    def set_torch_outputs(self):
        """Switch output format to PyTorch tensors (used for standard PyTorch inference)."""
        self.numpy = False

    def preprocessing(self, img):
        """
        Convert raw image data to the format expected by the ResNet50 model.

        Args:
            img: Raw image array in CHW format (Channels, Height, Width)

        Returns:
            Preprocessed image ready for model inference
        """
        # Convert numpy array to PIL Image, transpose from CHW to HWC format, and resize to 224x224
        img = Image.fromarray(img.transpose((1, 2, 0)), "RGB").resize((224, 224))
        # Apply ResNet preprocessing transforms (normalization, etc.) and add batch dimension
        preprocessed_img = torch.unsqueeze(self.preprocessing_transforms(img), dim=0)
        # Convert to NHWC numpy format if requested (needed for hardware accelerator)
        if self.numpy:
            preprocessed_img = preprocessed_img.detach().numpy().transpose(0, 2, 3, 1)
        return preprocessed_img

    def postprocessing(self, prediction):
        """
        Convert model output to predicted class index.

        Args:
            prediction: Raw model output (logits for each class)

        Returns:
            Predicted class index (integer)
        """
        # Find the class with highest probability/logit value
        if self.numpy:
            prediction = np.argmax(prediction)
        else:
            prediction = torch.argmax(prediction)
        return prediction
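As a quick sanity check, the same Processing object can produce both layouts. The snippet below is an illustrative example using one image from the dataset loaded earlier; the shapes shown are what the code above implies, not output copied from the tutorial.

processing = Processing(resnet_type='IMAGENET1K_V2')

torch_input = processing.preprocessing(images[0])   # torch.Tensor of shape (1, 3, 224, 224), NCHW
processing.set_numpy_outputs()
numpy_input = processing.preprocessing(images[0])   # numpy.ndarray of shape (1, 224, 224, 3), NHWC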
PyTorch Inference
The inference code for PyTorch is quite simple: we load the model (the weights are downloaded automatically), set it to evaluation mode, and iterate over our images and labels.
For each image we run preprocessing, model inference, postprocessing, and the accuracy bookkeeping. Finally, we save the model and record the input node name, because we will need it for the SiMa compilation.
from tqdm import tqdm

def pytorch_example(dataset, processing, model_path, resnet_type):
    """
    Run inference using standard PyTorch on CPU/GPU to establish baseline accuracy.
    This serves as the reference implementation for comparison with hardware acceleration.

    Args:
        dataset: Tuple of (images, labels) for inference
        processing: Processing object for pre/post processing
        model_path: Path where to save the trained model
        resnet_type: Type of ResNet weights to load

    Returns:
        tuple: (input_name, accuracy) - model input layer name and achieved accuracy
    """
    # Load pre-trained ResNet50 model with specified weights
    model = models.resnet50(weights=resnet_type)
    model.eval()  # Set to evaluation mode (disables dropout, batch norm training mode)
    images, labels = dataset
    total_images = len(images)
    print("Inferencing on pytorch...")
    accurate_predictions = 0
    # Process each image and compare prediction with ground truth label
    for img, label in tqdm(zip(images, labels), total=total_images):
        # Convert raw image to model input format
        preprocessed_img = processing.preprocessing(img)
        # Run forward pass through the model
        prediction = model(preprocessed_img)
        # Convert model output to predicted class
        prediction = processing.postprocessing(prediction)
        # Count correct predictions
        accurate_predictions += int(prediction == label)
    # Calculate and display accuracy metrics
    accuracy = (accurate_predictions / total_images) * 100
    print("Correct predictions:", accurate_predictions)
    print("Accuracy:", accuracy, "%")
    # Save the model for later compilation to hardware format
    torch.save(model, model_path)
    # Get the input layer name (needed for hardware compilation)
    input_name, _ = next(model.named_children())
    print("Model", model_path, "saved with input name", input_name)
    return input_name, accuracy
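A minimal driver for this baseline run might look like the following; the file and weight names are illustrative, and the shipped get_accuracy.py wires these values up through command-line flags instead.

dataset = get_dataset("imagenet_subset.pkl")
processing = Processing(resnet_type='IMAGENET1K_V2')
input_name, fp32_accuracy = pytorch_example(dataset, processing, "resnet50.pt", "IMAGENET1K_V2")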
Compile Model
This step is required to convert the PyTorch model to a quantized model that can be run on the SiMa board. Refer to ModelSDK for more information.
import logging
import os

def compile_pytorch_resnet50(dataset, processing, model_path, input_name, target):
    """
    Compile PyTorch model to run on SiMa.ai hardware accelerator.
    This involves quantization (reducing precision) and compilation to hardware-specific format.

    Args:
        dataset: Calibration dataset for quantization
        processing: Processing object for data preparation
        model_path: Path to the saved PyTorch model
        input_name: Name of the model's input layer
        target: Hardware generation ("gen1" or "gen2")

    Returns:
        str: Path to the compiled model folder
    """
    # Import SiMa.ai specific compilation tools
    from afe.apis.defines import default_quantization, gen1_target, gen2_target
    from afe.apis.loaded_net import load_model
    from afe.core.utils import convert_data_generator_to_iterable
    from afe.load.importers.general_importer import pytorch_source
    from sima_utils.data.data_generator import DataGenerator

    # Select hardware target platform (different generations have different capabilities)
    assert target in ("gen1", "gen2")
    target = gen1_target if target == "gen1" else gen2_target
    print(f"Hardware target platform: {target}")

    # Define input shape: batch_size=1, channels=3 (RGB), height=224, width=224
    input_shape = (1, 3, 224, 224)
    # Set up model importer with PyTorch source and input specifications
    importer_params = pytorch_source(model_path, input_names=[input_name], input_shapes=[input_shape])
    # Load the model for hardware compilation
    loaded_net = load_model(importer_params, target=target)

    images, _ = dataset
    n_calib_samples = len(images)
    # Switch to numpy output format (required for hardware compilation)
    processing.set_numpy_outputs()
    # Prepare calibration samples for quantization (NHWC float32)
    samples = np.empty((n_calib_samples, 224, 224, 3), dtype=np.float32)
    for i in range(n_calib_samples):
        preprocessed_img = processing.preprocessing(images[i])
        samples[i] = preprocessed_img
    # Create data generator for calibration process
    input_generator = DataGenerator({input_name: samples})
    calibration_data = convert_data_generator_to_iterable(input_generator)

    # Quantize the model using calibration samples (reduces precision for faster hardware execution)
    model_sdk_net = loaded_net.quantize(calibration_data,
                                        default_quantization,
                                        model_name=model_path,
                                        arm_only=False)

    # Create output directory and save/compile the quantized model
    compiled_folder = "compiled_model/"
    os.makedirs(compiled_folder, exist_ok=True)
    model_sdk_net.save(model_name=model_path, output_directory=compiled_folder)
    model_sdk_net.compile(output_path=compiled_folder, log_level=logging.INFO)

    # Extract the compiled model from tar.gz archive
    import tarfile
    model = tarfile.open(compiled_folder + model_path + "_mpk.tar.gz")
    model.extractall(compiled_folder)
    model.close()
    print("Compilation done")
    return compiled_folder
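Continuing the illustrative driver above, the compilation step can then be invoked with the input name returned by the PyTorch run (the target values are discussed in the Note further below):

compiled_folder = compile_pytorch_resnet50(dataset, processing, "resnet50.pt", input_name, target="gen1")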
SiMa Inference
Follow the steps below to run the model on our board; this is specific to SiMa's Palette software.
Steps
1. Send the .elf file to the board.

def send_model(args):
    """
    Transfer the compiled model file to the remote hardware device via SSH/SCP.
    Sets up SSH tunnel if needed and copies model files to the target device.

    Args:
        args: Command line arguments containing connection details
    """
    print("Sending model...")
    password = ''      # Password for SSH connection (empty means key-based auth)
    max_attempts = 10  # Maximum retry attempts for connection
    # Create SSH tunnel unless bypassed (tunnel allows secure connection to remote device)
    if not args.bypass_tunnel:
        ssh_connection, local_port = create_forward_tunnel(args, password, max_attempts)
        if ssh_connection is None:
            logging.debug(f'Failed to forward local port after {max_attempts}')
            sys.exit(-1)
        # Use the tunneled local port for subsequent connections
        args.dv_port = local_port
    # Copy the compiled model file (.elf or .tar.gz) to the remote device
    scp_file = scp_files_to_remote(args, args.model_file_path, password, "/home/sima", max_attempts)
    if scp_file is None:
        logging.error(f'Failed to scp the model file after {max_attempts}')
        sys.exit(-1)
2. Set up your Pipeline. As described above, the Accelerator runs the pre- and postprocessing on the host and the model on the board. However, before entering the quantized model the data has to go through tesselation and quantization, and before running the postprocessing it has to go through detesselation and dequantization. Also, we must specify the address of the board on the network, since we will be using an Ethernet connection. The parameters for these operations are in the .json file, and the Pipeline sets the parameters for all of them. Then we switch the processing to numpy outputs and start our inference loop. The main difference lies in calling pipeline.quantize_tesselate and pipeline.detesselate_dequantize for the previously mentioned reasons.

def model_accelerator_example(dataset, processing, args):
    """
    Run inference using SiMa.ai hardware accelerator and measure accuracy.
    This tests the quantized model performance on actual hardware.

    Args:
        dataset: Tuple of (images, labels) for inference
        processing: Processing object for pre/post processing
        args: Command line arguments with hardware connection details

    Returns:
        float: Accuracy achieved on hardware accelerator
    """
    # Send compiled model to remote hardware device
    send_model(args)
    # Initialize hardware pipeline for inference
    pipeline = Pipeline(args.model_file_path, args.mpk_json_path,
                        devkit_ip=args.dv_host, local_port=args.dv_port,
                        mlsoc_lm_folder="/home/sima")
    print("Model sent and ready!")
    # Switch to numpy format for hardware compatibility
    processing.set_numpy_outputs()
    images, labels = dataset
    total_images = len(images)
    accurate_predictions, i = 0, 0
    print("Inferencing using model accelerator...")
    # Process each image through the hardware accelerator
    for img, label in tqdm(zip(images, labels), total=total_images):
        # Skip any corrupted/missing images
        if img is None:
            continue
        # Convert image to model input format
        preprocessed_frame = processing.preprocessing(img)
        # Apply quantization and tessellation (splitting into tiles) for hardware processing
        preprocessed_frame = pipeline.quantize_tesselate(preprocessed_frame)
        # Run inference on the hardware accelerator
        prediction = pipeline.run_inference(preprocessed_frame=preprocessed_frame[0], fcounter=i)
        # Reverse tessellation and quantization to get final result
        prediction = pipeline.detesselate_dequantize(prediction)
        # Convert output to predicted class
        prediction = processing.postprocessing(prediction)
        # Count correct predictions
        accurate_predictions += int(prediction == label)
        i += 1
    # Calculate and display accuracy metrics
    accuracy = (accurate_predictions / total_images) * 100
    print("Correct predictions:", accurate_predictions)
    print("Accuracy:", accuracy, "%")
    # Clean up hardware resources
    pipeline.release()
    return accuracy
As you can see, the code is what you would expect when running any machine learning framework. It should also serve as a template for running your own models and getting accuracy numbers.
Example
Steps
1. Download the Example.
2. Unzip to a local directory and move the unzipped folder get_accuracy under your workspace directory:

sima-user@sima-user-machine:~$ cd ~/Downloads
sima-user@sima-user-machine:~/Downloads$ unzip get_accuracy.zip
sima-user@sima-user-machine:~/Downloads$ mv get_accuracy ~/workspace/
3. Go to the directory /home/docker/sima-cli/get_accuracy/ within the SDK container.

sima-user@docker-image-id:/home# cd /home/docker/sima-cli/get_accuracy
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# chown <YOUR_USERNAME> ../get_accuracy
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get update
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get install sshpass
4. Run the application.
Compile the model
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS> --run_pytorch_inference --compile_pytorch_model
Inferencing on pytorch...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:38<00:00, 12.21it/s]
Correct predictions: 378
Accuracy: 81.29032258064515 %
Model resnet50.pt saved with input name conv1
...
Inferencing using model accelerator...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:34<00:00, 13.43it/s]
Correct predictions: 379
Accuracy: 81.50537634408602 %
Accuracy lost due to quantization: -0.21505376344086358
Note
Target Type:
gen1: the default option; compiles for the MLSoC target if --target is not specified.
gen2: pass --target gen2 to compile for the Modalix target.
Example:
python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS> --run_pytorch_inference --compile_pytorch_model --target gen2
Run Inference on the Model
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS>
Sending model...
Creating the Forwarding from host
sima@192.168.135.30's password:
Copying the model files to DevKit
sima@192.168.135.30's password:
Model sent and ready!
Inferencing using model accelerator...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:36<00:00, 12.87it/s]
Correct predictions: 379
Accuracy: 81.50537634408602 %