Get Model Accuracy
The fastest way to get accuracy numbers for your already-compiled model is to use our Model Accelerator mode, following the steps below.
Steps
On the host machine, load the dataset and run preprocessing for the data.
Run the compiled model on our board. We send the preprocessed data to the board over Ethernet or PCIe and then run the model. Because our internal memory requires tesselated data and the MLA computes in int8, we must tesselate and quantize the input; after inference we perform the inverse operations and send the prediction back to the host (see the conceptual sketch after this list).
Run any postprocessing on the raw predictions so they can be displayed or classified. This way, you can copy the pre- and postprocessing blocks from your existing pipeline and verify, with minimal effort, that you get the same results.
The primary goal of this accelerator mode is to debug your application, not necessarily to achieve high FPS numbers. We are a System-on-Chip, not an accelerator, so we have not focused our efforts on maximizing the performance of this mode.
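For orientation, here is a minimal sketch of that per-frame round trip. It is conceptual rather than standalone: processing and pipeline are the helper objects built later in this tutorial.
# Conceptual per-frame round trip; `processing` and `pipeline` are the
# helper objects defined later in this tutorial.
def run_one_frame(pipeline, processing, frame, fcounter=0):
    preprocessed = processing.preprocessing(frame)       # host: float32, NHWC
    ifm = pipeline.quantize_tesselate(preprocessed)      # host: int8 + tesselation for the MLA
    ofm = pipeline.run_inference(preprocessed_frame=ifm, fcounter=fcounter)  # board: run the model
    raw = pipeline.detesselate_dequantize(ofm)           # host: inverse operations
    return processing.postprocessing(raw)                # host: classify or display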
Before jumping into the SiMa-specific code, let us work through a well-known model, ResNet50, using a well-known framework, PyTorch.
PyTorch ResNet50 v1.5
Preprocessing and Dataset
ResNet50 was trained on ImageNet1000. However, this dataset is no longer publicly available, so for ease of use this tutorial ships a subset (500 images) of the validation set as a pickled file.
To load the dataset, we simply unpickle it:
import pickle

def get_dataset(path):
    with open(path, 'rb') as f:
        dataset = pickle.load(f)
    return dataset['data'], dataset['target']
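For example, assuming the pickled subset is saved locally (the filename below is illustrative; use the file shipped with this tutorial):
# Illustrative usage; the pickle filename is an assumption.
images, labels = get_dataset("imagenet_subset.pkl")
print(len(images), "images loaded; first label:", labels[0])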
PyTorch includes a ready-to-use preprocessing Transform with all the preprocessing the images need, so we will use that.
We have added a class variable to switch the output to numpy, because PyTorch works with torch.Tensor while the SiMa APIs use numpy arrays, and we want to share the same Processing class.
Also, our APIs expect NHWC format, so we need to transpose the array.
import numpy as np
import torch
from PIL import Image
from torchvision import models

class Processing():
    def __init__(self, numpy=False, resnet_type='IMAGENET1K_V2'):
        if resnet_type == 'IMAGENET1K_V2':
            self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V2.transforms()
        elif resnet_type == 'IMAGENET1K_V1':
            self.preprocessing_transforms = models.ResNet50_Weights.IMAGENET1K_V1.transforms()
        self.numpy = numpy

    def set_numpy_outputs(self):
        self.numpy = True

    def set_torch_outputs(self):
        self.numpy = False

    def preprocessing(self, img):
        # Dataset images are stored as CHW uint8; convert to an HWC PIL image.
        img = Image.fromarray(img.transpose((1, 2, 0)), "RGB").resize((224, 224))
        preprocessed_img = torch.unsqueeze(self.preprocessing_transforms(img), dim=0)
        if self.numpy:
            # SiMa APIs expect NHWC numpy arrays; torchvision produces NCHW tensors.
            preprocessed_img = preprocessed_img.detach().numpy().transpose(0, 2, 3, 1)
        return preprocessed_img

    def postprocessing(self, prediction):
        # Pick the class with the highest score.
        if self.numpy:
            prediction = np.argmax(prediction)
        else:
            prediction = torch.argmax(prediction)
        return prediction
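A quick sanity check of the two output modes; this is a minimal sketch where a random CHW uint8 array stands in for a dataset image.
# Minimal shape check for the two output modes.
proc = Processing()
img = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)  # CHW, like the dataset images
print(proc.preprocessing(img).shape)   # torch.Size([1, 3, 224, 224]) - NCHW torch.Tensor
proc.set_numpy_outputs()
print(proc.preprocessing(img).shape)   # (1, 224, 224, 3) - NHWC numpy array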
PyTorch Inferencing
The inferencing code for PyTorch is quite simple: we load the model (the weights download automatically), set it to evaluation mode, and iterate over our images and labels.
For each image we run preprocessing, model inference, postprocessing, and the accuracy bookkeeping. Finally, we save the model and record the input node name, because we will need it for the SiMa compilation.
from tqdm import tqdm

def pytorch_example(dataset, processing, model_path, resnet_type):
    # Load the model from PyTorch; the weights download automatically.
    model = models.resnet50(weights=resnet_type)
    model.eval()
    images, labels = dataset
    total_images = len(images)
    print("Inferencing on pytorch...")
    accurate_predictions = 0
    for img, label in tqdm(zip(images, labels), total=total_images):
        preprocessed_img = processing.preprocessing(img)
        prediction = model(preprocessed_img)
        prediction = processing.postprocessing(prediction)
        accurate_predictions += int(prediction == label)
    accuracy = (accurate_predictions/total_images) * 100
    print("Correct predictions:", accurate_predictions)
    print("Accuracy:", accuracy, "%")
    torch.save(model, model_path)
    # The first named child ("conv1") is the input node name used for compilation.
    input_name, _ = next(model.named_children())
    print("Model", model_path, "saved with input name", input_name)
    return input_name, accuracy
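Putting the pieces together might look like this; the dataset filename is again illustrative.
# Illustrative driver code; the dataset filename is an assumption.
dataset = get_dataset("imagenet_subset.pkl")
processing = Processing(resnet_type='IMAGENET1K_V2')
input_name, fp32_accuracy = pytorch_example(dataset, processing, "resnet50.pt", 'IMAGENET1K_V2')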
Compiling ResNet50 v1.5
We assume you are familiar with this simple compilation flow; see the ModelSDK topic in this document to learn more.
def compile_pytorch_resnet50(dataset, processing, model_path, input_name):
    import tarfile
    from afe.apis.defines import default_quantization
    from afe.apis.loaded_net import load_model
    from afe.core.utils import convert_data_generator_to_iterable
    from afe.load.importers.general_importer import pytorch_source
    from sima_utils.data.data_generator import DataGenerator

    # Input shape in NCHW format (N = 1).
    input_shape = (1, 3, 224, 224)
    importer_params = pytorch_source(model_path, input_names=[input_name], input_shapes=[input_shape])
    loaded_net = load_model(importer_params)

    # Build the calibration set; preprocessing already returns NHWC numpy arrays.
    images, _ = dataset
    n_calib_samples = len(images)
    processing.set_numpy_outputs()
    samples = np.empty((n_calib_samples, 224, 224, 3), dtype=np.float32)
    for i in range(n_calib_samples):
        samples[i] = processing.preprocessing(images[i])
    input_generator = DataGenerator({input_name: samples})
    calibration_data = convert_data_generator_to_iterable(input_generator)

    # Quantize the model using the calibration dataset.
    model_sdk_net = loaded_net.quantize(calibration_data,
                                        default_quantization,
                                        model_name=model_path,
                                        arm_only=False)

    compiled_folder = "compiled_model/"
    os.makedirs(compiled_folder, exist_ok=True)
    model_sdk_net.save(model_name=model_path, output_directory=compiled_folder)
    model_sdk_net.compile(output_path=compiled_folder, log_level=logging.INFO)

    # Extract the compiled artifacts from the generated tarball.
    model = tarfile.open(compiled_folder + model_path + "_mpk.tar.gz")
    model.extractall(compiled_folder)
    model.close()
    return compiled_folder
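Continuing from the PyTorch step above, the compilation call is then a one-liner. The extracted folder should contain the .lm model binary and the .json parameter file used by the Pipeline in the next section.
# Compile the saved model with the input name recovered earlier.
compiled_folder = compile_pytorch_resnet50(dataset, processing, "resnet50.pt", input_name)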
Model Accelerator ResNet50 v1.5
Follow the steps below to run the model on our board; this is specific to SiMa’s Palette software.
Steps
Send the .lm file to the board.
def send_model(args):
    print("Sending model...")
    password = ''
    max_attempts = 10
    if not args.bypass_tunnel:
        ssh_connection, local_port = create_forward_tunnel(args, password, max_attempts)
        if ssh_connection is None:
            logging.debug(f'Failed to forward local port after {max_attempts}')
            sys.exit(-1)
        # We start to work with the local_port from now on.
        args.dv_port = local_port
    # Copy the .lm or .tar.gz model file to the board.
    scp_file = scp_files_to_remote(args, args.model_file_path, password, "/home/sima", max_attempts)
    if scp_file is None:
        logging.error(f'Failed to scp the model file after {max_attempts}')
        sys.exit(-1)
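The args object comes from the script's command-line parsing. A hypothetical stand-in with the fields that send_model and the Pipeline touch might look like this; the field values and file names are illustrative, not the script's actual defaults.
import argparse

# Hypothetical stand-in for the script's parsed arguments; values are illustrative.
args = argparse.Namespace(
    dv_host="<BOARD_IP_ADDRESS>",                       # board address on your network
    dv_port=8000,                                       # local port; updated if a tunnel is created
    bypass_tunnel=False,
    model_file_path="compiled_model/resnet50.pt.lm",
    mpk_json_path="compiled_model/resnet50.pt_mpk.json",
)
send_model(args)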
Set up your Pipeline. As described above, the Accelerator mode runs the pre- and postprocessing on the host and the model on the board. However, the data has to go through tesselation and quantization before entering the quantized model, and through detesselation and dequantization before the postprocessing. We must also specify the network address of the board, since we will be using an Ethernet connection. The parameters for these operations live in the .json file, and the Pipeline configures them for all these operations. Then we set the processing to numpy outputs and start our inference loop. The main difference lies in calling pipeline.quantize_tesselate and pipeline.detesselate_dequantize, for the reasons mentioned above.
def model_accelerator_example(dataset, processing, args):
    send_model(args)
    pipeline = Pipeline(args.model_file_path, args.mpk_json_path,
                        devkit_ip=args.dv_host, local_port=args.dv_port,
                        mlsoc_lm_folder="/home/sima")
    print("Model sent and ready!")
    processing.set_numpy_outputs()
    images, labels = dataset
    total_images = len(images)
    accurate_predictions, i = 0, 0
    print("Inferencing using model accelerator...")
    for img, label in tqdm(zip(images, labels), total=total_images):
        # Preprocess the frame.
        preprocessed_frame = processing.preprocessing(img)
        preprocessed_frame = pipeline.quantize_tesselate(preprocessed_frame)
        # Run the inference on the preprocessed frame - returns the output feature map as bytes (ofm_bytes).
        prediction = pipeline.run_inference(preprocessed_frame=preprocessed_frame, fcounter=i)
        prediction = pipeline.detesselate_dequantize(prediction)
        prediction = processing.postprocessing(prediction)
        accurate_predictions += int(prediction == label)
        i += 1
    accuracy = (accurate_predictions/total_images) * 100
    print("Correct predictions:", accurate_predictions)
    print("Accuracy:", accuracy, "%")
    return accuracy
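The script's final report compares both runs; continuing the illustrative driver code from earlier, computing the quantization loss is then a one-liner.
# Compare float32 (PyTorch) and quantized (board) accuracy.
accelerator_accuracy = model_accelerator_example(dataset, processing, args)
print("Accuracy lost due to quantization:", fp32_accuracy - accelerator_accuracy)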
As you can see, the code looks like what you would expect from any machine learning framework. It should also serve as a template to run your own models and get their accuracy numbers.
Running the Example
Steps
Move or copy the downloaded code to the shared directory between the Palette SDK and your host system.
sima-user@sima-user-machine:~$ cd ~/Downloads
sima-user@sima-user-machine:~/Downloads$ unzip get_accuracy.zip
sima-user@sima-user-machine:~/Downloads$ mv get_accuracy ~/workspace/
Start your Palette software container.
Move to the get_accuracy folder and install the necessary packages.
sima-user@docker-image-id:/home# cd /home/docker/sima-cli/get_accuracy
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# chown <YOUR_USERNAME> ../get_accuracy
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get update
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# sudo apt-get install sshpass
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python3 -m venv venv_accuracy --system-site-packages
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# source venv_accuracy/bin/activate
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# pip install rpyc
Run the application.
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS> --run_pytorch_inference --compile_pytorch_model
Inferencing on pytorch...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:16<00:00, 28.43it/s]
Correct predictions: 378
Accuracy: 81.29032258064515 %
Model resnet50.pt saved with input name conv1
Running calibration ...DONE
2024-01-05 10:54:18,948 - afe.ir.quantization_utils - WARNING - Quantized bias was clipped, resulting in precision loss. Model may need retraining.
Running quantization ...DONE
...
2024-01-05 10:55:06,842 - mlc.test_util.test_context - INFO - Code generation done
Compilation done
Sending model...
Creating the Forwarding from host
sima@10.42.0.241's password:
Copying the model files to DevKit
sima@10.42.0.241's password:
Model sent and ready!
Inferencing using model accelerator...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 465/465 [00:15<00:00, 29.54it/s]
Correct predictions: 373
Accuracy: 80.21505376344086 %
Accuracy lost due to quantization: 1.0752688172042895
Note
There is a known issue where the card might not start inferencing; it will look like the following:
...
Inferencing using model accelerator...
0%| | 1/465 [00:00<00:53, 8.75it/s]
0%| | 1/465 [00:13<1:44:11, 13.47s/it]
You can simply reset the board and run the Python script as follows:
sima-user@docker-image-id:/home/docker/sima-cli/get_accuracy# python get_accuracy.py --model_file resnet50.pt --dv_host <BOARD_IP_ADDRESS>
Sending model...
Creating the Forwarding from host
sima@10.42.0.240's password:
Copying the model files to DevKit
sima@10.42.0.240's password:
Model sent and ready!
Inferencing using model accelerator...
100%|█████████████████████████████████████████| 465/465 [00:12<00:00, 37.08it/s]
Correct predictions: 371
Accuracy: 79.78494623655914 %