Application Pipeline
We will start by breaking down the key components of a Machine Learning (ML) application and defining the tasks used throughout this guide. Each stage of the application maps to a different hardware functional block within the MLSoC, and setting up the app requires multiple steps. A solid understanding of the architecture is essential for efficient development, so let us outline a basic pipeline architecture.
A Typical ML Application
If we were to look at a very simple classification ML application, it would look something like this:

Input Source: Any input into the application; this could be an RTSP stream, a video file, image files, an audio stream, and so on.
ML Inference: Running the ML model to classify the input in some way.
Business Logic: Logic that responds to the ML inference, such as saving/displaying information, PLC communication, network storage, and so on. These three stages are sketched in code below.
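To make the structure concrete, here is a minimal C++ sketch of these three stages. The function names and the RTSP URL are hypothetical placeholders for illustration, not part of any SiMa.ai API:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Input Source: stub standing in for reading a frame from a file or stream.
std::vector<float> read_input(const std::string& source) {
    std::cout << "reading from " << source << "\n";
    return std::vector<float>(224 * 224 * 3, 0.0f);  // placeholder frame
}

// ML Inference: stub standing in for running the classification model.
int run_inference(const std::vector<float>& frame) {
    return frame.empty() ? -1 : 0;  // placeholder class id
}

// Business Logic: act on the inference result.
void handle_result(int class_id) {
    std::cout << "detected class " << class_id << "\n";
}

int main() {
    auto frame = read_input("rtsp://camera/stream");  // hypothetical source
    handle_result(run_inference(frame));
}
```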
ML Preprocessing and Postprocessing
In reality, ML models require very specific preprocessing and postprocessing steps. If we break this generic case down a bit more, we get the following:

ML Preprocess: The specific preprocessing required for the ML model being deployed (resizing, normalization, etc.).
ML Inference: The actual neural network model inference processing.
ML Postprocess: The postprocessing required on the output of the ML Inference step (softmax, argmax, box decoding, etc.); see the sketch below.
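The snippet below illustrates what these preprocess and postprocess steps typically look like for a classifier. It is a generic sketch, not the exact operations any particular model requires; resizing is omitted since it would normally be done with an image library:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// ML Preprocess (normalization step): map 8-bit pixel values into 0.0..1.0.
std::vector<float> normalize(const std::vector<uint8_t>& pixels) {
    std::vector<float> out(pixels.size());
    for (size_t i = 0; i < pixels.size(); ++i)
        out[i] = pixels[i] / 255.0f;
    return out;
}

// ML Postprocess: softmax converts raw logits into probabilities,
// argmax picks the most likely class.
size_t softmax_argmax(std::vector<float>& logits) {
    float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (float& v : logits) { v = std::exp(v - max_logit); sum += v; }
    for (float& v : logits) v /= sum;
    return std::max_element(logits.begin(), logits.end()) - logits.begin();
}

int main() {
    std::vector<float> logits = {0.1f, 2.5f, -1.0f};  // example model output
    std::cout << "top class: " << softmax_argmax(logits) << "\n";
    (void)normalize({0, 128, 255});                   // example pixel values
}
```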
ResNet50 Classification Application
What is ResNet-50?
ResNet-50 is a deep convolutional neural network renowned for its 50-layer architecture and the introduction of residual learning through skip connections. This design addresses challenges like vanishing gradients, enabling the training of significantly deeper networks. Since its introduction, ResNet-50 has become a foundational model in computer vision, excelling in tasks such as image classification, object detection, and segmentation. Its architecture has also influenced various deep learning frameworks and applications.
Let’s take ResNet50 as an example to showcase the workflow. This simple classification application will take an image as input and output the detected object type along with a confidence score.

The application pipeline steps will be as follows:
Image(s) Source: Will read all images from a local directory.
ResNet50 Preprocessing: Will resize the image to 224x224 (height x width), then scale and normalize the image values to be between 0.0 and 1.0.
ResNet50 Inference: Will run the resized and scaled image through ResNet50 neural network inference.
ResNet50 Postprocessing: Will perform argmax on all the output values and pick the highest-confidence category.
Business Logic: Will print the category and the confidence score for each result received (see the sketch below).
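Put together, the pipeline can be sketched in C++ as follows. The directory name and the two stub functions are hypothetical placeholders; the real preprocessing and inference run through SiMa.ai's tooling, covered later:

```cpp
#include <algorithm>
#include <filesystem>
#include <iostream>
#include <vector>

// Stubs standing in for the real steps; a real app would decode the image,
// resize it to 224x224, and run the actual ResNet50 model.
std::vector<float> load_and_preprocess(const std::filesystem::path&) {
    return std::vector<float>(224 * 224 * 3, 0.5f);   // dummy normalized image
}
std::vector<float> resnet50_infer(const std::vector<float>&) {
    return std::vector<float>(1000, 0.001f);          // dummy class scores
}

int main() {
    // Image(s) Source: read all images from a local directory.
    for (const auto& entry : std::filesystem::directory_iterator("images")) {
        auto input  = load_and_preprocess(entry.path());  // preprocessing
        auto scores = resnet50_infer(input);              // inference

        // Postprocessing: argmax picks the highest-confidence category.
        auto best = std::max_element(scores.begin(), scores.end());
        auto category = best - scores.begin();

        // Business Logic: print the category and confidence score.
        std::cout << entry.path() << ": category " << category
                  << " confidence " << *best << "\n";
    }
}
```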
Mapping the ResNet50 Application to the MLSoC
Now that we understand the end-to-end application and its components, we can review how to map the different application stages to the MLSoC hardware functional blocks.

Image(s) Source: Will map to the Arm A65, which will read images from storage.
ResNet50 Preprocessing: Will use one of the graphs from SiMa.ai’s CVU Graph Library (CVU Graphs) to perform resize, scaling, and normalization in one go on the EV74 CVU.
ResNet50 Inference: Will run the resized and scaled image through the MLA for performant and efficient inference.
ResNet50 Postprocessing: Will use one of the graphs from SiMa.ai’s CVU Graph Library (CVU Graphs) to perform detessellation and dequantization (more on this later) on the EV74 CVU.
Business Logic: Will print the output on the Arm A65 using regular std::cout (summarized in the snippet below).
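To summarize the mapping at a glance, the snippet below expresses the same stage-to-block pairing as plain data; it is illustrative only and uses no SiMa.ai API, since the actual assignment of work to each block is handled by the toolchain:

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    // The stage-to-hardware mapping described above, as plain data.
    const std::vector<std::pair<std::string, std::string>> mapping = {
        {"Image(s) Source",         "Arm A65 (read images from storage)"},
        {"ResNet50 Preprocessing",  "EV74 CVU (resize + scale + normalize graph)"},
        {"ResNet50 Inference",      "MLA"},
        {"ResNet50 Postprocessing", "EV74 CVU (detessellation + dequantization graph)"},
        {"Business Logic",          "Arm A65 (std::cout)"},
    };
    for (const auto& [stage, block] : mapping)
        std::cout << stage << " -> " << block << "\n";
}
```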
Tessellation and Quantization
Tessellation refers to breaking an image into smaller tiles or patches to improve processing efficiency. Think of it like cutting a large puzzle into smaller pieces so each part can be analyzed separately. On the MLA, tessellation specifically refers to converting input and output feature maps into the required memory layout for efficient computation.
Quantization reduces numerical precision (e.g., from 32-bit floating point to 8-bit integer) to improve computation speed and reduce memory usage while minimizing accuracy loss. The MLA processes data in integer format, so inputs and outputs must go through quantization (floating point to integer for MLA input) or dequantization (integer to floating point for MLA output) to ensure compatibility with the hardware.
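As a conceptual illustration, the snippet below shows a generic affine (scale and zero-point) quantization scheme of the kind described above. The actual integer format and parameters used by the MLA come from the SiMa.ai compiler, so treat this as a sketch rather than the hardware's exact arithmetic:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>

// Generic affine quantization: real_value = scale * (quantized - zero_point).

// Quantize: floating point -> int8 (e.g., on the way into integer hardware).
int8_t quantize(float x, float scale, int zero_point) {
    int q = static_cast<int>(std::round(x / scale)) + zero_point;
    return static_cast<int8_t>(std::clamp(q, -128, 127));
}

// Dequantize: int8 -> floating point (e.g., on the way back out).
float dequantize(int8_t q, float scale, int zero_point) {
    return scale * (static_cast<int>(q) - zero_point);
}

int main() {
    const float scale = 1.0f / 255.0f;  // example parameters, not MLA values
    const int zero_point = -128;
    float x = 0.5f;
    int8_t q = quantize(x, scale, zero_point);
    std::cout << x << " -> " << static_cast<int>(q)
              << " -> " << dequantize(q, scale, zero_point) << "\n";
}
```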
The MLA on the MLSoC has specific preprocessing and postprocessing requirements, including quantization and tessellation. SiMa.ai’s EV74 CVU graphs are designed to handle these operations efficiently, simplifying development. Understanding these processes is essential for debugging and optimizing end-to-end applications.

Now that we have a clear understanding of the application we want to run on the SiMa MLSoC, we are ready to try some demos and then build an end-to-end application.