PePPi Prototyping

Introduction

PePPi is a SiMa.ai solution that helps developers get started with creating pipelines using the Python framework. The PePPi platform uses Python to run pipelines entirely on the board. This provides a high degree of flexibility: users can install any external Python library and combine it with our optimized SiMa kernels for pre- and post-processing operations, while the model runs entirely on our accelerated hardware. Note that external libraries such as matplotlib will target the ARM A65, and the different kernels run sequentially; these two factors lead to performance limitations in the FPS count.

Running your First Example

Requirements

  1. Have a board available to run the PePPi examples.

  2. Your board must be using the latest software from SiMa.ai. To upgrade board software, see Firmware and Board Software Update.

  3. Your Palette SDK must be using the latest version; see Software Installation.

  4. Your board must be connected to your machine over Ethernet.

Note

This example assumes an Ubuntu host. However, the steps should be easy to follow on any other OS.

ResNet-50 Example

The first example we will run is a classic. It should give you enough detail to run your own models.

Download the Example

  1. Copy the files to the shared folder between the Palette CLI SDK and your host machine. The commands below use the default shared folder, workspace; if you changed the name of your shared folder, replace it accordingly:

    sima-user@sima-user-machine:~$ cd ~/Downloads/
    sima-user@sima-user-machine:~/Downloads$ unzip peppi_examples.zip
    sima-user@sima-user-machine:~/Downloads$ mv peppi_examples ~/workspace/
    
  2. Start your Palette CLI SDK by running the start.py command. Check out Running/Launching Software if you don’t remember how.

  3. Let us go to the folder and assign the right permissions.

    sima-user@docker-image-id:/home# cd /home/docker/sima-cli/peppi_examples
    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# sudo chown <YOUR_USERNAME> ../peppi_examples
    
  4. Now we will compile a ResNet-50 model using the ModelSDK. If you have already performed this step, just copy the file to this folder and jump to the next step. Make sure that the compiled model is in the base folder.

    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# python compile_resnet50.py
    Compiling model resnet50_v1_opt.onnx with arm_only=False
    Running calibration ...DONE
    Running quantization ...DONE
    ...
    2024-01-08 08:44:02,506 - mlc.test_util.test_context - INFO - Code generation done
    2024-01-08 08:44:14,197 - te_compiler - INFO - Using injective.arm_cpu for cast based on highest priority (10)
    
  5. Create a .tar.gz file with the required files:

    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# mv output/resnet50_v1_opt_mpk.tar.gz .
    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# tar -cvzf test_resnet.tar.gz resnet50_v1_opt_mpk.tar.gz test_resnet50_v1.py imagenet_img imagenet1000_clsidx_to_labels.txt
    resnet50_v1_opt_mpk.tar.gz
    test_resnet50_v1.py
    imagenet_img/
    imagenet_img/img_49.jpg
    imagenet_img/img_14.jpg
    ...
    imagenet_img/img_112.jpg
    imagenet1000_clsidx_to_labels.txt
    
  6. Running the PePPi pipeline

    remote-runner

    remote-runner is a tool that lets you run PePPi pipelines directly from the host, reducing the back-and-forth between your development host and your board. To use this tool, create a virtual environment within the Palette CLI and install the Python requirements. If you have not set up a passwordless connection, you will have to enter the password for the sima SSH user (edgeai) three times. This step takes a few minutes to run.

    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# python3 -m venv test_venv --system-site-packages
    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# source test_venv/bin/activate
    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# pip3 install -r requirements.txt
    sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# python remote_runner.py --dv_host {devkit IP Address} --model_file_path test_resnet.tar.gz --model_command "test_resnet50_v1.py --image_width 1280 --image_height 720 --max_frames 10 --model_tgz resnet50_v1_opt_mpk.tar.gz" --run_time 60
        Creating the Forwarding from host
        WARNING: Import named "sima" not found locally. Trying to resolve it at the PyPI server.
        WARNING: Import named "sima" was resolved to "sima:1.3.2" package (https://pypi.org/project/sima/).
        Please, verify manually the final list of requirements.txt to avoid possible dependency confusions.
        INFO: Successfully saved requirements file in sima_temp/requirements.txt
        Copying the model files to DevKit
        Copying the model files to DevKit
        Successfully created virtual env
        returncode: 255
        stdout: Running frame imagenet_img/img_66.jpg
        Image imagenet_img/img_66.jpg is  water snake,
    
        Running frame imagenet_img/img_267.jpg
        Image imagenet_img/img_267.jpg is  collie,
    
        Running frame imagenet_img/img_203.jpg
        Image imagenet_img/img_203.jpg is  crane,
    
        Running frame imagenet_img/img_364.jpg
        Image imagenet_img/img_364.jpg is  cocker spaniel, English cocker spaniel, cocker,
    
        Running frame imagenet_img/img_162.jpg
        Image imagenet_img/img_162.jpg is  Christmas stocking,
    
        Running frame imagenet_img/img_217.jpg
        Image imagenet_img/img_217.jpg is  mosquito net,
    
        Running frame imagenet_img/img_439.jpg
        Image imagenet_img/img_439.jpg is  bolo tie, bolo, bola tie, bola,
    
        Running frame imagenet_img/img_288.jpg
        Image imagenet_img/img_288.jpg is  lighter, light, igniter, ignitor,
    
        Running frame imagenet_img/img_370.jpg
        Image imagenet_img/img_370.jpg is  giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca,
    
        Running frame imagenet_img/img_28.jpg
        Image imagenet_img/img_28.jpg is  seashore, coast, seacoast, sea-coast,
    
        mla_free_handle: Run Frames:10
        Total Elapsed:: time 2778.035ms
    

    Running the PePPi pipeline on the Board

    1. If you prefer to run your PePPi pipelines directly on the board, copy the required files to the board:

      sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# scp test_resnet.tar.gz sima@{board_ip_address}:/home/sima/
      
    2. Let’s log in to the board and run the example; for this step we will need to log in as root.

      Note

      Remember that the password for the user root is commitanddeliver by default.

      sima-user@docker-image-id:/home/docker/sima-cli/peppi_examples# ssh root@{board_ip_address}
      The authenticity of host '192.168.1.20 (192.168.1.20)' can't be established.
      ECDSA key fingerprint is SHA256:GE2O0nsfBw8tpPUANpa7JyOXmMAhQPuKht+N6Nu6HDI.
      Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
      Warning: Permanently added '192.168.1.20' (ECDSA) to the list of known hosts.
      root@192.168.1.20's password:
      Last login: Mon Nov 27 14:49:45 2023
      
      root@davinci:~# cd /home/sima
      root@davinci:/home/sima# tar -xvzf test_resnet.tar.gz
      root@davinci:/home/sima# export PYTHONPATH='/home/sima/accelerator_pipeline:/usr/lib/python3.10/site-packages/spy:/usr/lib/python3.10/site-packages/partitioned_model_manager:/usr/lib/python3.10/site-packages/mla_rt_service:/usr/lib/python3.10/site-packages/sima'
      root@davinci:/home/sima# python3 test_resnet50_v1.py --image_width 1280 --image_height 720 --max_frames 10 --model_tgz resnet50_v1_opt_mpk.tar.gz
          Running frame imagenet_img/img_66.jpg
          [ 4085.757863] simaai-memory simaai,memory-manager: Allocate memory block id: 0(0x0) size: 151552 p_addr: 0xa0640000
          [ 4085.772727] simaai-memory simaai,memory-manager: Allocate memory block id: 1(0x1) size: 602112 p_addr: 0xa0700000
          [ 4085.861441] simaai-memory simaai,memory-manager: Allocate memory block id: 2(0x2) size: 151552 p_addr: 0xa0680000
          [ 4085.874176] simaai-memory simaai,memory-manager: Allocate memory block id: 3(0x3) size: 151552 p_addr: 0xa06c0000
          [ 4087.660859] simaai-memory simaai,memory-manager: Allocate memory block id: 4(0x4) size: 4096 p_addr: 0xc1c00000
          [ 4087.672727] simaai-memory simaai,memory-manager: Allocate memory block id: 5(0x5) size: 4096 p_addr: 0xc1c01000
          Image imagenet_img/img_66.jpg is  water snake,
      
          Running frame imagenet_img/img_267.jpg
          Image imagenet_img/img_267.jpg is  collie,
      
          Running frame imagenet_img/img_203.jpg
          Image imagenet_img/img_203.jpg is  crane,
      
          Running frame imagenet_img/img_364.jpg
          Image imagenet_img/img_364.jpg is  cocker spaniel, English cocker spaniel, cocker,
      
          Running frame imagenet_img/img_162.jpg
          Image imagenet_img/img_162.jpg is  Christmas stocking,
      
          Running frame imagenet_img/img_217.jpg
          Image imagenet_img/img_217.jpg is  mosquito net,
      
          Running frame imagenet_img/img_439.jpg
          Image imagenet_img/img_439.jpg is  bolo tie, bolo, bola tie, bola,
      
          Running frame imagenet_img/img_288.jpg
          Image imagenet_img/img_288.jpg is  lighter, light, igniter, ignitor,
      
          Running frame imagenet_img/img_370.jpg
          Image imagenet_img/img_370.jpg is  giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca,
      
          Running frame imagenet_img/img_28.jpg
          Image imagenet_img/img_28.jpg is  seashore, coast, seacoast, sea-coast,
      
          Releasing....
          mla_free_handle: Run Frames:10
          Total Elapsed:: time 2791.703ms
          [ 4088.767440] simaai-memory simaai,memory-manager: Delete memory block id: 3(0x1) size: 151552 p_addr: 0xa06c0000
          [ 4088.777793] simaai-memory simaai,memory-manager: Delete memory block id: 5(0x1) size: 4096 p_addr: 0xc1c01000
          [ 4088.788049] simaai-memory simaai,memory-manager: Delete memory block id: 1(0x1) size: 602112 p_addr: 0xa0700000
          [ 4088.822707] simaai-memory simaai,memory-manager: Delete memory block id: 0(0x1) size: 151552 p_addr: 0xa0640000
          [ 4088.833149] simaai-memory simaai,memory-manager: Delete memory block id: 2(0x1) size: 151552 p_addr: 0xa0680000
          [ 4088.843510] simaai-memory simaai,memory-manager: Delete memory block id: 4(0x1) size: 4096 p_addr: 0xc1c00000
      

      There is currently a known issue where running directly on the board causes the script to hang on the .release() call. Simply kill the pipeline using Ctrl+C.

  7. You have successfully run your first PePPi pipeline!

Analysis of the PePPi ResNet-50 File

In this section we will walk through the test_resnet50_v1.py code. Let us start directly with the main logic, which is contained in the run_test(args) function.

def run_test(args):
    # Create an MLSoCRT object.
    sess = sima.MLSoCRT(args.model_tgz, args.image_height, args.image_width)

The first SiMa-specific command is sima.MLSoCRT. This command creates a session for our model from its tar.gz file. Check out PePPi APIs for more information.

# Load all image files.
if not os.path.exists(args.images_path):
    raise Exception(f"[ ERROR ] Path to images {args.images_path} is invalid")

images_list = get_all_files(args.images_path, include_type="image")
if (len(images_list) <= 0):
    raise Exception(f"[ ERROR ] No image files found in {args.images_path}")

# Load the label dictionary to be used for prediction.
id_label_dict = get_id_label_dict(os.path.join(args.images_labels_path, CLSIDX_TO_LABELS_PATH))
if not id_label_dict:
    raise Exception(f"[ ERROR ] Dictionary mapping class indices to labels is empty")

# Determine the total number of frames to process.
if args.max_frames < len(images_list):
    num_frames = args.max_frames
else:
    num_frames = len(images_list)

These checks validate the images path, the label dictionary, and the frame count up front so that the pipeline will not break at runtime.

for i in range (0, num_frames):
    print("Running frame {}".format(images_list[i]))

    # Load an image using cv2.imread(). The image has 3 dimensions and
    # is in BGR format.
    bgr_image = cv2.imread(images_list[i])

We read the images from the disk using the cv2.imread function.

preprocessed_image = image_preprocess(frame=bgr_image,
                                        norm_params = [[255.0, 0.485, 0.229],
                                                    [255.0, 0.456, 0.224],
                                                    [255.0, 0.406, 0.225]])

We perform the preprocessing steps; for this specific model we only need a normalization operation, as image_preprocess illustrates below. Note that the SiMa.ai functions are optimized for maximum performance, so using sima.normalize will get you a higher FPS than using cv2.normalize().

def image_preprocess(frame: np.ndarray, norm_params: List[Tuple[float, float, float]]) -> np.ndarray:
    """
    This function pre-processes an input frame that is in BGR
    format. Specifically, it normalizes the input
    frame. A normalized 3D output frame, which is
    in BGR format and has an HWC layout, is returned.
    """
    # Scale and normalize the input frame; the per-channel scale (255.0),
    # mean, and sigma values are supplied via channel_params.
    tensor_normalized = sima.normalize(frame,
                                       channel_params=norm_params)

    return tensor_normalized

After the preprocessing step, we run the session created by sima.MLSoCRT using sess.run_session. Check out PePPi APIs for more information.

# Run inference on the 3D BGR frame and collect the single output
# tensor produced by ResNet50.
output_tensors = sess.run_session([preprocessed_image])

We pass a list because a model may have multiple inputs; likewise, run_session returns a list because a model may have multiple outputs. Since this model has only a single output, we take the first element:

output_tensor = output_tensors[0]

# The output is a single 1-based value represented as a 4D tensor.
# Extract this value and determine the corresponding prediction label.
prediction = output_tensor[0][0][0][0] - 1

We extract our prediction as mentioned in the code comments and classify it.

prediction_label = classify(id_label_dict, prediction)

The classify function:

def classify(id_label_dict: Dict, prediction: int) -> str:
    """
    This function extracts the class label from the given prediction.
    """
    if prediction >= len(id_label_dict.keys()):
        raise Exception(f"[ ERROR ] {prediction} must be less than {len(id_label_dict.keys())}")
    return id_label_dict[prediction]

It simply checks that the prediction is within bounds and returns the corresponding label. Finally, we check that the prediction label is not None and log it to the terminal.

    if not prediction_label:
        raise Exception(f"[ ERROR ] Failed to get prediction label.")

    print(f"Image {images_list[i]} is {prediction_label}")

# Perform cleanup before exiting.
sess.release()

From here it is easy to see how to run any model: simply change your tar.gz file and use your own pre- and post-processing blocks. Using the predefined SiMa.ai calls will be faster than using external libraries for the same function. All PePPi functions try to mimic the python-opencv nomenclature, such as sima.VideoCapture, sima.resize, sima.sigmoid, and so on. Check out our PePPi APIs for more information.
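As a concrete starting point, here is a minimal sketch of such a generic pipeline. The model file name and input image are placeholders, and the normalization parameters are the ImageNet values used above; your own model will likely need different pre- and post-processing.

import cv2
import sima

# Create a session from the compiled model archive (placeholder name).
sess = sima.MLSoCRT("my_model_mpk.tar.gz", 720, 1280)

# Read a BGR uint8 frame and normalize it into a float32 tensor.
bgr_image = cv2.imread("input.jpg")
tensor = sima.normalize(bgr_image,
                        channel_params=[[255.0, 0.485, 0.229],
                                        [255.0, 0.456, 0.224],
                                        [255.0, 0.406, 0.225]])

# Run inference; the result is a list of output tensors.
output_tensors = sess.run_session([tensor])
print(output_tensors[0].shape)  # model-specific post-processing goes here

# Clean up before exiting.
sess.release()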

PePPi APIs

Session Handling – MLSoCRT

sima.MLSoCRT(model_tgz_file_name: str, image_height: int = -1, image_width: int = -1, verbose: bool = False) → MLSoCRT

This class is used to execute an ML model on the SoC. It is assumed that an ML model has been compiled by the Model SDK into a tar.gz file that contains a set of .lm files that need to execute on the MLA, a set of .so files that need to execute on the ARM APU, and an mpk.json file that specifies how to orchestrate the execution of the .lm and .so files, as well as any SiMa-specific pre-processing and post-processing that must be performed. The caller may optionally specify the height and width of the incoming camera frames. If these values are not specified (default value of -1), they are automatically derived from the first captured frame.

Parameters:
  • model_tgz_file_name – tar.gz file path associated with the model.

  • image_height – height of the incoming camera frames, which is used internally for creating json files needed for decoding and encoding. This field will be deprecated in the future.

  • image_width – width of the incoming camera frames. Also used internally for creating json files needed for decoding and encoding. This field will be deprecated in the future.

  • verbose – optional parameter to be used for debugging.

Returns:

An initialized MLSoCRT object.

sima.MLSoCRT.run_session(preprocessed_frames: List[np.ndarray]) → List[np.ndarray]

This function runs inference on the passed-in input frames. The input frames are assumed to be in BGR format, and to have a 3D shape with an HWC layout. A list of Numpy arrays that correspond to the output tensor(s) is returned. In general, each output tensor has a 4D shape with an NHWC layout, but this should be confirmed by inspecting the Numpy array.

Parameters:

preprocessed_frames – one or more input frames on which inference is to be performed. The input frames are assumed to be in BGR format, and to have a 3D shape with an HWC layout. Additionally, the datatype must be float32.

Returns:

A list of Numpy arrays that correspond to the output tensor(s) of the model.

sima.MLSoCRT.release()

This function cleans up all internal resources associated with the MLSoCRT object.
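As a quick illustration, the following is a minimal sketch of the full MLSoCRT lifecycle. The tar.gz name matches the ResNet-50 example above, and the random frame is only a stand-in for a real preprocessed image.

import numpy as np
import sima

# Create a session from the compiled archive (see the ResNet-50 example).
sess = sima.MLSoCRT("resnet50_v1_opt_mpk.tar.gz", 720, 1280)

# run_session() expects float32 frames in HWC layout and BGR format.
dummy_frame = np.random.rand(720, 1280, 3).astype(np.float32)
outputs = sess.run_session([dummy_frame])
for tensor in outputs:
    print(tensor.shape)  # typically 4D NHWC; confirm by inspection

# Release all internal resources when done.
sess.release()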

VideoCapture

sima.VideoCapture(network_src: str, image_height: int = -1, image_width: int = -1) → VideoCapture

This class is used for capturing input frames from a video source. The caller may specify the desired height and width of the captured video frames. If these values are not specified (default value of -1), they are automatically derived from the first captured frame. The video source may be an IP camera or UDP sink. In the former case, we set up an RTSP client that talks to the RTSP server and receives H.264 encoded frames. Each H.264 frame is then decoded using the internal hardware-accelerated decoder, and the result is an NV12 frame. Finally, the NV12 frame is converted to BGR format using hardware acceleration and returned to the caller. An instance of sima.VideoCapture() must be created before that of sima.MLSoCRT.

Parameters:
  • network_src – network source, which may be an IP camera or UDP sink. If the source is an IP camera, this parameter must be a full RTSP URL of the form rtsp://<camera_ip>/<stream>. Otherwise, this parameter must be a UDP port number.

  • image_height – desired height of the captured video frame.

  • image_width – desired width of the captured video frame.

Returns:

An initialized VideoCapture object.

sima.VideoCapture.isOpened() → bool

This function returns True if the video source has been opened.

Returns:

Boolean specifying whether or not the video capture has been opened.

sima.VideoCapture.read() → np.ndarray

This function reads a frame from the input video source and returns it. The frame is assumed to be in BGR format with an underlying datatype of uint8.

Returns:

Captured video frame in BGR format.

sima.VideoCapture.get_height() → int

This function returns the height of the captured video frame. This function should only be called after the first call to read().

Returns:

Height of the captured video frame.

sima.VideoCapture.get_width() → int

This function returns the width of the captured video frame. This function should only be called after the first call to read().

Returns:

Width of the captured video frame.

sima.VideoCapture.release()

This function cleans up all internal resources associated with the VideoCapture object.
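A minimal capture sketch, assuming an IP camera at a placeholder RTSP URL (remember that the VideoCapture must be created before any MLSoCRT instance):

import sima

# Open the video source; URL and dimensions are placeholders.
cap = sima.VideoCapture("rtsp://192.168.1.100/stream", 720, 1280)
if not cap.isOpened():
    raise RuntimeError("Failed to open the video source")

# Read one BGR uint8 frame; get_height()/get_width() are valid after read().
frame = cap.read()
print(cap.get_height(), cap.get_width())

# Clean up the capture resources.
cap.release()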

StreamVideo

sima.StreamVideo(image_height: int, image_width: int, host_IP: str, gst_port: int | None) → StreamVideo

This class is used for displaying output video on a host. Images to be displayed are assumed to be in BGR format. We instantiate a GStreamer pipeline that uses the internal Allegro encoder: the incoming BGR frame is first converted to NV12 and sent to the encoder, which outputs an H.264-encoded frame. The pipeline then packages the H.264-encoded video for real-time transmission as an MPEG-2 TS stream and sends it to the specified IP address and port over UDP. To receive the stream, you will need to run the following command on your host: gst-launch <gst_string>.

Parameters:
  • image_height – height of the image to be displayed.

  • image_width – width of the image to be displayed.

  • host_IP – IP address of the host.

  • gst_port – optional port number.

Returns:

An initialized StreamVideo object.

Users can use the ffplay command or vlc to display the stream on the host you are streaming the video to. Assuming you are sending the stream to port 9000 as in your GStreamer pipeline, you can run ffplay/vlc as follows:

ffplay udp://localhost:9000 or vlc udp://localhost:9000

sima.StreamVideo.imshow(bgr_frame: np.ndarray) → int

This function displays the input frame on the host. The input frame is assumed to be in BGR format.

Parameters:

bgr_frame – the video frame to be displayed. The number of channels must be 3 and the datatype must be np.uint8.

Returns:

Integer error code.
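Putting VideoCapture and StreamVideo together, a minimal pass-through sketch might look like the following; the camera URL, host IP, and port are placeholders. On the host, view the stream with ffplay udp://localhost:9000.

import sima

# Capture from a camera and stream the frames back to a host for display.
cap = sima.VideoCapture("rtsp://192.168.1.100/stream", 720, 1280)
out = sima.StreamVideo(720, 1280, "192.168.1.10", 9000)

for _ in range(100):           # stream a fixed number of frames
    bgr_frame = cap.read()     # BGR, uint8, 3 channels
    out.imshow(bgr_frame)      # encode and send over UDP

cap.release()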

Image Pre-Processing

PePPi APIs are provided for common image pre-processing functions. These functions, which include scaling and normalization, are applicable to all input images that are in BGR format. Be sure to reference the fp32 pipeline in order to determine the specific image pre-processing that is required.

The pre-processing APIs available to users are specified below:

sima.resize(data: np.ndarray, target_height: int, target_width: int, keep_aspect: bool, deposit_location: str = 'center', method: str = 'linear') → np.ndarray

This function resizes the input frame to the target dimensions, then returns the resized frame. Both the input and output frames are assumed to be in BGR format, and to have a 3D shape with an HWC layout.

Parameters:
  • data – input frame to be resized in BGR format, with 3D shape (HWC) and datatype of uint8.

  • target_height – height to which the frame is to be resized.

  • target_width – width to which the frame is to be resized.

  • keep_aspect – boolean that specifies whether or not to keep the aspect ratio during the resize operation.

  • deposit_location – string that specifies where to place the resized image within the padded frame. Permissible values are: center (default), topleft, and bottomright.

  • method – interpolation algorithm to be used during the resize operation. Supported methods are linear, nearest, area, and cubic.

Returns:

A Numpy array that represents the resized input frame in BGR format, with 3D shape and data type of uint8.
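A minimal sketch of a letterboxed resize to a 224x224 model input, assuming a local input.jpg:

import cv2
import sima

# Read a BGR uint8 frame and resize it while preserving the aspect ratio;
# the image is centered in the padded 224x224 frame by default.
bgr = cv2.imread("input.jpg")
resized = sima.resize(bgr, 224, 224,
                      keep_aspect=True,
                      deposit_location="center",
                      method="linear")
print(resized.shape)  # (224, 224, 3), uint8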

sima.normalize(bgr_frame: np.ndarray, channel_params: List[Tuple[float, float, float]]) → np.ndarray

This function normalizes the input frame with the given channel parameters, then returns the normalized frame. Both the input and output frames are assumed to be in BGR format, and to have a 3D shape with an HWC layout. The channel parameters are assumed to be in RGB order.

Parameters:
  • bgr_frame – input frame to be normalized in BGR format with 3D shape, HWC layout, and an underlying datatype of uint8.

  • channel_params – per-channel scale, mean, and sigma values.

Returns:

A Numpy array that represents the normalized input frame in BGR format, with 3D shape, HWC layout, and an underlying datatype of float32.
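A minimal sketch using the ImageNet parameters from the ResNet-50 example. Each row of channel_params is a (scale, mean, sigma) tuple and, as noted above, the rows are given in RGB order even though the frame itself is BGR.

import cv2
import sima

bgr = cv2.imread("input.jpg")  # uint8, HWC, BGR
normalized = sima.normalize(bgr,
                            channel_params=[[255.0, 0.485, 0.229],   # R
                                            [255.0, 0.456, 0.224],   # G
                                            [255.0, 0.406, 0.225]])  # B
print(normalized.dtype)  # float32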

Output Post-Processing

PePPi APIs are provided for common post-processing functions. In addition to using these APIs, users may also write their own custom post-processing code in Python and invoke this code from their inference scripts. The post-processing APIs available to users are specified below:

sima.sigmoid(data: np.ndarray, save_int16: bool) → np.ndarray

This function performs the sigmoid computation on the input tensor, then returns the output of the computation. Both the input and output tensors are assumed to have a 4D shape with an NHWC layout.

Parameters:
  • data – input tensor to which sigmoid is to be applied. The shape should be 4D in NHWC layout. The input datatype is expected to be float32.

  • save_int16 – boolean that specifies whether or not to convert the sigmoid output to an int16 value for reducing bandwidth.

Returns:

Output of the sigmoid computation.
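A minimal sketch, assuming a 4D float32 tensor of raw logits such as one returned by run_session (the shape here is arbitrary stand-in data):

import numpy as np
import sima

# Apply the optimized sigmoid to a 4D NHWC float32 tensor of logits.
logits = np.random.randn(1, 80, 80, 85).astype(np.float32)
scores = sima.sigmoid(logits, save_int16=False)  # keep float output
print(scores.shape)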

sima.centernet_nms_maxpool(data: np.ndarray, kernel: int) → np.ndarray

This function performs the nms maxpool computation on the input tensor, then returns the output of the computation. Both the input and output tensors are assumed to have a 4D shape with an NHWC layout. This particular operation is specific to the CenterNet model.

Parameters:
  • data – input tensor to which nms maxpool is to be applied.

  • kernel – maxpool kernel size, which must be an odd number.

Returns:

Output of the nms maxpool computation.
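A minimal sketch of the typical CenterNet usage pattern, where a location is kept as a peak only if the maxpooled heatmap equals the original value there; the heatmap below is random stand-in data, and the 0.3 threshold is an arbitrary example.

import numpy as np
import sima

# NHWC heatmap of per-class center scores (stand-in data).
heatmap = np.random.rand(1, 128, 128, 80).astype(np.float32)

# Maxpool with an odd kernel size, then suppress all non-peak locations.
pooled = sima.centernet_nms_maxpool(heatmap, kernel=3)
peaks = heatmap * (pooled == heatmap)
print(int((peaks > 0.3).sum()), "candidate detections above threshold")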