Build Host App With C++ API
In PCIe mode, the SiMa.ai MLSoC is paired with a host system over PCIe, and the host CPU can offload portions of the ML application to the MLSoC. The APIs are integrated into the host C++ application, which then communicates with the MLSoC over PCIe.
Note
In PCIe mode, you can currently use the Machine Learning Accelerator (MLA) to run inference tasks (quant -> NN Model -> dequant).
Additionally, with our support, you can manually generate the MPK in the SDK to enable any valid GStreamer PCIe pipeline.
In a future release, this mode will expand to include access to all hardware blocks, such as video codecs, enabling pre- and post-processing operations directly on the MLSoC. This enhancement is part of our ongoing roadmap.
Follow the instructions below to build a sample application that uses the ResNet50 model to classify images.
Follow these instructions to set up the development system in PCIe mode.
Download the test image dataset. This will be used by the host-side application.
Download the optimized ResNet50 model and make it available to the Palette container environment. To learn more about optimizing a standard ResNet50 ONNX model, refer to this link.
sima-user@sima-user-machine:~$ mkdir -p ~/workspace/resnet50
sima-user@sima-user-machine:~$ mkdir -p ~/workspace/hostapp
sima-user@sima-user-machine:~$ cp resnet50_mpk.tar.gz ~/workspace/resnet50/
sima-user@sima-user-machine:~$ cp test_images.tar.gz ~/workspace/hostapp/
Note
The following command is executed inside the Palette container environment.
Optionally, depending on the input image resolution, edit the user_cfg.json file to modify the parameters shown below. The file is located at /usr/local/simaai/utils/mpk_parser/user_cfg.json.
sima-user@docker-image-id:/home/docker/sima-cli/resnet50$ vi /usr/local/simaai/utils/mpk_parser/user_cfg.json
{
  "img_width": 1920,
  "img_height": 1080,
  "input_width": 1920,
  "input_height": 1080,
  "output_width": 224,
  "output_height": 224,
  "input_depth": 3,
  "keep_aspect": 0,
  "norm_channel_params": [[1.0, 0, 0.003921569], [1.0, 0, 0.003921569], [1.0, 0, 0.003921569]],
  "normalize": 1,
  "input_type": "RGB",
  "output_type": "BGR",
  "scaling_type": "INTER_LINEAR",
  "padding_type": "CENTER",
  "ibufname": "input"
}
Using the MPK parser tool, create an MPK file from the downloaded model resnet50_mpk.tar.gz. Run the commands below inside the Palette container environment.
sima-user@docker-image-id:/home/docker/sima-cli/resnet50$ mkdir -p /tmp/resnet50_mpk && tar -xzf resnet50_mpk.tar.gz -C /tmp/resnet50_mpk && rm /tmp/resnet50_mpk/preproc.json /tmp/resnet50_mpk/mla.json /tmp/resnet50_mpk/detess_dequant.json && tar -czf resnet50_mpk.tar.gz -C /tmp/resnet50_mpk . && rm -r /tmp/resnet50_mpk
sima-user@docker-image-id:/home/docker/sima-cli/resnet50$ python3 /usr/local/simaai/utils/mpk_parser/m_parser.py -targz resnet50_mpk.tar.gz -project /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline -cfg /usr/local/simaai/utils/mpk_parser/user_cfg.json
File preproc.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/pre_process/cfg
File postproc.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/post_process/cfg
File mla.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/process_mla/cfg
File ./sima_temp/resnet50_stage1_mla.lm copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/process_mla/res/process.lm
File ./sima_temp/resnet50_mpk.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/resources/mpk.json
ℹ Compiling a65-apps...
✔ a65-apps compiled successfully.
ℹ Compiling Plugins...
✔ Plugins Compiled successfully.
ℹ Copying Resources...
✔ Resources Copied successfully.
ℹ Building Rpm...
✔ Rpm built successfully.
ℹ Creating mpk file...
✔ Mpk file created successfully at /home/docker/sima-cli/resnet50/project.mpk .
MPK Created successfully.
The generated project.mpk file will be used later by the host-side application to deploy to the DevKit.
Once the ResNet50 MPK is created, the pipeline is deployed and run on the DevKit. Executing the MPK performs the following tasks:
Read the image → Pre-process (resize, normalize) → MLSoC CPP Sync Inference → Post-process (argmax) → Overlay → Save output.
The output images are saved in the output folder.
Compile the example application on the host PC, outside the Docker SDK. You can download the ResNet50 example resnet50_pcie_application.tar.xz.
Note
The following commands are executed on the host side. To avoid permission conflicts with the Palette environment, it is recommended to create a separate project folder dedicated to the host application.
sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf test_images.tar.gz
sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf resnet50_pcie_application.tar.xz
CMakeLists.txt
imagenet1000_clsidx_to_labels.txt
main.cpp
resnet50_project.mpk
sima-user@sima-user-machine:~/workspace/hostapp$ mkdir build && cd build
sima-user@sima-user-machine:~/workspace/hostapp/build$ cmake ../
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenCV: /usr/local (found version "4.6.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sima-user/workspace/hostapp/build
sima-user@sima-user-machine:~/workspace/hostapp/build$ make
[ 50%] Building CXX object CMakeFiles/test_img.dir/main.cpp.o
[100%] Linking CXX executable test_img
[100%] Built target test_img
sima-user@sima-user-machine:~/workspace/hostapp/build$ ./test_img ../../resnet50/project.mpk
Directory created or already exists: ./../output
SiMaDevicePtr for GUID : sima_mla_c0
is : 0x5627ab5bf9b0
sima_send_mgmt_file: File Name: ../../resnet50/project.mpk
sima_send_mgmt_file: File size 20184712
sima_send_mgmt_file: Total Bytes sent 20184712 in 1 seconds
Opening /dev/sima_mla_c0
si_mla_create_data_queues: Data completion queue successfully created
si_mla_create_data_queues: Data work queue successfully created
si_mla_create_data_queues: Data receive queue successfully created
loadModel is successful with modelPtr: 0x5627ab5c8b00
runInferenceSynchronousloadModel_cppsdkpipeline
modelPtr->inputShape.size:1
modelPtr->inputShape:
224 224 3
modelPtr->outputShape.size:1
modelPtr->outputShape:
1 1 1000
total Images:7
./../test_images/000000009448.jpg
starting the run Synchronous Inference
Time taken per iteration: 378 milliseconds
Predicted label: 879: 'umbrella',
Image with predicted label saved to: ./../output/000000009448.jpg
./../test_images/000000007784.jpg
starting the run Synchronous Inference
Time taken per iteration: 4 milliseconds
Predicted label: 701: 'parachute, chute',
Image with predicted label saved to: ./../output/000000007784.jpg
... ... ...
The application workflow includes the following steps:
Enumerating available SiMa devices
Loading and running inference models
Preprocessing images for inference
Handling the inference results and printing them
Device Initialization
Before performing inference, the application initializes and enumerates available SiMa MLSoC devices using the SimaMlsocApi interface:
shared_ptr<simaai::SimaMlsocApi> simaDeviceInst = simaai::SimaMlsocApi::getInstance();
vector<string> guids = simaDeviceInst->enumerateDeviceGuids();
simaDeviceInst->setLogVerbosity(simaai::SiMaLogLevel::debug);
For each device found, the application opens the device and prepares it for inference:
shared_ptr<simaai::SiMaDevice> SiMaDevicePtr = simaDeviceInst->openDevice(guids[i]);
Model Loading
Once a device is initialized, the application loads a pre-trained model onto the MLSoC:
simaai::SiMaModel model;
std::vector<uint32_t> in_shape{224,224,3};
std::vector<uint32_t> out_shape{1,1,1000};
model.numInputTensors = 1;
model.numOutputTensors = 1;
model.outputBatchSize = 1;
model.inputBatchSize = 1;
model.inputShape.emplace_back(in_shape);
model.outputShape.emplace_back(out_shape);
shared_ptr<simaai::SiMaModel> modelPtr = simaDeviceInst->load(SiMaDevicePtr, model_path, model);
Image Preprocessing
As the ResNet50 model expects a 224x224 input, images are preprocessed before inference; this includes resizing, normalization, and channel reordering:
cv::Mat preprocess(const cv::Mat& input_image) {
cv::Size target_size(224, 224);
double width_ratio = static_cast<double>(target_size.width) / input_image.cols;
double height_ratio = static_cast<double>(target_size.height) / input_image.rows;
cv::Mat resized_image;
if (width_ratio < height_ratio) {
int new_height = static_cast<int>(input_image.rows * width_ratio);
cv::resize(input_image, resized_image, cv::Size(target_size.width, new_height));
int top_padding = (target_size.height - new_height) / 2;
int bottom_padding = target_size.height - new_height - top_padding;
cv::copyMakeBorder(resized_image, resized_image, top_padding, bottom_padding, 0, 0, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
} else {
int new_width = static_cast<int>(input_image.cols * height_ratio);
cv::resize(input_image, resized_image, cv::Size(new_width, target_size.height));
int left_padding = (target_size.width - new_width) / 2;
int right_padding = target_size.width - new_width - left_padding;
cv::copyMakeBorder(resized_image, resized_image, 0, 0, left_padding, right_padding, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
}
resized_image.convertTo(resized_image, CV_32FC3, 1.0 / 255.0);
cv::Scalar mean(0.485, 0.456, 0.406);
cv::Scalar std_dev(0.229, 0.224, 0.225);
cv::subtract(resized_image, mean, resized_image);
cv::divide(resized_image, std_dev, resized_image);
cv::cvtColor(resized_image, resized_image, cv::COLOR_BGR2RGB);
return resized_image;
}
Inference Execution
The preprocessed image is loaded into an input tensor, and inference is executed synchronously:
memcpy(inputTensorsList[0].getPtr().get(), preprocessed_image.data, inputTensorsList[0].getSizeInBytes());
simaai::SiMaErrorCode ret = simaDeviceInst->runSynchronous(modelPtr, inputTensorsList, metaData, outputTensorsList);
if (ret != simaai::success) {
cout << "runInference Failure" << endl;
}
Post-processing and Result Handling
The application extracts the classification result from the inference output tensor and saves the image with the predicted label:
cv::Mat output(1, 1000, CV_32FC1, (char*)outputTensorsList[0].getPtr().get());
double max_val;
cv::Point max_loc;
cv::minMaxLoc(output, nullptr, &max_val, nullptr, &max_loc);
int predicted_val = max_loc.x;
if (predicted_val >= 0 && predicted_val < labels.size()) {
std::string predicted_label = labels[predicted_val];
writeTextToImage(image, image_path, predicted_label, outputFolder);
}
Model and Device Cleanup
After inference is complete, the model is unloaded, and the device is disconnected:
simaDeviceInst->unload(modelPtr);
simaDeviceInst->closeDevice(SiMaDevicePtr);