Build Host App With C++ API
In PCIe mode, the SiMa.ai MLSoC can be paired with a host system over PCIe, and the host CPU can offload portions of the ML application to the MLSoC. The APIs are integrated into the host C++ application, which then communicates with the MLSoC over PCIe.
Note
In PCIe mode, you can currently use the Machine Learning Accelerator (MLA) to run inference tasks (quant -> NN Model -> dequant).
Additionally, with our support, you can manually generate the MPK in the SDK to enable any valid GStreamer PCIe pipeline.
In a future release, this mode will expand to include access to all hardware blocks, such as video codecs, enabling pre- and post-processing operations directly on the MLSoC. This enhancement is part of our ongoing roadmap.
Follow the instructions below to build a sample application that uses the ResNet50 model to classify images.
Follow these instructions to set up the development system in PCIe mode.
Download the test image dataset. This will be used by the host-side application.
Download the optimized `ResNet50 model <pkg_downloads/SDK2.0.0/model_zoo/modalix/resnet_50.tar.gz>`_ and make it available to the Palette container environment. To learn more about how to optimize a standard ResNet50 ONNX model, refer to this link.
sima-user@sima-user-machine:~$ mkdir -p ~/workspace/resnet50
sima-user@sima-user-machine:~$ mkdir -p ~/workspace/hostapp
sima-user@sima-user-machine:~$ cp resnet_50.tar.gz ~/workspace/resnet50/
sima-user@sima-user-machine:~$ cp test_images.tar.gz ~/workspace/hostapp/
Note
The following commands are executed inside the Palette MPK container environment.
Create an MPK project with the simaaipciesrc source plugin by using mpk project create.
sima-user@sima-user-machine:~$ sima-cli sdk mpk
sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli$ cd resnet50
sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50$ mpk project create --model-path ./resnet_50.tar.gz --src-plugin simaaipciesrc --input-width 224 --input-height 224
sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50$ cd resnet_50_simaaipciesrc
Create an MPK by using mpk create
sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50/resnet_50_simaaipciesrc$ mpk create -s . -d . --build-target {yocto,elx} --boardtype {mlsoc,modalix} --clean
Once the ResNet50 MPK is created, the pipeline is deployed and run on the DevKit. Executing the MPK includes the following tasks:
Read the image → Pre-process (resize, normalize) → MLSoC CPP Sync Inference → Post-process (argmax) → Overlay → Save the output.
Save the output images in the output folder.
Compile the example application on the Host PC outside the Docker SDK.
You can download the ResNet50 host application: resnet50_pcie_application.tar.xz.
Note
The following commands are executed on the host side. To avoid permission conflicts with the Palette environment, it is recommended to create a separate project folder dedicated to the host application.
sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf test_images.tar.xz
sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf resnet50_pcie_application.tar.xz
CMakeLists.txt
imagenet1000_clsidx_to_labels.txt
main.cpp
sima-user@sima-user-machine:~/workspace/hostapp$ mkdir build && cd build
sima-user@sima-user-machine:~/workspace/hostapp/build$ cmake ../
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenCV: /usr/local (found version "4.6.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sima-user/workspace/hostapp/build
sima-user@sima-user-machine:~/workspace/hostapp/build$ make
[ 50%] Building CXX object CMakeFiles/test_img.dir/main.cpp.o
[100%] Linking CXX executable test_img
[100%] Built target test_img
sima-user@sima-user-machine:~/workspace/hostapp/build$ ./test_img ../../resnet50/resnet_50_simaaipciesrc/project.mpk
Directory created or already exists: ./../output
SiMaDevicePtr for GUID : sima_mla_c0
is : 0x5627ab5bf9b0
loadModel is successful with modelPtr: 0x5627ab5c8b00
runInferenceSynchronousloadModel_cppsdkpipeline
modelPtr->inputShape.size:1
modelPtr->inputShape:
224 224 3
modelPtr->outputShape.size:1
modelPtr->outputShape:
1 1 1000
total Images:7
./../test_images/000000009448.jpg
starting the run Synchronous Inference
Time taken per iteration: 378 milliseconds
Predicted label: 879: 'umbrella',
Image with predicted label saved to: ./../output/000000009448.jpg
./../test_images/000000007784.jpg
starting the run Synchronous Inference
Time taken per iteration: 4 milliseconds
Predicted label: 701: 'parachute, chute',
Image with predicted label saved to: ./../output/000000007784.jpg
... ... ...
The application workflow includes the following steps:
Enumerating available SiMa devices
Loading and running inference models
Preprocessing images for inference
Handling the inference results and printing them
Device Initialization
Before performing inference, the application initializes and enumerates available SiMa MLSoC devices using the SimaMlsocApi interface:
shared_ptr<simaai::SimaMlsocApi> simaDeviceInst = simaai::SimaMlsocApi::getInstance();
vector<string> guids = simaDeviceInst->enumerateDeviceGuids();
simaDeviceInst->setLogVerbosity(simaai::SiMaLogLevel::debug);
For each device found, the application attempts to open the device by trying multiple GUID format variations (raw, trimmed, with /dev/ prefix, and short-form suffixes) until one succeeds:
std::string guid_raw = guids[i];
std::string guid_trimmed = guid_raw;
guid_trimmed.erase(guid_trimmed.find_last_not_of(" \n\r\t") + 1);
std::vector<std::string> variations = {guid_raw, guid_trimmed, "/dev/" + guid_trimmed};
if (guid_trimmed.find("sima_mla_") == 0)
variations.push_back(guid_trimmed.substr(9));
if (guid_trimmed.find("sima_mla_c") == 0)
variations.push_back(guid_trimmed.substr(10));
shared_ptr<simaai::SiMaDevice> SiMaDevicePtr = nullptr;
std::string guid;
for (const auto& var : variations) {
SiMaDevicePtr = simaDeviceInst->openDevice(var);
if (SiMaDevicePtr) {
guid = var;
break;
}
}
Model Loading
Once a device is initialized, the application loads a pre-trained model onto the MLSoC using a SiMaBundle that describes the model's input and output tensor shapes:
simaai::SiMaBundle model;
std::vector<uint32_t> in_shape{224,224,3};
std::vector<uint32_t> out_shape{1,1,1000};
model.numInputTensors = 1;
model.numOutputTensors = 1;
model.outputBatchSize = 1;
model.inputBatchSize = 1;
model.inputShape.emplace_back(in_shape);
model.outputShape.emplace_back(out_shape);
shared_ptr<simaai::SiMaBundle> modelPtr = simaDeviceInst->load(SiMaDevicePtr, model_path, model);
Image Preprocessing
Because the ResNet50 model expects a 224x224 input, images are preprocessed before inference: resized with padding, normalized, and reordered from BGR to RGB:
cv::Mat preprocess(const cv::Mat& input_image) {
cv::Size target_size(224, 224);
double width_ratio = static_cast<double>(target_size.width) / input_image.cols;
double height_ratio = static_cast<double>(target_size.height) / input_image.rows;
cv::Mat resized_image;
if (width_ratio < height_ratio) {
int new_height = static_cast<int>(input_image.rows * width_ratio);
cv::resize(input_image, resized_image, cv::Size(target_size.width, new_height));
int top_padding = (target_size.height - new_height) / 2;
int bottom_padding = target_size.height - new_height - top_padding;
cv::copyMakeBorder(resized_image, resized_image, top_padding, bottom_padding, 0, 0, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
} else {
int new_width = static_cast<int>(input_image.cols * height_ratio);
cv::resize(input_image, resized_image, cv::Size(new_width, target_size.height));
int left_padding = (target_size.width - new_width) / 2;
int right_padding = target_size.width - new_width - left_padding;
cv::copyMakeBorder(resized_image, resized_image, 0, 0, left_padding, right_padding, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
}
resized_image.convertTo(resized_image, CV_32FC3, 1.0 / 255.0);
// The ImageNet mean/std constants are in RGB order, so convert from
// OpenCV's BGR layout before normalizing.
cv::cvtColor(resized_image, resized_image, cv::COLOR_BGR2RGB);
cv::Scalar mean(0.485, 0.456, 0.406);
cv::Scalar std_dev(0.229, 0.224, 0.225);
cv::subtract(resized_image, mean, resized_image);
cv::divide(resized_image, std_dev, resized_image);
return resized_image;
}
Inference Execution
The preprocessed image is loaded into an input tensor, and inference is executed synchronously. Execution time is measured per iteration using std::chrono:
memcpy(inputTensorsList[0].getPtr().get(), preprocessed_image.data, inputTensorsList[0].getSizeInBytes());
auto start = std::chrono::high_resolution_clock::now();
simaai::SiMaErrorCode ret = simaDeviceInst->runSynchronous(modelPtr, inputTensorsList, metaData, outputTensorsList);
auto end = std::chrono::high_resolution_clock::now();
if (ret != simaai::success)
cout << "runInference Failure" << endl;
auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "Time taken per iteration: " << duration_ms.count() << " milliseconds" << std::endl;
Post-processing and Result Handling
The application extracts the classification result from the inference output tensor and saves the image with the predicted label:
cv::Mat output(1, 1000, CV_32FC1, (char*)outputTensorsList[0].getPtr().get());
double max_val;
cv::Point max_loc;
cv::minMaxLoc(output, nullptr, &max_val, nullptr, &max_loc);
int predicted_val = max_loc.x;
if (predicted_val >= 0 && predicted_val < labels.size()) {
std::string predicted_label = labels[predicted_val];
writeTextToImage(image, image_path, predicted_label, outputFolder);
}
Model and Device Cleanup
After inference is complete, the model is unloaded and the device is disconnected. Both calls check the returned error code:
ret = simaDeviceInst->unload(modelPtr);
if (ret != simaai::success)
    cout << "unload failed for modelPtr " << modelPtr.get() << endl;
ret = simaDeviceInst->closeDevice(SiMaDevicePtr);
if (ret != simaai::success)
    cout << "closeDevice() failed for GUID " << guids[i] << endl;