.. _build_host_app_with_cpp:

Build Host App With C++ API
###########################

In PCIe mode, the SiMa.ai MLSoC can be paired with a host system through PCIe, and the host CPU can offload portions of the ML application to the MLSoC. The APIs are integrated into the host C++ application, which then communicates with the MLSoC through PCIe.

.. note::

   In PCIe mode, you can currently use the Machine Learning Accelerator (MLA) to run inference tasks (``quant`` -> ``NN Model`` -> ``dequant``). Additionally, with our support, you can manually generate the MPK in the SDK to enable any valid GStreamer PCIe pipeline. In a future release, this mode will expand to include access to all hardware blocks, such as video codecs, enabling pre- and post-processing operations directly on the MLSoC. This enhancement is part of our ongoing roadmap.

Follow the instructions below to build a sample application that uses the ResNet50 model to classify images.

.. tabs::

   .. tab:: Prerequisites

      - Follow this :ref:`instruction ` to set up the development system in PCIe mode.
      - Download the `test image dataset `_. This will be used by the host-side application.
      - Download the optimized `ResNet50 model `.

      .. code-block:: console

         sima-user@sima-user-machine:~$ mkdir -p ~/workspace/resnet50
         sima-user@sima-user-machine:~$ mkdir -p ~/workspace/hostapp
         sima-user@sima-user-machine:~$ cp resnet50_mpk.tar.gz ~/workspace/resnet50/
         sima-user@sima-user-machine:~$ cp test_images.tar.gz ~/workspace/hostapp/

   .. tab:: Create an MPK

      .. note::

         The following commands are executed inside the Palette ``mpk`` container environment.

      - Create an mpk project directory with ``simaaipciesrc`` by using ``mpk project create``.
      .. code-block:: console

         sima-user@sima-user-machine:~$ sima-cli sdk mpk
         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli$ cd resnet50
         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50$ mpk project create --model-path ./resnet_50.tar.gz --src-plugin simaaipciesrc --input-width 224 --input-height 224
         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50$ cd resnet_50_simaaipciesrc

      - Create an MPK by using ``mpk create``.

      .. code-block:: console

         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50/resnet_50_simaaipciesrc$ mpk create -s . -d . --build-target {yocto,elx} --boardtype {mlsoc,modalix} --clean

   .. tab:: Create a Host Side Application

      Once the ResNet50 MPK is created, the pipeline is deployed and run on the DevKit. Executing the MPK includes the following tasks:

      #. Read the image → pre-process (resize, normalize) → MLSoC CPP sync inference → post-process (argmax) → overlay → save the output.
      #. Save the output images in the output folder.
      #. Compile the example application on the host PC, outside the Docker SDK.

      You can download the ResNet50 host application: :download:`resnet50_pcie_application.tar.xz `.

      .. note::

         The following commands are executed on the host side. To avoid permission conflicts with the Palette environment, it is recommended to create a separate project folder dedicated to the host application.
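      The downloaded archive ships its own ``CMakeLists.txt``. For orientation only, a build script for an application like this typically looks like the sketch below; the SiMa host runtime library name ``sima_pcie_host_api`` is a placeholder assumption, so prefer the file from the archive:

      .. code-block:: cmake

         cmake_minimum_required(VERSION 3.10)
         project(test_img CXX)

         set(CMAKE_CXX_STANDARD 17)

         # OpenCV provides image I/O and preprocessing
         # (found at /usr/local in the build log on this page).
         find_package(OpenCV REQUIRED)

         add_executable(test_img main.cpp)
         target_include_directories(test_img PRIVATE ${OpenCV_INCLUDE_DIRS})

         # "sima_pcie_host_api" is a placeholder for the SiMa PCIe host runtime
         # library; use the name referenced by the CMakeLists.txt in the archive.
         target_link_libraries(test_img ${OpenCV_LIBS} sima_pcie_host_api)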
      .. code-block:: console

         sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf test_images.tar.xz
         sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf resnet50_pcie_application.tar.xz
         CMakeLists.txt
         imagenet1000_clsidx_to_labels.txt
         main.cpp
         sima-user@sima-user-machine:~/workspace/hostapp$ mkdir build && cd build
         sima-user@sima-user-machine:~/workspace/hostapp/build$ cmake ../
         -- The C compiler identification is GNU 11.4.0
         -- The CXX compiler identification is GNU 11.4.0
         -- Detecting C compiler ABI info
         -- Detecting C compiler ABI info - done
         -- Check for working C compiler: /usr/bin/cc - skipped
         -- Detecting C compile features
         -- Detecting C compile features - done
         -- Detecting CXX compiler ABI info
         -- Detecting CXX compiler ABI info - done
         -- Check for working CXX compiler: /usr/bin/c++ - skipped
         -- Detecting CXX compile features
         -- Detecting CXX compile features - done
         -- Found OpenCV: /usr/local (found version "4.6.0")
         -- Configuring done
         -- Generating done
         -- Build files have been written to: /home/sima-user/workspace/hostapp/build
         sima-user@sima-user-machine:~/workspace/hostapp/build$ make
         [ 50%] Building CXX object CMakeFiles/test_img.dir/main.cpp.o
         [100%] Linking CXX executable test_img
         [100%] Built target test_img

   .. tab:: Execute the Host Side Application
      .. code-block:: console

         sima-user@sima-user-machine:~/workspace/hostapp/build$ ./test_img ../../resnet50/resnet_50_simaaipciesrc/project.mpk
         Directory created or already exists: ./../output
         SiMaDevicePtr for GUID : sima_mla_c0 is : 0x5627ab5bf9b0
         loadModel is successful with modelPtr: 0x5627ab5c8b00
         runInferenceSynchronousloadModel_cppsdkpipeline
         modelPtr->inputShape.size:1
         modelPtr->inputShape: 224 224 3
         modelPtr->outputShape.size:1
         modelPtr->outputShape: 1 1 1000
         total Images:7
         ./../test_images/000000009448.jpg
         starting the run Synchronous Inference
         Time taken per iteration: 378 milliseconds
         Predicted label: 879: 'umbrella',
         Image with predicted label saved to: ./../output/000000009448.jpg
         ./../test_images/000000007784.jpg
         starting the run Synchronous Inference
         Time taken per iteration: 4 milliseconds
         Predicted label: 701: 'parachute, chute',
         Image with predicted label saved to: ./../output/000000007784.jpg
         ...

   .. tab:: Inside The Host Side App

      The application workflow includes the following steps:

      - Enumerating available SiMa devices
      - Loading and running inference models
      - Preprocessing images for inference
      - Handling the inference results and printing them

      .. dropdown:: Device Initialization
         :animate: fade-in
         :color: secondary
         :open:

         Before performing inference, the application initializes and `enumerates <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApi20enumerateDeviceGuidsEv>`_ the available SiMa MLSoC devices using the `SimaMlsocApi <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApiE>`_ interface:

         .. code:: CPP

            // getInstance() returns a singleton handle to the host-side API.
            shared_ptr<simaai::SimaMlsocApi> simaDeviceInst = simaai::SimaMlsocApi::getInstance();
            vector<std::string> guids = simaDeviceInst->enumerateDeviceGuids();
            simaDeviceInst->setLogVerbosity(simaai::SiMaLogLevel::debug);

         For each device found, the application attempts to open the device by trying multiple GUID format variations (raw, trimmed, with ``/dev/`` prefix, and short-form suffixes) until one succeeds:
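         Building this variation list is plain string manipulation and can be exercised without any hardware. The sketch below mirrors the same logic in a testable helper; ``buildGuidVariations`` is an illustrative name, not an SDK function:

         .. code:: CPP

            #include <string>
            #include <vector>

            // Candidate device-name variations tried by the host app:
            // raw, whitespace-trimmed, /dev/-prefixed, and short-form suffixes.
            std::vector<std::string> buildGuidVariations(const std::string& guid_raw) {
                std::string trimmed = guid_raw;
                trimmed.erase(trimmed.find_last_not_of(" \n\r\t") + 1);

                std::vector<std::string> variations = {guid_raw, trimmed, "/dev/" + trimmed};
                if (trimmed.find("sima_mla_") == 0)
                    variations.push_back(trimmed.substr(9));   // e.g. "c0" from "sima_mla_c0"
                if (trimmed.find("sima_mla_c") == 0)
                    variations.push_back(trimmed.substr(10));  // e.g. "0" from "sima_mla_c0"
                return variations;
            }

         For ``"sima_mla_c0\n"`` this yields five candidates, which the application then feeds to ``openDevice`` one by one: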
         .. code:: CPP

            std::string guid_raw = guids[i];
            std::string guid_trimmed = guid_raw;
            guid_trimmed.erase(guid_trimmed.find_last_not_of(" \n\r\t") + 1);

            std::vector<std::string> variations = {guid_raw, guid_trimmed, "/dev/" + guid_trimmed};
            if (guid_trimmed.find("sima_mla_") == 0)
                variations.push_back(guid_trimmed.substr(9));
            if (guid_trimmed.find("sima_mla_c") == 0)
                variations.push_back(guid_trimmed.substr(10));

            // Device handle type as declared in the SDK headers.
            shared_ptr<simaai::SiMaDevice> SiMaDevicePtr = nullptr;
            std::string guid;
            for (const auto& var : variations) {
                SiMaDevicePtr = simaDeviceInst->openDevice(var);
                if (SiMaDevicePtr) {
                    guid = var;
                    break;
                }
            }

      .. dropdown:: Model Loading
         :animate: fade-in
         :color: secondary
         :open:

         Once a device is initialized, the application loads a pre-trained model onto the MLSoC using a ``SiMaBundle`` that describes the model's input and output tensor shapes:

         .. code:: CPP

            simaai::SiMaBundle model;
            std::vector<int> in_shape{224, 224, 3};
            std::vector<int> out_shape{1, 1, 1000};
            model.numInputTensors = 1;
            model.numOutputTensors = 1;
            model.outputBatchSize = 1;
            model.inputBatchSize = 1;
            model.inputShape.emplace_back(in_shape);
            model.outputShape.emplace_back(out_shape);

            shared_ptr<simaai::SiMaModel> modelPtr = nullptr;
            modelPtr = simaDeviceInst->load(SiMaDevicePtr, model_path, model);

      .. dropdown:: Image Preprocessing
         :animate: fade-in
         :color: secondary
         :open:

         As the ResNet50 model expects a 224x224 input size, images are preprocessed before inference, including resizing, normalization, and channel reordering:
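         The aspect-preserving resize picks the smaller of the two scale ratios and pads the remaining border. That arithmetic can be checked in isolation; ``letterboxDims`` is a hypothetical helper written for illustration:

         .. code:: CPP

            #include <utility>

            // For an in_w x in_h input and a square target, return the resized
            // {width, height} before padding, using the smaller scale ratio.
            std::pair<int, int> letterboxDims(int in_w, int in_h, int target = 224) {
                double width_ratio  = static_cast<double>(target) / in_w;
                double height_ratio = static_cast<double>(target) / in_h;
                if (width_ratio < height_ratio)
                    return {target, static_cast<int>(in_h * width_ratio)};  // pad top/bottom
                return {static_cast<int>(in_w * height_ratio), target};     // pad left/right
            }

         A 448x224 image, for example, resizes to 224x112 and then receives 112 rows of vertical padding. The full routine: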
         .. code:: CPP

            cv::Mat preprocess(const cv::Mat& input_image) {
                cv::Size target_size(224, 224);
                double width_ratio = static_cast<double>(target_size.width) / input_image.cols;
                double height_ratio = static_cast<double>(target_size.height) / input_image.rows;

                // Resize along the limiting dimension, then pad to 224x224.
                cv::Mat resized_image;
                if (width_ratio < height_ratio) {
                    int new_height = static_cast<int>(input_image.rows * width_ratio);
                    cv::resize(input_image, resized_image, cv::Size(target_size.width, new_height));
                    int top_padding = (target_size.height - new_height) / 2;
                    int bottom_padding = target_size.height - new_height - top_padding;
                    cv::copyMakeBorder(resized_image, resized_image, top_padding, bottom_padding, 0, 0,
                                       cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
                } else {
                    int new_width = static_cast<int>(input_image.cols * height_ratio);
                    cv::resize(input_image, resized_image, cv::Size(new_width, target_size.height));
                    int left_padding = (target_size.width - new_width) / 2;
                    int right_padding = target_size.width - new_width - left_padding;
                    cv::copyMakeBorder(resized_image, resized_image, 0, 0, left_padding, right_padding,
                                       cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
                }

                // Scale to [0,1], normalize with ImageNet mean/std, and convert BGR to RGB.
                resized_image.convertTo(resized_image, CV_32FC3, 1.0 / 255.0);
                cv::Scalar mean(0.485, 0.456, 0.406);
                cv::Scalar std_dev(0.229, 0.224, 0.225);
                cv::subtract(resized_image, mean, resized_image);
                cv::divide(resized_image, std_dev, resized_image);
                cv::cvtColor(resized_image, resized_image, cv::COLOR_BGR2RGB);
                return resized_image;
            }

      .. dropdown:: Inference Execution
         :animate: fade-in
         :color: secondary
         :open:

         The preprocessed image is loaded into an input tensor, and inference is executed `synchronously <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApi14runSynchronousEK10shared_ptrI9SiMaModelERK14SiMaTensorListRK12SiMaMetaDataRK14SiMaTensorList>`_. Execution time is measured per iteration using ``std::chrono``:
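         The copy into the input tensor is only safe when the host buffer and tensor sizes agree; for a 224x224x3 float32 input that is 602,112 bytes. A standalone arithmetic check (``tensorBytes`` is an illustrative helper, not an SDK call):

         .. code:: CPP

            #include <cstddef>
            #include <vector>

            // Byte size of a dense tensor: product of the shape dimensions
            // times the element size (4 bytes for float32).
            std::size_t tensorBytes(const std::vector<int>& shape, std::size_t elem_size = 4) {
                std::size_t bytes = elem_size;
                for (int d : shape) bytes *= static_cast<std::size_t>(d);
                return bytes;
            }

         The result should match ``inputTensorsList[0].getSizeInBytes()`` for the 224x224x3 input, and the 1x1x1000 float32 output works out to 4,000 bytes.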
         .. code:: CPP

            memcpy(inputTensorsList[0].getPtr().get(), preprocessed_image.data,
                   inputTensorsList[0].getSizeInBytes());

            auto start = std::chrono::high_resolution_clock::now();
            simaai::SiMaErrorCode ret = simaDeviceInst->runSynchronous(modelPtr, inputTensorsList,
                                                                       metaData, outputTensorsList);
            auto end = std::chrono::high_resolution_clock::now();
            if (ret != simaai::success)
                cout << "runInference Failure" << endl;

            auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
            std::cout << "Time taken per iteration: " << duration_ms.count() << " milliseconds" << std::endl;

      .. dropdown:: Post-processing and Result Handling
         :animate: fade-in
         :color: secondary
         :open:

         The application extracts the classification result from the inference output tensor and saves the image with the predicted label:

         .. code:: CPP

            cv::Mat output(1, 1000, CV_32FC1, (char*)outputTensorsList[0].getPtr().get());
            double max_val;
            cv::Point max_loc;
            cv::minMaxLoc(output, nullptr, &max_val, nullptr, &max_loc);
            int predicted_val = max_loc.x;
            if (predicted_val >= 0 && predicted_val < static_cast<int>(labels.size())) {
                std::string predicted_label = labels[predicted_val];
                writeTextToImage(image, image_path, predicted_label, outputFolder);
            }

      .. dropdown:: Model and Device Cleanup
         :animate: fade-in
         :color: secondary
         :open:

         After inference is complete, the model is unloaded and the device is disconnected. Both calls check the returned error code:

         .. code:: CPP

            ret = simaDeviceInst->unload(modelPtr);
            if (ret != simaai::success)
                cout << "unloadModel failed for loaded modelPtr " << modelPtr.get() << endl;

            ret = simaDeviceInst->closeDevice(SiMaDevicePtr);
            if (ret != simaai::success)
                cout << "closeDevice() failed for GUID " << guids[i] << endl;
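      As an aside, the argmax in the post-processing step does not require OpenCV; a plain loop over the 1000 class scores yields the same index as ``cv::minMaxLoc`` (standalone sketch):

      .. code:: CPP

         #include <cstddef>
         #include <vector>

         // Index of the maximum score, matching cv::minMaxLoc's max location
         // on a 1x1000 row vector. Returns -1 for an empty input.
         int argmax(const std::vector<float>& scores) {
             if (scores.empty()) return -1;
             std::size_t best = 0;
             for (std::size_t i = 1; i < scores.size(); ++i)
                 if (scores[i] > scores[best]) best = i;
             return static_cast<int>(best);
         }

      An index such as 879 then maps to ``879: 'umbrella'`` via ``imagenet1000_clsidx_to_labels.txt``, exactly as in the execution output above.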