.. _build_host_app_with_cpp:

Build Host App With C++ API
###########################

In PCIe mode, the SiMa.ai MLSoC can be paired with a host system through PCIe, and the host CPU can offload portions of the ML application to the MLSoC. The APIs are integrated into the host C++ application, which then communicates with the MLSoC through PCIe.

.. note::

   In PCIe mode, you can currently use the Machine Learning Accelerator (MLA) to run inference tasks (``quant`` -> ``NN Model`` -> ``dequant``). Additionally, with our support, you can manually generate the MPK in the SDK to enable any valid GStreamer PCIe pipeline. In a future release, this mode will expand to include access to all hardware blocks, such as video codecs, enabling pre- and post-processing operations directly on the MLSoC. This enhancement is part of our ongoing roadmap.

Follow the instructions below to build a sample application that uses the ResNet50 model to classify images.

.. tabs::

   .. tab:: Prerequisites

      - Follow this :ref:`instruction ` to set up the development system in PCIe mode.
      - Download the `test image dataset `_. This will be used by the host-side application.
      - Download the optimized `ResNet50 model `.

      .. code-block:: console

         sima-user@sima-user-machine:~$ mkdir -p ~/workspace/resnet50
         sima-user@sima-user-machine:~$ mkdir -p ~/workspace/hostapp
         sima-user@sima-user-machine:~$ cp resnet50_mpk.tar.gz ~/workspace/resnet50/
         sima-user@sima-user-machine:~$ cp test_images.tar.gz ~/workspace/hostapp/

   .. tab:: Create an MPK

      .. note::

         The following commands are executed inside the Palette ``mpk`` container environment.

      - Create an mpk project directory with ``simaaipciesrc`` by using ``mpk project create``.
      .. code-block:: console

         sima-user@sima-user-machine:~$ sima-cli sdk mpk
         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli$ cd resnet50
         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50$ mpk project create --model-path ./resnet_50.tar.gz --src-plugin simaaipciesrc --input-width 224 --input-height 224
         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50$ cd resnet_50_simaaipciesrc

      - Create an MPK by using ``mpk create``.

      .. code-block:: console

         sima-user@dp-cli-mpk_cli_toolset-2:/home/docker/sima-cli/resnet50/resnet_50_simaaipciesrc$ mpk create -s . -d . --build-target {yocto,elx} --boardtype {mlsoc,modalix} --clean

   .. tab:: Create a Host Side Application

      Once the ResNet50 MPK is created, the pipeline is deployed and run on the DevKit. Executing the MPK includes the following tasks:

      #. Read the image → pre-process (resize, normalize) → MLSoC CPP sync inference → post-process (argmax) → overlay → save the output.
      #. Save the output images in the output folder.
      #. Compile the example application on the host PC, outside the Docker SDK.

      You can download the ResNet50 host application: :download:`resnet50_pcie_application.tar.xz `.

      .. note::

         The following commands are executed on the host side. To avoid permission conflicts with the Palette environment, it is recommended to create a separate project folder dedicated to the host application.
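      The downloaded archive ships its own ``CMakeLists.txt``. For orientation only, a build script for an application like this typically looks like the sketch below; the SiMa host runtime library name ``sima_pcie_host_api`` is a placeholder assumption, so prefer the file from the archive:

      .. code-block:: cmake

         cmake_minimum_required(VERSION 3.10)
         project(test_img CXX)

         set(CMAKE_CXX_STANDARD 17)

         # OpenCV provides image I/O and preprocessing
         # (found at /usr/local in the build log on this page).
         find_package(OpenCV REQUIRED)

         add_executable(test_img main.cpp)
         target_include_directories(test_img PRIVATE ${OpenCV_INCLUDE_DIRS})

         # "sima_pcie_host_api" is a placeholder for the SiMa PCIe host runtime
         # library; use the name referenced by the CMakeLists.txt in the archive.
         target_link_libraries(test_img ${OpenCV_LIBS} sima_pcie_host_api)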
      .. code-block:: console

         sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf test_images.tar.xz
         sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf resnet50_pcie_application.tar.xz
         CMakeLists.txt
         imagenet1000_clsidx_to_labels.txt
         main.cpp
         sima-user@sima-user-machine:~/workspace/hostapp$ mkdir build && cd build
         sima-user@sima-user-machine:~/workspace/hostapp/build$ cmake ../
         -- The C compiler identification is GNU 11.4.0
         -- The CXX compiler identification is GNU 11.4.0
         -- Detecting C compiler ABI info
         -- Detecting C compiler ABI info - done
         -- Check for working C compiler: /usr/bin/cc - skipped
         -- Detecting C compile features
         -- Detecting C compile features - done
         -- Detecting CXX compiler ABI info
         -- Detecting CXX compiler ABI info - done
         -- Check for working CXX compiler: /usr/bin/c++ - skipped
         -- Detecting CXX compile features
         -- Detecting CXX compile features - done
         -- Found OpenCV: /usr/local (found version "4.6.0")
         -- Configuring done
         -- Generating done
         -- Build files have been written to: /home/sima-user/workspace/hostapp/build
         sima-user@sima-user-machine:~/workspace/hostapp/build$ make
         [ 50%] Building CXX object CMakeFiles/test_img.dir/main.cpp.o
         [100%] Linking CXX executable test_img
         [100%] Built target test_img

   .. tab:: Execute the Host Side Application
      .. code-block:: console

         sima-user@sima-user-machine:~/workspace/hostapp/build$ ./test_img ../../resnet50/resnet_50_simaaipciesrc/project.mpk
         Directory created or already exists: ./../output
         SiMaDevicePtr for GUID : sima_mla_c0 is : 0x5627ab5bf9b0
         loadModel is successful with modelPtr: 0x5627ab5c8b00
         runInferenceSynchronousloadModel_cppsdkpipeline
         modelPtr->inputShape.size:1
         modelPtr->inputShape: 224 224 3
         modelPtr->outputShape.size:1
         modelPtr->outputShape: 1 1 1000
         total Images:7
         ./../test_images/000000009448.jpg
         starting the run Synchronous Inference
         Time taken per iteration: 378 milliseconds
         Predicted label: 879: 'umbrella',
         Image with predicted label saved to: ./../output/000000009448.jpg
         ./../test_images/000000007784.jpg
         starting the run Synchronous Inference
         Time taken per iteration: 4 milliseconds
         Predicted label: 701: 'parachute, chute',
         Image with predicted label saved to: ./../output/000000007784.jpg
         ...

   .. tab:: Inside The Host Side App

      The application workflow includes the following steps:

      - Enumerating available SiMa devices
      - Loading and running inference models
      - Preprocessing images for inference
      - Handling the inference results and printing them

      .. dropdown:: Device Initialization
         :animate: fade-in
         :color: secondary
         :open:

         Before performing inference, the application initializes and `enumerates <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApi20enumerateDeviceGuidsEv>`_ the available SiMa MLSoC devices using the `SimaMlsocApi <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApiE>`_ interface:

         .. code:: CPP

            // getInstance() returns a singleton handle to the host-side API.
            shared_ptr<simaai::SimaMlsocApi> simaDeviceInst = simaai::SimaMlsocApi::getInstance();
            vector<std::string> guids = simaDeviceInst->enumerateDeviceGuids();
            simaDeviceInst->setLogVerbosity(simaai::SiMaLogLevel::debug);

         For each device found, the application attempts to open the device by trying multiple GUID format variations (raw, trimmed, with ``/dev/`` prefix, and short-form suffixes) until one succeeds:
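         Building this variation list is plain string manipulation and can be exercised without any hardware. The sketch below mirrors the same logic in a testable helper; ``buildGuidVariations`` is an illustrative name, not an SDK function:

         .. code:: CPP

            #include <string>
            #include <vector>

            // Candidate device-name variations tried by the host app:
            // raw, whitespace-trimmed, /dev/-prefixed, and short-form suffixes.
            std::vector<std::string> buildGuidVariations(const std::string& guid_raw) {
                std::string trimmed = guid_raw;
                trimmed.erase(trimmed.find_last_not_of(" \n\r\t") + 1);

                std::vector<std::string> variations = {guid_raw, trimmed, "/dev/" + trimmed};
                if (trimmed.find("sima_mla_") == 0)
                    variations.push_back(trimmed.substr(9));   // e.g. "c0" from "sima_mla_c0"
                if (trimmed.find("sima_mla_c") == 0)
                    variations.push_back(trimmed.substr(10));  // e.g. "0" from "sima_mla_c0"
                return variations;
            }

         For ``"sima_mla_c0\n"`` this yields five candidates, which the application then feeds to ``openDevice`` one by one: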
         .. code:: CPP

            std::string guid_raw = guids[i];
            std::string guid_trimmed = guid_raw;
            guid_trimmed.erase(guid_trimmed.find_last_not_of(" \n\r\t") + 1);

            std::vector<std::string> variations = {guid_raw, guid_trimmed, "/dev/" + guid_trimmed};
            if (guid_trimmed.find("sima_mla_") == 0)
                variations.push_back(guid_trimmed.substr(9));
            if (guid_trimmed.find("sima_mla_c") == 0)
                variations.push_back(guid_trimmed.substr(10));

            // Device handle type as declared in the SDK headers.
            shared_ptr<simaai::SiMaDevice> SiMaDevicePtr = nullptr;
            std::string guid;
            for (const auto& var : variations) {
                SiMaDevicePtr = simaDeviceInst->openDevice(var);
                if (SiMaDevicePtr) {
                    guid = var;
                    break;
                }
            }

      .. dropdown:: Model Loading
         :animate: fade-in
         :color: secondary
         :open:

         Once a device is initialized, the application loads a pre-trained model onto the MLSoC using a ``SiMaBundle`` that describes the model's input and output tensor shapes:

         .. code:: CPP

            simaai::SiMaBundle model;
            std::vector<int> in_shape{224, 224, 3};
            std::vector<int> out_shape{1, 1, 1000};
            model.numInputTensors = 1;
            model.numOutputTensors = 1;
            model.outputBatchSize = 1;
            model.inputBatchSize = 1;
            model.inputShape.emplace_back(in_shape);
            model.outputShape.emplace_back(out_shape);

            shared_ptr<simaai::SiMaModel> modelPtr = nullptr;
            modelPtr = simaDeviceInst->load(SiMaDevicePtr, model_path, model);

      .. dropdown:: Image Preprocessing
         :animate: fade-in
         :color: secondary
         :open:

         As the ResNet50 model expects a 224x224 input size, images are preprocessed before inference, including resizing, normalization, and channel reordering:
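         The aspect-preserving resize picks the smaller of the two scale ratios and pads the remaining border. That arithmetic can be checked in isolation; ``letterboxDims`` is a hypothetical helper written for illustration:

         .. code:: CPP

            #include <utility>

            // For an in_w x in_h input and a square target, return the resized
            // {width, height} before padding, using the smaller scale ratio.
            std::pair<int, int> letterboxDims(int in_w, int in_h, int target = 224) {
                double width_ratio  = static_cast<double>(target) / in_w;
                double height_ratio = static_cast<double>(target) / in_h;
                if (width_ratio < height_ratio)
                    return {target, static_cast<int>(in_h * width_ratio)};  // pad top/bottom
                return {static_cast<int>(in_w * height_ratio), target};     // pad left/right
            }

         A 448x224 image, for example, resizes to 224x112 and then receives 112 rows of vertical padding. The full routine: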
         .. code:: CPP

            cv::Mat preprocess(const cv::Mat& input_image) {
                cv::Size target_size(224, 224);
                double width_ratio = static_cast<double>(target_size.width) / input_image.cols;
                double height_ratio = static_cast<double>(target_size.height) / input_image.rows;

                // Resize along the limiting dimension, then pad to 224x224.
                cv::Mat resized_image;
                if (width_ratio < height_ratio) {
                    int new_height = static_cast<int>(input_image.rows * width_ratio);
                    cv::resize(input_image, resized_image, cv::Size(target_size.width, new_height));
                    int top_padding = (target_size.height - new_height) / 2;
                    int bottom_padding = target_size.height - new_height - top_padding;
                    cv::copyMakeBorder(resized_image, resized_image, top_padding, bottom_padding, 0, 0,
                                       cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
                } else {
                    int new_width = static_cast<int>(input_image.cols * height_ratio);
                    cv::resize(input_image, resized_image, cv::Size(new_width, target_size.height));
                    int left_padding = (target_size.width - new_width) / 2;
                    int right_padding = target_size.width - new_width - left_padding;
                    cv::copyMakeBorder(resized_image, resized_image, 0, 0, left_padding, right_padding,
                                       cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
                }

                // Scale to [0,1], normalize with ImageNet mean/std, and convert BGR to RGB.
                resized_image.convertTo(resized_image, CV_32FC3, 1.0 / 255.0);
                cv::Scalar mean(0.485, 0.456, 0.406);
                cv::Scalar std_dev(0.229, 0.224, 0.225);
                cv::subtract(resized_image, mean, resized_image);
                cv::divide(resized_image, std_dev, resized_image);
                cv::cvtColor(resized_image, resized_image, cv::COLOR_BGR2RGB);
                return resized_image;
            }

      .. dropdown:: Inference Execution
         :animate: fade-in
         :color: secondary
         :open:

         The preprocessed image is loaded into an input tensor, and inference is executed `synchronously <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApi14runSynchronousEK10shared_ptrI9SiMaModelERK14SiMaTensorListRK12SiMaMetaDataRK14SiMaTensorList>`_. Execution time is measured per iteration using ``std::chrono``:
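         The copy into the input tensor is only safe when the host buffer and tensor sizes agree; for a 224x224x3 float32 input that is 602,112 bytes. A standalone arithmetic check (``tensorBytes`` is an illustrative helper, not an SDK call):

         .. code:: CPP

            #include <cstddef>
            #include <vector>

            // Byte size of a dense tensor: product of the shape dimensions
            // times the element size (4 bytes for float32).
            std::size_t tensorBytes(const std::vector<int>& shape, std::size_t elem_size = 4) {
                std::size_t bytes = elem_size;
                for (int d : shape) bytes *= static_cast<std::size_t>(d);
                return bytes;
            }

         The result should match ``inputTensorsList[0].getSizeInBytes()`` for the 224x224x3 input, and the 1x1x1000 float32 output works out to 4,000 bytes.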
         .. code:: CPP

            memcpy(inputTensorsList[0].getPtr().get(), preprocessed_image.data,
                   inputTensorsList[0].getSizeInBytes());

            auto start = std::chrono::high_resolution_clock::now();
            simaai::SiMaErrorCode ret = simaDeviceInst->runSynchronous(modelPtr, inputTensorsList,
                                                                       metaData, outputTensorsList);
            auto end = std::chrono::high_resolution_clock::now();
            if (ret != simaai::success)
                cout << "runInference Failure" << endl;

            auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
            std::cout << "Time taken per iteration: " << duration_ms.count() << " milliseconds" << std::endl;

      .. dropdown:: Post-processing and Result Handling
         :animate: fade-in
         :color: secondary
         :open:

         The application extracts the classification result from the inference output tensor and saves the image with the predicted label:

         .. code:: CPP

            cv::Mat output(1, 1000, CV_32FC1, (char*)outputTensorsList[0].getPtr().get());
            double max_val;
            cv::Point max_loc;
            cv::minMaxLoc(output, nullptr, &max_val, nullptr, &max_loc);
            int predicted_val = max_loc.x;
            if (predicted_val >= 0 && predicted_val < static_cast<int>(labels.size())) {
                std::string predicted_label = labels[predicted_val];
                writeTextToImage(image, image_path, predicted_label, outputFolder);
            }

      .. dropdown:: Model and Device Cleanup
         :animate: fade-in
         :color: secondary
         :open:

         After inference is complete, the model is unloaded and the device is disconnected. Both calls check the returned error code:

         .. code:: CPP

            ret = simaDeviceInst->unload(modelPtr);
            if (ret != simaai::success)
                cout << "unloadModel failed for loaded modelPtr " << modelPtr.get() << endl;

            ret = simaDeviceInst->closeDevice(SiMaDevicePtr);
            if (ret != simaai::success)
                cout << "closeDevice() failed for GUID " << guids[i] << endl;
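      As an aside, the argmax in the post-processing step does not require OpenCV; a plain loop over the 1000 class scores yields the same index as ``cv::minMaxLoc`` (standalone sketch):

      .. code:: CPP

         #include <cstddef>
         #include <vector>

         // Index of the maximum score, matching cv::minMaxLoc's max location
         // on a 1x1000 row vector. Returns -1 for an empty input.
         int argmax(const std::vector<float>& scores) {
             if (scores.empty()) return -1;
             std::size_t best = 0;
             for (std::size_t i = 1; i < scores.size(); ++i)
                 if (scores[i] > scores[best]) best = i;
             return static_cast<int>(best);
         }

      An index such as 879 then maps to ``879: 'umbrella'`` via ``imagenet1000_clsidx_to_labels.txt``, exactly as in the execution output above.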