.. _build_host_app_with_cpp:

Build Host App With C++ API
###########################

In PCIe mode, the SiMa.ai MLSoC is paired with a host system over PCIe, and the host 
CPU can offload portions of an ML application to the MLSoC. The C++ APIs are integrated into the 
host application, which then communicates with the MLSoC over PCIe.

.. note::

    In PCIe mode, you can currently use the Machine Learning Accelerator (MLA) to run inference tasks (``quant`` -> ``NN Model`` -> ``dequant``). 
    Additionally, with our support, you can manually generate the MPK in the SDK to enable any valid GStreamer PCIe pipeline. 
    
    In a future release, this mode will expand to include access to all hardware blocks, such as video codecs, enabling pre- and post-processing operations directly on the MLSoC. 
    This enhancement is part of our ongoing roadmap.

Follow the instructions below to build a sample application that uses the ResNet50 model to classify images.

.. tabs::
         
   .. tab:: Prerequisites

      - Follow :ref:`these instructions <Setup_PCIe_Mode>` to set up the development system in PCIe mode.
      - Download the `test image dataset <https://docs.sima.ai/assets/test_images.tar.xz>`_. This will be used by the host-side application.
      - Download the optimized `ResNet50 model <https://docs.sima.ai/assets/resnet50_mpk.tar.gz>`_ and make it available to the Palette container environment. To learn how to optimize a standard ResNet50 ONNX model, refer to :ref:`this guide <optimize_model>`.

         .. code-block:: console

            sima-user@sima-user-machine:~$ mkdir -p ~/workspace/resnet50
            sima-user@sima-user-machine:~$ mkdir -p ~/workspace/hostapp
            sima-user@sima-user-machine:~$ cp resnet50_mpk.tar.gz ~/workspace/resnet50/ 
            sima-user@sima-user-machine:~$ cp test_images.tar.xz ~/workspace/hostapp/ 


   .. tab:: Create an MPK

      .. note:: 

         The following commands are executed inside the Palette container environment.


      #. Optionally, depending on the input image resolution, edit the ``user_cfg.json`` file to modify the parameters shown below. The file is located at ``/usr/local/simaai/utils/mpk_parser/user_cfg.json``.
   
         .. code-block:: console

            sima-user@docker-image-id:/home/docker/sima-cli/resnet50$ vi /usr/local/simaai/utils/mpk_parser/user_cfg.json
            {
               "img_width": 1920,
               "img_height": 1080,
               "input_width": 1920,
               "input_height": 1080,
               "output_width": 224,
               "output_height": 224,
               "input_depth": 3,
               "keep_aspect": 0,
               "norm_channel_params": [[1.0, 0, 0.003921569], [1.0, 0, 0.003921569], [1.0, 0, 0.003921569]],
               "normalize": 1,
               "input_type": "RGB",
               "output_type": "BGR",
               "scaling_type": "INTER_LINEAR",
               "padding_type": "CENTER",
               "ibufname": "input"
            }

      #. Using the MPK parser tool, create an MPK file from the downloaded model ``resnet50_mpk.tar.gz``. Run the commands below inside the Palette container environment.

         .. code-block:: console

            sima-user@docker-image-id:/home/docker/sima-cli/resnet50$ mkdir -p /tmp/resnet50_mpk && tar -xzf resnet50_mpk.tar.gz -C /tmp/resnet50_mpk && rm /tmp/resnet50_mpk/preproc.json /tmp/resnet50_mpk/mla.json /tmp/resnet50_mpk/detess_dequant.json && tar -czf resnet50_mpk.tar.gz -C /tmp/resnet50_mpk . && rm -r /tmp/resnet50_mpk
            sima-user@docker-image-id:/home/docker/sima-cli/resnet50$ python3 /usr/local/simaai/utils/mpk_parser/m_parser.py -targz resnet50_mpk.tar.gz  -project /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline -cfg /usr/local/simaai/utils/mpk_parser/user_cfg.json
            File preproc.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/pre_process/cfg
            File postproc.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/post_process/cfg
            File mla.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/process_mla/cfg
            File ./sima_temp/resnet50_stage1_mla.lm copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/plugins/process_mla/res/process.lm
            File ./sima_temp/resnet50_mpk.json copied to /usr/local/simaai/app_zoo/Gstreamer/CPP_API_TestPipeline/resources/mpk.json
            ℹ Compiling a65-apps...
            ✔ a65-apps compiled successfully.
            ℹ Compiling Plugins...
            ✔ Plugins Compiled successfully.
            ℹ Copying Resources...
            ✔ Resources Copied successfully.
            ℹ Building Rpm...
            ✔ Rpm built successfully.
            ℹ Creating mpk file...
            ✔ Mpk file created successfully at /home/docker/sima-cli/resnet50/project.mpk .

            MPK Created successfully.

         The generated ``project.mpk`` file is used later by the host-side application, which deploys it to the DevKit.

   .. tab:: Create a Host Side Application

      Once the ResNet50 MPK is created, the pipeline is deployed and run on the DevKit. 
      The host-side example application performs the following tasks:

      #. Read the image → pre-process (resize, normalize) → MLSoC C++ synchronous inference → post-process (argmax) → overlay → save output.
      #. Save the output images in the output folder.

      Compile the example application on the host PC, outside the Docker SDK.
      You can download the ResNet50 example here: :download:`resnet50_pcie_application.tar.xz <https://docs.sima.ai/assets/resnet50_pcie_application.tar.xz>`.

      .. note:: 

         The following commands are executed on the host side. To avoid permission conflicts with the Palette environment, it is recommended to create a separate project folder dedicated to the host application.

      .. code-block:: console
         
         sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf test_images.tar.xz
         sima-user@sima-user-machine:~/workspace/hostapp$ tar -xvf resnet50_pcie_application.tar.xz
         CMakeLists.txt
         imagenet1000_clsidx_to_labels.txt
         main.cpp
         resnet50_project.mpk

         sima-user@sima-user-machine:~/workspace/hostapp$ mkdir build && cd build
         sima-user@sima-user-machine:~/workspace/hostapp/build$ cmake ../
         -- The C compiler identification is GNU 11.4.0
         -- The CXX compiler identification is GNU 11.4.0
         -- Detecting C compiler ABI info
         -- Detecting C compiler ABI info - done
         -- Check for working C compiler: /usr/bin/cc - skipped
         -- Detecting C compile features
         -- Detecting C compile features - done
         -- Detecting CXX compiler ABI info
         -- Detecting CXX compiler ABI info - done
         -- Check for working CXX compiler: /usr/bin/c++ - skipped
         -- Detecting CXX compile features
         -- Detecting CXX compile features - done
         -- Found OpenCV: /usr/local (found version "4.6.0") 
         -- Configuring done
         -- Generating done
         -- Build files have been written to: /home/sima-user/workspace/hostapp/build

         sima-user@sima-user-machine:~/workspace/hostapp/build$ make
         [ 50%] Building CXX object CMakeFiles/test_img.dir/main.cpp.o
         [100%] Linking CXX executable test_img
         [100%] Built target test_img

   .. tab:: Execute the Host Side Application

         .. code-block:: console

               sima-user@sima-user-machine:~/workspace/hostapp/build$ ./test_img ../../resnet50/project.mpk

               Directory created or already exists: ./../output
               SiMaDevicePtr for GUID : sima_mla_c0
               is : 0x5627ab5bf9b0
               
               sima_send_mgmt_file: File Name: ../../resnet50/project.mpk
               sima_send_mgmt_file: File size 20184712
               sima_send_mgmt_file: Total Bytes sent 20184712 in 1 seconds
               Opening /dev/sima_mla_c0
               si_mla_create_data_queues: Data completion queue successfully created
               si_mla_create_data_queues: Data work queue successfully created
               si_mla_create_data_queues: Data receive queue successfully created
               loadModel is successful with modelPtr: 0x5627ab5c8b00
               runInferenceSynchronousloadModel_cppsdkpipeline
               modelPtr->inputShape.size:1
               modelPtr->inputShape:
               224 224 3
               modelPtr->outputShape.size:1
               modelPtr->outputShape:
               1 1 1000
               total Images:7
               ./../test_images/000000009448.jpg
               starting the run Synchronous Inference
               Time taken per iteration: 378 milliseconds
               Predicted label: 879: 'umbrella',
               Image with predicted label saved to: ./../output/000000009448.jpg

               ./../test_images/000000007784.jpg
               starting the run Synchronous Inference
               Time taken per iteration: 4 milliseconds
               Predicted label: 701: 'parachute, chute',
               Image with predicted label saved to: ./../output/000000007784.jpg

               ... ... ... 
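
         In the log above, the first iteration is reported at 378 milliseconds and the second at 4 milliseconds. The log does not state the cause of the difference. When summarizing per-image latency, it may therefore be useful to report the first iteration separately. The following is a minimal sketch in plain C++; ``steadyStateMeanMs`` is a hypothetical helper and is not part of the example application or the SiMa API:

         .. code:: CPP

            #include <cstddef>
            #include <numeric>
            #include <vector>

            // Hypothetical helper: mean of the timing samples (in milliseconds)
            // after skipping the first `warmup` samples.
            // Returns 0.0 when no samples remain after skipping.
            double steadyStateMeanMs(const std::vector<double>& samples_ms,
                                     std::size_t warmup = 1) {
               if (samples_ms.size() <= warmup) {
                  return 0.0;
               }
               auto first = samples_ms.begin() + static_cast<std::ptrdiff_t>(warmup);
               double sum = std::accumulate(first, samples_ms.end(), 0.0);
               return sum / static_cast<double>(samples_ms.size() - warmup);
            }

         The samples could be collected with ``std::chrono::steady_clock`` around each inference call and passed to this helper once all images have been processed.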

   .. tab:: Inside The Host Side App

      The application workflow includes the following steps:

      - Enumerating available SiMa devices
      - Loading and running inference models
      - Preprocessing images for inference
      - Handling the inference results and printing them

      .. dropdown:: Device Initialization
         :animate: fade-in
         :color: secondary
         :open:                     

         Before performing inference, the application initializes and `enumerates <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApi20enumerateDeviceGuidsEv>`_ available SiMa MLSoC devices using the `SimaMlsocApi <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApiE>`_ interface:

         .. code:: CPP

            shared_ptr<simaai::SimaMlsocApi> simaDeviceInst = simaai::SimaMlsocApi::getInstance();
            vector<string> guids = simaDeviceInst->enumerateDeviceGuids();
            simaDeviceInst->setLogVerbosity(simaai::SiMaLogLevel::debug);

         For each device found, the application opens the device and prepares it for inference:

         .. code:: CPP

            shared_ptr<simaai::SiMaDevice>  SiMaDevicePtr = simaDeviceInst->openDevice(guids[i]);

      .. dropdown:: Model Loading
         :animate: fade-in
         :color: secondary
         :open:                     

         Once a device is initialized, the application loads a pre-trained model onto the MLSoC:

         .. code:: CPP

            simaai::SiMaModel model;
            std::vector<uint32_t> in_shape{224,224,3};
            std::vector<uint32_t> out_shape{1,1,1000};

            model.numInputTensors = 1;
            model.numOutputTensors = 1;
            model.outputBatchSize = 1;
            model.inputBatchSize = 1;
            model.inputShape.emplace_back(in_shape);
            model.outputShape.emplace_back(out_shape);

            shared_ptr<simaai::SiMaModel> modelPtr = simaDeviceInst->load(SiMaDevicePtr, model_path, model);
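
         The shapes above determine how much data each tensor holds. As a minimal sketch, assuming 32-bit float elements (the preprocessing step produces ``CV_32FC3`` data and the post-processing step reads the output as ``CV_32FC1``), the expected sizes can be computed as follows. ``elementCount`` and ``sizeInBytes`` are hypothetical helpers and are not part of the SiMa API:

         .. code:: CPP

            #include <cstddef>
            #include <cstdint>
            #include <vector>

            // Hypothetical helper: number of elements in a tensor, computed as the
            // product of its dimensions. An empty shape yields 1.
            std::size_t elementCount(const std::vector<std::uint32_t>& shape) {
               std::size_t count = 1;
               for (std::uint32_t dim : shape) {
                  count *= dim;
               }
               return count;
            }

            // Hypothetical helper: tensor size in bytes.
            // Assumption: every element is a 32-bit float.
            std::size_t sizeInBytes(const std::vector<std::uint32_t>& shape) {
               return elementCount(shape) * sizeof(float);
            }

         For the shapes used here, this gives 150528 elements (602112 bytes) for the ``{224, 224, 3}`` input and 1000 elements (4000 bytes) for the ``{1, 1, 1000}`` output, on platforms where ``float`` is 4 bytes.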


      .. dropdown:: Image Preprocessing
         :animate: fade-in
         :color: secondary
         :open:            

         Because the ResNet50 model expects a 224x224 input, images are preprocessed before inference. Preprocessing includes resizing, normalization, and channel reordering:

         .. code:: CPP

            cv::Mat preprocess(const cv::Mat& input_image) {
               cv::Size target_size(224, 224);
               double width_ratio = static_cast<double>(target_size.width) / input_image.cols;
               double height_ratio = static_cast<double>(target_size.height) / input_image.rows;
               cv::Mat resized_image;

               if (width_ratio < height_ratio) {
                  int new_height = static_cast<int>(input_image.rows * width_ratio);
                  cv::resize(input_image, resized_image, cv::Size(target_size.width, new_height));
                  int top_padding = (target_size.height - new_height) / 2;
                  int bottom_padding = target_size.height - new_height - top_padding;
                  cv::copyMakeBorder(resized_image, resized_image, top_padding, bottom_padding, 0, 0, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
               } else {
                  int new_width = static_cast<int>(input_image.cols * height_ratio);
                  cv::resize(input_image, resized_image, cv::Size(new_width, target_size.height));
                  int left_padding = (target_size.width - new_width) / 2;
                  int right_padding = target_size.width - new_width - left_padding;
                  cv::copyMakeBorder(resized_image, resized_image, 0, 0, left_padding, right_padding, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
               }
               
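               // Convert to 32-bit float and scale the values by 1/255.
               resized_image.convertTo(resized_image, CV_32FC3, 1.0 / 255.0);
               // Per-channel mean and standard deviation. Note that they are applied
               // before the BGR-to-RGB conversion below, that is, in the image's
               // pre-conversion channel order.
               cv::Scalar mean(0.485, 0.456, 0.406);
               cv::Scalar std_dev(0.229, 0.224, 0.225);
               cv::subtract(resized_image, mean, resized_image);
               cv::divide(resized_image, std_dev, resized_image);
               // Reorder the channels from BGR to RGB.
               cv::cvtColor(resized_image, resized_image, cv::COLOR_BGR2RGB);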
               
               return resized_image;
            }
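
         The resize-and-pad logic above can be checked without OpenCV. The following sketch reproduces only the geometry of ``preprocess`` (the resized content size and the padding on each side) in plain C++. ``Letterbox`` and ``computeLetterbox`` are hypothetical names introduced for illustration and do not appear in the example application:

         .. code:: CPP

            // Hypothetical plain-data result; not an OpenCV or SiMa type.
            struct Letterbox {
               int width;    // width of the resized image content
               int height;   // height of the resized image content
               int top;      // padding rows added above the content
               int bottom;   // padding rows added below the content
               int left;     // padding columns added left of the content
               int right;    // padding columns added right of the content
            };

            // Mirrors the branch structure of preprocess(): scale by the smaller of
            // the two ratios so the image fits, then split the remaining space
            // between the two sides, giving any odd pixel to the bottom or right.
            Letterbox computeLetterbox(int src_w, int src_h,
                                       int dst_w = 224, int dst_h = 224) {
               double width_ratio = static_cast<double>(dst_w) / src_w;
               double height_ratio = static_cast<double>(dst_h) / src_h;
               Letterbox box{dst_w, dst_h, 0, 0, 0, 0};

               if (width_ratio < height_ratio) {
                  // Width is the limiting dimension: pad top and bottom.
                  box.height = static_cast<int>(src_h * width_ratio);
                  box.top = (dst_h - box.height) / 2;
                  box.bottom = dst_h - box.height - box.top;
               } else {
                  // Height is the limiting dimension: pad left and right.
                  box.width = static_cast<int>(src_w * height_ratio);
                  box.left = (dst_w - box.width) / 2;
                  box.right = dst_w - box.width - box.left;
               }
               return box;
            }

         For example, an 896x444 input is scaled by 0.25 to 224x111 and padded with 56 rows on top and 57 rows at the bottom, so the result is always exactly 224x224.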

      .. dropdown:: Inference Execution
         :animate: fade-in
         :color: secondary
         :open:            

         The preprocessed image is loaded into an input tensor, and inference is executed `synchronously <../api_reference/pcie_host_apis/cpp_api_references.html#_CPPv4N6simaai12SimaMlsocApi14runSynchronousEK10shared_ptrI9SiMaModelERK14SiMaTensorListRK12SiMaMetaDataRK14SiMaTensorList>`_:

         .. code:: CPP

            memcpy(inputTensorsList[0].getPtr().get(), preprocessed_image.data, inputTensorsList[0].getSizeInBytes());

            simaai::SiMaErrorCode ret = simaDeviceInst->runSynchronous(modelPtr, inputTensorsList, metaData, outputTensorsList);

            if (ret != simaai::success) {
               cout << "runInference Failure" << endl;
            }
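
         The ``memcpy`` call above copies ``getSizeInBytes()`` bytes out of the image buffer, so it reads past the end of that buffer if the preprocessed image is smaller than the tensor. A defensive variant, sketched here in plain C++, copies only when both sizes agree. ``copyIfSizeMatches`` is a hypothetical helper and is not part of the SiMa API:

         .. code:: CPP

            #include <cstddef>
            #include <cstring>

            // Hypothetical helper: copies src into dst only when both buffers are
            // non-null and have exactly the same size. Returns true if it copied.
            bool copyIfSizeMatches(void* dst, std::size_t dst_bytes,
                                   const void* src, std::size_t src_bytes) {
               if (dst == nullptr || src == nullptr || dst_bytes != src_bytes) {
                  return false;
               }
               std::memcpy(dst, src, src_bytes);
               return true;
            }

         With OpenCV, the source size of a continuous ``cv::Mat`` can be obtained as ``preprocessed_image.total() * preprocessed_image.elemSize()``, and ``preprocessed_image.isContinuous()`` confirms that the pixel data is stored without gaps between rows.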

      .. dropdown:: Post-processing and Result Handling
         :animate: fade-in
         :color: secondary
         :open:                     

         The application extracts the classification result from the inference output tensor and saves the image with the predicted label:

         .. code:: CPP

            cv::Mat output(1, 1000, CV_32FC1, (char*)outputTensorsList[0].getPtr().get());                
            double max_val;
            cv::Point max_loc;
            cv::minMaxLoc(output, nullptr, &max_val, nullptr, &max_loc);
            int predicted_val = max_loc.x;

            if (predicted_val >= 0 && predicted_val < labels.size()) {
               std::string predicted_label = labels[predicted_val];
               writeTextToImage(image, image_path, predicted_label, outputFolder);
            } 
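
         The argmax step does not depend on OpenCV. As an alternative sketch, assuming the output buffer holds ``float`` scores, the same index can be found with the C++ standard library. ``argmax`` is a hypothetical helper introduced for illustration:

         .. code:: CPP

            #include <algorithm>
            #include <cstddef>
            #include <iterator>

            // Hypothetical helper: index of the largest score, or -1 when the
            // buffer is null or empty. If several scores share the maximum value,
            // the lowest index is returned.
            int argmax(const float* scores, std::size_t count) {
               if (scores == nullptr || count == 0) {
                  return -1;
               }
               const float* best = std::max_element(scores, scores + count);
               return static_cast<int>(std::distance(scores, best));
            }

         Called with a ``float`` pointer to the output tensor data and a count of 1000, the returned index can be used to look up the class name in ``labels`` in the same way as ``predicted_val`` above.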


      .. dropdown:: Model and Device Cleanup
         :animate: fade-in
         :color: secondary
         :open:                           

         After inference is complete, the model is unloaded, and the device is disconnected:

         .. code:: CPP

            simaDeviceInst->unload(modelPtr);
            simaDeviceInst->closeDevice(SiMaDevicePtr);
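
         If the application can return early or throw between loading the model and this cleanup code, the two calls above may be skipped. One way to make cleanup unconditional is a small scope guard, sketched here in plain C++. ``ScopeGuard`` is a hypothetical class and is not part of the SiMa API:

         .. code:: CPP

            #include <functional>
            #include <utility>

            // Hypothetical helper: runs the stored callable when the guard goes
            // out of scope, including on early return or stack unwinding.
            class ScopeGuard {
            public:
               explicit ScopeGuard(std::function<void()> on_exit)
                  : on_exit_(std::move(on_exit)) {}

               ~ScopeGuard() {
                  if (on_exit_) {
                     on_exit_();
                  }
               }

               // Non-copyable, so the callable runs exactly once.
               ScopeGuard(const ScopeGuard&) = delete;
               ScopeGuard& operator=(const ScopeGuard&) = delete;

            private:
               std::function<void()> on_exit_;
            };

            // Possible usage with the objects from this page. Guards are destroyed
            // in reverse order of declaration, so unload() runs before closeDevice():
            //
            //    ScopeGuard closeGuard([&] { simaDeviceInst->closeDevice(SiMaDevicePtr); });
            //    ScopeGuard unloadGuard([&] { simaDeviceInst->unload(modelPtr); });

         Each guard should be declared directly after the corresponding ``openDevice`` or ``load`` call succeeds, so that cleanup is registered only for resources that were actually acquired.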