.. _developing_gstreamer_app_gstreamer_inference_mla:

Step 2: Run and verify the output of ``simaaiprocessmla`` MLA process
#####################################################################

.. image:: media/resnet50_application_simaaisrc_simaaiprocessmla.jpg
    :align: center
    :scale: 30%

|

In this section, we run the preprocessed input through the ``simaaiprocessmla`` plugin so that the MLA executes the ML model.

We have two options going forward:

#. Run only the MLA plugin to confirm that it produces the right output. In that case, ``simaaisrc`` reads the output of the previous step, ``/tmp/generic_preproc-001.out``, and feeds it directly into the ``simaaiprocessmla`` plugin.
#. Extend the existing pipeline to include the MLA step. This guide takes this second option.

Before running the ``simaaiprocessmla`` plugin to perform inference on the MLA, we need to configure the JSON file for the plugin and make sure the model is saved locally on the board.

Copy the model to the MLSoC
===========================

Copy the quantized and compiled model from the Palette docker on the host machine to the MLSoC:

.. code-block:: console

    sima-user@docker-image-id$ scp models/compiled_resnet50/quantized_resnet50_mpk.tar.gz sima@<IP address of MLSoC>:/home/sima/resnet50_example_app/models/    

From the MLSoC shell prompt, let's extract the contents:

.. code-block:: console

    davinci:~/resnet50_example_app/models$ tar xvf quantized_resnet50_mpk.tar.gz 
        quantized_resnet50_stage1_mla.lm
        quantized_resnet50_mpk.json
        quantized_resnet50_stage1_mla_stats.yaml


Creating the JSON configuration file
====================================

On the MLSoC, create the JSON configuration in ``/home/sima/resnet50_example_app/app_configs``.


.. code-block:: console

    davinci:~/resnet50_example_app/app_configs$ ls

From that directory, run the following command to write the configuration file:

.. code-block:: bash

    echo '{ 
        "version" : 0.1,
        "node_name" : "mla-resnet",
        "simaai__params" : {
            "params" : 15,
            "index" : 1,
            "cpu" : 4,
            "next_cpu" : 1, 
            "out_sz" : 1008,
            "no_of_outbuf" : 1,
            "batch_size" : 1,
            "batch_sz_model" : 1,
            "in_tensor_sz": 0, 
            "out_tensor_sz": 0,
            "ibufname" : "generic_preproc",
            "model_path" : "/home/sima/resnet50_example_app/models/quantized_resnet50_stage1_mla.lm",
            "debug" : 0,
            "dump_data" : 1
        }
    }' > simaaiprocessmla_cfg_params.json

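Before running the pipeline, it can help to confirm that the file parses as JSON and that the model path points at the extracted ``.lm`` file. The following is a minimal sketch and not part of the SiMa.ai tooling: the helper name ``check_mla_config`` and the choice of checks are assumptions made for illustration, and only keys that appear in the configuration above are read.

.. code-block:: python

    import json
    import os


    def check_mla_config(cfg, model_exists=os.path.isfile):
        """Return a list of human-readable problems found in the config dict."""
        problems = []
        params = cfg.get("simaai__params", {})
        # Keys taken from the configuration shown above.
        for key in ("out_sz", "ibufname", "model_path"):
            if key not in params:
                problems.append(f"missing key: {key}")
        model_path = params.get("model_path")
        if model_path is not None and not model_exists(model_path):
            problems.append(f"model file not found: {model_path}")
        return problems


    if __name__ == "__main__":
        path = "/home/sima/resnet50_example_app/app_configs/simaaiprocessmla_cfg_params.json"
        if os.path.isfile(path):
            with open(path) as f:
                for problem in check_mla_config(json.load(f)) or ["no problems found"]:
                    print(problem)

A malformed file makes ``json.load`` raise an error, which is itself a useful early signal before the plugin tries to read the configuration.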

The GStreamer string update
===========================

Let's update the previous ``run_pipeline.sh`` script to include our new plugin.

.. code-block:: bash

    #!/bin/bash

    # Constants
    APP_DIR=/home/sima/resnet50_example_app
    DATA_DIR="${APP_DIR}/data"
    SIMA_PLUGINS_DIR="${APP_DIR}/../gst-plugins"
    SAMPLE_IMAGE_SRC="${DATA_DIR}/golden_retriever_207_rgb.bin"
    CONFIGS_DIR="${APP_DIR}/app_configs"
    PREPROC_CVU_CONFIG_BIN="${CONFIGS_DIR}/genpreproc_200_cvu_cfg_app"
    PREPROC_CVU_CONFIG_JSON="${CONFIGS_DIR}/genpreproc_200_cvu_cfg_params.json"
    INFERENCE_MLA_CONFIG_JSON="${CONFIGS_DIR}/simaaiprocessmla_cfg_params.json"

    # Remove any existing temporary files before running
    rm -f /tmp/generic_preproc*.out

    # Run the configuration app for generic_preproc
    $PREPROC_CVU_CONFIG_BIN $PREPROC_CVU_CONFIG_JSON

    # Run the application
    export LD_LIBRARY_PATH="${SIMA_PLUGINS_DIR}"
    gst-launch-1.0 -v --gst-plugin-path="${SIMA_PLUGINS_DIR}" \
    simaaisrc mem-target=1 node-name="my_image_src" location="${SAMPLE_IMAGE_SRC}" num-buffers=1 ! \
    simaaiprocesscvu source-node-name="my_image_src" buffers-list="my_image_src" config="$PREPROC_CVU_CONFIG_JSON" name="generic_preproc" ! \
    simaaiprocessmla config="${INFERENCE_MLA_CONFIG_JSON}" name="mla_inference" ! \
    fakesink

To run the application:

.. code-block:: console

    davinci:~/resnet50_example_app$ sudo sh run_pipeline.sh 
        Password: 
        Completed SIMA_GENERIC_PREPROC graph configure 
        ** Message: 04:37:40.073: Num of chunks 1
        ** Message: 04:37:40.073: Buffer_name: my_image_src, num_of_chunks:1

        (gst-launch-1.0:2398): GLib-GObject-CRITICAL **: 04:37:40.084: g_pointer_type_register_static: assertion 'g_type_from_name (name) == 0' failed

        (gst-launch-1.0:2398): GLib-GObject-CRITICAL **: 04:37:40.085: g_type_set_qdata: assertion 'node != NULL' failed

        (gst-launch-1.0:2398): GLib-GObject-CRITICAL **: 04:37:40.085: g_pointer_type_register_static: assertion 'g_type_from_name (name) == 0' failed

        (gst-launch-1.0:2398): GLib-GObject-CRITICAL **: 04:37:40.086: g_type_set_qdata: assertion 'node != NULL' failed
        Setting pipeline to PAUSED ...
        ** Message: 04:37:40.093: Initialize dispatcher
        ** Message: 04:37:40.094: handle: 0xa3b295b0, 0xffffa3b295b0
        ** Message: 04:37:41.238: Loaded model from location /data/simaai/building_apps_palette/gstreamer/resnet50_example_app/models/quantized_resnet50_stage1_mla.lm, model:hdl: 0xaaaae079eaa0
        ** Message: 04:37:41.242: Filename memalloc = /data/simaai/building_apps_palette/gstreamer/resnet50_example_app/data/golden_retriever_207_rgb.bin
        Pipeline is PREROLLING ...
        Pipeline is PREROLLED ...
        Setting pipeline to PLAYING ...
        Redistribute latency...
        New clock: GstSystemClock
        Got EOS from element "pipeline0".
        Execution ended after 0:00:00.001474163
        Setting pipeline to NULL ...
        Freeing pipeline ...

You will see the output of the CVU preprocess and the MLA inference in the ``/tmp/`` folder:

.. code-block:: console

    davinci:~/resnet50_example_app$ ls /tmp/*.out
        generic_preproc-001.out  mla-resnet-1.out

Verifying the output
====================

Just as the input of the MLA must be ``quantized`` and ``tesselated``, the output of the MLA is also ``quantized`` and ``tesselated``.
Any reference we compare against must therefore be in the same form, or we must ``dequantize`` and ``detesselate`` the output before verifying it.
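
As a small worked example of what ``quantized`` means here, the sketch below maps float scores to ``int8`` and back. It assumes the affine scheme ``q = round(x * scale) + zero_point``, chosen as the inverse of the dequantization step in the verification script later in this section, and reuses that script's scale and zero point. The function names are illustrative only.

.. code-block:: python

    import numpy as np

    SCALE = 255.02200010497842   # from the verification script in this section
    ZERO_POINT = -128


    def quantize(x):
        """Float -> int8 (assumed affine scheme, inverse of dequantize)."""
        q = np.round(np.asarray(x, dtype=np.float32) * SCALE) + ZERO_POINT
        return np.clip(q, -128, 127).astype(np.int8)


    def dequantize(q):
        """int8 -> float32. Cast first so the subtraction cannot overflow int8."""
        return (np.asarray(q).astype(np.float32) - ZERO_POINT) / SCALE


    scores = np.array([0.0, 0.5, 1.0])
    q = quantize(scores)
    print(q)               # int8 codes
    print(dequantize(q))   # close to the original scores

With this scale, a score of ``0.0`` maps to the code ``-128`` (``0x80``) and a score of ``1.0`` maps to ``127`` (``0x7f``).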

Let's first take a look at the output from the plugin:

.. code-block:: console

    davinci:~/resnet50_example_app$ hexdump -C /tmp/mla-resnet-1.out
        00000000  80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  |................|
        *
        000000c0  80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 7f  |................|
        000000d0  81 80 80 80 80 80 80 80  81 80 80 80 80 80 80 80  |................|
        000000e0  80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  |................|
        *
        000003e0  80 80 80 80 80 80 80 80  00 00 00 00 00 00 00 00  |................|
        000003f0

As noted earlier, this output should be interpreted as ``1008`` values of type ``int8`` (two's complement) that represent the quantized output of the network's ``softmax`` function.
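
To see which entries stand out without reading the hexdump by eye, the bytes can be scanned for anything other than the dominant value ``0x80``. This is a minimal sketch; the helper name ``non_background_bytes`` is made up for illustration, and the sample buffer is rebuilt by hand from the hexdump above rather than read from the board.

.. code-block:: python

    import numpy as np


    def non_background_bytes(raw, background=0x80):
        """Map offset -> int8 value for every byte that differs from `background`."""
        as_u8 = np.frombuffer(raw, dtype=np.uint8)
        as_i8 = as_u8.view(np.int8)              # same bytes, two's complement view
        offsets = np.flatnonzero(as_u8 != background)
        return {int(o): int(as_i8[o]) for o in offsets}


    # Sample buffer reconstructed from the hexdump shown above.
    sample = bytearray([0x80] * 1000 + [0x00] * 8)
    sample[207] = 0x7F
    sample[208] = 0x81
    sample[216] = 0x81

    for offset, value in non_background_bytes(bytes(sample)).items():
        print(offset, value)

    # On the board, read the real dump instead:
    # non_background_bytes(open('/tmp/mla-resnet-1.out', 'rb').read())

For the sample buffer, the only bytes that differ from ``0x80`` are offsets ``207``, ``208`` and ``216``, plus the eight ``00`` bytes at offsets ``1000`` to ``1007``.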

.. note:: 

    Why ``1008`` values and not ``1000`` as expected for the ResNet50 output? The extra values are padding required by the memory alignment (``tesselation``) of the MLA.
    When the output is ``detesselated``, it will again have ``1000`` values.

.. note:: 

    A ``*`` line in the ``hexdump`` output means that one or more lines identical to the preceding line were omitted.
    To view the entire output, use the ``-v`` flag:

    ``hexdump -C -v /tmp/mla-resnet-1.out``

Using Python on the MLSoC, let's manually ``dequantize`` the output to check whether the top 3 results match our expectations:

.. code-block:: console

    davinci:~/resnet50_example_app$ vi print_mla_top_output_indices.py

Copy the following inside the script:

.. code-block:: python

    import numpy as np

    # Step 1: Read the binary file as int8 values and keep the 1000 class scores
    # (the last 8 bytes are zero padding added by tesselation)
    mla_data = np.fromfile('/tmp/mla-resnet-1.out', dtype=np.int8)[:1000]

    # Step 2: Dequantize (values from *_mpk.json for the output dequantization node)
    # Cast to float32 before subtracting so the int8 arithmetic cannot overflow
    dequantize_scale, dequantized_zero_point = 255.02200010497842, -128
    dequantized_data = (mla_data.astype(np.float32) - dequantized_zero_point) / dequantize_scale

    # Step 3: Sort all indices by value (descending) and keep the first 3;
    # the stable sort keeps tied values in index order
    top_3_indices = np.argsort(-dequantized_data, kind="stable")[:3]

    # Print the results
    print("Top 3 largest values and their indices:")
    for idx in top_3_indices:
        print(f"Index: {idx}")


.. note:: 

    The values in the script come from the ``*_mpk.json`` file inside the compiled ``.tar.gz``:
    
    * dequantize_scale = ``plugins[5] → config_params → params → channel_params[0][0]``
    * dequantized_zero_point = ``plugins[5] → config_params → params → channel_params[0][1]``

    ``mla_data`` keeps only the first 1000 values (``[:1000]``) in order to drop the 8 trailing 0's that are a result of ``tesselation``.

    The data is cast to ``float32`` before the zero point is subtracted. Subtracting ``-128`` directly from an ``int8`` array wraps around for any value of ``0`` or more (for example, ``127`` becomes ``-1``), which would rank the strongest class last.

Run the script and check whether the expected class ``207`` is the top class:

.. code-block:: console

    davinci:~/resnet50_example_app$ python3 print_mla_top_output_indices.py 
        Top 3 largest values and their indices:
        Index: 207
        Index: 208
        Index: 216

Class ``207`` is the top result, as expected. This agrees with the hexdump above: byte ``207`` (offset ``0xcf``) holds ``0x7f``, the largest ``int8`` value, while bytes ``208`` and ``216`` hold ``0x81``, one step above the ``0x80`` background.
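
Rather than copying the scale and zero point by hand, they can be read from the ``*_mpk.json`` file using the path given in the note above. This is a sketch under the assumption that the dequantization node sits at ``plugins[5]`` as in that note; the index may differ for other models, and the helper name ``read_dequant_params`` is hypothetical.

.. code-block:: python

    import json
    import os


    def read_dequant_params(mpk, plugin_index=5):
        """Return (scale, zero_point) from a parsed *_mpk.json dictionary."""
        params = mpk["plugins"][plugin_index]["config_params"]["params"]
        scale, zero_point = params["channel_params"][0][:2]
        return float(scale), int(zero_point)


    if __name__ == "__main__":
        path = "/home/sima/resnet50_example_app/models/quantized_resnet50_mpk.json"
        if os.path.isfile(path):
            with open(path) as f:
                print(read_dequant_params(json.load(f)))

Reading the values from the file keeps the verification script in step with the compiled model if the model is quantized again.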

Conclusion and next steps
=========================

In this section, we: 

    * Went through the steps of setting up the ``simaaiprocessmla`` plugin to run inference using a model that was compiled using the ModelSDK.
    * Ran the pipeline and verified the output ``dump`` of the plugin, as enabled in the JSON configuration, using our Python reference script.

Next, we will add another CVU graph in order to ``detesselate`` and ``dequantize`` the output from the MLA.