.. _MLPerf Benchmark:

MLPerf Benchmark
################

This section describes how to run the MLPerf benchmark tests using the Palette Software.
The official MLPerf benchmark is run on a different board from the one in the Development Kit.
However, by following the steps below on the Development Kit board, you can run the MLPerf benchmark
and replicate the published performance results.

.. note:: 
    You can measure FPS (frames per second), but you cannot obtain accuracy or power figures.
    Measuring accuracy requires a different, larger dataset than the one shipped with SiMa’s Software Development Kit.


Before you begin running the MLPerf tests, follow the steps below: 

#. Confirm that your board has been flashed with the latest tRoot and Yocto build. Please refer to :ref:`Firmware and Board Software Update` for more details.

    - To verify, run ``cat /etc/build`` and look for the build number. 
    - If the version needs to be upgraded, follow the instructions to flash the board or contact **support@sima.ai**. 

#. Connect your Developer Board to your laptop and make sure you can SSH to the board.

#. Download the ``.zip`` file using the download button below. It contains the following files:

    - ``mlperf_resnet50_dataset.dat``
    - ``check_accuracy.sh``
    - ``imagenet_accuracy.py``
    - ``val_map.txt``
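
Before copying anything to the board, it can help to confirm that the downloaded archive really contains the four files listed above. The following is a minimal sketch, not part of the Palette tooling: the helper name ``missing_from_archive`` is hypothetical, and it assumes only that the archive is a standard ``.zip`` file.

```python
import zipfile

# File names taken from the list above.
EXPECTED_FILES = [
    "mlperf_resnet50_dataset.dat",
    "check_accuracy.sh",
    "imagenet_accuracy.py",
    "val_map.txt",
]


def missing_from_archive(zip_path, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare base names so a top-level folder inside the archive is tolerated.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in expected if name not in present]
```

For example, ``missing_from_archive("ml_perf.zip")`` returns an empty list when all four files are present.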


Accessing the MLPerf Files
==========================

.. button-link:: https://docs.sima.ai/pkg_downloads/SDK1.3.0/ml_perf.zip
    :color: primary
    :shadow:

    Download Now

#. Unzip the file.

    ..  code-block:: console

        sima-user@sima-user-machine:~$ cd ~/Downloads
        sima-user@sima-user-machine:~/Downloads$ unzip ml_perf.zip

#. SSH into the board.

    ..  code-block:: console

        sima-user@sima-user-machine:~$ ssh sima@10.42.0.241
            The authenticity of host '10.42.0.241 (10.42.0.241)' can't be established.
            ED25519 key fingerprint is SHA256:FbMdheLl0xLWy33YLEWUAcddRvjavYqg83rgnkFYcos.
            This key is not known by any other names
            Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
            Warning: Permanently added '10.42.0.241' (ED25519) to the list of known hosts.
            sima@10.42.0.241's password: 
        davinci:~$

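#. Change to the ``/data`` directory and use secure copy (``scp``) to copy the four files listed above into the ``/data`` directory on the Development Kit (device). The example below copies one file; repeat the command for each of the remaining files.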

    ..  code-block:: console

        davinci:/data# sudo scp <host_user_name>@<host_ip_address>:/path/to/datafile/mlperf_resnet50_dataset.dat . 
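
The copy step above has to be repeated for each of the four files. As a rough sketch, the commands can be generated rather than typed by hand. The helper ``scp_commands`` and the example host values are hypothetical; the command shape follows the ``scp`` example in this section.

```python
import shlex

# File names taken from the download list earlier in this section.
FILES = [
    "mlperf_resnet50_dataset.dat",
    "check_accuracy.sh",
    "imagenet_accuracy.py",
    "val_map.txt",
]


def scp_commands(host_user, host_ip, host_dir, dest="."):
    """Build one scp command per file, to be run from /data on the device."""
    commands = []
    for name in FILES:
        source = f"{host_user}@{host_ip}:{host_dir.rstrip('/')}/{name}"
        # shlex.quote protects paths that contain spaces or shell metacharacters.
        commands.append(f"sudo scp {shlex.quote(source)} {shlex.quote(dest)}")
    return commands


# Placeholder host values; substitute your own user, IP address, and folder.
for command in scp_commands("user", "192.168.1.10", "/home/user/Downloads"):
    print(command)
```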


Running MLPerf Tests
====================

To run the Batch1, Batch8, and Batch14 performance and accuracy mode tests, follow the steps described below.

----------------------------
Batch1 Performance Mode Test
----------------------------

#. Go to the MLPerf directory in the Docker container.

    ..  code-block:: console

        sima-user@docker-image-id:/home# cd /usr/local/simaai/app_zoo/Gstreamer/MLPerf


#. Verify the dependencies section in the ``application.json`` file in the SDK.


    ..  code-block:: console

        sima-user@docker-image-id:/home# vi /usr/local/simaai/app_zoo/Gstreamer/MLPerf/application.json


#. Make sure the ``gst`` section in ``/usr/local/simaai/app_zoo/Gstreamer/MLPerf/application.json`` within the Palette Docker container is updated as shown below:

    ..  code-block:: console

        "gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=1:3:224:224 out-dims=1:64 mlperf-run-type=0 mlperf-scenario=0 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b1.config\" silent=true ! fakesink"


    Update the parameter values as shown below:

    | ``in-dims`` to ``1:3:224:224``
    | ``out-dims`` to ``1:64``
    | ``mlperf-run-type`` to ``0``
    | ``mlperf-scenario`` to ``0``
    | ``inpath`` to ``"/data/mlperf_resnet50_dataset.dat"``
    | ``config`` to ``"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b1.config"``

    This sets the configuration to the values being evaluated in this test.

#. After modifying the ``application.json`` file, create an ``mpk``.

    ..  code-block:: console

        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk create -s . -d .
            ℹ Step a65-apps COMPILE completed successfully.
            ℹ Step COMPILE completed successfully.
            ℹ Step COPY RESOURCE completed successfully
            ℹ Step RPM BUILD completed successfully.
            ✔ Successfully created MPK at '/usr/local/simaai/app_zoo/Gstreamer/MLPerf/project.mpk'

    By default, the ``mpk`` file is named ``project.mpk``.

#. Connect to the device using the IP address.

    ..  code-block:: console

        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk device connect -t sima@<your-device_ip>
            ℹ Please enter the password for 10.42.0.241 🔐 : 
            ℹ Connecting to sima@10.42.0.241...
            ✔ Connection established to 10.42.0.241 .


#. Enter the password for the connection. The default password is **edgeai**, unless you have already changed it.

#. Deploy the ``mpk`` on the device.

    ..  code-block:: console

        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk deploy -f project.mpk -t <your_device_ip>
            🚀 Sending MPK to 10.42.0.241...
            Transfer Progress for project.mpk:  100.00% 
            🏁 MPK sent successfully!
            ✔ MPK Deployed! ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
            ✔ MPK Deployment is successful for project.mpk.


#. After the ``mpk`` is deployed successfully, memory is allocated on the device (log messages appear on the device console) and the ``gst`` process starts. To confirm, check the list of running processes on the device with the ``top`` or ``ps`` command:

    ..  code-block:: console

        davinci:/data$ top
            Mem: 4291844K used, 89864K free, 199220K shrd, 7504K buff, 3604120K cached
            CPU:   9% usr  10% sys   0% nic  79% idle   0% io   0% irq   0% sirq
            Load average: 1.60 1.33 0.72 4/162 606
            PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
            546   305 root     S    8476m 195%  14% gst-launch-1.0 --gst-plugin-path=/data/simaai/applications/MLPerf/lib fakesrc


#. The performance mode test runs for **approximately 14 to 15 minutes**. Wait until the test run is complete.

#. Upon completion of the test, memory gets deallocated (log prints can be seen in the log file ``/var/log/simaai.log``) and the ``gst`` process ends.

#. Verify the results on the device as the **root user** in the ``/data/simaai/applications/MLPerf`` directory. The test summary is in the file ``mlperf_log_summary.txt``:

    ..  code-block:: console

        davinci:~$ sudo cat /data/simaai/applications/MLPerf/mlperf_log_summary.txt 
            ================================================
            MLPerf Results Summary
            ================================================
            SUT name : 
            Scenario : SingleStream
            Mode     : PerformanceOnly
            90th percentile latency (ns) : 892472
            Result is : VALID
            Min duration satisfied : Yes
            Min queries satisfied : Yes
            Early stopping satisfied: Yes
            Early Stopping Result:
            * Processed at least 64 queries (688100).
            * Would discard 68230 highest latency queries.
            * Early stopping 90th percentile estimate: 892664
            * Early stopping 99th percentile estimate: 940977

            ================================================
            Additional Stats
            ================================================
            QPS w/ loadgen overhead         : 1146.83
            QPS w/o loadgen overhead        : 1175.86

            Min latency (ns)                : 797311
            Max latency (ns)                : 7575694
            Mean latency (ns)               : 850443
            50.00 percentile latency (ns)   : 840563
            90.00 percentile latency (ns)   : 892472
            95.00 percentile latency (ns)   : 907110
            97.00 percentile latency (ns)   : 917172
            99.00 percentile latency (ns)   : 940319
            99.90 percentile latency (ns)   : 1001692

            ================================================
            Test Parameters Used
            ================================================
            samples_per_query : 1
            target_qps : 1113.27
            target_latency (ns): 0
            max_async_queries : 1
            min_duration (ms): 600000
            max_duration (ms): 0
            min_query_count : 50000
            max_query_count : 0
            qsl_rng_seed : 148687905518835231
            sample_index_rng_seed : 520418551913322573
            schedule_rng_seed : 811580660758947900
            accuracy_log_rng_seed : 0
            accuracy_log_probability : 0
            accuracy_log_sampling_target : 0
            print_timestamps : 0
            performance_issue_unique : 0
            performance_issue_same : 0
            performance_issue_same_index : 0
            performance_sample_count : 2048

            No warnings encountered during test.

            No errors encountered during test.


    Under the **“Additional Stats”** section, compare the ``QPS w/ loadgen overhead`` and ``QPS w/o loadgen overhead`` values with the reference values shown below.

    ..  code-block:: 

        QPS w/ loadgen overhead : 1030.15
        QPS w/o loadgen overhead : 1052.92


#. Use the ``mpk remove`` command to free up disk space on the device before running a new test. For details on how to use the mpk commands, see the section on :ref:`MPK Tool`.
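
The comparison against the reference values can also be scripted. The sketch below parses the ``key : value`` lines of a summary into a dictionary; the helper name ``parse_summary`` is hypothetical, and the sample text is a shortened excerpt of the summary shown above. How close a run must be to the reference is not specified in this section, so the sketch reports only a ratio.

```python
def parse_summary(text):
    """Map each 'key : value' line of mlperf_log_summary.txt to a string value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip separators, headings, and lines with no value after the colon.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Shortened excerpt of the Batch1 summary shown in this section.
SAMPLE = """\
Scenario : SingleStream
Mode     : PerformanceOnly
Result is : VALID
QPS w/ loadgen overhead         : 1146.83
QPS w/o loadgen overhead        : 1175.86
"""

REFERENCE_QPS = 1030.15  # reference "QPS w/ loadgen overhead" from this section

fields = parse_summary(SAMPLE)
qps = float(fields["QPS w/ loadgen overhead"])
print(f"result={fields['Result is']} qps={qps} ratio_to_reference={qps / REFERENCE_QPS:.3f}")
```

On the device, the text passed to ``parse_summary`` would come from ``/data/simaai/applications/MLPerf/mlperf_log_summary.txt``.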


-------------------------
Batch1 Accuracy Mode Test
-------------------------


#. Make sure that the **Batch1** performance mode test is run before running the accuracy mode test.

#. Make sure to **update** the ``gst`` command in the ``application.json`` as shown in the code below (with ``mlperf-run-type=1``), and then save and close the file.

    ..  code-block:: console

        "gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=1:3:224:224 out-dims=1:64 mlperf-run-type=1 mlperf-scenario=0 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b1.config\" silent=true ! fakesink"


    | **Parameter values to change:**
    | ``mlperf-run-type`` to ``1``

#. Create the ``mpk``, deploy it, and wait until the test ends. The test run takes approximately 4 to 5 minutes.


#. After the test run is complete, memory deallocation messages can be seen in the log file ``/var/log/simaai.log``.


#. To verify the results, use the validation script stored in the ``/data`` directory.

    ..  code-block:: console

        davinci:~$ cd /data

#. Change the file permissions, if required, by running the following command.

    ..  code-block:: console

        davinci:/data$ sudo chmod 777 ./check_accuracy.sh


#. Run the validation script.

    ..  code-block:: console

        davinci:/data$ ./check_accuracy.sh /data/simaai/applications/MLPerf/mlperf_log_accuracy.json

#. The output numbers should match the values shown below.

    ..  code-block::

        accuracy=75.698%, good=37849, total=50000

#. Use the ``mpk remove`` command to free up disk space on the device before running a new test.
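
The accuracy line printed by the validation script can be checked for internal consistency, since ``good`` divided by ``total`` should reproduce the reported percentage. The sketch below assumes the output has exactly the format shown above; the helper names are hypothetical.

```python
import re

# Matches the line format shown in this section, e.g.
# "accuracy=75.698%, good=37849, total=50000".
ACCURACY_LINE = re.compile(r"accuracy=([\d.]+)%, good=(\d+), total=(\d+)")


def parse_accuracy(line):
    """Return (accuracy_percent, good, total) parsed from the script output."""
    match = ACCURACY_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized accuracy line: {line!r}")
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


def is_consistent(accuracy, good, total, tolerance=0.001):
    """Check that good/total reproduces the reported percentage."""
    return abs(100.0 * good / total - accuracy) <= tolerance


accuracy, good, total = parse_accuracy("accuracy=75.698%, good=37849, total=50000")
print(accuracy, good, total, is_consistent(accuracy, good, total))
```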


----------------------------
Batch8 Performance Mode Test
----------------------------

#. Go to the ``MLPerf`` directory in the Docker container.

    ..  code-block:: console

        sima-user@docker-image-id:/home# cd /usr/local/simaai/app_zoo/Gstreamer/MLPerf


#. Make sure the ``gst`` command has been updated in the ``application.json``, as shown below.

    ..  code-block:: console
        
        "gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=8:3:224:224 out-dims=8:64 mlperf-run-type=0 mlperf-scenario=1 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config\" silent=true ! fakesink"


    | **Parameter values to change:**
    | ``in-dims`` to ``8:3:224:224``
    | ``out-dims`` to ``8:64``
    | ``mlperf-run-type`` to ``0``
    | ``mlperf-scenario`` to ``1``
    | ``inpath`` to ``"/data/mlperf_resnet50_dataset.dat"``
    | ``config`` to ``"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config"``


#. Create the ``mpk`` in the SDK and deploy it.

    ..  code-block:: console

        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk create -s . -d .
        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk deploy -f project.mpk -t <your_device_ip>


#. Check for memory allocation messages in the device console and cross-check the ``gst`` command in the device process tree.

    ..  code-block:: console

        davinci:~$ top



#. Wait until the test run is complete (memory deallocation happens) and check for the ``mlperf_log_summary.txt`` file in the ``/data/simaai/applications/MLPerf/`` directory on the device.


#. The log summary should appear as shown below. The result files are stored in the same ``/data/simaai/applications/MLPerf/`` directory.

    ..  code-block:: console

        davinci:/data$ sudo cat ./simaai/applications/MLPerf/mlperf_log_summary.txt 
            ================================================
            MLPerf Results Summary
            ================================================
            SUT name :
            Scenario : MultiStream
            Mode     : PerformanceOnly
            99th percentile latency (ns) : 2927200
            Result is : VALID
            Min duration satisfied : Yes
            Min queries satisfied : Yes
            Early stopping satisfied: Yes
            Early Stopping Result:
            * Processed at least 662 queries (207154).
            * Would discard 1965 highest latency queries.
            * Early stopping 99th percentile estimate: 2928160
            ================================================
            Additional Stats
            ================================================
            Per-query latency:
            Min latency (ns)                : 2815320
            Max latency (ns)                : 5802920
            Mean latency (ns)               : 2873437
            50.00 percentile latency (ns)   : 2872120
            90.00 percentile latency (ns)   : 2895280
            95.00 percentile latency (ns)   : 2903880
            97.00 percentile latency (ns)   : 2910600
            99.00 percentile latency (ns)   : 2927200
            99.90 percentile latency (ns)   : 2998560
            ================================================
            Test Parameters Used
            ================================================
            samples_per_query : 8
            target_qps : 348.432
            target_latency (ns): 0
            max_async_queries : 1
            min_duration (ms): 600000
            max_duration (ms): 0
            min_query_count : 50000
            max_query_count : 0
            qsl_rng_seed : 148687905518835231
            sample_index_rng_seed : 520418551913322573
            schedule_rng_seed : 811580660758947900
            accuracy_log_rng_seed : 0
            accuracy_log_probability : 0
            accuracy_log_sampling_target : 0
            print_timestamps : 0
            performance_issue_unique : 0
            performance_issue_same : 0
            performance_issue_same_index : 0
            performance_sample_count : 1024
            1 warning encountered. See detailed log.
            No errors encountered during test.


#. Check the **Additional Stats** section in the ``mlperf_log_summary.txt`` file for **Mean latency (ns)** and compare it against the reference value shown below.

    ..  code-block::
    
        Mean latency (ns) : 2873437

#. Use the ``mpk remove`` command to free up disk space on the device before running a new test. For details on how to use the mpk commands, see the section on :ref:`MPK Tool`.
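
The latency values in the summary are reported in nanoseconds per query, and each MultiStream query here carries 8 samples (``samples_per_query : 8``). The sketch below converts the reference mean latency to milliseconds and to an approximate per-sample rate. The derived rate is an illustrative figure only, not a metric that the summary reports for this scenario, and the helper names are hypothetical.

```python
def ns_to_ms(latency_ns):
    """Convert a latency in nanoseconds to milliseconds."""
    return latency_ns / 1e6


def samples_per_second(latency_ns, samples_per_query):
    """Approximate sample rate implied by one query taking latency_ns."""
    return samples_per_query / (latency_ns / 1e9)


MEAN_LATENCY_NS = 2873437  # reference "Mean latency (ns)" from this section
SAMPLES_PER_QUERY = 8      # "samples_per_query" from the summary above

print(f"mean latency: {ns_to_ms(MEAN_LATENCY_NS):.3f} ms")
print(f"implied rate: {samples_per_second(MEAN_LATENCY_NS, SAMPLES_PER_QUERY):.1f} samples/s")
```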


-------------------------
Batch8 Accuracy Mode Test
-------------------------

#. Make sure the **Batch8** performance mode test is run before running the **Batch8** accuracy mode test.

#. Update the ``gst`` command in the ``application.json`` file as shown below, that is, set ``mlperf-run-type=1`` and ``mlperf-scenario=1``.

    ..  code-block:: console

        "gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=8:3:224:224 out-dims=8:64 mlperf-run-type=1 mlperf-scenario=1 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config\" silent=true ! fakesink"


    | **Parameter values to change:**
    | ``in-dims`` to ``8:3:224:224``
    | ``out-dims`` to ``8:64``
    | ``mlperf-run-type`` to ``1``
    | ``mlperf-scenario`` to ``1``
    | ``inpath`` to ``"/data/mlperf_resnet50_dataset.dat"``
    | ``config`` to ``"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config"``



#. Follow steps **3** through **8** of the **“Batch1 Accuracy Mode Test”**.

    Run the validation script.

    ..  code-block:: console

        davinci:/data$ ./check_accuracy.sh ./simaai/applications/MLPerf/mlperf_log_accuracy.json


    The output numbers should match the values shown here: ``accuracy=75.990%, good=37995, total=50000``


#. Use the ``mpk remove`` command to free up disk space on the device before running a new test. For details on how to use the mpk commands, see the section on :ref:`MPK Tool`.
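
Switching from a performance run to an accuracy run changes only ``mlperf-run-type`` in the ``gst`` string. The sketch below shows one way to make that single substitution in a string; the helper ``set_gst_param`` is hypothetical, and it is intended for unquoted values such as the run type or scenario, not for the quoted path parameters.

```python
import re


def set_gst_param(gst, name, value):
    """Replace the value of one 'name=value' token in a gst command string."""
    # The lookbehind prevents matching a longer name that merely ends with `name`.
    pattern = re.compile(r"(?<![\w-])" + re.escape(name) + r"=\S+")
    updated, count = pattern.subn(lambda _match: f"{name}={value}", gst)
    if count != 1:
        raise ValueError(f"expected exactly one {name!r} token, found {count}")
    return updated


# Shortened excerpt of the Batch8 gst string from this section.
performance = "ml_filter in-dims=8:3:224:224 out-dims=8:64 mlperf-run-type=0 mlperf-scenario=1"
accuracy_run = set_gst_param(performance, "mlperf-run-type", 1)
print(accuracy_run)
```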


-----------------------------
Batch14 Performance Mode Test
-----------------------------

#. Go to the ``MLPerf`` directory in the Docker container.

    ..  code-block:: console

        sima-user@docker-image-id:/home# cd /usr/local/simaai/app_zoo/Gstreamer/MLPerf


#. Modify the ``application.json`` file to update the parameters, as shown below.

    ..  code-block:: console

        "gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=14:3:224:224 out-dims=14:64 mlperf-run-type=0 mlperf-scenario=2 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config\" silent=true ! fakesink"


    | **Parameter values to change:**
    | ``in-dims`` to ``14:3:224:224``
    | ``out-dims`` to ``14:64``
    | ``mlperf-run-type`` to ``0``
    | ``mlperf-scenario`` to ``2``
    | ``inpath`` to ``"/data/mlperf_resnet50_dataset.dat"``
    | ``config`` to ``"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config"``


#. Create the ``mpk`` in the SDK and deploy it.

    ..  code-block:: console

        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk create -s . -d .
        sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk deploy -f project.mpk -t <your_device_ip>

#. Check for memory allocation messages in the device console and cross-check the ``gst`` command in the device process tree by running the following command: 

    ..  code-block:: console

        davinci:~$ top


#. Wait until the test is complete (memory deallocation happens) and check the logs on the device as root in the ``/data/simaai/applications/MLPerf/`` directory. The test generally takes 20 to 22 minutes. The summary log should appear as shown below.

    ..  code-block:: console

        davinci:~$ sudo cat /data/simaai/applications/MLPerf/mlperf_log_summary.txt 
            ================================================
            MLPerf Results Summary
            ================================================
            SUT name :
            Scenario : Offline
            Mode     : PerformanceOnly
            Samples per second: 3397.96
            Result is : VALID
            Min duration satisfied : Yes
            Min queries satisfied : Yes
            Early stopping satisfied: Yes

            ================================================
            Additional Stats
            ================================================
            Min latency (ns)                : 1529062681
            Max latency (ns)                : 922611782310
            Mean latency (ns)               : 462219336486
            50.00 percentile latency (ns)   : 462310204729
            90.00 percentile latency (ns)   : 830534473673
            95.00 percentile latency (ns)   : 876460296204
            97.00 percentile latency (ns)   : 894974228836
            99.00 percentile latency (ns)   : 913361873441
            99.90 percentile latency (ns)   : 921692414874

            ================================================
            Test Parameters Used
            ================================================
            samples_per_query : 3135000
            target_qps : 4750
            target_latency (ns): 0
            max_async_queries : 1
            min_duration (ms): 600000
            max_duration (ms): 0
            min_query_count : 1
            max_query_count : 0
            qsl_rng_seed : 148687905518835231
            sample_index_rng_seed : 520418551913322573
            schedule_rng_seed : 811580660758947900
            accuracy_log_rng_seed : 0
            accuracy_log_probability : 0
            accuracy_log_sampling_target : 0
            print_timestamps : 0
            performance_issue_unique : 0
            performance_issue_same : 0
            performance_issue_same_index : 0
            performance_sample_count : 50000

            1 warning encountered. See detailed log.

            No errors encountered during test.

#. Verify the samples per second against the reference value: ``Samples per second: 3397.96``.
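
In the sample Offline summary above, the reported ``Samples per second`` value is numerically consistent with ``samples_per_query`` divided by the maximum latency. The sketch below reproduces that arithmetic. It is an observation about the sample output, not a statement of how the benchmark computes the figure, and the helper name is hypothetical.

```python
def offline_samples_per_second(samples_per_query, max_latency_ns):
    """Samples in the single Offline query divided by its duration in seconds."""
    return samples_per_query / (max_latency_ns / 1e9)


# Values copied from the sample summary in this section.
SAMPLES_PER_QUERY = 3135000
MAX_LATENCY_NS = 922611782310
REPORTED = 3397.96

rate = offline_samples_per_second(SAMPLES_PER_QUERY, MAX_LATENCY_NS)
print(f"derived: {rate:.2f}  reported: {REPORTED}")
```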


--------------------------
Batch14 Accuracy Mode Test
--------------------------

#. Make sure the **Batch14** performance mode test has been run before running the **Batch14** accuracy mode test.

#. Make sure to update the ``gst`` command in the ``application.json`` as shown below.

    ..  code-block:: console

        "gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=14:3:224:224 out-dims=14:64 mlperf-run-type=1 mlperf-scenario=2 toy-mode=false inpath=\"/data/mlperf_resnet50_dataset.dat\" output-path=\"/data/simaai/applications/MLPerf\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config\" silent=true ! fakesink"

    | **Parameter values to change:**
    | ``in-dims`` to ``14:3:224:224``
    | ``out-dims`` to ``14:64``
    | ``mlperf-run-type`` to ``1``
    | ``mlperf-scenario`` to ``2``
    | ``inpath`` to ``"/data/mlperf_resnet50_dataset.dat"``
    | ``config`` to ``"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config"``



#. Follow steps **3** through **8** of the **“Batch1 Accuracy Mode Test”**.
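
The six tests in this section differ only in batch size, run type, scenario, and the matching ``.config`` file. As a summary, the sketch below builds the ``gst`` value from those inputs. The function ``build_gst`` is hypothetical and is not part of the SDK; the batch-to-scenario mapping and all paths are copied from the ``gst`` strings shown in this section.

```python
import json

# Batch size -> mlperf-scenario value, as used in this section.
SCENARIO_BY_BATCH = {1: 0, 8: 1, 14: 2}
APP_DIR = "/data/simaai/applications/MLPerf"


def build_gst(batch, run_type):
    """Build the gst command; run_type is 0 (performance) or 1 (accuracy)."""
    scenario = SCENARIO_BY_BATCH[batch]
    return (
        f'MLA_OCM=max LD_LIBRARY_PATH="{APP_DIR}/lib" '
        f"gst-launch-1.0 --gst-plugin-path='{APP_DIR}/lib' fakesrc ! "
        f"ml_filter in-dims={batch}:3:224:224 out-dims={batch}:64 "
        f"mlperf-run-type={run_type} mlperf-scenario={scenario} toy-mode=false "
        f'output-path="{APP_DIR}" inpath="/data/mlperf_resnet50_dataset.dat" '
        f'config="{APP_DIR}/etc/bad_sparse_resnet50_v1_b{batch}.config" '
        "silent=true ! fakesink"
    )


# json.dumps escapes the inner double quotes the same way application.json does.
print(json.dumps({"gst": build_gst(14, 1)}))
```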