MLPerf Benchmark

This section provides an overview of how to run the MLPerf benchmark tests using the Palette Software. The official MLPerf benchmark is run on a board that is different from the one in the Development Kit. However, following the steps below on the Development Kit board lets you run the MLPerf benchmark and reproduce the published performance results.

Note

You can measure FPS (frames per second), but you cannot obtain accuracy or power figures. Accuracy requires a different and larger dataset than the one shipped with SiMa’s Software Development Kit.

Before you begin running the MLPerf tests, follow the steps below:

Confirm that your board has been flashed with the latest tRoot and Yocto build. Please refer to Firmware and Board Software Update for more details.
- To verify, run cat /etc/buildinfo and look for the build number.
- If the version needs to be upgraded, follow the instructions to flash the board or contact support@sima.ai.
Connect your Developer Board to your laptop and make sure you can SSH to the board.
Download the following files within the .zip file using the download button:
- mlperf_resnet50_dataset.dat
- check_accuracy.sh
- imagenet_accuracy.py
- val_map.txt

Accessing the MLPerf Files

Download Now

Unzip the file.

sima-user@sima-user-machine:~$ cd ~/Downloads
sima-user@sima-user-machine:~/Downloads$ unzip ml_perf.zip

SSH into the board.

sima-user@sima-user-machine:~$ ssh sima@10.42.0.241
    The authenticity of host '10.42.0.241 (10.42.0.241)' can't be established.
    ED25519 key fingerprint is SHA256:FbMdheLl0xLWy33YLEWUAcddRvjavYqg83rgnkFYcos.
    This key is not known by any other names
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
    Warning: Permanently added '10.42.0.241' (ED25519) to the list of known hosts.
    sima@10.42.0.241's password:
davinci:~$

Change to the /data directory and use secure copy (scp) to copy the four files listed above to the /data directory on the Development Kit (device). Repeat the command below for each file.
davinci:/data# sudo scp <host_user_name>@<host_ip_address>:/path/to/datafile/mlperf_resnet50_dataset.dat .
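
Because the copy step covers four files, it can help to generate all four commands at once. The following is a minimal sketch rather than part of the SDK: the helper name build_scp_commands and the example host values are assumptions, and you need to substitute your own host user name, IP address, and download directory.

```python
import shlex

# The four files named in the download step of this section.
MLPERF_FILES = [
    "mlperf_resnet50_dataset.dat",
    "check_accuracy.sh",
    "imagenet_accuracy.py",
    "val_map.txt",
]


def build_scp_commands(host_user, host_ip, source_dir, dest_dir="/data"):
    """Return one scp command line per MLPerf file (hypothetical helper).

    The commands are meant to be run on the device and pull each file
    from the host machine, like the example command in this section.
    """
    commands = []
    for name in MLPERF_FILES:
        remote = f"{host_user}@{host_ip}:{source_dir.rstrip('/')}/{name}"
        commands.append(
            "sudo scp " + shlex.quote(remote) + " " + shlex.quote(dest_dir)
        )
    return commands


if __name__ == "__main__":
    # Example values only (assumptions); replace with your host details.
    for cmd in build_scp_commands("sima-user", "192.0.2.10", "/home/sima-user/Downloads"):
        print(cmd)
```

Each printed line can be pasted into the device shell from any directory, since the destination is given as an absolute path.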

Running MLPerf Tests

To run the Batch1, Batch8, and Batch14 performance and accuracy mode tests, follow the steps described below.

Batch1 Performance Mode Test

Go to the MLPerf directory in the Docker container.

sima-user@docker-image-id:/home# cd /usr/local/simaai/app_zoo/Gstreamer/MLPerf

Verify the dependencies section in the application.json file in the SDK.

sima-user@docker-image-id:/home# vi /usr/local/simaai/app_zoo/Gstreamer/MLPerf/application.json

Make sure the gst section in /usr/local/simaai/app_zoo/Gstreamer/MLPerf/application.json within the Palette Docker container is updated as shown below:
"gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=1:3:224:224 out-dims=1:64 mlperf-run-type=0 mlperf-scenario=0 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\" inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b1.config\" silent=true ! fakesink"
Update the parameter values as shown below:

- in-dims to 1:3:224:224
- out-dims to 1:64
- mlperf-run-type to 0
- mlperf-scenario to 0
- inpath to “/data/mlperf_resnet50_dataset.dat”
- config to “/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b1.config”

This will change the configuration to the current set of values that we want to evaluate.
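
The gst value differs between tests only in the batch size, run type, scenario, and config file name, and its inner double quotes must be backslash-escaped inside application.json. As an illustration, the sketch below assembles the string from those values and lets json.dumps add the escapes. The helper name build_gst_command is an assumption made for this example; the sketch only prints the fragment and does not modify application.json.

```python
import json

APP_DIR = "/data/simaai/applications/MLPerf"


def build_gst_command(batch, run_type, scenario):
    """Assemble the gst pipeline string for one test (hypothetical helper).

    batch:    1, 8, or 14
    run_type: 0 for performance mode, 1 for accuracy mode
    scenario: 0, 1, or 2, as listed for each test in this section
    """
    return (
        f'MLA_OCM=max LD_LIBRARY_PATH="{APP_DIR}/lib" '
        f"gst-launch-1.0 --gst-plugin-path='{APP_DIR}/lib' fakesrc ! "
        f"ml_filter in-dims={batch}:3:224:224 out-dims={batch}:64 "
        f"mlperf-run-type={run_type} mlperf-scenario={scenario} toy-mode=false "
        f'output-path="{APP_DIR}" '
        'inpath="/data/mlperf_resnet50_dataset.dat" '
        f'config="{APP_DIR}/etc/bad_sparse_resnet50_v1_b{batch}.config" '
        "silent=true ! fakesink"
    )


if __name__ == "__main__":
    # json.dumps adds the backslash escapes that a JSON string value needs.
    print('"gst": ' + json.dumps(build_gst_command(1, 0, 0)))
```

Generating the value this way avoids hand-editing mistakes such as a missing closing quote or a misspelled property name.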

After modifying the application.json file, create an mpk.

sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk create -s . -d .
    ℹ Step a65-apps COMPILE completed successfully.
    ℹ Step COMPILE completed successfully.
    ℹ Step COPY RESOURCE completed successfully
    ℹ Step RPM BUILD completed successfully.
    ✔ Successfully created MPK at '/usr/local/simaai/app_zoo/Gstreamer/MLPerf/project.mpk'

By default, the mpk file is created with the name project.mpk.

Connect to the device using the IP address.

sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk device connect -t sima@<your-device_ip>
    ℹ Please enter the password for 10.42.0.241 🔐 :
    ℹ Connecting to sima@10.42.0.241...
    ✔ Connection established to 10.42.0.241 .

Enter the password for the connection. The default password is edgeai, unless you have already changed it.

Deploy the mpk on the device.

sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk deploy -f project.mpk -t <your_device_ip>
    🚀 Sending MPK to 10.42.0.241...
    Transfer Progress for project.mpk:  100.00%
    🏁 MPK sent successfully!
    ✔ MPK Deployed! ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    ✔ MPK Deployment is successful for project.mpk.
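
The create, connect, and deploy steps are repeated for every test in this section, so they lend themselves to a small wrapper. The sketch below is an assumption-laden illustration: the helper names mpk_steps and run_steps are invented here, and only the mpk arguments already shown in this section are used. By default it prints the commands without running them; the connect step still prompts for the password when executed.

```python
import subprocess


def mpk_steps(device_ip, user="sima", mpk_file="project.mpk"):
    """Return the three mpk invocations from this section as argv lists."""
    return [
        ["mpk", "create", "-s", ".", "-d", "."],
        ["mpk", "device", "connect", "-t", f"{user}@{device_ip}"],
        ["mpk", "deploy", "-f", mpk_file, "-t", device_ip],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print("$", " ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run with the example device address used in this section.
    run_steps(mpk_steps("10.42.0.241"))
```

Run it from the MLPerf application directory inside the Docker container with dry_run=False to execute the steps in order.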

Upon successful deployment of the mpk, memory gets allocated in the device (log prints can be seen on the device console) and the gst process starts in the device (check running process list on device using the top or ps commands):

davinci:/data$ top
    Mem: 4291844K used, 89864K free, 199220K shrd, 7504K buff, 3604120K cached
    CPU:   9% usr  10% sys   0% nic  79% idle   0% io   0% irq   0% sirq
    Load average: 1.60 1.33 0.72 4/162 606
    PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
    546   305 root     S    8476m 195%  14% gst-launch-1.0 --gst-plugin-path=/data/simaai/applications/MLPerf/lib fakesrc
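
If you capture the process listing to a file or variable, a short script can confirm that the benchmark is still running. The following sketch assumes the listing has the PID in the first column, as in the top output above; the helper name find_gst_processes is hypothetical.

```python
def find_gst_processes(listing):
    """Return (pid, command) pairs for gst-launch-1.0 lines in top/ps output.

    Assumes the PID is the first whitespace-separated field on each
    process line, as in the top output shown in this section.
    """
    found = []
    for line in listing.splitlines():
        fields = line.split()
        if "gst-launch-1.0" in fields and fields[0].isdigit():
            start = fields.index("gst-launch-1.0")
            found.append((int(fields[0]), " ".join(fields[start:])))
    return found


if __name__ == "__main__":
    sample = (
        "PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND\n"
        "546   305 root     S    8476m 195%  14% gst-launch-1.0 "
        "--gst-plugin-path=/data/simaai/applications/MLPerf/lib fakesrc\n"
    )
    for pid, command in find_gst_processes(sample):
        print(pid, command)
```

An empty result means no gst-launch-1.0 process was found in the listing, which is what you would expect once the test has finished.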

The performance mode test runs for approximately 14 to 15 minutes. Wait until the test runs are complete.
Upon completion of the test, memory gets deallocated (log prints can be seen in the log file /var/log/simaai.log) and the gst process ends.

Verify the results on the device as the root user in the /data/simaai/applications/MLPerf directory. To view the test summary, print the file mlperf_log_summary.txt:

davinci:~$ sudo cat /data/simaai/applications/MLPerf/mlperf_log_summary.txt
    ================================================
    MLPerf Results Summary
    ================================================
    SUT name :
    Scenario : SingleStream
    Mode     : PerformanceOnly
    90th percentile latency (ns) : 892472
    Result is : VALID
    Min duration satisfied : Yes
    Min queries satisfied : Yes
    Early stopping satisfied: Yes
    Early Stopping Result:
    * Processed at least 64 queries (688100).
    * Would discard 68230 highest latency queries.
    * Early stopping 90th percentile estimate: 892664
    * Early stopping 99th percentile estimate: 940977

    ================================================
    Additional Stats
    ================================================
    QPS w/ loadgen overhead         : 1146.83
    QPS w/o loadgen overhead        : 1175.86

    Min latency (ns)                : 797311
    Max latency (ns)                : 7575694
    Mean latency (ns)               : 850443
    50.00 percentile latency (ns)   : 840563
    90.00 percentile latency (ns)   : 892472
    95.00 percentile latency (ns)   : 907110
    97.00 percentile latency (ns)   : 917172
    99.00 percentile latency (ns)   : 940319
    99.90 percentile latency (ns)   : 1001692

    ================================================
    Test Parameters Used
    ================================================
    samples_per_query : 1
    target_qps : 1113.27
    target_latency (ns): 0
    max_async_queries : 1
    min_duration (ms): 600000
    max_duration (ms): 0
    min_query_count : 50000
    max_query_count : 0
    qsl_rng_seed : 148687905518835231
    sample_index_rng_seed : 520418551913322573
    schedule_rng_seed : 811580660758947900
    accuracy_log_rng_seed : 0
    accuracy_log_probability : 0
    accuracy_log_sampling_target : 0
    print_timestamps : 0
    performance_issue_unique : 0
    performance_issue_same : 0
    performance_issue_same_index : 0
    performance_sample_count : 2048

    No warnings encountered during test.

    No errors encountered during test.

Under the “Additional Stats” section, compare the QPS values with and without loadgen overhead against the reference values below.

QPS w/ loadgen overhead : 1030.15
QPS w/o loadgen overhead : 1052.92
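
The comparison can also be scripted. The sketch below parses the "key : value" lines of a summary file into a dictionary and reports how far each measured QPS value is from the reference values quoted above. The function names are assumptions for this example, and the parser relies only on the line layout visible in the sample log.

```python
def parse_summary(text):
    """Parse 'key : value' lines of mlperf_log_summary.txt into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip separators, headings, and lines without a value.
        if sep and key and value:
            stats[key] = value
    return stats


def relative_difference(measured, reference):
    """Signed fractional difference of measured from reference."""
    return (measured - reference) / reference


# Reference values quoted in this section.
REFERENCE_QPS = {
    "QPS w/ loadgen overhead": 1030.15,
    "QPS w/o loadgen overhead": 1052.92,
}

if __name__ == "__main__":
    # Excerpt of the sample summary shown in this section.
    sample = """
    Result is : VALID
    QPS w/ loadgen overhead         : 1146.83
    QPS w/o loadgen overhead        : 1175.86
    """
    stats = parse_summary(sample)
    print("Result:", stats["Result is"])
    for name, reference in REFERENCE_QPS.items():
        diff = relative_difference(float(stats[name]), reference)
        print(f"{name}: {stats[name]} ({diff:+.1%} vs reference)")
```

To use it on a real run, read the contents of mlperf_log_summary.txt and pass them to parse_summary in place of the sample excerpt.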

Use the mpk remove command to free up disk space on the device before running a new test. For details on how to use the mpk commands, see the section on MPK Tool.

Batch1 Accuracy Mode Test

Make sure that the Batch1 performance mode test is run before running the accuracy mode test.

Make sure to update the gst command in the application.json as shown in the code below (with “mlperf-run-type”=1) and then save and close the file.

"gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=1:3:224:224 out-dims=1:64 mlperf-run-type=1 mlperf-scenario=0 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b1.config\" silent=true ! fakesink"

Parameter value to change:

- mlperf-run-type to 1

Create the mpk, deploy it, and wait until the test ends. The test run takes approximately 4 to 5 minutes.
After the test run is complete, memory deallocation messages can be seen in the log file /var/log/simaai.log.
For verifying results, use the validation script stored in the /data directory.
davinci:~$ cd /data
Change the file permissions, if required, by running the following command.
davinci:/data$ sudo chmod 777 ./check_accuracy.sh

Run the validation script.

davinci:/data$ ./check_accuracy.sh /data/simaai/applications/MLPerf/mlperf_log_accuracy.json

The output numbers should match the values shown below.
accuracy=75.698%, good=37849, total=50000
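
The reported percentage is simply good divided by total, which gives a quick sanity check on the script output. The sketch below parses a line in the format shown above and verifies that relationship; the function names and the tolerance are assumptions chosen for this example.

```python
import re

# Matches lines such as: accuracy=75.698%, good=37849, total=50000
ACCURACY_RE = re.compile(
    r"accuracy=(?P<accuracy>\d+(?:\.\d+)?)%,\s*"
    r"good=(?P<good>\d+),\s*total=(?P<total>\d+)"
)


def parse_accuracy(line):
    """Return (accuracy_percent, good, total) from a check_accuracy.sh line."""
    match = ACCURACY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized accuracy line: {line!r}")
    return float(match["accuracy"]), int(match["good"]), int(match["total"])


def is_consistent(accuracy, good, total, tolerance=0.001):
    """Check that the reported percentage equals 100 * good / total."""
    return abs(100.0 * good / total - accuracy) <= tolerance


if __name__ == "__main__":
    accuracy, good, total = parse_accuracy(
        "accuracy=75.698%, good=37849, total=50000"
    )
    print(accuracy, good, total, is_consistent(accuracy, good, total))
```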
Use the mpk remove command to free up disk space on the device before running a new test.

Batch8 Performance Mode Test

Go to the MLPerf directory in the Docker container.

sima-user@docker-image-id:/home# cd /usr/local/simaai/app_zoo/Gstreamer/MLPerf

Make sure the gst command has been updated in the application.json, as shown below.

"gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=8:3:224:224 out-dims=8:64 mlperf-run-type=0 mlperf-scenario=1 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config\" silent=true ! fakesink"
Parameter values to change:

- in-dims to 8:3:224:224
- out-dims to 8:64
- mlperf-run-type to 0
- mlperf-scenario to 1
- inpath to “/data/mlperf_resnet50_dataset.dat”
- config to “/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config”

Create mpk in the SDK and deploy.

sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk create -s . -d .
sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk deploy -f project.mpk -t <your_device_ip>

Check for memory allocation messages in the device console and cross-check the gst command in the device process tree.
davinci:~$ top
Wait until the test run is complete (memory deallocation happens), then check as root for the MLPerf summary log file in the /data/simaai/applications/MLPerf/ directory on the device.

The log summary should appear as shown below. The result files are stored in the same /data/simaai/applications/MLPerf/ directory.

davinci:/data$ sudo cat ./simaai/applications/MLPerf/mlperf_log_summary.txt
    ================================================
    MLPerf Results Summary
    ================================================
    SUT name :
    Scenario : MultiStream
    Mode     : PerformanceOnly
    99th percentile latency (ns) : 2927200
    Result is : VALID
    Min duration satisfied : Yes
    Min queries satisfied : Yes
    Early stopping satisfied: Yes
    Early Stopping Result:
    * Processed at least 662 queries (207154).
    * Would discard 1965 highest latency queries.
    * Early stopping 99th percentile estimate: 2928160
    ================================================
    Additional Stats
    ================================================
    Per-query latency:
    Min latency (ns)                : 2815320
    Max latency (ns)                : 5802920
    Mean latency (ns)               : 2873437
    50.00 percentile latency (ns)   : 2872120
    90.00 percentile latency (ns)   : 2895280
    95.00 percentile latency (ns)   : 2903880
    97.00 percentile latency (ns)   : 2910600
    99.00 percentile latency (ns)   : 2927200
    99.90 percentile latency (ns)   : 2998560
    ================================================
    Test Parameters Used
    ================================================
    samples_per_query : 8
    target_qps : 348.432
    target_latency (ns): 0
    max_async_queries : 1
    min_duration (ms): 600000
    max_duration (ms): 0
    min_query_count : 50000
    max_query_count : 0
    qsl_rng_seed : 148687905518835231
    sample_index_rng_seed : 520418551913322573
    schedule_rng_seed : 811580660758947900
    accuracy_log_rng_seed : 0
    accuracy_log_probability : 0
    accuracy_log_sampling_target : 0
    print_timestamps : 0
    performance_issue_unique : 0
    performance_issue_same : 0
    performance_issue_same_index : 0
    performance_sample_count : 1024
    1 warning encountered. See detailed log.
    No errors encountered during test.

Check the Additional Stats section in the mlperf_log_summary file for Mean latency (ns) and compare against the reference value shown below.
Mean latency (ns) : 2873437
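
The mean latency can be turned into an approximate frame rate. The sketch below assumes that only one query is in flight at a time (the log above reports max_async_queries : 1) and that each query carries samples_per_query frames, so it ignores loadgen overhead. The function name is hypothetical.

```python
def frames_per_second(mean_latency_ns, samples_per_query):
    """Estimate throughput when one query is in flight at a time.

    queries per second = 1e9 / latency in nanoseconds; each query
    carries samples_per_query frames.
    """
    queries_per_second = 1e9 / mean_latency_ns
    return queries_per_second * samples_per_query


if __name__ == "__main__":
    # Values taken from the Batch8 sample summary in this section.
    fps = frames_per_second(mean_latency_ns=2873437, samples_per_query=8)
    print(f"approximately {fps:.0f} frames per second")
```

Treat the result as an estimate for comparison between runs rather than an official MLPerf metric.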
Use the mpk remove command to free up disk space on the device before running a new test. For details on how to use the mpk commands, see the section on MPK Tool.

Batch8 Accuracy Mode Test

Make sure the Batch8 performance mode test is run before running the Batch8 accuracy mode test.

Update the gst command in the application.json file as shown below. That is, set mlperf-run-type=1 and mlperf-scenario=1.

"gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=8:3:224:224 out-dims=8:64 mlperf-run-type=1 mlperf-scenario=1 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config\" silent=true ! fakesink"
Parameter values to change:

- in-dims to 8:3:224:224
- out-dims to 8:64
- mlperf-run-type to 1
- mlperf-scenario to 1
- inpath to “/data/mlperf_resnet50_dataset.dat”
- config to “/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b8.config”

Follow steps 3 through 8 of the “Batch1 Accuracy Mode Test”.
Run the validation script.
davinci:/data$ ./check_accuracy.sh ./simaai/applications/MLPerf/mlperf_log_accuracy.json
The output numbers should match the values shown here: accuracy=75.990%, good=37995, total=50000
Use the mpk remove command to free up disk space on the device before running a new test. For details on how to use the mpk commands, see the section on MPK Tool.

Batch14 Performance Mode Test

Go to the MLPerf directory in the Docker container.

sima-user@docker-image-id:/home# cd /usr/local/simaai/app_zoo/Gstreamer/MLPerf

Modify the application.json file to update the parameters, as shown below.

"gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=14:3:224:224 out-dims=14:64 mlperf-run-type=0 mlperf-scenario=2 toy-mode=false output-path=\"/data/simaai/applications/MLPerf\"  inpath=\"/data/mlperf_resnet50_dataset.dat\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config\" silent=true ! fakesink"
Parameter values to change:

- in-dims to 14:3:224:224
- out-dims to 14:64
- mlperf-run-type to 0
- mlperf-scenario to 2
- inpath to “/data/mlperf_resnet50_dataset.dat”
- config to “/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config”

Create the mpk in SDK and deploy.

sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk create -s . -d .
sima-user@docker-image-id:/usr/local/simaai/app_zoo/Gstreamer/MLPerf# mpk deploy -f project.mpk -t <your_device_ip>

Check for memory allocation messages in the device console and cross-check the gst command in the device process tree by running the following command:
davinci:~$ top

Wait until the test is complete (memory deallocation happens) and check for the logs in the device as root in the /data/simaai/applications/MLPerf/ directory. Generally the test takes 20 to 22 minutes. The summary log should appear as shown below.

davinci:~$ sudo cat /data/simaai/applications/MLPerf/mlperf_log_summary.txt
    ================================================
    MLPerf Results Summary
    ================================================
    SUT name :
    Scenario : Offline
    Mode     : PerformanceOnly
    Samples per second: 3397.96
    Result is : VALID
    Min duration satisfied : Yes
    Min queries satisfied : Yes
    Early stopping satisfied: Yes

    ================================================
    Additional Stats
    ================================================
    Min latency (ns)                : 1529062681
    Max latency (ns)                : 922611782310
    Mean latency (ns)               : 462219336486
    50.00 percentile latency (ns)   : 462310204729
    90.00 percentile latency (ns)   : 830534473673
    95.00 percentile latency (ns)   : 876460296204
    97.00 percentile latency (ns)   : 894974228836
    99.00 percentile latency (ns)   : 913361873441
    99.90 percentile latency (ns)   : 921692414874

    ================================================
    Test Parameters Used
    ================================================
    samples_per_query : 3135000
    target_qps : 4750
    target_latency (ns): 0
    max_async_queries : 1
    min_duration (ms): 600000
    max_duration (ms): 0
    min_query_count : 1
    max_query_count : 0
    qsl_rng_seed : 148687905518835231
    sample_index_rng_seed : 520418551913322573
    schedule_rng_seed : 811580660758947900
    accuracy_log_rng_seed : 0
    accuracy_log_probability : 0
    accuracy_log_sampling_target : 0
    print_timestamps : 0
    performance_issue_unique : 0
    performance_issue_same : 0
    performance_issue_same_index : 0
    performance_sample_count : 50000

    1 warning encountered. See detailed log.

    No errors encountered during test.

Verify the samples per second against the reference value shown here: Samples per second: 3397.96.
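
As a cross-check, the Offline throughput in the sample summary appears consistent with the total number of samples divided by the time taken to finish them. The sketch below makes that arithmetic explicit. It assumes that the maximum latency in the Offline log approximates the duration of the whole run, which is an interpretation of the sample numbers rather than a documented guarantee; the function name is hypothetical.

```python
def offline_samples_per_second(total_samples, duration_ns):
    """Throughput as total samples divided by run duration in seconds."""
    return total_samples / (duration_ns / 1e9)


if __name__ == "__main__":
    # samples_per_query and max latency from the Batch14 sample summary.
    rate = offline_samples_per_second(3135000, 922611782310)
    reported = 3397.96
    print(f"derived {rate:.2f}, reported {reported:.2f} samples per second")
```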

Batch14 Accuracy Mode Test

Make sure the Batch14 performance mode test has been run before running the Batch14 accuracy mode test.

Make sure to update the gst command in the application.json as shown below.

"gst": "MLA_OCM=max LD_LIBRARY_PATH=\"/data/simaai/applications/MLPerf/lib\" gst-launch-1.0 --gst-plugin-path='/data/simaai/applications/MLPerf/lib' fakesrc ! ml_filter in-dims=14:3:224:224 out-dims=14:64 mlperf-run-type=1 mlperf-scenario=2 toy-mode=false inpath=\"/data/mlperf_resnet50_dataset.dat\" output-path=\"/data/simaai/applications/MLPerf\" config=\"/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config\" silent=true ! fakesink"
Parameter values to change:

- in-dims to 14:3:224:224
- out-dims to 14:64
- mlperf-run-type to 1
- mlperf-scenario to 2
- inpath to “/data/mlperf_resnet50_dataset.dat”
- config to “/data/simaai/applications/MLPerf/etc/bad_sparse_resnet50_v1_b14.config”

Follow steps 3 through 8 of the “Batch1 Accuracy Mode Test”.