compile
Through the afe.apis.model.Model.compile() API, the compiler will convert a quantized model into a binary format that can be executed on the SiMa MLSoC device.
from afe.apis.model import Model
# Load quantized model
quant_model = Model.load(<quant_model_name>, <path to quantized model file>)
The output of the .compile() API is a .tar.gz archive whose name is derived from the quantized model file name.
The .tar.gz archive includes the following:
.lm files that will be executed on the MLA
.so files that will be executed on the Cortex A65 (only generated if necessary)
.yaml file for execution statistics profiling
.json files that configure the various processor plugins
Compile with Default Options
Just specify the output folder path:
quant_model.compile(output_path=<output_folder_path>)
Compiling for Batch Sizes > 1
The desired batch size of the compiled model can be set like this:
quant_model.compile(output_path=<output_folder_path>, batch_size=16)
Note
There is no guarantee that the specified batch size will be met; the compiler tries to implement the largest batch size possible, up to the number specified by the batch_size argument. The current release of the Palette software (SDK) does not report the implemented batch size, so you will need to search for desired_batch_size and actual_batch_size in the JSON file included in the .tar.gz file.
"name": "MLA_0",
"sequence": 3,
"processor": "MLA",
"config_params": {
"desired_batch_size": 16,
"actual_batch_size": 12,
"number_of_quads_to_user": 4
},
In the example above, the batch_size argument was set to 16 but the compiler was only able to implement a batch size of 12.
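The manual search described in the note can be scripted. The following is a minimal sketch, not part of the SDK: the helper names find_batch_sizes and batch_sizes_in_archive are hypothetical, and the code assumes only that the batch size keys sit inside a "config_params" object next to a "name" key, as in the fragment above. The rest of the JSON layout is unknown, so the search walks the whole structure.

```python
import json
import tarfile


def find_batch_sizes(node, found=None):
    """Recursively collect (name, desired, actual) tuples from parsed JSON.

    Assumption: the batch size keys live in a "config_params" object,
    as in the fragment shown in the documentation.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        params = node.get("config_params")
        if isinstance(params, dict) and "actual_batch_size" in params:
            found.append((node.get("name"),
                          params.get("desired_batch_size"),
                          params["actual_batch_size"]))
        for value in node.values():
            find_batch_sizes(value, found)
    elif isinstance(node, list):
        for item in node:
            find_batch_sizes(item, found)
    return found


def batch_sizes_in_archive(archive_path):
    """Scan every .json member of a compiled .tar.gz for batch size entries."""
    results = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            with archive.extractfile(member) as handle:
                try:
                    data = json.load(handle)
                except ValueError:
                    # Skip members that are not valid JSON
                    continue
            results.extend(find_batch_sizes(data))
    return results
```

If the compiled archive contains the fragment above, batch_sizes_in_archive(<name_of_archive.tar.gz>) would be expected to include the tuple ("MLA_0", 16, 12), which makes a mismatch between the desired and actual batch size easy to spot.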
Printing tar.gz Contents
The current version of the compiler does not list the files it places in the .tar.gz archive. The contents of the archive can be printed with a Python script such as the one shown below.
import tarfile

# Open the compiled archive and print the name of every member
with tarfile.open(<name_of_archive.tar.gz>) as archive:
    for filename in archive.getnames():
        print(filename)
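To compare the archive against the file types listed earlier (.lm, .so, .yaml, .json), the members can also be grouped by extension. This is only a sketch using the Python standard library; the function name group_archive_members is hypothetical and not part of the SDK.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_archive_members(archive_path):
    """Return a dict mapping file extension to the member names in the archive."""
    groups = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                # Tar member names use forward slashes, hence PurePosixPath
                suffix = PurePosixPath(member.name).suffix or "(no extension)"
                groups[suffix].append(member.name)
    return dict(groups)
```

Printing the returned dictionary shows at a glance whether, for example, any .so files were generated for the Cortex A65.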
Per-Layer Runtime Statistics
Whenever a model is compiled, the compiled model .tar.gz package includes a *_mla_stats.yaml file that specifies the cycle count of each layer as estimated by the compiler.
For each layer in the model that targets the MLA, there is a name (MLA_*), a start cycle (start_cycle), and an end cycle (end_cycle):
4:
  name: MLA_0/conv2d_add_relu_3
  start_cycle: 63615
  end_cycle: 71558
5:
  name: MLA_0/conv2d_add_relu_4
  start_cycle: 71559
  end_cycle: 79502
These start and end cycles are based on static scheduling and do not account for stalls caused by instruction and memory fetches. To get full runtime statistics that include memory cycles, you can run the .lm models on the hardware using the accelerator mode provided in the Palette software (SDK).
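As an illustration of how the statically scheduled figures might be used, the sketch below turns the start and end cycles into a per-layer cycle count. It rests on several assumptions: the function name layer_cycle_counts is hypothetical, the input is the mapping that a YAML parser (for example PyYAML's yaml.safe_load) would produce from the *_mla_stats.yaml file, and the cycle range is treated as inclusive because each layer in the sample starts one cycle after the previous one ends.

```python
def layer_cycle_counts(stats):
    """Return (layer name, estimated cycles) pairs in schedule order.

    `stats` maps a layer index to a dict with "name", "start_cycle"
    and "end_cycle", mirroring the YAML fragment in the documentation.
    """
    rows = []
    for index in sorted(stats):
        layer = stats[index]
        # Assumption: start and end cycles are both inclusive
        cycles = layer["end_cycle"] - layer["start_cycle"] + 1
        rows.append((layer["name"], cycles))
    return rows


# Sample data copied from the fragment above
sample = {
    4: {"name": "MLA_0/conv2d_add_relu_3", "start_cycle": 63615, "end_cycle": 71558},
    5: {"name": "MLA_0/conv2d_add_relu_4", "start_cycle": 71559, "end_cycle": 79502},
}

for name, cycles in layer_cycle_counts(sample):
    print(f"{name}: {cycles} cycles")
```

For the two sample layers this yields 7944 cycles each. As noted above, these are compiler estimates and exclude stalls from instruction and memory fetches.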