.. _Model Compilation:

Compiling the Quantized Model
#############################

The compiler converts a quantized model into a binary format that can be executed on the SiMa Technologies MLSoC device.


.. function:: Model.compile(output_path: str, batch_size: int = 1, compress: bool = True, log_level: Optional[int] = logging.NOTSET, tessellate_parameters: Optional[TessellateParameters] = None, l2_caching_mode: L2CachingMode = L2CachingMode.NONE)

    :module: Model

    The input to the ``.compile()`` API is a quantized model file with a ``.sima`` suffix that is generated by the ``.quantize()`` API.
    If the quantized model needs to be loaded from an external file, use ``Model.load()``.
    Compiles a network to a ``.tar.gz`` archive that can be incorporated into an MPK package.
    The output file name is the concatenation of the model's name and ``_mpk.tar.gz``.
    The network is compiled for a batch size smaller than or equal to the value of the ``batch_size`` parameter, overriding any batch size that was used by the input model.
    The batch size is chosen by compiler optimizations. The first dimension of each of the input model's tensors must be the batch dimension so that the compiler can adjust the batch size.
    The compiler attempts to detect unsupported use or non-use of a batch dimension, but it cannot always detect this situation.
    The output archive contains a JSON file describing the compiled model (in addition to other files).

    
    :param output_path: Default is the current working directory. A string that defines the folder where the output ``.tar.gz`` file will be written.
    :param batch_size: Default is 1. Desired batch size of the compiled model. The compiler may choose a smaller value.
    :param compress: Default is True. Enable or disable DRAM data compression in the compiled ``.lm`` file.
    :param log_level: Default is ``logging.NOTSET``. Sets the logging level for this API call as described in the Logging section.
    :param tessellate_parameters: Do not use, leave at default value.
    :param l2_caching_mode: Do not use, leave at default value.
   

.. code-block:: python

    from afe.apis.model import Model

    # Load quantized model
    quant_model = Model.load(<quant_model_name>, <path to quantized model file>)

The output of the ``.compile()`` API is a ``.tar.gz`` archive with a name that is derived from the quantized model file name.
The ``.tar.gz`` archive includes the following:

* ``.lm`` files that will be executed on the MLA
* ``.so`` files that will be executed on the Cortex-A65 (only generated if necessary)
* ``.yaml`` file for execution statistics profiling
* JSON file that describes the compiled model

Compile with Default Options
----------------------------

Just specify the output folder path:

.. code-block:: python

    quant_model.compile(output_path=<output_folder_path>)


Compiling for Batch Sizes > 1
-----------------------------

The **desired** batch size of the compiled model can be set like this:

.. code-block:: python

    quant_model.compile(output_path=<output_folder_path>,
                        batch_size=16)

.. note::
    
    There is no guarantee that the specified batch size will be met. The compiler tries to implement the largest batch size possible, up to the number specified by the ``batch_size`` argument. The current release of the Palette software (SDK) does not report which batch size was actually implemented, so you need to open the JSON file included in the ``.tar.gz`` archive and search for ``"desired_batch_size"`` and ``"actual_batch_size"``:

    .. code-block:: json

        "name": "MLA_0",
        "sequence": 3,
        "processor": "MLA",
        "config_params": {
            "desired_batch_size": 16,
            "actual_batch_size": 12,
            "number_of_quads_to_user": 4
        },

In the example above, the ``batch_size`` argument was set to 16, but the compiler was only able to implement a batch size of 12.
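
The manual search described above can be automated. The following is a minimal sketch that uses only the Python standard library. It assumes that the values of interest are stored under the keys ``"desired_batch_size"`` and ``"actual_batch_size"`` somewhere inside a ``.json`` member of the archive. Because the exact JSON file name and layout are not specified here, the sketch scans every ``.json`` member recursively. The function name ``find_batch_sizes`` is hypothetical and is not part of the SDK.

.. code-block:: python

    import json
    import tarfile


    def find_batch_sizes(archive_path):
        """Return (json_member, desired, actual) tuples found in the archive."""
        results = []

        def walk(node, member_name):
            # Recursively visit dicts and lists looking for the batch size keys.
            if isinstance(node, dict):
                if "actual_batch_size" in node:
                    results.append((member_name,
                                    node.get("desired_batch_size"),
                                    node["actual_batch_size"]))
                for value in node.values():
                    walk(value, member_name)
            elif isinstance(node, list):
                for item in node:
                    walk(item, member_name)

        with tarfile.open(archive_path) as archive:
            for member in archive.getmembers():
                # Assumption: the relevant file has a .json suffix.
                if not (member.isfile() and member.name.endswith(".json")):
                    continue
                with archive.extractfile(member) as handle:
                    try:
                        walk(json.load(handle), member.name)
                    except (json.JSONDecodeError, UnicodeDecodeError):
                        continue  # skip members that are not valid JSON
        return results

Calling ``find_batch_sizes`` with the path of the compiled archive returns one tuple per matching JSON object, so a model that is split across several MLA segments may yield several entries.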

Printing tar.gz Contents
------------------------

The current version of the compiler does not report which files have been incorporated into the ``.tar.gz`` archive. The contents of the archive can be printed with a Python script like this:

.. code-block:: python

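    import tarfile

    # Open the archive and print the name of every member it contains.
    # The context manager closes the archive even if an error occurs.
    with tarfile.open(<name_of_archive.tar.gz>) as archive:
        for filename in archive.getnames():
            print(filename)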
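
To check quickly whether the archive contains the expected kinds of files (``.lm``, ``.so``, ``.yaml``, JSON), the member names can be grouped by file suffix. This is a minimal sketch that uses only the standard library. The helper name ``group_members_by_suffix`` is hypothetical and is not part of the SDK.

.. code-block:: python

    import tarfile
    from collections import defaultdict
    from pathlib import PurePosixPath


    def group_members_by_suffix(archive_path):
        """Map each file suffix (for example '.lm') to the member names using it."""
        groups = defaultdict(list)
        with tarfile.open(archive_path) as archive:
            for member in archive.getmembers():
                if member.isfile():  # ignore directories and links
                    suffix = PurePosixPath(member.name).suffix or "(no suffix)"
                    groups[suffix].append(member.name)
        return dict(groups)

An empty or missing ``.so`` group is not necessarily an error, because ``.so`` files are only generated if necessary.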


Per-Layer Runtime Statistics
----------------------------

Whenever a model is compiled, the compiled model ``.tar.gz`` package includes a ``*_mla_stats.yaml`` file that specifies the cycle count of each layer as estimated by the compiler.
For each layer in the model that targets the MLA, the file lists a name of the form ``MLA_*`` together with its start cycle (``start_cycle``) and end cycle (``end_cycle``):


.. code-block:: yaml

    4:
      name: MLA_0/conv2d_add_relu_3
      start_cycle: 63615
      end_cycle: 71558
    5:
      name: MLA_0/conv2d_add_relu_4
      start_cycle: 71559
      end_cycle: 79502


These start and end cycles are based on static scheduling and do not account for stalls caused by instruction and memory fetches. To get full runtime statistics that include memory cycles, run the ``.lm`` models on the hardware using the accelerator mode provided in the Palette software (SDK).
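
The statically scheduled cycle count of each layer can be derived from the start and end cycles. The sketch below assumes that the statistics file parses to a mapping from layer index to an entry with ``name``, ``start_cycle`` and ``end_cycle`` keys, and that both cycle values are inclusive. The second assumption is suggested by the example, where one layer ends at cycle 71558 and the next starts at 71559, but it is not stated explicitly. The function names are hypothetical, and ``load_stats`` relies on the third-party PyYAML package.

.. code-block:: python

    def layer_cycle_counts(stats):
        """Return (layer_name, cycles) pairs in layer-index order.

        Assumes `stats` maps index -> {"name", "start_cycle", "end_cycle"}
        and that start and end cycles are both inclusive.
        """
        rows = []
        for index in sorted(stats):
            entry = stats[index]
            cycles = entry["end_cycle"] - entry["start_cycle"] + 1
            rows.append((entry["name"], cycles))
        return rows


    def load_stats(path):
        """Load a *_mla_stats.yaml file (requires the PyYAML package)."""
        import yaml  # third-party dependency, imported only when needed
        with open(path) as handle:
            return yaml.safe_load(handle)


    # Values copied from the example above.
    example = {
        4: {"name": "MLA_0/conv2d_add_relu_3",
            "start_cycle": 63615, "end_cycle": 71558},
        5: {"name": "MLA_0/conv2d_add_relu_4",
            "start_cycle": 71559, "end_cycle": 79502},
    }

    for name, cycles in layer_cycle_counts(example):
        print(f"{name}: {cycles} cycles")

Under the inclusive-cycle assumption, both example layers come out at 7944 cycles. As with the raw values, these figures are compiler estimates and exclude stalls.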