.. _post_training_quantization:

Post Training Quantization
##########################

Post-Training Quantization (PTQ) is a key feature of the ModelSDK that enables developers to optimize machine learning models for efficient execution on the SiMa MLSoC. PTQ reduces model size and improves inference speed by converting high-precision data types (such as float32) to lower-precision formats such as ``int8`` or ``int16``; a short sketch of this conversion, and of the accuracy check it enables, appears at the end of this page. This approach is particularly beneficial because it can significantly improve computational performance with minimal impact on model accuracy.

The PTQ workflow is straightforward: developers load a pre-trained model, quantize it, evaluate its accuracy, and compile it for execution on the MLSoC. PTQ is suitable for most models, offering an efficient balance between speed and accuracy. It reduces the memory footprint, accelerates inference, and optimizes resource usage, making it an excellent first step in model optimization for deployment.

The PTQ workflow illustrated below includes loading a model, quantizing it to ``int8`` or ``int16``, evaluating accuracy, and compiling it for execution.

.. image:: ../palette/media/ModelSDK.png
   :scale: 35%
   :align: center

|

In cases where PTQ does not meet the desired accuracy, developers can explore more advanced methods such as Quantization Aware Training (QAT). However, for many scenarios, PTQ offers a practical, fast, and effective path to optimizing machine learning models for the SiMa MLSoC.

Follow this step-by-step :ref:`guide ` to learn how to use PTQ to quantize a ResNet50 model.

.. toctree::
   :maxdepth: 2
   :hidden:

   load_model.rst
   quantize.rst
   compilation.rst
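To make the conversion concrete, the following is a minimal sketch of affine quantization, the standard arithmetic behind the float32-to-``int8``/``int16`` step. It is illustrative only, not the ModelSDK implementation (which derives scales and zero points from calibration data), and the function names are our own.

.. code-block:: python

   # Minimal affine quantization sketch: map float32 values onto a signed
   # integer grid via a scale and zero point, then map back and measure the
   # round-trip error. Not the ModelSDK implementation; names are illustrative.
   import numpy as np

   def quantize_affine(x: np.ndarray, num_bits: int = 8):
       """Quantize a float32 tensor onto a signed num_bits integer grid."""
       qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
       x_min, x_max = float(x.min()), float(x.max())
       scale = max(x_max - x_min, 1e-8) / (qmax - qmin)  # guard constant tensors
       zero_point = int(np.clip(round(qmin - x_min / scale), qmin, qmax))
       q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
       return q, scale, zero_point

   def dequantize_affine(q, scale, zero_point):
       """Recover an approximation of the original float32 values."""
       return (q.astype(np.float32) - zero_point) * scale

   weights = np.random.randn(1000).astype(np.float32)
   q8, s8, z8 = quantize_affine(weights, num_bits=8)
   q16, s16, z16 = quantize_affine(weights, num_bits=16)
   err8 = np.abs(dequantize_affine(q8, s8, z8) - weights).mean()
   err16 = np.abs(dequantize_affine(q16, s16, z16) - weights).mean()
   print(f"mean abs error  int8: {err8:.6f}   int16: {err16:.6f}")

The comparison at the end shows the trade-off mentioned above: ``int16`` preserves considerably more precision than ``int8``, at the cost of a larger memory footprint.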
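The "evaluate accuracy" stage of the workflow can be illustrated in the same spirit: run one input through a toy float32 layer and through its quantized counterpart, then compare the outputs. This reuses ``quantize_affine`` and ``dequantize_affine`` from the sketch above; the load and compile stages are SDK- and MLSoC-specific and are covered in the step-by-step guide.

.. code-block:: python

   # Conceptual accuracy check: compare a float32 layer against the same
   # layer with int8-quantized weights. Reuses quantize_affine and
   # dequantize_affine from the previous sketch.
   rng = np.random.default_rng(0)
   W = rng.standard_normal((64, 128)).astype(np.float32)  # toy layer weights
   x = rng.standard_normal(128).astype(np.float32)        # toy input

   y_fp32 = W @ x                                  # float32 reference output

   qW, scale, zp = quantize_affine(W, num_bits=8)  # PTQ quantizes weights offline
   y_int8 = dequantize_affine(qW, scale, zp) @ x   # inference with quantized weights

   rel_err = np.abs(y_int8 - y_fp32).mean() / np.abs(y_fp32).mean()
   print(f"mean relative output error with int8 weights: {rel_err:.4%}")

If the measured error (or, on a real model, the task accuracy) falls short of requirements, the workflow suggests re-quantizing to ``int16`` or moving to Quantization Aware Training, as described above.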