afe.core.configs
================

.. py:module:: afe.core.configs


Classes
-------

.. autoapisummary::

   afe.core.configs.ModelConfigs
   afe.core.configs.QuantizationPrecision
   afe.core.configs.EmptyValue
   afe.core.configs.Opt
   afe.core.configs.QuantizationConfigs
   afe.core.configs.CompressionConfigs
   afe.core.configs.CalibrationConfigs
   afe.core.configs.PerfThreshold
   afe.core.configs.RelativePerfThreshold
   afe.core.configs.AbsolutePerfThreshold
   afe.core.configs.QuantizationAwarePartitioningConfigs
   afe.core.configs.OptimizationConfigs
   afe.core.configs.RunConfigs
   afe.core.configs.ConvertLayoutMethod
   afe.core.configs.TransformerConfigs
   afe.core.configs.AfeProcessingConfigs


Functions
---------

.. autoapisummary::

   afe.core.configs.initialize_empty_quant_config
   afe.core.configs.merge_quantization_configs
   afe.core.configs.create_quantization_configs
   afe.core.configs.update_quantization_configs
   afe.core.configs.create_testcase_calibration_configs
   afe.core.configs.api_calibration_configs


Module Contents
---------------

.. py:class:: ModelConfigs

   .. py:attribute:: name
      :type: str

   .. py:attribute:: framework
      :type: str

   .. py:attribute:: input_names
      :type: List[str]

   .. py:attribute:: input_shapes
      :type: List[afe.ir.defines.InputShape]

   .. py:attribute:: input_dtypes
      :type: List[afe.ir.tensor_type.ScalarType]

   .. py:attribute:: layout
      :type: str

   .. py:attribute:: model_path
      :type: str
      :value: ''

   .. py:attribute:: model_file_paths
      :type: List[str]
      :value: []

   .. py:attribute:: is_quantized
      :type: bool
      :value: False

   .. py:attribute:: output_names
      :type: Optional[List[str]]
      :value: None

   .. py:attribute:: output_directory
      :type: Optional[str]
      :value: None

   .. py:attribute:: toolbox_config
      :type: bool
      :value: False

   .. py:attribute:: mlc_files
      :type: Optional[str]
      :value: None

   .. py:attribute:: trace_files
      :type: Optional[str]
      :value: None

   .. py:method:: set_default_output_directory(output_directory_path: str)

   .. py:method:: set_absolute_model_path(model_path: str)
   .. py:property:: shape_dict
      :type: Dict[afe.ir.defines.NodeName, afe.ir.defines.InputShape]

   .. py:property:: dtype_dict
      :type: Dict[afe.ir.defines.NodeName, afe.ir.tensor_type.ScalarType]

   .. py:property:: input_shapes_hwc
      :type: Tuple[int, int, int]

   .. py:property:: shape_dict_hwc
      :type: Dict[afe.ir.defines.NodeName, Tuple[int, int, int]]


.. py:class:: QuantizationPrecision

   Generic enumeration.

   Derive from this class to define new enumerations.

   .. py:attribute:: INT_8
      :value: 'int8'

   .. py:attribute:: INT_16
      :value: 'int16'

   .. py:attribute:: BFLOAT_16
      :value: 'bfloat16'

   .. py:attribute:: BFLOAT_16_INT8_WEIGHTS
      :value: 'bfloat16_int8_weights'

   .. py:attribute:: BFLOAT_16_INT4_WEIGHTS
      :value: 'bfloat16_int4_weights'

   .. py:method:: from_string(precision: str) -> QuantizationPrecision
      :staticmethod:

   .. py:method:: to_scalar_type() -> afe.ir.tensor_type.ScalarType

   .. py:method:: to_expected_int_scalar_type() -> afe.ir.tensor_type.ScalarType

   .. py:method:: is_int8_precision() -> bool

   .. py:method:: is_int16_precision() -> bool

   .. py:method:: is_bfloat16_precision() -> bool

   .. py:method:: is_bfloat16_with_int8_weights() -> bool

   .. py:method:: is_bfloat16_with_int_weights() -> bool


.. py:class:: EmptyValue

   An empty value class, used to initialize an empty Opt class.

   .. py:attribute:: empty
      :type: str
      :value: ''


.. py:class:: Opt

   Generic immutable container class having either one value or no value.
   It is used for storing values of QuantizationConfigs fields.

   .. py:attribute:: value
      :type: Union[_T, EmptyValue]

   .. py:method:: merge(option: Opt) -> Opt

   .. py:method:: get()

   .. py:method:: is_empty()


.. py:class:: QuantizationConfigs

   Parameters controlling how to quantize a network.

   Instances should be constructed using one of the construction functions,
   not using the class constructor.

   Fields can be overridden for specific nodes using the
   custom_quantization_configs parameter of UpdateQuantizationConfigs.
   This parameter is accepted by several other functions, as well.
   See individual fields for restrictions on overriding.

   .. py:attribute:: asymmetry
      :type: Opt[bool]

   .. py:attribute:: per_channel
      :type: Opt[bool]

   .. py:attribute:: leaky_relu_uses_udf
      :type: Opt[bool]

   .. py:attribute:: quantization_precision
      :type: Opt[QuantizationPrecision]

   .. py:attribute:: quantization_sensitivity
      :type: Opt[int]

   .. py:attribute:: intermediate_int32
      :type: Opt[bool]

   .. py:attribute:: biascorr_type
      :type: Opt[afe.ir.defines.BiasCorrectionType]

   .. py:attribute:: output_int32
      :type: Opt[bool]

   .. py:attribute:: requantization_mode
      :type: Opt[afe.ir.defines.RequantizationMode]

   .. py:attribute:: channel_equalization
      :type: Opt[bool]

   .. py:attribute:: smooth_quant
      :type: Opt[bool]


.. py:function:: initialize_empty_quant_config()

   Helper function for initializing an empty QuantizationConfigs.


.. py:function:: merge_quantization_configs(*, config1: QuantizationConfigs, config2: QuantizationConfigs) -> QuantizationConfigs

   Merge two QuantizationConfigs. Values from config1 take priority: wherever
   config1 has a value, the corresponding value from config2 is discarded.
   The higher-priority configuration should therefore be passed as config1.


.. py:function:: create_quantization_configs(*, asymmetry: bool = True, per_channel: bool = False, leaky_relu_uses_udf: bool = True, quantization_precision: QuantizationPrecision = QuantizationPrecision.INT_8, quantization_sensitivity: int = 0, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, intermediate_int32: bool = False, biascorr_type: afe.ir.defines.BiasCorrectionType = BiasCorrectionType.NONE, output_int32: bool = False, channel_equalization: bool = False, smooth_quant: bool = False) -> QuantizationConfigs

   Construct QuantizationConfigs.

   :param asymmetry: Whether to use asymmetric quantization.
   :param per_channel: Whether to use per-channel quantization.
   :param leaky_relu_uses_udf: Whether to use a UDF instead of arithmetic instructions for quantization.
   :param quantization_precision: Precision used during quantization.
   :param quantization_sensitivity: Sensitivity for mixed-precision quantization.
   :param requantization_mode: A way of doing quantized arithmetic.
   :param intermediate_int32: Whether to use wide node outputs during quantization.
   :param biascorr_type: Method to correct for quantization-induced bias: None/Regular/Iterative.
   :param output_int32: Whether to use the int32 numeric type in the output of convolution-related operators.
   :param channel_equalization: Whether to enable channel equalization.
   :param smooth_quant: Whether to enable smooth quant.
   :return: QuantizationConfigs


.. py:function:: update_quantization_configs(quantization_configs: QuantizationConfigs, field_name: str, value: Any) -> None

   Given a field name and a value, update the attribute of the given
   QuantizationConfigs whose name matches the field name, assigning it the
   given value.

   :param quantization_configs: QuantizationConfigs to update.
   :param field_name: str. Name of the target attribute in the given QuantizationConfigs object.
   :param value: Any. Value that is going to be assigned to the attribute.


.. py:class:: CompressionConfigs

   .. py:attribute:: compress
      :type: bool
      :value: False


.. py:class:: CalibrationConfigs

   Parameters for calibration. Instances should be constructed using one of
   the construction functions.

   :attribute calibration_method: CalibrationMethod used during calibration.
       See the CalibrationMethod Enum class for currently supported methods.
   :attribute num_calibration_samples: int. Limit on the number of data
       samples used to feed inputs to the AwesomeNet during calibration. If
       None, all data samples that are passed to calibration are used.
   :attribute percentile_value: Optional[float]. Percentage of values to keep
       when using histogram percentile.

   .. py:attribute:: calibration_method
      :type: afe.apis.defines.CalibrationMethod

   .. py:attribute:: num_calibration_samples
      :type: Optional[int]
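The priority rule of ``merge_quantization_configs`` described above can be illustrated with a minimal, self-contained stand-in. The classes below only mimic the documented ``Opt``/``EmptyValue`` semantics (a field merge where a non-empty value in config1 wins); they are illustrative sketches, not the actual afe implementation:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

_T = TypeVar("_T")


class EmptyValue:
    """Stand-in marker for an absent value, mirroring afe's EmptyValue."""


@dataclass(frozen=True)
class Opt(Generic[_T]):
    """Minimal stand-in for afe's Opt: holds either one value or no value."""
    value: Union[_T, EmptyValue] = EmptyValue()

    def is_empty(self) -> bool:
        return isinstance(self.value, EmptyValue)

    def merge(self, option: "Opt") -> "Opt":
        # A non-empty left-hand value wins; an empty one defers to the right.
        return option if self.is_empty() else self


# Field-wise merge with config1 priority, as documented above:
asym1, asym2 = Opt(True), Opt(False)
assert asym1.merge(asym2).value is True      # config1's value wins
assert Opt().merge(asym2).value is False     # empty field falls back to config2
```

Merging field by field with this rule is what lets partially-specified configurations (e.g. the result of ``initialize_empty_quant_config``) be layered over a fully-populated default.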
.. py:function:: create_testcase_calibration_configs(num_calibration_samples: int, calibration_method: afe.apis.defines.CalibrationMethod = MinMaxMethod()) -> CalibrationConfigs

   Construct CalibrationConfigs using parameters from the network test
   configuration data.

   :param num_calibration_samples: Maximum number of calibration data samples to use for calibration.
   :param calibration_method: CalibrationMethod used in calibration. See the CalibrationMethod Enum class for supported values.
   :return: Constructed value


.. py:function:: api_calibration_configs(calibration_method: afe.apis.defines.CalibrationMethod = MinMaxMethod()) -> CalibrationConfigs

   Construct CalibrationConfigs using user-specified parameters.

   :param calibration_method: CalibrationMethod used in calibration. See the CalibrationMethod Enum class for supported values.
   :param percentile_value: Optional[float]. In case of the histogram percentile observer, configures the percentage of values to keep in the histogram.
   :return: Constructed value


.. py:class:: PerfThreshold

   Used to store the threshold value for quantized model performance.

   .. py:method:: set_threshold(fp32_perf: float) -> float
      :abstractmethod:


.. py:class:: RelativePerfThreshold

   Used to store the threshold value for quantized model performance.
   The threshold value is given as a value relative to the floating-point
   model performance.

   :param rel_value: float. Quantized model performance threshold value relative to the floating-point model performance.

   .. py:attribute:: rel_value
      :type: float

   .. py:method:: set_threshold(fp32_perf: float) -> float

      Returns the threshold for quantized model performance relative to the
      floating-point model performance.

      :param fp32_perf: float. Floating-point model performance.
      :return: float. Threshold value relative to floating-point model performance.


.. py:class:: AbsolutePerfThreshold

   Used to store the threshold value for quantized model performance.
   The threshold value is given as an absolute value.
   :param abs_value: float. Quantized model performance threshold value.

   .. py:attribute:: abs_value
      :type: float

   .. py:method:: set_threshold(fp32_perf: float) -> float

      Returns the threshold for quantized model performance.

      :param fp32_perf: float. Unused. Floating-point model performance.
      :return: float. Quantized model performance threshold value.


.. py:class:: QuantizationAwarePartitioningConfigs

   Config used for quantization-aware partitioning.

   :attribute performance_threshold: PerfThreshold. Value used as a target for
       quantized model performance, given either as an absolute value or as a
       value relative to the floating-point model performance.
   :attribute target_performance_mode: PerformanceMode. Whether the target
       performance is given as an absolute value or as a value relative to
       the floating-point model performance.
   :attribute max_iterations: int. Maximum number of iterations in the QAP
       loop. Represents the maximum number of layers that may be fixed to
       floating point while performing quantization-aware partitioning.
   :attribute graph_analyzer_mode: QuantizedGraphAnalyzerMode. Graph analysis execution mode.
   :attribute graph_analyzer_metric: str. Metric used in graph analysis.
   :attribute graph_analyzer_number_of_samples: int. Number of input samples to be used in graph analysis.

   .. py:attribute:: performance_threshold
      :type: PerfThreshold

   .. py:attribute:: max_iterations
      :type: int
      :value: 1

   .. py:attribute:: graph_analyzer_mode
      :type: afe.core.graph_analyzer.utils.QuantizedGraphAnalyzerMode

   .. py:attribute:: graph_analyzer_metric
      :type: afe.core.graph_analyzer.utils.Metric

   .. py:attribute:: graph_analyzer_number_of_samples
      :type: int
      :value: 2


.. py:class:: OptimizationConfigs

   Class for holding the configuration information used by the OptimizerClass.

   .. py:attribute:: strategy
      :type: str
      :value: 'sequential'

   .. py:attribute:: calibration_configs
      :type: CalibrationConfigs
   .. py:attribute:: quantization_configs
      :type: QuantizationConfigs

   .. py:attribute:: compression_configs
      :type: CompressionConfigs


.. py:class:: RunConfigs

   Configuration parameters for how to execute networks in software emulation.

   :attribute fast_mode: If True, use a fast implementation of an operator;
       results may not match the result of executing on the MLA. If False,
       use an implementation that exactly matches execution on the MLA.

   .. py:attribute:: fast_mode
      :type: bool
      :value: False


.. py:class:: ConvertLayoutMethod

   Enumeration specifying the layout conversion algorithm.

   The common processing flow requires the model to be converted to the MLA's
   native layout, 'NHWC'. This enumeration specifies which algorithm is used
   for layout conversion. Currently, the 'legacy' algorithm is tested and
   proven on most CNN models. The 'automated' algorithm is under development
   and is aimed mostly at ViT models. The 'none' option is used in internal
   test cases and specifies a processing flow where layout conversion is
   skipped; it should not be used in the general model processing pipeline.

   .. py:attribute:: NONE
      :value: 'none'

      No layout conversion. This is a backward-compatible value designating
      cases where the layout conversion transform is disabled in some test
      cases. Since the layout conversion algorithm should eventually run for
      all test cases, this value will be deprecated in the future.

   .. py:attribute:: LEGACY
      :value: 'legacy'

      Legacy algorithm using TVM's ConvertLayout pass.

   .. py:attribute:: AUTOMATED
      :value: 'automated'

      Algorithm doing automatic rewrite for MLA-supported layouts.

   .. py:method:: from_str(method: str) -> ConvertLayoutMethod
      :staticmethod:

      Helper method for constructing an enumeration instance from a string.

      :param method: String parameter defining the layout conversion method.
      :returns: The ConvertLayoutMethod Enum instance.
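The string-to-member mapping performed by ``from_str`` can be sketched with Python's built-in ``enum`` machinery. This is an illustrative re-implementation using only the member values documented above (the class name here is hypothetical, not the actual afe code):

```python
from enum import Enum


class ConvertLayoutMethodSketch(Enum):
    # Member values mirror those documented for ConvertLayoutMethod.
    NONE = "none"
    LEGACY = "legacy"
    AUTOMATED = "automated"

    @staticmethod
    def from_str(method: str) -> "ConvertLayoutMethodSketch":
        # Enum lookup by value; an unknown string raises ValueError.
        return ConvertLayoutMethodSketch(method)


assert ConvertLayoutMethodSketch.from_str("legacy") is ConvertLayoutMethodSketch.LEGACY
assert ConvertLayoutMethodSketch.from_str("none") is ConvertLayoutMethodSketch.NONE
```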
.. py:class:: TransformerConfigs(convert_layout_method: ConvertLayoutMethod = ConvertLayoutMethod.LEGACY, enable_graph_partition: bool = True, backend_indices_dict: dict[afe.backends.Backend, list[int | tuple[int, Ellipsis]]] | None = None, enable_quantization_based_partitioning: bool = False, *, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, enabled_backends: afe._tvm._tvm_graph_partition.CompileMode = CompileMode.MLA_EV74_CPU)

   Class holding the configuration information used by GraphTransformer and
   Partitioner.

   :attribute convert_layout_method: Specifies the algorithm used for converting the model layout.
   :attribute enable_graph_partition: bool. Whether to apply graph partitioning to the model.
   :attribute indices_to_backend_dict: Optional[Dict[int, afe.backends.Backend]].
       Dictionary mapping layer indices to their targeted backends, if any.
       If an {index: target_backend} pair is present in the dictionary, the
       layer with the given index will be executed on the target_backend
       Backend. If an index is absent from the dictionary, the layer with
       that index will be executed on the highest-priority backend that
       supports that layer.
   :attribute enable_quantization_based_partitioning: bool. Whether to apply quantization-based partitioning.
   :attribute requantization_mode: How to convert TVM quantized operators to
       SiMa IR quantized operators. Only quantized TVM operators that are
       assigned to the MLA are affected.
   :attribute enabled_backends: Which set of backends to assign nodes to in
       graph partitioning. Any assignment in backend_indices_dict overrides
       this parameter.
   Example
   -------

   The following example creates a TransformerConfigs that converts the
   layout to NHWC, enables graph partitioning, and assigns the nodes with
   indices 1, 13, and 22 to the APU::

       backend_indices_dict = {Backend.APU: [1, 13, 22]}
       transformer_configs = TransformerConfigs(
           convert_layout_method=ConvertLayoutMethod.LEGACY,
           enable_graph_partition=True,
           backend_indices_dict=backend_indices_dict)

   .. py:attribute:: convert_layout_method
      :type: ConvertLayoutMethod

   .. py:attribute:: enable_graph_partition
      :type: bool

   .. py:attribute:: indices_to_backend_dict
      :type: dict[int, afe.backends.Backend]

   .. py:attribute:: enable_quantization_based_partitioning
      :type: bool

   .. py:attribute:: requantization_mode
      :type: afe.ir.defines.RequantizationMode

   .. py:attribute:: enabled_backends
      :type: afe._tvm._tvm_graph_partition.CompileMode

   .. py:property:: convert_layout
      :type: bool

      Property defining whether the layout conversion algorithm is enabled.

      This property exists for backward compatibility and should be used only
      in some helper test functions (i.e. determining whether model inputs
      and/or outputs should be transposed during the test run). It should not
      be used to define any aspect of the TVM transformations, as it will be
      deprecated in the near future.

      :returns: Boolean flag determining whether the layout conversion
          algorithm is run during TVM transformations.


.. py:class:: AfeProcessingConfigs

   Dataclass holding all the configuration information used in end-to-end
   processing.

   :attribute model_configs: ModelConfigs. Configuration information on the model that is being processed.
   :attribute transformer_configs: TransformerConfigs. Configuration information on transformations being used in model processing.
   :attribute optimization_configs: OptimizationConfigs. Configuration information on optimizations being used in processing.
   :attribute qap_configs: QuantizationAwarePartitioningConfigs. Configuration information used in the quantization-aware partitioning algorithm.
   :attribute target: A target platform that a model is compiled for.

   .. py:attribute:: model_configs
      :type: ModelConfigs

   .. py:attribute:: transformer_configs
      :type: TransformerConfigs

   .. py:attribute:: optimization_configs
      :type: OptimizationConfigs

   .. py:attribute:: qap_configs
      :type: QuantizationAwarePartitioningConfigs

   .. py:attribute:: target
      :type: sima_utils.common.Platform
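The two PerfThreshold variants documented earlier differ only in how ``set_threshold`` interprets the floating-point performance. A minimal sketch is below; the class names are hypothetical stand-ins, and it assumes the relative variant scales the fp32 performance by ``rel_value``, which the docstrings imply but do not state explicitly:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class PerfThresholdSketch(ABC):
    """Stand-in for the abstract PerfThreshold interface."""

    @abstractmethod
    def set_threshold(self, fp32_perf: float) -> float: ...


@dataclass
class RelativePerfThresholdSketch(PerfThresholdSketch):
    rel_value: float

    def set_threshold(self, fp32_perf: float) -> float:
        # Assumption: the relative threshold scales the fp32 performance.
        return fp32_perf * self.rel_value


@dataclass
class AbsolutePerfThresholdSketch(PerfThresholdSketch):
    abs_value: float

    def set_threshold(self, fp32_perf: float) -> float:
        # fp32_perf is unused; the stored absolute value is the threshold.
        return self.abs_value


# e.g. accept a quantized model that reaches 95% of fp32 accuracy,
# or one that reaches an absolute accuracy of 0.7, whichever policy is chosen.
assert RelativePerfThresholdSketch(0.95).set_threshold(0.8) == 0.8 * 0.95
assert AbsolutePerfThresholdSketch(0.7).set_threshold(0.8) == 0.7
```

Either variant can then be stored in the ``performance_threshold`` field of a QuantizationAwarePartitioningConfigs-style object, which only depends on the abstract ``set_threshold`` interface.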