afe.core.configs
================

.. py:module:: afe.core.configs


Classes
-------

.. autoapisummary::

   afe.core.configs.ModelConfigs
   afe.core.configs.QuantizationPrecision
   afe.core.configs.EmptyValue
   afe.core.configs.Opt
   afe.core.configs.QuantizationConfigs
   afe.core.configs.CompressionConfigs
   afe.core.configs.CalibrationConfigs
   afe.core.configs.PerfThreshold
   afe.core.configs.RelativePerfThreshold
   afe.core.configs.AbsolutePerfThreshold
   afe.core.configs.QuantizationAwarePartitioningConfigs
   afe.core.configs.OptimizationConfigs
   afe.core.configs.RunConfigs
   afe.core.configs.ConvertLayoutMethod
   afe.core.configs.TransformerConfigs
   afe.core.configs.AfeProcessingConfigs


Functions
---------

.. autoapisummary::

   afe.core.configs.initialize_empty_quant_config
   afe.core.configs.merge_quantization_configs
   afe.core.configs.create_quantization_configs
   afe.core.configs.update_quantization_configs
   afe.core.configs.create_testcase_calibration_configs
   afe.core.configs.api_calibration_configs


Module Contents
---------------

.. py:class:: ModelConfigs

   .. py:attribute:: name
      :type: str

   .. py:attribute:: framework
      :type: str

   .. py:attribute:: input_names
      :type: List[str]

   .. py:attribute:: input_shapes
      :type: List[afe.ir.defines.InputShape]

   .. py:attribute:: input_dtypes
      :type: List[afe.ir.tensor_type.ScalarType]

   .. py:attribute:: layout
      :type: str

   .. py:attribute:: model_path
      :type: str
      :value: ''

   .. py:attribute:: model_file_paths
      :type: List[str]
      :value: []

   .. py:attribute:: is_quantized
      :type: bool
      :value: False

   .. py:attribute:: output_names
      :type: Optional[List[str]]
      :value: None

   .. py:attribute:: output_directory
      :type: Optional[str]
      :value: None

   .. py:attribute:: toolbox_config
      :type: bool
      :value: False

   .. py:attribute:: mlc_files
      :type: Optional[str]
      :value: None

   .. py:attribute:: trace_files
      :type: Optional[str]
      :value: None

   .. py:method:: set_default_output_directory(output_directory_path: str)

   .. py:method:: set_absolute_model_path(model_path: str)
   .. py:property:: shape_dict
      :type: Dict[afe.ir.defines.NodeName, afe.ir.defines.InputShape]

   .. py:property:: dtype_dict
      :type: Dict[afe.ir.defines.NodeName, afe.ir.tensor_type.ScalarType]

   .. py:property:: input_shapes_hwc
      :type: Tuple[int, int, int]

   .. py:property:: shape_dict_hwc
      :type: Dict[afe.ir.defines.NodeName, Tuple[int, int, int]]


.. py:class:: QuantizationPrecision

   Generic enumeration.

   Derive from this class to define new enumerations.

   .. py:attribute:: INT_8
      :value: 'int8'

   .. py:attribute:: INT_16
      :value: 'int16'

   .. py:attribute:: BFLOAT_16
      :value: 'bfloat16'

   .. py:attribute:: BFLOAT_16_INT8_WEIGHTS
      :value: 'bfloat16_int8_weights'

   .. py:attribute:: BFLOAT_16_INT4_WEIGHTS
      :value: 'bfloat16_int4_weights'

   .. py:method:: from_string(precision: str) -> QuantizationPrecision
      :staticmethod:

   .. py:method:: to_scalar_type() -> afe.ir.tensor_type.ScalarType

   .. py:method:: to_expected_int_scalar_type() -> afe.ir.tensor_type.ScalarType

   .. py:method:: is_int8_precision() -> bool

   .. py:method:: is_int16_precision() -> bool

   .. py:method:: is_bfloat16_precision() -> bool

   .. py:method:: is_bfloat16_with_int8_weights() -> bool

   .. py:method:: is_bfloat16_with_int_weights() -> bool


.. py:class:: EmptyValue

   An empty value class, used to initialize an empty Opt class.

   .. py:attribute:: empty
      :type: str
      :value: ''


.. py:class:: Opt

   Generic immutable container class having either one value or no value.
   It is used for storing values of QuantizationConfigs fields.

   .. py:attribute:: value
      :type: Union[_T, EmptyValue]

   .. py:method:: merge(option: Opt) -> Opt

   .. py:method:: get()

   .. py:method:: is_empty()


.. py:class:: QuantizationConfigs

   Parameters controlling how to quantize a network.

   Instances should be constructed using one of the construction functions,
   not using the class constructor.

   Fields can be overridden for specific nodes using the
   custom_quantization_configs parameter of UpdateQuantizationConfigs.
   This parameter is accepted by several other functions, as well.
   See individual fields for restrictions on overriding.

   .. py:attribute:: asymmetry
      :type: Opt[bool]

   .. py:attribute:: per_channel
      :type: Opt[bool]

   .. py:attribute:: leaky_relu_uses_udf
      :type: Opt[bool]

   .. py:attribute:: quantization_precision
      :type: Opt[QuantizationPrecision]

   .. py:attribute:: quantization_sensitivity
      :type: Opt[int]

   .. py:attribute:: intermediate_int32
      :type: Opt[bool]

   .. py:attribute:: biascorr_type
      :type: Opt[afe.ir.defines.BiasCorrectionType]

   .. py:attribute:: output_int32
      :type: Opt[bool]

   .. py:attribute:: requantization_mode
      :type: Opt[afe.ir.defines.RequantizationMode]

   .. py:attribute:: channel_equalization
      :type: Opt[bool]

   .. py:attribute:: smooth_quant
      :type: Opt[bool]


.. py:function:: initialize_empty_quant_config()

   Helper function for initializing an empty QuantizationConfigs.


.. py:function:: merge_quantization_configs(*, config1: QuantizationConfigs, config2: QuantizationConfigs) -> QuantizationConfigs

   Merge two QuantizationConfigs. Values from config1 take priority: wherever
   config1 has a value, the corresponding value from config2 is discarded.
   The higher-priority configuration should therefore be passed as config1.


.. py:function:: create_quantization_configs(*, asymmetry: bool = True, per_channel: bool = False, leaky_relu_uses_udf: bool = True, quantization_precision: QuantizationPrecision = QuantizationPrecision.INT_8, quantization_sensitivity: int = 0, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, intermediate_int32: bool = False, biascorr_type: afe.ir.defines.BiasCorrectionType = BiasCorrectionType.NONE, output_int32: bool = False, channel_equalization: bool = False, smooth_quant: bool = False) -> QuantizationConfigs

   Construct QuantizationConfigs.

   :param asymmetry: Whether to use asymmetric quantization.
   :param per_channel: Whether to use per-channel quantization.
   :param leaky_relu_uses_udf: Whether to use a UDF instead of arithmetic instructions for quantization.
   :param quantization_precision: Precision used during quantization.
   :param quantization_sensitivity: Sensitivity for mixed-precision quantization.
   :param requantization_mode: A way of doing quantized arithmetic.
   :param intermediate_int32: Whether to use wide node outputs during quantization.
   :param biascorr_type: Method to correct for quantization-induced bias: None/Regular/Iterative.
   :param output_int32: Whether to use the int32 numeric type in the output of convolution-related operators.
   :param channel_equalization: Whether to enable channel equalization.
   :param smooth_quant: Whether to enable smooth quant.
   :return: QuantizationConfigs


.. py:function:: update_quantization_configs(quantization_configs: QuantizationConfigs, field_name: str, value: Any) -> None

   Given a field name and a value, update the attribute of the given
   QuantizationConfigs whose name matches the field name, assigning it the
   given value.

   :param quantization_configs: QuantizationConfigs to update.
   :param field_name: str. Name of the target attribute in the given QuantizationConfigs object.
   :param value: Any. Value that is going to be assigned to the attribute.


.. py:class:: CompressionConfigs

   .. py:attribute:: compress
      :type: bool
      :value: False


.. py:class:: CalibrationConfigs

   Parameters for calibration. Instances should be constructed using one of
   the construction functions.

   :attribute calibration_method: CalibrationMethod used during calibration.
       See the CalibrationMethod Enum class for currently supported methods.
   :attribute num_calibration_samples: int. Limit on the number of data
       samples used to feed inputs to the AwesomeNet during calibration. If
       None, all data samples that are passed to calibration are used.
   :attribute percentile_value: Optional[float]. Percentage of values to keep
       when using histogram percentile.

   .. py:attribute:: calibration_method
      :type: afe.apis.defines.CalibrationMethod

   .. py:attribute:: num_calibration_samples
      :type: Optional[int]
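The priority rule of ``merge_quantization_configs`` described above can be illustrated with a minimal, self-contained stand-in. The classes below only mimic the documented ``Opt``/``EmptyValue`` semantics (a field merge where a non-empty value in config1 wins); they are illustrative sketches, not the actual afe implementation:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

_T = TypeVar("_T")


class EmptyValue:
    """Stand-in marker for an absent value, mirroring afe's EmptyValue."""


@dataclass(frozen=True)
class Opt(Generic[_T]):
    """Minimal stand-in for afe's Opt: holds either one value or no value."""
    value: Union[_T, EmptyValue] = EmptyValue()

    def is_empty(self) -> bool:
        return isinstance(self.value, EmptyValue)

    def merge(self, option: "Opt") -> "Opt":
        # A non-empty left-hand value wins; an empty one defers to the right.
        return option if self.is_empty() else self


# Field-wise merge with config1 priority, as documented above:
asym1, asym2 = Opt(True), Opt(False)
assert asym1.merge(asym2).value is True      # config1's value wins
assert Opt().merge(asym2).value is False     # empty field falls back to config2
```

Merging field by field with this rule is what lets partially-specified configurations (e.g. the result of ``initialize_empty_quant_config``) be layered over a fully-populated default.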
.. py:function:: create_testcase_calibration_configs(num_calibration_samples: int, calibration_method: afe.apis.defines.CalibrationMethod = MinMaxMethod()) -> CalibrationConfigs

   Construct CalibrationConfigs using parameters from the network test
   configuration data.

   :param num_calibration_samples: Maximum number of calibration data samples to use for calibration.
   :param calibration_method: CalibrationMethod used in calibration. See the CalibrationMethod Enum class for supported values.
   :return: Constructed value


.. py:function:: api_calibration_configs(calibration_method: afe.apis.defines.CalibrationMethod = MinMaxMethod()) -> CalibrationConfigs

   Construct CalibrationConfigs using user-specified parameters.

   :param calibration_method: CalibrationMethod used in calibration. See the CalibrationMethod Enum class for supported values.
   :param percentile_value: Optional[float]. In case of the histogram percentile observer, configures the percentage of values to keep in the histogram.
   :return: Constructed value


.. py:class:: PerfThreshold

   Used to store the threshold value for quantized model performance.

   .. py:method:: set_threshold(fp32_perf: float) -> float
      :abstractmethod:


.. py:class:: RelativePerfThreshold

   Used to store the threshold value for quantized model performance.
   The threshold value is given as a value relative to the floating-point
   model performance.

   :param rel_value: float. Quantized model performance threshold value relative to the floating-point model performance.

   .. py:attribute:: rel_value
      :type: float

   .. py:method:: set_threshold(fp32_perf: float) -> float

      Returns the threshold for quantized model performance relative to the
      floating-point model performance.

      :param fp32_perf: float. Floating-point model performance.
      :return: float. Threshold value relative to floating-point model performance.


.. py:class:: AbsolutePerfThreshold

   Used to store the threshold value for quantized model performance.
   The threshold value is given as an absolute value.
   :param abs_value: float. Quantized model performance threshold value.

   .. py:attribute:: abs_value
      :type: float

   .. py:method:: set_threshold(fp32_perf: float) -> float

      Returns the threshold for quantized model performance.

      :param fp32_perf: float. Unused. Floating-point model performance.
      :return: float. Quantized model performance threshold value.


.. py:class:: QuantizationAwarePartitioningConfigs

   Config used for quantization-aware partitioning.

   :attribute performance_threshold: PerfThreshold. Value used as a target for
       quantized model performance, given either as an absolute value or as a
       value relative to the floating-point model performance.
   :attribute target_performance_mode: PerformanceMode. Whether the target
       performance is given as an absolute value or as a value relative to
       the floating-point model performance.
   :attribute max_iterations: int. Maximum number of iterations in the QAP
       loop. Represents the maximum number of layers that may be fixed to
       floating point while performing quantization-aware partitioning.
   :attribute graph_analyzer_mode: QuantizedGraphAnalyzerMode. Graph analysis execution mode.
   :attribute graph_analyzer_metric: str. Metric used in graph analysis.
   :attribute graph_analyzer_number_of_samples: int. Number of input samples to be used in graph analysis.

   .. py:attribute:: performance_threshold
      :type: PerfThreshold

   .. py:attribute:: max_iterations
      :type: int
      :value: 1

   .. py:attribute:: graph_analyzer_mode
      :type: afe.core.graph_analyzer.utils.QuantizedGraphAnalyzerMode

   .. py:attribute:: graph_analyzer_metric
      :type: afe.core.graph_analyzer.utils.Metric

   .. py:attribute:: graph_analyzer_number_of_samples
      :type: int
      :value: 2


.. py:class:: OptimizationConfigs

   Class for holding the configuration information used by the OptimizerClass.

   .. py:attribute:: strategy
      :type: str
      :value: 'sequential'

   .. py:attribute:: calibration_configs
      :type: CalibrationConfigs
   .. py:attribute:: quantization_configs
      :type: QuantizationConfigs

   .. py:attribute:: compression_configs
      :type: CompressionConfigs


.. py:class:: RunConfigs

   Configuration parameters for how to execute networks in software emulation.

   :attribute fast_mode: If True, use a fast implementation of an operator;
       results may not match the result of executing on the MLA. If False,
       use an implementation that exactly matches execution on the MLA.

   .. py:attribute:: fast_mode
      :type: bool
      :value: False


.. py:class:: ConvertLayoutMethod

   Enumeration specifying the layout conversion algorithm.

   The common processing flow requires the model to be converted to the MLA's
   native layout, 'NHWC'. This enumeration specifies which algorithm is used
   for layout conversion. Currently, the 'legacy' algorithm is tested and
   proven on most CNN models. The 'automated' algorithm is under development
   and is aimed mostly at ViT models. The 'none' option is used in internal
   test cases and specifies a processing flow where layout conversion is
   skipped; it should not be used in the general model processing pipeline.

   .. py:attribute:: NONE
      :value: 'none'

      No layout conversion. This is a backward-compatible value designating
      cases where the layout conversion transform is disabled in some test
      cases. Since the layout conversion algorithm should eventually run for
      all test cases, this value will be deprecated in the future.

   .. py:attribute:: LEGACY
      :value: 'legacy'

      Legacy algorithm using TVM's ConvertLayout pass.

   .. py:attribute:: AUTOMATED
      :value: 'automated'

      Algorithm doing automatic rewrite for MLA-supported layouts.

   .. py:method:: from_str(method: str) -> ConvertLayoutMethod
      :staticmethod:

      Helper method for constructing an enumeration instance from a string.

      :param method: String parameter defining the layout conversion method.
      :returns: The ConvertLayoutMethod Enum instance.
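The string-to-member mapping performed by ``from_str`` can be sketched with Python's built-in ``enum`` machinery. This is an illustrative re-implementation using only the member values documented above (the class name here is hypothetical, not the actual afe code):

```python
from enum import Enum


class ConvertLayoutMethodSketch(Enum):
    # Member values mirror those documented for ConvertLayoutMethod.
    NONE = "none"
    LEGACY = "legacy"
    AUTOMATED = "automated"

    @staticmethod
    def from_str(method: str) -> "ConvertLayoutMethodSketch":
        # Enum lookup by value; an unknown string raises ValueError.
        return ConvertLayoutMethodSketch(method)


assert ConvertLayoutMethodSketch.from_str("legacy") is ConvertLayoutMethodSketch.LEGACY
assert ConvertLayoutMethodSketch.from_str("none") is ConvertLayoutMethodSketch.NONE
```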
.. py:class:: TransformerConfigs(convert_layout_method: ConvertLayoutMethod = ConvertLayoutMethod.LEGACY, enable_graph_partition: bool = True, backend_indices_dict: dict[afe.backends.Backend, list[int | tuple[int, Ellipsis]]] | None = None, enable_quantization_based_partitioning: bool = False, *, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, enabled_backends: afe._tvm._tvm_graph_partition.CompileMode = CompileMode.MLA_EV74_CPU)

   Class holding the configuration information used by GraphTransformer and
   Partitioner.

   :attribute convert_layout_method: Specifies the algorithm used for converting the model layout.
   :attribute enable_graph_partition: bool. Whether to apply graph partitioning to the model.
   :attribute indices_to_backend_dict: Optional[Dict[int, afe.backends.Backend]].
       Dictionary mapping layer indices to their targeted backends, if any.
       If an {index: target_backend} pair is present in the dictionary, the
       layer with the given index will be executed on the target_backend
       Backend. If an index is absent from the dictionary, the layer with
       that index will be executed on the highest-priority backend that
       supports that layer.
   :attribute enable_quantization_based_partitioning: bool. Whether to apply quantization-based partitioning.
   :attribute requantization_mode: How to convert TVM quantized operators to
       SiMa IR quantized operators. Only quantized TVM operators that are
       assigned to the MLA are affected.
   :attribute enabled_backends: Which set of backends to assign nodes to in
       graph partitioning. Any assignment in backend_indices_dict overrides
       this parameter.
   Example
   -------

   The following example creates a TransformerConfigs that converts the
   layout to NHWC, enables graph partitioning, and assigns the nodes with
   indices 1, 13, and 22 to the APU::

       backend_indices_dict = {Backend.APU: [1, 13, 22]}
       transformer_configs = TransformerConfigs(
           convert_layout_method=ConvertLayoutMethod.LEGACY,
           enable_graph_partition=True,
           backend_indices_dict=backend_indices_dict)

   .. py:attribute:: convert_layout_method
      :type: ConvertLayoutMethod

   .. py:attribute:: enable_graph_partition
      :type: bool

   .. py:attribute:: indices_to_backend_dict
      :type: dict[int, afe.backends.Backend]

   .. py:attribute:: enable_quantization_based_partitioning
      :type: bool

   .. py:attribute:: requantization_mode
      :type: afe.ir.defines.RequantizationMode

   .. py:attribute:: enabled_backends
      :type: afe._tvm._tvm_graph_partition.CompileMode

   .. py:property:: convert_layout
      :type: bool

      Property defining whether the layout conversion algorithm is enabled.

      This property exists for backward compatibility and should be used only
      in some helper test functions (i.e. determining whether model inputs
      and/or outputs should be transposed during the test run). It should not
      be used to define any aspect of the TVM transformations, as it will be
      deprecated in the near future.

      :returns: Boolean flag determining whether the layout conversion
          algorithm is run during TVM transformations.


.. py:class:: AfeProcessingConfigs

   Dataclass holding all the configuration information used in end-to-end
   processing.

   :attribute model_configs: ModelConfigs. Configuration information on the model that is being processed.
   :attribute transformer_configs: TransformerConfigs. Configuration information on transformations being used in model processing.
   :attribute optimization_configs: OptimizationConfigs. Configuration information on optimizations being used in processing.
   :attribute qap_configs: QuantizationAwarePartitioningConfigs. Configuration information used in the quantization-aware partitioning algorithm.
   :attribute target: A target platform that a model is compiled for.

   .. py:attribute:: model_configs
      :type: ModelConfigs

   .. py:attribute:: transformer_configs
      :type: TransformerConfigs

   .. py:attribute:: optimization_configs
      :type: OptimizationConfigs

   .. py:attribute:: qap_configs
      :type: QuantizationAwarePartitioningConfigs

   .. py:attribute:: target
      :type: sima_utils.common.Platform
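The two PerfThreshold variants documented earlier differ only in how ``set_threshold`` interprets the floating-point performance. A minimal sketch is below; the class names are hypothetical stand-ins, and it assumes the relative variant scales the fp32 performance by ``rel_value``, which the docstrings imply but do not state explicitly:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class PerfThresholdSketch(ABC):
    """Stand-in for the abstract PerfThreshold interface."""

    @abstractmethod
    def set_threshold(self, fp32_perf: float) -> float: ...


@dataclass
class RelativePerfThresholdSketch(PerfThresholdSketch):
    rel_value: float

    def set_threshold(self, fp32_perf: float) -> float:
        # Assumption: the relative threshold scales the fp32 performance.
        return fp32_perf * self.rel_value


@dataclass
class AbsolutePerfThresholdSketch(PerfThresholdSketch):
    abs_value: float

    def set_threshold(self, fp32_perf: float) -> float:
        # fp32_perf is unused; the stored absolute value is the threshold.
        return self.abs_value


# e.g. accept a quantized model that reaches 95% of fp32 accuracy,
# or one that reaches an absolute accuracy of 0.7, whichever policy is chosen.
assert RelativePerfThresholdSketch(0.95).set_threshold(0.8) == 0.8 * 0.95
assert AbsolutePerfThresholdSketch(0.7).set_threshold(0.8) == 0.7
```

Either variant can then be stored in the ``performance_threshold`` field of a QuantizationAwarePartitioningConfigs-style object, which only depends on the abstract ``set_threshold`` interface.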