afe.core.configs

Classes

`ModelConfigs`
`QuantizationPrecision`	Generic enumeration.
`EmptyValue`	An empty value class, used to initialize an empty Opt class.
`Opt`	Generic immutable container class having either one value or no value.
`QuantizationConfigs`	Parameters controlling how to quantize a network.
`CompressionConfigs`
`CalibrationConfigs`	Parameters for calibration.
`PerfThreshold`	Used to store the threshold value for quantized model performance.
`RelativePerfThreshold`	Used to store the threshold value for quantized model performance. The threshold value is given as a value
`AbsolutePerfThreshold`	Used to store the threshold value for quantized model performance. The threshold value is given as an absolute
`QuantizationAwarePartitioningConfigs`	Config used for quantization-aware partitioning
`OptimizationConfigs`	Class for holding the configuration information used by the OptimizerClass
`RunConfigs`	Configuration parameters for how to execute networks in
`ConvertLayoutMethod`	Enumeration specifying the layout conversion algorithm.
`TransformerConfigs`	Class holding the configuration information used by GraphTransformer and Partitioner.
`AfeProcessingConfigs`	Dataclass holding all the configuration information used in end-to-end processing.

Functions

`initialize_empty_quant_config`()	Helper function for initializing empty QuantizationConfigs.
`merge_quantization_configs`(→ QuantizationConfigs)	Merge 2 QuantizationConfigs.
`create_quantization_configs`(→ QuantizationConfigs)	Construct QuantizationConfigs.
`update_quantization_configs`(→ None)	Given a field name and a value, if the QuantizationConfigs has attribute name same
`create_testcase_calibration_configs`() → CalibrationConfigs)	Construct CalibrationConfigs using parameters from the network test configuration data.
`api_calibration_configs`() → CalibrationConfigs)	Construct CalibrationConfigs using user-specified parameters.

Module Contents

class afe.core.configs.ModelConfigs

name: str

framework: str

input_names: List[str]

input_shapes: List[afe.ir.defines.InputShape]

input_dtypes: List[afe.ir.tensor_type.ScalarType]

layout: str

model_path: str = ''

model_file_paths: List[str] = []

is_quantized: bool = False

output_names: List[str] | None = None

output_directory: str | None = None

toolbox_config: bool = False

mlc_files: str | None = None

trace_files: str | None = None

set_default_output_directory(output_directory_path: str)

set_absolute_model_path(model_path: str)

property shape_dict: Dict[afe.ir.defines.NodeName, afe.ir.defines.InputShape]

property dtype_dict: Dict[afe.ir.defines.NodeName, afe.ir.tensor_type.ScalarType]

property input_shapes_hwc: Tuple[int, int, int]

property shape_dict_hwc: Dict[afe.ir.defines.NodeName, Tuple[int, int, int]]

class afe.core.configs.QuantizationPrecision

Generic enumeration.

Derive from this class to define new enumerations.

INT_8 = 'int8'

INT_16 = 'int16'

BFLOAT_16 = 'bfloat16'

BFLOAT_16_INT8_WEIGHTS = 'bfloat16_int8_weights'

BFLOAT_16_INT4_WEIGHTS = 'bfloat16_int4_weights'

static from_string(precision: str) → QuantizationPrecision

to_scalar_type() → afe.ir.tensor_type.ScalarType

to_expected_int_scalar_type() → afe.ir.tensor_type.ScalarType

is_int8_precision() → bool

is_int16_precision() → bool

is_bfloat16_precision() → bool

is_bfloat16_with_int8_weights() → bool

is_bfloat16_with_int4_weights() → bool

is_bfloat16_with_int_weights() → bool

class afe.core.configs.EmptyValue

An empty value class, used to initialize an empty Opt class.

empty: str = ''

class afe.core.configs.Opt

Generic immutable container class having either one value or no value. It is used for storing values of QuantizationConfigs fields.

value: _T | EmptyValue

merge(option: Opt) → Opt

get()

is_empty()

class afe.core.configs.QuantizationConfigs

Parameters controlling how to quantize a network.

Instances should be constructed using one of the construction functions, not using the class constructor.

Fields can be overridden for specific nodes using the custom_quantization_configs parameter of UpdateQuantizationConfigs. This parameter is accepted by several other functions, as well. See individual fields for restrictions on overriding.

asymmetry: Opt[bool]

per_channel: Opt[bool]

leaky_relu_uses_udf: Opt[bool]

quantization_precision: Opt[QuantizationPrecision]

quantization_sensitivity: Opt[int]

intermediate_int32: Opt[bool]

biascorr_type: Opt[afe.ir.defines.BiasCorrectionType]

output_int32: Opt[bool]

requantization_mode: Opt[afe.ir.defines.RequantizationMode]

channel_equalization: Opt[bool]

smooth_quant: Opt[bool]

afe.core.configs.initialize_empty_quant_config(): Helper function for initializing empty QuantizationConfigs.

afe.core.configs.merge_quantization_configs(*, config1: QuantizationConfigs, config2: QuantizationConfigs) → QuantizationConfigs: Merge 2 QuantizationConfigs. When merging, values from first config have higher priority and values from second are discarded so QuantizationConfigs with higher priority should be config1.

afe.core.configs.create_quantization_configs(*, asymmetry: bool = True, per_channel: bool = False, leaky_relu_uses_udf: bool = True, quantization_precision: QuantizationPrecision = QuantizationPrecision.INT_8, quantization_sensitivity: int = 0, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, intermediate_int32: bool = False, biascorr_type: afe.ir.defines.BiasCorrectionType = BiasCorrectionType.NONE, output_int32: bool = False, channel_equalization: bool = False, smooth_quant: bool = False) → QuantizationConfigs

Construct QuantizationConfigs.

Parameters:

asymmetry – Whether to use asymmetric quantization.
per_channel – Whether to use per-channel quantization.
leaky_relu_uses_udf – Whether to use UDF instead of arithmetic instructions for quantization.
quantization_precision – Precision used during quantization.
quantization_sensitivity – Sensitivity for mixed precision quantization.
requantization_mode – A way of doing quantized arithmetic.
intermediate_int32 – Whether to use wide node outputs during quantization.
biascorr_type – Method to correct for quantization-induced bias: None/Regular/Iterative.
output_int32 – Whether to use the int32 numeric type in the output of convolution related operators.
channel_equalization – Whether to enable channel equalization.
smooth_quant – Whether to enable smooth quant.

Returns:

QuantizationConfigs

afe.core.configs.update_quantization_configs(quantization_configs: QuantizationConfigs, field_name: str, value: Any) → None

Given a field name and a value, if the QuantizationConfigs has attribute name same as the given field name, update the attribute with the given value.

Parameters

param quantization_configs:: QuantizationConfigs.
param field_name:: str. Name of the target attribute in the given QuantizationConfigs object.
param value:: Any. Target value that is going to be assigned to the attribute.

class afe.core.configs.CompressionConfigs

compress: bool = False

class afe.core.configs.CalibrationConfigs

Parameters for calibration.

Instances should be constructed using one of the construction functions.

Attribute

attribute calibration_method:

CalibrationMethod used during calibration. See the CalibrationMethod Enum class for currently supported methods.

attribute num_calibration_samples:

int. Limit on number of data samples that we use to feed inputs to the AwesomeNet during calibration. If None, all data samples that are passed to calibration are used.

attribute percentile_value:

Optional[float]. Percentage of values to keep when using histogram percentile.

calibration_method: afe.apis.defines.CalibrationMethod

num_calibration_samples: int | None

afe.core.configs.create_testcase_calibration_configs(num_calibration_samples: int, calibration_method: afe.apis.defines.CalibrationMethod = MinMaxMethod()) → CalibrationConfigs

Construct CalibrationConfigs using parameters from the network test configuration data.

Parameters:

num_calibration_samples – Maximum number of calibration data samples to use for calibration.
calibration_method – CalibrationMethod used in calibration. See the CalibrationMethod Enum class for supported values.

Returns:

Constructed value

afe.core.configs.api_calibration_configs(calibration_method: afe.apis.defines.CalibrationMethod = MinMaxMethod()) → CalibrationConfigs

Construct CalibrationConfigs using user-specified parameters.

Parameters:

calibration_method – CalibrationMethod used in calibration. See the CalibrationMethod Enum class for supported values.
percentile_value – Optional[float]. In case of Histogram percentile observer, configures percentage of values to keep in the histogram.

Returns:

Constructed value

class afe.core.configs.PerfThreshold

Used to store the threshold value for quantized model performance.

abstract set_threshold(fp32_perf: float) → float

class afe.core.configs.RelativePerfThreshold

Used to store the threshold value for quantized model performance. The threshold value is given as a value relative to the floating-point model performance.

Parameters:: rel_value – float. Quantized model performance threshold value relative to the floating-point model performance.

rel_value: float

set_threshold(fp32_perf: float) → float

Returns the threshold for quantized model performance relative to the floating-point model performance.

Parameters:: fp32_perf – float. Floating-point model performance.
Returns:: float. Threshold value relative to floating-point model performance.

class afe.core.configs.AbsolutePerfThreshold

Used to store the threshold value for quantized model performance. The threshold value is given as an absolute value.

Parameters:: abs_value – float. Quantized model performance threshold value.

abs_value: float

set_threshold(fp32_perf: float) → float

Returns the threshold for quantized model performance.

Parameters:: fp32_perf – float. Unused. Floating-point model performance.
Returns:: float. Quantized model performance threshold value.

class afe.core.configs.QuantizationAwarePartitioningConfigs

Config used for quantization-aware partitioning

Attribute

attribute performance_threshold:

PerfThreshold. Value used as a target for quantized model performance. Given either as an absolute value or as a value relative to the floating point model performance.

attribute target_performance_mode:

PerformanceMode. Whether the target performance is given as an absolute value or as a value relative to the floating point model performance.

attribute max_iterations:

int. Maximal number of iterations in the QAP loop. Represents the maximal number of layers that are to be fixed to floating point while performing the quantization-aware partitioning.

attribute graph_analyzer_mode:

QuantizedGraphAnalyzerMode. Graph analysis execution mode.

attribute graph_analyzer_metric:

str. Metric used in graph analysis.

attribute graph_analyzer_number_of_samples:

int. Number of input samples to be used in graph analysis.

performance_threshold: PerfThreshold

max_iterations: int = 1

graph_analyzer_mode: afe.core.graph_analyzer.utils.QuantizedGraphAnalyzerMode

graph_analyzer_metric: afe.core.graph_analyzer.utils.Metric

graph_analyzer_number_of_samples: int = 2

class afe.core.configs.OptimizationConfigs

Class for holding the configuration information used by the OptimizerClass

strategy: str = 'sequential'

calibration_configs: CalibrationConfigs

quantization_configs: QuantizationConfigs

compression_configs: CompressionConfigs

class afe.core.configs.RunConfigs

Configuration parameters for how to execute networks in software emulation.

Attribute fast_mode:: If True, use a fast implementation of an operator. Results may not match the result of executing on the MLA. If False, use an implementation that exactly matches execution on the MLA. Has no effect when use_jax argument is True.
Attribute use_jax:: If True, use a JAX implementation of an operator. JAX implementation results match results of executing an operator on MLA.

fast_mode: bool = False

use_jax: bool = False

class afe.core.configs.ConvertLayoutMethod

Enumeration specifying the layout conversion algorithm.

Common processing flow requires the model to be converted to MLA’s native layout, that is ‘NHWC’. This enumeration specifies what algorithm shall be used for layout conversion. Currently, the ‘legacy’ algorithm is tested and proven on most CNN models. The ‘automated’ algorithm is being developed and is aimed mostly for ViT models. The ‘none’ option is used in internal test cases, and it specifies the processing flow where layout conversion is skipped. It should not be used in the general model processing pipeline.

NONE = 'none': No layout conversion. This is backward-compatible value designating the case where layout conversion transform is disabled in some test cases. Since we would want to run the layout conversion algorithm for all of the test cases, this value would get deprecated in the future.

LEGACY = 'legacy': Legacy algorithm using TVM’s ConvertLayout pass

AUTOMATED = 'automated': Algorithm doing automatic rewrite for MLA suported layouts

static from_str(method: str) → ConvertLayoutMethod

Helper method for constructing an enumeration instance from string.

Parameters:: method – String parameter defining the layout conversion method.
Returns:: The ConvertLayoutMethod Enum instance.

class afe.core.configs.TransformerConfigs(convert_layout_method: ConvertLayoutMethod = ConvertLayoutMethod.LEGACY, enable_graph_partition: bool = True, backend_indices_dict: dict[afe.backends.Backend, list[int | tuple[int, Ellipsis]]] | None = None, enable_quantization_based_partitioning: bool = False, *, requantization_mode: afe.ir.defines.RequantizationMode = RequantizationMode.sima, enabled_backends: afe._tvm._tvm_graph_partition.CompileMode = CompileMode.MLA_EV74_CPU, any_shape_on_mla: bool = True)

Class holding the configuration information used by GraphTransformer and Partitioner.

Attributes

attribute convert_layout_method:: Specifies the algorithm used for converting the model layout.
attribute enable_graph_partition:: bool. Whether to apply graph partitioning on the model.
attribute indices_to_backend_dict:: Optional[Dict[int, afe.backends.Backend]]. Dictionary containing mapping of layer indices to it’s targeted backend, if any. If a layer {index: target_backend} pair is present in the dictionary, it means that the layer with given index will be executed on the target_backend Backend. If the index is absent from the dictionary, it means that the layer with that index will be executed on the highest-priority backend that is supported for that layer.
attribute enable_quantization_based_partitioning:: bool. Flag containing information whether to apply quantization-based partitioning.
attribute requantization_mode:: How to convert TVM quantized operators to SiMa IR quantized operators. Only quantized TVM operators that are assigned the MLA are affected.
attribute enabled_backends:: Which set of backends to assign nodes to in graph partitioning. Any assignment in backend_indices_dict overrides this parameter.
attribute any_shape_on_mla:: If False, partitioning will not assign operators to the MLA if they use a tensor that is not 4D.

Example

The example shows how to create a TransformerConfigs that will convert the layout to NHWC, enable the graph partitioning, and set the node with index = [1, 13, 22] to APU:

backend_indices_dict = {Backend.APU: [1, 13, 22]} transformer_configs = TransformerConfigs(convert_layout=True,

enable_graph_partition=True, backend_indices_dict=backend_indices_dict)

convert_layout_method: ConvertLayoutMethod

enable_graph_partition: bool

indices_to_backend_dict: dict[int, afe.backends.Backend]

enable_quantization_based_partitioning: bool

requantization_mode: afe.ir.defines.RequantizationMode

enabled_backends: afe._tvm._tvm_graph_partition.CompileMode

any_shape_on_mla: bool

property convert_layout: bool

Property defining whether layout conversion algorithm is enabled.

This property is defined due to backward portability issues, and should be used only in some helper test functions (i.e. determining whether model inputs and/or outputs should be transposed during the test run). This property should not be used to define any aspect of tvm transformations, as it would be deprecated in near future.

Returns:: Boolean flag determining whether layout conversion algorithm is run during tvm transformations.

class afe.core.configs.AfeProcessingConfigs

Dataclass holding all the configuration information used in end-to-end processing.

Attributes

attribute model_configs:: ModelConfigs. Configuration information on model that is being processed.
attribute transformer_configs:: TransformerConfigs. Configuration information on transformations being used in model processing.
attribute optimization_configs:: OptimizationConfigs. Configuration information on optimizations being used in processing.
attribute qap_configs:: QuantizationAwarePartitioningConfigs. Configuration information being used in quantization-aware partitioning algorithm.
attribute target:: A target platform that a model is compiled for.

model_configs: ModelConfigs

transformer_configs: TransformerConfigs

optimization_configs: OptimizationConfigs

qap_configs: QuantizationAwarePartitioningConfigs

target: sima_utils.common.Platform