afe.apis.defines

This file contains definitions of the types exposed by the development API for AFE.

Attributes

InputValues

gen1_target

gen2_target

BT_COLOR_COEFF

YUV2RGB_FULL_RANGE_CONSTANTS

default_quantization

Default quantization parameters for model quantization.

int16_quantization

Int16 quantization parameters for model quantization.

Classes

ExceptionFuncType

Enumeration identifying the AFE API operation in which an exception occurred.

ColorSpaceStandard

Color space standards for YUV and RGB conversion.

ColorConversion

Color conversion direction.

ChromaSampling

Chroma sub-sampling representation.

ResizeMethod

Interpolation method used in resize transform.

ResizeDepositLocation

Deposit location of resized image in padded frame.

CalibrationMethod

Represents a calibration method for model quantization.

MinMaxMethod

Represents the min-max calibration method for model quantization.

HistogramMSEMethod

Represents the histogram MSE calibration method for quantization.

MovingAverageMinMaxMethod

Represents the moving average min-max calibration method for quantization.

HistogramEntropyMethod

Represents the histogram entropy calibration method for quantization.

HistogramPercentileMethod

Represents the histogram percentile calibration method for quantization.

QuantizationScheme

Quantization scheme.

QuantizationParams

Parameters controlling how to quantize a network.

Functions

default_calibration() → CalibrationMethod

Constructs the default calibration method.

quantization_scheme(asymmetric, per_channel, bits=8) → QuantizationScheme

Constructs a quantization scheme, which determines the range of quantizations that a quantization algorithm may choose from.

bfloat16_scheme() → QuantizationScheme

Constructs a bfloat16 quantization scheme.

Module Contents

afe.apis.defines.InputValues[source]
afe.apis.defines.gen1_target[source]
afe.apis.defines.gen2_target[source]
class afe.apis.defines.ExceptionFuncType[source]

Enumeration identifying the AFE API operation in which an exception occurred. Each member names an operation, such as loading, executing, quantizing, or compiling a model.

LOADED_NET_LOAD[source]
LOADED_NET_EXECUTE[source]
LOADED_NET_QUANTIZE[source]
LOADED_NET_CONVERT[source]
MODEL_EXECUTE[source]
MODEL_SAVE[source]
MODEL_LOAD[source]
MODEL_COMPILE[source]
MODEL_CREATE_AUXILIARY[source]
MODEL_COMPOSE[source]
MODEL_EVALUATE[source]
MODEL_PERFORMANCE[source]
GENERATE_ELF_FILES[source]
QUANTIZATION_ERROR_ANALYSIS[source]
class afe.apis.defines.ColorSpaceStandard[source]

Color space standards for YUV and RGB conversion. BT601 is for SD video; BT709 is for HD video; BT2020 is for HDR.

BT601 = 'BT601'[source]
BT709 = 'BT709'[source]
BT2020 = 'BT2020'[source]
afe.apis.defines.BT_COLOR_COEFF: Dict[ColorSpaceStandard, List[float]][source]
afe.apis.defines.YUV2RGB_FULL_RANGE_CONSTANTS: Dict[str, List[float]][source]
class afe.apis.defines.ColorConversion[source]

Color conversion direction.

YUV2RGB = 'YUV2RGB'[source]
RGB2YUV = 'RGB2YUV'[source]
BGR2RGB = 'BGR2RGB'[source]
RGB2BGR = 'RGB2BGR'[source]
class afe.apis.defines.ChromaSampling[source]

Chroma sub-sampling representation.

NV12 = 'NV12'[source]
YUV420 = 'YUV420'[source]
YUV422 = 'YUV422'[source]
class afe.apis.defines.ResizeMethod[source]

Interpolation method used in resize transform.

LINEAR = 'linear'[source]
NEAREST = 'nearest'[source]
AREA = 'area'[source]
CUBIC = 'cubic'[source]
class afe.apis.defines.ResizeDepositLocation[source]

Deposit location of resized image in padded frame.

TOPLEFT = 'topleft'[source]
CENTER = 'center'[source]
BOTTOMRIGHT = 'bottomright'[source]
class afe.apis.defines.CalibrationMethod[source]

Represents a calibration method for model quantization.

The CalibrationMethod class defines a base structure for various calibration techniques used during the quantization process. Each method is identified by a unique name and can be instantiated using the from_str method.

property name[source]
static from_str(method: str)[source]

Creates a calibration method based on the provided method name.

Supported Methods:
  • MIN_MAX / min_max: Uses the minimum and maximum values of the dataset to determine the quantization range.

  • MSE / mse: Utilizes a histogram-based method that minimizes the mean squared error (MSE) between the original and quantized values. This method uses 2048 histogram bins for precise calibration.

  • MOVING_AVERAGE / moving_average: Computes the quantization range by maintaining a moving average of the observed min and max values during calibration.

  • HISTOGRAM_ENTROPY / entropy: Employs an entropy-based approach to find the optimal threshold for quantization by minimizing information loss. It uses 512 histogram bins.

  • HISTOGRAM_PERCENTILE / percentile: Sets the quantization range based on a specified percentile of the distribution (defaulting to the 99.9th percentile). This method helps to ignore outliers and uses 1024 histogram bins.

Parameters:

method (str) – The name of the calibration method to use.

Returns:

The corresponding calibration method configured with default parameters.

Return type:

CalibrationMethod

Raises:

UserFacingException – If an unsupported calibration method is specified.
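As a usage sketch (assuming the afe package from the SiMa SDK is installed), each supported name listed above can be passed to from_str to obtain a preconfigured method:

```python
from afe.apis.defines import CalibrationMethod

# Each supported name returns a method preconfigured with its defaults.
mse = CalibrationMethod.from_str("mse")                # histogram MSE, 2048 bins
min_max = CalibrationMethod.from_str("min_max")        # plain min/max
entropy = CalibrationMethod.from_str("entropy")        # KL divergence, 512 bins
percentile = CalibrationMethod.from_str("percentile")  # 99.9th percentile, 1024 bins
```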

class afe.apis.defines.MinMaxMethod[source]

Represents the min-max calibration method for model quantization.

The MinMaxMethod uses the minimum and maximum values observed in the calibration data to determine the quantization range. It can be instantiated via CalibrationMethod.from_str('min_max').

class afe.apis.defines.HistogramMSEMethod(num_bins)[source]

Represents the histogram MSE calibration method for quantization.

The HistogramMSEMethod records the running histogram of tensor values during calibration. It searches for the optimal min and max values based on the histogram distribution to minimize the mean squared error (MSE) between the quantized model and the floating-point model. These optimal values are then used to compute the quantization parameters in the same way as the MIN_MAX method.

By default, the number of bins used for histogram calculation is 2048 when instantiated via the from_str() method. To customize the number of bins, instantiate HistogramMSEMethod(num_bins) directly.

Parameters:

num_bins (int) – The number of bins used for histogram calculation. Default is 2048.

Returns:

The configured histogram MSE calibration method.

Return type:

HistogramMSEMethod

num_bins: int[source]
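For example (a sketch, assuming the afe package is available), a custom bin count is set by constructing the class directly rather than going through from_str:

```python
from afe.apis.defines import HistogramMSEMethod

# Coarser histogram than the 2048-bin default used by from_str("mse").
method = HistogramMSEMethod(num_bins=1024)
```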
class afe.apis.defines.MovingAverageMinMaxMethod[source]

Represents the moving average min-max calibration method for quantization.

The MovingAverageMinMaxMethod computes the quantization range by maintaining a moving average of the observed minimum and maximum values during calibration. These averaged values are then used to compute the quantization parameters in the same way as the MIN_MAX method.

_name[source]

The internal name of the calibration method, set to 'moving_average'.

Type:

str

class afe.apis.defines.HistogramEntropyMethod(num_bins)[source]

Represents the histogram entropy calibration method for quantization.

The HistogramEntropyMethod records the running histogram of tensor values during calibration. It searches for the optimal min and max values based on the distribution of the histogram to minimize the Kullback-Leibler (KL) divergence (relative entropy) between the quantized model and the floating-point model. These optimal values are then used to compute the quantization parameters in the same way as the MIN_MAX method.

By default, the number of bins used for histogram calculation is 512 when instantiated via the from_str() method. To customize the number of bins, instantiate HistogramEntropyMethod(num_bins) directly.

num_bins[source]

The number of bins used for histogram calculation.

Type:

int

_name[source]

The internal name of the calibration method, set to 'entropy'.

Type:

str

num_bins: int[source]
class afe.apis.defines.HistogramPercentileMethod(percentile_value, num_bins)[source]

Represents the histogram percentile calibration method for quantization.

The HistogramPercentileMethod records the running histogram of tensor values during calibration. It determines the optimal min and max values based on the specified percentile of the histogram distribution. These values are then used to compute the quantization parameters in the same way as the MIN_MAX method.

When instantiated using the from_str method, the default values are:
  • percentile_value: 99.9

  • num_bins: 1024

To use custom values for the percentile or the number of bins, instantiate this class directly.

Parameters:
  • percentile_value (float) – The selected percentile used to determine the quantization range. Defaults to 99.9.

  • num_bins (int) – The number of bins used for histogram calculation. Defaults to 1024.

percentile_value: float[source]
num_bins: int[source]
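A usage sketch (assuming the afe package is available); direct instantiation overrides the from_str defaults of 99.9 and 1024:

```python
from afe.apis.defines import HistogramPercentileMethod

# Clip at the 99.5th percentile with a finer histogram to suppress outliers.
method = HistogramPercentileMethod(percentile_value=99.5, num_bins=2048)
```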
afe.apis.defines.default_calibration() → CalibrationMethod[source]

Constructs the default calibration method.
class afe.apis.defines.QuantizationScheme[source]

Quantization scheme.

Parameters:
  • asymmetric – Whether to use asymmetric quantization.

  • per_channel – Whether to use per-channel quantization.

  • bits – Number of bits of precision to use in the quantized representation

  • bf16 – Whether to use bfloat16. If True, then asymmetric, per_channel, and bits are ignored.

asymmetric: bool[source]
per_channel: bool[source]
bits: int = 8[source]
bf16: bool = False[source]
afe.apis.defines.quantization_scheme(asymmetric: bool, per_channel: bool, bits: int = 8) → QuantizationScheme[source]

Constructs a quantization scheme, which determines the range of quantizations that a quantization algorithm may choose from.

Parameters:
  • asymmetric (bool) – Required. Specifies whether to use asymmetric (versus symmetric) quantization.

  • per_channel (bool) – Required. Specifies whether to use per-channel (versus per-tensor) quantization.

  • bits (int, optional) – The number of bits of precision to use for the quantized representation of activations. Must be either 8 (for int8) or 16 (for int16). Defaults to 8. The quantization of weights is fixed as int8.

Returns:

The defined quantization scheme configured with the specified parameters.

Return type:

QuantizationScheme

afe.apis.defines.bfloat16_scheme() → QuantizationScheme[source]

Constructs a bfloat16 quantization scheme. It directs the compiler to use bfloat16 instead of integer quantization.
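The two constructors can be sketched together as follows (assuming the afe package is available):

```python
from afe.apis.defines import bfloat16_scheme, quantization_scheme

# int8 activations, asymmetric per-tensor quantization (bits defaults to 8).
int8_acts = quantization_scheme(asymmetric=True, per_channel=False)

# int16 activations; weight quantization remains fixed at int8.
int16_acts = quantization_scheme(asymmetric=True, per_channel=False, bits=16)

# bfloat16: the asymmetric, per_channel, and bits settings are ignored.
bf16 = bfloat16_scheme()
```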

class afe.apis.defines.QuantizationParams[source]

Parameters controlling how to quantize a network.

Parameters:
  • calibration_method – Calibration method.

  • activation_quantization_scheme – Quantization scheme for activation tensors.

  • weight_quantization_scheme – Quantization scheme for weights tensors.

  • requantization_mode – A way of doing quantized arithmetic.

  • node_names – Nodes to prevent from quantizing.

  • custom_quantization_configs – Dictionary setting the node’s custom quantization options.

  • biascorr_type – Selection of bias correction: regular/iterative/none

  • channel_equalization – If True, channel equalization is enabled.

  • smooth_quant – If True, smooth quant is enabled.

calibration_method: CalibrationMethod[source]
activation_quantization_scheme: QuantizationScheme[source]
weight_quantization_scheme: QuantizationScheme[source]
requantization_mode: afe.ir.defines.RequantizationMode[source]
node_names: Set[str][source]
custom_quantization_configs: Dict[afe.ir.defines.NodeName, Dict[str, Any]] | None = None[source]
biascorr_type: afe.ir.defines.BiasCorrectionType[source]
channel_equalization: bool = False[source]
smooth_quant: bool = False[source]
with_calibration(method: CalibrationMethod) → QuantizationParams[source]

Sets the calibration method for activation tensors.

This method configures the calibration approach for activation tensors during quantization. An observer is inserted at each layer to collect statistics of the output tensors. The derived min and max values from these statistics are used to compute the quantization parameters (scale and zero point) for each layer.

Parameters:

method (CalibrationMethod) – Required. The calibration method to use. Supported methods include various approaches to determine the optimal quantization range based on tensor statistics.

Returns:

A new instance of quantization parameters with the updated calibration method.

Return type:

QuantizationParams
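Because with_calibration returns a new QuantizationParams rather than mutating in place, it is typically chained from an existing configuration. A sketch, assuming the afe package is available:

```python
from afe.apis.defines import HistogramEntropyMethod, default_quantization

# Returns a new QuantizationParams; default_quantization itself is unchanged.
params = default_quantization.with_calibration(HistogramEntropyMethod(num_bins=512))
```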

with_activation_quantization(scheme: QuantizationScheme) → QuantizationParams[source]

Sets the quantization scheme for activation tensors.

For activations, per-channel quantization is not supported. With per-tensor quantization, the asymmetric flag in the scheme can be set to either True or False to define the quantization behavior.

Parameters:

scheme (QuantizationScheme) – Required. The quantization scheme to be applied for the model activations.

Returns:

A new instance of quantization parameters with the updated activation quantization scheme.

Return type:

QuantizationParams

with_weight_quantization(scheme: QuantizationScheme) → QuantizationParams[source]

Sets the quantization scheme for weight tensors.

For weights, the asymmetric quantization scheme is not supported. With symmetric quantization, the per_channel flag can be set to True or False to define the quantization behavior for weights.

Parameters:

scheme (QuantizationScheme) – Required. The quantization scheme to be applied for the model weights.

Returns:

A new instance of quantization parameters using the chosen weight quantization scheme.

Return type:

QuantizationParams

with_requantization_mode(requantization_mode: afe.ir.defines.RequantizationMode)[source]

Sets the requantization mode for convolutions.

Two requantization modes are supported:

  • RequantizationMode.sima: Uses arithmetic optimized for fast performance on SiMa’s accelerator. This is the default mode.

  • RequantizationMode.tflite: Uses TFLite’s arithmetic with an 8-bit constant multiplier.

Parameters:

requantization_mode (RequantizationMode) – Required. The requantization mode to be applied.

Returns:

A new instance of quantization parameters with the updated requantization mode.

Return type:

QuantizationParams

with_unquantized_nodes(node_names: Set[str]) → QuantizationParams[source]

Selects nodes to prevent from quantizing.

Nodes with the specified names will be excluded from the quantization process. This replaces the set of node names selected by any previous call to with_unquantized_nodes. Note that node names can be sensitive to changes in optimization or quantization settings, as some nodes may be created or renamed by the compiler.

Parameters:

node_names (Set[str]) – Required. A set of strings specifying the names of nodes that should not be quantized.

Returns:

A new instance of quantization parameters with the updated unquantized node configuration.

Return type:

QuantizationParams
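A sketch, assuming the afe package is available; the node names below are placeholders for illustration and must be taken from the actual SiMa IR graph:

```python
from afe.apis.defines import default_quantization

# Hypothetical node names; obtain real ones by inspecting the .sima.json graph.
params = default_quantization.with_unquantized_nodes({"conv2d_42", "softmax_7"})
```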

with_custom_quantization_configs(custom_quantization_configs: Dict[afe.ir.defines.NodeName, Dict[str, Any]])[source]

Sets custom quantization options for specific nodes.

The custom_quantization_configs is a dictionary where each key is a node name, and the corresponding value is a dictionary defining custom quantization options. This method is typically used in the following scenarios:

  1. Enable the int32 output of the last convolution node.

  2. Enable mixed-precision quantization.

Note: Users must obtain the node names from the SiMa IR graph. To do this, perform int8 quantization of a model and inspect the .sima.json file using Netron to identify node names.

Parameters:

custom_quantization_configs (Dict[NodeName, Dict[str, Any]]) – A dictionary where each key is a node name and the value is a dictionary of custom quantization settings for that node.

Returns:

A new instance of quantization parameters with the custom quantization configuration applied.

Return type:

QuantizationParams

with_bias_correction(enable: bool | afe.ir.defines.BiasCorrectionType = True)[source]

Enables or disables bias correction for the quantization of convolutions with a bias.

Bias correction calculates a bias term based on the observed input mean and the quantized weights. This term is then added to the convolution output to compensate for quantization errors. The algorithm is described in detail in Section 4.2 of the referenced paper.

Parameters:

enable (bool | BiasCorrectionType) – Required. Determines whether bias correction is enabled or disabled.
  • True: Enables regular bias correction.

  • False: Disables bias correction.

  • BiasCorrectionType: Allows specifying a custom bias correction type.

Returns:

A new instance of quantization parameters with bias correction enabled or disabled as specified.

Return type:

QuantizationParams
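A sketch, assuming the afe package is available; the BiasCorrectionType member name is an assumption based on the regular/iterative/none options listed for QuantizationParams:

```python
from afe.apis.defines import default_quantization
from afe.ir.defines import BiasCorrectionType

# Boolean form: enable regular bias correction.
params = default_quantization.with_bias_correction(True)

# Enum form (member name assumed): request iterative bias correction.
params_iter = default_quantization.with_bias_correction(BiasCorrectionType.ITERATIVE)
```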

with_channel_equalization(enable: bool = True)[source]

Enables or disables channel equalization for the quantization parameters.

Channel equalization is a preprocessing step that aims to balance the distribution of weight tensors across different channels, which can enhance the accuracy of quantized models.

Parameters:

enable (bool, optional) – Specifies whether to enable channel equalization. Defaults to True.

Returns:

A new instance of quantization parameters with the updated channel equalization setting.

Return type:

QuantizationParams

with_smooth_quant(enable: bool = True)[source]

Enables or disables smooth quant for the quantization parameters.
afe.apis.defines.default_quantization: QuantizationParams[source]

Default quantization parameters for model quantization.

This configuration can be used as a baseline for quantizing a neural network using quantize_net or as a starting point for customizing quantization parameters. It specifies default settings for calibration methods, activation and weight quantization schemes, and the requantization mode.
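A sketch of using default_quantization as a baseline and overriding individual settings through the with_* builders (assuming the afe package is available):

```python
from afe.apis.defines import default_quantization, quantization_scheme

# Start from the defaults and switch activations to 16-bit precision.
params = default_quantization.with_activation_quantization(
    quantization_scheme(asymmetric=True, per_channel=False, bits=16)
)
```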

afe.apis.defines.calibration_method

The calibration method used for quantization. Defaults to MSE().

Type:

CalibrationMethod

afe.apis.defines.activation_quantization_scheme

Defines the quantization scheme for activations:
  • Asymmetric: True

  • Per-channel: False

  • Bits: 8

Type:

QuantizationScheme

afe.apis.defines.weight_quantization_scheme

Defines the quantization scheme for weights:
  • Asymmetric: False

  • Per-channel: True

  • Bits: 8

Type:

QuantizationScheme

afe.apis.defines.requantization_mode

The mode used for requantization. Defaults to RequantizationMode.sima.

Type:

RequantizationMode

afe.apis.defines.node_names

A set of node names to exclude from quantization. Defaults to {''}.

Type:

set

afe.apis.defines.custom_quantization_configs

Custom configurations for specific nodes. Defaults to None.

Type:

Optional[Dict]

Returns:

The default quantization configuration for model quantization.

Return type:

QuantizationParams

afe.apis.defines.int16_quantization: QuantizationParams[source]

Int16 quantization parameters for model quantization.

This configuration is designed for quantizing neural networks where activations use 16-bit precision and weights use 8-bit precision. It can be used with quantize_net as a baseline for models requiring higher precision for activations, while maintaining efficient weight quantization.

afe.apis.defines.calibration_method

The calibration method used for quantization. Defaults to MSE().

Type:

CalibrationMethod

afe.apis.defines.activation_quantization_scheme

Defines the quantization scheme for activations:
  • Asymmetric: True

  • Per-channel: False

  • Bits: 16

Type:

QuantizationScheme

afe.apis.defines.weight_quantization_scheme

Defines the quantization scheme for weights:
  • Asymmetric: False

  • Per-channel: True

  • Bits: 8

Type:

QuantizationScheme

afe.apis.defines.requantization_mode

The mode used for requantization. Defaults to RequantizationMode.sima.

Type:

RequantizationMode

afe.apis.defines.node_names

A set of node names to exclude from quantization. Defaults to {''}.

Type:

set

afe.apis.defines.custom_quantization_configs

Custom configurations for specific nodes. Defaults to None.

Type:

Optional[Dict]

Returns:

The quantization configuration using int16 for activations and int8 for weights.

Return type:

QuantizationParams