afe.ir.quantization_conv
Quantization functions for convolution and matrix multiply.
Attributes

ChannelScale, ChannelShift, ChannelQScale – type aliases for per-channel quantization parameters used in this module's signatures.
Classes

ConvolutionPrecision – The precision to use for quantizing convolution. This determines how quantization does some calculations and chooses which integer type to use.
ConvPlanRequantization – Adjustable requantization for convolution.
ConvPlanQuantizations – Adjustable quantization parameters for convolution or matrix multiply.
ConvBacktrackingParameters – Quantization parameters that are fixed at the beginning of the quantization algorithm, such that the algorithm has to restart if they are changed.

Functions

reshape_weight_to_output_channels – Reshape a weight tensor so that its last axis corresponds to a convolution operation's output channel axis.
get_quantization_range – Get the numeric range that should be used when quantizing numbers to be stored using dtype.
decompose_power_of_2 – Decompose x into a power-of-2 part i and a fractional part f such that x = f * 2**i.
normalize_with_pow2 – Find powers of 2 that normalize each element of x to the range (0.5, 1.0].
weight_single_quantization_scale – Calculate a scalar quantization scale for a convolution or matrix multiply weight tensor.
weight_quantization_scale – Calculate a quantization scale for a convolution or matrix multiply weight tensor.
select_convolution_scales – Choose quantization parameters for a generalized matrix multiply based on the input's quantization and the optimal quantization of the weight and output.
run_backtracking_loop – Retry the backtracking computation in f until it succeeds.
adjust_plan_zero_weights – Adjust the convolution plan where the weights would be zero after quantization.
try_increase_intrinsic_shift – Set backtracking_parameters.intrinsic_shift_adjustment to True where positions is True.
try_adjust_plan_shift_value – Adjust the convolution plan where the shift value is out of range or causes severe precision loss.
try_adjust_plan_product_value – Adjust the convolution plan where the integer convolution result is not in the representable range.
quantize_convolution_scales – Adjust the quantization parameters based on zero values, limits on integer constants, and limits on integer intermediate results.
quantize_weight_tensor – Create a quantized weight tensor.
try_quantize_bias_tensor – Quantize a bias tensor, adjusting backtracking parameters if it can't be quantized due to integer overflow.
quantized_product_zero_value – Calculate the result of quantized generalized matrix multiply when the input is filled with the zero point value.
output_zp_correction_in_bias – Calculate the zero point correction to add to the convolution or matrix multiply's bias array.
quantize_convolution_parameters – Select quantized parameters for convolution or matrix multiply.
get_bfloat16_with_int_weights_quant_params – Get quantized weights and bias if present and requantization.
Module Contents
- afe.ir.quantization_conv.reshape_weight_to_output_channels(weight: numpy.ndarray) → numpy.ndarray [source]
Reshape a weight tensor so that its last axis corresponds to a convolution operation's output channel axis. That is, the convolution's output at a given channel output[…, c] depends on reshaped_weights[…, c], bias[c], and some values from the convolution's input. This tensor shape is useful for code that computes per-channel information or does per-channel scaling on weights.
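For illustration, a minimal numpy sketch of this reshaping, assuming an OIHW weight layout (the layout here is an assumption; the real function handles the layouts used by AFE):

```python
import numpy as np

# Hypothetical conv weight in OIHW layout: (out_channels, in_channels, kh, kw).
weight = np.zeros((16, 3, 5, 5))

# Move the output-channel axis to the end so that output[..., c] pairs with
# reshaped[..., c], as described above.
reshaped = np.moveaxis(weight, 0, -1)
assert reshaped.shape == (3, 5, 5, 16)
```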
- afe.ir.quantization_conv.get_quantization_range(dtype: afe.ir.tensor_type.ScalarType | numpy.number, asymmetry: bool) → Tuple[int, int] [source]
Get the numeric range that should be used when quantizing numbers to be stored using dtype. The range is the entire value range when using asymmetric quantization, and is reduced to a symmetric range when using symmetric quantization.
- Parameters:
dtype – Quantized data type. It must be a signed integer type.
asymmetry – Whether to use an asymmetric range.
- Returns:
Numeric range
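For example, for int8 this convention would give the following ranges, assuming the symmetric range drops the most negative value (a common practice, stated here as an assumption):

```python
import numpy as np

info = np.iinfo(np.int8)
asymmetric = (info.min, info.max)   # (-128, 127): the entire value range
symmetric = (-info.max, info.max)   # (-127, 127): reduced to a symmetric range
```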
- afe.ir.quantization_conv.decompose_power_of_2(x: ChannelScale, rounding: ml_kernels.math_helpers.RoundType) → Tuple[ChannelShift, ChannelScale] [source]
Decompose x into a power-of-2 part i and a fractional part f such that
x = f * 2**i
- The range of f is selected based on how i is rounded:
UPWARD: 0.5 < f <= 1
TONEAREST: sqrt(0.5) <= f <= sqrt(2)
TRUNC: 1 <= f < 2
Where x is 0, f and i will be 0.
- Parameters:
x – Number to decompose
rounding – How to round the exponent
- Returns:
Decomposed values (i, f)
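A minimal sketch of the UPWARD case built on numpy's frexp, assuming non-negative inputs; decompose_pow2_upward is a hypothetical stand-in, not this module's implementation:

```python
import numpy as np

def decompose_pow2_upward(x: np.ndarray):
    # np.frexp returns x = f * 2**i with 0.5 <= f < 1, and f = i = 0 where x == 0.
    f, i = np.frexp(x)
    # Exact powers of two come back as f == 0.5; move them to the (0.5, 1]
    # convention by setting f = 1 and compensating in the exponent.
    exact = f == 0.5
    f = np.where(exact, 1.0, f)
    i = np.where(exact, i - 1, i)
    return i, f

i, f = decompose_pow2_upward(np.array([0.0, 0.75, 4.0]))
# i == [0, 0, 2], f == [0.0, 0.75, 1.0], and x == f * 2**i elementwise
```

Note that normalize_with_pow2 below targets the same (0.5, 1.0] range, so it corresponds to this UPWARD convention.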
- afe.ir.quantization_conv.normalize_with_pow2(x: ChannelScale) → Tuple[ChannelShift, ChannelScale] [source]
Find powers of 2 that normalize each element of x to the range (0.5, 1.0].
- Parameters:
x – Scale factors to normalize
- Returns:
Tuple (i, y) of exponents and normalized scale factors satisfying x = y * 2**i.
- afe.ir.quantization_conv.weight_single_quantization_scale(weight: numpy.ndarray, bits: int = 8) → float [source]
Calculate a scalar quantization scale for a convolution or matrix multiply weight tensor.
- Parameters:
weight – Floating-point weight tensor
bits – Number of bits used for quantization
- Returns:
Quantization scale. It has the same meaning as the scale field of class Quantization.
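A plausible sketch of such a scalar scale, assuming symmetric quantization in which round(weight * scale) must fit a signed bits-bit range (the module's actual formula may differ):

```python
import numpy as np

def weight_scale_sketch(weight: np.ndarray, bits: int = 8) -> float:
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(weight)))
    # Map the largest |weight| onto qmax; fall back to 1.0 for all-zero weights.
    return qmax / max_abs if max_abs > 0.0 else 1.0
```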
- afe.ir.quantization_conv.weight_quantization_scale(weight: numpy.ndarray, per_channel: bool, bits: int = 8) → ChannelScale [source]
Calculate a quantization scale for a convolution or matrix multiply weight tensor.
- Parameters:
weight – Floating-point weight tensor
per_channel – Whether to do per-channel quantization
bits – Number of bits to be used
- Returns:
Quantization scale.
- class afe.ir.quantization_conv.ConvolutionPrecision[source]
The precision to use for quantizing convolution. This determines how quantization does some calculations and chooses which integer type to use. Some choices (such as sima_int8) completely determine the integer type, while others do not.
- has_multiplier() → bool [source]
Return True if this quantization method can use a TFLite multiplier other than 1. Return False if it uses ArithFoldedRequantization or forces the multiplier to be 1.
- has_zp_correction() → bool [source]
Return True if this quantization method can use a zero point correction other than 0.
- class afe.ir.quantization_conv.ConvPlanRequantization(scale: ChannelScale, shift: ChannelShift, multiplier: ChannelQScale)[source]
Adjustable requantization for convolution. This class holds the requantization as both a floating-point number and a quantized representation. When these values are modified, they are kept consistent (modulo rounding) with the formula
scale = multiplier * (2**-shift)
- Parameters:
scale – Requantization scale as a floating-point value.
shift – Right shift to perform. Its shape must be the same as scale's.
multiplier – Integer multiplier to use. Its shape must be either () or the same as scale's.
- deepcopy() → ConvPlanRequantization [source]
Make an independent copy of this object.
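An illustrative check of the scale = multiplier * (2**-shift) invariant, with arbitrary values:

```python
import numpy as np

scale = np.array([0.75, 0.5])    # floating-point requantization scales
shift = np.array([7, 1])         # right-shifts
multiplier = np.array([96, 1])   # 96 * 2**-7 == 0.75 and 1 * 2**-1 == 0.5
np.testing.assert_allclose(multiplier * 2.0 ** -shift, scale)
```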
- class afe.ir.quantization_conv.ConvPlanQuantizations[source]
Adjustable quantization parameters for convolution or matrix multiply. This class holds parameters that may be modified while deciding how to quantize the calculation.
The parameters relate a real-number calculation
c = a * w + b
to a quantized calculation (the actual calculation is not selected here, and it may be different from this formula)
Qc = S * (Qa * Qw) / 2^h + constant_terms
by
Qw = w * Sw
Qa = a * Sa
Qc = c * Sc + Zc
S = 2^h * Sc / (Sa * Sw)
The factor of 2^h is a right-shift that is included in the integer convolution.
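An illustrative numeric check of these relations, ignoring rounding, zero points, and the constant terms (all values are arbitrary):

```python
Sa, Sw, Sc, h = 12.7, 100.0, 6.35, 1   # input, weight, and output scales; shift
S = 2**h * Sc / (Sa * Sw)              # requantization scale, 0.01 here

a, w = 2.0, 0.5
c = a * w                              # real-number product (bias omitted)
Qa, Qw = a * Sa, w * Sw                # idealized quantized operands
Qc = S * (Qa * Qw) / 2**h              # quantized-domain result
assert abs(Qc - c * Sc) < 1e-9         # agrees with Qc = c * Sc when Zc = 0
```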
- Parameters:
weight – Scale factor Sw relating real weight w to quantized weight Qw. It may contain 0.
output – Quantization (Sc, Zc) relating real output c to quantized output Qc
requant – Requantization S relating quantized product to output Qc
intrinsic_shift – Right-shift h, used to produce an additional scale factor in the convolution product
- requant: ConvPlanRequantization[source]
- deepcopy() → ConvPlanQuantizations [source]
Make an independent copy of this object.
- set_intrinsic_shift(value: numpy.ndarray)[source]
Set the intrinsic shift, h, to the given value.
- set_weight_zero(positions: numpy.ndarray)[source]
Set the weight scale, Sw, to 0 at the given channel positions.
- set_requant_one(positions: numpy.ndarray)[source]
Set the requantization scale to 1 at the given channel positions.
- scale_weight_pow2(exponent: numpy.ndarray | int)[source]
Multiply the weight quantization scale, Sw, by 2**exponent.
- afe.ir.quantization_conv.select_convolution_scales(weight: numpy.ndarray, input_quant: afe.ir.defines.Quantization, output_distribution: afe.ir.attributes.ObservedDistribution, *, precision: ConvolutionPrecision, asymmetry: bool, per_channel: bool) → ConvPlanQuantizations [source]
Choose quantization parameters for a generalized matrix multiply based on the input's quantization and the optimal quantization of the weight and output.
This choice does not account for value ranges of other integer constants and intermediate results. Those should be handled separately.
- Parameters:
weight – A weight tensor.
input_quant – Quantization that was selected for the input of generalized matrix multiply.
output_distribution – Value distribution of the output of generalized matrix multiply.
precision – Precision to quantize for.
asymmetry – Whether to use asymmetric quantization.
per_channel – Whether to do per-channel quantization. If true, the scales will be a tensor with one value per channel. If false, the scales will be scalars.
- Returns:
Weight tensor scale, requantization scale, and quantization of the convolution output.
- class afe.ir.quantization_conv.ConvBacktrackingParameters[source]
Quantization parameters that are fixed at the beginning of the quantization algorithm, such that the algorithm has to restart if they are changed. These values may be modified in the backtracking loop.
- Parameters:
precision – Precision to use for output calculations.
relu_fallback_precision – Alternative precision to use if "precision" can't be supported due to limitations in the backend's implementation of ReLU. If this is None, "precision" is assumed to be fully supported.
intrinsic_shift_adjustment – Locations where extra right-shift is used with the int15 convolution algorithm. When the input is int8, it must be a 0D array of False. It is an array of bool, where True means to use extra right-shift. It is 0D for per-tensor or 1D for per-channel.
weight_adjustment – Extra right-shift applied to weights. Values greater than zero reduce the weight's precision to fewer than 8 bits. It is an array of int. It is 0D for per-tensor or 1D for per-channel.
- precision: ConvolutionPrecision[source]
- relu_fallback_precision: ConvolutionPrecision | None[source]
- static default_intrinsic_shift_adjustment(n_channels: int, per_channel: bool, use_int15: bool) → numpy.ndarray [source]
Default value of intrinsic shift. The default is not to use any extra right-shift.
- Parameters:
n_channels – Number of channels in the convolution output
per_channel – Whether per-channel quantization is used
use_int15 – Whether the int15 convolution algorithm is used
- Returns:
Default value of intrinsic shift
- static default_weight_adjustment(n_channels: int, per_channel: bool) → numpy.ndarray [source]
Default weight adjustment. The default is not to use any extra right-shift.
- Parameters:
n_channels – Number of channels in the convolution output
per_channel – Whether per-channel quantization is used
- Returns:
Default value of weight adjustment
- afe.ir.quantization_conv.run_backtracking_loop(f: Callable[[afe.ir.defines.NodeReporter], _A], backtracking_limit: int, backtracking_error_message: str, error_reporter: afe.ir.defines.NodeReporter | None = None) → _A [source]
Retry the backtracking computation in f until it succeeds.
The callable object in f represents a restartable function that uses some mutable state to represent its starting condition. It may update its mutable state and raise a _Retry exception to restart; the state change should help it make progress after it restarts. It may return a value to end the loop.
- Parameters:
f – Backtracking computation to run
backtracking_limit – Maximum number of times to attempt f. If f is attempted this many times without returning a result, an exception will be raised.
backtracking_error_message – Error message to use if f does not return.
error_reporter – Used for reporting errors.
- Returns:
Return value of f.
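A sketch of how a caller might structure such a computation; plan_step and its success condition are hypothetical, and _Retry is the exception named in this docstring (assumed importable from this module):

```python
from afe.ir.quantization_conv import run_backtracking_loop, _Retry

state = {"attempts": 0}                  # mutable state shared across restarts

def plan_step(reporter):
    state["attempts"] += 1               # state change that makes progress
    if state["attempts"] < 3:            # hypothetical failure condition
        raise _Retry()                   # ask the loop to restart
    return state["attempts"]             # returning a value ends the loop

result = run_backtracking_loop(plan_step, backtracking_limit=10,
                               backtracking_error_message="did not converge")
assert result == 3
```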
- afe.ir.quantization_conv.adjust_plan_zero_weights(weights: numpy.ndarray, quantizations: ConvPlanQuantizations, per_channel: bool, error_reporter: afe.ir.defines.NodeReporter)[source]
Adjust the convolution plan where the weights would be zero after quantization.
- Parameters:
weights – Floating-point weights.
quantizations – Quantization parameters. Will be modified.
per_channel – Whether to do per-channel quantization.
error_reporter – Error reporter used for quantization warnings.
- afe.ir.quantization_conv.try_increase_intrinsic_shift(backtracking_parameters: ConvBacktrackingParameters, positions: numpy.ndarray) → None [source]
Set backtracking_parameters.intrinsic_shift_adjustment to True where positions is True. Raise _Retry() if any backtracking parameters were changed.
- Parameters:
backtracking_parameters – Mutable variables for backtracking. May be modified.
positions – Array of bool, containing True where the intrinsic shift adjustment should be set to True.
- afe.ir.quantization_conv.try_adjust_plan_shift_value(backtracking_parameters: ConvBacktrackingParameters, quantizations: ConvPlanQuantizations, use_int15: bool, error_reporter: afe.ir.defines.NodeReporter) → None [source]
Adjust the convolution plan where the shift value is out of range or where the shift is so large that it causes severe precision loss. Raise _Retry() if any backtracking parameters were changed.
- Parameters:
backtracking_parameters – Mutable variables for backtracking. May be modified.
quantizations – Quantization parameters. May be modified.
use_int15 – Whether the plan is for int15 convolution.
error_reporter – Error reporter used for quantization warnings.
- afe.ir.quantization_conv.try_adjust_plan_product_value(backtracking_parameters: ConvBacktrackingParameters, quantizations: ConvPlanQuantizations, use_int15: bool, error_reporter: afe.ir.defines.NodeReporter) → None [source]
Adjust the convolution plan where the integer convolution result is not in the representable range. Raise _Retry() if any backtracking parameters were changed.
- Parameters:
backtracking_parameters – Mutable variables for backtracking. May be modified.
quantizations – Quantization parameters. May be modified.
use_int15 – Whether the plan is for int15 convolution.
error_reporter – Error reporter used for quantization warnings.
- afe.ir.quantization_conv.quantize_convolution_scales(quantizations: ConvPlanQuantizations, precision: ConvolutionPrecision, allow_full_output_precision: bool) → Tuple[ChannelScale, ChannelScale, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.tensor_type.ScalarType, afe.ir.defines.Quantization] [source]
Adjust the quantization parameters based on zero values, limits on integer constants, and limits on integer intermediate results.
The final choices of weight scale, bias scale, requantization, and output quantization are returned.
- Parameters:
quantizations – Quantization parameters.
precision – The precision to use for quantizing convolution.
allow_full_output_precision – Whether 16-bit precision can be widened to 32-bit output. If false, quantizing with 16-bit precision will always produce 16-bit output.
- Returns:
New quantization scale of weights, quantization scale of bias, requantization to perform after convolution, type of output, and quantization of output.
- afe.ir.quantization_conv.quantize_weight_tensor(weight: numpy.ndarray, weight_scale: ChannelScale, bits: int = 8) → Tuple[numpy.ndarray, numpy.ndarray] [source]
Create a quantized weight tensor.
- Parameters:
weight – Weight values being quantized.
weight_scale – Scale of the weights.
bits – Number of bits used for quantized weights.
- Returns:
Tuple of np.ndarray. The first value is the quantized weights; the second is the fake-quantized weights, computed by dividing the quantized weights by the scale. This maps them back to approximately their fp32 values and exposes the quantization error caused by rounding and clipping.
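A minimal sketch of this quantize / fake-quantize relationship, assuming round-to-nearest and a signed bits-bit clipping range (both assumptions):

```python
import numpy as np

def quantize_weight_sketch(weight: np.ndarray, weight_scale: np.ndarray, bits: int = 8):
    lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1        # e.g. -128..127
    q = np.clip(np.round(weight * weight_scale), lo, hi)  # quantized weights
    # Dividing by the scale maps back to ~fp32 and exposes the rounding and
    # clipping error; channels with scale 0 are left at 0.
    fake = np.divide(q, weight_scale, out=np.zeros_like(q),
                     where=weight_scale != 0)
    return q.astype(np.int8), fake
```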
- afe.ir.quantization_conv.try_quantize_bias_tensor(backtracking_parameters: ConvBacktrackingParameters, bias: numpy.ndarray | None, zp_correction: numpy.ndarray, bias_scale: ChannelScale, use_int15: bool, per_channel: bool) → numpy.ndarray [source]
Quantize a bias tensor. If it canβt be quantized due to integer overflow, adjust backtracking parameters. Raise _Retry() if any backtracking parameters were changed.
- Parameters:
backtracking_parameters – Mutable variables for backtracking. May be modified.
bias – Floating-point bias tensor.
zp_correction – Integer zero point correction to be added to the bias. This may include correction for the input zero point and/or output zero point, depending on the quantization scheme.
bias_scale – Quantization scale to use for bias.
use_int15 – Whether int15 convolution is used.
per_channel – Whether per-channel quantization is used.
- Returns:
Quantized bias tensor.
- afe.ir.quantization_conv.quantized_product_zero_value(q_weight: numpy.ndarray, zero_point: int, intrinsic_shift: numpy.ndarray | int) → numpy.ndarray [source]
Calculate the result of quantized generalized matrix multiply when the input is filled with the zero point value. This represents the zero point result, which should be subtracted to get the true product.
- Parameters:
q_weight – Quantized weight tensor
zero_point – Zero point of input tensor
intrinsic_shift – Right-shift that is performed by the convolution algorithm.
- Returns:
Convolution result as a 1D tensor
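A sketch of the idea, assuming the output channel is the last weight axis (see reshape_weight_to_output_channels) and a plain arithmetic right-shift (the real rounding may differ):

```python
import numpy as np

def zero_product_sketch(q_weight: np.ndarray, zero_point: int,
                        intrinsic_shift: int = 0) -> np.ndarray:
    # With the input held constant at zero_point, each output channel is just
    # zero_point times the sum of that channel's weights.
    axes = tuple(range(q_weight.ndim - 1))         # every axis but the last
    sums = q_weight.sum(axis=axes, dtype=np.int64)
    return (sums * zero_point) >> intrinsic_shift  # 1D, one value per channel
```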
- afe.ir.quantization_conv.output_zp_correction_in_bias(precision: ConvolutionPrecision, output_quant: afe.ir.defines.Quantization, requantization: ml_kernels.requantization.BaseRequantization[numpy.ndarray]) → int [source]
Calculate the zero point correction to add to the convolution or matrix multiply's bias array so that the output has the desired quantization.
If the convolution will not combine zero point correction with bias, but instead will do two separate additions, then the result is 0. Otherwise, the result is the output's zero point, scaled based on the requantization.
- Parameters:
precision – Convolution precision type
output_quant – Quantization of convolution's output
requantization – Requantization that is performed at the end of convolution
- Returns:
Zero point correction that should be added to the bias array
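Illustratively, if the requantization multiplies by an effective scale S, then folding round(Zc / S) into the bias contributes approximately the zero point Zc to the output (arbitrary values; the exact scaling depends on the requantization):

```python
S = 0.05                      # effective requantization scale (illustrative)
Zc = 10                       # desired output zero point
correction = round(Zc / S)    # 200: the value folded into the bias array
assert round(correction * S) == Zc
```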
- afe.ir.quantization_conv.quantize_convolution_parameters(input_quant: afe.ir.defines.Quantization, output_distribution: afe.ir.attributes.ObservedDistribution, weight: numpy.ndarray, bias: numpy.ndarray | None, *, per_channel: bool, bias_corrector: afe.ir.bias_correction.BiasCorrector, asymmetry: bool, use_int15: bool, use_sima_relu_workaround: bool, precision: ConvolutionPrecision, allow_full_output_precision: bool, error_reporter: afe.ir.defines.NodeReporter | None = None) → Tuple[numpy.ndarray, numpy.ndarray, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.tensor_type.ScalarType, afe.ir.defines.Quantization, bool] [source]
Select quantized parameters for convolution or matrix multiply.
- Parameters:
input_quant – Quantization that was selected for the input of convolution.
output_distribution – Value distribution of the output of convolution.
weight – Weight tensor.
bias – A bias tensor. If it is None, a bias tensor will still be returned containing the bias correction that was introduced by quantization.
per_channel – Whether to do per-channel quantization. If true, the scale will be a tensor with one value per channel.
bias_corrector – How to calculate a bias correction term.
use_int15 – Whether to quantize for the int15 convolution algorithm. If false, quantize for the int8 convolution algorithm.
use_sima_relu_workaround – Whether to use a workaround for int8 SiMa quantization with relu activation. If True, and relu cannot be executed by the backend, then use TFLite quantization. This parameter is only relevant when precision is sima_int8 or sima_int16, and it must be False otherwise.
precision – The precision to use for quantizing convolution output.
allow_full_output_precision – Whether 16-bit precision can be widened to 32-bit output. If false, quantizing with 16-bit precision will always produce 16-bit output.
error_reporter – Used for warnings about bad quantization.
- Returns:
A tuple containing the chosen quantization-related parameters: the quantized weight tensor, the quantized bias tensor, the requantization, the scalar type of the output, the quantization of the output, and the msb_left_shift flag value.
- afe.ir.quantization_conv.get_bfloat16_with_int_weights_quant_params(attrs: afe.ir.attributes.ConvAddActivationAttrs, per_channel: bool, bits: int) → tuple[numpy.ndarray, numpy.ndarray | None, ml_kernels.requantization.BaseRequantization] [source]
Get quantized weights, the bias if present, and requantization. Weights are quantized to int8 or int4; the bias, if present, is left unquantized. This allows the requantization scale factor to be simply 1/weight_scale, since requantization is performed after the bias is added.
- Parameters:
attrs – Operator attributes containing the weights.
per_channel – Whether the per-channel quantization scheme is used for weights.
bits – Number of bits to be used.
- Returns:
Quantized weights, the optional bias, and the requantization.