afe.tvm_converter.quantization
Quantization code that is specific to the TVM converter.
Functions
- correction_factors: Determine correction factors for requantizing from input_q to output_q.
- requantize_qnn_convolution_dense: Convert constant parameters from a Relay IR quantized convolution/dense, bias-add, and requantization.
- tflite_requantization_constants: Compute constants for TFLite-style requantization.
Module Contents
- afe.tvm_converter.quantization.correction_factors(input_q: afe.ir.defines.Quantization, output_q: afe.ir.defines.Quantization) → Tuple[float, float, int]
Determine correction factors for requantizing from input_q to output_q.
The correction factors consist of a scale correction sc, a zero point correction zc, and a shift n such that
output = (input * sc + zc) * 2**-n
where sc is in the range 0.5 to 1.
- Parameters:
input_q – Quantization of data prior to requantization
output_q – Quantization of data after requantization
- Returns:
Scale correction, zero point correction, and shift
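The decomposition above can be sketched with math.frexp, which normalizes a float into a mantissa in [0.5, 1) and a power of two. This is an illustrative sketch, not the AFE implementation; it takes raw scales and zero points instead of Quantization objects, and assumes the convention that a quantized value q represents the real value (q - zero_point) / scale:

```python
import math


def correction_factors_sketch(input_scale, input_zero_point,
                              output_scale, output_zero_point):
    """Illustrative sketch of correction-factor computation.

    Assuming real = (q - zero_point) / scale, requantization is
        out = (in - in_zp) * r + out_zp,  with r = output_scale / input_scale.
    """
    r = output_scale / input_scale
    # frexp normalizes r as sc * 2**e with sc in [0.5, 1).
    sc, e = math.frexp(r)
    n = -e  # so that output = (input * sc + zc) * 2**-n
    # Fold both zero points into a single additive correction term.
    zc = (output_zero_point - input_zero_point * r) * 2.0 ** n
    return sc, zc, n
```

Multiplying by sc (a value in [0.5, 1)) and shifting by n reproduces the full scale ratio r, which is convenient for fixed-point hardware.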
- afe.tvm_converter.quantization.requantize_qnn_convolution_dense(weight: numpy.ndarray, bias: numpy.ndarray | None, data_zero_point: int, product_q: afe.ir.defines.Quantization | List[afe.ir.defines.Quantization], output_q: afe.ir.defines.Quantization, is_dense: bool) → Tuple[numpy.ndarray, numpy.ndarray, int | numpy.ndarray]
Convert the constant parameters of a Relay IR quantized convolution/dense, bias-add, and requantization into constant parameters for a SiMa IR convolution/dense. The single SiMa IR operator is equivalent to these three operators. Some precision is lost to rounding when converting between these parameters.
- Parameters:
weight – Weight tensor, in HWIGO layout for QNN convolution or OI layout for QNN dense.
bias – Bias tensor from QNN convolution/dense. If None is given, it is treated as an array of zeros.
data_zero_point – Zero point of the convolution’s input activation matrix.
product_q – Quantization of the input of the Relay IR requantize operator. When using per-tensor quantization, it is a single Quantization. When using per-channel quantization, it is a list of Quantization with one item per channel.
output_q – Quantization of the output of the Relay IR requantize operator. This is the same as the quantization of the output of the SiMa IR operator.
is_dense – If True, the constants are computed for a dense operator; otherwise, for a convolution operator.
- Returns:
Weight, bias, and shift for SiMa IR convolution/dense.
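A key step in merging these operators is folding data_zero_point into the bias: since acc = W @ (x - zp) = W @ x - zp * W.sum over the input dimension, the constant term can move into the bias. The sketch below illustrates this for the dense case only (OI layout); fold_zero_point_into_bias is a hypothetical helper, not part of the AFE API:

```python
import numpy as np


def fold_zero_point_into_bias(weight, bias, data_zero_point):
    """Fold the input activation's zero point into the bias (dense, OI layout).

    acc = W @ (x - zp) = W @ x - zp * W.sum(axis=1), so the constant
    term -zp * W.sum(axis=1) is absorbed into the bias.
    """
    if bias is None:
        # A missing bias is treated as zeros, as in the function above.
        bias = np.zeros(weight.shape[0], dtype=np.int64)
    # Sum over the input dimension (axis 1 of an OI-layout weight).
    correction = data_zero_point * weight.astype(np.int64).sum(axis=1)
    return bias.astype(np.int64) - correction
```

The accumulation is done in int64 to avoid overflow before the final requantization narrows the result.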
- afe.tvm_converter.quantization.tflite_requantization_constants(weight: numpy.ndarray, bias: numpy.ndarray | None, data_zero_point: int, input_q: afe.ir.defines.Quantization | List[afe.ir.defines.Quantization], output_q: afe.ir.defines.Quantization, is_dense: bool) → Tuple[numpy.ndarray | None, int, int, int] | Tuple[numpy.ndarray | None, numpy.ndarray, int, numpy.ndarray]
Compute constants for TFLite-style requantization.
- Parameters:
weight – Weight tensor, in HWIGO layout for QNN convolution or OI layout for QNN dense.
bias – Bias tensor from QNN convolution/dense. If None is given, it is treated as an array of zeros.
data_zero_point – Zero point of the convolution’s input activation matrix.
input_q – Quantization of the input of the Relay IR requantize operator. When using per-tensor quantization, it is a single Quantization. When using per-channel quantization, it is a list of Quantization with one item per channel.
output_q – Quantization of the output of the Relay IR requantize operator. This is the same as the quantization of the output of the SiMa IR operator.
is_dense – If True, the constants are computed for a dense operator; otherwise, for a convolution operator.
- Returns:
Modified bias, scale correction, zero point correction, and shift for convolution. Scale correction and shift are integers for per-tensor convolution, or arrays for per-channel convolution.
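TFLite-style requantization conventionally encodes a real multiplier as a 31-bit fixed-point integer plus a shift. A minimal sketch of that decomposition, assuming the standard TFLite convention (quantize_multiplier is an illustrative helper, not part of the AFE API):

```python
import math


def quantize_multiplier(real_multiplier):
    """Illustrative sketch of the TFLite fixed-point multiplier split.

    A positive real multiplier is represented as a 31-bit fixed-point
    integer q plus a shift s, with real ~= q * 2**(s - 31) and
    q in [2**30, 2**31).
    """
    if real_multiplier == 0.0:
        return 0, 0
    # frexp gives real = m * 2**shift with m in [0.5, 1).
    m, shift = math.frexp(real_multiplier)
    q = round(m * (1 << 31))
    if q == (1 << 31):  # rounding pushed m up to 1.0; renormalize
        q //= 2
        shift += 1
    return q, shift
```

For per-channel quantization, this decomposition is applied once per output channel, which is why the scale correction and shift become arrays in that case.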