afe.tvm_converter.quantization
Quantization code that is specific to the TVM converter.
Functions
- correction_factors: Determine correction factors for requantizing from input_q to output_q.
- requantize_qnn_convolution_dense: Convert constant parameters from a Relay IR quantized convolution/dense, bias-add, and requantization.
- tflite_requantization_constants: Compute constants for TFLite-style requantization.
Module Contents
- afe.tvm_converter.quantization.correction_factors(input_q: afe.ir.defines.Quantization, output_q: afe.ir.defines.Quantization) → Tuple[float, float, int]
Determine correction factors for requantizing from input_q to output_q.
The correction factors consist of a scale correction sc, a zero point correction zc, and a shift n such that
output = (input * sc + zc) * 2**-n
where sc is in the range 0.5 to 1.
- Parameters:
input_q – Quantization of data prior to requantization
output_q – Quantization of data after requantization
- Returns:
Scale correction, zero point correction, and shift
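The decomposition above can be sketched with math.frexp, which normalizes a float into a mantissa in [0.5, 1) and a power of two. This is an illustrative sketch, not the AFE implementation; it takes raw scales and zero points instead of Quantization objects, and assumes the convention that a quantized value q represents the real value (q - zero_point) / scale:

```python
import math


def correction_factors_sketch(input_scale, input_zero_point,
                              output_scale, output_zero_point):
    """Illustrative sketch of correction-factor computation.

    Assuming real = (q - zero_point) / scale, requantization is
        out = (in - in_zp) * r + out_zp,  with r = output_scale / input_scale.
    """
    r = output_scale / input_scale
    # frexp normalizes r as sc * 2**e with sc in [0.5, 1).
    sc, e = math.frexp(r)
    n = -e  # so that output = (input * sc + zc) * 2**-n
    # Fold both zero points into a single additive correction term.
    zc = (output_zero_point - input_zero_point * r) * 2.0 ** n
    return sc, zc, n
```

Multiplying by sc (a value in [0.5, 1)) and shifting by n reproduces the full scale ratio r, which is convenient for fixed-point hardware.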
- afe.tvm_converter.quantization.requantize_qnn_convolution_dense(weight: numpy.ndarray, bias: numpy.ndarray | None, data_zero_point: int, product_q: afe.ir.defines.Quantization | List[afe.ir.defines.Quantization], output_q: afe.ir.defines.Quantization, is_dense: bool) → Tuple[numpy.ndarray, numpy.ndarray, int | numpy.ndarray]
Convert the constant parameters of a Relay IR quantized convolution/dense, bias-add, and requantization into constant parameters for a SiMa IR convolution/dense. The single SiMa IR operator is equivalent to these three operators. Some precision is lost to rounding when converting between these parameters.
- Parameters:
weight – Weight tensor, in HWIGO layout for QNN convolution or OI layout for QNN dense.
bias – Bias tensor from QNN convolution/dense. If None is given, it is treated as an array of zeros.
data_zero_point – Zero point of the convolution’s input activation matrix.
product_q – Quantization of the input of the Relay IR requantize operator. When using per-tensor quantization, it is a single Quantization. When using per-channel quantization, it is a list of Quantization with one item per channel.
output_q – Quantization of the output of the Relay IR requantize operator. This is the same as the quantization of the output of the SiMa IR operator.
is_dense – If True, the constants are computed for a dense operator; otherwise, for a convolution operator.
- Returns:
Weight, bias, and shift for SiMa IR convolution/dense.
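A key step in merging these operators is folding data_zero_point into the bias: since acc = W @ (x - zp) = W @ x - zp * W.sum over the input dimension, the constant term can move into the bias. The sketch below illustrates this for the dense case only (OI layout); fold_zero_point_into_bias is a hypothetical helper, not part of the AFE API:

```python
import numpy as np


def fold_zero_point_into_bias(weight, bias, data_zero_point):
    """Fold the input activation's zero point into the bias (dense, OI layout).

    acc = W @ (x - zp) = W @ x - zp * W.sum(axis=1), so the constant
    term -zp * W.sum(axis=1) is absorbed into the bias.
    """
    if bias is None:
        # A missing bias is treated as zeros, as in the function above.
        bias = np.zeros(weight.shape[0], dtype=np.int64)
    # Sum over the input dimension (axis 1 of an OI-layout weight).
    correction = data_zero_point * weight.astype(np.int64).sum(axis=1)
    return bias.astype(np.int64) - correction
```

The accumulation is done in int64 to avoid overflow before the final requantization narrows the result.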
- afe.tvm_converter.quantization.tflite_requantization_constants(weight: numpy.ndarray, bias: numpy.ndarray | None, data_zero_point: int, input_q: afe.ir.defines.Quantization | List[afe.ir.defines.Quantization], output_q: afe.ir.defines.Quantization, is_dense: bool) → Tuple[numpy.ndarray | None, int, int, int] | Tuple[numpy.ndarray | None, numpy.ndarray, int, numpy.ndarray]
Compute constants for TFLite-style requantization.
- Parameters:
weight – Weight tensor, in HWIGO layout for QNN convolution or OI layout for QNN dense.
bias – Bias tensor from QNN convolution/dense. If None is given, it is treated as an array of zeros.
data_zero_point – Zero point of the convolution’s input activation matrix.
input_q – Quantization of the input of the Relay IR requantize operator. When using per-tensor quantization, it is a single Quantization. When using per-channel quantization, it is a list of Quantization with one item per channel.
output_q – Quantization of the output of the Relay IR requantize operator. This is the same as the quantization of the output of the SiMa IR operator.
is_dense – If True, the constants are computed for a dense operator; otherwise, for a convolution operator.
- Returns:
Modified bias, scale correction, zero point correction, and shift for convolution. Scale correction and shift are integers for per-tensor convolution, or arrays for per-channel convolution.
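TFLite-style requantization conventionally encodes a real multiplier as a 31-bit fixed-point integer plus a shift. A minimal sketch of that decomposition, assuming the standard TFLite convention (quantize_multiplier is an illustrative helper, not part of the AFE API):

```python
import math


def quantize_multiplier(real_multiplier):
    """Illustrative sketch of the TFLite fixed-point multiplier split.

    A positive real multiplier is represented as a 31-bit fixed-point
    integer q plus a shift s, with real ~= q * 2**(s - 31) and
    q in [2**30, 2**31).
    """
    if real_multiplier == 0.0:
        return 0, 0
    # frexp gives real = m * 2**shift with m in [0.5, 1).
    m, shift = math.frexp(real_multiplier)
    q = round(m * (1 << 31))
    if q == (1 << 31):  # rounding pushed m up to 1.0; renormalize
        q //= 2
        shift += 1
    return q, shift
```

For per-channel quantization, this decomposition is applied once per output channel, which is why the scale correction and shift become arrays in that case.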