afe.tvm_converter.quantization

Quantization code that is specific to the TVM converter.

Functions

correction_factors(...) → Tuple[float, float, int]

Determine correction factors for requantizing from input_q to output_q.

requantize_qnn_convolution_dense(...)

Convert constant parameters from a Relay IR quantized convolution/dense, bias-add, and requantization to constant parameters for a SiMa IR convolution/dense.

tflite_requantization_constants(...)

Compute constants for TFLite-style requantization.

Module Contents

afe.tvm_converter.quantization.correction_factors(input_q: afe.ir.defines.Quantization, output_q: afe.ir.defines.Quantization) → Tuple[float, float, int]

Determine correction factors for requantizing from input_q to output_q.

The correction factors consist of a scale correction sc, zero point correction zc, and shift n such that

output = (input * sc + zc) * 2**-n

and sc is in the range 0.5 to 1.

Parameters:
  • input_q – Quantization of data prior to requantization

  • output_q – Quantization of data after requantization

Returns:

Scale correction, zero point correction, and shift
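
Example. The sketch below only illustrates the documented relation output = (input * sc + zc) * 2**-n with made-up correction factors; the apply_correction helper is hypothetical and not part of this module, and real factors come from correction_factors(input_q, output_q).

import numpy as np

def apply_correction(data: np.ndarray, sc: float, zc: float, n: int) -> np.ndarray:
    # Apply the documented relation: output = (input * sc + zc) * 2**-n
    return np.round((data.astype(np.float64) * sc + zc) * 2.0 ** -n)

# Made-up correction factors (sc between 0.5 and 1, as documented above).
quantized = np.array([-12, 0, 37, 100], dtype=np.int32)
requantized = apply_correction(quantized, sc=0.75, zc=4.0, n=3)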

afe.tvm_converter.quantization.requantize_qnn_convolution_dense(weight: numpy.ndarray, bias: numpy.ndarray | None, data_zero_point: int, product_q: afe.ir.defines.Quantization | List[afe.ir.defines.Quantization], output_q: afe.ir.defines.Quantization, is_dense: bool) → Tuple[numpy.ndarray, numpy.ndarray, int | numpy.ndarray]

Convert constant parameters from a Relay IR quantized convolution/dense, bias-add, and requantization to constant parameters for a SiMa IR convolution/dense. The SiMa IR operator is equivalent to these three operators. Some precision is lost to rounding when converting between these parameters.

Parameters:
  • weight – Weight tensor, in HWIGO layout for QNN convolution or in OI layout for QNN dense.

  • bias – Bias tensor from QNN convolution/dense. If None is given, it is treated as an array of zeros.

  • data_zero_point – Zero point of the input activation of the convolution/dense.

  • product_q – Quantization of the input of the Relay IR requantize operator. When using per-tensor quantization, it is a single Quantization. When using per-channel quantization, it is a list of Quantization with one item per channel.

  • output_q – Quantization of the output of the Relay IR requantize operator. This is the same as the quantization of the output of the SiMa IR operator.

  • is_dense – If True, the function requantizes a dense operator; otherwise, a convolution operator.

Returns:

Weight, bias, and shift for SiMa IR convolution/dense.
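
Example. A minimal usage sketch, not a definitive recipe: it assumes afe.ir.defines.Quantization can be constructed as Quantization(scale=..., zero_point=...) (verify against the actual class), and the tensor values and scales are placeholders.

import numpy as np
from afe.ir.defines import Quantization
from afe.tvm_converter.quantization import requantize_qnn_convolution_dense

# Dense weight in OI layout: 16 output units, 64 input features.
weight = np.random.randint(-127, 128, size=(16, 64), dtype=np.int8)
bias = np.zeros(16, dtype=np.int32)

product_q = Quantization(scale=0.05, zero_point=0)   # hypothetical constructor call
output_q = Quantization(scale=0.1, zero_point=-3)

new_weight, new_bias, shift = requantize_qnn_convolution_dense(
    weight, bias, data_zero_point=0,
    product_q=product_q, output_q=output_q, is_dense=True)

With per-channel quantization, product_q would instead be a list with one Quantization per output channel, and, per the return type, shift may be an array rather than an int.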

afe.tvm_converter.quantization.tflite_requantization_constants(weight: numpy.ndarray, bias: numpy.ndarray | None, data_zero_point: int, input_q: afe.ir.defines.Quantization | List[afe.ir.defines.Quantization], output_q: afe.ir.defines.Quantization, is_dense: bool) → Tuple[numpy.ndarray | None, int, int, int] | Tuple[numpy.ndarray | None, numpy.ndarray, int, numpy.ndarray]

Compute constants for TFLite-style requantization.

Parameters:
  • weight – Weight tensor, in HWIGO layout for QNN convolution or in OI layout for QNN dense.

  • bias – Bias tensor from QNN convolution/dense. If None is given, it is treated as an array of zeros.

  • data_zero_point – Zero point of the input activation of the convolution/dense.

  • input_q – Quantization of the input of the Relay IR requantize operator. When using per-tensor quantization, it is a single Quantization. When using per-channel quantization, it is a list of Quantization with one item per channel.

  • output_q – Quantization of the output of the Relay IR requantize operator. This is the same as the quantization of the output of the SiMa IR operator.

  • is_dense – If True, the function computes constants for a dense operator; otherwise, for a convolution operator.

Returns:

Modified bias, scale correction, zero point correction, and shift for convolution. Scale correction and shift are integers for per-tensor convolution, or arrays for per-channel convolution.
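
Example. A minimal usage sketch under the same assumption about the Quantization constructor (Quantization(scale=..., zero_point=...)); the weight shape, scales, and zero points are placeholders.

import numpy as np
from afe.ir.defines import Quantization
from afe.tvm_converter.quantization import tflite_requantization_constants

# Convolution weight in HWIGO layout: 3x3 kernel, 8 input channels, 1 group, 16 output channels.
weight = np.random.randint(-127, 128, size=(3, 3, 8, 1, 16), dtype=np.int8)
bias = None  # treated as an array of zeros

# Per-channel quantization: one Quantization per output channel (hypothetical constructor call).
input_q = [Quantization(scale=0.02 * (c + 1), zero_point=0) for c in range(16)]
output_q = Quantization(scale=0.1, zero_point=5)

new_bias, sc, zc, shift = tflite_requantization_constants(
    weight, bias, data_zero_point=0,
    input_q=input_q, output_q=output_q, is_dense=False)

Per the return type and description above, sc and shift are arrays in this per-channel case; with a single per-tensor input_q they would be plain integers.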