afe.ir.quantization_interface

Classes and functions that are used in the interface between operator code and the quantization algorithm.

The data structures in this file hold temporary data that is used only during quantization.

Calibration and quantization information is available for some tensors in a network, as explained below.

Calibration is passed using ObservedDistribution objects. Calibration is determined for the output of all nodes satisfying node_uses_observer, regardless of whether they are quantized.

Quantization information is passed using the quant field of QuantResultTensorType. Quantization information is included only for quantized integer tensors that were floating-point before quantization; otherwise, it is None. As a corollary, if a tensor’s type was not changed by quantization, then its quantization information will be None.

Classes

OpQuantInterface

Quantization-related properties of a node's interface before and after quantization, for use when quantizing the node.

OpQuantResult

Quantization-related properties of a node's interface after quantization, for use in the quantization algorithm.

Functions

make_quantize_op_interface(→ Tuple[OpQuantInterface, OpQuantResult])

Create data structures for the interface between the quantization algorithm and an operator's quantize function.

requantize_scaled(...)

Scale input_quant so that its quantized values can be represented with output_scalar_type.

quantize_output(→ afe.ir.attributes.QuantResultTensorType)

Calculate a quantization that could be used for the output, using calibration results.

fix_output(→ afe.ir.attributes.QuantResultTensorType)

Fix the output to the selected quantized type and set the output's quantization.

fix_output_to_int8(...)

Fix the output to int8. See fix_output for documentation.

fix_output_to_int16(...)

Fix the output to int16. See fix_output for documentation.

fix_input(→ afe.ir.attributes.QuantResultTensorType)

Fix the input having the given name to the given type and set the input's quantization.

fix_input_to_int8(...)

Fix the input having the given name to int8. See fix_input for documentation.

fix_input_to_int16(...)

Fix the input having the given name to int16. See fix_input for documentation.

fix_input_to_float32(...)

Fix the input having the given name to float32.

keep_input(→ afe.ir.attributes.QuantResultTensorType)

fix_output_from_input(...)

Set the output to use the same scalar type and quantization as the input.

get_intermediate_min_max(→ dict[str, tuple[float, float]])

Get min and max values of intermediate calibration data.

Module Contents

class afe.ir.quantization_interface.OpQuantInterface(data: _QuantizeOpData)[source]

Quantization-related properties of a node’s interface before and after quantization, for use when quantizing the node.

An operator’s quantize method may call the “get” methods to read calibration information about its inputs and output, and quantization information about its inputs.

An operator must call set_chosen_input_quant and set_chosen_output_quant to set the quantization that the quantized operator uses at each input and output. If the operator uses an input quantization that is different from the input’s quantization as returned by get_input_quant, the quantization algorithm will cast the input.

get_input_quant() → Mapping[afe.ir.defines.InputName, afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType]][source]
get_placeholder_quant() → afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType] | None[source]
get_input_distributions() → Mapping[afe.ir.defines.InputName, afe.ir.attributes.ObservedDistribution | None][source]
get_intermediate_distributions() → Mapping[afe.ir.defines.InputName, afe.ir.attributes.ObservedDistribution | None][source]
get_output_distribution() → afe.ir.attributes.ObservedDistribution | None[source]
set_chosen_input_quant(name: afe.ir.defines.InputName, quant: afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType])[source]
get_chosen_input_quant(name: afe.ir.defines.InputName) → afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType][source]
set_chosen_output_quant(quant: afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType])[source]
get_calibration_data()[source]
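For illustration, a hypothetical quantize method for a single-input operator might use this interface as follows. This is a minimal sketch: the operator, the input name "data", and the output shape are invented for the example, InputName is assumed to be constructible from a str, and the fix_* helpers documented below perform the required set_chosen_* calls internally.

```python
from afe.ir.defines import InputName
from afe.ir.quantization_interface import (
    OpQuantInterface,
    fix_input_to_int8,
    fix_output_to_int8,
)


def quantize_relu(i: OpQuantInterface, asymmetry: bool) -> None:
    # Hypothetical quantize method for a unary op whose input is named "data".
    name = InputName("data")
    # Calibration for the output is available when the node satisfies
    # node_uses_observer; it may otherwise be None.
    assert i.get_output_distribution() is not None
    # Request an int8 input. If this differs from the producer's quantization
    # (as reported by get_input_quant), the quantization algorithm casts it.
    fix_input_to_int8(i, name, asymmetry)
    # Choose an int8 output quantization from calibration results; this
    # records the choice via set_chosen_output_quant.
    fix_output_to_int8(i, (1, 16, 16, 8), asymmetry)
```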
class afe.ir.quantization_interface.OpQuantResult(data: _QuantizeOpData)[source]

Quantization-related properties of a node’s interface after quantization, for use in the quantization algorithm.

After a node is quantized, the quantization algorithm may call get_result to get the node’s quantization.

get_result() → afe.ir.defines.NodeAssociatedValue[afe.ir.attributes.QuantResultTensorType][source]
afe.ir.quantization_interface.make_quantize_op_interface(input_data: Mapping[afe.ir.defines.InputName, Tuple[afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType], afe.ir.attributes.ObservedDistribution | None]], placeholder_quant: afe.ir.defines.DataValue[afe.ir.attributes.QuantResultTensorType] | None, output_distribution: afe.ir.attributes.ObservedDistribution | None, intemediate_distributions: Dict[str, afe.ir.attributes.ObservedDistribution] | None) → Tuple[OpQuantInterface, OpQuantResult][source]

Create data structures for the interface between the quantization algorithm and an operator’s quantize function.

Parameters:
  • input_data – The quantization and value distribution of the node’s inputs.

  • placeholder_quant – The quantization of the node’s value, if the node is a placeholder.

  • output_distribution – The value distribution of the node’s output.

  • intemediate_distributions – The value distributions of the node’s intermediate results, if any.

Returns:

The interface for the operator’s quantize function and the interface for the quantization algorithm.
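A sketch of the algorithm side, under the assumption that data_quant, data_dist, and output_dist were produced by earlier calibration steps and quantize_relu is the operator's quantize function (as in the sketch above):

```python
from afe.ir.defines import InputName
from afe.ir.quantization_interface import make_quantize_op_interface

# data_quant: DataValue[QuantResultTensorType]; data_dist, output_dist:
# ObservedDistribution values from calibration (assumed to exist here).
interface, result = make_quantize_op_interface(
    input_data={InputName("data"): (data_quant, data_dist)},
    placeholder_quant=None,           # this node is not a placeholder
    output_distribution=output_dist,
    intemediate_distributions=None,   # spelled as in the signature
)
quantize_relu(interface, asymmetry=True)  # operator records its choices
node_quant = result.get_result()  # NodeAssociatedValue[QuantResultTensorType]
```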

afe.ir.quantization_interface.requantize_scaled(input_quant: afe.ir.defines.Quantization, output_scalar_type: afe.ir.tensor_type.ScalarType, shape: Tuple[int, ...], *, restrict_to_pow2: bool = False) → afe.ir.attributes.QuantResultTensorType[source]

Scale input_quant so that its quantized values can be represented with output_scalar_type. Construct a QuantResultTensorType with the new quantization and type.

Parameters:
  • input_quant – Original quantization, which may or may not be representable in output_scalar_type

  • output_scalar_type – Scalar type that quantized values will be represented in

  • shape – Shape of the tensor that is requantized.

  • restrict_to_pow2 – If true, the scale factor will be restricted to a power of 2. This allows requantization to be implemented by a right-shift.

Returns:

Type and quantization of the requantized tensor
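For example, requantizing an int32 accumulator down to int8 might look like the following sketch, where acc_quant is an existing afe.ir.defines.Quantization and ScalarType.int8 is assumed to name the int8 scalar type:

```python
from afe.ir.tensor_type import ScalarType
from afe.ir.quantization_interface import requantize_scaled

qrtt = requantize_scaled(
    acc_quant,               # existing Quantization of an int32 tensor
    ScalarType.int8,         # assumed enum member for the int8 type
    (1, 16, 16, 8),          # shape of the tensor being requantized
    restrict_to_pow2=True,   # the rescale then lowers to a right-shift
)
# qrtt carries both the new int8 type and the adjusted quantization.
```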

afe.ir.quantization_interface.quantize_output(i: OpQuantInterface, quantized_type: afe.ir.tensor_type.ScalarType, output_shape: Tuple[int, ...], asymmetry: bool, requant_method: afe.ir.defines.RequantMethod = RequantMethod.fractional_zero) → afe.ir.attributes.QuantResultTensorType[source]

Calculate a quantization that could be used for the output, using calibration results. The output must have float32 type when quantization begins. If quantized_type is bfloat16 then the return value will not have quantization information.

This function does not change the state of OpQuantInterface; it only returns the quantization.

Parameters:
  • i – Object describing the interface of the node to transform. The node must output a tensor, not a tuple.

  • quantized_type – The data type to quantize for.

  • output_shape – Shape of the node’s output.

  • asymmetry – Whether to use asymmetric quantization.

  • requant_method – Requantization method.

Returns:

A type suitable to use in the result of quantization.
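Because quantize_output has no side effects, it can be used to compare candidates before committing. A sketch, where i is an OpQuantInterface and ScalarType.int8/int16 are assumed enum members:

```python
from afe.ir.defines import RequantMethod
from afe.ir.tensor_type import ScalarType
from afe.ir.quantization_interface import quantize_output

# Compute two candidate output quantizations; neither call modifies `i`.
q8 = quantize_output(i, ScalarType.int8, (1, 16, 16, 8), asymmetry=True)
q16 = quantize_output(i, ScalarType.int16, (1, 16, 16, 8), asymmetry=True,
                      requant_method=RequantMethod.fractional_zero)
# The caller would commit a choice with set_chosen_output_quant, or use
# fix_output (below) to choose and commit in one step.
```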

afe.ir.quantization_interface.fix_output(i: OpQuantInterface, quantized_type: afe.ir.tensor_type.ScalarType, output_shape: Tuple[int, ...], asymmetry: bool) → afe.ir.attributes.QuantResultTensorType[source]

Fix the output to the selected quantized type and set the output’s quantization. Use calibration results to decide how to quantize it. The output must have float32 type when quantization begins.

Parameters:
  • i – Object describing the interface of the node to transform. The node must output a tensor, not a tuple.

  • quantized_type – The data type to quantize for.

  • output_shape – Shape of the node’s output.

  • asymmetry – Whether to use asymmetric quantization.

Returns:

The quantized type of the output.

afe.ir.quantization_interface.fix_output_to_int8(i: OpQuantInterface, output_shape: Tuple[int, ...], asymmetry: bool) → afe.ir.attributes.QuantResultTensorType[source]

Fix the output to int8. See fix_output for documentation.

afe.ir.quantization_interface.fix_output_to_int16(i: OpQuantInterface, output_shape: Tuple[int, ...], asymmetry: bool) → afe.ir.attributes.QuantResultTensorType[source]

Fix the output to int16. See fix_output for documentation.
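The int8 and int16 variants are thin wrappers that supply quantized_type. Sketched, assuming ScalarType.int8 names the int8 type:

```python
from afe.ir.tensor_type import ScalarType
from afe.ir.quantization_interface import fix_output, fix_output_to_int8

# Either line fixes the output to int8; the wrapper only fills in the type.
out8 = fix_output(i, ScalarType.int8, (1, 16, 16, 8), asymmetry=True)
# out8 = fix_output_to_int8(i, (1, 16, 16, 8), asymmetry=True)
```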

afe.ir.quantization_interface.fix_input(i: OpQuantInterface, quantized_type: afe.ir.tensor_type.ScalarType, name: afe.ir.defines.InputName, asymmetry: bool) → afe.ir.attributes.QuantResultTensorType[source]

Fix the input having the given name to the given type and set the input’s quantization.

If the input already has the desired type, use the given type and quantization. If quantized_type is bfloat16, then the input will not be quantized; the given type is used without quantization or requantization.

If the input type is float32, quantize it. If the input type is int32, requantize it to the narrower integer size.

Parameters:
  • i – Object describing the interface of the node to transform.

  • quantized_type – The quantized type to use. It must be int8 or int16.

  • name – The name of the input to select. The input must have a tensor type.

  • asymmetry – Whether to use asymmetric quantization.

Returns:

The quantized type of the input.

afe.ir.quantization_interface.fix_input_to_int8(i: OpQuantInterface, name: afe.ir.defines.InputName, asymmetry: bool) → afe.ir.attributes.QuantResultTensorType[source]

Fix the input having the given name to int8. See fix_input for documentation.

afe.ir.quantization_interface.fix_input_to_int16(i: OpQuantInterface, name: afe.ir.defines.InputName, asymmetry: bool) → afe.ir.attributes.QuantResultTensorType[source]

Fix the input having the given name to int16. See fix_input for documentation.
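For a two-input operator, a quantize method might bring both operands to a common int16 quantization before fixing its output. A sketch with hypothetical input names:

```python
from afe.ir.defines import InputName
from afe.ir.tensor_type import ScalarType
from afe.ir.quantization_interface import (
    fix_input,
    fix_input_to_int16,
    fix_output_to_int16,
)

# "lhs" and "rhs" are hypothetical input names for an elementwise op.
fix_input(i, ScalarType.int16, InputName("lhs"), asymmetry=False)
fix_input_to_int16(i, InputName("rhs"), asymmetry=False)  # wrapper, same effect
fix_output_to_int16(i, (1, 16, 16, 8), asymmetry=False)
```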

afe.ir.quantization_interface.fix_input_to_float32(i: OpQuantInterface, name: afe.ir.defines.InputName) → afe.ir.attributes.QuantResultTensorType[source]

Fix the input having the given name to float32.

If the input’s type is not already float32, then dequantize it. The input must have a known quantization in this case. Set the input’s quantization to reflect a float32 type and no quantization.

Parameters:
  • i – Object describing the interface of the node to transform.

  • name – The name of the input to select. The input must have a tensor type.

Returns:

The quantized type of the input.

afe.ir.quantization_interface.keep_input(i: OpQuantInterface, name: afe.ir.defines.InputName) → afe.ir.attributes.QuantResultTensorType[source]
afe.ir.quantization_interface.fix_output_from_input(i: OpQuantInterface, shape: Tuple[int, ...], name: afe.ir.defines.InputName | None = None) → afe.ir.attributes.QuantResultTensorType[source]

Set the output to use the same scalar type and quantization as the input. The input’s quantization must be set by set_chosen_input_quant first.

Parameters:
  • i – Object describing the interface of the node to transform.

  • shape – Shape of the output. It will be set to this shape.

  • name – The name of the input whose type and quantization will be copied. The input’s quantization must have been set by set_chosen_input_quant. If it is None, the node must have exactly one input, and that input will be used.

Returns:

The quantized type of the output.
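A sketch for a shape-only operator such as a reshape, assuming keep_input records the input's existing type and quantization unchanged (which satisfies the precondition above):

```python
from afe.ir.defines import InputName
from afe.ir.quantization_interface import fix_output_from_input, keep_input

keep_input(i, InputName("data"))  # hypothetical input name
# name=None is allowed here because the node has exactly one input.
out = fix_output_from_input(i, (1, 2048))
```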

afe.ir.quantization_interface.get_intermediate_min_max(i: OpQuantInterface) → dict[str, tuple[float, float]][source]

Get min and max values of intermediate calibration data. This function does not change the state of OpQuantInterface; it only returns the dict of intermediate min/max values.

Parameters:

i – Object describing the interface of the node to transform.

Returns:

The dict of intermediate min/max values.
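A sketch of reading the ranges, for an operator that quantizes internal results separately (i is an OpQuantInterface):

```python
from afe.ir.quantization_interface import get_intermediate_min_max

# Observed (min, max) per named intermediate value, from calibration.
for key, (lo, hi) in get_intermediate_min_max(i).items():
    print(f"intermediate {key}: observed range [{lo}, {hi}]")
```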