afe.ir.quantization_conv
========================

.. py:module:: afe.ir.quantization_conv

.. autoapi-nested-parse::

   Quantization functions for convolution and matrix multiply.

Attributes
----------

.. autoapisummary::

   afe.ir.quantization_conv.ChannelScale
   afe.ir.quantization_conv.ChannelQScale
   afe.ir.quantization_conv.ChannelShift
   afe.ir.quantization_conv.INTRINSIC_SHIFT_LO
   afe.ir.quantization_conv.INTRINSIC_SHIFT_HI

Classes
-------

.. autoapisummary::

   afe.ir.quantization_conv.ConvolutionPrecision
   afe.ir.quantization_conv.ConvPlanRequantization
   afe.ir.quantization_conv.ConvPlanQuantizations
   afe.ir.quantization_conv.ConvBacktrackingParameters

Functions
---------

.. autoapisummary::

   afe.ir.quantization_conv.reshape_weight_to_output_channels
   afe.ir.quantization_conv.get_quantization_range
   afe.ir.quantization_conv.decompose_power_of_2
   afe.ir.quantization_conv.normalize_with_pow2
   afe.ir.quantization_conv.weight_single_quantization_scale
   afe.ir.quantization_conv.weight_quantization_scale
   afe.ir.quantization_conv.select_convolution_scales
   afe.ir.quantization_conv.run_backtracking_loop
   afe.ir.quantization_conv.adjust_plan_zero_weights
   afe.ir.quantization_conv.try_increase_intrinsic_shift
   afe.ir.quantization_conv.try_adjust_plan_shift_value
   afe.ir.quantization_conv.try_adjust_plan_product_value
   afe.ir.quantization_conv.quantize_convolution_scales
   afe.ir.quantization_conv.quantize_weight_tensor
   afe.ir.quantization_conv.try_quantize_bias_tensor
   afe.ir.quantization_conv.quantized_product_zero_value
   afe.ir.quantization_conv.output_zp_correction_in_bias
   afe.ir.quantization_conv.quantize_convolution_parameters
   afe.ir.quantization_conv.get_bfloat16_with_int_weights_quant_params

Module Contents
---------------

.. py:data:: ChannelScale

.. py:data:: ChannelQScale

.. py:data:: ChannelShift

.. py:data:: INTRINSIC_SHIFT_LO
   :value: 1

.. py:data:: INTRINSIC_SHIFT_HI
   :value: 8

.. py:function:: reshape_weight_to_output_channels(weight: numpy.ndarray) -> numpy.ndarray

   Reshape a weight tensor so that its last axis corresponds to a convolution
   operation's output channel axis. That is, the convolution's output at a given
   channel, output[..., c], depends on reshaped_weights[..., c], bias[c], and
   some values from the convolution's input.

   This tensor shape is useful for code that computes per-channel information or
   does per-channel scaling on weights.

.. py:function:: get_quantization_range(dtype: Union[afe.ir.tensor_type.ScalarType, numpy.number], asymmetry: bool) -> Tuple[int, int]

   Get the numeric range that should be used when quantizing numbers to be
   stored using dtype. The range is the entire value range when using asymmetric
   quantization, and is reduced to a symmetric range when using symmetric
   quantization.

   :param dtype: Quantized data type. It must be a signed integer type.
   :param asymmetry: Whether to use an asymmetric range
   :return: Numeric range

.. py:function:: decompose_power_of_2(x: ChannelScale, rounding: ml_kernels.math_helpers.RoundType) -> Tuple[ChannelShift, ChannelScale]

   Decompose x into a power-of-2 part i and a fractional part f such that
   x = f * 2**i.

   The range of f is selected based on how i is rounded:

   * UPWARD: 0.5 < f <= 1
   * TONEAREST: sqrt(0.5) <= f <= sqrt(2)
   * TRUNC: 1 <= f < 2

   Where x is 0, f and i will be 0.

   :param x: Number to decompose
   :param rounding: How to round the exponent
   :return: Decomposed values (i, f)
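   As a concrete illustration of the UPWARD case, here is a minimal NumPy
   sketch. ``decompose_pow2_upward`` is a hypothetical stand-in, not this
   module's implementation; it assumes ``x`` holds nonnegative scale factors.

   .. code-block:: python

      # For x > 0, pick i = ceil(log2(x)) so that f = x / 2**i lies in
      # (0.5, 1]; where x == 0, both parts are 0.
      import numpy as np

      def decompose_pow2_upward(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
          i = np.zeros_like(x, dtype=np.int32)
          f = np.zeros_like(x, dtype=np.float64)
          nz = x > 0
          i[nz] = np.ceil(np.log2(x[nz])).astype(np.int32)
          f[nz] = x[nz] / np.exp2(i[nz].astype(np.float64))
          return i, f

      i, f = decompose_pow2_upward(np.array([0.0, 0.75, 3.0]))
      # i == [0, 0, 2], f == [0.0, 0.75, 0.75]; check: f * 2.0**i == x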
.. py:function:: normalize_with_pow2(x: ChannelScale) -> Tuple[ChannelShift, ChannelScale]

   Find powers of 2 that normalize each element of x to the range (0.5, 1.0].

   :param x: Scale factors to normalize
   :return: Tuple (i, y) of exponents and normalized scale factors satisfying
            x = y * 2**i.

.. py:function:: weight_single_quantization_scale(weight: numpy.ndarray, bits: int = 8) -> float

   Calculate a scalar quantization scale for a convolution or matrix multiply
   weight tensor.

   :param weight: Floating-point weight tensor
   :param bits: Number of bits used for quantization
   :return: Quantization scale. It has the same meaning as the scale field of
            class Quantization.

.. py:function:: weight_quantization_scale(weight: numpy.ndarray, per_channel: bool, bits: int = 8) -> ChannelScale

   Calculate a quantization scale for a convolution or matrix multiply weight
   tensor.

   :param weight: Floating-point weight tensor
   :param per_channel: Whether to do per-channel quantization
   :param bits: Number of bits to be used
   :return: Quantization scale.

.. py:class:: ConvolutionPrecision

   The precision to use for quantizing convolution. This determines how
   quantization does some calculations and chooses which integer type to use.
   Some choices (such as sima_int8) completely determine the integer type, while
   others do not.

   .. py:attribute:: sima_int8

   .. py:attribute:: tflite_int8

   .. py:attribute:: restricted_tflite_int8

   .. py:attribute:: sima_int16

   .. py:attribute:: tflite_int16

   .. py:attribute:: restricted_tflite_int16

   .. py:attribute:: sima_int32

   .. py:method:: has_multiplier() -> bool

      Return True if this quantization method can use a TFLite multiplier other
      than 1. Return False if it uses ArithFoldedRequantization or forces the
      multiplier to be 1.

   .. py:method:: has_zp_correction() -> bool

      Return True if this quantization method can use a zero point correction
      other than 0.

   .. py:method:: is_arith_folded() -> bool

      Return True if this is one of the quantization methods that uses
      ArithFoldedRequantization.

   .. py:method:: is_tflite() -> bool

      Return True if this is one of the quantization methods that uses
      TFLiteRequantization.

.. py:class:: ConvPlanRequantization(scale: ChannelScale, shift: ChannelShift, multiplier: ChannelQScale)

   Adjustable requantization for convolution.

   This class holds the requantization as both a floating-point number and a
   quantized representation. When these values are modified, they are kept
   consistent (modulo rounding) with the formula
   ``scale = multiplier * (2**-shift)``.

   :param scale: Requantization scale as a floating-point value.
   :param shift: Right shift to perform. Its shape must be the same as scale's.
   :param multiplier: Integer multiplier to use. Its shape must be either () or
                      the same as scale's.

   .. py:attribute:: scale
      :type: ChannelScale

   .. py:attribute:: shift
      :type: ChannelShift

   .. py:attribute:: multiplier
      :type: ChannelQScale

   .. py:method:: deepcopy() -> ConvPlanRequantization

      Make an independent copy of this object.

   .. py:method:: adjust_shift(adjustment: Union[ChannelShift, int])

      Add the given value to the right-shift value.

   .. py:method:: set_unit_scale(positions: numpy.ndarray)

      Set the scale to 1 in the given positions. Shift is set to 0 and
      multiplier is set to 1 in the given positions.
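   The invariant can be illustrated numerically; the sketch below uses made-up
   per-channel values and is not tied to the class's internals:

   .. code-block:: python

      # scale == multiplier * 2.0**-shift, so adding 1 to the right-shift
      # halves the effective scale while the integer multiplier is unchanged
      # (this mirrors what adjust_shift(1) preserves, modulo rounding).
      import numpy as np

      multiplier = np.array([96, 112])      # integer multipliers, per channel
      shift = np.array([7, 8])              # right-shifts, per channel
      scale = multiplier * 2.0 ** -shift    # [0.75, 0.4375]

      shift = shift + 1                     # like adjust_shift(1)
      assert np.allclose(multiplier * 2.0 ** -shift, scale / 2)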
.. py:class:: ConvPlanQuantizations

   Adjustable quantization parameters for convolution or matrix multiply. This
   class holds parameters that may be modified while deciding how to quantize
   the calculation.

   The parameters relate a real-number calculation ::

      c = a * w + b

   to a quantized calculation (the actual calculation is not selected here, and
   it may be different from this formula) ::

      Qc = S * (Qa * Qw) / 2**h + constant_terms

   by ::

      Qw = w * Sw
      Qa = a * Sa
      Qc = c * Sc + Zc
      S = 2**h * Sc / (Sa * Sw)

   The factor of 2**h is a right-shift that is included in the integer
   convolution.

   :param weight: Scale factor Sw relating real weight w to quantized weight Qw.
                  It may contain 0.
   :param output: Quantization (Sc, Zc) relating real output c to quantized
                  output Qc
   :param requant: Requantization S relating quantized product to output Qc
   :param intrinsic_shift: Right-shift h, used to produce an additional scale
                           factor in the convolution product

   .. py:attribute:: weight
      :type: ChannelScale

   .. py:attribute:: output
      :type: afe.ir.defines.Quantization

   .. py:attribute:: requant
      :type: ConvPlanRequantization

   .. py:attribute:: intrinsic_shift
      :type: numpy.ndarray

   .. py:method:: deepcopy() -> ConvPlanQuantizations

      Make an independent copy of this object.

   .. py:method:: set_intrinsic_shift(value: numpy.ndarray)

      Set the intrinsic shift, h, to the given value.

   .. py:method:: set_weight_zero(positions: numpy.ndarray)

      Set the weight scale, Sw, to 0 at the given channel positions.

   .. py:method:: set_requant_one(positions: numpy.ndarray)

      Set the requantization scale to 1 at the given channel positions.

   .. py:method:: scale_weight_pow2(exponent: Union[numpy.ndarray, int])

      Multiply the weight quantization scale, Sw, by 2**exponent.

   .. py:method:: scale_output_pow2(exponent: int)

      Multiply the output quantization scale, Sc, by 2**exponent.

   .. py:method:: scale_requant_pow2(exponent: numpy.ndarray)

      Multiply the requantization, S, by 2**exponent.
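   The scale relation can be checked with a small worked example; all values
   below are invented for illustration, with the bias and output zero point
   folded into ``constant_terms``:

   .. code-block:: python

      # A numeric check of S = 2**h * Sc / (Sa * Sw).
      import numpy as np

      a, w = 2.0, 0.5                   # real input and weight; c = a * w
      Sa, Sw, Sc, Zc, h = 10.0, 20.0, 4.0, 3, 1

      Qa, Qw = a * Sa, w * Sw           # quantized input and weight
      S = 2.0**h * Sc / (Sa * Sw)       # requantization scale

      # The integer kernel right-shifts the product by h, then requantizes.
      Qc = S * (Qa * Qw) / 2.0**h + Zc  # constant_terms here is just Zc
      assert np.isclose(Qc, (a * w) * Sc + Zc)  # matches Qc = c * Sc + Zc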
.. py:function:: select_convolution_scales(weight: numpy.ndarray, input_quant: afe.ir.defines.Quantization, output_distribution: afe.ir.attributes.ObservedDistribution, *, precision: ConvolutionPrecision, asymmetry: bool, per_channel: bool) -> ConvPlanQuantizations

   Choose quantization parameters for a generalized matrix multiply based on the
   input's quantization and the optimal quantization of the weight and output.
   This choice does not account for value ranges of other integer constants and
   intermediate results. Those should be handled separately.

   :param weight: A weight tensor.
   :param input_quant: Quantization that was selected for the input of
                       generalized matrix multiply.
   :param output_distribution: Value distribution of the output of generalized
                               matrix multiply.
   :param precision: Precision to quantize for.
   :param asymmetry: Whether to use asymmetric quantization.
   :param per_channel: Whether to do per-channel quantization. If true, the
                       scales will be a tensor with one value per channel. If
                       false, the scales will be scalars.
   :return: Weight tensor scale, requantization scale, and quantization of the
            convolution output.

.. py:class:: ConvBacktrackingParameters

   Quantization parameters that are fixed at the beginning of the quantization
   algorithm, such that the algorithm has to restart if they are changed. These
   values may be modified in the backtracking loop.

   :param precision: Precision to use for output calculations.
   :param relu_fallback_precision: Alternative precision to use if "precision"
                                   can't be supported due to limitations in the
                                   backend's implementation of ReLU. If this is
                                   None, "precision" is assumed to be fully
                                   supported.
   :param intrinsic_shift_adjustment: Locations where an extra right-shift is
                                      used with the int15 convolution algorithm.
                                      It is an array of bool, where True means
                                      to use an extra right-shift; it is 0D for
                                      per-tensor or 1D for per-channel
                                      quantization. When the input is int8, it
                                      must be a 0D array containing False.
   :param weight_adjustment: Extra right-shift applied to weights. Values
                             greater than zero reduce the weight's precision to
                             fewer than 8 bits. It is an array of int, 0D for
                             per-tensor or 1D for per-channel quantization.

   .. py:attribute:: precision
      :type: ConvolutionPrecision

   .. py:attribute:: relu_fallback_precision
      :type: Optional[ConvolutionPrecision]

   .. py:attribute:: intrinsic_shift_adjustment
      :type: numpy.ndarray

   .. py:attribute:: weight_adjustment
      :type: numpy.ndarray

   .. py:method:: default_intrinsic_shift_adjustment(n_channels: int, per_channel: bool, use_int15: bool) -> numpy.ndarray
      :staticmethod:

      Default value of intrinsic shift. The default is not to use any extra
      right-shift.

      :param n_channels: Number of channels in the convolution output
      :param per_channel: Whether per-channel quantization is used
      :param use_int15: Whether the int15 convolution algorithm is used
      :return: Default value of intrinsic shift

   .. py:method:: default_weight_adjustment(n_channels: int, per_channel: bool) -> numpy.ndarray
      :staticmethod:

      Default weight adjustment. The default is not to use any extra
      right-shift.

      :param n_channels: Number of channels in the convolution output
      :param per_channel: Whether per-channel quantization is used
      :return: Default value of weight adjustment

.. py:function:: run_backtracking_loop(f: Callable[[afe.ir.defines.NodeReporter], _A], backtracking_limit: int, backtracking_error_message: str, error_reporter: Optional[afe.ir.defines.NodeReporter] = None) -> _A

   Retry the backtracking computation in f until it succeeds.

   The callable f represents a restartable function that uses some mutable state
   to represent its starting condition. It may update its mutable state and
   raise a _Retry exception to restart; the state change should help it make
   progress after it restarts. It may return a value to end the loop.

   :param f: Backtracking computation to run
   :param backtracking_limit: Maximum number of times to attempt f. If f is
                              attempted this many times without returning a
                              result, an exception will be raised.
   :param backtracking_error_message: Error message to use if f does not return.
   :param error_reporter: Used for reporting errors.
   :return: Return value of f.

.. py:function:: adjust_plan_zero_weights(weights: numpy.ndarray, quantizations: ConvPlanQuantizations, per_channel: bool, error_reporter: afe.ir.defines.NodeReporter)

   Adjust the convolution plan where the weights would be zero after
   quantization.

   :param weights: Floating-point weights.
   :param quantizations: Quantization parameters. Will be modified.
   :param per_channel: Whether to do per-channel quantization.
   :param error_reporter: Error reporter used for quantization warnings.

.. py:function:: try_increase_intrinsic_shift(backtracking_parameters: ConvBacktrackingParameters, positions: numpy.ndarray) -> None

   Set backtracking_parameters.intrinsic_shift_adjustment to True where
   positions is True. Raise _Retry() if any backtracking parameters were
   changed.

   :param backtracking_parameters: Mutable variables for backtracking. May be
                                   modified.
   :param positions: Array of bool, containing True where the intrinsic shift
                     adjustment should be set to True.
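   The retry protocol shared by run_backtracking_loop and the ``try_*``
   functions above can be sketched in a few lines. ``_Retry`` is private to
   this module, so the sketch defines stand-in names (``Retry``, ``run_loop``,
   ``plan``) purely for illustration:

   .. code-block:: python

      class Retry(Exception):
          pass

      def run_loop(f, limit: int, message: str):
          for _ in range(limit):
              try:
                  return f()
              except Retry:
                  continue  # f updated its mutable state; start over
          raise RuntimeError(message)

      # A restartable computation: its mutable state is `params`, and it
      # raises Retry after adjusting that state so the next attempt can
      # make progress.
      params = {"shift": 12}

      def plan() -> int:
          if params["shift"] > 8:        # pretend 8 is the hardware limit
              params["shift"] -= 1       # make progress, then restart
              raise Retry()
          return params["shift"]

      assert run_loop(plan, limit=10, message="no valid plan") == 8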
.. py:function:: try_adjust_plan_shift_value(backtracking_parameters: ConvBacktrackingParameters, quantizations: ConvPlanQuantizations, use_int15: bool, error_reporter: afe.ir.defines.NodeReporter) -> None

   Adjust the convolution plan where the shift value is out of range or where
   the shift is so large that it causes severe precision loss. Raise _Retry()
   if any backtracking parameters were changed.

   :param backtracking_parameters: Mutable variables for backtracking. May be
                                   modified.
   :param quantizations: Quantization parameters. May be modified.
   :param use_int15: Whether the plan is for int15 convolution.
   :param error_reporter: Error reporter used for quantization warnings.

.. py:function:: try_adjust_plan_product_value(backtracking_parameters: ConvBacktrackingParameters, quantizations: ConvPlanQuantizations, use_int15: bool, error_reporter: afe.ir.defines.NodeReporter) -> None

   Adjust the convolution plan where the integer convolution result is not in
   the representable range. Raise _Retry() if any backtracking parameters were
   changed.

   :param backtracking_parameters: Mutable variables for backtracking. May be
                                   modified.
   :param quantizations: Quantization parameters. May be modified.
   :param use_int15: Whether the plan is for int15 convolution.
   :param error_reporter: Error reporter used for quantization warnings.

.. py:function:: quantize_convolution_scales(quantizations: ConvPlanQuantizations, precision: ConvolutionPrecision, allow_full_output_precision: bool) -> Tuple[ChannelScale, ChannelScale, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.tensor_type.ScalarType, afe.ir.defines.Quantization]

   Adjust the quantization parameters based on zero values, limits on integer
   constants, and limits on integer intermediate results. The final choices of
   weight scale, bias scale, requantization, and output quantization are
   returned.

   :param quantizations: Quantization parameters.
   :param precision: The precision to use for quantizing convolution.
   :param allow_full_output_precision: Whether 16-bit precision can be widened
                                       to 32-bit output. If false, quantizing
                                       with 16-bit precision will always
                                       produce 16-bit output.
   :return: New quantization scale of weights, quantization scale of the bias,
            requantization to perform after convolution, type of output, and
            quantization of output.

.. py:function:: quantize_weight_tensor(weight: numpy.ndarray, weight_scale: ChannelScale, bits: int = 8) -> Tuple[numpy.ndarray, numpy.ndarray]

   Create a quantized weight tensor.

   :param weight: Weight values being quantized.
   :param weight_scale: Scale of the weights.
   :param bits: Number of bits used for quantized weights.
   :return: Tuple of np.ndarray. The first value is the quantized weights. The
            second is the fake-quantized weights, calculated by dividing the
            quantized weights by the scale; this returns them to approximately
            their original fp32 values and exposes the quantization error
            caused by rounding and clipping during quantization.
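   A minimal sketch of this quantize/fake-quantize pair follows, assuming
   symmetric quantization and a nonzero scale; the module's actual rounding and
   clipping details may differ:

   .. code-block:: python

      import numpy as np

      def quantize_weights(weight: np.ndarray, scale: np.ndarray, bits: int = 8):
          # Symmetric signed range, e.g. [-127, 127] for 8 bits.
          hi = 2 ** (bits - 1) - 1
          q = np.clip(np.round(weight * scale), -hi, hi)
          fake = q / scale   # back to fp32, exposing the quantization error
          return q.astype(np.int8 if bits <= 8 else np.int16), fake

      q, fake = quantize_weights(np.array([0.31, -1.7]), scale=np.float64(100.0))
      # q == [31, -127]; fake == [0.31, -1.27] shows the clipping error on -1.7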
.. py:function:: try_quantize_bias_tensor(backtracking_parameters: ConvBacktrackingParameters, bias: Optional[numpy.ndarray], zp_correction: numpy.ndarray, bias_scale: ChannelScale, use_int15: bool, per_channel: bool) -> numpy.ndarray

   Quantize a bias tensor. If it can't be quantized due to integer overflow,
   adjust backtracking parameters. Raise _Retry() if any backtracking
   parameters were changed.

   :param backtracking_parameters: Mutable variables for backtracking. May be
                                   modified.
   :param bias: Floating-point bias tensor.
   :param zp_correction: Integer zero point correction to be added to the bias.
                         This may include correction for the input zero point
                         and/or output zero point, depending on the
                         quantization scheme.
   :param bias_scale: Quantization scale to use for bias.
   :param use_int15: Whether int15 convolution is used.
   :param per_channel: Whether per-channel quantization is used.
   :return: Quantized bias tensor.

.. py:function:: quantized_product_zero_value(q_weight: numpy.ndarray, zero_point: int, intrinsic_shift: Union[numpy.ndarray, int]) -> numpy.ndarray

   Calculate the result of quantized generalized matrix multiply when the input
   is filled with the zero point value. This represents the zero point result,
   which should be subtracted to get the true product.

   :param q_weight: Quantized weight tensor
   :param zero_point: Zero point of input tensor
   :param intrinsic_shift: Right-shift that is performed by the convolution
                           algorithm.
   :return: Convolution result as a 1D tensor
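   A sketch of this zero-point product follows, assuming the weight's last axis
   is the output-channel axis (as produced by reshape_weight_to_output_channels)
   and a plain arithmetic right-shift; the kernel's shift rounding may differ:

   .. code-block:: python

      # With the input filled with the zero point, each output channel sees
      # zp times the sum of that channel's weights, right-shifted by the
      # convolution's intrinsic shift.
      import numpy as np

      def zero_value(q_weight: np.ndarray, zero_point: int,
                     intrinsic_shift: int) -> np.ndarray:
          per_channel_sum = q_weight.reshape(-1, q_weight.shape[-1]).sum(axis=0)
          return (zero_point * per_channel_sum.astype(np.int64)) >> intrinsic_shift

      qw = np.array([[1, -2], [3, 4]])     # 2 inputs, 2 output channels
      zv = zero_value(qw, zero_point=10, intrinsic_shift=1)
      # per-channel sums are [4, 2]; zp * sums = [40, 20]; >> 1 gives [20, 10]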
.. py:function:: output_zp_correction_in_bias(precision: ConvolutionPrecision, output_quant: afe.ir.defines.Quantization, requantization: ml_kernels.requantization.BaseRequantization[numpy.ndarray]) -> int

   Calculate the zero point correction to add to the convolution or matrix
   multiply's bias array so that the output has the desired quantization.

   If the convolution will not combine zero point correction with bias, but
   instead will do two separate additions, then the result is 0. Otherwise, the
   result is the output's zero point, scaled based on the requantization.

   :param precision: Convolution precision type
   :param output_quant: Quantization of convolution's output
   :param requantization: Requantization that is performed at the end of
                          convolution
   :return: Zero point correction that should be added to the bias array

.. py:function:: quantize_convolution_parameters(input_quant: afe.ir.defines.Quantization, output_distribution: afe.ir.attributes.ObservedDistribution, weight: numpy.ndarray, bias: Optional[numpy.ndarray], *, per_channel: bool, bias_corrector: afe.ir.bias_correction.BiasCorrector, asymmetry: bool, use_int15: bool, use_sima_relu_workaround: bool, precision: ConvolutionPrecision, allow_full_output_precision: bool, error_reporter: Optional[afe.ir.defines.NodeReporter] = None) -> Tuple[numpy.ndarray, numpy.ndarray, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.tensor_type.ScalarType, afe.ir.defines.Quantization, bool]

   Select quantized parameters for convolution or matrix multiply.

   :param input_quant: Quantization that was selected for the input of
                       convolution.
   :param output_distribution: Value distribution of the output of convolution.
   :param weight: Weight tensor.
   :param bias: A bias tensor. If it is None, a bias tensor will still be
                returned containing the bias correction that was introduced by
                quantization.
   :param per_channel: Whether to do per-channel quantization. If true, the
                       scale will be a tensor with one value per channel.
   :param bias_corrector: How to calculate a bias correction term.
   :param asymmetry: Whether to use asymmetric quantization.
   :param use_int15: Whether to quantize for the int15 convolution algorithm.
                     If false, quantize for the int8 convolution algorithm.
   :param use_sima_relu_workaround: Whether to use a workaround for int8 SiMa
                                    quantization with ReLU activation. If True,
                                    and ReLU cannot be executed by the backend,
                                    then use TFLite quantization. This
                                    parameter is only relevant when precision
                                    is sima_int8 or sima_int16, and it must be
                                    False otherwise.
   :param precision: The precision to use for quantizing convolution output.
   :param allow_full_output_precision: Whether 16-bit precision can be widened
                                       to 32-bit output. If false, quantizing
                                       with 16-bit precision will always
                                       produce 16-bit output.
   :param error_reporter: Used for warnings about bad quantization.
   :return: A tuple containing the chosen quantization-related parameters: the
            quantized weight tensor, the quantized bias tensor, the
            requantization, the scalar type of the output, the quantization of
            the output, and the msb_left_shift flag value.

.. py:function:: get_bfloat16_with_int_weights_quant_params(attrs: afe.ir.attributes.ConvAddActivationAttrs, per_channel: bool, bits: int) -> tuple[numpy.ndarray, numpy.ndarray | None, ml_kernels.requantization.BaseRequantization]

   Get the quantized weights, the bias (if present), and the requantization.
   Weights are quantized to int8 or int4, and the bias, if present, is left
   unquantized; this allows the requantization scale factor to be simply
   1/weight_scale, since requantization is done after adding the bias.

   :param attrs: Convolution attributes containing the weights.
   :param per_channel: Whether a per-channel quantization scheme is used for
                       weights.
   :param bits: Number of bits to be used.
   :return: Quantized weights, optionally the bias, and the requantization.
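   A numeric sketch of the underlying scale relation follows (bias omitted):
   with bfloat16 activations the input scale is effectively 1, so dividing the
   integer-weight product by the weight scale recovers the real-valued product.
   The values below are invented for illustration:

   .. code-block:: python

      import numpy as np

      w = np.array([0.31, -0.52])
      Sw = 127.0 / np.abs(w).max()          # 8-bit symmetric weight scale
      qw = np.round(w * Sw)                 # integer weights

      a = np.array([1.5, -2.0])             # bfloat16 activations, scale 1
      acc = (a * qw).sum()                  # integer-weight accumulation
      assert np.isclose(acc / Sw, (a * w).sum(), atol=1e-2)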