afe.ir.quantization_utils
=========================

.. py:module:: afe.ir.quantization_utils


Attributes
----------

.. autoapisummary::

   afe.ir.quantization_utils.DTYPE_BOUNDS


Classes
-------

.. autoapisummary::

   afe.ir.quantization_utils.QNNDtype


Functions
---------

.. autoapisummary::

   afe.ir.quantization_utils.round_op
   afe.ir.quantization_utils.calculate_normalization_shift
   afe.ir.quantization_utils.get_bound
   afe.ir.quantization_utils.clip_to_targeted_range
   afe.ir.quantization_utils.compute_scale
   afe.ir.quantization_utils.compute_zero_point
   afe.ir.quantization_utils.significant_bits_signed
   afe.ir.quantization_utils.compute_power_of_2_scale_and_shift
   afe.ir.quantization_utils.compute_weight_scale
   afe.ir.quantization_utils.compute_weight_scale_per_channel
   afe.ir.quantization_utils.linear_scale
   afe.ir.quantization_utils.linear_scale_per_channel
   afe.ir.quantization_utils.linear_quantize
   afe.ir.quantization_utils.linear_quantize_with_quantization
   afe.ir.quantization_utils.quantize_value
   afe.ir.quantization_utils.dequantize_value
   afe.ir.quantization_utils.get_zero_kernel_mask_per_channel
   afe.ir.quantization_utils.dequantize
   afe.ir.quantization_utils.requantize
   afe.ir.quantization_utils.float_requantization
   afe.ir.quantization_utils.power_of_2_requantization
   afe.ir.quantization_utils.requantization
   afe.ir.quantization_utils.requantization_tflite
   afe.ir.quantization_utils.is_quantized
   afe.ir.quantization_utils.dequantize_tensor
   afe.ir.quantization_utils.quantize_tensor
   afe.ir.quantization_utils.dequantize_input_dict
   afe.ir.quantization_utils.quantize_input_dict
   afe.ir.quantization_utils.quantize_alpha
   afe.ir.quantization_utils.quantize_add_subtract
   afe.ir.quantization_utils.quantize_multiply
   afe.ir.quantization_utils.quantize_batch_matmul
   afe.ir.quantization_utils.quantize_udf
   afe.ir.quantization_utils.get_input_quantization_func
   afe.ir.quantization_utils.quantize_clip_attrs
   afe.ir.quantization_utils.quantize_activation
   afe.ir.quantization_utils.requantize_activation
   afe.ir.quantization_utils.requantize_quantization
   afe.ir.quantization_utils.quantize_prelu
   afe.ir.quantization_utils.quantize_reciprocal
   afe.ir.quantization_utils.quantize_lrn
   afe.ir.quantization_utils.quantize_softmax
   afe.ir.quantization_utils.quantize_layer_norm
   afe.ir.quantization_utils.quantize_instance_norm
   afe.ir.quantization_utils.quantize_rms_norm
   afe.ir.quantization_utils.quantization_data_value_to_output_list
   afe.ir.quantization_utils.fix_requantization
   afe.ir.quantization_utils.cast_calibration_inputs
   afe.ir.quantization_utils.create_requantization_from_cast


Module Contents
---------------

.. py:class:: QNNDtype


   Data types used in QNN operations


   .. py:attribute:: INT8
      :value: 'int8'


   .. py:attribute:: UINT8
      :value: 'uint8'


   .. py:attribute:: INT32
      :value: 'int32'


.. py:data:: DTYPE_BOUNDS

.. py:function:: round_op(x: float, rounding_type: ml_kernels.math_helpers.RoundType = RoundType.TOEVEN) -> float

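   Round a number to an integer using the given rounding mode.

   :param x: A float32 number to be rounded.
   :param rounding_type: Rounding mode to apply. Defaults to ``RoundType.TOEVEN``.
   :return: Rounded result.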
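
Rounding modes differ only in how they break ties. The sketch below is not the ``ml_kernels`` implementation of ``round_op``; it uses only the Python standard library, the helper names are hypothetical, and it assumes that ``RoundType.TOEVEN`` means round-half-to-even, as the name suggests.

```python
import math


def round_half_to_even(x: float) -> int:
    # Hypothetical helper: Python's built-in round() breaks ties toward the even integer.
    return round(x)


def round_half_up(x: float) -> int:
    # Hypothetical helper: ties go toward positive infinity.
    return math.floor(x + 0.5)


ties = [-1.5, -0.5, 0.5, 1.5, 2.5]
to_even = [round_half_to_even(v) for v in ties]
half_up = [round_half_up(v) for v in ties]
print(to_even)  # [-2, 0, 0, 2, 2]
print(half_up)  # [-1, 0, 1, 2, 3]
```

Values that are not exactly halfway between two integers round identically under both modes.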


.. py:function:: calculate_normalization_shift(scale: Union[float, numpy.ndarray], rounding: ml_kernels.math_helpers.RoundType = RoundType.TRUNC) -> Union[float, numpy.ndarray]

   Calculate the number of bit shifts needed to normalize a scale.
   The original scale is normalized by dividing it by ``2**shift``; the result depends on the rounding type.


.. py:function:: get_bound(bits: int, signed: bool = True) -> int

.. py:function:: clip_to_targeted_range(x: Union[int, numpy.ndarray], bits: int, restricted_range: bool = False) -> Union[int, numpy.ndarray]

   Clip x to the target range determined by the given number of bits.

   :param x: Numpy array or int.
   :param bits: Number of bits used to determine the minimum and maximum values.
   :param restricted_range: If True, abs(a_min) == abs(a_max).
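
As a rough illustration of the clipping performed by ``clip_to_targeted_range``, the sketch below clips a plain Python integer. It assumes a signed two's-complement range, which this page does not state, and ``clip_signed`` is a hypothetical name.

```python
def clip_signed(x: int, bits: int, restricted_range: bool = False) -> int:
    # Assumption: signed range [-2**(bits-1), 2**(bits-1) - 1].
    a_max = 2 ** (bits - 1) - 1
    # With a restricted range the minimum mirrors the maximum, so abs(a_min) == abs(a_max).
    a_min = -a_max if restricted_range else -a_max - 1
    return max(a_min, min(a_max, x))


print(clip_signed(200, 8))         # 127
print(clip_signed(-200, 8))        # -128
print(clip_signed(-200, 8, True))  # -127
```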


.. py:function:: compute_scale(asymmetry: bool, layer_bits: int, min_val: float, max_val: float, include_real_zero_point: bool = False) -> float

   Compute a linear quantization scale for mapping the range (min_val, max_val) onto the quantized integer range
   determined by layer_bits, include_real_zero_point, and asymmetry.

   The computed scale is the reciprocal of the scale in TFLite's convention.

   :param asymmetry: If true, do asymmetric quantization.
   :param layer_bits: Number of bits used for quantization.
   :param min_val: Minimum value.
   :param max_val: Maximum value.
   :param include_real_zero_point: If True, force the floating-point dynamic range
                                   to cover zero.
   :return: Computed scale s such that real numbers r are converted to integers q by the formula q = round(s * r).
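
The convention ``q = round(s * r)`` used by ``compute_scale`` can be illustrated with a small sketch. The formulas below are common choices for symmetric and asymmetric quantization and are assumptions, not the library's code; in particular the sketch ignores ``include_real_zero_point`` and degenerate ranges. ``sketch_scale`` is a hypothetical name.

```python
def sketch_scale(asymmetry: bool, bits: int, min_val: float, max_val: float) -> float:
    if asymmetry:
        # Assumption: spread the 2**bits - 1 integer steps over the whole float range.
        return (2 ** bits - 1) / (max_val - min_val)
    # Assumption: map the largest magnitude onto the largest positive integer.
    return (2 ** (bits - 1) - 1) / max(abs(min_val), abs(max_val))


s_sym = sketch_scale(False, 8, -2.0, 1.0)   # 127 / 2.0
s_asym = sketch_scale(True, 8, -1.0, 1.0)   # 255 / 2.0
q = round(s_sym * 0.5)                      # integer that represents the real value 0.5
tflite_scale = 1.0 / s_sym                  # TFLite stores the reciprocal: r = tflite_scale * q
print(s_sym, s_asym, q)
```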


.. py:function:: compute_zero_point(asymmetry: bool, layer_bits: int, min_val: float, max_val: float, restricted_range: bool = False) -> int

   Given the minimum and maximum values, compute the zero point.

   :param asymmetry: If True, do asymmetric quantization.
   :param layer_bits: Number of bits used for quantization.
   :param min_val: Minimum value.
   :param max_val: Maximum value.
   :param restricted_range: If True, the dynamic range is equal
                            on the negative and positive sides.
   :return: Zero point.


.. py:function:: significant_bits_signed(n: int) -> int

   Get the smallest signed integer bit width that can represent the given integer.
   For example::

       significant_bits_signed(-129) = 9
       significant_bits_signed(-128) = 8
       significant_bits_signed(127) = 8
       significant_bits_signed(128) = 9


.. py:function:: compute_power_of_2_scale_and_shift(scale: Union[float, numpy.ndarray], input_bit: int, output_bit: int) -> Union[Tuple[int, int], Tuple[numpy.ndarray, numpy.ndarray]]

   Given a float scale or a vector of scales, and the number of quantization bits for input and output,
   return a quantized scale and a right shift.

   :param scale: Union[float, np.ndarray]. Scale or vector of scales.
   :param input_bit: int. Number of bits used for input quantization.
   :param output_bit: int. Number of bits used for output quantization.
   :return: Union[Tuple[int, int], Tuple[np.ndarray, np.ndarray]]. Tuple of (scale, right shift).


.. py:function:: compute_weight_scale(weight: numpy.ndarray, bits: int) -> float

   Compute weight scale. Weights are always quantized symmetrically.

   :param weight: Weight tensor.
   :param bits: Number of bits used to quantize weight.
   :return: Scale of weight.


.. py:function:: compute_weight_scale_per_channel(weight: numpy.ndarray, bits: int) -> numpy.ndarray

   Compute per-channel weight scales. The expected layout of weight is AwesomeConvWeightLayout.

   :param weight: Weight tensor in AwesomeConvWeightLayout format.
   :param bits: Number of bits used to quantize weight.
   :return: An array of scales of weight.


.. py:function:: linear_scale(input: numpy.ndarray, scale: float, bits: int, clip: bool = True) -> numpy.ndarray

   Linearly scale the input by the given scale and, if requested, clip the scaled input
   to the range determined by the number of bits.

   :param input: A numpy array.
   :param scale: A scale factor used to scale the input to a target range.
   :param bits: Number of bits used to clip the scaled input.
   :param clip: If True, clip the linear scale result to the given dynamic range.
   :return: Scaled input.


.. py:function:: linear_scale_per_channel(input: numpy.ndarray, scale: numpy.ndarray, bits: int, clip: bool = True) -> numpy.ndarray

   Linearly scale the input by per-channel scales and, if requested, clip the scaled input
   to the range determined by the number of bits.
   The output channel has to be the last dimension.

   :param input: A numpy array.
   :param scale: A numpy array of scale factors used to scale the input to different
                 target ranges in different channels.
   :param bits: Number of bits used to clip the scaled input.
   :param clip: If True, clip the linear scale results to the given dynamic range.
   :return: Scaled input.


.. py:function:: linear_quantize(input: numpy.ndarray, scale: float, zp: int, bits: int) -> numpy.ndarray

   Quantize the input with the equation::

       quantized_input = (input / S) + zero_point

   :param input: A numpy array.
   :param scale: scale = (1/S) in the above equation.
   :param zp: Zero point of the quantized input.
   :param bits: Number of bits used to clip the scaled input.
   :return: Quantized input.
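
The equation documented for ``linear_quantize`` can be sketched on plain Python lists. This is an illustration under assumptions: rounding uses Python's ``round()`` (ties to even), the clipping range is taken to be signed, and ``linear_quantize_sketch`` is a hypothetical name rather than the library function.

```python
def linear_quantize_sketch(values, scale, zp, bits):
    # Assumption: signed clipping range [-2**(bits-1), 2**(bits-1) - 1].
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    quantized = []
    for v in values:
        # scale = 1/S, so multiplying by scale is the same as dividing by S.
        q = round(v * scale) + zp
        quantized.append(max(q_min, min(q_max, q)))
    return quantized


q = linear_quantize_sketch([-1.0, 0.0, 0.5, 3.0], scale=100.0, zp=10, bits=8)
print(q)  # [-90, 10, 60, 127]
```

The last input saturates: ``3.0 * 100 + 10 = 310`` does not fit in 8 bits and is clipped to 127.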


.. py:function:: linear_quantize_with_quantization(input: numpy.ndarray, quantization: afe.ir.defines.Quantization) -> numpy.ndarray

   Apply a quantization to a floating-point tensor to produce a quantized tensor.

   :param input: Floating-point tensor
   :param quantization: Quantization to apply
   :return: Quantized tensor


.. py:function:: quantize_value(value: Any, q: afe.ir.defines.DataValue[Optional[afe.ir.defines.Quantization]]) -> Any

   Quantize a value according to the given quantization.
   Values consist of arrays and tuples.

   :param value: Value to quantize.  It must consist of numpy arrays and tuples.
   :param q: Quantization of the value.  None means that the value is not quantized
     and so it will be returned unchanged.
   :return: Quantized value.  It has the same tuple structure as the input.
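
How ``quantize_value`` can preserve tuple structure is easiest to see in a recursive sketch. Everything here is a stand-in: plain tuples play the role of ``DataValue``, a ``(scale, zero_point)`` pair plays the role of ``Quantization``, and lists of floats replace numpy arrays. ``quantize_value_sketch`` is a hypothetical name.

```python
def quantize_value_sketch(value, q):
    if q is None:
        # None means "not quantized": return the value unchanged.
        return value
    if isinstance(value, tuple):
        # Recurse element-wise so the result has the same tuple structure.
        return tuple(quantize_value_sketch(v, qi) for v, qi in zip(value, q))
    scale, zero_point = q  # stand-in for a Quantization object
    return [round(x * scale) + zero_point for x in value]


value = ([0.5, 1.0], ([2.0], [3.0]))
quant = ((10.0, 0), ((1.0, 1), None))
result = quantize_value_sketch(value, quant)
print(result)  # ([5, 10], ([3], [3.0]))
```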


.. py:function:: dequantize_value(value: Any, q: afe.ir.defines.DataValue[Optional[afe.ir.defines.Quantization]]) -> Any

   Dequantize a value according to the given quantization.
   Values consist of arrays and tuples.

   :param value: Value to dequantize.  It must consist of numpy arrays and tuples.
   :param q: Quantization of the value.  None means that the value is not quantized
     and so it will be returned unchanged.
   :return: Dequantized value.  It has the same tuple structure as the input.


.. py:function:: get_zero_kernel_mask_per_channel(weight: numpy.ndarray, threshold: float) -> numpy.ndarray

   Return the mask of zero kernels. The kernel layout of weight must be AwesomeConvWeightLayout.

   :param weight: Weights for convolution in AwesomeConvWeightLayout layout.
   :param threshold: If the sum of a kernel's absolute values is smaller than the threshold, the kernel is
                     treated as a zero kernel.
   :return: Mask of zero kernels. True means the kernel is a zero kernel.


.. py:function:: dequantize(input: numpy.ndarray, scale: float, zp: int) -> numpy.ndarray

   Dequantize the input by reversing the quantization equation. The original equation is::

       quantized_input = (input / S) + zero_point

   Reversing it gives the dequantization equation::

       dequantized_input = (quantized_input - zero_point) * S

   :param input: A numpy array.
   :param scale: scale = (1 / S) in the above equation.
   :param zp: Zero point of the quantized input.
   :return: Dequantized input.
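
The dequantization equation of ``dequantize`` can be sketched as follows, again on plain lists and with a hypothetical function name. Because ``scale = 1 / S``, multiplying by ``S`` becomes a division by ``scale``.

```python
def dequantize_sketch(quantized, scale, zp):
    # (quantized_input - zero_point) * S, with S = 1 / scale.
    return [(q - zp) / scale for q in quantized]


restored = dequantize_sketch([-90, 10, 60, 127], scale=100.0, zp=10)
print(restored)
```

Values that were clipped during quantization are not recovered: the integer 127 maps back to about 1.17 regardless of how large the original input was.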


.. py:function:: requantize(data: numpy.ndarray, bits: int, right_shifts: Union[int, numpy.ndarray], zp: Optional[int] = None, per_channel: bool = False, axis: int = -1, rounding_type: ml_kernels.math_helpers.RoundType = RoundType.UPWARD, *, result_type: afe.ir.tensor_type.ScalarType = ScalarType.int8) -> numpy.ndarray

   Requantize a quantized tensor to another quantization domain.

   :param data: A numpy array.
   :param bits: Number of bits used to clip the scaled input.
   :param right_shifts: An int or a numpy array. Each output channel is shifted right by its own number of bits.
                        This acts as a hardware-friendly power-of-2 scale.
   :param zp: Zero point of the quantized input.
   :param per_channel: Default is False. If True, each output channel has one right_shift.
   :param result_type: Numeric type of requantized tensor.
   :return: Requantized tensor in chosen numeric type.


.. py:function:: float_requantization(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization) -> Tuple[float, float]

   Calculate floating-point correction parameters to requantize integer data using
   floating-point intermediate values.

   It returns S and Z such that data can be requantized by the calculation:

       quantized_output = round(S * float(quantized_input) + Z)

   :param input_quantization: Quantization of input data
   :param output_quantization: Quantization of output data
   :return: Requantization scale correction and zero point correction
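
The values S and Z returned by ``float_requantization`` can be derived by eliminating the real value between the two quantizations. The sketch assumes each quantization maps a real ``r`` to ``q = scale * r + zero_point`` (before rounding), consistent with the convention documented for ``compute_scale``; the function name and keyword arguments are hypothetical.

```python
def float_requantization_sketch(in_scale, in_zp, out_scale, out_zp):
    # q_in  = in_scale  * r + in_zp   =>   r = (q_in - in_zp) / in_scale
    # q_out = out_scale * r + out_zp  =    S * q_in + Z
    s = out_scale / in_scale
    z = out_zp - s * in_zp
    return s, z


s, z = float_requantization_sketch(in_scale=64.0, in_zp=4, out_scale=16.0, out_zp=-3)
q_in = round(64.0 * 1.5) + 4           # real value 1.5 in the input domain
q_out = round(s * float(q_in) + z)     # requantized
direct = round(16.0 * 1.5) + (-3)      # quantizing 1.5 directly in the output domain
print(s, z, q_in, q_out, direct)
```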


.. py:function:: power_of_2_requantization(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization) -> int

   Calculate a shift factor to requantize data by a power of 2 in integer arithmetic.

   This should only be used if the input and output quantization were chosen
   for power of 2 requantization.  It is not a good approximation in general.

   It returns a shift such that data can be requantized by the calculation:

       quantized_output = quantized_input >> shift

   The shift should use rounding to nearest, with any tie-breaking method.

   :param input_quantization: Quantization of input data
   :param output_quantization: Quantization of output data
   :return: Amount to shift right.  May be negative.
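
A right shift with rounding to nearest, as required by ``power_of_2_requantization``, can be sketched as below. The tie-breaking choice (ties toward positive infinity) and the treatment of a negative shift as a left shift are assumptions; ``rounding_right_shift`` is a hypothetical helper.

```python
def rounding_right_shift(x: int, shift: int) -> int:
    if shift <= 0:
        # Assumption: a negative right shift is interpreted as a left shift.
        return x << -shift
    # Adding half of the divisor before shifting rounds to nearest, ties upward.
    return (x + (1 << (shift - 1))) >> shift


print(rounding_right_shift(7, 2))    # 7 / 4 = 1.75  -> 2
print(rounding_right_shift(5, 1))    # 5 / 2 = 2.5   -> 3
print(rounding_right_shift(-5, 1))   # -2.5          -> -2
print(rounding_right_shift(3, -2))   # 3 * 4         -> 12
```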


.. py:function:: requantization(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization, bits: int = 32, *, sc_correction_bits: int = 32) -> Tuple[int, int, int]

   Calculate correction factors to requantize data in integer arithmetic.

   It returns S, Z, and shift such that data can be requantized by the calculation:

       quantized_output = ((S * quantized_input) + Z) >> shift

   The shift should use rounding to nearest, with any tie-breaking method.

   :param input_quantization: Quantization of input data
   :param output_quantization: Quantization of output data
   :param bits: Integer precision of temporary values
   :param sc_correction_bits: Integer precision of the scale correction.
      The returned scale correction, taken as a signed integer, will not exceed this many bits.
   :return: Requantization scale correction, zero point correction, and right shift
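
The integer formula documented for ``requantization`` can be tried out with a fixed-point sketch. Here the floating-point corrections ``0.3`` and ``2.0`` and the shift of 15 are arbitrary example values, and the way S and Z are derived from them is an assumption; the library chooses its own shift from ``bits`` and ``sc_correction_bits``. All function names are hypothetical.

```python
def rounding_right_shift(x: int, shift: int) -> int:
    # Round to nearest, ties upward; requires shift >= 1.
    return (x + (1 << (shift - 1))) >> shift


def to_fixed_point(s_float: float, z_float: float, shift: int):
    # Assumption: both corrections are scaled by 2**shift and rounded to integers.
    return round(s_float * (1 << shift)), round(z_float * (1 << shift))


def requantize_int(q_in: int, s: int, z: int, shift: int) -> int:
    # quantized_output = ((S * quantized_input) + Z) >> shift
    return rounding_right_shift(s * q_in + z, shift)


SHIFT = 15
S, Z = to_fixed_point(0.3, 2.0, SHIFT)
q_out = requantize_int(100, S, Z, SHIFT)
print(S, Z, q_out)  # 9830 65536 32
```

The integer result tracks the floating-point calculation ``round(0.3 * q + 2.0)`` closely; the small error comes from rounding S to an integer.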


.. py:function:: requantization_tflite(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization) -> Tuple[int, int, int]

   Calculate correction factors to do TFLite requantization.

   It returns S, Z, and shift such that data can be requantized by the calculation:

       quantized_output = ((S * quantized_input) >> shift) + Z

   The shift should use rounding to nearest, with any tie-breaking method.
   The product (S * quantized_input) is assumed not to overflow.  It is
   designed for a datapath that calculates this product in 64-bit precision.

   :param input_quantization: Quantization of input data.  The input data's zero point must be 0.
   :param output_quantization: Quantization of output data
   :return: Requantization scale correction, zero point correction, and right shift
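
The formula of ``requantization_tflite`` adds Z after the shift, whereas ``requantization`` adds it before. The sketch below checks that the two orders agree when the pre-shift zero point correction is exactly ``Z << shift``. That condition is an assumption made for the comparison, the example constants are arbitrary, and the helper names are hypothetical.

```python
def rounding_right_shift(x: int, shift: int) -> int:
    # Round to nearest, ties upward; requires shift >= 1.
    return (x + (1 << (shift - 1))) >> shift


def zp_before_shift(q_in: int, s: int, z_scaled: int, shift: int) -> int:
    # ((S * quantized_input) + Z) >> shift
    return rounding_right_shift(s * q_in + z_scaled, shift)


def zp_after_shift(q_in: int, s: int, z: int, shift: int) -> int:
    # ((S * quantized_input) >> shift) + Z
    return rounding_right_shift(s * q_in, shift) + z


S, SHIFT, Z = 9830, 15, 2
same = all(
    zp_before_shift(q, S, Z << SHIFT, SHIFT) == zp_after_shift(q, S, Z, SHIFT)
    for q in range(-128, 128)
)
print(same)  # True
```

Adding a multiple of ``2**shift`` before the shift is the same as adding the corresponding integer afterwards, which is why the two results match here.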


.. py:function:: is_quantized(data: numpy.ndarray) -> bool

.. py:function:: dequantize_tensor(data: Union[List[numpy.ndarray], Tuple[numpy.ndarray, Ellipsis], numpy.ndarray], scales: List[float], zps: List[int]) -> Union[List[numpy.ndarray], Tuple[numpy.ndarray, Ellipsis], numpy.ndarray]

   Dequantize a tensor. A tensor can be a List[np.ndarray], a Tuple[np.ndarray, ...], or an np.ndarray.


.. py:function:: quantize_tensor(data: Union[Tuple[numpy.ndarray, Ellipsis], numpy.ndarray], scales: List[Union[float, List[float]]], zps: List[Union[int, List[int]]], layer_bits: List[Union[int, List[int]]]) -> Union[Tuple[numpy.ndarray, Ellipsis], numpy.ndarray]

   Quantize a tensor. A tensor can be a Tuple[np.ndarray, ...] or an np.ndarray.


.. py:function:: dequantize_input_dict(input_dict: Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]], scales: List[Union[float, List[float]]], zps: List[Union[int, List[int]]]) -> Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]]

   Given an input_dict, input scales, and input zero points, dequantize each input in the input_dict
   to float if the data type is QuantizedTensor.

   :param input_dict: Dict[NodeName, Union[np.ndarray, Tuple[np.ndarray, ...]]]. Input dictionary
                      with (key: value) = (input_name: data).
   :param scales: List[Union[float, List[float]]]. Input scale for each input data.
   :param zps: List[Union[int, List[int]]]. Input zero point for each input data.
   :return: A dequantized input_dict.


.. py:function:: quantize_input_dict(input_dict: Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]], scales: List[Union[float, List[float]]], zps: List[Union[int, List[int]]], layer_bits: List[Union[int, List[int]]]) -> Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]]

   Given an input_dict, input scales, and input zero points, quantize each input in the input_dict
   to QuantizedTensor if the data type is not QuantizedTensor.

   :param input_dict: Dict[NodeName, Union[np.ndarray, Tuple[np.ndarray, ...]]]. Input dictionary
                      with (key: value) = (input_name: data).
   :param scales: List[Union[float, List[float]]]. Input scale for each input data.
   :param zps: List[Union[int, List[int]]]. Input zero point for each input data.
   :param layer_bits: List[Union[int, List[int]]]. Number of bits of precision of the QuantizedTensor
                      for each input data.
   :return: A quantized input_dict.


.. py:function:: quantize_alpha(alpha: numpy.ndarray, bits: int = 8) -> Tuple[numpy.ndarray, int]

   Quantize the alpha for PreluOp

   :param alpha: Alpha
   :param bits: Number of bits used for quantization
   :return: Quantized alpha, shift value


.. py:function:: quantize_add_subtract(is_subtract: bool, input_scales: List[float], input_zps: List[int], scale: float, zero_point: int, layer_bits: int, in1_scale_const: int = 1, in2_scale_const: int = 1) -> Tuple[List[int], int, int]

   Quantize the add/subtract operator.

   :param is_subtract: If True, the function is used to quantize the subtract
       operator, otherwise the add operator.
   :param input_scales: Scales of the input nodes.
   :param input_zps: Zero points of the input nodes.
   :param scale: Scale of the current node.
   :param zero_point: Zero point of the current node.
   :param layer_bits: Number of bits used for quantization.
   :param in1_scale_const: Constant to be folded into the 1st input scale.
   :param in2_scale_const: Constant to be folded into the 2nd input scale.


.. py:function:: quantize_multiply(lhs_quant: afe.ir.defines.Quantization, rhs_quant: afe.ir.defines.Quantization, output_quant: afe.ir.defines.Quantization, allow_full_output_precision: bool) -> Tuple[int, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.defines.Quantization]

   Quantize the multiply operator.

   :param lhs_quant: Quantization of the first input of multiply
   :param rhs_quant: Quantization of the second input of multiply
   :param output_quant: Quantization of the output of multiply.
      It may be ignored if allow_full_output_precision is True.
   :param allow_full_output_precision: Whether 32-bit output is allowed.  If True, then
      this function may ignore output_quant and output a 32-bit quantization.  If False,
      then this function will quantize according to output_quant.
   :return: Tuple of intrinsic shift amount, requantization to perform, and quantization of the output.


.. py:function:: quantize_batch_matmul(lhs_quant: afe.ir.defines.Quantization, rhs_quant: afe.ir.defines.Quantization, output_quant: afe.ir.defines.Quantization) -> Tuple[int, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.defines.Quantization]

.. py:function:: quantize_udf(input_quant: afe.ir.defines.Quantization, output_quant: afe.ir.defines.Quantization, input_type: type, output_type: type, func: Callable[[numpy.ndarray], numpy.ndarray], invert_scales: bool = True) -> numpy.ndarray

   Create a lookup table for a user-defined function.

   :param input_quant: Quantization of the input.
   :param output_quant: Quantization of the output.
   :param input_type: Type of LUT input.
   :param output_type: Type of LUT output.
   :param func: Function to be approximated by the lookup table.
   :param invert_scales: If true, the input scale factors are inverted.
   :return: Lookup table representing func for the quantized input and output.
      It is a numpy array of int8 or int16 values.
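
Building a lookup table as ``quantize_udf`` does amounts to evaluating the function once per representable input integer. The sketch below does this for an 8-bit sigmoid. It assumes the convention ``q = round(scale * r) + zero_point`` for both input and output, ignores ``invert_scales``, and uses arbitrary example quantization values; ``build_lut`` and ``sigmoid`` are hypothetical helpers.

```python
import math


def build_lut(func, in_scale, in_zp, out_scale, out_zp, bits=8):
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    lut = []
    for q_in in range(q_min, q_max + 1):
        real_in = (q_in - in_zp) / in_scale               # dequantize the table index
        q_out = round(func(real_in) * out_scale) + out_zp  # quantize the function result
        lut.append(max(q_min, min(q_max, q_out)))          # saturate to the output type
    return lut


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


lut = build_lut(sigmoid, in_scale=16.0, in_zp=0, out_scale=256.0, out_zp=-128)
# Entry i holds the output for quantized input q_in = i - 128.
print(len(lut), lut[0], lut[128], lut[255])  # 256 -128 0 127
```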


.. py:function:: get_input_quantization_func(scale: float, zp: int, layer_bit: int) -> Callable[[numpy.ndarray], numpy.ndarray]

   Return a function that takes a numpy array ``input`` and uses the scale
   and zero point to quantize the data with the equation below::

       quantized_input = (input / S) + zero_point

   :param scale: scale = (1/S) in the above equation.
   :param zp: Zero point of the quantized input.
   :param layer_bit: Number of bits used to clip the scaled input.
   :return: Function that quantizes a numpy array.


.. py:function:: quantize_clip_attrs(attrs: afe.ir.attributes.ClipAttrs, scalar_type: afe.ir.tensor_type.ScalarType, quant: afe.ir.defines.Quantization) -> afe.ir.attributes.ClipQuantAttrs

   Quantize the attributes of clip operator

   Calculate the boundaries of the clip operator based on its quantization
   parameters and data type.

   :param attrs: Attributes of the clip operator
   :param scalar_type: Scalar data type of the quantized clip operator
   :param quant: Quantization parameters to apply to clip operator

   :returns: Attributes of the quantized clip operator containing boundary parameters
             calculated for quantized operator.
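
The boundary calculation described for ``quantize_clip_attrs`` can be pictured as quantizing the two clip limits and saturating them to the integer type. The sketch assumes the convention ``q = round(scale * r) + zero_point`` and a signed 8-bit type; ``quantize_clip_bounds`` is a hypothetical name and the numbers are example values.

```python
def quantize_clip_bounds(a_min: float, a_max: float, scale: float, zp: int, bits: int = 8):
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1

    def to_int(r: float) -> int:
        # Quantize a real-valued limit, then saturate it to the integer type.
        return max(q_min, min(q_max, round(r * scale) + zp))

    return to_int(a_min), to_int(a_max)


relu6 = quantize_clip_bounds(0.0, 6.0, scale=32.0, zp=-128)
wide = quantize_clip_bounds(0.0, 10.0, scale=32.0, zp=-128)
print(relu6)  # (-128, 64)
print(wide)   # (-128, 127)
```

In the second case both limits coincide with the limits of int8, so the clip has no effect beyond what saturating arithmetic already provides.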


.. py:function:: quantize_activation(attrs: Union[afe.ir.attributes.ClipAttrs, afe.ir.attributes.ReluAttrs, None], quantization: afe.ir.defines.Quantization, scalar_type: afe.ir.tensor_type.ScalarType, *, quant_config: Optional[afe.core.configs.QuantizationConfigs] = None) -> Union[afe.ir.attributes.ClipQuantAttrs, afe.ir.attributes.ReluQuantAttrs, None]

   Quantize a simple activation function (clip, relu, or nothing) and simplify it if possible.

   No requantization is introduced to these activation functions; the input and output quantization scales are
   always the same.  Quantization may simplify an activation function by taking advantage of the
   clipping behavior of saturating arithmetic.

   :param attrs: Attributes of the activation function to quantize
   :param quantization: Quantization to apply to this activation function
   :param scalar_type: Scalar data type that the activation function will be evaluated on.
      It is used to initialize ReluAttrs and has to be an integer type.
   :param quant_config: Parameters that were used to choose 'quantization'.  Used for error checking.
   :return: Attributes of the quantized activation function.  It may be a different type than the input.


.. py:function:: requantize_activation(attrs: Union[afe.ir.attributes.ClipQuantAttrs, afe.ir.attributes.ReluQuantAttrs, None], zero_point: int, requantization: ml_kernels.requantization.BaseRequantization[numpy.ndarray], scalar_type: afe.ir.tensor_type.ScalarType) -> Union[afe.ir.attributes.ClipQuantAttrs, afe.ir.attributes.ReluQuantAttrs, None]

   Requantize an activation function.

   This represents transforming the expression requant(activ(x)), where the
   activation is evaluated before requantization, to an equivalent expression
   newactiv(requant(x)), where the new activation is evaluated after requantization.
   The new activation could be simpler by taking advantage of integer saturation.

   :param attrs: Activation function's attributes.  This must be for a quantized activation.
   :param zero_point: Original zero point of the activation function, before requantization.
      Ignored if attrs is None.
   :param requantization: Requantization to perform.  The input type of the
      requantization is assumed to be int16.
   :param scalar_type: ScalarType used to initialize ReluAttrs. Has to be integer type.
   :return: Transformed activation function's attributes (clip, relu, or nothing).
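
The rewrite from ``requant(activ(x))`` to ``newactiv(requant(x))`` described for ``requantize_activation`` works for clip because a requantization is monotonically non-decreasing. The sketch checks this for a power-of-2 requantization; the shift of 4 and the clip limits are arbitrary example values, and ``requant`` and ``clip`` are hypothetical helpers.

```python
def requant(x: int, shift: int = 4) -> int:
    # Example requantization: rounding right shift (monotonically non-decreasing).
    return (x + (1 << (shift - 1))) >> shift


def clip(x: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, x))


lo, hi = -800, 1200
# The new activation clips at the requantized limits.
new_lo, new_hi = requant(lo), requant(hi)
mismatches = [
    x for x in range(-2048, 2048)
    if requant(clip(x, lo, hi)) != clip(requant(x), new_lo, new_hi)
]
print(new_lo, new_hi, mismatches)  # -50 75 []
```

If a requantized limit falls outside the range of the output integer type, that side of the clip is already enforced by saturation and can be dropped.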


.. py:function:: requantize_quantization(quantization: afe.ir.defines.Quantization, requant: ml_kernels.requantization.BaseRequantization[numpy.ndarray]) -> afe.ir.defines.Quantization

   Get the quantization of the result of requantizing a tensor.
   This would be the quantization at the output of a Requantize node, for
   the given input and requantization.

   :param quantization: Quantization of input tensor
   :param requant: Requantization to perform
   :return: Quantization of the result of applying requant to the input tensor


.. py:function:: quantize_prelu(layer_bits: int, alpha: Union[numpy.ndarray, float]) -> Tuple[int, int]

   Quantize the PRelu alphas and return the quantized alphas and right shifts.

   :param layer_bits: Number of bits used for quantization.
   :param alpha: Union[np.ndarray, float]. Alpha in float data type.
   :return: Tuple[np.ndarray, np.ndarray]. Tuple of (quantized alpha, right shift).


.. py:function:: quantize_reciprocal(input_qtype: afe.ir.attributes.QuantResultTensorType) -> afe.ir.attributes.AwesomeCalibAttrs

   Quantize the reciprocal part of divide.

   :param input_qtype: Quantization for the rhs argument of divide.
   :return: Calibration attributes AwesomeCalibAttrs which are used in ReciprocalOp UDF.


.. py:function:: quantize_lrn(attrs: afe.ir.attributes.LRNAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization) -> afe.ir.attributes.LRNQuantAttrs

   Quantize LRN, which is implemented based on quantized_local_response_normalization from the ml_kernels repo::

       out = lut(square_sum(x)) * x

   where the lut function is::

       lambda x: (bias + alpha / size * x) ** (beta)

   :param attrs: LRN attributes.
   :param input_quant: Quantization of input data.
   :param quant: Layer quantization.
   :return: Quantized LRN attributes.


.. py:function:: quantize_softmax(attrs: afe.ir.attributes.SoftmaxAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization, intermediate_min_max: Dict[str, Tuple[float, float]], enable_int16: bool) -> afe.ir.attributes.SoftmaxQuantAttrs

   Quantize Softmax, which is implemented based on the softmax implementation from the ml_kernels repo::

       exp = lut_exp(x)  # lut_exp(x) = exp(x)
       exp_sum_rec = lut_rec(np.sum(exp))  # lut_rec(x) = 1/x
       ofm = exp * exp_sum_rec

   :param attrs: Softmax attributes.
   :param input_quant: Quantization of input data
   :param quant: Layer quantization
   :param intermediate_min_max: Dict of intermediates min/max values.
   :param enable_int16: Whether to use int8 or int16 quantization.
   :return: Quantized Softmax attributes
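
The three steps listed for ``quantize_softmax`` can be followed in a floating-point reference. The sketch replaces both lookup tables with direct calls, so it shows the computation that the quantized operator approximates, not the quantized arithmetic itself; ``softmax_reference`` is a hypothetical name.

```python
import math


def softmax_reference(x):
    exp = [math.exp(v) for v in x]          # stands in for lut_exp
    exp_sum_rec = 1.0 / sum(exp)            # stands in for lut_rec
    return [e * exp_sum_rec for e in exp]   # ofm = exp * exp_sum_rec


out = softmax_reference([1.0, 2.0, 3.0])
print(out)
print(softmax_reference([0.0, 0.0]))  # [0.5, 0.5]
```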


.. py:function:: quantize_layer_norm(attrs: afe.ir.attributes.LayerNormAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization, intermediate_min_max: dict[str, tuple[float, float]]) -> afe.ir.attributes.LayerNormQuantAttrs

   Quantize LayerNorm, which is implemented based on the layer norm implementation from the ml_kernels repo::

       LayerNorm(input, axis, epsilon) = (input - m) / Sqrt(var + epsilon), where
           m = ReduceMean(input, axis, keepdims=True),
           var = ReduceMean((input - m) ** 2, axis, keepdims=True).

   A LUT is used for the reciprocal of the sqrt function.

   :param attrs: LayerNormAttrs attributes.
   :param input_quant: Quantization of input data.
   :param quant: Layer quantization.
   :param intermediate_min_max: Dict of intermediates min/max values.
   :return: Quantized LayerNormAttrs attributes.
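
The LayerNorm formula given for ``quantize_layer_norm`` can be written as a floating-point reference over a single axis. The reciprocal square root is computed directly here, where the quantized operator uses a LUT; ``layer_norm_reference`` is a hypothetical name and the default ``epsilon`` is an example value.

```python
import math


def layer_norm_reference(x, epsilon=1e-5):
    m = sum(x) / len(x)                            # ReduceMean(input)
    var = sum((v - m) ** 2 for v in x) / len(x)    # ReduceMean((input - m) ** 2)
    rsqrt = 1.0 / math.sqrt(var + epsilon)         # the part a LUT would approximate
    return [(v - m) * rsqrt for v in x]


out = layer_norm_reference([1.0, 2.0, 3.0, 4.0])
out_var = sum(v * v for v in out) / len(out)
print(out, out_var)
```

The output has zero mean and a variance just below 1; the small gap comes from ``epsilon``.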


.. py:function:: quantize_instance_norm(attrs: afe.ir.attributes.InstanceNormAttrs, input_quant: afe.ir.defines.Quantization, mean_quant: afe.ir.defines.Quantization, variance_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization)

   Quantize Instance Normalization operator: (input - mean) / sqrt(variance + epsilon).

   :param attrs: Instance Normalization attributes.
   :param input_quant: Quantization of the input data.
   :param mean_quant: Quantization of the mean input data.
   :param variance_quant: Quantization of the variance input data.
   :param quant: Layer quantization.

   :returns: Quantized Instance Normalization attributes.


.. py:function:: quantize_rms_norm(attrs: afe.ir.attributes.RMSNormAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization, intermediate_min_max: Dict[str, Tuple[float, float]], enable_lut_int16: bool) -> afe.ir.attributes.RMSNormQuantAttrs

   Quantize RMS Normalization, which is implemented based on the rms norm implementation from the ml_kernels repo::

       RMSNorm(x, axis, epsilon) = x / Sqrt(ReduceMean(x ** 2, axis, keepdims=True) + epsilon)

   A LUT is used for the reciprocal of the sqrt function.

   :param attrs: RMSNorm attributes.
   :param input_quant: Quantization of input data.
   :param quant: Layer quantization.
   :param intermediate_min_max: Dict of intermediates min/max values.
   :param enable_lut_int16: If True, quantize LUT to int16 otherwise to int8.
   :return: Quantized RMSNorm attributes.
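
Unlike LayerNorm, the RMSNorm formula given for ``quantize_rms_norm`` does not subtract the mean. A floating-point reference over a single axis makes the difference visible; the reciprocal square root is computed directly where the quantized operator uses a LUT, and ``rms_norm_reference`` is a hypothetical name.

```python
import math


def rms_norm_reference(x, epsilon=1e-6):
    mean_square = sum(v * v for v in x) / len(x)       # ReduceMean(x ** 2)
    rsqrt = 1.0 / math.sqrt(mean_square + epsilon)     # the part a LUT would approximate
    return [v * rsqrt for v in x]


out = rms_norm_reference([3.0, 4.0])
out_mean_square = sum(v * v for v in out) / len(out)
print(out, out_mean_square)
```

The output keeps the sign and relative size of each input element, and its mean square is approximately 1.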


.. py:function:: quantization_data_value_to_output_list(quantization: afe.ir.defines.DataValue[afe.ir.defines.Quantization]) -> Tuple[List[float], List[int], List[int], List[int], List[int]]

   Convert a DataValue of Quantization object(s) to lists of quantization-related values.
   This is used for interfacing to code that stores quantization information in five separate lists.

   :param quantization: DataValue of Quantization object(s) to convert to a tuple of quantization parameters.
   :return: Lists of scales, zero points, bits, minimum and maximum values.


.. py:function:: fix_requantization(requantization: ml_kernels.requantization.BaseRequantization[numpy.ndarray]) -> ml_kernels.requantization.BaseRequantization[numpy.ndarray]

   Change the data type of the right_shift array, if it is present, to uint8.


.. py:function:: cast_calibration_inputs(values: List[numpy.ndarray], cast: afe.ir.defines.QuantizationCast)

   Quantizes a list of tensors according to casts. Identity cast returns the original values.


.. py:function:: create_requantization_from_cast(cast: afe.ir.defines.RequantCast) -> ml_kernels.requantization.BaseRequantization[numpy.ndarray]

   Get the Requantization that implements the given cast.

   :param cast: Cast to perform
   :return: Requantization