afe.ir.quantization_utils
=========================

.. py:module:: afe.ir.quantization_utils


Attributes
----------

.. autoapisummary::

   afe.ir.quantization_utils.DTYPE_BOUNDS


Classes
-------

.. autoapisummary::

   afe.ir.quantization_utils.QNNDtype


Functions
---------

.. autoapisummary::

   afe.ir.quantization_utils.round_op
   afe.ir.quantization_utils.calculate_normalization_shift
   afe.ir.quantization_utils.get_bound
   afe.ir.quantization_utils.clip_to_targeted_range
   afe.ir.quantization_utils.compute_scale
   afe.ir.quantization_utils.compute_zero_point
   afe.ir.quantization_utils.significant_bits_signed
   afe.ir.quantization_utils.compute_power_of_2_scale_and_shift
   afe.ir.quantization_utils.compute_weight_scale
   afe.ir.quantization_utils.compute_weight_scale_per_channel
   afe.ir.quantization_utils.linear_scale
   afe.ir.quantization_utils.linear_scale_per_channel
   afe.ir.quantization_utils.linear_quantize
   afe.ir.quantization_utils.linear_quantize_with_quantization
   afe.ir.quantization_utils.quantize_value
   afe.ir.quantization_utils.dequantize_value
   afe.ir.quantization_utils.get_zero_kernel_mask_per_channel
   afe.ir.quantization_utils.dequantize
   afe.ir.quantization_utils.requantize
   afe.ir.quantization_utils.float_requantization
   afe.ir.quantization_utils.power_of_2_requantization
   afe.ir.quantization_utils.requantization
   afe.ir.quantization_utils.requantization_tflite
   afe.ir.quantization_utils.is_quantized
   afe.ir.quantization_utils.dequantize_tensor
   afe.ir.quantization_utils.quantize_tensor
   afe.ir.quantization_utils.dequantize_input_dict
   afe.ir.quantization_utils.quantize_input_dict
   afe.ir.quantization_utils.quantize_alpha
   afe.ir.quantization_utils.quantize_add_subtract
   afe.ir.quantization_utils.quantize_multiply
   afe.ir.quantization_utils.quantize_batch_matmul
   afe.ir.quantization_utils.quantize_udf
   afe.ir.quantization_utils.get_input_quantization_func
   afe.ir.quantization_utils.quantize_clip_attrs
   afe.ir.quantization_utils.quantize_activation
   afe.ir.quantization_utils.requantize_activation
   afe.ir.quantization_utils.requantize_quantization
   afe.ir.quantization_utils.quantize_prelu
   afe.ir.quantization_utils.quantize_reciprocal
   afe.ir.quantization_utils.quantize_lrn
   afe.ir.quantization_utils.quantize_softmax
   afe.ir.quantization_utils.quantize_layer_norm
   afe.ir.quantization_utils.quantize_instance_norm
   afe.ir.quantization_utils.quantize_rms_norm
   afe.ir.quantization_utils.quantization_data_value_to_output_list
   afe.ir.quantization_utils.fix_requantization
   afe.ir.quantization_utils.cast_calibration_inputs
   afe.ir.quantization_utils.create_requantization_from_cast


Module Contents
---------------

.. py:class:: QNNDtype

   Data types used in QNN operations.


   .. py:attribute:: INT8
      :value: 'int8'


   .. py:attribute:: UINT8
      :value: 'uint8'


   .. py:attribute:: INT32
      :value: 'int32'


.. py:data:: DTYPE_BOUNDS

.. py:function:: round_op(x: float, rounding_type: ml_kernels.math_helpers.RoundType = RoundType.TOEVEN) -> float

   Round a number to an integer using the given rounding type.

   :param x: A float32 number to be rounded.
   :param rounding_type: Rounding mode to apply. Defaults to RoundType.TOEVEN.
   :return: Rounded result.


.. py:function:: calculate_normalization_shift(scale: Union[float, numpy.ndarray], rounding: ml_kernels.math_helpers.RoundType = RoundType.TRUNC) -> Union[float, numpy.ndarray]

   Calculate the number of shifts needed to normalize a scale.
   The original scale is normalized, depending on the rounding type,
   after dividing it by 2**shift.


.. py:function:: get_bound(bits: int, signed: bool = True) -> int

.. py:function:: clip_to_targeted_range(x: Union[int, numpy.ndarray], bits: int, restricted_range: bool = False) -> Union[int, numpy.ndarray]

   Clip x to the range determined by the given number of bits.

   :param x: Numpy array or int.
   :param bits: Number of bits used to determine the minimum and maximum values.
   :param restricted_range: If True, the range is symmetric, so that abs(a_min) == abs(a_max).


.. py:function:: compute_scale(asymmetry: bool, layer_bits: int, min_val: float, max_val: float, include_real_zero_point: bool = False) -> float

   Compute a linear quantization scale for mapping the range (min_val, max_val)
   onto the quantized integer range determined by layer_bits, include_real_zero_point,
   and asymmetry. The computed scale is the reciprocal of the scale in TFLite's convention.

   :param asymmetry: If True, do asymmetric quantization.
   :param layer_bits: Number of bits used for quantization.
   :param min_val: Minimum value.
   :param max_val: Maximum value.
   :param include_real_zero_point: If True, force the float dynamic range to cover zero.
   :return: Computed scale s such that real numbers r are converted to integers q
      by the formula q = round(s * r).


.. py:function:: compute_zero_point(asymmetry: bool, layer_bits: int, min_val: float, max_val: float, restricted_range: bool = False) -> int

   Given the minimum and maximum values, compute the zero point.

   :param asymmetry: If True, do asymmetric quantization.
   :param layer_bits: Number of bits used for quantization.
   :param min_val: Minimum value.
   :param max_val: Maximum value.
   :param restricted_range: If True, the dynamic range is equal on the negative and positive sides.
   :return: Zero point.


.. py:function:: significant_bits_signed(n: int) -> int

   Get the smallest signed integer bit width that can represent the given integer.

   > significant_bits_signed(-129) = 9
   > significant_bits_signed(-128) = 8
   > significant_bits_signed(127) = 8
   > significant_bits_signed(128) = 9


.. py:function:: compute_power_of_2_scale_and_shift(scale: Union[float, numpy.ndarray], input_bit: int, output_bit: int) -> Union[Tuple[int, int], Tuple[numpy.ndarray, numpy.ndarray]]

   Given a float scale or a vector of scales, and the number of bits used for
   the quantized input and output, return a quantized scale and a right shift.

   :param scale: Union[float, np.ndarray]
   :param input_bit: int. Number of bits used for input quantization
   :param output_bit: int.
      Number of bits used for output quantization
   :return: Union[Tuple[int, int], Tuple[np.ndarray, np.ndarray]]. Tuple of (scale, right shift)


.. py:function:: compute_weight_scale(weight: numpy.ndarray, bits: int) -> float

   Compute the weight scale. Weights are always quantized symmetrically.

   :param weight: Weight tensor.
   :param bits: Number of bits used to quantize the weight.
   :return: Scale of the weight.


.. py:function:: compute_weight_scale_per_channel(weight: numpy.ndarray, bits: int) -> numpy.ndarray

   Compute per-channel weight scales. The expected layout of weight is AwesomeConvWeightLayout.

   :param weight: Weight tensor in AwesomeConvWeightLayout format.
   :param bits: Number of bits used to quantize the weight.
   :return: An array of weight scales.


.. py:function:: linear_scale(input: numpy.ndarray, scale: float, bits: int, clip: bool = True) -> numpy.ndarray

   Linearly scale the input by the given scale, then clip the scaled input
   to the range determined by the number of bits.

   :param input: A numpy array.
   :param scale: A scale factor used to scale the input to a target range.
   :param bits: Number of bits used to clip the scaled input.
   :param clip: If True, clip the linear scale result to the given dynamic range.
   :return: Scaled input.


.. py:function:: linear_scale_per_channel(input: numpy.ndarray, scale: numpy.ndarray, bits: int, clip: bool = True) -> numpy.ndarray

   Linearly scale the input by the given per-channel scales, then clip the scaled
   input to the range determined by the number of bits.
   The output channel has to be the last dimension.

   :param input: A numpy array.
   :param scale: A numpy array of scale factors used to scale the input to
      different target ranges in different channels.
   :param bits: Number of bits used to clip the scaled input.
   :param clip: If True, clip the linear scale results to the given dynamic range.
   :return: Scaled input.


.. py:function:: linear_quantize(input: numpy.ndarray, scale: float, zp: int, bits: int) -> numpy.ndarray

   quantized_input = (input / S) + zero_point.

   :param input: A numpy array.
   :param scale: scale = (1/S) in the above equation.
   :param zp: Zero point of the quantized input.
   :param bits: Number of bits used to clip the scaled input.
   :return: Quantized input.


.. py:function:: linear_quantize_with_quantization(input: numpy.ndarray, quantization: afe.ir.defines.Quantization) -> numpy.ndarray

   Apply a quantization to a floating-point tensor to produce a quantized tensor.

   :param input: Floating-point tensor
   :param quantization: Quantization to apply
   :return: Quantized tensor


.. py:function:: quantize_value(value: Any, q: afe.ir.defines.DataValue[Optional[afe.ir.defines.Quantization]]) -> Any

   Quantize a value according to the given quantization.
   Values consist of arrays and tuples.

   :param value: Value to quantize. It must consist of numpy arrays and tuples.
   :param q: Quantization of the value. None means that the value is not quantized
      and so it will be returned unchanged.
   :return: Quantized value. It has the same tuple structure as the input.


.. py:function:: dequantize_value(value: Any, q: afe.ir.defines.DataValue[Optional[afe.ir.defines.Quantization]]) -> Any

   Dequantize a value according to the given quantization.
   Values consist of arrays and tuples.

   :param value: Value to dequantize. It must consist of numpy arrays and tuples.
   :param q: Quantization of the value. None means that the value is not quantized
      and so it will be returned unchanged.
   :return: Dequantized value. It has the same tuple structure as the input.


.. py:function:: get_zero_kernel_mask_per_channel(weight: numpy.ndarray, threshold: float) -> numpy.ndarray

   Return the mask of zero kernels. The kernel layout of weight must be AwesomeConvWeightLayout.

   :param weight: Weights for convolution in AwesomeConvWeightLayout layout.
   :param threshold: If the sum of a kernel's absolute values is smaller than the threshold,
      the kernel is treated as a zero kernel.
   :return: Mask of zero kernels. True means the kernel is a zero kernel.


.. py:function:: dequantize(input: numpy.ndarray, scale: float, zp: int) -> numpy.ndarray

   Original equation:
   quantized_input = (input / S) + zero_point.
   Reverse it to get the dequantize equation:
   dequantized_input = (quantized_input - zero_point) * S

   :param input: A numpy array.
   :param scale: scale = (1 / S) in the above equation.
   :param zp: Zero point of the quantized input.
   :return: Dequantized input.


.. py:function:: requantize(data: numpy.ndarray, bits: int, right_shifts: Union[int, numpy.ndarray], zp: Optional[int] = None, per_channel: bool = False, axis: int = -1, rounding_type: ml_kernels.math_helpers.RoundType = RoundType.UPWARD, *, result_type: afe.ir.tensor_type.ScalarType = ScalarType.int8) -> numpy.ndarray

   Requantize a quantized tensor to another quantization domain.

   :param data: A numpy array.
   :param bits: Number of bits used to clip the scaled input.
   :param right_shifts: An int or a numpy array. Number of bits to shift right for each
      output channel. This acts as a hardware-friendly power-of-2 scale.
   :param zp: Zero point of the quantized input.
   :param per_channel: Default is False. If True, each output channel has one right_shift.
   :param result_type: Numeric type of the requantized tensor.
   :return: Requantized tensor in the chosen numeric type.


.. py:function:: float_requantization(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization) -> Tuple[float, float]

   Calculate floating-point correction parameters to requantize integer data using
   floating-point intermediate values. It returns S and Z such that data can be
   requantized by the calculation:

   quantized_output = round(S * float(quantized_input) + Z)

   :param input_quantization: Quantization of input data
   :param output_quantization: Quantization of output data
   :return: Requantization scale correction and zero point correction


.. py:function:: power_of_2_requantization(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization) -> int

   Calculate a shift factor to requantize data by a power of 2 in integer arithmetic.
   This should only be used if the input and output quantization were chosen for
   power of 2 requantization. It is not a good approximation in general.

   It returns a shift such that data can be requantized by the calculation:

   quantized_output = quantized_input >> shift

   The shift should use rounding to nearest, with any tie-breaking method.

   :param input_quantization: Quantization of input data
   :param output_quantization: Quantization of output data
   :return: Amount to shift right. May be negative.


.. py:function:: requantization(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization, bits: int = 32, *, sc_correction_bits: int = 32) -> Tuple[int, int, int]

   Calculate correction factors to requantize data in integer arithmetic.
   It returns S, Z, and shift such that data can be requantized by the calculation:

   quantized_output = ((S * quantized_input) + Z) >> shift

   The shift should use rounding to nearest, with any tie-breaking method.

   :param input_quantization: Quantization of input data
   :param output_quantization: Quantization of output data
   :param bits: Integer precision of temporary values
   :param sc_correction_bits: Integer precision of the scale correction. The returned scale
      correction, taken as a signed integer, will not exceed this many bits.
   :return: Requantization scale correction, zero point correction, and right shift


.. py:function:: requantization_tflite(input_quantization: afe.ir.defines.Quantization, output_quantization: afe.ir.defines.Quantization) -> Tuple[int, int, int]

   Calculate correction factors to do TFLite requantization.
   It returns S, Z, and shift such that data can be requantized by the calculation:

   quantized_output = ((S * quantized_input) >> shift) + Z

   The shift should use rounding to nearest, with any tie-breaking method.

   The product (S * quantized_input) is assumed not to overflow. It is designed for
   a datapath that calculates this product in 64-bit precision.

   :param input_quantization: Quantization of input data. The input data's zero point must be 0.
   :param output_quantization: Quantization of output data
   :return: Requantization scale correction, zero point correction, and right shift


.. py:function:: is_quantized(data: numpy.ndarray) -> bool

.. py:function:: dequantize_tensor(data: Union[List[numpy.ndarray], Tuple[numpy.ndarray, Ellipsis], numpy.ndarray], scales: List[float], zps: List[int]) -> Union[List[numpy.ndarray], Tuple[numpy.ndarray, Ellipsis], numpy.ndarray]

   Dequantize a tensor. A tensor can be a List[np.ndarray], a Tuple[np.ndarray, ...], or a np.ndarray.


.. py:function:: quantize_tensor(data: Union[Tuple[numpy.ndarray, Ellipsis], numpy.ndarray], scales: List[Union[float, List[float]]], zps: List[Union[int, List[int]]], layer_bits: List[Union[int, List[int]]]) -> Union[Tuple[numpy.ndarray, Ellipsis], numpy.ndarray]

   Quantize a tensor. A tensor can be a Tuple[np.ndarray, ...] or a np.ndarray.


.. py:function:: dequantize_input_dict(input_dict: Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]], scales: List[Union[float, List[float]]], zps: List[Union[int, List[int]]]) -> Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]]

   Given an input_dict, input scales, and input zero points, dequantize each input
   in the input_dict to float if its data type is QuantizedTensor.

   :param input_dict: Dict[NodeName, Union[np.ndarray, Tuple[np.ndarray, ...]]]. Input
      dictionary with (key: value) = (input_name: data)
   :param scales: List[Union[float, List[float]]]. Input scale for each input data
   :param zps: List[Union[int, List[int]]].
      Input zero point for each input data
   :return: A dequantized input_dict


.. py:function:: quantize_input_dict(input_dict: Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]], scales: List[Union[float, List[float]]], zps: List[Union[int, List[int]]], layer_bits: List[Union[int, List[int]]]) -> Dict[afe.ir.defines.NodeName, Union[numpy.ndarray, Tuple[numpy.ndarray, Ellipsis]]]

   Given an input_dict, input scales, and input zero points, quantize each input
   in the input_dict to QuantizedTensor if its data type is not QuantizedTensor.

   :param input_dict: Dict[NodeName, Union[np.ndarray, Tuple[np.ndarray, ...]]]. Input
      dictionary with (key: value) = (input_name: data)
   :param scales: List[Union[float, List[float]]]. Input scale for each input data
   :param zps: List[Union[int, List[int]]]. Input zero point for each input data
   :param layer_bits: List[Union[int, List[int]]]. Number of bits of precision for each QuantizedTensor
   :return: A quantized input_dict


.. py:function:: quantize_alpha(alpha: numpy.ndarray, bits: int = 8) -> Tuple[numpy.ndarray, int]

   Quantize the alpha for PreluOp.

   :param alpha: Alpha
   :param bits: Number of bits used for quantization
   :return: Quantized alpha, shift value


.. py:function:: quantize_add_subtract(is_subtract: bool, input_scales: List[float], input_zps: List[int], scale: float, zero_point: int, layer_bits: int, in1_scale_const: int = 1, in2_scale_const: int = 1) -> Tuple[List[int], int, int]

   Quantize the add/subtract operator.

   :param is_subtract: If True, the function quantizes a subtract operator, otherwise an add operator.
   :param input_scales: Scales of the input nodes.
   :param input_zps: Zero points of the input nodes.
   :param scale: Scale of the current node.
   :param zero_point: Zero point of the current node.
   :param layer_bits: Number of bits used for quantization.
   :param in1_scale_const: Const to be folded in 1st input scale.
   :param in2_scale_const: Const to be folded in 2nd input scale.


.. py:function:: quantize_multiply(lhs_quant: afe.ir.defines.Quantization, rhs_quant: afe.ir.defines.Quantization, output_quant: afe.ir.defines.Quantization, allow_full_output_precision: bool) -> Tuple[int, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.defines.Quantization]

   Quantize the multiply operator.

   :param lhs_quant: Quantization of the first input of multiply
   :param rhs_quant: Quantization of the second input of multiply
   :param output_quant: Quantization of the output of multiply. It may be ignored
      if allow_full_output_precision is True.
   :param allow_full_output_precision: Whether 32-bit output is allowed. If True, then
      this function may ignore output_quant and output a 32-bit quantization. If False,
      then this function will quantize according to output_quant.
   :return: Tuple of intrinsic shift amount, requantization to perform, and quantization of the output.


.. py:function:: quantize_batch_matmul(lhs_quant: afe.ir.defines.Quantization, rhs_quant: afe.ir.defines.Quantization, output_quant: afe.ir.defines.Quantization) -> Tuple[int, ml_kernels.requantization.BaseRequantization[numpy.ndarray], afe.ir.defines.Quantization]

.. py:function:: quantize_udf(input_quant: afe.ir.defines.Quantization, output_quant: afe.ir.defines.Quantization, input_type: type, output_type: type, func: Callable[[numpy.ndarray], numpy.ndarray], invert_scales: bool = True) -> numpy.ndarray

   Create a lookup table for a user-defined function.

   :param input_quant: Quantization of the input.
   :param output_quant: Quantization of the output.
   :param input_type: Type of LUT input.
   :param output_type: Type of LUT output.
   :param func: Function to be approximated by the lookup table.
   :param invert_scales: If True, the input scale factors are inverted.
   :return: Lookup table representing func for the quantized input and output.
      It is a numpy array of int8 or int16 values.


.. py:function:: get_input_quantization_func(scale: float, zp: int, layer_bit: int) -> Callable[[numpy.ndarray], numpy.ndarray]

   Return a function that takes a numpy array and quantizes it with the given
   scale and zero point, using the equation below:

   quantized_input = (input / S) + zero_point

   :param scale: scale = (1/S) in the above equation.
   :param zp: Zero point of the quantized input.
   :param layer_bit: Number of bits used to clip the scaled input.


.. py:function:: quantize_clip_attrs(attrs: afe.ir.attributes.ClipAttrs, scalar_type: afe.ir.tensor_type.ScalarType, quant: afe.ir.defines.Quantization) -> afe.ir.attributes.ClipQuantAttrs

   Quantize the attributes of the clip operator.
   Calculate the boundaries of the clip operator based on its quantization parameters and data type.

   :param attrs: Attributes of the clip operator
   :param scalar_type: Scalar data type of the quantized clip operator
   :param quant: Quantization parameters to apply to the clip operator
   :returns: Attributes of the quantized clip operator, containing the boundary parameters
      calculated for the quantized operator.


.. py:function:: quantize_activation(attrs: Union[afe.ir.attributes.ClipAttrs, afe.ir.attributes.ReluAttrs, None], quantization: afe.ir.defines.Quantization, scalar_type: afe.ir.tensor_type.ScalarType, *, quant_config: Optional[afe.core.configs.QuantizationConfigs] = None) -> Union[afe.ir.attributes.ClipQuantAttrs, afe.ir.attributes.ReluQuantAttrs, None]

   Quantize a simple activation function (clip, relu, or nothing) and simplify it if possible.

   No requantization is introduced to these activation functions; the input and
   output quantization scales are always the same.

   Quantization may simplify an activation function by taking advantage of the
   clipping behavior of saturating arithmetic.

   :param attrs: Attributes of the activation function to quantize
   :param quantization: Quantization to apply to this activation function
   :param scalar_type: Scalar data type that the activation function will be evaluated on.
      It is also used to initialize ReluAttrs and has to be an integer type.
   :param quant_config: Parameters that were used to choose 'quantization'. Used for error checking.
   :return: Attributes of the quantized activation function. It may be a different type than the input.


.. py:function:: requantize_activation(attrs: Union[afe.ir.attributes.ClipQuantAttrs, afe.ir.attributes.ReluQuantAttrs, None], zero_point: int, requantization: ml_kernels.requantization.BaseRequantization[numpy.ndarray], scalar_type: afe.ir.tensor_type.ScalarType) -> Union[afe.ir.attributes.ClipQuantAttrs, afe.ir.attributes.ReluQuantAttrs, None]

   Requantize an activation function. This represents transforming the expression
   requant(activ(x)), where the activation is evaluated before requantization,
   to an equivalent expression newactiv(requant(x)), where the new activation is
   evaluated after requantization.

   The new activation could be simpler by taking advantage of integer saturation.

   :param attrs: Activation function's attributes. This must be for a quantized activation.
   :param zero_point: Original zero point of the activation function, before requantization.
      Ignored if attrs is None.
   :param requantization: Requantization to perform. The input type of the requantization
      is assumed to be int16.
   :param scalar_type: ScalarType used to initialize ReluAttrs. Has to be an integer type.
   :return: Transformed activation function's attributes (clip, relu, or nothing).


.. py:function:: requantize_quantization(quantization: afe.ir.defines.Quantization, requant: ml_kernels.requantization.BaseRequantization[numpy.ndarray]) -> afe.ir.defines.Quantization

   Get the quantization of the result of requantizing a tensor.
   This would be the quantization at the output of a Requantize node,
   for the given input and requantization.

   :param quantization: Quantization of input tensor
   :param requant: Requantization to perform
   :return: Quantization of the result of applying requant to the input tensor


.. py:function:: quantize_prelu(layer_bits: int, alpha: Union[numpy.ndarray, float]) -> Tuple[int, int]

   Quantize the PReLU alphas and return the quantized alphas and right shifts.

   :param layer_bits: Number of bits used for quantization
   :param alpha: Union[np.ndarray, float]. Alpha in float data type
   :return: Tuple[np.ndarray, np.ndarray]. Tuple of (quantized alpha, right shift)


.. py:function:: quantize_reciprocal(input_qtype: afe.ir.attributes.QuantResultTensorType) -> afe.ir.attributes.AwesomeCalibAttrs

   Quantize the reciprocal part of divide.

   :param input_qtype: Quantization for the rhs argument of divide.
   :return: Calibration attributes (AwesomeCalibAttrs) which are used in the ReciprocalOp UDF.


.. py:function:: quantize_lrn(attrs: afe.ir.attributes.LRNAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization) -> afe.ir.attributes.LRNQuantAttrs

   Quantize LRN, which is implemented based on quantized_local_response_normalization
   from the ml_kernels repo:

      out = lut(square_sum(x)) * x

   where the lut function is:

      lambda x: (bias + alpha / size * x) ** (beta)

   :param attrs: LRN attributes.
   :param input_quant: Quantization of input data
   :param quant: Layer quantization
   :return: Tuple[List[int], List[int], List[int]]. A tuple of
      (re-scaled input scales, corrected input zero points, right shifts)


.. py:function:: quantize_softmax(attrs: afe.ir.attributes.SoftmaxAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization, intermediate_min_max: Dict[str, Tuple[float, float]], enable_int16: bool) -> afe.ir.attributes.SoftmaxQuantAttrs

   Quantize Softmax, which is implemented based on the softmax implementation
   from the ml_kernels repo:

   | exp = lut_exp(x)  # lut_exp(x) = exp(x)
   | exp_sum_rec = lut_rec(np.sum(exp))  # lut_rec(x) = 1/x
   | ofm = exp * exp_sum_rec

   :param attrs: Softmax attributes.
   :param input_quant: Quantization of input data
   :param quant: Layer quantization
   :param intermediate_min_max: Dict of intermediates min/max values.
   :param enable_int16: Whether to use int8 or int16 quantization.
   :return: Quantized Softmax attributes


.. py:function:: quantize_layer_norm(attrs: afe.ir.attributes.LayerNormAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization, intermediate_min_max: dict[str, tuple[float, float]]) -> afe.ir.attributes.LayerNormQuantAttrs

   Quantize LayerNorm, which is implemented based on the layer norm implementation
   from the ml_kernels repo:

   | LayerNorm(input, axis, epsilon) = (input - m) / Sqrt(var + epsilon), where
   | m = ReduceMean(input, axis, keepdims=True),
   | var = ReduceMean((input - m) ** 2, axis, keepdims=True).

   Use LUT for the reciprocal of the sqrt function.

   :param attrs: LayerNormAttrs attributes.
   :param input_quant: Quantization of input data.
   :param quant: Layer quantization.
   :param intermediate_min_max: Dict of intermediates min/max values.
   :return: Quantized LayerNormAttrs attributes.


.. py:function:: quantize_instance_norm(attrs: afe.ir.attributes.InstanceNormAttrs, input_quant: afe.ir.defines.Quantization, mean_quant: afe.ir.defines.Quantization, variance_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization)

   Quantize the Instance Normalization operator: (input - mean) / sqrt(variance + epsilon).

   :param attrs: Instance Normalization attributes.
   :param input_quant: Quantization of the input data.
   :param mean_quant: Quantization of the mean input data.
   :param variance_quant: Quantization of the variance input data.
   :param quant: Layer quantization.
   :returns: Quantized Instance Normalization attributes.


.. py:function:: quantize_rms_norm(attrs: afe.ir.attributes.RMSNormAttrs, input_quant: afe.ir.defines.Quantization, quant: afe.ir.defines.Quantization, intermediate_min_max: Dict[str, Tuple[float, float]], enable_lut_int16: bool) -> afe.ir.attributes.RMSNormQuantAttrs

   Quantize RMS Normalization, which is implemented based on the rms norm implementation
   from the ml_kernels repo:

      RMSNorm(x, axis, epsilon) = x / Sqrt(ReduceMean(x ** 2, axis, keepdims=True) + epsilon)

   Use LUT for the reciprocal of the sqrt function.

   :param attrs: RMSNorm attributes.
   :param input_quant: Quantization of input data.
   :param quant: Layer quantization.
   :param intermediate_min_max: Dict of intermediates min/max values.
   :param enable_lut_int16: If True, quantize the LUT to int16, otherwise to int8.
   :return: Quantized RMSNorm attributes.


.. py:function:: quantization_data_value_to_output_list(quantization: afe.ir.defines.DataValue[afe.ir.defines.Quantization]) -> Tuple[List[float], List[int], List[int], List[int], List[int]]

   Convert a DataValue of Quantization object(s) to lists of quantization-related values.
   This is used for interfacing to code that stores quantization information in five separate lists.

   :param quantization: DataValue of Quantization object(s) to convert to a tuple of
      quantization parameters.
   :return: Lists of scales, zero points, bits, minimum and maximum values.


.. py:function:: fix_requantization(requantization: ml_kernels.requantization.BaseRequantization[numpy.ndarray]) -> ml_kernels.requantization.BaseRequantization[numpy.ndarray]

   Change the data type of the right_shift array, if it is present, to uint8.


.. py:function:: cast_calibration_inputs(values: List[numpy.ndarray], cast: afe.ir.defines.QuantizationCast)

   Quantizes a list of tensors according to casts.
   Identity cast returns the original values.


.. py:function:: create_requantization_from_cast(cast: afe.ir.defines.RequantCast) -> ml_kernels.requantization.BaseRequantization[numpy.ndarray]

   Get the Requantization that implements the given cast.

   :param cast: Cast to perform
   :return: Requantization
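

The conventions described in this module (a scale s with q = round(s * r), a zero point added after scaling, and integer requantization of the form ((S * q_in) + Z) >> shift with rounding to nearest) can be illustrated with a small NumPy-only sketch. It does not call afe.ir.quantization_utils and is not the library's implementation. The helper names (sketch_quantize, sketch_dequantize, sketch_requant_params, sketch_requantize), the signed clipping range, the tie-breaking choices, and the fixed shift of 8 are all assumptions made for illustration.

```python
import numpy as np


def sketch_quantize(x, scale, zp, bits=8):
    """Hypothetical helper: q = clip(round(scale * x) + zp), with scale = 1/S."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # assumed signed range
    q = np.rint(scale * np.asarray(x, dtype=np.float64)) + zp
    return np.clip(q, lo, hi).astype(np.int32)


def sketch_dequantize(q, scale, zp):
    """Hypothetical helper: x is approximately (q - zp) / scale."""
    return (np.asarray(q, dtype=np.float64) - zp) / scale


def sketch_requant_params(s_in, zp_in, s_out, zp_out, shift):
    """Assumed derivation of integer corrections S and Z for a given shift.

    Since r = (q_in - zp_in) / s_in and q_out = s_out * r + zp_out,
    q_out = ratio * q_in + (zp_out - ratio * zp_in) with ratio = s_out / s_in.
    Both terms are scaled by 2**shift and rounded to integers.
    """
    ratio = s_out / s_in
    s_corr = int(round(ratio * (1 << shift)))
    z_corr = int(round((zp_out - ratio * zp_in) * (1 << shift)))
    return s_corr, z_corr


def sketch_requantize(q_in, s_corr, z_corr, shift):
    """q_out = ((S * q_in) + Z) >> shift, rounding to nearest (ties go upward)."""
    acc = s_corr * np.asarray(q_in, dtype=np.int64) + z_corr
    if shift > 0:
        acc = acc + (1 << (shift - 1))  # add half an LSB before shifting
    return acc >> shift


# Quantize with scale 127 and zero point 0, then recover approximate floats.
x = np.array([-1.0, 0.0, 0.5, 1.0])
q = sketch_quantize(x, scale=127.0, zp=0)
x_hat = sketch_dequantize(q, scale=127.0, zp=0)

# Requantize to a coarser domain (scale 31.75, zero point 3) in integer arithmetic.
s_corr, z_corr = sketch_requant_params(127.0, 0, 31.75, 3, shift=8)
q_out = sketch_requantize(q, s_corr, z_corr, shift=8)
```

In this sketch the round-trip error of x_hat is bounded by half a quantization step (0.5 / scale), and q_out matches what quantizing x directly with the output scale and zero point would give for these inputs. A power-of-2 requantization corresponds to the special case where the scale correction is 1 and only the shift remains.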