.. _Defining ANY Model:

Defining a Model
################

Describes how to determine whether a given model will be supported on the SiMa MLSoC device.

The SiMa MLSoC device supports any model, irrespective of the framework it was written in, that can be mapped to ONNX operator set version 16 or 17, or to TFLite operator version 2.10.0. Some supported models cannot be expressed using the supported ONNX subset. For example, a set of Caffe operators is supported even though they have no ONNX equivalents. The only difference is that support for the ONNX operator subset defined in this section has been systematically qualified. Pre-quantized TFLite models are also supported.

Terminology
***********

.. list-table:: **Terms and Definitions**
   :widths: 30 70
   :header-rows: 1

   * - **Term**
     - **Definition**
   * - Operator
     - Refers to an ONNX operator included in opset 16 or 17, or a TFLite operator of TFLite version 2.10.0
   * - Operator Instance
     - A specific instantiation of an operator which fully defines the following:

       * The use of optional and variadic tensors
       * All tensor shapes and tensor element types
       * All operator attribute values
   * - Model
     - An inference model exclusively consisting of operators from ONNX opset version 16 or 17, or operators pre-quantized with TFLite v2.10.0
   * - ONNX
     - Any mention of ONNX refers to ONNX version 1.11 or 1.12, which corresponds to ONNX opset version 16 or 17, respectively. Testing has been performed using ONNX Runtime version 1.15.0
   * - TFLite
     - Any mention of TFLite refers to TFLite version 2.10.0 or a TFLite operator.

.. note::

   An operator and an operator instance can be thought of as an 'abstract operator' and a 'concrete operator', respectively.

Supported Operators
*******************

Each ONNX operator falls into one of the following support categories:

#. **No Support**: No instance of the operator is supported. For example, Dropout.
#. **A65 Support**: Some instantiations of the operator are supported by the A65, but not by the MLA. For example, Reshape.
#. **MLA Support**: Some instantiations of the operator are supported by the MLA.

Custom Implementation
=====================

If an operator falls under the No Support category, it can be implemented using custom code on the A65/EV74. Only automatic compilation scenarios are considered, where an operator falls into one of the categories described above.

.. note::

   * Any operator instance which is supported by the MLA is also supported by the A65.
   * Virtually all operators in categories 2 and 3 above also have instantiations which are not supported. This is true even for relatively simple operators. For example, depending on the tensor element type used, the Add operator is supported on the MLA, supported only on the A65, or not supported at all.

.. image:: media/onnxoperatorsupportcategories.png
   :scale: 55%
   :align: center

Unsupported Operators
*********************

ONNX opset 17 (a superset of opset 16) consists of a total of 200 operators. Of these, 72 are unsupported. This section describes the specific criteria used to identify the unsupported operators. We use the schema of ONNX operators to programmatically extract the vast majority of the information needed to determine the support level of each operator.

An ONNX operator is not supported if one or more of the following criteria are met:

#. The operator is deprecated. That is, ``schema.deprecated`` is ``True`` for the operator schema.
#. The operator is experimental. That is, ``schema.support_level`` is not equal to ``onnx.defs.OpSchema.SupportType.COMMON``.
#. The operator belongs to the ``ai.onnx.preview.training`` or the ``ai.onnx.ml`` domain, as defined by ``schema.domain``. These two domains contain operators specific to training and tree-based ML learning, respectively, and are therefore not relevant.
#. The operator applies to training only (and is not in the ``ai.onnx.preview.training`` domain).
#. The operator is specific to RNNs, which do not apply to computer vision.
#. The operator is used to quantize, de-quantize, or operate on quantized data. These are not relevant since our stack does all the needed de-quantization. See the A65 Supported Operators section. ADDLINK
#. The operator defines control flow.
#. The operator has a non-optional input tensor (i.e., required as a Single or Variadic tensor) which only supports sequences, maps, or tensor element types that are not supported. See the A65 Supported Operators section. ADDLINK
#. The operator is not supported by TVM, which we use internally. This excludes only four operators, which are rarely used.

There are no restrictions on attribute values. We support all attribute values supported by ONNX. In other words, the element type restrictions mentioned earlier do not apply to attributes.

A65 Supported Operators
***********************

When a model meets the requirements below, it is supported by the A65.

The element type of each tensor must be static and cannot change at runtime. Of the 16 tensor element types, 12 are supported. The unsupported types are:

* TensorProto.STRING
* TensorProto.BFLOAT16
* TensorProto.COMPLEX64
* TensorProto.COMPLEX128

Operator attribute values must be static and known at compile time. That is, an attribute cannot be changed at runtime.

Some operators are inherently dynamic in the sense that input tensor values affect model properties such as output tensor shapes and element types. For example, for the Reshape operator, the values of the input *shape* tensor define the shape of the reshaped output tensor. The A65 provides limited support for dynamic operators, subject to the above constraints.
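The element type restriction above can be expressed as a simple check. This is an illustrative sketch only: the numeric constants mirror the ONNX ``TensorProto`` enum values, and the function name is hypothetical, not part of the SiMa SDK.

```python
# The four ONNX tensor element types NOT supported by the A65.
# Constants are inlined (matching onnx.TensorProto) so the sketch
# is self-contained without an onnx dependency.
UNSUPPORTED_ELEMENT_TYPES = {
    8,   # TensorProto.STRING
    16,  # TensorProto.BFLOAT16
    14,  # TensorProto.COMPLEX64
    15,  # TensorProto.COMPLEX128
}

def a65_supports_element_type(elem_type: int) -> bool:
    """Return True if a tensor with this ONNX element type can run on the A65."""
    return elem_type not in UNSUPPORTED_ELEMENT_TYPES
```

In practice, this check would be applied to every value info, input, and initializer in the model graph, since the element type of each tensor must be static.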
MLA Supported Operators
***********************

Any operator instance must satisfy the following requirements to be supported by the MLA. These requirements are necessary, but not sufficient:

* The operator instance must be supported by the A65. In other words, all A65 support requirements also apply to MLA supported operators.
* All properties of the operator instance must be static and known at compile time. However, a dynamic operator such as Reshape is still supported on the MLA in the special case where the operator instantiation is static. For example, Reshape is supported if its input shape tensor is a constant.
* All input tensors must be 4D, except for the ones explicitly listed in the table below. As a consequence, numpy-style broadcasting, as described in broadcasting in ONNX, is not supported.
* All tensors of ONNX models must have the element type TensorProto.FLOAT (32-bit float).

The table below shows the operators and input tensors excluded from the 4D requirement.

.. list-table:: **Operators and Input Tensors Excluded From the 4D Requirement**
   :widths: 30 70
   :header-rows: 1

   * - **Operator**
     - **Input Tensors Excluded from the 4D Requirement**
   * - Add, Sub
     - Either A or B can be a scalar constant
   * - Conv
     - B
   * - ConvTranspose
     - B
   * - Mul
     - Either A or B can be a scalar constant
   * - Pad
     - pads, axes
   * - Resize
     - scales, sizes
   * - Slice
     - Tind

The MLA supports the NHWC layout format only. However, this is not a user-facing limitation since layout transforms are automatically inserted as needed.

MLA Supported Categories
************************

For the sake of defining MLA support for ONNX, consider a kernel to be a translation unit which is mapped in its entirety to either the A65 or the MLA. The software stack then maps an ONNX model graph to a kernel graph (via multiple intermediate representations).
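Before turning to the kernel-level categories, the 4D input requirement and its exclusion table above can be sketched as a shape check. This is an illustrative sketch, assuming static shapes are available as tuples; the names are hypothetical, not SiMa SDK APIs, and the Add/Sub/Mul entries are simplified (the actual rule allows only one of A or B to be a scalar constant).

```python
# Operator -> input tensor names exempt from the MLA 4D requirement,
# reproduced from the exclusion table above.
EXCLUDED_INPUTS = {
    "Add": {"A", "B"},
    "Sub": {"A", "B"},
    "Mul": {"A", "B"},
    "Conv": {"B"},
    "ConvTranspose": {"B"},
    "Pad": {"pads", "axes"},
    "Resize": {"scales", "sizes"},
    "Slice": {"Tind"},
}

def inputs_satisfy_4d_rule(op_type: str, input_shapes: dict) -> bool:
    """input_shapes maps input name -> static shape (tuple of ints).

    Every non-excluded input must be 4D for the instance to be
    an MLA candidate.
    """
    excluded = EXCLUDED_INPUTS.get(op_type, set())
    return all(
        len(shape) == 4
        for name, shape in input_shapes.items()
        if name not in excluded
    )
```

A 3D input thus rules out MLA execution for operators with no exclusions, while the bias of a Conv, for example, may remain 1D.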
The MLA support for an ONNX operator can be divided into the following three categories, depending on the ONNX-to-kernel mapping:

* 1-to-1: One ONNX operator instance is mapped to one kernel. This instance then executes exclusively on either the A65 or the MLA.
* 1-to-N: One ONNX operator instance is mapped to multiple kernels, that is, a kernel subgraph. This subgraph may be mapped to the A65 or the MLA, or it may be split across the A65 and MLA. In the latter case, we consider the ONNX operator instance not supported by the MLA, since the full instance implementation will not be running on the MLA.
* N-to-1: Two or more ONNX operator instances, that is, an ONNX subgraph, are mapped to a single kernel. This is referred to as a fused operator.

The figure below illustrates the operator "Sum" as an example of a 1-to-N mapping which maps exclusively to the MLA. It also illustrates the variadic input tensors.

.. image:: media/operatorsum.png
   :scale: 60%
   :align: center

The figure below illustrates the operator "ReduceLogSum" as an example of a 1-to-N mapping which is split between the MLA and the A65. Such operators are considered not supported by the MLA.

.. image:: media/operatorreducelogsum.png
   :scale: 70%
   :align: center

Individual Operators
********************

A single ONNX operator maps to one or more kernels, as described in the MLA Supported Categories section above. The table below lists all operators which are fully supported on the MLA; that is, each of these operators maps to a kernel graph in which all kernels execute on the MLA. For some operators, MLA support requires additional constraints to be met, as specified in the table below using pseudo-Python syntax. There is one requirement per line. All requirements must be met, that is, they must be combined using logical "and."

.. list-table:: Fully Supported Operators on the MLA with Constraints
   :widths: 30 70
   :header-rows: 1

   * - **Operator**
     - **Additional Constraints**
   * - Conv
     - 2D only
   * - ConvTranspose
     - Notation: G = number of groups, C = number of channels

       2D only

       Each value in *strides* must be in [1, 2, 4, 8, 16]

       If G == 1 then each value in *dilations* must be 1 (default)

       G in [1, C]
   * - AveragePool
     - 2D only
   * - GlobalAveragePool
     - 2D only
   * - MaxPool
     - 2D only

       Optional output tensor Indices cannot be used

       *ceil_mode* == 0 (default)

       *dilations* == [1, 1] (default)

       (kernel_shape[0] < 128 and kernel_shape[1] < 128) or (kernel_shape[0] == H and kernel_shape[1] == W)
   * - GlobalMaxPool
     - 2D only
   * - ReduceMean
     - *keepdims* == 1 (default)

       (0 not in *data*.shape) or (*data*.shape[0] == 1)

       (kernel_shape[0] < 128 and kernel_shape[1] < 128) or (kernel_shape[0] == H and kernel_shape[1] == W)
   * - ArgMax
     - N == 1 (batch size)

       C <= 2032 (channel dimension)

       H == 1 (height)

       W == 1 (width)

       axis == index of channel dimension (1 or 3)

       *keepdims* == 1 (default)

       **Note**: Output is always int32, whereas in the ONNX specification the output is always int64.
   * - Pad
     - mode == "constant"
   * - LRN
     - None
   * - Relu, LeakyRelu, Tanh, Sigmoid, HardSigmoid, HardSwish
     - axis == index of channel dimension (1 or 3)
   * - Add, Sub, Mul, Neg, Sqrt, Exp, Log, Erf
     - None
   * - Concat
     - axis != 0 (batch dimension)
   * - Slice
     - None
   * - Resize
     - coordinate_transformation_mode != tf_crop_and_resize

       mode in ['nearest', 'linear']

       Can scale in H (height) and W (width) dimensions only

       Both scaling factors must be a power of 2

       If mode == 'linear' and coordinate_transformation_mode == 'half_pixel', then the image can only be scaled by factors of 1, 2, or 4 in each dimension

Fused Operators
***************

The most frequently occurring operator sequences are supported on the MLA by fused operator implementations, as described in the MLA Supported Categories section above.
In the following list of fused operators, the notation "Conv + Add" or "ConvTranspose + Add" means that the bias is added by the Conv(Transpose) operator itself by using the optional input tensor B; this notation therefore represents a single Conv or ConvTranspose operation. The notation "+ [Clip, Relu, Mul]" means that the preceding operator is followed by a separate Clip, Relu, or Mul operation. The requirements for any fused operator are the union of the requirements of each of its constituent operators.

The fused operators are:

* Add + Relu
* Constant + Add
* Constant + Sub
* Constant + Mul
* Conv + Clip
* Conv + Relu
* Conv + Add
* Conv + Add + Clip
* Conv + Add + Relu
* Conv + Add + Mul
* ConvTranspose + Clip
* ConvTranspose + Relu
* ConvTranspose + Add
* ConvTranspose + Add + Clip
* ConvTranspose + Add + Relu

TFLite Models
*************

Support for pre-quantized TFLite models is compliant with the TFLite specification. Note that quantization is only done to the int8 data type. The provided support is subject to all of the constraints and limitations described for ONNX models above.
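The fused operator sequences listed above can be illustrated with a simple pattern scan over a linear chain of operator types. This is only a sketch of the idea, not the actual compiler pass: the real fusion works on the kernel graph, and "Conv + Add" at that level denotes the bias input B rather than two separate nodes. All names here are hypothetical.

```python
# The fused sequences from the list above, as tuples of operator types.
FUSED_PATTERNS = [
    ("Add", "Relu"),
    ("Constant", "Add"), ("Constant", "Sub"), ("Constant", "Mul"),
    ("Conv", "Clip"), ("Conv", "Relu"), ("Conv", "Add"),
    ("Conv", "Add", "Clip"), ("Conv", "Add", "Relu"), ("Conv", "Add", "Mul"),
    ("ConvTranspose", "Clip"), ("ConvTranspose", "Relu"),
    ("ConvTranspose", "Add"),
    ("ConvTranspose", "Add", "Clip"), ("ConvTranspose", "Add", "Relu"),
]

def find_fusions(op_sequence):
    """Greedily match the longest fused pattern at each position.

    Returns a list of tuples; unfused operators appear as 1-tuples.
    """
    fused, i = [], 0
    while i < len(op_sequence):
        for pattern in sorted(FUSED_PATTERNS, key=len, reverse=True):
            if tuple(op_sequence[i:i + len(pattern)]) == pattern:
                fused.append(pattern)
                i += len(pattern)
                break
        else:
            fused.append((op_sequence[i],))
            i += 1
    return fused
```

The longest-match-first ordering reflects the requirement that, for example, "Conv + Add + Relu" be fused as one unit rather than as "Conv + Add" followed by a standalone Relu.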