sima_utils.transformer.onnx_builder
===================================

.. py:module:: sima_utils.transformer.onnx_builder


Attributes
----------

.. autoapisummary::

   sima_utils.transformer.onnx_builder.OnnxNode


Classes
-------

.. autoapisummary::

   sima_utils.transformer.onnx_builder.OnnxBuilder


Module Contents
---------------

.. py:data:: OnnxNode

.. py:class:: OnnxBuilder

   Helper class to build an ONNX model.

   .. attribute:: IR_VERSION

      IR version of the ONNX model.

   .. attribute:: OPSET_ID

      Operator set ID of the ONNX model.

   .. attribute:: onnx_file_name

      File name of the ONNX file.

   .. attribute:: get_param_func

      A function that returns the parameter tensor for a given parameter name.

   .. attribute:: check_param_func

      A function that checks whether a parameter tensor with a given parameter name exists.

   .. attribute:: input_nodes

      Input nodes of the ONNX model.

   .. attribute:: output_nodes

      Output nodes of the ONNX model.

   .. attribute:: _initializer_map

      A mapping from a name to an ONNX initializer.

   .. attribute:: _node_map

      A mapping from a name to an ONNX node.

   .. py:attribute:: IR_VERSION
      :type: ClassVar[int]
      :value: 8

   .. py:attribute:: OPSET_ID
      :type: ClassVar[int]
      :value: 17

   .. py:attribute:: onnx_file_name
      :type: pathlib.Path

   .. py:attribute:: get_param_func
      :type: collections.abc.Callable[[str], numpy.ndarray] | None
      :value: None

   .. py:attribute:: check_param_func
      :type: collections.abc.Callable[[str], bool] | None
      :value: None

   .. py:attribute:: input_nodes
      :type: list[onnx.ValueInfoProto]
      :value: []

   .. py:attribute:: output_nodes
      :type: list[onnx.ValueInfoProto]
      :value: []

   .. py:method:: create_and_save_model(do_simplify: bool = True)

      Creates and saves the model.

      :param do_simplify: Set True to simplify the created ONNX graph.

   .. py:method:: create_model() -> onnx.ModelProto

      Creates the model and performs shape inference.

   .. py:method:: save_model(model: onnx.ModelProto)

      Saves the model to a file.

      :param model: The ONNX model to be saved.

   .. py:method:: create_input_node(name: str, shape: collections.abc.Sequence[int], dtype: type = np.float32)

      Creates an input node with the provided name, shape, and data type.

   .. py:method:: create_output_node(name: str, shape: collections.abc.Sequence[int], dtype: type = np.float32)

      Creates an output node with the provided name, shape, and data type.

   .. py:method:: get_node_output_names(node: OnnxNode) -> list[str]

      Gets a list of the output names of the given node.

   .. py:method:: get_node_output_name(node: OnnxNode) -> str

      Gets the output name of the given node, which must have exactly one output.

   .. py:method:: create_initializer(name: str, value: int | float | numpy.ndarray | None = None, reshape_str: str | None = None) -> OnnxNode | None

      Creates an initializer with the given name.

      :param name: Initializer name.
      :param value: Value of the initializer. If value is None, the value is looked up with
                    get_param_func; if get_param_func is also None, the value is looked up
                    using the pre-defined file name.
      :param reshape_str: A string describing how to reshape the value.

      :returns: An initializer if a valid value is found; otherwise None.

   .. py:method:: reshape_data(data: numpy.ndarray, reshape_str: str | None = None) -> numpy.ndarray
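   A minimal usage sketch follows. It assumes ``OnnxBuilder`` is constructed from the
   attributes listed above (``onnx_file_name`` plus the optional ``get_param_func`` and
   ``check_param_func``) and that ``create_input_node`` returns a node that can be fed to the
   ``build_*`` methods; the exact constructor signature and return types may differ.

   .. code-block:: python

      from pathlib import Path

      import numpy as np

      from sima_utils.transformer.onnx_builder import OnnxBuilder

      # Assumed constructor arguments, mirroring the attribute list above.
      builder = OnnxBuilder(onnx_file_name=Path("toy.onnx"))

      # Declare a graph input, add a single Relu operator, and mark its output
      # as a graph output.
      x = builder.create_input_node("x", shape=[1, 8], dtype=np.float32)
      relu = builder.build_op("act", [x], "Relu")
      builder.create_output_node(builder.get_node_output_name(relu), shape=[1, 8])

      # Build the ModelProto, run shape inference, optionally simplify, and save.
      builder.create_and_save_model(do_simplify=True)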
   .. py:method:: build_op(base_name: str, input_nodes: collections.abc.Sequence[OnnxNode], op_type: str, **kwargs) -> OnnxNode

      Builds an ONNX node.

      :param base_name: Base name of the operator. This is used to create the node and the
                        initializer.
      :param input_nodes: A list of input nodes.
      :param op_type: Name of the operator type.
      :param \*\*kwargs: Operator attributes.

      :returns: Created ONNX node.

   .. py:method:: build_conv(base_name: str, input_node: OnnxNode, is_fc: bool = True, **kwargs) -> OnnxNode

      Builds a convolution node.

      :param base_name: Base name of the operator. This is used to create the node and the
                        initializers.
      :param input_node: The input node of the convolution node.
      :param is_fc: Set True to indicate that the original operator is a fully-connected layer
                    or a matrix multiplication whose weight needs to be reshaped to build the
                    convolution.
      :param \*\*kwargs: Convolution attributes.

      :returns: Created convolution node.

   .. py:method:: build_split_and_concat(base_name: str, input_node: OnnxNode, num_splits: int, split_axis: int, concat_axis: int) -> OnnxNode

      Builds nodes for a split-and-concat operation.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the split node.
      :param num_splits: Number of splits.
      :param split_axis: The axis along which the input node is split.
      :param concat_axis: The axis along which the split nodes are concatenated.

      :returns: Created split and concatenate nodes.

   .. py:method:: build_split_expand_concat(base_name: str, input_node: OnnxNode, num_splits: int, num_repeats: int, split_axis: int, concat_axis: int, concat_shape: tuple[int, Ellipsis]) -> OnnxNode

      Builds nodes for a split-expand-concat operation.

      This supports Grouped Query Attention (GQA), where the number of KV heads is smaller
      than the number of attention heads. The KV tensors coming out of a KV cache need to be
      repeated to match the number of attention heads.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the split node.
      :param num_splits: Number of splits.
      :param num_repeats: Number of repeats for each split.
      :param split_axis: The axis along which the input node is split.
      :param concat_axis: The axis along which the split nodes are concatenated.

      :returns: Created split, expand, and concatenate nodes.

   .. py:method:: build_layer_norm(base_name: str, input_node: OnnxNode, epsilon: float = 1e-05) -> OnnxNode

      Builds nodes for a layer norm operation.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the layer norm operation.

      :returns: Created layer norm nodes.

   .. py:method:: build_rms_norm(base_name: str, input_node: OnnxNode, epsilon: float, weight_offset: float) -> OnnxNode

      Builds nodes for an RMS norm operation.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the RMS norm operation.
      :param epsilon: Epsilon of the RMS normalization.
      :param weight_offset: Offset added to the weights.

      :returns: Created RMS norm nodes.

   .. py:method:: build_logit_softcapping(base_name: str, input_node: OnnxNode, scalar: float) -> OnnxNode

      Builds nodes for logit soft capping.

      Logit soft capping is used in GEMMA2 to prevent overconfident predictions::

          softcapping(x) = scalar * tanh(x / scalar)
                         = scalar * [2 * sigmoid(2x / scalar) - 1]
                         = (2 * scalar) * sigmoid(x * (2 / scalar)) - scalar

      Operations: x - Mul - Sigmoid - Mul - Sub.
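   The sigmoid form above is algebraically identical to the tanh form; the rewrite simply
   expresses the soft cap with a Sigmoid node and elementwise Mul and Sub nodes. A quick
   numerical check of the identity (illustrative only, independent of this class):

   .. code-block:: python

      import numpy as np

      def softcap_tanh(x: np.ndarray, scalar: float) -> np.ndarray:
          return scalar * np.tanh(x / scalar)

      def softcap_sigmoid(x: np.ndarray, scalar: float) -> np.ndarray:
          # (2 * scalar) * sigmoid(x * (2 / scalar)) - scalar
          return (2.0 * scalar) / (1.0 + np.exp(-x * (2.0 / scalar))) - scalar

      x = np.linspace(-100.0, 100.0, 1001)
      assert np.allclose(softcap_tanh(x, 30.0), softcap_sigmoid(x, 30.0))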
   .. py:method:: build_activation(base_name: str, input_node: OnnxNode, act_type: str) -> OnnxNode

      Builds nodes for an activation.

      LLAMA uses "silu", which is based on sigmoid. GEMMA uses "gelu_pytorch_tanh", which is
      the Gaussian ERF with a tanh approximation. Because tanh(x) = 2 * sigmoid(2x) - 1, gelu
      can also be approximated with sigmoid::

          gelu_tanh(x) = 0.5 * x * [1 + tanh(sqrt(2/pi) * x * (1 + 0.044715 * x * x))]
                       = x * sigmoid(x * (A + B * x * x))

          where A = 2 * sqrt(2/pi), B = A * 0.044715

   .. py:method:: build_matmul_and_split_heads(base_name: str, input_node: OnnxNode, num_heads: int, seq_len: int, kv_len: int | None = None, post_matmul_scale: float = 1.0) -> list[OnnxNode]

   .. py:method:: build_merge_heads_and_matmul(base_name: str, input_nodes: list[OnnxNode], num_heads: int) -> OnnxNode

   .. py:method:: build_attention(base_name: str, input_nodes: list[OnnxNode], num_heads: int, head_dim: int, seq_len: int, kv_len: int | None = None, skip_kv_projs_and_split_head: bool = False, mask_node: OnnxNode | None = None, output_kv_projs: bool = False) -> list[OnnxNode]

   .. py:method:: build_encoder_decoder_mlp(base_name: str, input_node: OnnxNode, act_type: str) -> OnnxNode
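   The sigmoid-based gelu approximation used by ``build_activation`` above can be verified
   numerically against the tanh form (illustrative only, independent of this class):

   .. code-block:: python

      import numpy as np

      def gelu_tanh(x: np.ndarray) -> np.ndarray:
          return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * x * (1.0 + 0.044715 * x * x)))

      def gelu_sigmoid(x: np.ndarray) -> np.ndarray:
          a = 2.0 * np.sqrt(2.0 / np.pi)
          b = a * 0.044715
          return x / (1.0 + np.exp(-x * (a + b * x * x)))

      x = np.linspace(-6.0, 6.0, 1001)
      assert np.allclose(gelu_tanh(x), gelu_sigmoid(x))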