sima_utils.transformer.onnx_builder
===================================

.. py:module:: sima_utils.transformer.onnx_builder


Attributes
----------

.. autoapisummary::

   sima_utils.transformer.onnx_builder.OnnxNode


Classes
-------

.. autoapisummary::

   sima_utils.transformer.onnx_builder.OnnxBuilder


Module Contents
---------------

.. py:data:: OnnxNode

.. py:class:: OnnxBuilder

   Helper class to build an ONNX model.

   .. attribute:: IR_VERSION

      IR version of the ONNX model.

   .. attribute:: OPSET_ID

      Operator set ID of the ONNX model.

   .. attribute:: onnx_file_name

      File name of the ONNX file.

   .. attribute:: get_param_func

      A function that returns the parameter tensor for a given parameter name.

   .. attribute:: check_param_func

      A function that checks whether a parameter tensor with a given parameter name exists.

   .. attribute:: input_nodes

      Input nodes of the ONNX model.

   .. attribute:: output_nodes

      Output nodes of the ONNX model.

   .. attribute:: _initializer_map

      A mapping from a name to an ONNX initializer.

   .. attribute:: _node_map

      A mapping from a name to an ONNX node.

   .. py:attribute:: IR_VERSION
      :type: ClassVar[int]
      :value: 8

   .. py:attribute:: OPSET_ID
      :type: ClassVar[int]
      :value: 17

   .. py:attribute:: onnx_file_name
      :type: pathlib.Path

   .. py:attribute:: get_param_func
      :type: collections.abc.Callable[[str], numpy.ndarray] | None
      :value: None

   .. py:attribute:: check_param_func
      :type: collections.abc.Callable[[str], bool] | None
      :value: None

   .. py:attribute:: input_nodes
      :type: list[onnx.ValueInfoProto]
      :value: []

   .. py:attribute:: output_nodes
      :type: list[onnx.ValueInfoProto]
      :value: []

   .. py:method:: create_and_save_model(do_simplify: bool = True)

      Creates and saves the model.

      :param do_simplify: Set True to simplify the created ONNX graph.

   .. py:method:: create_model() -> onnx.ModelProto

      Creates the model and performs shape inference.

   .. py:method:: save_model(model: onnx.ModelProto)

      Saves the model to a file.

      :param model: The ONNX model to be saved.

   .. py:method:: create_input_node(name: str, shape: collections.abc.Sequence[int], dtype: type = np.float32)

      Creates an input node with the provided name, shape, and data type.

   .. py:method:: create_output_node(name: str, shape: collections.abc.Sequence[int], dtype: type = np.float32)

      Creates an output node with the provided name, shape, and data type.

   .. py:method:: get_node_output_names(node: OnnxNode) -> list[str]

      Gets a list of the output names of the given node.

   .. py:method:: get_node_output_name(node: OnnxNode) -> str

      Gets the output name of the given node, which must have exactly one output.

   .. py:method:: create_initializer(name: str, value: int | float | numpy.ndarray | None = None, reshape_str: str | None = None) -> OnnxNode | None

      Creates an initializer with the given name.

      :param name: Initializer name.
      :param value: Value of the initializer. If value is None, the value is looked up with
                    get_param_func; if get_param_func is also None, the value is looked up
                    using the pre-defined file name.
      :param reshape_str: A string describing how to reshape the value.

      :returns: An initializer if a valid value is found; otherwise None.

   .. py:method:: reshape_data(data: numpy.ndarray, reshape_str: str | None = None) -> numpy.ndarray
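   A minimal usage sketch follows. It assumes ``OnnxBuilder`` is constructed from the
   attributes listed above (``onnx_file_name`` plus the optional ``get_param_func`` and
   ``check_param_func``) and that ``create_input_node`` returns a node that can be fed to the
   ``build_*`` methods; the exact constructor signature and return types may differ.

   .. code-block:: python

      from pathlib import Path

      import numpy as np

      from sima_utils.transformer.onnx_builder import OnnxBuilder

      # Assumed constructor arguments, mirroring the attribute list above.
      builder = OnnxBuilder(onnx_file_name=Path("toy.onnx"))

      # Declare a graph input, add a single Relu operator, and mark its output
      # as a graph output.
      x = builder.create_input_node("x", shape=[1, 8], dtype=np.float32)
      relu = builder.build_op("act", [x], "Relu")
      builder.create_output_node(builder.get_node_output_name(relu), shape=[1, 8])

      # Build the ModelProto, run shape inference, optionally simplify, and save.
      builder.create_and_save_model(do_simplify=True)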
   .. py:method:: build_op(base_name: str, input_nodes: collections.abc.Sequence[OnnxNode], op_type: str, **kwargs) -> OnnxNode

      Builds an ONNX node.

      :param base_name: Base name of the operator. This is used to create the node and the
                        initializer.
      :param input_nodes: A list of input nodes.
      :param op_type: Name of the operator type.
      :param \*\*kwargs: Operator attributes.

      :returns: Created ONNX node.

   .. py:method:: build_conv(base_name: str, input_node: OnnxNode, is_fc: bool = True, **kwargs) -> OnnxNode

      Builds a convolution node.

      :param base_name: Base name of the operator. This is used to create the node and the
                        initializers.
      :param input_node: The input node of the convolution node.
      :param is_fc: Set True to indicate that the original operator is a fully-connected layer
                    or a matrix multiplication whose weight needs to be reshaped to build the
                    convolution.
      :param \*\*kwargs: Convolution attributes.

      :returns: Created convolution node.

   .. py:method:: build_split_and_concat(base_name: str, input_node: OnnxNode, num_splits: int, split_axis: int, concat_axis: int) -> OnnxNode

      Builds nodes for a split-and-concat operation.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the split node.
      :param num_splits: Number of splits.
      :param split_axis: The axis along which the input node is split.
      :param concat_axis: The axis along which the split nodes are concatenated.

      :returns: Created split and concatenate nodes.

   .. py:method:: build_split_expand_concat(base_name: str, input_node: OnnxNode, num_splits: int, num_repeats: int, split_axis: int, concat_axis: int, concat_shape: tuple[int, Ellipsis]) -> OnnxNode

      Builds nodes for a split-expand-concat operation.

      This supports Grouped Query Attention (GQA), where the number of KV heads is smaller
      than the number of attention heads. The KV tensors coming out of a KV cache need to be
      repeated to match the number of attention heads.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the split node.
      :param num_splits: Number of splits.
      :param num_repeats: Number of repeats for each split.
      :param split_axis: The axis along which the input node is split.
      :param concat_axis: The axis along which the split nodes are concatenated.

      :returns: Created split, expand, and concatenate nodes.

   .. py:method:: build_layer_norm(base_name: str, input_node: OnnxNode, epsilon: float = 1e-05) -> OnnxNode

      Builds nodes for a layer norm operation.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the layer norm operation.

      :returns: Created layer norm nodes.

   .. py:method:: build_rms_norm(base_name: str, input_node: OnnxNode, epsilon: float, weight_offset: float) -> OnnxNode

      Builds nodes for an RMS norm operation.

      :param base_name: Base name of the operator. This is used to create the nodes.
      :param input_node: The input node of the RMS norm operation.
      :param epsilon: Epsilon of the RMS normalization.
      :param weight_offset: Offset added to the weights.

      :returns: Created RMS norm nodes.

   .. py:method:: build_logit_softcapping(base_name: str, input_node: OnnxNode, scalar: float) -> OnnxNode

      Builds nodes for logit soft capping.

      Logit soft capping is used in GEMMA2 to prevent overconfident predictions::

          softcapping(x) = scalar * tanh(x / scalar)
                         = scalar * [2 * sigmoid(2x / scalar) - 1]
                         = (2 * scalar) * sigmoid(x * (2 / scalar)) - scalar

      Operations: x - Mul - Sigmoid - Mul - Sub.
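   The sigmoid form above is algebraically identical to the tanh form; the rewrite simply
   expresses the soft cap with a Sigmoid node and elementwise Mul and Sub nodes. A quick
   numerical check of the identity (illustrative only, independent of this class):

   .. code-block:: python

      import numpy as np

      def softcap_tanh(x: np.ndarray, scalar: float) -> np.ndarray:
          return scalar * np.tanh(x / scalar)

      def softcap_sigmoid(x: np.ndarray, scalar: float) -> np.ndarray:
          # (2 * scalar) * sigmoid(x * (2 / scalar)) - scalar
          return (2.0 * scalar) / (1.0 + np.exp(-x * (2.0 / scalar))) - scalar

      x = np.linspace(-100.0, 100.0, 1001)
      assert np.allclose(softcap_tanh(x, 30.0), softcap_sigmoid(x, 30.0))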
   .. py:method:: build_activation(base_name: str, input_node: OnnxNode, act_type: str) -> OnnxNode

      Builds nodes for an activation.

      LLAMA uses "silu", which is based on sigmoid. GEMMA uses "gelu_pytorch_tanh", which is
      the Gaussian ERF with a tanh approximation. Because tanh(x) = 2 * sigmoid(2x) - 1, gelu
      can also be approximated with sigmoid::

          gelu_tanh(x) = 0.5 * x * [1 + tanh(sqrt(2/pi) * x * (1 + 0.044715 * x * x))]
                       = x * sigmoid(x * (A + B * x * x))

          where A = 2 * sqrt(2/pi), B = A * 0.044715

   .. py:method:: build_matmul_and_split_heads(base_name: str, input_node: OnnxNode, num_heads: int, seq_len: int, kv_len: int | None = None, post_matmul_scale: float = 1.0) -> list[OnnxNode]

   .. py:method:: build_merge_heads_and_matmul(base_name: str, input_nodes: list[OnnxNode], num_heads: int) -> OnnxNode

   .. py:method:: build_attention(base_name: str, input_nodes: list[OnnxNode], num_heads: int, head_dim: int, seq_len: int, kv_len: int | None = None, skip_kv_projs_and_split_head: bool = False, mask_node: OnnxNode | None = None, output_kv_projs: bool = False) -> list[OnnxNode]

   .. py:method:: build_encoder_decoder_mlp(base_name: str, input_node: OnnxNode, act_type: str) -> OnnxNode
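   The sigmoid-based gelu approximation used by ``build_activation`` above can be verified
   numerically against the tanh form (illustrative only, independent of this class):

   .. code-block:: python

      import numpy as np

      def gelu_tanh(x: np.ndarray) -> np.ndarray:
          return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * x * (1.0 + 0.044715 * x * x)))

      def gelu_sigmoid(x: np.ndarray) -> np.ndarray:
          a = 2.0 * np.sqrt(2.0 / np.pi)
          b = a * 0.044715
          return x / (1.0 + np.exp(-x * (a + b * x * x)))

      x = np.linspace(-6.0, 6.0, 1001)
      assert np.allclose(gelu_tanh(x), gelu_sigmoid(x))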