sima_utils.transformer
======================

.. py:module:: sima_utils.transformer


Submodules
----------

.. toctree::
   :maxdepth: 1

   /pages/api_reference/python-autoapi/sima_utils/transformer/default_llm_config/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/default_vision_config/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/devkit/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/gguf_conversion/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/hf_transformer/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/llm_tokenizer/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/model/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/onnx_builder/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/preproc/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/prompt_template/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/tokenizer/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/utils/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/vision_preprocessor/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/vlm_config/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/whisper_config/index


Classes
-------

.. autoapisummary::

   sima_utils.transformer.VlmArchType
   sima_utils.transformer.VlmConfig
   sima_utils.transformer.VlmHelper


Package Contents
----------------

.. py:class:: VlmArchType

   VLM architecture type.

   .. py:attribute:: VLM_LLAVA
      :value: 'vlm-llava'

   .. py:attribute:: VLM_PALIGEMMA
      :value: 'vlm-paligemma'

   .. py:attribute:: VLM_GEMMA3
      :value: 'vlm-gemma3'

   .. py:attribute:: VLM_CUSTOM
      :value: 'vlm-custom'

   .. py:attribute:: LLM_LLAMA2
      :value: 'llm-llama2'

   .. py:attribute:: LLM_LLAMA3_1
      :value: 'llm-llama3.1'

   .. py:attribute:: LLM_LLAMA3_2
      :value: 'llm-llama3.2'

   .. py:attribute:: LLM_GEMMA1
      :value: 'llm-gemma1'

   .. py:attribute:: LLM_GEMMA2
      :value: 'llm-gemma2'

   .. py:attribute:: LLM_GEMMA3
      :value: 'llm-gemma3'

   .. py:attribute:: LLM_PHI3_5
      :value: 'llm-phi3.5'


.. py:class:: VlmConfig

   Configuration of a Vision Language Model.

   .. attribute:: model_name

      The name of the model.

      :type: str

   .. attribute:: model_type

      The type of the model.

      :type: VlmArchType | None

   .. attribute:: vm_cfg

      The settings of the vision model.

      :type: VisionModelConfig | None

   .. attribute:: mm_cfg

      The settings of the multi-modal connection.

      :type: MMConnectionConfig | None

   .. attribute:: lm_cfg

      The settings of the language model.

      :type: LanguageModelConfig

   .. attribute:: pipeline_cfg

      The settings of the application pipeline.

      :type: PipelineConfig

   .. py:attribute:: model_name
      :type: str
      :value: ''

   .. py:attribute:: model_type
      :type: VlmArchType | None
      :value: None

   .. py:attribute:: vm_cfg
      :type: VisionModelConfig | None
      :value: None

   .. py:attribute:: mm_cfg
      :type: MMConnectionConfig | None
      :value: None

   .. py:attribute:: lm_cfg
      :type: LanguageModelConfig

   .. py:attribute:: pipeline_cfg
      :type: PipelineConfig

   .. py:method:: load(vlm_cfg: dict) -> VlmConfig
      :staticmethod:

   .. py:method:: set_default_config(dtype: LlmDataType, vm_arch: VisionArchType | None, lm_arch: LlmArchType, gen: LlmArchVersion, b_size: str)

   .. py:method:: set_tokenizer_path(tokenizer_path: pathlib.Path)

   .. py:method:: from_hf_config(model_path: pathlib.Path, model_cfg: dict) -> VlmConfig
      :staticmethod:

      Generate SiMa's configuration for a VLM from a HuggingFace config dict and MLA constraints.

      :param model_path: The path of the source model.
      :param model_cfg: The config dict of the source model.
      :returns: VlmConfig for the model.

   .. py:property:: is_multimodal

   .. py:method:: update_special_tokens(cfg: dict)

   .. py:method:: update_vision_model_params(cfg: dict)

   .. py:method:: update_mm_connection_params(cfg: dict)

   .. py:method:: update_language_model_params(cfg: dict)

   .. py:method:: config_pipeline(system_prompt: str | None, max_num_tokens: int, tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, estimated_max_num_query_tokens: int = 100)


.. py:class:: VlmHelper(vlm_cfg: VlmConfig, system_prompt: str | None = None)

   VLM helper class with processors.

   .. py:attribute:: tokenizer
      :type: sima_utils.transformer.llm_tokenizer.LlmTokenizer

   .. py:attribute:: prompt_formatter
      :type: sima_utils.transformer.prompt_template.PromptFormatter

   .. py:attribute:: image_preprocessor
      :type: sima_utils.transformer.vision_preprocessor.ImageProcessor | None

   .. py:method:: preprocess(query: str, image: pathlib.Path | str | numpy.ndarray | None) -> tuple[str, numpy.ndarray, numpy.ndarray | None]

      Preprocess the input query and the image.

      :param query: Input query string.
      :param image: Path to the image, or the loaded image as a numpy array. Set to None if there is no image.
      :returns: Tuple of the formatted prompt, the tokenized input query, and the preprocessed image.

   .. py:method:: postprocess(output_tokens: numpy.ndarray | list[int]) -> str
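To illustrate how the documented ``VlmConfig`` fields and ``load`` / ``is_multimodal`` members fit together, the sketch below mirrors the class with plain ``dataclasses``. It is a hypothetical stand-in, not the real implementation: ``VlmConfigSketch`` is an invented name, and plain dicts replace ``VisionModelConfig``, ``MMConnectionConfig``, ``LanguageModelConfig``, and ``PipelineConfig``, whose definitions are not part of this page.

.. code-block:: python

   from __future__ import annotations
   from dataclasses import dataclass, field
   from enum import Enum


   class VlmArchType(Enum):
       """Mirror of the architecture tags documented above."""
       VLM_LLAVA = 'vlm-llava'
       VLM_PALIGEMMA = 'vlm-paligemma'
       VLM_GEMMA3 = 'vlm-gemma3'
       VLM_CUSTOM = 'vlm-custom'
       LLM_LLAMA2 = 'llm-llama2'
       LLM_LLAMA3_1 = 'llm-llama3.1'
       LLM_LLAMA3_2 = 'llm-llama3.2'
       LLM_GEMMA1 = 'llm-gemma1'
       LLM_GEMMA2 = 'llm-gemma2'
       LLM_GEMMA3 = 'llm-gemma3'
       LLM_PHI3_5 = 'llm-phi3.5'


   @dataclass
   class VlmConfigSketch:
       """Simplified, hypothetical stand-in for VlmConfig."""
       model_name: str = ''
       model_type: VlmArchType | None = None
       vm_cfg: dict | None = None      # stand-in for VisionModelConfig
       mm_cfg: dict | None = None      # stand-in for MMConnectionConfig
       lm_cfg: dict = field(default_factory=dict)        # LanguageModelConfig
       pipeline_cfg: dict = field(default_factory=dict)  # PipelineConfig

       @property
       def is_multimodal(self) -> bool:
           # Treat the config as multimodal when a vision section is present.
           return self.vm_cfg is not None

       @staticmethod
       def load(vlm_cfg: dict) -> 'VlmConfigSketch':
           # Build a config object from a plain dict, in the spirit of
           # VlmConfig.load(vlm_cfg: dict).
           model_type = vlm_cfg.get('model_type')
           return VlmConfigSketch(
               model_name=vlm_cfg.get('model_name', ''),
               model_type=VlmArchType(model_type) if model_type else None,
               vm_cfg=vlm_cfg.get('vm_cfg'),
               mm_cfg=vlm_cfg.get('mm_cfg'),
               lm_cfg=vlm_cfg.get('lm_cfg', {}),
               pipeline_cfg=vlm_cfg.get('pipeline_cfg', {}),
           )


   cfg = VlmConfigSketch.load({'model_name': 'demo',
                               'model_type': 'vlm-llava',
                               'vm_cfg': {'patch_size': 14}})
   print(cfg.is_multimodal)  # -> True

The dict keys here simply reuse the attribute names; the real ``VlmConfig.load`` may expect a different dict layout.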
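The ``VlmHelper.preprocess`` / ``postprocess`` round trip described above can be sketched as follows. This is a loose illustration under heavy simplifying assumptions: ``VlmHelperSketch`` is an invented name, a whitespace split stands in for ``LlmTokenizer``, a fixed f-string stands in for ``PromptFormatter``, and a divide-by-255 stands in for ``ImageProcessor``; only the shapes of the documented signatures are preserved.

.. code-block:: python

   from __future__ import annotations
   import numpy as np


   class VlmHelperSketch:
       """Hypothetical stand-in for VlmHelper's preprocess/postprocess flow."""

       def __init__(self, system_prompt: str | None = None):
           self.system_prompt = system_prompt or ''
           self.vocab: dict[str, int] = {}
           self.inv_vocab: dict[int, str] = {}

       def _token_id(self, word: str) -> int:
           # Assign ids on first sight (illustration only; the real class
           # uses a trained LlmTokenizer vocabulary).
           if word not in self.vocab:
               idx = len(self.vocab)
               self.vocab[word] = idx
               self.inv_vocab[idx] = word
           return self.vocab[word]

       def preprocess(self, query: str,
                      image: np.ndarray | None = None
                      ) -> tuple[str, np.ndarray, np.ndarray | None]:
           # Format the prompt, tokenize it, and normalize the image,
           # mirroring the documented return tuple.
           prompt = f'{self.system_prompt} USER: {query} ASSISTANT:'.strip()
           tokens = np.array([self._token_id(w) for w in prompt.split()],
                             dtype=np.int64)
           pixels = None if image is None else image.astype(np.float32) / 255.0
           return prompt, tokens, pixels

       def postprocess(self, output_tokens: np.ndarray | list[int]) -> str:
           # Map token ids back to text.
           return ' '.join(self.inv_vocab[int(t)] for t in output_tokens)


   helper = VlmHelperSketch('You are helpful.')
   prompt, tokens, pixels = helper.preprocess('describe the scene')

Because the toy vocabulary is built during ``preprocess``, ``helper.postprocess(tokens)`` reproduces the formatted prompt exactly, which makes the tuple contract easy to see even without the real tokenizer.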