sima_utils.transformer
======================

.. py:module:: sima_utils.transformer


Submodules
----------

.. toctree::
   :maxdepth: 1

   /pages/api_reference/python-autoapi/sima_utils/transformer/default_llm_config/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/default_vision_config/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/devkit/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/gguf_conversion/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/hf_transformer/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/llm_tokenizer/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/model/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/onnx_builder/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/preproc/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/prompt_template/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/tokenizer/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/utils/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/vision_preprocessor/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/vlm_config/index
   /pages/api_reference/python-autoapi/sima_utils/transformer/whisper_config/index


Classes
-------

.. autoapisummary::

   sima_utils.transformer.VlmArchType
   sima_utils.transformer.VlmConfig
   sima_utils.transformer.VlmHelper


Package Contents
----------------

.. py:class:: VlmArchType

   VLM architecture type.

   .. py:attribute:: VLM_LLAVA
      :value: 'vlm-llava'

   .. py:attribute:: VLM_PALIGEMMA
      :value: 'vlm-paligemma'

   .. py:attribute:: VLM_GEMMA3
      :value: 'vlm-gemma3'

   .. py:attribute:: VLM_CUSTOM
      :value: 'vlm-custom'

   .. py:attribute:: LLM_LLAMA2
      :value: 'llm-llama2'

   .. py:attribute:: LLM_LLAMA3_1
      :value: 'llm-llama3.1'

   .. py:attribute:: LLM_LLAMA3_2
      :value: 'llm-llama3.2'

   .. py:attribute:: LLM_GEMMA1
      :value: 'llm-gemma1'

   .. py:attribute:: LLM_GEMMA2
      :value: 'llm-gemma2'

   .. py:attribute:: LLM_GEMMA3
      :value: 'llm-gemma3'

   .. py:attribute:: LLM_PHI3_5
      :value: 'llm-phi3.5'


.. py:class:: VlmConfig

   Configuration of a Vision Language Model.

   .. attribute:: model_name

      The name of the model.

      :type: str

   .. attribute:: model_type

      The type of the model.

      :type: VlmArchType | None

   .. attribute:: vm_cfg

      The settings of the vision model.

      :type: VisionModelConfig | None

   .. attribute:: mm_cfg

      The settings of the multi-modal connection.

      :type: MMConnectionConfig | None

   .. attribute:: lm_cfg

      The settings of the language model.

      :type: LanguageModelConfig

   .. attribute:: pipeline_cfg

      The settings of the application pipeline.

      :type: PipelineConfig

   .. py:attribute:: model_name
      :type: str
      :value: ''

   .. py:attribute:: model_type
      :type: VlmArchType | None
      :value: None

   .. py:attribute:: vm_cfg
      :type: VisionModelConfig | None
      :value: None

   .. py:attribute:: mm_cfg
      :type: MMConnectionConfig | None
      :value: None

   .. py:attribute:: lm_cfg
      :type: LanguageModelConfig

   .. py:attribute:: pipeline_cfg
      :type: PipelineConfig

   .. py:method:: load(vlm_cfg: dict) -> VlmConfig
      :staticmethod:

   .. py:method:: set_default_config(dtype: LlmDataType, vm_arch: VisionArchType | None, lm_arch: LlmArchType, gen: LlmArchVersion, b_size: str)

   .. py:method:: set_tokenizer_path(tokenizer_path: pathlib.Path)

   .. py:method:: from_hf_config(model_path: pathlib.Path, model_cfg: dict) -> VlmConfig
      :staticmethod:

      Generate SiMa's configuration for a VLM from a HuggingFace config dict and MLA constraints.

      :param model_path: The path of the source model.
      :param model_cfg: The config dict of the source model.
      :returns: VlmConfig for the model.

   .. py:property:: is_multimodal

   .. py:method:: update_special_tokens(cfg: dict)

   .. py:method:: update_vision_model_params(cfg: dict)

   .. py:method:: update_mm_connection_params(cfg: dict)

   .. py:method:: update_language_model_params(cfg: dict)

   .. py:method:: config_pipeline(system_prompt: str | None, max_num_tokens: int, tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, estimated_max_num_query_tokens: int = 100)


.. py:class:: VlmHelper(vlm_cfg: VlmConfig, system_prompt: str | None = None)

   VLM helper class with processors.

   .. py:attribute:: tokenizer
      :type: sima_utils.transformer.llm_tokenizer.LlmTokenizer

   .. py:attribute:: prompt_formatter
      :type: sima_utils.transformer.prompt_template.PromptFormatter

   .. py:attribute:: image_preprocessor
      :type: sima_utils.transformer.vision_preprocessor.ImageProcessor | None

   .. py:method:: preprocess(query: str, image: pathlib.Path | str | numpy.ndarray | None) -> tuple[str, numpy.ndarray, numpy.ndarray | None]

      Preprocess the input query and the image.

      :param query: Input query string.
      :param image: Path to the image, or the loaded image as a numpy array. Set to None if there is no image.
      :returns: Tuple of the formatted prompt, the tokenized input query, and the preprocessed image.

   .. py:method:: postprocess(output_tokens: numpy.ndarray | list[int]) -> str
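To illustrate how the documented ``VlmConfig`` fields and ``load`` / ``is_multimodal`` members fit together, the sketch below mirrors the class with plain ``dataclasses``. It is a hypothetical stand-in, not the real implementation: ``VlmConfigSketch`` is an invented name, and plain dicts replace ``VisionModelConfig``, ``MMConnectionConfig``, ``LanguageModelConfig``, and ``PipelineConfig``, whose definitions are not part of this page.

.. code-block:: python

   from __future__ import annotations
   from dataclasses import dataclass, field
   from enum import Enum


   class VlmArchType(Enum):
       """Mirror of the architecture tags documented above."""
       VLM_LLAVA = 'vlm-llava'
       VLM_PALIGEMMA = 'vlm-paligemma'
       VLM_GEMMA3 = 'vlm-gemma3'
       VLM_CUSTOM = 'vlm-custom'
       LLM_LLAMA2 = 'llm-llama2'
       LLM_LLAMA3_1 = 'llm-llama3.1'
       LLM_LLAMA3_2 = 'llm-llama3.2'
       LLM_GEMMA1 = 'llm-gemma1'
       LLM_GEMMA2 = 'llm-gemma2'
       LLM_GEMMA3 = 'llm-gemma3'
       LLM_PHI3_5 = 'llm-phi3.5'


   @dataclass
   class VlmConfigSketch:
       """Simplified, hypothetical stand-in for VlmConfig."""
       model_name: str = ''
       model_type: VlmArchType | None = None
       vm_cfg: dict | None = None      # stand-in for VisionModelConfig
       mm_cfg: dict | None = None      # stand-in for MMConnectionConfig
       lm_cfg: dict = field(default_factory=dict)        # LanguageModelConfig
       pipeline_cfg: dict = field(default_factory=dict)  # PipelineConfig

       @property
       def is_multimodal(self) -> bool:
           # Treat the config as multimodal when a vision section is present.
           return self.vm_cfg is not None

       @staticmethod
       def load(vlm_cfg: dict) -> 'VlmConfigSketch':
           # Build a config object from a plain dict, in the spirit of
           # VlmConfig.load(vlm_cfg: dict).
           model_type = vlm_cfg.get('model_type')
           return VlmConfigSketch(
               model_name=vlm_cfg.get('model_name', ''),
               model_type=VlmArchType(model_type) if model_type else None,
               vm_cfg=vlm_cfg.get('vm_cfg'),
               mm_cfg=vlm_cfg.get('mm_cfg'),
               lm_cfg=vlm_cfg.get('lm_cfg', {}),
               pipeline_cfg=vlm_cfg.get('pipeline_cfg', {}),
           )


   cfg = VlmConfigSketch.load({'model_name': 'demo',
                               'model_type': 'vlm-llava',
                               'vm_cfg': {'patch_size': 14}})
   print(cfg.is_multimodal)  # -> True

The dict keys here simply reuse the attribute names; the real ``VlmConfig.load`` may expect a different dict layout.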
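The ``VlmHelper.preprocess`` / ``postprocess`` round trip described above can be sketched as follows. This is a loose illustration under heavy simplifying assumptions: ``VlmHelperSketch`` is an invented name, a whitespace split stands in for ``LlmTokenizer``, a fixed f-string stands in for ``PromptFormatter``, and a divide-by-255 stands in for ``ImageProcessor``; only the shapes of the documented signatures are preserved.

.. code-block:: python

   from __future__ import annotations
   import numpy as np


   class VlmHelperSketch:
       """Hypothetical stand-in for VlmHelper's preprocess/postprocess flow."""

       def __init__(self, system_prompt: str | None = None):
           self.system_prompt = system_prompt or ''
           self.vocab: dict[str, int] = {}
           self.inv_vocab: dict[int, str] = {}

       def _token_id(self, word: str) -> int:
           # Assign ids on first sight (illustration only; the real class
           # uses a trained LlmTokenizer vocabulary).
           if word not in self.vocab:
               idx = len(self.vocab)
               self.vocab[word] = idx
               self.inv_vocab[idx] = word
           return self.vocab[word]

       def preprocess(self, query: str,
                      image: np.ndarray | None = None
                      ) -> tuple[str, np.ndarray, np.ndarray | None]:
           # Format the prompt, tokenize it, and normalize the image,
           # mirroring the documented return tuple.
           prompt = f'{self.system_prompt} USER: {query} ASSISTANT:'.strip()
           tokens = np.array([self._token_id(w) for w in prompt.split()],
                             dtype=np.int64)
           pixels = None if image is None else image.astype(np.float32) / 255.0
           return prompt, tokens, pixels

       def postprocess(self, output_tokens: np.ndarray | list[int]) -> str:
           # Map token ids back to text.
           return ' '.join(self.inv_vocab[int(t)] for t in output_tokens)


   helper = VlmHelperSketch('You are helpful.')
   prompt, tokens, pixels = helper.preprocess('describe the scene')

Because the toy vocabulary is built during ``preprocess``, ``helper.postprocess(tokens)`` reproduces the formatted prompt exactly, which makes the tuple contract easy to see even without the real tokenizer.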