sima_utils.transformer
Submodules
- sima_utils.transformer.default_llm_config
- sima_utils.transformer.default_vision_config
- sima_utils.transformer.devkit
- sima_utils.transformer.gguf_conversion
- sima_utils.transformer.hf_transformer
- sima_utils.transformer.llm_tokenizer
- sima_utils.transformer.model
- sima_utils.transformer.onnx_builder
- sima_utils.transformer.preproc
- sima_utils.transformer.prompt_template
- sima_utils.transformer.tokenizer
- sima_utils.transformer.utils
- sima_utils.transformer.vision_preprocessor
- sima_utils.transformer.vlm_config
- sima_utils.transformer.whisper_config
Classes
- VlmArchType: VLM architecture type.
- VlmConfig: Configuration of a Vision Language Model.
- VlmHelper: VLM helper class with processors.
Package Contents
- class sima_utils.transformer.VlmArchType
VLM architecture type.
- VLM_LLAVA = 'vlm-llava'
- VLM_PALIGEMMA = 'vlm-paligemma'
- VLM_GEMMA3 = 'vlm-gemma3'
- VLM_CUSTOM = 'vlm-custom'
- LLM_LLAMA2 = 'llm-llama2'
- LLM_LLAMA3_1 = 'llm-llama3.1'
- LLM_LLAMA3_2 = 'llm-llama3.2'
- LLM_GEMMA1 = 'llm-gemma1'
- LLM_GEMMA2 = 'llm-gemma2'
- LLM_GEMMA3 = 'llm-gemma3'
- LLM_PHI3_5 = 'llm-phi3.5'
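The architecture tags above follow a `<modality>-<family>` naming scheme, so the string prefix distinguishes vision-language from text-only architectures. A minimal stand-in sketch in plain Python (hypothetical mirror for illustration, not the actual `sima_utils` implementation; the `is_multimodal` helper shown here is an assumption based on the `VlmConfig.is_multimodal` property documented below):

```python
from enum import Enum

class VlmArchType(Enum):
    """Hypothetical mirror of sima_utils.transformer.VlmArchType."""
    VLM_LLAVA = 'vlm-llava'
    VLM_PALIGEMMA = 'vlm-paligemma'
    VLM_GEMMA3 = 'vlm-gemma3'
    VLM_CUSTOM = 'vlm-custom'
    LLM_LLAMA2 = 'llm-llama2'
    LLM_LLAMA3_1 = 'llm-llama3.1'
    LLM_LLAMA3_2 = 'llm-llama3.2'
    LLM_GEMMA1 = 'llm-gemma1'
    LLM_GEMMA2 = 'llm-gemma2'
    LLM_GEMMA3 = 'llm-gemma3'
    LLM_PHI3_5 = 'llm-phi3.5'

    @property
    def is_multimodal(self) -> bool:
        # Vision-language architectures carry the 'vlm-' prefix;
        # text-only architectures carry 'llm-'.
        return self.value.startswith('vlm-')

# Enum members can be recovered from their string tags:
arch = VlmArchType('vlm-llava')
```

Because the tags are plain strings, the same values can round-trip through serialized configs and back into enum members.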
- class sima_utils.transformer.VlmConfig
Configuration of a Vision Language Model.
- model_name
The name of the model.
- Type:
str
- model_type
The type of the model.
- Type:
str
- vm_cfg
The settings of the vision model.
- Type:
VisionModelConfig | None
- mm_cfg
The settings of the multi-modal connection.
- Type:
MMConnectionConfig | None
- lm_cfg
The settings of the language model.
- Type:
LanguageModelConfig
- pipeline_cfg
The settings of the application pipeline.
- Type:
PipelineConfig
- model_name: str = ''
- model_type: VlmArchType | None = None
- vm_cfg: VisionModelConfig | None = None
- mm_cfg: MMConnectionConfig | None = None
- lm_cfg: LanguageModelConfig
- pipeline_cfg: PipelineConfig
- set_default_config(dtype: LlmDataType, vm_arch: VisionArchType | None, lm_arch: LlmArchType, gen: LlmArchVersion, b_size: str)
- set_tokenizer_path(tokenizer_path: pathlib.Path)
- static from_hf_config(model_path: pathlib.Path, model_cfg: dict) → VlmConfig
Generate SiMa's VLM configuration from a Hugging Face config dict and MLA constraints.
- Parameters:
model_path – The path of the source model.
model_cfg – The config dict of the source model.
- Returns:
VlmConfig for the model.
- property is_multimodal
- update_special_tokens(cfg: dict)
- update_vision_model_params(cfg: dict)
- update_mm_connection_params(cfg: dict)
- update_language_model_params(cfg: dict)
- config_pipeline(system_prompt: str | None, max_num_tokens: int, tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, estimated_max_num_query_tokens: int = 100)
- class sima_utils.transformer.VlmHelper(vlm_cfg: VlmConfig, system_prompt: str | None = None)
VLM helper class with processors.
- prompt_formatter: sima_utils.transformer.prompt_template.PromptFormatter
- image_preprocessor: sima_utils.transformer.vision_preprocessor.ImageProcessor | None
- preprocess(query: str, image: pathlib.Path | str | numpy.ndarray | None) → tuple[str, numpy.ndarray, numpy.ndarray | None]
Preprocess the input query and the image.
- Parameters:
query – Input query string.
image – Path to the image, or the image already loaded as a numpy array. Set to None if there is no image.
- Returns:
Tuple of the formatted prompt, the tokenized input query, and the preprocessed image (or None).
- postprocess(output_tokens: numpy.ndarray | list[int]) → str
Decode the generated output tokens back into a text string.
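The preprocess/postprocess pair forms a round trip: format the query into a prompt, tokenize it for the model, then decode generated tokens back into text. A toy sketch of that flow (the whitespace tokenizer and prompt template below are illustrative stand-ins, not the real `LlmTokenizer` or `PromptFormatter`, and the image path is omitted):

```python
# Assumed prompt format, for illustration only.
PROMPT_TEMPLATE = 'USER: {query}\nASSISTANT:'

# Toy vocabulary and whitespace tokenizer standing in for LlmTokenizer.
VOCAB = ['<unk>', 'USER:', 'ASSISTANT:', 'describe', 'the', 'image']
TOKEN_OF = {w: i for i, w in enumerate(VOCAB)}

def preprocess(query: str) -> tuple[str, list[int]]:
    """Format the query into a prompt and tokenize it."""
    prompt = PROMPT_TEMPLATE.format(query=query)
    tokens = [TOKEN_OF.get(w, 0) for w in prompt.split()]
    return prompt, tokens

def postprocess(output_tokens: list[int]) -> str:
    """Decode generated token ids back into text."""
    return ' '.join(VOCAB[t] for t in output_tokens)

prompt, toks = preprocess('describe the image')
```

In the real helper, `preprocess` also runs the `ImageProcessor` on any supplied image and returns the preprocessed array alongside the prompt and token ids.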