sima_utils.transformer

Submodules

Classes

VlmArchType

VLM architecture type.

VlmConfig

Configuration of a Vision Language Model.

VlmHelper

VLM helper class with processors.

Package Contents

class sima_utils.transformer.VlmArchType

VLM architecture type.

VLM_LLAVA = 'vlm-llava'
VLM_PALIGEMMA = 'vlm-paligemma'
VLM_GEMMA3 = 'vlm-gemma3'
VLM_CUSTOM = 'vlm-custom'
LLM_LLAMA2 = 'llm-llama2'
LLM_LLAMA3_1 = 'llm-llama3.1'
LLM_LLAMA3_2 = 'llm-llama3.2'
LLM_GEMMA1 = 'llm-gemma1'
LLM_GEMMA2 = 'llm-gemma2'
LLM_GEMMA3 = 'llm-gemma3'
LLM_PHI3_5 = 'llm-phi3.5'
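The enum's string values could be used, for example, to select an architecture from a HuggingFace `model_type` field. The sketch below is illustrative only: the mapping keys and the fallback to `vlm-custom` are assumptions; only the string values themselves come from the enum above.

```python
# Hypothetical mapping from HuggingFace ``model_type`` strings to the
# VlmArchType string values listed above. The keys and the fallback are
# assumptions for illustration; only the values are documented.
HF_TO_VLM_ARCH = {
    "llava": "vlm-llava",
    "paligemma": "vlm-paligemma",
    "gemma3": "vlm-gemma3",
    "llama": "llm-llama2",
}

def resolve_arch(model_type: str) -> str:
    # Fall back to the custom architecture when the model type is unknown.
    return HF_TO_VLM_ARCH.get(model_type, "vlm-custom")

print(resolve_arch("paligemma"))  # vlm-paligemma
print(resolve_arch("mystery-model"))  # vlm-custom
```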
class sima_utils.transformer.VlmConfig

Configuration of a Vision Language Model.

model_name

The name of the model.

Type:

str

model_type

The type of the model.

Type:

VlmArchType | None

vm_cfg

The settings of the vision model.

Type:

VisionModelConfig | None

mm_cfg

The settings of the multi-modal connection.

Type:

MMConnectionConfig | None

lm_cfg

The settings of the language model.

Type:

LanguageModelConfig

pipeline_cfg

The settings of the application pipeline.

Type:

PipelineConfig

model_name: str = ''
model_type: VlmArchType | None = None
vm_cfg: VisionModelConfig | None = None
mm_cfg: MMConnectionConfig | None = None
lm_cfg: LanguageModelConfig
pipeline_cfg: PipelineConfig
static load(vlm_cfg: dict) → VlmConfig
set_default_config(dtype: LlmDataType, vm_arch: VisionArchType | None, lm_arch: LlmArchType, gen: LlmArchVersion, b_size: str)
set_tokenizer_path(tokenizer_path: pathlib.Path)
static from_hf_config(model_path: pathlib.Path, model_cfg: dict) → VlmConfig

Generate SiMa’s configuration for a VLM from a HuggingFace config dict and MLA constraints.

Parameters:
  • model_path – The path of the source model.

  • model_cfg – The config dict of the source model.

Returns:

VlmConfig for the model.
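As a sketch of the input `from_hf_config` expects, the snippet below parses a HuggingFace-style config dict with the standard library. The field names shown are common HF config keys, not a documented SiMa schema, and the commented call at the end only restates the signature documented above.

```python
import json

# Minimal HuggingFace-style config, as found in a model's ``config.json``.
# The keys shown are typical HF fields (an assumption, not a SiMa schema).
config_json = """
{
  "model_type": "paligemma",
  "vision_config": {"hidden_size": 1152},
  "text_config": {"hidden_size": 2048}
}
"""
model_cfg = json.loads(config_json)

# With sima_utils installed, the documented entry point would then be:
#   from sima_utils.transformer import VlmConfig
#   vlm_cfg = VlmConfig.from_hf_config(model_path, model_cfg)
print(model_cfg["model_type"])  # paligemma
```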

property is_multimodal
update_special_tokens(cfg: dict)
update_vision_model_params(cfg: dict)
update_mm_connection_params(cfg: dict)
update_language_model_params(cfg: dict)
config_pipeline(system_prompt: str | None, max_num_tokens: int, tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, estimated_max_num_query_tokens: int = 100)
class sima_utils.transformer.VlmHelper(vlm_cfg: VlmConfig, system_prompt: str | None = None)

VLM helper class with processors.

tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer
prompt_formatter: sima_utils.transformer.prompt_template.PromptFormatter
image_preprocessor: sima_utils.transformer.vision_preprocessor.ImageProcessor | None
preprocess(query: str, image: pathlib.Path | str | numpy.ndarray | None) → tuple[str, numpy.ndarray, numpy.ndarray | None]

Preprocess the input query and the image.

Parameters:
  • query – Input query string.

  • image – Path to the image, or the image already loaded as a numpy array. Set to None if there is no image.

Returns:

Tuple of the formatted prompt, the tokenized input query, and the preprocessed image (None if no image was given).

postprocess(output_tokens: numpy.ndarray | list[int]) → str

Decode the model’s output tokens into a response string.
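The preprocess/postprocess pair suggests the following round trip. This is a sketch built only from the signatures documented above: the helper instance and the inference step in the middle are placeholders, not SiMa APIs.

```python
def run_vlm(helper, query: str, image) -> str:
    """Round-trip a query through a VlmHelper-like object.

    The ``preprocess``/``postprocess`` signatures follow this page;
    the inference step between them is a placeholder (assumption).
    """
    prompt, query_tokens, image_array = helper.preprocess(query, image)
    # A real pipeline would run the model on ``query_tokens`` (and
    # ``image_array`` when present); stubbed here as an empty token list.
    output_tokens: list[int] = []
    return helper.postprocess(output_tokens)
```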