sima_utils.transformer.prompt_template
Attributes

Classes

- PromptFormatter: Prompt formatter for VLM.
- Llama2PromptFormatter: LLAMA 2 prompt formatter.
- Llama3PromptFormatter: LLAMA 3 prompt formatter.
- LlavaPromptFormatter: LLAVA prompt formatter.
- PaliGemmaPromptFormatter: PaliGemma prompt formatter.
- GemmaPromptFormatter: GEMMA prompt formatter.
- Phi3PromptFormatter: PHI 3 prompt formatter.

Functions

- multimodal_concat: Combine text and vision embedding tensors.
Module Contents
- sima_utils.transformer.prompt_template.DEFAULT_IMAGE_PLACEHOLDER_TOKEN_ID = -200
- sima_utils.transformer.prompt_template.VLM_PROMPT_TEMPLATE
- sima_utils.transformer.prompt_template.multimodal_concat(text_token_ids: numpy.ndarray, vision_proj: numpy.ndarray, embed_weight: numpy.ndarray) → numpy.ndarray
Combine text and vision embedding tensors.
- Parameters:
text_token_ids (np.ndarray) – The text token ids with image placeholder.
vision_proj (np.ndarray) – The vision projection tensor.
embed_weight (np.ndarray) – The embedding weight matrix.
- Returns:
A tensor with combined text and vision embeddings.
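A minimal numpy sketch of what this combination could look like. The function name `multimodal_concat_sketch` and the unbatched single-sequence layout are assumptions for illustration; the real implementation may handle batching differently:

```python
import numpy as np

DEFAULT_IMAGE_PLACEHOLDER_TOKEN_ID = -200  # matches the module constant

def multimodal_concat_sketch(text_token_ids: np.ndarray,
                             vision_proj: np.ndarray,
                             embed_weight: np.ndarray) -> np.ndarray:
    """Look up text embeddings and splice the vision projection in
    at the image placeholder position (single unbatched sequence)."""
    pieces = []
    for tok in text_token_ids:
        if tok == DEFAULT_IMAGE_PLACEHOLDER_TOKEN_ID:
            pieces.append(vision_proj)                 # all vision token rows
        else:
            pieces.append(embed_weight[tok][None, :])  # one text embedding row
    return np.concatenate(pieces, axis=0)

# Toy shapes: vocab=10, hidden=4, 3 vision tokens.
embed_weight = np.arange(40, dtype=np.float32).reshape(10, 4)
vision_proj = np.ones((3, 4), dtype=np.float32)
ids = np.array([1, DEFAULT_IMAGE_PLACEHOLDER_TOKEN_ID, 2])
combined = multimodal_concat_sketch(ids, vision_proj, embed_weight)
print(combined.shape)  # (5, 4): 2 text rows + 3 vision rows
```

The placeholder id is negative precisely so it can never collide with a real vocabulary index.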
- class sima_utils.transformer.prompt_template.PromptFormatter(vlm_arch: str, system_message: str | None = None)
Prompt formatter for VLM.
- vlm_arch: str
- system_message: str | None
- image_placeholder_id: int
- set_system_message(msg: str)
- formatted_prompt(query: str, has_image: bool = False) → list[str]
Format a query according to the prompt template.
- Parameters:
query – A text part of a user query.
has_image – Whether the prompt includes an image placeholder token.
- Returns:
The formatted query as a list of strings.
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list) → numpy.ndarray
Tokenize a formatted prompt.
- Parameters:
tokenizer (LlmTokenizer) – The tokenizer to use.
messages (list) – The list of formatted query messages.
- Returns:
An array of token ids with possible image placeholders. The first dimension is the batch dimension, i.e. the number of queries.
- class sima_utils.transformer.prompt_template.Llama2PromptFormatter(vlm_arch: str, system_message: str | None = None)
LLAMA 2 prompt formatter.
- B_INST: str = '[INST]'
- E_INST: str = '[/INST]'
- B_SYS: str = '<<SYS>>\n'
- E_SYS: str = '\n<</SYS>>\n\n'
- formatted_prompt(query: str, has_image: bool = False) → list[str]
Format a query according to the prompt template.
- Parameters:
query – A text part of a user query.
has_image – Whether the prompt includes an image placeholder token.
- Returns:
The formatted query as a list of strings.
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list[list[str]]) → numpy.ndarray
Tokenize a formatted prompt for LLAMA2.
- class sima_utils.transformer.prompt_template.Llama3PromptFormatter(vlm_arch: str, system_message: str | None = None)
LLAMA 3 prompt formatter.
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list[list[str]]) → numpy.ndarray
Tokenize a formatted prompt for LLAMA3.
- class sima_utils.transformer.prompt_template.LlavaPromptFormatter(vlm_arch: str, system_message: str | None = None)
LLAVA prompt formatter.
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list[list[str]]) → numpy.ndarray
Tokenize a formatted prompt for LLAVA.
- class sima_utils.transformer.prompt_template.PaliGemmaPromptFormatter(vlm_arch: str, system_message: str | None = None)
PaliGemma prompt formatter.
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list[list[str]]) → numpy.ndarray
Tokenize a formatted prompt for PaliGemma.
Template: <image> + <bos> + <prompt> + <\n>
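At the string level the PaliGemma layout could be sketched as follows. The number of `<image>` placeholders depends on the vision encoder's output length; `num_image_tokens=256` is an assumption for illustration:

```python
def paligemma_prompt(prompt: str, num_image_tokens: int = 256) -> str:
    """Lay out the PaliGemma input: image placeholders, <bos>, prompt, newline."""
    return "<image>" * num_image_tokens + "<bos>" + prompt + "\n"

s = paligemma_prompt("caption en", num_image_tokens=3)
print(repr(s))  # '<image><image><image><bos>caption en\n'
```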
- class sima_utils.transformer.prompt_template.GemmaPromptFormatter(vlm_arch: str, system_message: str | None = None)
GEMMA prompt formatter.
- boi_id: int
- eoi_id: int
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list[list[str]]) → numpy.ndarray
Tokenize a formatted prompt for GEMMA.
- class sima_utils.transformer.prompt_template.Phi3PromptFormatter(vlm_arch: str, system_message: str | None = None)
PHI 3 prompt formatter.
Chat format (https://huggingface.co/microsoft/Phi-3.5-mini-instruct):
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>
- B_SYS: str = '<|system|>'
- B_USER: str = '<|user|>'
- B_ASSISTANT: str = '<|assistant|>'
- END: str = '<|end|>'
- formatted_prompt(query: str, has_image: bool = False) → list[str]
Format a query according to the prompt template.
- Parameters:
query – A text part of a user query.
has_image – Whether the prompt includes an image placeholder token.
- Returns:
The formatted query as a list of strings.
- tokenize_prompt(tokenizer: sima_utils.transformer.llm_tokenizer.LlmTokenizer, messages: list[list[str]]) → numpy.ndarray
Tokenize a formatted prompt for PHI3.
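A sketch assembling a single-turn prompt from the documented PHI 3 chat tags (the helper name `phi3_chat` is an assumption; the real formatter builds the same structure from its class constants):

```python
def phi3_chat(query: str,
              system_message: str = "You are a helpful assistant.") -> str:
    """Assemble a single-turn PHI 3 chat prompt: system, user, then the
    assistant tag that cues the model to generate."""
    return (
        f"<|system|>\n{system_message}<|end|>\n"
        f"<|user|>\n{query}<|end|>\n"
        f"<|assistant|>\n"
    )

prompt = phi3_chat("How to explain Internet for a medieval knight?")
print(prompt)
```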
- sima_utils.transformer.prompt_template.arch