sima_utils.transformer.model.language_cache_model
Classes
- `LanguageCacheModel`: Base implementation for the cache model of the language model.
Module Contents
- class sima_utils.transformer.model.language_cache_model.LanguageCacheModel
Base implementation for the cache model of the language model.
With support for Sliding Window Attention, a cache model comes in one of two flavors depending on the layer index: a global cache or a local (sliding-window) cache. Because the cache is managed outside the cache model, the difference is reflected only in the input shapes of the K and V tensors.
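As a rough sketch of the global-vs-local distinction described above, the helper below (a hypothetical illustration, not part of this module's API) shows how the K/V cache length a layer sees could differ: global-attention layers keep the full context, while sliding-window layers only ever need the most recent window of entries.

```python
def kv_cache_len(uses_sliding_window: bool,
                 context_len: int,
                 window_len: int) -> int:
    """Cache length exposed to one layer's K and V inputs.

    Global-attention layers attend over the whole context, so their
    cache holds up to `context_len` entries; sliding-window layers
    only need the most recent `window_len` entries. The names and
    parameters here are illustrative assumptions.
    """
    if uses_sliding_window:
        return min(window_len, context_len)
    return context_len

# A sliding-window layer with a 4096-token window caps its cache,
# while a global layer tracks the full 8192-token context.
local_len = kv_cache_len(True, context_len=8192, window_len=4096)   # 4096
global_len = kv_cache_len(False, context_len=8192, window_len=4096) # 8192
```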
- num_tokens
Number of tokens. Set to a value greater than 1 to consume multiple input tokens in a single model invocation.
- token_idx
Token index.
- logit_softcapping
Attention logit soft capping, as used in Gemma 2.
- num_tokens: int
- token_idx: int
- logit_softcapping: float | None
- gen_onnx_files()