Introduction to LLiMa

Overview

The GenAI Model Compilation feature streamlines compiling GenAI models from three input formats: Hugging Face safetensors, GGUF files, and pre-quantized compressed-tensor models (e.g., GPTQ/AWQ models created with llm-compressor). For a wide range of model families from Hugging Face, such as Llama, Gemma, Phi, Qwen, Mistral, and LFM, the SDK automatically generates all required binary/ELF files along with the Python orchestration script, enabling direct execution on the SiMa.ai Modalix platform.

SiMa has precompiled several popular models and published them on Hugging Face. LLiMa is not installed by default. To get started, create the LLiMa directory structure and install it globally on your Modalix device:

modalix:~$ cd /media/nvme && mkdir llima && cd llima
modalix:/media/nvme/llima$ sima-cli install -v 2.1.0 tools/llima -t full

This creates the required directories (including /media/nvme/llima/models for model storage) and makes the llima CLI available system-wide.
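
To sanity-check the installation, you can confirm the binary is on your PATH with a standard shell lookup and then invoke one of the documented subcommands; on a fresh install, llima list should simply report no models:

modalix:~$ which llima
modalix:~$ llima list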

Model Manager

LLiMa includes a model manager accessible via the llima CLI. It lets you search, download, and run precompiled models directly from the command line. Models are stored under /media/nvme/llima/models by default; this path can be overridden by setting the LLIMA_MODELS_PATH environment variable.
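
For example, to keep downloaded models on a different volume for the current shell session (the path below is purely illustrative):

modalix:~$ export LLIMA_MODELS_PATH=/media/usb/llima-models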

Browse available models:

modalix:~$ llima search
modalix:~$ llima search qwen

Download a model by name (without the simaai/ org prefix):

modalix:~$ llima pull Qwen3-VL-8B-Instruct-a16w4

List and remove locally installed models:

modalix:~$ llima list
modalix:~$ llima rm Qwen3-VL-8B-Instruct-a16w4

Run a model directly in CLI or web mode:

modalix:~$ llima run Qwen3-VL-8B-Instruct-a16w4
modalix:~$ llima run Qwen3-VL-8B-Instruct-a16w4 --mode web

More details on the full set of llima run options can be found in the Runtime & Orchestration section.
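
The documented subcommands also compose naturally in a shell script. The sketch below is hypothetical and assumes llima list prints installed model names one per line; it pulls a model only if it is not already installed, then launches it in web mode:

#!/bin/sh
# Hypothetical helper: install-on-demand, then run in web mode.
# Uses only the llima subcommands documented above.
MODEL="${1:-Qwen3-VL-8B-Instruct-a16w4}"
if ! llima list | grep -q "$MODEL"; then
    llima pull "$MODEL"
fi
llima run "$MODEL" --mode web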

GenAI Demo

For the full GenAI demo experience — including the web frontend and speech-to-text/text-to-speech support — use the run.sh script instead:

modalix:/media/nvme/llima$ ./run.sh

This prompts you to select a precompiled model and launches the complete demo application. More information can be found in the LLiMa demo application section.

Supported Models

The following table shows the supported model architectures and their capabilities:

Model Architecture    Type    Supported Sizes
Llama 2               LLM     7b
Llama 3.1             LLM     8b
Llama 3.2             LLM     1b, 3b
Gemma 1               LLM     2b, 7b
Gemma 2               LLM     2b, 9b
Gemma 3               LLM     1b, 4b
Phi 3.5 mini          LLM     3.8b
Qwen 2.5              LLM     0.5b, 1.5b, 3b, 7b
Qwen 3                LLM     0.6b, 1.7b, 4b, 8b
Mistral 1             LLM     7b
LFM 2                 LLM     350m, 1.2b, 2.6b
Llava 1.5             VLM     7b
PaliGemma             VLM     3b
Gemma 3               VLM     4b
Qwen 2.5 VL           VLM     3b, 7b
Qwen 3 VL             VLM     2b, 4b, 8b
LFM 2                 VLM     450m, 1.6b, 3b

Limitations

Model Architecture: Only models based on the architectures listed above are supported.

Model Parameters: Only models with fewer than 10B parameters are supported.

HF Models: Models must be downloaded from Hugging Face and contain config.json, tokenizer.json, tokenizer_config.json, generation_config.json, and weights in safetensors format.

GGUF Models: The GGUF format is supported for LLMs only; VLMs must be compiled from the Hugging Face safetensors format. Note that performance may decrease compared to Hugging Face safetensors compilation.

Compressed Tensor Models: Pre-quantized safetensors models (GPTQ/AWQ) created with llm-compressor are supported for LLMs only, and the model must use symmetric quantization. These models can achieve better accuracy than standard INT4 quantization while maintaining high performance (see the sketch after this list).

Gemma 3 VLM: Supported with a modified SigLIP 448 vision encoder.

Llama 3.2 Vision: The vision model variants are not supported.
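
For reference, producing a compatible symmetric GPTQ model with llm-compressor might look like the minimal Python sketch below. It follows the upstream llm-compressor quickstart; the model name, calibration dataset, and sample counts are illustrative assumptions, and you should verify that the chosen scheme is symmetric, per the limitation above.

# Minimal llm-compressor sketch (illustrative, not SiMa-specific):
# one-shot GPTQ quantization to a 4-bit weight / 16-bit activation scheme.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# W4A16 applies symmetric 4-bit weight quantization by default in
# compressed-tensors; confirm this against the symmetric-quantization
# requirement noted above.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="meta-llama/Llama-3.2-1B-Instruct",   # illustrative base model
    dataset="open_platypus",                    # illustrative calibration set
    recipe=recipe,
    output_dir="Llama-3.2-1B-Instruct-W4A16",   # pre-quantized safetensors output
    max_seq_length=2048,
    num_calibration_samples=512,
)

The output directory then serves as the pre-quantized safetensors input to the compilation flow described in the Overview.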