GenAI Model Compilation
Introduction
The GenAI Model Compilation feature streamlines the process of compiling GenAI models.
For a select set of Llama, LLaVA, Gemma, and PaliGemma models from Hugging Face, the SDK automatically generates all required .elf files along with the Python orchestration script, enabling direct execution on the SiMa.ai Modalix platform.
SiMa.ai has precompiled several popular LLMs and published them on Hugging Face. Developers can download these models using the following commands and explore them in the LLiMa demo application.
modalix:~$ cd /media/nvme && mkdir llima && cd llima
modalix:~$ sima-cli install -v 1.7.0 samples/llima -t select
Wait until the installation completes, then run:
modalix:~$ cd simaai-genai-demo && ./run.sh
This command prompts the developer to select and download a specific precompiled model for evaluating the SiMa.ai Modalix platform. To compile and deploy a custom model on the Modalix platform instead, continue reading.
Supported Models
The following table shows the supported model architectures and their capabilities:
| Model Architecture | Type | Supported Versions |
|---|---|---|
| LLaVA | Multimodal (Vision + Language) | 1, 2 |
| PaliGemma | Multimodal (Vision + Language) | 1, 2 |
| Llama | Language Only | 2, 3 |
| Gemma | Language Only | 1, 2, 3 |
Limitations
| Limitation Type | Description |
|---|---|
| Model Configuration | Only default configurations are supported |
| Model Parameters | Only models with fewer than 10B parameters are supported |
| Model Files | Models must be downloaded from Hugging Face and contain: |
| Gemma 3 VLM | Supported as language-only models (vision capabilities disabled) |
| Llama 3.2 Vision | Vision models are not supported |
System Requirements
The Palette SDK must be installed on a machine that meets the following requirements.
| Parameter | Description |
|---|---|
| Operating System | Ubuntu 22.04 LTS |
| Memory | 128 GB or more recommended |
| Storage | 1 TB available space |
Note
With a 128 GB machine, compilation can take several hours to complete, depending on the type of model. 64 GB may be sufficient for models without vision capabilities.
Prerequisites
- Ensure that the latest sima-cli version is installed in the Palette SDK.
- Have a valid Developer Portal account to download assets from docs.sima.ai.
- Have a valid Hugging Face account to download open-source models. Some models, such as google/paligemma, require accepting a license agreement on Hugging Face. Make sure to review and accept the license before attempting to download these models.
- Authorize the CLI to access Hugging Face using a user access token and huggingface-cli, as shown below. Note that installing sima-cli automatically installs huggingface-cli.
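For example, you can authenticate inside the Palette SDK container with an access token generated in your Hugging Face account settings. The command below is interactive and prompts for the token; alternatively, the token can be passed with the --token flag:

sima-user@docker-image-id:~$ huggingface-cli login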
Sample Code
Download the sample and the google/paligemma model with the following commands in the Palette SDK:
sima-user@docker-image-id:~$ cd /home/docker/sima-cli && mkdir genai && cd genai
sima-user@docker-image-id:~$ sima-cli install -v 1.7.0 samples/vlm-codegen
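If the model is not already present in the local Hugging Face cache, it can be fetched with huggingface-cli. The repository ID below is illustrative; substitute the exact PaliGemma variant you intend to compile:

sima-user@docker-image-id:~$ huggingface-cli download google/paligemma-3b-pt-224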
The sima_utils.transformer.model package provides everything needed to take open-source models from Hugging Face and run them efficiently on the SiMa.ai Modalix platform.
The sample script tut_auto_llm.py demonstrates a complete workflow for compiling and evaluating a Vision Language Model. It covers:

- Loading a Hugging Face model with from_hf_cache().
- Generating deployment artifacts using gen_files() with both default and custom precision settings (FileGenPrecision).
- Running inference with evaluate() across multiple backends (Hugging Face, ONNX, or SDK).
This workflow allows developers to go from a cached Hugging Face model to SiMa-ready binaries and validate correctness before deploying to the DevKit, as sketched below.
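The following sketch shows how these pieces might fit together in a script. The names from_hf_cache(), gen_files(), evaluate(), and FileGenPrecision come from the sample description above, but the argument names, the enum member, and the model ID shown here are assumptions; consult tut_auto_llm.py for the exact calls.

```python
# Illustrative sketch of the tut_auto_llm.py workflow. Argument names, the
# FileGenPrecision member, and the model ID are assumptions, not the exact
# sample code; see tut_auto_llm.py for the real signatures.
from sima_utils.transformer import model

# Load a model that is already present in the local Hugging Face cache.
lm = model.from_hf_cache("google/paligemma-3b-pt-224")  # model ID is illustrative

# Generate the .elf binaries and the Python orchestration script
# using the default precision settings.
lm.gen_files()

# Optionally regenerate the artifacts with a custom precision setting.
# lm.gen_files(precision=model.FileGenPrecision.INT8)  # hypothetical member

# Run inference on a reference backend (Hugging Face, ONNX, or SDK) to
# validate correctness before deploying to the DevKit.
lm.evaluate(backend="hf", prompt="Describe this image.")  # args are hypothetical
```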
After a successful compilation, the console will display output confirming that the model artifacts have been generated.
sima-user@docker-image-id:/home/docker/sima-ai/genai$ python tut_auto_llm.py
Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1
Running Calibration ...DONE
Running quantization ...DONE