GenAI Model Compilation
Introduction
The GenAI Model Compilation feature streamlines the process of compiling GenAI models.
For a select set of Llama, LLaVA, Gemma, and PaliGemma models from Hugging Face, the SDK automatically generates all required .elf files along with the Python orchestration script, enabling direct execution on the SiMa.ai Modalix platform.
SiMa.ai has precompiled several popular LLMs and published them on Hugging Face. Developers can download these models using the following commands and explore them in the LLiMa demo application.
modalix:~$ cd /media/nvme && mkdir llima && cd llima
modalix:~$ sima-cli install -v 1.7.0 samples/llima -t select
Wait until the installation completes, then run:
modalix:~$ cd simaai-genai-demo && ./run.sh
This command prompts the developer to select and download a specific precompiled model for evaluating the SiMa.ai Modalix platform. To compile and deploy a custom model on the Modalix platform instead, continue reading.
Supported Models
The following table shows the supported model architectures and their capabilities:
| Model Architecture | Type | Supported Versions |
|---|---|---|
| LLaVA | Multimodal (Vision + Language) | 1, 2 |
| PaliGemma | Multimodal (Vision + Language) | 1, 2 |
| Llama | Language Only | 2, 3 |
| Gemma | Language Only | 1, 2, 3 |
Limitations
| Limitation Type | Description |
|---|---|
| Model Configuration | Only default configurations are supported |
| Model Parameters | Only models with fewer than 10B parameters are supported |
| Model Files | Models must be downloaded from Hugging Face and contain: |
| Gemma 3 VLM | Supported as language-only models (vision capabilities disabled) |
| Llama 3.2 Vision | Vision models are not supported |
System Requirements
The Palette SDK must be installed on a machine that meets the following requirements.
| Parameter | Description |
|---|---|
| Operating System | Ubuntu 22.04 LTS |
| Memory | 128 GB or more recommended |
| Storage | 1 TB available space |
Note
With a 128 GB machine, compilation can take several hours to complete, depending on the type of model. 64 GB may be sufficient for models without vision capabilities.
Prerequisites
- Ensure that the latest sima-cli version is installed in the Palette SDK.
- Have a valid Developer Portal account to download assets from docs.sima.ai.
- Have a valid Hugging Face account to download open-source models. Some models, such as google/paligemma, require accepting a license agreement on Hugging Face. Make sure to review and accept the license before attempting to download these models.
- Authorize the CLI to access Hugging Face using a user access token and huggingface-cli, as shown below. Note that installing sima-cli automatically installs huggingface-cli.
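For example, you can authenticate inside the Palette SDK container with an access token generated in your Hugging Face account settings. The command below is interactive and prompts for the token; alternatively, the token can be passed with the --token flag:

sima-user@docker-image-id:~$ huggingface-cli login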
Sample Code
Download the sample and the google/paligemma model with the following commands in the Palette SDK:
sima-user@docker-image-id:~$ cd /home/docker/sima-cli && mkdir genai && cd genai
sima-user@docker-image-id:~$ sima-cli install -v 1.7.0 samples/vlm-codegen
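If the model is not already present in the local Hugging Face cache, it can be fetched with huggingface-cli. The repository ID below is illustrative; substitute the exact PaliGemma variant you intend to compile:

sima-user@docker-image-id:~$ huggingface-cli download google/paligemma-3b-pt-224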
The sima_utils.transformer.model package provides everything needed to take open-source models from Hugging Face and run them efficiently on the SiMa.ai Modalix platform.
The sample script tut_auto_llm.py demonstrates a complete workflow for compiling and evaluating a Vision Language Model. It covers:

- Loading a Hugging Face model with from_hf_cache().
- Generating deployment artifacts using gen_files() with both default and custom precision settings (FileGenPrecision).
- Running inference with evaluate() across multiple backends (Hugging Face, ONNX, or SDK).
This workflow allows developers to go from a cached Hugging Face model to SiMa-ready binaries and validate correctness before deploying to the DevKit, as sketched below.
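The following sketch shows how these pieces might fit together in a script. The names from_hf_cache(), gen_files(), evaluate(), and FileGenPrecision come from the sample description above, but the argument names, the enum member, and the model ID shown here are assumptions; consult tut_auto_llm.py for the exact calls.

```python
# Illustrative sketch of the tut_auto_llm.py workflow. Argument names, the
# FileGenPrecision member, and the model ID are assumptions, not the exact
# sample code; see tut_auto_llm.py for the real signatures.
from sima_utils.transformer import model

# Load a model that is already present in the local Hugging Face cache.
lm = model.from_hf_cache("google/paligemma-3b-pt-224")  # model ID is illustrative

# Generate the .elf binaries and the Python orchestration script
# using the default precision settings.
lm.gen_files()

# Optionally regenerate the artifacts with a custom precision setting.
# lm.gen_files(precision=model.FileGenPrecision.INT8)  # hypothetical member

# Run inference on a reference backend (Hugging Face, ONNX, or SDK) to
# validate correctness before deploying to the DevKit.
lm.evaluate(backend="hf", prompt="Describe this image.")  # args are hypothetical
```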
After a successful compilation, the console will display output confirming that the model artifacts have been generated.
sima-user@docker-image-id:/home/docker/sima-ai/genai$ python tut_auto_llm.py
Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1
Running Calibration ...DONE
Running quantization ...DONE