GenAI Model Compilation
=======================

Introduction
------------

The GenAI Model Compilation feature streamlines the process of compiling GenAI models. For a select set of ``Llama``, ``Llava``, ``Gemma``, and ``PaliGemma`` models from Hugging Face, the SDK automatically generates all required ``.elf`` files along with the Python orchestration script, enabling direct execution on the SiMa.ai Modalix platform.

SiMa.ai has precompiled several popular LLM models and published them on `Hugging Face `_. Developers can download these models using the following commands and explore them in the `LLiMa demo application <../overview/hello_sima/run_demos.html#llm-demo>`_.

.. code-block:: console

   modalix:~$ cd /media/nvme && mkdir llima && cd llima
   modalix:~$ sima-cli install -v 1.7.0 samples/llima -t select

Wait until the installation completes, then run:

.. code-block:: console

   modalix:~$ cd simaai-genai-demo && ./run.sh

This command prompts the developer to select and download a specific precompiled model for evaluating the SiMa.ai Modalix platform. To compile and deploy a custom model on the Modalix platform instead, continue reading.

Supported Models
----------------

The following table shows the supported model architectures and their capabilities:

.. list-table::
   :widths: 40 30 30
   :header-rows: 1

   * - Model Architecture
     - Type
     - Supported Versions
   * - LLAVA
     - Multimodal (Vision + Language)
     - 1, 2
   * - PaliGEMMA
     - Multimodal (Vision + Language)
     - 1, 2
   * - LLAMA
     - Language Only
     - 2, 3
   * - GEMMA
     - Language Only
     - 1, 2, 3

Limitations
-----------

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :class: wrapped-table

   * - Limitation Type
     - Description
   * - Model Configuration
     - Only default configurations are supported.
   * - Model Parameters
     - Only models with a parameter count of less than 10B are supported.
   * - Model Files
     - Models must be downloaded from Hugging Face and contain ``config.json``, ``tokenizer.model``, and weights in safetensors format.
   * - Gemma3 VLM
     - Supported as language-only models (vision capabilities disabled).
   * - LLAMA 3.2 Vision
     - Vision models are not supported.

System Requirements
-------------------

The Palette SDK must be installed on a machine that meets the following requirements.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Parameter
     - Description
   * - **Operating System**
     - Ubuntu 22.04 LTS
   * - **Memory**
     - 128GB or more is recommended.
   * - **Storage**
     - 1TB of available space

.. note::
   On a 128GB machine, compilation can take several hours to complete, depending on the type of model. 64GB may work for models that do not have vision capabilities.

Prerequisites
-------------

- Ensure that the latest :ref:`sima_cli` `version `_ is installed in the Palette SDK.
- Have a valid Developer Portal account to download assets from `docs.sima.ai `_.
- Have a valid Hugging Face account to download open-source models.
- Some models, such as ``google/paligemma``, require accepting a license agreement on Hugging Face. Make sure to review and accept the license before attempting to download these models.
- Authorize the CLI to access Hugging Face using a `user access token `_ and ``huggingface-cli``. Note that installing ``sima-cli`` automatically installs ``huggingface-cli``.

Sample Code
-----------

Download the sample and the ``google/paligemma`` model with the following commands in the Palette SDK:

.. code-block:: console

   sima-user@docker-image-id:~$ cd /home/docker/sima-cli && mkdir genai && cd genai
   sima-user@docker-image-id:~$ sima-cli install -v 1.7.0 samples/vlm-codegen

The :py:mod:`sima_utils.transformer.model` package provides everything needed to take open-source models (from Hugging Face) and run them efficiently on the SiMa.ai Modalix platform.
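The Limitations table above requires that a downloaded model directory contain ``config.json``, ``tokenizer.model``, and weights in safetensors format. Since compilation can take hours, it can be worth confirming this locally before starting. The snippet below is an illustrative pre-flight check written against the standard Hugging Face snapshot layout; it uses only the Python standard library and is not part of the Palette SDK.

.. code-block:: python

   from pathlib import Path

   # Files the SDK expects in a downloaded Hugging Face model directory.
   REQUIRED_FILES = ("config.json", "tokenizer.model")

   def preflight_check(model_dir: str) -> list[str]:
       """Return a list of problems found in a local model directory.

       An empty list means the basic file requirements are met.
       Illustrative helper only -- not part of the Palette SDK.
       """
       root = Path(model_dir)
       problems = []
       for name in REQUIRED_FILES:
           if not (root / name).is_file():
               problems.append(f"missing {name}")
       # Weights must be in safetensors format (single file or sharded).
       if not list(root.glob("*.safetensors")):
           problems.append("no .safetensors weight files found")
       return problems

For example, ``preflight_check("/path/to/paligemma")`` returns an empty list when the directory holds ``config.json``, ``tokenizer.model``, and at least one ``.safetensors`` file, and otherwise names what is missing.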
The sample script `tut_auto_llm.py `_ demonstrates a complete workflow for compiling and evaluating a :py:meth:`Vision Language Model `. It covers:

- Loading a Hugging Face model with :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.from_hf_cache`.
- Generating deployment artifacts using :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.gen_files` with both default and custom precision settings (:py:class:`~sima_utils.transformer.model.FileGenPrecision`).
- Running inference with :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.evaluate` across multiple backends (Hugging Face, ONNX, or SDK).

This workflow allows developers to go from a cached Hugging Face model to SiMa-ready binaries and validate correctness before deploying to the DevKit. After a successful compilation, the console displays output confirming that the model artifacts have been generated.

.. code-block:: console
   :class: code-narrow

   sima-user@docker-image-id:/home/docker/sima-ai/genai$ python tut_auto_llm.py
   Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1
   Running Calibration ...DONE
   Running quantization ...DONE
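The three bulleted steps above can be sketched as a single function. The class and method names come from this page, but the argument names and defaults shown here are assumptions and may not match the actual SDK signatures; treat this as an outline of the ``tut_auto_llm.py`` flow, not a drop-in replacement for the sample script.

.. code-block:: python

   try:
       # Available inside the Palette SDK container only.
       from sima_utils.transformer.model import VisionLanguageModel
   except ImportError:
       VisionLanguageModel = None

   def compile_and_evaluate(hf_model_name: str) -> None:
       """Outline of the tut_auto_llm.py workflow (argument names assumed)."""
       if VisionLanguageModel is None:
           raise RuntimeError("sima_utils is only available inside the Palette SDK")
       # 1. Load a model previously downloaded into the Hugging Face cache.
       model = VisionLanguageModel.from_hf_cache(hf_model_name)
       # 2. Generate the .elf artifacts and the Python orchestration script;
       #    a FileGenPrecision value can be passed for custom precision.
       model.gen_files()
       # 3. Sanity-check the compiled model against a reference backend.
       model.evaluate()

Running the function outside the Palette SDK raises a ``RuntimeError``; inside the SDK, consult ``tut_auto_llm.py`` itself for the exact arguments each call takes.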