GenAI Model Compilation
=======================

Introduction
------------

The GenAI Model Compilation feature streamlines the process of compiling GenAI models. For a select set of ``Llama``, ``Llava``, ``Gemma``, and ``PaliGemma`` models from Hugging Face, the SDK automatically generates all required ``.elf`` files along with the Python orchestration script, enabling direct execution on the SiMa.ai Modalix platform.

SiMa.ai has precompiled several popular LLM models and published them on `Hugging Face `_. Developers can download these models using the following commands and explore them in the `LLiMa demo application <../overview/hello_sima/run_demos.html#llm-demo>`_.

.. code-block:: console

   modalix:~$ cd /media/nvme && mkdir llima && cd llima
   modalix:~$ sima-cli install -v 1.7.0 samples/llima -t select

Wait until the installation completes, then run:

.. code-block:: console

   modalix:~$ cd simaai-genai-demo && ./run.sh

This command prompts you to select and download a specific precompiled model for evaluating the SiMa.ai Modalix platform. To compile and deploy a custom model on the Modalix platform, continue reading.

Supported Models
----------------

The following table shows the supported model architectures and their capabilities:

.. list-table::
   :widths: 40 30 30
   :header-rows: 1

   * - Model Architecture
     - Type
     - Supported Versions
   * - LLAVA
     - Multimodal (Vision + Language)
     - 1, 2
   * - PaliGEMMA
     - Multimodal (Vision + Language)
     - 1, 2
   * - LLAMA
     - Language Only
     - 2, 3
   * - GEMMA
     - Language Only
     - 1, 2, 3

Limitations
-----------

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :class: wrapped-table

   * - Limitation Type
     - Description
   * - Model Configuration
     - Only default configurations are supported.
   * - Model Parameters
     - Only models with fewer than 10B parameters are supported.
   * - Model Files
     - Models must be downloaded from Hugging Face and contain ``config.json``, ``tokenizer.model``, and weights in safetensors format.
   * - Gemma3 VLM
     - Supported as language-only models (vision capabilities disabled).
   * - LLAMA 3.2 Vision
     - Vision models are not supported.

System Requirements
-------------------

The Palette SDK must be installed on a machine that meets the following requirements.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Parameter
     - Description
   * - **Operating System**
     - Ubuntu 22.04 LTS
   * - **Memory**
     - 128GB or more is recommended.
   * - **Storage**
     - 1TB available space

.. note::

   On a 128GB machine, compilation can take several hours to complete, depending on the model type. 64GB may be sufficient for models that do not have vision capabilities.

Prerequisites
-------------

- Ensure that the latest :ref:`sima_cli` `version `_ is installed in the Palette SDK.
- Have a valid Developer Portal account to download assets from `docs.sima.ai `_.
- Have a valid Hugging Face account to download open-source models.
- Some models, such as ``google/paligemma``, require accepting a license agreement on Hugging Face. Make sure to review and accept the license before attempting to download these models.
- Authorize the CLI to access Hugging Face using a `user access token `_ and ``huggingface-cli``. Note that installing ``sima-cli`` automatically installs ``huggingface-cli``.

Sample Code
-----------

Download the sample and the ``google/paligemma`` model with the following commands in the Palette SDK:

.. code-block:: console

   sima-user@docker-image-id:~$ cd /home/docker/sima-cli && mkdir genai && cd genai
   sima-user@docker-image-id:~$ sima-cli install -v 1.7.0 samples/vlm-codegen

The :py:mod:`sima_utils.transformer.model` package provides everything needed to take open-source models (from Hugging Face) and run them efficiently on the SiMa.ai Modalix platform. The sample script `tut_auto_llm.py `_ demonstrates a complete workflow for compiling and evaluating a :py:class:`Vision Language Model <sima_utils.transformer.model.VisionLanguageModel>`. It covers:

- Loading a Hugging Face model with :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.from_hf_cache`.
- Generating deployment artifacts using :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.gen_files` with both default and custom precision settings (:py:class:`~sima_utils.transformer.model.FileGenPrecision`).
- Running inference with :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.evaluate` across multiple backends (Hugging Face, ONNX, or SDK).

This workflow allows developers to go from a cached Hugging Face model to SiMa-ready binaries and validate correctness before deploying to the DevKit.
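In code, the workflow reduces to a handful of calls. The following is a minimal sketch, not a verbatim excerpt of ``tut_auto_llm.py``: the class and method names come from the list above, but the model ID, the argument names, and the ``FileGenPrecision`` member shown here are illustrative assumptions; refer to the sample script for the exact signatures.

.. code-block:: python

   # Minimal sketch of the tut_auto_llm.py workflow. Method names follow the
   # list above; the model ID, argument names, and the FileGenPrecision member
   # are illustrative assumptions -- see tut_auto_llm.py for exact signatures.
   from sima_utils.transformer.model import FileGenPrecision, VisionLanguageModel

   # 1. Load a model that has already been downloaded into the local
   #    Hugging Face cache.
   vlm = VisionLanguageModel.from_hf_cache("google/paligemma")

   # 2. Generate the deployment artifacts (.elf files plus the Python
   #    orchestration script). Passing no arguments uses the default
   #    precision; a custom precision can be selected explicitly.
   vlm.gen_files(precision=FileGenPrecision.INT8)  # hypothetical member name

   # 3. Validate output correctness against a reference backend before
   #    deploying to the DevKit.
   vlm.evaluate(backend="onnx")  # hypothetical argument name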
After a successful compilation, the console displays output confirming that the model artifacts have been generated.

.. code-block:: console
   :class: code-narrow

   sima-user@docker-image-id:/home/docker/sima-ai/genai$ python tut_auto_llm.py
   Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete.
   1/1 Running Calibration ...DONE
   Running quantization ...DONE

The compiled model is located in the ``sima_files`` folder. Run the following commands to extract the files and deploy a model folder compatible with the LLiMa demo to the Modalix device:

.. code-block:: console
   :class: code-narrow

   sima-user@docker-image-id:/home/docker/sima-ai/genai$ cd sima_files && python3 ../extract_elf_files.py
   sima-user@docker-image-id:/home/docker/sima-ai/genai/sima_files$ ssh sima@<modalix_ip> "mkdir -p /media/nvme/llima/mymodel"
   sima-user@docker-image-id:/home/docker/sima-ai/genai/sima_files$ scp -r elf_files sima@<modalix_ip>:/media/nvme/llima/mymodel/
   sima-user@docker-image-id:/home/docker/sima-ai/genai/sima_files$ scp -r devkit sima@<modalix_ip>:/media/nvme/llima/mymodel/

Replace ``<modalix_ip>`` with the actual IP address of your Modalix device.

To run the LLiMa demo with the compiled model, log in to the Modalix shell and execute the ``run.sh`` script as you would for the standard LLiMa demo. When prompted to select a model, choose ``mymodel`` to proceed.

.. code-block:: console
   :class: code-narrow

   modalix:~$ cd /media/nvme/llima/simaai-genai-demo && ./run.sh

For detailed instructions on the LLiMa demo, refer to the `VLM demo <../overview/hello_sima/run_demos.html#vlm-demo>`_. All Modalix DevKits come with the LLiMa demo preloaded from the factory. If your Modalix board does not have the LLiMa demo installed (for example, if the NVMe was formatted or replaced), follow that article to complete the installation.
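If ``mymodel`` does not appear in the model selection, verify that the artifacts landed in the expected location on the device. Assuming the directory layout used in the ``scp`` commands above, a quick check looks like this:

.. code-block:: console

   modalix:~$ ls /media/nvme/llima/mymodel
   devkit  elf_files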