GenAI Model Compilation
=======================

Introduction
------------

The GenAI Model Compilation feature streamlines the process of compiling GenAI models. For a select set of ``Llama``, ``Llava``, ``Gemma``, and ``PaliGemma`` models from Hugging Face, the SDK automatically generates all required ``.elf`` files along with the Python orchestration script, enabling direct execution on the SiMa.ai Modalix platform.

SiMa.ai has precompiled several popular LLM models and published them on `Hugging Face `_. Developers can download these models using the following commands and explore them in the `LLiMa demo application <../overview/hello_sima/run_demos.html#llm-demo>`_.

.. code-block:: console

   modalix:~$ cd /media/nvme && mkdir llima && cd llima
   modalix:~$ sima-cli install -v 1.7.0 samples/llima -t select

Wait until the installation completes, then run:

.. code-block:: console

   modalix:~$ cd simaai-genai-demo && ./run.sh

This command prompts the developer to select and download a specific precompiled model for evaluating the SiMa.ai Modalix platform. To compile and deploy a custom model on the Modalix platform instead, continue reading.

Supported Models
----------------

The following table shows the supported model architectures and their capabilities:

.. list-table::
   :widths: 40 30 30
   :header-rows: 1

   * - Model Architecture
     - Type
     - Supported Versions
   * - LLAVA
     - Multimodal (Vision + Language)
     - 1, 2
   * - PaliGEMMA
     - Multimodal (Vision + Language)
     - 1, 2
   * - LLAMA
     - Language Only
     - 2, 3
   * - GEMMA
     - Language Only
     - 1, 2, 3

Limitations
-----------

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :class: wrapped-table

   * - Limitation Type
     - Description
   * - Model Configuration
     - Only default configurations are supported.
   * - Model Parameters
     - Only models with a parameter count of less than 10B are supported.
   * - Model Files
     - Models must be downloaded from Hugging Face and contain ``config.json``, ``tokenizer.model``, and weights in safetensors format.
   * - Gemma3 VLM
     - Supported as language-only models (vision capabilities disabled).
   * - LLAMA 3.2 Vision
     - Vision models are not supported.

System Requirements
-------------------

The Palette SDK must be installed on a machine that meets the following requirements.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Parameter
     - Description
   * - **Operating System**
     - Ubuntu 22.04 LTS
   * - **Memory**
     - 128GB or more is recommended.
   * - **Storage**
     - 1TB of available space

.. note::
   On a 128GB machine, compilation can take several hours to complete, depending on the type of model. 64GB may work for models that do not have vision capabilities.

Prerequisites
-------------

- Ensure that the latest :ref:`sima_cli` `version `_ is installed in the Palette SDK.
- Have a valid Developer Portal account to download assets from `docs.sima.ai `_.
- Have a valid Hugging Face account to download open-source models.
- Some models, such as ``google/paligemma``, require accepting a license agreement on Hugging Face. Make sure to review and accept the license before attempting to download these models.
- Authorize the CLI to access Hugging Face using a `user access token `_ and ``huggingface-cli``. Note that installing ``sima-cli`` automatically installs ``huggingface-cli``.

Sample Code
-----------

Download the sample and the ``google/paligemma`` model with the following commands in the Palette SDK:

.. code-block:: console

   sima-user@docker-image-id:~$ cd /home/docker/sima-cli && mkdir genai && cd genai
   sima-user@docker-image-id:~$ sima-cli install -v 1.7.0 samples/vlm-codegen

The :py:mod:`sima_utils.transformer.model` package provides everything needed to take open-source models (from Hugging Face) and run them efficiently on the SiMa.ai Modalix platform.
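The Limitations table above requires that a downloaded model directory contain ``config.json``, ``tokenizer.model``, and weights in safetensors format. Since compilation can take hours, it can be worth confirming this locally before starting. The snippet below is an illustrative pre-flight check written against the standard Hugging Face snapshot layout; it uses only the Python standard library and is not part of the Palette SDK.

.. code-block:: python

   from pathlib import Path

   # Files the SDK expects in a downloaded Hugging Face model directory.
   REQUIRED_FILES = ("config.json", "tokenizer.model")

   def preflight_check(model_dir: str) -> list[str]:
       """Return a list of problems found in a local model directory.

       An empty list means the basic file requirements are met.
       Illustrative helper only -- not part of the Palette SDK.
       """
       root = Path(model_dir)
       problems = []
       for name in REQUIRED_FILES:
           if not (root / name).is_file():
               problems.append(f"missing {name}")
       # Weights must be in safetensors format (single file or sharded).
       if not list(root.glob("*.safetensors")):
           problems.append("no .safetensors weight files found")
       return problems

For example, ``preflight_check("/path/to/paligemma")`` returns an empty list when the directory holds ``config.json``, ``tokenizer.model``, and at least one ``.safetensors`` file, and otherwise names what is missing.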
The sample script `tut_auto_llm.py `_ demonstrates a complete workflow for compiling and evaluating a :py:meth:`Vision Language Model `. It covers:

- Loading a Hugging Face model with :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.from_hf_cache`.
- Generating deployment artifacts using :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.gen_files` with both default and custom precision settings (:py:class:`~sima_utils.transformer.model.FileGenPrecision`).
- Running inference with :py:meth:`~sima_utils.transformer.model.VisionLanguageModel.evaluate` across multiple backends (Hugging Face, ONNX, or SDK).

This workflow allows developers to go from a cached Hugging Face model to SiMa-ready binaries and validate correctness before deploying to the DevKit. After a successful compilation, the console displays output confirming that the model artifacts have been generated.

.. code-block:: console
   :class: code-narrow

   sima-user@docker-image-id:/home/docker/sima-ai/genai$ python tut_auto_llm.py
   Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1
   Running Calibration ...DONE
   Running quantization ...DONE
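The three bulleted steps above can be sketched as a single function. The class and method names come from this page, but the argument names and defaults shown here are assumptions and may not match the actual SDK signatures; treat this as an outline of the ``tut_auto_llm.py`` flow, not a drop-in replacement for the sample script.

.. code-block:: python

   try:
       # Available inside the Palette SDK container only.
       from sima_utils.transformer.model import VisionLanguageModel
   except ImportError:
       VisionLanguageModel = None

   def compile_and_evaluate(hf_model_name: str) -> None:
       """Outline of the tut_auto_llm.py workflow (argument names assumed)."""
       if VisionLanguageModel is None:
           raise RuntimeError("sima_utils is only available inside the Palette SDK")
       # 1. Load a model previously downloaded into the Hugging Face cache.
       model = VisionLanguageModel.from_hf_cache(hf_model_name)
       # 2. Generate the .elf artifacts and the Python orchestration script;
       #    a FileGenPrecision value can be passed for custom precision.
       model.gen_files()
       # 3. Sanity-check the compiled model against a reference backend.
       model.evaluate()

Running the function outside the Palette SDK raises a ``RuntimeError``; inside the SDK, consult ``tut_auto_llm.py`` itself for the exact arguments each call takes.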