.. _Introduction to LLiMa:

Introduction to LLiMa
=====================

Overview
--------

The GenAI Model Compilation feature streamlines the process of compiling GenAI models provided in the Hugging Face safetensors or GGUF model format. For a wide range of models from Hugging Face, such as ``Llama``, ``Gemma``, ``Phi``, ``Qwen``, and ``Mistral``, the SDK automatically generates all required binary/ELF files along with a Python orchestration script, enabling direct execution on the Sima.ai Modalix platform.

To get started quickly, Sima has precompiled several popular LLM models and published them on `Hugging Face `_. You can download and run these models immediately using the following commands:

.. code-block:: console

   modalix:~$ cd /media/nvme && mkdir llima && cd llima
   modalix:~$ sima-cli install -v 2.0.0 samples/llima -t select

Wait until the installation completes, then run:

.. code-block:: console

   modalix:~$ cd simaai-genai-demo && ./run.sh

This command prompts you to select and download a specific precompiled model for evaluating the Sima.ai Modalix platform. More information can be found in the `LLiMa demo application <../overview/hello_sima/run_demos.html#llm-demo>`_.

Supported Models
----------------

The following table shows the supported model architectures and their capabilities:

.. list-table::
   :widths: 30 15 55
   :header-rows: 1

   * - Model Architecture
     - Type
     - Supported Sizes
   * - `Llama 2 `_
     - LLM
     - `7b `_
   * - `Llama 3.1 `_
     - LLM
     - `8b `_
   * - `Llama 3.2 `_
     - LLM
     - 1b, `3b `_
   * - `Gemma 1 `_
     - LLM
     - 2b, 7b
   * - `Gemma 2 `_
     - LLM
     - 2b, 9b
   * - `Gemma 3 `_
     - LLM
     - `1b `_, `4b `_
   * - `Phi 3.5 mini `_
     - LLM
     - `3.8b `_
   * - `Qwen 2.5 `_
     - LLM
     - `0.5b `_, `1.5b `_, 3b, `7b `_
   * - `Qwen 3 `_
     - LLM
     - `0.6b `_, `1.7b `_, `4b `_, `8b `_
   * - `Mistral 1 `_
     - LLM
     - `7b `_
   * - `Llava 1.5 `_
     - VLM
     - `7b `_
   * - `PaliGemma `_
     - VLM
     - `3b `_
   * - `Gemma 3 `_
     - VLM
     - `4b `_

Limitations
-----------

.. list-table::
   :widths: 30 70
   :header-rows: 1
   :class: wrapped-table

   * - Limitation Type
     - Description
   * - Model Architecture
     - Only models based on the architectures listed above are supported.
   * - Model Parameters
     - Only models with fewer than 10B parameters are supported.
   * - HF Models
     - Models must be downloaded from Hugging Face and contain ``config.json``, ``tokenizer.json``, ``tokenizer_config.json``, ``generation_config.json``, and weights in the safetensors format.
   * - GGUF Models
     - The GGUF format is supported for LLMs only. VLMs must be compiled from the Hugging Face safetensors format.
   * - Gemma3 VLM
     - Supported with a modified SigLIP 448 vision encoder.
   * - Llama 3.2 Vision
     - Vision models are not supported.
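
The Hugging Face model requirements above can be verified before compilation with a quick shell check. This is a minimal sketch only: the ``MODEL_DIR`` path is a placeholder for wherever you downloaded the model, and the file list is taken from the limitations table above.

.. code-block:: shell

   # Placeholder path to a locally downloaded Hugging Face model directory.
   MODEL_DIR="./my-hf-model"

   missing=0
   # The compiler expects these configuration and tokenizer files.
   for f in config.json tokenizer.json tokenizer_config.json generation_config.json; do
       if [ ! -f "$MODEL_DIR/$f" ]; then
           echo "missing: $f"
           missing=1
       fi
   done

   # Weights must be present in safetensors format.
   if ! ls "$MODEL_DIR"/*.safetensors >/dev/null 2>&1; then
       echo "missing: *.safetensors weights"
       missing=1
   fi

   if [ "$missing" -eq 0 ]; then
       echo "model directory looks complete"
   else
       echo "model directory is incomplete"
   fi

A model directory that fails this check will also fail compilation, so running it first saves a round trip.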