DETR on RTSP Stream
===================

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the DETR model.

Purpose
-------

The primary goal of this pipeline is to showcase how to:

- Read live video data from an RTSP source.
- Perform real-time object detection inference using the SiMa MLSoC and the DETR model.
- Render and annotate bounding boxes with class labels.
- Stream the output video over UDP for further visualization or processing.

All inference is accelerated on SiMa’s MLSoC hardware, providing the high-throughput, low-latency performance needed for edge AI applications.

Configuration Overview
----------------------

The application is configured via ``project.yaml``. The tables below break down its parameters; an illustrative configuration sketch is provided at the end of this page.

Input/Output Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

================ ==================================== ==================
Parameter        Description                          Example
================ ==================================== ==================
``source.name``  Input type for the video stream      ``"rtspsrc"``
``source.value`` RTSP stream URL                      ``""``
``udp_host``     Host IP to stream the output via UDP ``""``
``port``         UDP port number for output stream    ``""``
``pipeline``     Pipeline type used for inference     ``"DetrPipeline"``
================ ==================================== ==================

Model Configuration (``Models[0]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======================= ============================================================== =========================
Parameter               Description                                                    Example
======================= ============================================================== =========================
``name``                Name of the model                                              ``"Detr"``
``targz``               Path to the compiled model archive                             ``""``
``channel_mean``        Per-channel mean for input normalization                       ``[0.407, 0.446, 0.469]``
``channel_stddev``      Per-channel stddev for input normalization                     ``[0.289, 0.273, 0.277]``
``padding_type``        Type of padding used before inference                          ``"TOP_LEFT"``
``aspect_ratio``        Whether to maintain original aspect ratio during preprocessing ``false``
``topk``                Maximum number of detections returned per frame                ``10``
``detection_threshold`` Minimum confidence score to retain a detection                 ``0.7``
``decode_type``         Postprocessing decode type used with DETR                      ``"detr"``
``normalize``           Whether to normalize image input                               ``true``
``label_file``          Path to the label file with class names                        ``"labels.txt"``
======================= ============================================================== =========================

Main Python Script
------------------

The Python script performs the following steps (see the illustrative main-loop sketch at the end of this page):

1. Loads configuration from ``project.yaml``.
2. Initializes a video reader and writer using the PePPi API.
3. Loads the DETR model via ``MLSoCSession`` and configures it.
4. Continuously reads frames, performs inference, renders detection results, and streams the annotated video via UDP.

The application is packaged using ``mpk create`` and deployed to the target device through SiMa’s deployment workflow.

Model Details
-------------

- Download from `here `__.
- Model: DETR (DEtection TRansformer)
- Input Normalization:

  - Mean: ``[0.407, 0.446, 0.469]``
  - Stddev: ``[0.289, 0.273, 0.277]``

- Detection Threshold: 0.7
- Max Output Per Frame: Top 10 detections
- Bounding Boxes: Rendered using ``SimaBoxRender``
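
Illustrative Configuration Sketch
---------------------------------

The sketch below shows one way the parameters in the tables above could be laid out in ``project.yaml``. The exact key nesting (a ``source`` mapping and a top-level ``Models`` list) is inferred from the parameter names and is an assumption, not the project's authoritative schema; the empty strings are placeholders to be filled in with your own stream URL, host, port, and model archive path.

.. code-block:: yaml

   source:
     name: "rtspsrc"          # input type for the video stream
     value: ""                # RTSP stream URL (fill in)
   udp_host: ""               # host IP that receives the annotated UDP output
   port: ""                   # UDP port for the output stream
   pipeline: "DetrPipeline"   # pipeline type used for inference

   Models:
     - name: "Detr"
       targz: ""                              # path to the compiled model archive (fill in)
       normalize: true
       channel_mean: [0.407, 0.446, 0.469]    # per-channel mean for input normalization
       channel_stddev: [0.289, 0.273, 0.277]  # per-channel stddev for input normalization
       padding_type: "TOP_LEFT"
       aspect_ratio: false
       topk: 10                               # keep at most 10 detections per frame
       detection_threshold: 0.7               # drop detections below 0.7 confidence
       decode_type: "detr"
       label_file: "labels.txt"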
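
Illustrative Main Loop Sketch
-----------------------------

The sketch below outlines the read, infer, render, and stream loop described under "Main Python Script". It is a minimal, non-authoritative illustration: the ``VideoReader``/``VideoWriter`` class names, the constructor arguments, and the ``read``/``run_model``/``render`` signatures are assumptions rather than the documented PePPi API, so refer to the PePPi reference for the real interfaces.

.. code-block:: python

   import yaml

   # Hypothetical imports: the actual PePPi module path and class names may differ.
   from sima import MLSoCSession, SimaBoxRender, VideoReader, VideoWriter


   def main() -> None:
       # 1. Load configuration from project.yaml.
       with open("project.yaml", "r") as f:
           cfg = yaml.safe_load(f)
       model_cfg = cfg["Models"][0]

       # 2. Initialize the RTSP reader and the UDP writer (arguments are illustrative).
       reader = VideoReader(cfg["source"]["name"], cfg["source"]["value"])
       writer = VideoWriter(cfg["udp_host"], cfg["port"])

       # 3. Load the compiled DETR model on the MLSoC and apply the model settings
       #    (normalization, padding, top-k, detection threshold, decode type).
       session = MLSoCSession(model_cfg["targz"])
       session.configure(model_cfg)

       # 4. Read frames, run accelerated inference, draw boxes, and stream over UDP.
       while True:
           ok, frame = reader.read()
           if not ok:
               break
           detections = session.run_model(frame)
           annotated = SimaBoxRender.render(frame, detections, model_cfg["label_file"])
           writer.write(annotated)


   if __name__ == "__main__":
       main()

In this sketch the detection threshold (0.7) and top-k (10) limits from the configuration are assumed to be applied inside the session's postprocessing, so the loop simply renders whatever detections the session returns.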