DETR on RTSP Stream
===================

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the DETR model.

Purpose
-------

The primary goal of this pipeline is to showcase how to:

- Read live video data from an RTSP source.
- Perform real-time object detection inference using the SiMa MLSoC and the DETR model.
- Render and annotate bounding boxes with class labels.
- Stream the output video over UDP for further visualization or processing.

All inference is accelerated on SiMa’s MLSoC hardware, providing the high-throughput, low-latency performance needed for edge AI applications.

Configuration Overview
----------------------

The application is configured via ``project.yaml``. The tables below break down its parameters; an illustrative configuration sketch is provided at the end of this page.

Input/Output Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

================ ==================================== ==================
Parameter        Description                          Example
================ ==================================== ==================
``source.name``  Input type for the video stream      ``"rtspsrc"``
``source.value`` RTSP stream URL                      ``""``
``udp_host``     Host IP to stream the output via UDP ``""``
``port``         UDP port number for output stream    ``""``
``pipeline``     Pipeline type used for inference     ``"DetrPipeline"``
================ ==================================== ==================

Model Configuration (``Models[0]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======================= ============================================================== =========================
Parameter               Description                                                    Example
======================= ============================================================== =========================
``name``                Name of the model                                              ``"Detr"``
``targz``               Path to the compiled model archive                             ``""``
``channel_mean``        Per-channel mean for input normalization                       ``[0.407, 0.446, 0.469]``
``channel_stddev``      Per-channel stddev for input normalization                     ``[0.289, 0.273, 0.277]``
``padding_type``        Type of padding used before inference                          ``"TOP_LEFT"``
``aspect_ratio``        Whether to maintain original aspect ratio during preprocessing ``false``
``topk``                Maximum number of detections returned per frame                ``10``
``detection_threshold`` Minimum confidence score to retain a detection                 ``0.7``
``decode_type``         Postprocessing decode type used with DETR                      ``"detr"``
``normalize``           Whether to normalize image input                               ``true``
``label_file``          Path to the label file with class names                        ``"labels.txt"``
======================= ============================================================== =========================

Main Python Script
------------------

The Python script performs the following steps (see the illustrative main-loop sketch at the end of this page):

1. Loads configuration from ``project.yaml``.
2. Initializes a video reader and writer using the PePPi API.
3. Loads the DETR model via ``MLSoCSession`` and configures it.
4. Continuously reads frames, performs inference, renders detection results, and streams the annotated video via UDP.

The application is packaged using ``mpk create`` and deployed to the target device through SiMa’s deployment workflow.

Model Details
-------------

- Download from `here `__.
- Model: DETR (DEtection TRansformer)
- Input Normalization:

  - Mean: ``[0.407, 0.446, 0.469]``
  - Stddev: ``[0.289, 0.273, 0.277]``

- Detection Threshold: 0.7
- Max Output Per Frame: Top 10 detections
- Bounding Boxes: Rendered using ``SimaBoxRender``
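
Illustrative Configuration Sketch
---------------------------------

The sketch below shows one way the parameters in the tables above could be laid out in ``project.yaml``. The exact key nesting (a ``source`` mapping and a top-level ``Models`` list) is inferred from the parameter names and is an assumption, not the project's authoritative schema; the empty strings are placeholders to be filled in with your own stream URL, host, port, and model archive path.

.. code-block:: yaml

   source:
     name: "rtspsrc"          # input type for the video stream
     value: ""                # RTSP stream URL (fill in)
   udp_host: ""               # host IP that receives the annotated UDP output
   port: ""                   # UDP port for the output stream
   pipeline: "DetrPipeline"   # pipeline type used for inference

   Models:
     - name: "Detr"
       targz: ""                              # path to the compiled model archive (fill in)
       normalize: true
       channel_mean: [0.407, 0.446, 0.469]    # per-channel mean for input normalization
       channel_stddev: [0.289, 0.273, 0.277]  # per-channel stddev for input normalization
       padding_type: "TOP_LEFT"
       aspect_ratio: false
       topk: 10                               # keep at most 10 detections per frame
       detection_threshold: 0.7               # drop detections below 0.7 confidence
       decode_type: "detr"
       label_file: "labels.txt"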
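
Illustrative Main Loop Sketch
-----------------------------

The sketch below outlines the read, infer, render, and stream loop described under "Main Python Script". It is a minimal, non-authoritative illustration: the ``VideoReader``/``VideoWriter`` class names, the constructor arguments, and the ``read``/``run_model``/``render`` signatures are assumptions rather than the documented PePPi API, so refer to the PePPi reference for the real interfaces.

.. code-block:: python

   import yaml

   # Hypothetical imports: the actual PePPi module path and class names may differ.
   from sima import MLSoCSession, SimaBoxRender, VideoReader, VideoWriter


   def main() -> None:
       # 1. Load configuration from project.yaml.
       with open("project.yaml", "r") as f:
           cfg = yaml.safe_load(f)
       model_cfg = cfg["Models"][0]

       # 2. Initialize the RTSP reader and the UDP writer (arguments are illustrative).
       reader = VideoReader(cfg["source"]["name"], cfg["source"]["value"])
       writer = VideoWriter(cfg["udp_host"], cfg["port"])

       # 3. Load the compiled DETR model on the MLSoC and apply the model settings
       #    (normalization, padding, top-k, detection threshold, decode type).
       session = MLSoCSession(model_cfg["targz"])
       session.configure(model_cfg)

       # 4. Read frames, run accelerated inference, draw boxes, and stream over UDP.
       while True:
           ok, frame = reader.read()
           if not ok:
               break
           detections = session.run_model(frame)
           annotated = SimaBoxRender.render(frame, detections, model_cfg["label_file"])
           writer.write(annotated)


   if __name__ == "__main__":
       main()

In this sketch the detection threshold (0.7) and top-k (10) limits from the configuration are assumed to be applied inside the session's postprocessing, so the loop simply renders whatever detections the session returns.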