DETR on RTSP Stream

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the DETR model.

Purpose

The primary goal of this pipeline is to showcase how to:

- Read live video data from an RTSP source.
- Perform real-time object detection inference using the SiMa MLSoC and the DETR model.
- Render and annotate bounding boxes with class labels.
- Stream the output video over UDP for further visualization or processing.

All inference runs on SiMa’s MLSoC hardware accelerator, which provides the high throughput and low latency that edge AI applications need.

Configuration Overview

The application is configured via project.yaml. Below is a breakdown of its parameters.

Input/Output Configuration

| Parameter | Description | Example |
|---|---|---|
| `source.name` | Input source type for the video stream | `"rtspsrc"` |
| `source.value` | RTSP stream URL | `"<RTSP_URL>"` |
| `udp_host` | IP address of the host that receives the UDP output stream | `"<HOST_IP>"` |
| `port` | UDP port for the output stream | `"<PORT_NUM>"` |
| `pipeline` | Pipeline type used for inference | `"DetrPipeline"` |
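The table lists the keys but not the file layout. The sketch below assumes, from the dotted parameter names, that `source` is a nested mapping with `name` and `value` keys and that `udp_host`, `port`, and `pipeline` are top-level keys. It takes the dictionary you would get from parsing `project.yaml` (for example with PyYAML's `yaml.safe_load`) and converts the quoted `port` value to an integer. `IOSettings` and `parse_io_settings` are hypothetical helpers, not part of the PePPi API.

```python
from dataclasses import dataclass


@dataclass
class IOSettings:
    """Hypothetical container for the input/output fields of project.yaml."""
    source_name: str
    source_value: str
    udp_host: str
    port: int
    pipeline: str


def parse_io_settings(config: dict) -> IOSettings:
    """Extract and type-check the I/O fields from an already-parsed project.yaml dict.

    Assumed layout: config["source"]["name"], config["source"]["value"], and
    top-level "udp_host", "port", and "pipeline" keys.
    """
    source = config["source"]
    # The example value of `port` is quoted, so accept either a string or an int.
    port = int(config["port"])
    if not 0 < port < 65536:
        raise ValueError(f"UDP port out of range: {port}")
    return IOSettings(
        source_name=source["name"],
        source_value=source["value"],
        udp_host=config["udp_host"],
        port=port,
        pipeline=config["pipeline"],
    )


if __name__ == "__main__":
    # Placeholder values for illustration only; use your own RTSP URL, host IP, and port.
    example = {
        "source": {"name": "rtspsrc", "value": "rtsp://192.168.1.10:8554/stream"},
        "udp_host": "192.168.1.20",
        "port": "7000",
        "pipeline": "DetrPipeline",
    }
    print(parse_io_settings(example))
```

Validating the port up front surfaces a mistyped value at startup instead of as a silent streaming failure later.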

Model Configuration (`Models[0]`)

| Parameter | Description | Example |
|---|---|---|
| `name` | Model name | `"Detr"` |
| `targz` | Path to the compiled model archive (`.tar.gz`) | `"<targz_path>"` |
| `channel_mean` | Per-channel mean for input normalization | `[0.407, 0.446, 0.469]` |
| `channel_stddev` | Per-channel standard deviation for input normalization | `[0.289, 0.273, 0.277]` |
| `padding_type` | Type of padding applied during preprocessing | `"TOP_LEFT"` |
| `aspect_ratio` | Whether to preserve the original aspect ratio during preprocessing | `false` |
| `topk` | Maximum number of detections returned per frame | `10` |
| `detection_threshold` | Minimum confidence score for a detection to be kept | `0.7` |
| `decode_type` | Postprocessing decode type (DETR decoding) | `"detr"` |
| `normalize` | Whether to normalize the input image using `channel_mean` and `channel_stddev` | `true` |
| `label_file` | Path to the label file containing class names | `"labels.txt"` |
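In the application, normalization is enabled with `normalize` and performed by the PePPi pipeline. As an illustration only, the sketch below applies the example `channel_mean` and `channel_stddev` values to 8-bit pixels. It assumes, which this README does not state, that pixels are first scaled from 0-255 to [0, 1] (consistent with the magnitude of the values) and that the input's channel order matches the order of the lists. `normalize_pixel` and `normalize_image` are hypothetical names.

```python
# Example values from the Models[0] table above.
CHANNEL_MEAN = (0.407, 0.446, 0.469)
CHANNEL_STDDEV = (0.289, 0.273, 0.277)


def normalize_pixel(pixel, mean=CHANNEL_MEAN, stddev=CHANNEL_STDDEV):
    """Scale an 8-bit pixel to [0, 1], then apply (value - mean) / stddev per channel.

    Assumes the channel order of `pixel` matches the order of `mean` and `stddev`.
    """
    if not len(pixel) == len(mean) == len(stddev):
        raise ValueError("pixel, mean, and stddev must have the same number of channels")
    return tuple((v / 255.0 - m) / s for v, m, s in zip(pixel, mean, stddev))


def normalize_image(rows):
    """Normalize an image given as a list of rows, each a list of 3-channel pixels."""
    return [[normalize_pixel(p) for p in row] for row in rows]
```

A black pixel maps to `-mean / stddev` in each channel and a white pixel to `(1 - mean) / stddev`, so the normalized values are roughly centered on zero.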

Main Python Script

The Python script performs the following steps:

- Loads configuration from `project.yaml`.
- Initializes a video reader and writer using the PePPi API.
- Loads the DETR model via `MLSoCSession` and configures it.
- Continuously reads frames, performs inference, renders detection results, and streams the annotated video over UDP.
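The read, infer, render, and write loop can be sketched without depending on PePPi. In this sketch `run_pipeline`, `ListReader`, and `ListWriter` are hypothetical; the `infer` and `render` callables and the writer stand in for the configured `MLSoCSession`, `SimaBoxRender`, and the UDP video writer, whose exact method names and signatures are not given in this README.

```python
from typing import Any, Callable, Optional, Protocol, Tuple


class FrameReader(Protocol):
    def read(self) -> Tuple[bool, Any]: ...


class FrameWriter(Protocol):
    def write(self, frame: Any) -> None: ...


def run_pipeline(
    reader: FrameReader,
    infer: Callable[[Any], Any],
    render: Callable[[Any, Any], Any],
    writer: FrameWriter,
    max_frames: Optional[int] = None,
) -> int:
    """Read -> infer -> render -> write loop. Returns the number of frames written."""
    written = 0
    while max_frames is None or written < max_frames:
        ok, frame = reader.read()
        if not ok:
            break  # treat a failed read as end of stream in this sketch
        detections = infer(frame)              # stand-in for model inference
        annotated = render(frame, detections)  # stand-in for box rendering
        writer.write(annotated)                # stand-in for the UDP writer
        written += 1
    return written


class ListReader:
    """Hypothetical in-memory reader that returns frames from a list."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if not self._frames:
            return False, None
        return True, self._frames.pop(0)


class ListWriter:
    """Hypothetical in-memory writer that collects output frames."""

    def __init__(self):
        self.frames = []

    def write(self, frame):
        self.frames.append(frame)
```

For example, `run_pipeline(ListReader([1, 2, 3]), lambda f: f * 10, lambda f, d: (f, d), ListWriter())` processes three frames and returns 3. A live RTSP application would more likely retry or reconnect after a failed read; the sketch stops instead so that it terminates on finite input.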

The application is packaged using `mpk create` and deployed to the target device through SiMa’s deployment workflow.

Model Details

Download from here.
- Model: DETR (DEtection TRansformer)
- Input normalization:
  - Mean: [0.407, 0.446, 0.469]
  - Stddev: [0.289, 0.273, 0.277]
- Detection threshold: 0.7
- Maximum output per frame: top 10 detections
- Bounding boxes: rendered using `SimaBoxRender`
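With `decode_type: "detr"`, decoding runs inside the SiMa pipeline, and this README does not describe its output format. For reference, the sketch below follows the decoding used by the original DETR model: each query produces class logits whose last entry is the "no object" class, plus a box given as normalized (cx, cy, w, h). The defaults mirror `detection_threshold` and `topk`. Applying the threshold before top-k, and reading `labels.txt` as one class name per line with line i as class i, are assumptions; `postprocess`, `cxcywh_to_xyxy`, and `load_labels` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box: Sequence[float], width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel corners (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def postprocess(
    logits: Sequence[Sequence[float]],
    boxes: Sequence[Sequence[float]],
    width: int,
    height: int,
    labels: Sequence[str],
    threshold: float = 0.7,
    topk: int = 10,
) -> List[Tuple[str, float, Box]]:
    """Keep queries scoring >= threshold, then return the `topk` highest-scoring ones."""
    detections = []
    for query_logits, box in zip(logits, boxes):
        probs = softmax(query_logits)[:-1]  # drop the trailing "no object" class
        class_id = max(range(len(probs)), key=probs.__getitem__)
        score = probs[class_id]
        if score >= threshold:
            detections.append((labels[class_id], score, cxcywh_to_xyxy(box, width, height)))
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections[:topk]


def load_labels(path: str) -> List[str]:
    """Read one class name per non-empty line; line i is assumed to be class i."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Because DETR predicts a fixed set of queries per image, most of which land on "no object", the threshold does most of the filtering and `topk` only caps crowded frames.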