DETR on RTSP Stream
This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the DETR model.
Purpose
The primary goal of this pipeline is to showcase how to:
Read live video data from an RTSP source.
Perform real-time object detection inference using the SiMa MLSoC and the DETR model.
Render and annotate bounding boxes with class labels.
Stream the output video over UDP for further visualization or processing.
All inference runs on SiMa’s MLSoC hardware, which provides the high throughput and low latency that edge AI applications need.
Configuration Overview
The application is configured via project.yaml. Below is a breakdown of its parameters.
Input/Output Configuration
| Parameter | Description | Example |
|---|---|---|
| | Input type for the video stream | |
| | RTSP stream URL | |
| | Host IP to stream the output via UDP | |
| | UDP port number for output stream | |
| | Pipeline type used for inference | |
Model Configuration (Models[0])
| Parameter | Description | Example |
|---|---|---|
| | Name of the model | |
| | Path to the compiled model archive | |
| | Per-channel mean for input normalization | |
| | Per-channel stddev for input normalization | |
| | Type of padding used before inference | |
| | Whether to maintain original aspect ratio during preprocessing | |
| | Maximum number of detections returned per frame | |
| | Minimum confidence score to retain a detection | |
| | Postprocessing decode type used with DETR | |
| | Whether to normalize image input | |
| | Path to the label file with class names | |
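To make the detection-limit and confidence-threshold parameters concrete, here is a minimal sketch of how such a filter could behave. It is an illustration only: the `Detection` record and `filter_detections` helper are hypothetical names, not part of the PePPi API, and the inclusive comparison (`>=`) is an assumption. The default values 0.7 and 10 are taken from the Model Details section.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical detection record for illustration; not a PePPi type."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def filter_detections(
    detections: List[Detection],
    threshold: float = 0.7,
    max_detections: int = 10,
) -> List[Detection]:
    """Keep detections scoring at least `threshold`, best first, capped at `max_detections`."""
    kept = [d for d in detections if d.score >= threshold]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_detections]


# Example input with made-up values.
raw = [
    Detection("person", 0.95, (10, 20, 110, 220)),
    Detection("dog", 0.40, (50, 60, 90, 120)),
    Detection("car", 0.72, (200, 80, 380, 190)),
]

for det in filter_detections(raw):
    print(det.label, det.score, det.box)
```

With these sample values, the low-confidence "dog" entry is dropped and the remaining detections are returned in descending score order.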
Main Python Script
The Python script performs the following steps:
Loads configuration from project.yaml.
Initializes a video reader and writer using the PePPi API.
Loads the DETR model via MLSoCSession and configures it.
Continuously reads frames, performs inference, renders detection results, and streams annotated video via UDP.
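The control flow of the steps above can be sketched as a simple read, infer, render, write loop. This is a structural sketch only and does not use the PePPi API: `run_pipeline` and its four arguments are hypothetical stand-ins for the video reader, the MLSoCSession inference call, the box renderer, and the UDP writer, so the loop can be exercised off-device with dummy components.

```python
from typing import Any, Callable, Iterable, List


def run_pipeline(
    frames: Iterable[Any],               # stand-in for the RTSP video reader
    infer: Callable[[Any], Any],         # stand-in for the model session's inference call
    render: Callable[[Any, Any], Any],   # stand-in for the bounding-box renderer
    write: Callable[[Any], None],        # stand-in for the UDP video writer
) -> int:
    """Read, infer, render, and write each frame; return the number of frames processed."""
    processed = 0
    for frame in frames:
        detections = infer(frame)
        annotated = render(frame, detections)
        write(annotated)
        processed += 1
    return processed


# Dry run with dummy components (strings instead of images).
sent: List[str] = []
count = run_pipeline(
    frames=["frame0", "frame1"],
    infer=lambda frame: [("person", 0.9)],
    render=lambda frame, dets: f"{frame}+{len(dets)} boxes",
    write=sent.append,
)
print(count, sent)
```

Passing the components in as callables keeps the loop independent of any particular reader or writer, which makes it easy to test the flow before deploying to the device.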
The application is packaged using mpk create and deployed to the target device through SiMa’s deployment workflow.
Model Details
Download from here.
Model: DETR (DEtection TRansformer)
Input Normalization:
Mean: [0.407, 0.446, 0.469]
Stddev: [0.289, 0.273, 0.277]
Detection Threshold: 0.7
Max Output Per Frame: Top 10 detections
Bounding Boxes: Rendered using SimaBoxRender
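As an illustration of what per-channel input normalization typically means, the sketch below applies the mean and stddev listed above to a single pixel. It rests on two assumptions that the source does not confirm: that 8-bit pixel values are first scaled to [0, 1], and that the channel order of the pixel matches the order of the listed values. The helper name `normalize_pixel` is hypothetical; on the device this step is handled by the pipeline's preprocessing.

```python
from typing import Sequence, Tuple

# Values from the Model Details list above.
MEAN = (0.407, 0.446, 0.469)
STD = (0.289, 0.273, 0.277)


def normalize_pixel(
    pixel: Sequence[float],
    mean: Sequence[float] = MEAN,
    std: Sequence[float] = STD,
) -> Tuple[float, ...]:
    """Scale an 8-bit, 3-channel pixel to [0, 1], then apply (x - mean) / std per channel.

    Assumes the pixel's channel order matches the order of `mean` and `std`.
    """
    return tuple((value / 255.0 - m) / s for value, m, s in zip(pixel, mean, std))


# A black pixel and a white pixel show the range of normalized values.
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

A pixel whose scaled value equals the channel mean maps to zero, so typical image content ends up roughly centered around zero in each channel.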