DETR on RTSP Stream

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the DETR model.

Purpose

The primary goal of this pipeline is to showcase how to:

  • Read live video data from an RTSP source.

  • Perform real-time object detection inference using the SiMa MLSoC and the DETR model.

  • Render and annotate bounding boxes with class labels.

  • Stream the output video over UDP for further visualization or processing.

All inference runs on SiMa’s MLSoC hardware, which provides the high throughput and low latency that edge AI applications need.

Configuration Overview

The application is configured via project.yaml. Below is a breakdown of its parameters.

Input/Output Configuration

Parameter     Description                           Example
------------  ------------------------------------  --------------
source.name   Input type for the video stream       "rtspsrc"
source.value  RTSP stream URL                       "<RTSP_URL>"
udp_host      Host IP to stream the output via UDP  "<HOST_IP>"
port          UDP port number for output stream     "<PORT_NUM>"
pipeline      Pipeline type used for inference      "DetrPipeline"

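Before deploying, it can help to sanity-check the input/output settings. The sketch below is a minimal illustration, not part of the PePPi API: validate_io_config is a hypothetical helper, and it assumes that project.yaml has already been parsed into a Python dict and that the dotted names source.name and source.value correspond to a nested source mapping. The addresses and port in the example are placeholders.

```python
# Hypothetical helper for illustration; it is not part of the PePPi API.
REQUIRED_KEYS = ("source", "udp_host", "port", "pipeline")


def validate_io_config(cfg):
    """Return a list of problems found in the I/O settings (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    # Assumption: source.name / source.value are stored as a nested mapping.
    source = cfg.get("source")
    if isinstance(source, dict) and source.get("name") == "rtspsrc":
        url = str(source.get("value", ""))
        if not url.startswith(("rtsp://", "rtsps://")):
            problems.append("source.value should be an rtsp:// or rtsps:// URL")

    # Accept the port as an int or a numeric string; reject placeholders.
    if "port" in cfg:
        try:
            port_ok = 1 <= int(cfg["port"]) <= 65535
        except (TypeError, ValueError):
            port_ok = False
        if not port_ok:
            problems.append("port must be an integer in 1..65535")

    return problems


# Placeholder values only; replace them with your own stream and host.
example = {
    "source": {"name": "rtspsrc", "value": "rtsp://192.0.2.10:8554/stream"},
    "udp_host": "192.0.2.20",
    "port": 5000,
    "pipeline": "DetrPipeline",
}
print(validate_io_config(example))
```

A check like this catches unreplaced placeholders such as "<PORT_NUM>" before the application is packaged.
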
Model Configuration (Models[0])

Parameter            Description                                                     Example
-------------------  --------------------------------------------------------------  ---------------------
name                 Name of the model                                               "Detr"
targz                Path to the compiled model archive                              "<targz_path>"
channel_mean         Per-channel mean for input normalization                        [0.407, 0.446, 0.469]
channel_stddev       Per-channel stddev for input normalization                      [0.289, 0.273, 0.277]
padding_type         Type of padding used before inference                           "TOP_LEFT"
aspect_ratio         Whether to maintain original aspect ratio during preprocessing  false
topk                 Maximum number of detections returned per frame                 10
detection_threshold  Minimum confidence score to retain a detection                  0.7
decode_type          Postprocessing decode type used with DETR                       "detr"
normalize            Whether to normalize image input                                true
label_file           Path to the label file with class names                         "labels.txt"

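To make the role of channel_mean and channel_stddev concrete, here is a minimal sketch of per-channel normalization for a single pixel. It assumes the conventional formula (scale 8-bit values to [0, 1], subtract the mean, divide by the standard deviation) and that the pixel's channel order matches the order of the configured lists. The actual preprocessing happens inside the pipeline and may differ; normalize_pixel is a hypothetical name used only for illustration.

```python
CHANNEL_MEAN = [0.407, 0.446, 0.469]
CHANNEL_STDDEV = [0.289, 0.273, 0.277]


def normalize_pixel(pixel, mean=CHANNEL_MEAN, stddev=CHANNEL_STDDEV):
    """Scale an 8-bit, 3-channel pixel to [0, 1], then standardize per channel.

    Assumption: this is the conventional (x / 255 - mean) / stddev formula,
    not a confirmed description of the pipeline's internal preprocessing.
    """
    return [(value / 255.0 - m) / s for value, m, s in zip(pixel, mean, stddev)]


# A mid-gray pixel; each channel gets a slightly different normalized value
# because the mean and stddev differ per channel.
print(normalize_pixel([128, 128, 128]))
```
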
Main Python Script

The Python script performs the following steps:

  1. Loads configuration from project.yaml.

  2. Initializes a video reader and writer using the PePPi API.

  3. Loads the DETR model via MLSoCSession and configures it.

  4. Continuously reads frames, performs inference, renders detection results, and streams annotated video via UDP.
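
The read, infer, render, and stream loop in step 4 can be sketched with stand-in objects. FakeReader, FakeModel, FakeWriter, annotate, and run_pipeline below are hypothetical placeholders written for this illustration; they are not PePPi classes or functions. In the real application, the reader, the MLSoCSession-based model, the renderer, and the UDP writer come from the PePPi API, and their names and signatures should be taken from its documentation.

```python
# Stand-ins only: none of these classes belong to the PePPi API.

class FakeReader:
    """Yields frames from a fixed list, mimicking a video source."""

    def __init__(self, frames):
        self._frames = iter(frames)

    def read(self):
        """Return (ok, frame); ok is False once the source is exhausted."""
        try:
            return True, next(self._frames)
        except StopIteration:
            return False, None


class FakeModel:
    """Pretends to run inference and returns one fixed detection per frame."""

    def run(self, frame):
        return [{"label": "person", "score": 0.9, "box": (10, 20, 50, 80)}]


class FakeWriter:
    """Collects output frames instead of streaming them over UDP."""

    def __init__(self):
        self.sent = []

    def write(self, frame):
        self.sent.append(frame)


def annotate(frame, detections):
    """Attach detections to the frame; a real app would draw boxes here."""
    return {"frame": frame, "detections": detections}


def run_pipeline(reader, model, writer):
    """Read, infer, annotate, and write until the source ends."""
    count = 0
    while True:
        ok, frame = reader.read()
        if not ok:
            break
        detections = model.run(frame)
        writer.write(annotate(frame, detections))
        count += 1
    return count


writer = FakeWriter()
processed = run_pipeline(FakeReader(["frame0", "frame1"]), FakeModel(), writer)
print(processed, len(writer.sent))
```

A live RTSP source does not normally end, so the real loop runs until the application is stopped; the finite list here only keeps the sketch runnable.
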

The application is packaged using mpk create and deployed to the target device through SiMa’s deployment workflow.

Model Details

  • Download from here.

  • Model: DETR (DEtection TRansformer)

  • Input Normalization:

    • Mean: [0.407, 0.446, 0.469]

    • Stddev: [0.289, 0.273, 0.277]

  • Detection Threshold: 0.7

  • Max Output Per Frame: Top 10 detections

  • Bounding Boxes: Rendered using SimaBoxRender
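
The combined effect of the detection threshold (0.7) and the per-frame cap (10) can be illustrated as follows. This is a sketch under assumptions: filter_detections is a hypothetical function, the dict layout of a detection is invented for the example, the threshold is treated as inclusive, and the top-k selection is assumed to rank by confidence score. The actual decode step runs inside the pipeline and is not shown here.

```python
def filter_detections(detections, threshold=0.7, topk=10):
    """Keep detections scoring at least `threshold`, highest scores first,
    and return no more than `topk` of them.

    Assumptions: inclusive threshold and score-ranked top-k selection.
    """
    kept = [d for d in detections if d["score"] >= threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:topk]


# Hypothetical detections; only those at or above 0.7 survive.
candidates = [
    {"label": "person", "score": 0.95},
    {"label": "dog", "score": 0.50},
    {"label": "car", "score": 0.80},
]
print([d["label"] for d in filter_detections(candidates)])
```

When the ranking is by score, applying the threshold before or after the top-k cut yields the same set, so the order of the two steps does not matter in this sketch.
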