PeopleDetector on RTSP Stream

This project uses the SiMa PePPi API to run a real-time people detection pipeline on a live RTSP video stream. It uses a CenterNet-based model tuned to detect people and streams the annotated frames over UDP.

Purpose

This pipeline showcases how to:

- Ingest live RTSP video using SiMa’s PePPi API.
- Run people detection using a CenterNet-based model on SiMa’s MLSoC.
- Annotate frames with bounding boxes and class labels.
- Stream the output to a specified host and port via UDP.

All inference is hardware-accelerated through SiMa’s MLSoC for efficient edge deployment.

Configuration Overview

Settings are defined in `project.yaml`. The tables below list the input/output settings and the model-specific parameters.

Input/Output Configuration

| Parameter | Description | Example |
|---|---|---|
| `source.name` | Type of input source | `"rtspsrc"` |
| `source.value` | RTSP video stream URL | `"<RTSP_URL>"` |
| `udp_host` | Host IP for UDP output | `"<HOST_IP>"` |
| `port` | Port number for UDP stream | `"<PORT_NUM>"` |
| `pipeline` | Inference pipeline to be used | `"PeopleDetector"` |
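
As a rough illustration, the I/O settings above could be held and sanity-checked in Python once loaded. The nesting (`source` as a mapping with `name` and `value`) is inferred from the dotted parameter names and may not match the real file layout. `check_io_config` is a hypothetical helper, not part of PePPi, and the URL, host, and port values are placeholders.

```python
# Hypothetical in-memory form of the I/O settings from project.yaml.
# The nesting is assumed from the dotted names in the table; values are placeholders.
io_config = {
    "source": {"name": "rtspsrc", "value": "rtsp://192.0.2.10:8554/stream"},
    "udp_host": "192.0.2.20",
    "port": 5000,
    "pipeline": "PeopleDetector",
}

REQUIRED_KEYS = ("source", "udp_host", "port", "pipeline")


def check_io_config(cfg):
    """Return a list of problems found in cfg; an empty list means it looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    source = cfg.get("source")
    if isinstance(source, dict):
        for field in ("name", "value"):
            if not source.get(field):
                problems.append(f"source.{field} is empty or missing")
    elif source is not None:
        problems.append("source must be a mapping with name and value")

    if "port" in cfg:
        # Accept an int or a numeric string, since YAML values may be quoted.
        try:
            port = int(cfg["port"])
        except (TypeError, ValueError):
            port = -1
        if not 1 <= port <= 65535:
            problems.append("port must be an integer in 1..65535")

    return problems


print(check_io_config(io_config))
```

Running a check like this before starting the pipeline surfaces a missing key or an out-of-range port early, rather than as a stream that never appears on the receiving host.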

Model Configuration (`Models[0]`)

| Parameter | Description | Value |
|---|---|---|
| `name` | Model identifier | `"pd"` |
| `targz` | Compressed model archive path | `"<targz_path>"` |
| `label_file` | Path to class label file | `"labels.txt"` |
| `normalize` | Enable input normalization | `true` |
| `channel_mean` | Per-channel mean for input normalization | `[0.408, 0.447, 0.470]` |
| `channel_stddev` | Per-channel stddev for input normalization | `[0.289, 0.274, 0.278]` |
| `padding_type` | Padding strategy for input preprocessing | `"BOTTOM_LEFT"` |
| `aspect_ratio` | Whether to maintain original image aspect ratio | `true` |
| `topk` | Maximum number of detections returned per frame | `10` |
| `detection_threshold` | Minimum confidence to qualify as a valid detection | `0.7` |
| `decode_type` | Postprocessing decode method used | `"centernet"` |
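
To make the `normalize`, `channel_mean`, and `channel_stddev` settings concrete, here is a minimal sketch of the arithmetic they usually imply. It assumes 8-bit pixel values are first scaled to [0, 1] and that the channel order of the pixel matches the order of the configured lists; neither is confirmed by this project. On the device the pipeline handles preprocessing itself, so `normalize_pixel` is only an illustrative helper.

```python
CHANNEL_MEAN = [0.408, 0.447, 0.470]
CHANNEL_STDDEV = [0.289, 0.274, 0.278]


def normalize_pixel(pixel, mean=CHANNEL_MEAN, stddev=CHANNEL_STDDEV):
    """Normalize one 8-bit, 3-channel pixel as (value / 255 - mean) / stddev."""
    return [(value / 255.0 - m) / s for value, m, s in zip(pixel, mean, stddev)]


# A mid-gray pixel; each channel ends up within a few tenths of zero.
print(normalize_pixel([128, 128, 128]))
```

With these means and standard deviations, a pixel close to the dataset average maps to values near zero, and a black pixel maps to roughly -1.4 to -1.7 per channel.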

Main Python Script

The script performs the following operations:

- Loads configuration from `project.yaml`.
- Initializes a VideoReader for the RTSP stream and a VideoWriter for UDP output.
- Loads the detection model with a SiMa MLSoCSession.
- Continuously:
  - Reads a frame
  - Runs inference
  - Annotates detected people
  - Streams the annotated frame over UDP
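
The loop described above can be sketched in a device-independent way. In this sketch `reader`, `detect`, `annotate`, and `writer` are hypothetical stand-ins with made-up method names, not PePPi's actual classes or signatures; on the target they would be backed by the VideoReader, MLSoCSession, and VideoWriter objects the script creates. The fake reader and writer exist only so the sketch runs anywhere.

```python
def run_pipeline(reader, detect, annotate, writer, max_frames=None):
    """Read, infer, annotate, and write frames until the reader closes.

    Returns the number of frames written. max_frames optionally bounds the run.
    """
    processed = 0
    while reader.is_open() and (max_frames is None or processed < max_frames):
        ok, frame = reader.read()
        if not ok:
            continue  # skip a failed read and try the next frame
        detections = detect(frame)
        writer.write(annotate(frame, detections))
        processed += 1
    return processed


class FakeReader:
    """Stand-in source: yields queued frames; None simulates a failed read."""

    def __init__(self, frames):
        self._frames = list(frames)

    def is_open(self):
        return bool(self._frames)

    def read(self):
        frame = self._frames.pop(0)
        return frame is not None, frame


class ListWriter:
    """Stand-in sink: collects frames instead of streaming them over UDP."""

    def __init__(self):
        self.frames = []

    def write(self, frame):
        self.frames.append(frame)


reader = FakeReader(["frame0", None, "frame1"])
writer = ListWriter()
count = run_pipeline(
    reader,
    detect=lambda frame: [("person", 0.9)],
    annotate=lambda frame, dets: f"{frame}+{len(dets)} boxes",
    writer=writer,
)
print(count, writer.frames)
```

Passing the four stages in as arguments keeps the loop testable on a development machine; only the objects handed to it need to change on the device.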

The application is packaged with `mpk create` and deployed to the target device using SiMa’s standard flow.

Model Details

- Download from here.
- Model Type: CenterNet-based
- Target: People detection
- Normalization:
  - Mean: [0.408, 0.447, 0.470]
  - Stddev: [0.289, 0.274, 0.278]
- Detection Confidence Threshold: 0.7
- Output: Top 10 people detections per frame
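
The combined effect of the 0.7 confidence threshold and the top-10 limit can be illustrated with a small filter. This is a sketch of the selection rule only, not the CenterNet decode that runs on the device. The detection format (dicts with `score` and `box`) and the inclusive `>=` comparison are assumptions, and `select_detections` is a hypothetical name.

```python
def select_detections(detections, threshold=0.7, topk=10):
    """Keep detections scoring at least threshold, best first, at most topk of them."""
    kept = [d for d in detections if d["score"] >= threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:topk]


# Made-up candidates; boxes are (x, y, width, height) placeholders.
candidates = [
    {"score": 0.95, "box": (10, 20, 50, 120)},
    {"score": 0.50, "box": (200, 40, 45, 110)},
    {"score": 0.72, "box": (300, 60, 40, 100)},
    {"score": 0.69, "box": (400, 80, 42, 105)},
    {"score": 0.88, "box": (120, 30, 48, 115)},
]

for det in select_detections(candidates):
    print(det["score"], det["box"])
```

Raising the threshold trades missed detections for fewer false positives, while `topk` caps the work done per frame in crowded scenes.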