PeopleDetector on RTSP Stream

This project uses the SiMa PePPi API to run a real-time people detection pipeline on a live RTSP video stream. It uses a CenterNet-based model to detect people and streams the annotated frames over UDP.

Purpose

This pipeline showcases how to:

  • Ingest live RTSP video using SiMa’s PePPi API.

  • Run people detection using a CenterNet-based model on SiMa’s MLSoC.

  • Annotate frames with bounding boxes and class labels.

  • Stream the output to a specified host and port via UDP.

All inference is hardware-accelerated through SiMa’s MLSoC for efficient edge deployment.

Configuration Overview

Settings are defined in project.yaml. The following tables outline the input/output configuration and model-specific parameters.

Input/Output Configuration

Parameter     Description                    Example
------------  -----------------------------  ----------------
source.name   Type of input source           "rtspsrc"
source.value  RTSP video stream URL          "<RTSP_URL>"
udp_host      Host IP for UDP output         "<HOST_IP>"
port          Port number for UDP stream     "<PORT_NUM>"
pipeline      Inference pipeline to be used  "PeopleDetector"

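As a rough illustration of how these settings fit together, the sketch below mirrors the input/output parameters as a Python dictionary and checks that the required keys are present. The nesting of source into name and value follows the dotted names in the table. The helper validate_io_config is hypothetical and is not part of the PePPi API, and the values are the same placeholders used above.

```python
# Hypothetical helper: checks the input/output settings described above.
REQUIRED_KEYS = ("source", "udp_host", "port", "pipeline")


def validate_io_config(config):
    """Return a list of problems found in the I/O settings (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    source = config.get("source")
    if isinstance(source, dict):
        # source.name and source.value are both expected under "source".
        for sub in ("name", "value"):
            if sub not in source:
                problems.append(f"missing key: source.{sub}")
    elif source is not None:
        problems.append("source must be a mapping with name and value")
    return problems


# Placeholder values; replace them with real settings before use.
example_config = {
    "source": {"name": "rtspsrc", "value": "<RTSP_URL>"},
    "udp_host": "<HOST_IP>",
    "port": "<PORT_NUM>",
    "pipeline": "PeopleDetector",
}

print(validate_io_config(example_config))  # prints []
```

A check like this is only a convenience for catching typos early; the pipeline itself reads the settings from project.yaml.
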
Model Configuration (Models[0])

Parameter            Description                                         Value
-------------------  --------------------------------------------------  ---------------------
name                 Model identifier                                    "pd"
targz                Compressed model archive path                       "<targz_path>"
label_file           Path to class label file                            "labels.txt"
normalize            Enable input normalization                          true
channel_mean         Per-channel mean for input normalization            [0.408, 0.447, 0.470]
channel_stddev       Per-channel stddev for input normalization          [0.289, 0.274, 0.278]
padding_type         Padding strategy for input preprocessing            "BOTTOM_LEFT"
aspect_ratio         Whether to maintain original image aspect ratio     true
topk                 Maximum number of detections returned per frame     10
detection_threshold  Minimum confidence to qualify as a valid detection  0.7
decode_type          Postprocessing decode method used                   "centernet"

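To make the normalize, channel_mean, and channel_stddev settings concrete, the following sketch applies per-channel normalization to a single pixel. It assumes that 8-bit pixel values are first scaled to [0, 1] and that the channel order matches the order of the configured lists. Both are assumptions, not something the configuration states, and on the device this step is performed by the pipeline itself. The function name is hypothetical.

```python
CHANNEL_MEAN = [0.408, 0.447, 0.470]
CHANNEL_STDDEV = [0.289, 0.274, 0.278]


def normalize_pixel(pixel, mean=CHANNEL_MEAN, stddev=CHANNEL_STDDEV):
    """Scale an 8-bit pixel to [0, 1], then apply (x - mean) / stddev per channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(pixel, mean, stddev)]


# A mid-gray pixel close to the configured means maps to values near zero.
normalized = normalize_pixel([104, 114, 120])
print([round(v, 3) for v in normalized])
```
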
Main Python Script

The script performs the following operations:

  1. Loads configuration from project.yaml.

  2. Initializes a VideoReader for the RTSP stream and a VideoWriter for UDP output.

  3. Loads the detection model with a SiMa MLSoCSession.

  4. Continuously:

    • Reads a frame

    • Runs inference

    • Annotates detected people

    • Streams the annotated frame over UDP
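
The loop in step 4 can be sketched as follows. To keep the example runnable without the device, it uses small stand-in classes (FakeReader, FakeSession, FakeWriter) in place of the VideoReader, MLSoCSession, and VideoWriter objects. Their method names (read, run, write) and the annotate helper are hypothetical and are not taken from the PePPi API.

```python
class FakeReader:
    """Stand-in for a video source; hands out a fixed list of dummy frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        # Returns (ok, frame); ok is False once the source is exhausted.
        if not self._frames:
            return False, None
        return True, self._frames.pop(0)


class FakeSession:
    """Stand-in for the inference session; returns one fixed detection."""

    def run(self, frame):
        return [{"label": "person", "score": 0.9, "box": (10, 20, 50, 80)}]


class FakeWriter:
    """Stand-in for the UDP output; stores frames instead of sending them."""

    def __init__(self):
        self.sent = []

    def write(self, frame):
        self.sent.append(frame)


def annotate(frame, detections):
    # A real pipeline would draw boxes and labels; here they are attached as data.
    return {"frame": frame, "detections": detections}


def run_pipeline(reader, session, writer):
    """Read, infer, annotate, and stream until the reader runs out of frames."""
    count = 0
    while True:
        ok, frame = reader.read()
        if not ok:
            break
        detections = session.run(frame)
        writer.write(annotate(frame, detections))
        count += 1
    return count


writer = FakeWriter()
processed = run_pipeline(FakeReader(["frame0", "frame1"]), FakeSession(), writer)
print(processed)  # prints 2
```

Passing the reader, session, and writer in as arguments keeps the loop independent of any particular device API, so the same structure can be exercised with stand-ins on a development machine.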

The application is packaged using mpk create and deployed to the target device using SiMa’s standard flow.

Model Details

  • Download from here.

  • Model Type: CenterNet-based

  • Target: People detection

  • Normalization:

    • Mean: [0.408, 0.447, 0.470]

    • Stddev: [0.289, 0.274, 0.278]

  • Detection Confidence Threshold: 0.7

  • Output: Up to 10 person detections per frame (topk = 10)
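
The interaction between the confidence threshold and the top-k limit can be illustrated with a short sketch. It assumes detections are (label, score) pairs, that the threshold comparison is inclusive, and that filtering happens before the top-k cut. The actual CenterNet decode on the MLSoC may order these steps differently. The function name is hypothetical.

```python
def select_detections(detections, threshold=0.7, topk=10):
    """Keep detections scoring at least `threshold`, best first, at most `topk`."""
    kept = [d for d in detections if d[1] >= threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:topk]


candidates = [("person", 0.95), ("person", 0.65), ("person", 0.72)]
print(select_detections(candidates))  # prints [('person', 0.95), ('person', 0.72)]
```

With these settings a frame never yields more than 10 detections, and it may yield fewer (or none) when few candidates reach a confidence of 0.7.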