Effdet on RTSP Stream

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the EfficientDet (Effdet) model.

Purpose

The primary goal of this pipeline is to showcase how to:

- Read live video data from an RTSP source.
- Perform real-time object detection inference using the SiMa MLSoC and the EfficientDet model.
- Render and annotate bounding boxes with class labels.
- Stream the output video over UDP for further visualization or processing.

All inference runs on SiMa’s MLSoC hardware, which provides the high throughput and low latency that edge AI applications need.

Configuration Overview

The application is configured via `project.yaml`. Its parameters are described below.

Input/Output Configuration

| Parameter | Description | Example |
| --- | --- | --- |
| `source.name` | Input type for the video stream | `"rtspsrc"` |
| `source.value` | RTSP stream URL | `"<RTSP_URL>"` |
| `udp_host` | Host IP to stream the output via UDP | `"<HOST_IP>"` |
| `port` | UDP port number for output stream | `"<PORT_NUM>"` |
| `pipeline` | Pipeline type used for inference | `"EffDetPipeline"` |
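
As a minimal sketch of how these parameters fit together, the snippet below holds the input/output settings in a Python dictionary and checks that none are missing. The nesting of `source.name` and `source.value` under a `source` key is an assumption based on the dotted names in the table, and `missing_keys` is a hypothetical helper, not part of PePPi.

```python
# Hypothetical in-memory form of the I/O part of project.yaml.
# Assumption: the dotted names in the table map to a nested "source" mapping.
io_config = {
    "source": {"name": "rtspsrc", "value": "<RTSP_URL>"},
    "udp_host": "<HOST_IP>",
    "port": "<PORT_NUM>",
    "pipeline": "EffDetPipeline",
}

REQUIRED_KEYS = ("source.name", "source.value", "udp_host", "port", "pipeline")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted keys from `required` that are absent from `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            # Walk one level down per dotted component; stop at the first gap.
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


print(missing_keys(io_config))
```

Running a check like this before starting the pipeline surfaces a missing key at startup instead of when the stream is first opened.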

Model Configuration (`Models[0]`)

| Parameter | Description | Example |
| --- | --- | --- |
| `name` | Name of the model | `"Effdet"` |
| `targz` | Path to the compressed model archive | `"<targz_path>"` |
| `normalize` | Whether to normalize image input | `true` |
| `aspect_ratio` | Whether to maintain original aspect ratio during preprocessing | `true` |
| `channel_mean` | Per-channel mean for input normalization | `[0.485, 0.456, 0.406]` |
| `channel_stddev` | Per-channel stddev for input normalization | `[0.229, 0.224, 0.225]` |
| `decode_type` | Postprocessing decode type used with Effdet | `"effdet"` |
| `detection_threshold` | Minimum confidence score to retain a detection | `0.3` |
| `scaled_width` | Width to scale image before inference | `512` |
| `scaled_height` | Height to scale image before inference | `288` |
| `label_file` | Path to the label file with class names | `"labels.txt"` |
| `padding_type` | Type of padding used before inference | `"CENTER"` |
| `topk` | Maximum number of detections returned per frame | `10` |
| `num_classes` | Number of object classes supported by the model | `90` |
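
To make the preprocessing parameters concrete, here is a rough sketch of what `aspect_ratio`, `padding_type: "CENTER"`, `scaled_width`/`scaled_height`, and the normalization values imply arithmetically. It does not show PePPi's implementation: both function names are hypothetical, and scaling pixels to [0, 1] before subtracting the mean is an assumption based on the magnitude of the configured means.

```python
def letterbox_geometry(src_w, src_h, dst_w=512, dst_h=288):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving
    resize into dst_w x dst_h with the image centered in the padding."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(rgb,
                    mean=(0.485, 0.456, 0.406),
                    stddev=(0.229, 0.224, 0.225)):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.
    Assumption: the configured mean/stddev apply to values in [0, 1]."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, stddev))


# A 16:9 frame fills the 512x288 input exactly; a 4:3 frame is padded left/right.
print(letterbox_geometry(1920, 1080))
print(letterbox_geometry(640, 480))
```

With these defaults, a 16:9 source needs no padding, while other aspect ratios are padded evenly on the two sides of the shorter dimension.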

Main Python Script

The Python script performs the following steps:

- Loads configuration from `project.yaml`.
- Initializes a video reader and writer using the PePPi API.
- Loads the Effdet model via `MLSoCSession` and configures it.
- Continuously reads frames, performs inference, renders detection results, and streams annotated video via UDP.
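
The control flow of the steps above can be sketched without committing to any particular API. In the snippet below, the four callables are hypothetical stand-ins, not PePPi functions. In the real script they would wrap the PePPi video reader, the `MLSoCSession` inference call, the box renderer, and the UDP video writer, whose exact method names are not shown here.

```python
def run_detection_loop(read_frame, infer, render, write_frame, max_frames=None):
    """Read, infer, render, and write frames until the source ends.

    All four callables are hypothetical stand-ins:
      read_frame()              -> a frame, or None when the stream ends
      infer(frame)              -> a list of detections
      render(frame, detections) -> an annotated frame
      write_frame(frame)        -> None
    Returns the number of frames processed.
    """
    processed = 0
    while max_frames is None or processed < max_frames:
        frame = read_frame()
        if frame is None:  # stream closed or no more data
            break
        detections = infer(frame)
        write_frame(render(frame, detections))
        processed += 1
    return processed


# Dry run with fake components in place of the camera, model, and UDP sink.
fake_frames = iter(["frame0", "frame1", "frame2"])
sent = []
count = run_detection_loop(
    read_frame=lambda: next(fake_frames, None),
    infer=lambda frame: [("person", 0.9)],
    render=lambda frame, dets: (frame, dets),
    write_frame=sent.append,
)
print(count, "frames processed")
```

Keeping the loop independent of the concrete reader and writer makes it possible to exercise the logic on a development machine before deploying to the device.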

The application is packaged using `mpk create` and deployed to the target device through SiMa’s deployment workflow.

Model Details

Download from here.
- Model: EfficientDet (Effdet)
- Input Normalization:
  - Mean: [0.485, 0.456, 0.406]
  - Stddev: [0.229, 0.224, 0.225]
- Detection Threshold: 0.3
- Max Output Per Frame: Top 10 detections
- Input Resolution: 512×288 (scaled)
- Bounding Boxes: Rendered using SimaBoxRender
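
For illustration, the combined effect of the detection threshold and the top-10 limit can be sketched as a simple filter. This is not the pipeline's decode step. The detection format (a dict with a `score` key) is hypothetical, and two details are assumptions: that a score equal to the threshold is kept, and that thresholding happens before the top-k cut.

```python
def filter_detections(detections, threshold=0.3, topk=10):
    """Keep detections scoring at least `threshold`, highest score first,
    capped at `topk` entries. Detection dicts are a hypothetical format."""
    kept = [d for d in detections if d["score"] >= threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:topk]


raw = [
    {"label": "person", "score": 0.92},
    {"label": "dog", "score": 0.12},   # below threshold, dropped
    {"label": "car", "score": 0.47},
]
for det in filter_detections(raw):
    print(det["label"], det["score"])
```

Raising the threshold trades recall for fewer false positives, while `topk` bounds how many boxes are rendered per frame regardless of how many pass the threshold.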