Effdet on RTSP Stream
This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the EfficientDet (Effdet) model.
Purpose
The primary goal of this pipeline is to showcase how to:
Read live video data from an RTSP source.
Perform real-time object detection inference using the SiMa MLSoC and the EfficientDet model.
Render and annotate bounding boxes with class labels.
Stream the output video over UDP for further visualization or processing.
All inference runs on SiMa’s MLSoC hardware, which provides high throughput and low latency suited to edge AI applications.
Configuration Overview
The application is configured via project.yaml. Below is a breakdown of its parameters.
Input/Output Configuration
| Parameter | Description | Example |
|---|---|---|
| | Input type for the video stream | |
| | RTSP stream URL | |
| | Host IP to stream the output via UDP | |
| | UDP port number for output stream | |
| | Pipeline type used for inference | |
Model Configuration (Models[0])
| Parameter | Description | Example |
|---|---|---|
| | Name of the model | |
| | Path to the compressed model archive | |
| | Whether to normalize image input | |
| | Whether to maintain original aspect ratio during preprocessing | |
| | Per-channel mean for input normalization | |
| | Per-channel stddev for input normalization | |
| | Postprocessing decode type used with Effdet | |
| | Minimum confidence score to retain a detection | |
| | Width to scale image before inference | |
| | Height to scale image before inference | |
| | Path to the label file with class names | |
| | Type of padding used before inference | |
| | Maximum number of detections returned per frame | |
| | Number of object classes supported by the model | |
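The preprocessing parameters above (aspect-ratio handling, padding, and per-channel normalization) can be illustrated with a small sketch. This is not the PePPi implementation: the function names are hypothetical, centered padding is an assumption, and dividing by 255 before applying the mean and stddev is an assumption based on the common ImageNet convention. The numeric values come from the Model Details section.

```python
# Illustrative sketch only; PePPi performs these steps internally.
MEAN = (0.485, 0.456, 0.406)  # per-channel mean (from Model Details)
STD = (0.229, 0.224, 0.225)   # per-channel stddev (from Model Details)


def letterbox_geometry(src_w, src_h, dst_w=512, dst_h=288):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    Assumption: the image is centered and the remainder is padded.
    The actual padding type is whatever project.yaml selects.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    Assumption: inputs are divided by 255 before mean/stddev are applied.
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))
```

For example, a 1920×1080 frame already matches the 512×288 aspect ratio and needs no padding, while a 640×480 frame is scaled to 384×288 and padded horizontally.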
Main Python Script
The Python script performs the following steps:
Loads configuration from project.yaml.
Initializes a video reader and writer using the PePPi API.
Loads the Effdet model via MLSoCSession and configures it.
Continuously reads frames, performs inference, renders detection results, and streams annotated video via UDP.
The application is packaged using mpk create and deployed to the target device through SiMa’s deployment workflow.
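The control flow of the steps listed above can be sketched as a read, infer, render, write loop. The classes below (FakeReader, FakeSession, FakeWriter) are hypothetical stand-ins written for this example so that it runs anywhere; they are not the PePPi API, and the real reader, MLSoCSession, and writer may expose different method names and signatures.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float
    box: tuple  # (x, y, width, height)


class FakeReader:
    """Hypothetical stand-in for a video reader; yields n_frames frames."""

    def __init__(self, n_frames):
        self._frames = iter(range(n_frames))

    def read(self):
        # Returns None once the stream is exhausted.
        return next(self._frames, None)


class FakeSession:
    """Hypothetical stand-in for the inference session."""

    def run(self, frame):
        return [Detection("person", 0.9, (0, 0, 10, 10))]


class FakeWriter:
    """Hypothetical stand-in for the UDP writer; records what it is given."""

    def __init__(self):
        self.sent = []

    def write(self, frame):
        self.sent.append(frame)


def annotate(frame, detections):
    """Placeholder for box rendering: pairs the frame with its labels."""
    return (frame, [d.label for d in detections])


def run_pipeline(reader, session, writer):
    """Read frames until the source ends; return the number processed."""
    count = 0
    while True:
        frame = reader.read()
        if frame is None:
            break
        detections = session.run(frame)
        writer.write(annotate(frame, detections))
        count += 1
    return count
```

Keeping the loop independent of the concrete reader, session, and writer objects makes it possible to exercise the control flow on a development machine before deploying to the target device.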
Model Details
Download from here.
Model: EfficientDet (Effdet)
Input Normalization:
Mean: [0.485, 0.456, 0.406]
Stddev: [0.229, 0.224, 0.225]
Detection Threshold: 0.3
Max Output Per Frame: Top 10 detections
Input Resolution: 512×288 (scaled)
Bounding Boxes: Rendered using SimaBoxRender
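The detection threshold and per-frame cap listed above amount to a filter followed by a top-k selection. The sketch below shows that logic in plain Python, assuming each detection is a dictionary with a "score" key; the function name and data layout are hypothetical, and on the device this step is handled by the configured postprocessing rather than by application code.

```python
def select_detections(detections, threshold=0.3, max_detections=10):
    """Keep detections scoring at least `threshold`, best first, capped at
    `max_detections`. The defaults mirror the values in Model Details."""
    kept = [d for d in detections if d["score"] >= threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:max_detections]
```

With the default settings, a frame that produces 25 candidate boxes, 14 of which score 0.3 or higher, would yield the 10 highest-scoring of those 14.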