YOLOv7 on RTSP Stream
This project demonstrates the use of SiMa’s PePPi API to run real-time object detection on a live RTSP video stream using the YOLOv7 model. The application uses SiMa’s MLSoC hardware for accelerated inference and streams the annotated output via UDP.
Purpose
This pipeline is designed to:
Capture live video from an RTSP source.
Perform object detection using the YOLOv7 model.
Annotate frames with bounding boxes and labels.
Stream the results to a specified host via UDP.
This setup is ideal for edge inference applications requiring high-speed, low-latency visual processing.
Configuration Overview
The runtime configuration is managed through project.yaml. The following tables explain the input/output and model configuration.
Input/Output Configuration
| Parameter | Description | Example |
|---|---|---|
|  | Type of input source |  |
|  | RTSP video stream URL |  |
|  | Destination host IP for UDP output |  |
|  | UDP port number |  |
|  | Pipeline name for inference |  |
Model Configuration (Models[0])
| Parameter | Description | Value |
|---|---|---|
|  | Model identifier |  |
|  | Path to the YOLOv7 model archive |  |
|  | Class label file path |  |
|  | Whether to apply normalization |  |
|  | Input channel mean values |  |
|  | Input channel stddev values |  |
|  | Input padding type |  |
|  | Maintain original aspect ratio during preprocessing |  |
|  | Max number of detections returned per frame |  |
|  | Confidence threshold for detections |  |
|  | Decode method used during postprocessing |  |
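As a rough illustration of what the normalization, padding, and aspect-ratio settings control, the sketch below computes a normalized pixel and the geometry of an aspect-preserving resize. It is not PePPi code: the function names, the 640x640 target size, the centered padding, and the scaling of pixel values to [0, 1] before applying mean and stddev are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def normalize_pixel(
    rgb: Sequence[int],
    mean: Sequence[float] = (0.0, 0.0, 0.0),
    stddev: Sequence[float] = (1.0, 1.0, 1.0),
) -> Tuple[float, ...]:
    """Scale 8-bit channel values to [0, 1], then apply (x - mean) / stddev.

    The division by 255 is an assumption about the preprocessing order.
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, stddev))


def letterbox_geometry(
    src_w: int, src_h: int, dst_w: int = 640, dst_h: int = 640
) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize.

    The frame is scaled by the largest factor that still fits inside the
    target, and the leftover space is split evenly (centered padding is an
    assumption, not a documented PePPi behavior).
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

Under these assumptions, a 1920x1080 frame mapped to a 640x640 input is resized to 640x360 and padded with 140 rows above and below. With mean [0.0, 0.0, 0.0] and stddev [1.0, 1.0, 1.0], normalization reduces to the [0, 1] scaling alone.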
Main Python Script
The Python script does the following:
Loads project.yaml to read the configuration.
Initializes a VideoReader for RTSP input and a VideoWriter for UDP output.
Sets up a YOLOv7 inference session using MLSoCSession.
In a loop:
Captures a video frame.
Runs inference.
Renders bounding boxes and class labels.
Sends the annotated frame to the UDP endpoint.
The application is packaged using mpk create and deployed to the target device using SiMa’s deployment tools.
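The loop described above can be sketched in a library-agnostic way. The version below deliberately avoids calling PePPi directly: run_pipeline and its four parameters are hypothetical stand-ins for the roles played by VideoReader (frames), MLSoCSession (infer), SimaBoxRender (render), and VideoWriter (send). The real class constructors and method names should be taken from the PePPi documentation.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_pipeline(
    frames: Iterable[Any],
    infer: Callable[[Any], List[Any]],
    render: Callable[[Any, List[Any]], Any],
    send: Callable[[Any], None],
    max_frames: Optional[int] = None,
) -> int:
    """Capture -> infer -> render -> send, one frame at a time.

    All four callables are placeholders for the real reader, session,
    renderer, and writer objects. Returns the number of frames processed.
    """
    processed = 0
    for frame in frames:
        # Optional cap so the sketch can be exercised without a live stream.
        if max_frames is not None and processed >= max_frames:
            break
        detections = infer(frame)              # run the model on the frame
        annotated = render(frame, detections)  # draw boxes and labels
        send(annotated)                        # push to the UDP endpoint
        processed += 1
    return processed


if __name__ == "__main__":
    sent = []
    count = run_pipeline(
        frames=["frame-0", "frame-1", "frame-2"],   # stand-in for RTSP frames
        infer=lambda frame: [("person", 0.9)],      # stand-in for inference
        render=lambda frame, dets: (frame, dets),   # stand-in for rendering
        send=sent.append,                           # stand-in for UDP output
    )
    print(count, len(sent))
```

Keeping the four stages as separate callables makes it possible to test the control flow with stubs before wiring in the hardware-backed objects.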
Model Details
Download from here.
Model: YOLOv7
Normalize Input: Yes (mean: [0.0, 0.0, 0.0], stddev: [1.0, 1.0, 1.0])
Detection Threshold: 0.7
Output: Top 10 detections per frame
Bounding Box Rendering: SimaBoxRender
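To make the detection threshold and the top-10 limit concrete, here is a minimal sketch of how such a filter could behave. The Detection type and select_detections function are hypothetical and not part of PePPi. The sketch also assumes that the threshold is inclusive and is applied before the top-k cut, which the source does not specify.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical detection record used only for this illustration."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def select_detections(
    detections: List[Detection],
    threshold: float = 0.7,
    top_k: int = 10,
) -> List[Detection]:
    """Keep detections scoring at least `threshold`, highest first, at most `top_k`."""
    kept = [d for d in detections if d.score >= threshold]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:top_k]
```

With these defaults, a frame that yields 15 candidates above 0.7 would still produce only the 10 highest-scoring boxes for rendering.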