YOLOv8 on RTSP Stream

This project demonstrates how to use the SiMa PePPi API to run real-time object detection using the YOLOv8 model on a live RTSP stream. The pipeline is optimized for edge inference on SiMa’s MLSoC, and streams annotated video frames via UDP.

Purpose

This pipeline is designed to:

  • Read video from an RTSP stream using rtspsrc.

  • Run detection using the YOLOv8 model on SiMa MLSoC.

  • Annotate frames with bounding boxes and class labels.

  • Stream the output frames over UDP for real-time visualization.

This setup is ideal for evaluating high-performance object detection in edge AI deployments.
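To view the annotated UDP stream on the receiving host, a GStreamer pipeline along these lines is typically used. Note this is a hedged sketch: the exact caps depend on how the pipeline encodes its output, so the H.264-over-RTP settings below are an assumption, and <PORT_NUM> must match the port configured in project.yaml.

```shell
# Hypothetical receiver pipeline; adjust the RTP caps to match the
# sender's actual output encoding.
gst-launch-1.0 udpsrc port=<PORT_NUM> \
  ! 'application/x-rtp,media=video,encoding-name=H264,payload=96' \
  ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink
```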

Configuration Overview

The application is driven by project.yaml. The parameters below describe its structure.

Input/Output Configuration

| Parameter | Description | Example |
| --- | --- | --- |
| source.name | Input source type | "rtspsrc" |
| source.value | RTSP stream URL | "<RTSP_URL>" |
| udp_host | Destination IP for the UDP stream | "<HOST_IP>" |
| port | Destination port for the UDP stream | "<PORT_NUM>" |
| pipeline | Processing pipeline name | "YoloV8" |

Model Configuration (Models[0])

| Parameter | Description | Value |
| --- | --- | --- |
| name | Model identifier | "YOLO" |
| targz | Compressed model archive path | "<targz_path>" |
| label_file | Path to label file | "labels.txt" |
| normalize | Apply input normalization | true |
| channel_mean | Input channel mean values | [0.0, 0.0, 0.0] |
| channel_stddev | Input channel stddev values | [1.0, 1.0, 1.0] |
| padding_type | Padding type during preprocessing | "CENTER" |
| aspect_ratio | Maintain input aspect ratio | true |
| topk | Max number of detections per frame | 10 |
| detection_threshold | Score threshold for valid detections | 0.7 |
| nms_iou_threshold | IoU threshold for non-max suppression | 0.3 |
| decode_type | Detection decoding strategy | "yolo" |
| num_classes | Number of classes the model can detect | 87 |
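Putting the parameters above together, a project.yaml for this pipeline might look like the following. The exact schema is defined by the PePPi API; the nesting below is an assumption inferred from the parameter names (source.name, Models[0], etc.), and the placeholder values (<RTSP_URL>, <HOST_IP>, <PORT_NUM>, <targz_path>) must be replaced with real values.

```yaml
# Sketch of project.yaml; field nesting is an assumption based on the
# parameter names documented above.
source:
  name: "rtspsrc"        # input source type
  value: "<RTSP_URL>"    # RTSP stream URL
udp_host: "<HOST_IP>"    # destination IP for the UDP output stream
port: "<PORT_NUM>"       # destination port for the UDP output stream
pipeline: "YoloV8"       # processing pipeline name

Models:
  - name: "YOLO"
    targz: "<targz_path>"
    label_file: "labels.txt"
    normalize: true
    channel_mean: [0.0, 0.0, 0.0]
    channel_stddev: [1.0, 1.0, 1.0]
    padding_type: "CENTER"
    aspect_ratio: true
    topk: 10
    detection_threshold: 0.7
    nms_iou_threshold: 0.3
    decode_type: "yolo"
    num_classes: 87
```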

Main Python Script

The Python script executes the following steps:

  1. Loads project.yaml.

  2. Initializes a VideoReader for RTSP input and a VideoWriter for UDP output.

  3. Sets up the YOLOv8 model using an MLSoCSession configured for the SiMa MLSoC.

  4. In a loop:

    • Reads an input frame.

    • Optionally dumps it to disk for debugging (/tmp/nv12.out).

    • Runs the model.

    • Renders detection results onto the frame.

    • Streams the annotated frame via UDP.
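The steps above can be sketched as a simple read/run/write loop. The real script uses the PePPi VideoReader, VideoWriter, and MLSoCSession classes; since their exact constructors and method names are not shown here, the classes below are minimal stand-ins that only illustrate the control flow, not the actual API.

```python
# Structural sketch of the main loop. VideoReader, VideoWriter and
# MLSoCSession are stand-ins for the PePPi classes of the same name;
# the real signatures may differ.

class VideoReader:
    """Stand-in: yields a fixed number of fake NV12 frames."""
    def __init__(self, url, frames=3):
        self.frames = [b"nv12-frame-%d" % i for i in range(frames)]
    def read(self):
        return self.frames.pop(0) if self.frames else None

class MLSoCSession:
    """Stand-in: 'runs' the model and returns dummy detections."""
    def __init__(self, model_config):
        self.config = model_config
    def run(self, frame):
        return [{"label": "person", "score": 0.9, "box": (0, 0, 10, 10)}]

class VideoWriter:
    """Stand-in: collects frames instead of streaming them over UDP."""
    def __init__(self, host, port):
        self.sent = []
    def write(self, frame):
        self.sent.append(frame)

def annotate(frame, detections):
    # Stand-in for bounding-box and label rendering.
    return (frame, tuple(d["label"] for d in detections))

def main_loop(reader, session, writer):
    while True:
        frame = reader.read()                      # read an input frame
        if frame is None:
            break
        detections = session.run(frame)            # run the model
        writer.write(annotate(frame, detections))  # render + stream

reader = VideoReader("rtsp://<RTSP_URL>")
writer = VideoWriter("<HOST_IP>", 5000)
main_loop(reader, MLSoCSession({"name": "YOLO"}), writer)
print(len(writer.sent))  # → 3
```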

Model Details

  • Download from here.

  • Model: YOLOv8

  • Input Format: NV12

  • Normalization: Yes (mean = [0.0, 0.0, 0.0], stddev = [1.0, 1.0, 1.0])

  • Thresholds:

    • Detection: 0.7

    • NMS IOU: 0.3

  • Output: Up to 10 detections per frame

  • Classes: 87 object categories
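The thresholds above interact in the usual detection post-processing order: scores below detection_threshold (0.7) are dropped, overlapping boxes are suppressed with an IoU cutoff of 0.3, and at most topk (10) detections survive per frame. A minimal sketch of that logic follows; this is a generic reimplementation for illustration, not the MLSoC's actual decoder.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def postprocess(dets, score_thr=0.7, iou_thr=0.3, topk=10):
    """Filter by score, apply greedy NMS, keep at most topk detections."""
    dets = sorted((d for d in dets if d["score"] >= score_thr),
                  key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) <= iou_thr for k in kept):
            kept.append(d)
        if len(kept) == topk:
            break
    return kept

dets = [
    {"score": 0.95, "box": (0, 0, 10, 10)},
    {"score": 0.90, "box": (1, 1, 11, 11)},    # overlaps the first box
    {"score": 0.80, "box": (50, 50, 60, 60)},
    {"score": 0.50, "box": (80, 80, 90, 90)},  # below detection_threshold
]
print(len(postprocess(dets)))  # → 2
```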