YOLOv8 on RTSP Stream

This project demonstrates how to use the SiMa PePPi API to run real-time object detection using the YOLOv8 model on a live RTSP stream. The pipeline is optimized for edge inference on SiMa’s MLSoC, and streams annotated video frames via UDP.

Purpose

This pipeline is designed to:

  • Read video from an RTSP stream using rtspsrc.

  • Run detection using the YOLOv8 model on SiMa MLSoC.

  • Annotate frames with bounding boxes and class labels.

  • Stream the output frames over UDP for real-time visualization.

This setup is ideal for evaluating high-performance object detection in edge AI deployments.
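To view the annotated UDP stream on the receiving host, a GStreamer pipeline along these lines is typically used. Note this is a hedged sketch: the exact caps depend on how the pipeline encodes its output, so the H.264-over-RTP settings below are an assumption, and <PORT_NUM> must match the port configured in project.yaml.

```shell
# Hypothetical receiver pipeline; adjust the RTP caps to match the
# sender's actual output encoding.
gst-launch-1.0 udpsrc port=<PORT_NUM> \
  ! 'application/x-rtp,media=video,encoding-name=H264,payload=96' \
  ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink
```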

Configuration Overview

The application is driven by project.yaml. The parameters below describe its structure.

Input/Output Configuration

| Parameter | Description | Example |
| --- | --- | --- |
| source.name | Input source type | "rtspsrc" |
| source.value | RTSP stream URL | "<RTSP_URL>" |
| udp_host | Destination IP for the UDP stream | "<HOST_IP>" |
| port | Destination port for the UDP stream | "<PORT_NUM>" |
| pipeline | Processing pipeline name | "YoloV8" |

Model Configuration (Models[0])

| Parameter | Description | Value |
| --- | --- | --- |
| name | Model identifier | "YOLO" |
| targz | Compressed model archive path | "<targz_path>" |
| label_file | Path to label file | "labels.txt" |
| normalize | Apply input normalization | true |
| channel_mean | Input channel mean values | [0.0, 0.0, 0.0] |
| channel_stddev | Input channel stddev values | [1.0, 1.0, 1.0] |
| padding_type | Padding type during preprocessing | "CENTER" |
| aspect_ratio | Maintain input aspect ratio | true |
| topk | Max number of detections per frame | 10 |
| detection_threshold | Score threshold for valid detections | 0.7 |
| nms_iou_threshold | IoU threshold for non-max suppression | 0.3 |
| decode_type | Detection decoding strategy | "yolo" |
| num_classes | Number of classes the model can detect | 87 |
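Putting the parameters above together, a project.yaml for this pipeline might look like the following. The exact schema is defined by the PePPi API; the nesting below is an assumption inferred from the parameter names (source.name, Models[0], etc.), and the placeholder values (<RTSP_URL>, <HOST_IP>, <PORT_NUM>, <targz_path>) must be replaced with real values.

```yaml
# Sketch of project.yaml; field nesting is an assumption based on the
# parameter names documented above.
source:
  name: "rtspsrc"        # input source type
  value: "<RTSP_URL>"    # RTSP stream URL
udp_host: "<HOST_IP>"    # destination IP for the UDP output stream
port: "<PORT_NUM>"       # destination port for the UDP output stream
pipeline: "YoloV8"       # processing pipeline name

Models:
  - name: "YOLO"
    targz: "<targz_path>"
    label_file: "labels.txt"
    normalize: true
    channel_mean: [0.0, 0.0, 0.0]
    channel_stddev: [1.0, 1.0, 1.0]
    padding_type: "CENTER"
    aspect_ratio: true
    topk: 10
    detection_threshold: 0.7
    nms_iou_threshold: 0.3
    decode_type: "yolo"
    num_classes: 87
```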

Main Python Script

The Python script executes the following steps:

  1. Loads project.yaml.

  2. Initializes a VideoReader for RTSP input and a VideoWriter for UDP output.

  3. Sets up the YOLOv8 model using an MLSoCSession configured for the SiMa MLSoC.

  4. In a loop:

    • Reads an input frame.

    • Optionally dumps it to disk for debugging (/tmp/nv12.out).

    • Runs the model.

    • Renders detection results onto the frame.

    • Streams the annotated frame via UDP.
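The steps above can be sketched as a simple read/run/write loop. The real script uses the PePPi VideoReader, VideoWriter, and MLSoCSession classes; since their exact constructors and method names are not shown here, the classes below are minimal stand-ins that only illustrate the control flow, not the actual API.

```python
# Structural sketch of the main loop. VideoReader, VideoWriter and
# MLSoCSession are stand-ins for the PePPi classes of the same name;
# the real signatures may differ.

class VideoReader:
    """Stand-in: yields a fixed number of fake NV12 frames."""
    def __init__(self, url, frames=3):
        self.frames = [b"nv12-frame-%d" % i for i in range(frames)]
    def read(self):
        return self.frames.pop(0) if self.frames else None

class MLSoCSession:
    """Stand-in: 'runs' the model and returns dummy detections."""
    def __init__(self, model_config):
        self.config = model_config
    def run(self, frame):
        return [{"label": "person", "score": 0.9, "box": (0, 0, 10, 10)}]

class VideoWriter:
    """Stand-in: collects frames instead of streaming them over UDP."""
    def __init__(self, host, port):
        self.sent = []
    def write(self, frame):
        self.sent.append(frame)

def annotate(frame, detections):
    # Stand-in for bounding-box and label rendering.
    return (frame, tuple(d["label"] for d in detections))

def main_loop(reader, session, writer):
    while True:
        frame = reader.read()                      # read an input frame
        if frame is None:
            break
        detections = session.run(frame)            # run the model
        writer.write(annotate(frame, detections))  # render + stream

reader = VideoReader("rtsp://<RTSP_URL>")
writer = VideoWriter("<HOST_IP>", 5000)
main_loop(reader, MLSoCSession({"name": "YOLO"}), writer)
print(len(writer.sent))  # → 3
```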

Model Details

  • Download from here.

  • Model: YOLOv8

  • Input Format: NV12

  • Normalization: Yes (mean = [0.0, 0.0, 0.0], stddev = [1.0, 1.0, 1.0])

  • Thresholds:

    • Detection: 0.7

    • NMS IOU: 0.3

  • Output: Up to 10 detections per frame

  • Classes: 87 object categories
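The thresholds above interact in the usual detection post-processing order: scores below detection_threshold (0.7) are dropped, overlapping boxes are suppressed with an IoU cutoff of 0.3, and at most topk (10) detections survive per frame. A minimal sketch of that logic follows; this is a generic reimplementation for illustration, not the MLSoC's actual decoder.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def postprocess(dets, score_thr=0.7, iou_thr=0.3, topk=10):
    """Filter by score, apply greedy NMS, keep at most topk detections."""
    dets = sorted((d for d in dets if d["score"] >= score_thr),
                  key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) <= iou_thr for k in kept):
            kept.append(d)
        if len(kept) == topk:
            break
    return kept

dets = [
    {"score": 0.95, "box": (0, 0, 10, 10)},
    {"score": 0.90, "box": (1, 1, 11, 11)},    # overlaps the first box
    {"score": 0.80, "box": (50, 50, 60, 60)},
    {"score": 0.50, "box": (80, 80, 90, 90)},  # below detection_threshold
]
print(len(postprocess(dets)))  # → 2
```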