YOLOv7 on RTSP Stream

This project demonstrates the use of SiMa’s PePPi API to run real-time object detection on a live RTSP video stream using the YOLOv7 model. The application uses SiMa’s MLSoC hardware for accelerated inference and streams the annotated output via UDP.

Purpose

This pipeline is designed to:

  • Capture live video from an RTSP source.

  • Perform object detection using the YOLOv7 model.

  • Annotate frames with bounding boxes and labels.

  • Stream the results to a specified host via UDP.

This setup is ideal for edge inference applications requiring high-speed, low-latency visual processing.

Configuration Overview

The runtime configuration is managed through project.yaml. The tables below describe the input/output and model configuration; an illustrative snippet follows each table.

Input/Output Configuration

| Parameter    | Description                        | Example          |
|--------------|------------------------------------|------------------|
| source.name  | Type of input source               | "rtspsrc"        |
| source.value | RTSP video stream URL              | "<RTSP_URL>"     |
| udp_host     | Destination host IP for UDP output | "<HOST_IP>"      |
| port         | UDP port number                    | "<PORT_NUM>"     |
| pipeline     | Pipeline name for inference        | "yoloV7Pipeline" |

Model Configuration (Models[0])

| Parameter           | Description                                         | Value           |
|---------------------|-----------------------------------------------------|-----------------|
| name                | Model identifier                                    | "yolov7"        |
| targz               | Path to the YOLOv7 model archive                    | "<targz_path>"  |
| label_file          | Class label file path                               | "labels.txt"    |
| normalize           | Whether to apply normalization                      | true            |
| channel_mean        | Input channel mean values                           | [0.0, 0.0, 0.0] |
| channel_stddev      | Input channel stddev values                         | [1.0, 1.0, 1.0] |
| padding_type        | Input padding type                                  | "CENTER"        |
| aspect_ratio        | Maintain original aspect ratio during preprocessing | true            |
| topk                | Max number of detections returned per frame         | 10              |
| detection_threshold | Confidence threshold for detections                 | 0.7             |
| decode_type         | Decode method used during postprocessing            | "yolo"          |

Main Python Script

The Python script does the following (see the sketch after this list):

  1. Loads project.yaml to read configuration.

  2. Initializes a VideoReader for RTSP input and a VideoWriter for UDP output.

  3. Sets up a YOLOv7 inference session using MLSoCSession.

  4. In a loop:

    • Captures a video frame.

    • Runs inference.

    • Renders bounding boxes and class labels.

    • Sends the annotated frame to the UDP endpoint.
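
A minimal sketch of this flow is shown below. The class names (VideoReader, VideoWriter, MLSoCSession, SimaBoxRender) are the ones referenced in this project, but the import path, method names, and argument lists are assumptions and may differ from the actual PePPi API.

```python
# Hedged sketch only: the import path, method names, and signatures below
# are assumptions; consult the PePPi API reference for the real interface.
import yaml

# Hypothetical module path; the real PePPi package layout may differ.
from sima import VideoReader, VideoWriter, MLSoCSession, SimaBoxRender

# 1. Load the runtime configuration.
with open("project.yaml") as f:
    cfg = yaml.safe_load(f)
model_cfg = cfg["Models"][0]

# 2. Set up RTSP input and UDP output (argument names are illustrative).
reader = VideoReader(cfg["source"]["name"], cfg["source"]["value"])
writer = VideoWriter(cfg["udp_host"], cfg["port"])

# 3. Create the YOLOv7 inference session on the MLSoC.
session = MLSoCSession(model_cfg["targz"], pipeline=cfg["pipeline"])

# 4. Capture -> infer -> render -> stream loop.
while True:
    frame = reader.read()               # grab the next RTSP frame
    if frame is None:
        break
    detections = session.run(frame)     # run YOLOv7 inference
    annotated = SimaBoxRender.render(   # draw bounding boxes and labels
        frame, detections, model_cfg["label_file"])
    writer.write(annotated)             # push the annotated frame over UDP
```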

The application is packaged with mpk create and deployed to the target device using SiMa's deployment tools.

Model Details

  • Download from here.

  • Model: YOLOv7

  • Normalize Input: Yes (mean: [0.0, 0.0, 0.0], stddev: [1.0, 1.0, 1.0])

  • Detection Threshold: 0.7

  • Output: Top 10 detections per frame

  • Bounding Box Rendering: SimaBoxRender