Effdet on RTSP Stream

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the EfficientDet (Effdet) model.

Purpose

The primary goal of this pipeline is to showcase how to:

  • Read live video data from an RTSP source.

  • Perform real-time object detection inference using the SiMa MLSoC and the EfficientDet model.

  • Render and annotate bounding boxes with class labels.

  • Stream the output video over UDP for further visualization or processing.

All inference is accelerated on SiMa’s MLSoC hardware, enabling the high-throughput, low-latency performance that edge AI applications require.

Configuration Overview

The application is configured via project.yaml. Below is a breakdown of its parameters.

Input/Output Configuration

Parameter      Description                                  Example
source.name    Input type for the video stream              "rtspsrc"
source.value   RTSP stream URL                              "<RTSP_URL>"
udp_host       Host IP that receives the UDP output stream  "<HOST_IP>"
port           UDP port number for the output stream        "<PORT_NUM>"
pipeline       Pipeline type used for inference             "EffDetPipeline"
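
As a minimal sketch, the I/O keys above can be mirrored in a Python dictionary and sanity-checked before the pipeline starts. The nesting (source.name, source.value) follows the table; the concrete URL, IP, and port are hypothetical placeholders from the 192.0.2.0/24 documentation range, and check_io_config is an illustrative helper, not part of PePPi. Parsing the real project.yaml (for example with a YAML library) is left out so the example stays self-contained.

```python
from urllib.parse import urlparse

# Hypothetical mirror of the project.yaml I/O keys; all values are placeholders.
EXAMPLE_CONFIG = {
    "source": {"name": "rtspsrc", "value": "rtsp://192.0.2.10:8554/stream"},
    "udp_host": "192.0.2.20",
    "port": "5000",  # quoted, as in the table's example column
    "pipeline": "EffDetPipeline",
}


def check_io_config(cfg: dict) -> tuple:
    """Validate the I/O keys and return (rtsp_url, udp_host, udp_port)."""
    source = cfg["source"]
    if source["name"] != "rtspsrc":
        raise ValueError(f"unsupported source.name: {source['name']!r}")
    url = source["value"]
    if urlparse(url).scheme != "rtsp":
        raise ValueError(f"source.value is not an rtsp:// URL: {url!r}")
    port = int(cfg["port"])  # accepts "5000" or 5000; non-numeric raises ValueError
    if not 0 < port < 65536:
        raise ValueError(f"UDP port out of range: {port}")
    if cfg.get("pipeline") != "EffDetPipeline":
        raise ValueError(f"unexpected pipeline: {cfg.get('pipeline')!r}")
    return url, cfg["udp_host"], port


print(check_io_config(EXAMPLE_CONFIG))
# ('rtsp://192.0.2.10:8554/stream', '192.0.2.20', 5000)
```

Failing early on a malformed URL or port is cheaper than discovering the problem after the model has been loaded onto the device.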

Model Configuration (Models[0])

Parameter            Description                                                     Example
name                 Name of the model                                               "Effdet"
targz                Path to the compressed model archive                            "<targz_path>"
normalize            Whether to normalize image input                                true
aspect_ratio         Whether to maintain original aspect ratio during preprocessing  true
channel_mean         Per-channel mean for input normalization                        [0.485, 0.456, 0.406]
channel_stddev       Per-channel stddev for input normalization                      [0.229, 0.224, 0.225]
decode_type          Postprocessing decode type used with Effdet                     "effdet"
detection_threshold  Minimum confidence score to retain a detection                  0.3
scaled_width         Width to scale image before inference                           512
scaled_height        Height to scale image before inference                          288
label_file           Path to the label file with class names                         "labels.txt"
padding_type         Type of padding used before inference                           "CENTER"
topk                 Maximum number of detections returned per frame                 10
num_classes          Number of object classes supported by the model                 90
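
The preprocessing parameters can be illustrated with a small sketch of an aspect-preserving, CENTER-padded resize into the 512×288 model input, followed by per-channel normalization. It assumes RGB channel order and that the mean and stddev are applied to pixel values first scaled to [0, 1] (the listed values are the standard ImageNet statistics on that scale). The helper names are hypothetical, and the pipeline's own rounding may differ. box_to_frame shows why the padding offsets matter: boxes predicted in model-input coordinates must be shifted and rescaled before they are drawn on the original frame.

```python
from dataclasses import dataclass

# Values taken from the Models[0] table.
SCALED_W, SCALED_H = 512, 288
CHANNEL_MEAN = (0.485, 0.456, 0.406)
CHANNEL_STDDEV = (0.229, 0.224, 0.225)


@dataclass(frozen=True)
class Letterbox:
    scale: float  # resize factor applied to both axes (aspect_ratio: true)
    pad_x: int    # horizontal padding on the left, in model-input pixels
    pad_y: int    # vertical padding on the top, in model-input pixels


def center_letterbox(frame_w: int, frame_h: int,
                     dst_w: int = SCALED_W, dst_h: int = SCALED_H) -> Letterbox:
    """Fit the frame inside dst_w x dst_h without distortion, centered."""
    scale = min(dst_w / frame_w, dst_h / frame_h)
    new_w, new_h = round(frame_w * scale), round(frame_h * scale)
    return Letterbox(scale, (dst_w - new_w) // 2, (dst_h - new_h) // 2)


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to [0, 1], then apply (x - mean) / stddev per channel."""
    return tuple((v / 255.0 - m) / s
                 for v, m, s in zip(rgb, CHANNEL_MEAN, CHANNEL_STDDEV))


def box_to_frame(box, lb: Letterbox):
    """Undo the letterbox: map (x1, y1, x2, y2) from model input to frame pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - lb.pad_x) / lb.scale, (y1 - lb.pad_y) / lb.scale,
            (x2 - lb.pad_x) / lb.scale, (y2 - lb.pad_y) / lb.scale)


lb = center_letterbox(640, 480)
print(lb.pad_x, lb.pad_y)  # 64 0
```

For a 640×480 frame the scale is 0.6, the resized image is 384×288, and 64 columns of padding are added on each side. A 1920×1080 stream has the same 16:9 shape as 512×288, so it needs no padding at all.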

Main Python Script

The Python script performs the following steps:

  1. Loads configuration from project.yaml.

  2. Initializes a video reader and writer using the PePPi API.

  3. Loads the Effdet model via MLSoCSession and configures it.

  4. Continuously reads frames, performs inference, renders detection results, and streams annotated video via UDP.
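
The loop in step 4 can be sketched independently of the hardware. In the real application the frames come from the PePPi video reader, the detector is the Effdet model loaded through MLSoCSession, and the sink is the UDP writer; their actual method names are not reproduced here, so Detector.run, FrameSink.write, and the render callable below are hypothetical stand-ins, exercised with toy implementations.

```python
from typing import Iterable, Protocol, Sequence, Tuple

# Hypothetical detection layout: (x1, y1, x2, y2, score, class_id).
Detection = Tuple[float, float, float, float, float, int]


class Detector(Protocol):
    """Stand-in for the model loaded via MLSoCSession (method name assumed)."""
    def run(self, frame) -> Sequence[Detection]: ...


class FrameSink(Protocol):
    """Stand-in for the video writer that streams over UDP (method name assumed)."""
    def write(self, frame) -> None: ...


def run_pipeline(frames: Iterable, detector: Detector, render, sink: FrameSink) -> int:
    """Read -> infer -> render -> stream, once per frame; returns the frame count."""
    processed = 0
    for frame in frames:  # a live source would keep yielding frames indefinitely
        detections = detector.run(frame)
        sink.write(render(frame, detections))
        processed += 1
    return processed


# Toy stand-ins so the loop can be exercised without an MLSoC.
class ConstantDetector:
    def run(self, frame):
        return [(0.0, 0.0, 10.0, 10.0, 0.9, 1)]


class ListSink:
    def __init__(self):
        self.frames = []

    def write(self, frame):
        self.frames.append(frame)


def count_boxes(frame, detections):
    """Toy renderer: pair each frame with its number of detections."""
    return (frame, len(detections))


sink = ListSink()
print(run_pipeline(range(3), ConstantDetector(), count_boxes, sink))  # 3
print(sink.frames)  # [(0, 1), (1, 1), (2, 1)]
```

Passing the reader, detector, renderer, and writer in as parameters keeps the control flow testable on a development machine before it is packaged and deployed.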

The application is packaged using mpk create and deployed to the target device through SiMa’s deployment workflow.

Model Details

  • Download from here.

  • Model: EfficientDet (Effdet)

  • Input Normalization:

    • Mean: [0.485, 0.456, 0.406]

    • Stddev: [0.229, 0.224, 0.225]

  • Detection Threshold: 0.3

  • Max Output Per Frame: Top 10 detections

  • Input Resolution: 512×288 (width × height, after scaling)

  • Bounding Boxes: Rendered using SimaBoxRender
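
The detection threshold, top-10 limit, and label lookup can be sketched as a small post-filter applied before drawing. In the application, the effdet decoder produces the detections and SimaBoxRender draws them, and their internals may differ from this sketch. The helper names are hypothetical, and so are three details: the label file is treated as one class name per line, the threshold comparison is inclusive, and class IDs index the label list directly (0-based). If the model emits 1-based COCO IDs, subtract 1 before the lookup.

```python
DETECTION_THRESHOLD = 0.3  # detection_threshold
TOPK = 10                  # topk


def load_labels(text: str) -> list:
    """Assumed label-file format: one class name per line; blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_detections(dets, labels, threshold=DETECTION_THRESHOLD, topk=TOPK):
    """Keep detections with score >= threshold, best first, at most topk.

    dets holds (x1, y1, x2, y2, score, class_id) tuples; class_id is assumed
    to index labels directly (0-based). Returns (box, score, name) tuples.
    """
    kept = sorted((d for d in dets if d[4] >= threshold),
                  key=lambda d: d[4], reverse=True)
    out = []
    for x1, y1, x2, y2, score, cls in kept[:topk]:
        name = labels[cls] if 0 <= cls < len(labels) else f"class_{cls}"
        out.append(((x1, y1, x2, y2), score, name))
    return out


labels = load_labels("person\nbicycle\n\ncar\n")
dets = [(0, 0, 5, 5, 0.2, 0), (1, 1, 6, 6, 0.9, 2), (2, 2, 7, 7, 0.5, 1)]
for box, score, name in select_detections(dets, labels):
    print(name, score)
# car 0.9
# bicycle 0.5
```

Sorting before truncating ensures that the 10 boxes kept are the 10 most confident, not simply the first 10 that cleared the threshold.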