Effdet on RTSP Stream
=====================

This project demonstrates how to use the SiMa PePPi API to build a Python application that performs accelerated object detection inference on a live RTSP video stream using the EfficientDet (Effdet) model.

Purpose
-------

The primary goal of this pipeline is to showcase how to:

- Read live video data from an RTSP source.
- Perform real-time object detection inference using the SiMa MLSoC and the EfficientDet model.
- Render and annotate bounding boxes with class labels.
- Stream the output video over UDP for further visualization or processing.

All inference is accelerated through SiMa’s MLSoC hardware, allowing high throughput and low-latency performance ideal for edge AI applications.

Configuration Overview
----------------------

The application is configured via ``project.yaml``. Below is a breakdown of its parameters.

Input/Output Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

================ ==================================== ====================
Parameter        Description                          Example
================ ==================================== ====================
``source.name``  Input type for the video stream      ``"rtspsrc"``
``source.value`` RTSP stream URL                      ``""``
``udp_host``     Host IP to stream the output via UDP ``""``
``port``         UDP port number for output stream    ``""``
``pipeline``     Pipeline type used for inference     ``"EffDetPipeline"``
================ ==================================== ====================

Model Configuration (``Models[0]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======================= ============================================================== =========================
Parameter               Description                                                    Example
======================= ============================================================== =========================
``name``                Name of the model                                              ``"Effdet"``
``targz``               Path to the compressed model archive                           ``""``
``normalize``           Whether to normalize image input                               ``true``
``aspect_ratio``        Whether to maintain original aspect ratio during preprocessing ``true``
``channel_mean``        Per-channel mean for input normalization                       ``[0.485, 0.456, 0.406]``
``channel_stddev``      Per-channel stddev for input normalization                     ``[0.229, 0.224, 0.225]``
``decode_type``         Postprocessing decode type used with Effdet                    ``"effdet"``
``detection_threshold`` Minimum confidence score to retain a detection                 ``0.3``
``scaled_width``        Width to scale image before inference                          ``512``
``scaled_height``       Height to scale image before inference                         ``288``
``label_file``          Path to the label file with class names                        ``"labels.txt"``
``padding_type``        Type of padding used before inference                          ``"CENTER"``
``topk``                Maximum number of detections returned per frame                ``10``
``num_classes``         Number of object classes supported by the model                ``90``
======================= ============================================================== =========================

Main Python Script
------------------

The Python script performs the following steps:

1. Loads configuration from ``project.yaml``.
2. Initializes a video reader and writer using the PePPi API.
3. Loads the Effdet model via ``MLSoCSession`` and configures it.
4. Continuously reads frames, performs inference, renders detection results, and streams annotated video via UDP.

The application is packaged using ``mpk create`` and deployed to the target device through SiMa’s deployment workflow.

Model Details
-------------

- Download from `here `__.
- Model: EfficientDet (Effdet)
- Input Normalization:

  - Mean: ``[0.485, 0.456, 0.406]``
  - Stddev: ``[0.229, 0.224, 0.225]``

- Detection Threshold: 0.3
- Max Output Per Frame: Top 10 detections
- Input Resolution: 512×288 (scaled)
- Bounding Boxes: Rendered using ``SimaBoxRender``
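
To make the model parameters above more concrete, the following is a minimal plain-Python sketch of what ``aspect_ratio``, ``padding_type``, ``channel_mean``/``channel_stddev``, ``detection_threshold``, and ``topk`` conventionally mean. It is not the PePPi implementation: the helper names (``letterbox_geometry``, ``normalize_pixel``, ``filter_detections``) are hypothetical, and both the scaling of pixels to [0, 1] before normalization and the exact rounding of the resize are assumptions. On the device, these steps are handled by the configured pipeline.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part
# of the PePPi API. They mirror the values listed in the configuration tables.

MEAN = (0.485, 0.456, 0.406)    # channel_mean
STDDEV = (0.229, 0.224, 0.225)  # channel_stddev


def letterbox_geometry(src_w, src_h, dst_w=512, dst_h=288):
    """Aspect-preserving resize with centered padding.

    Returns (new_w, new_h, pad_left, pad_top) for fitting a src_w x src_h
    frame into a dst_w x dst_h model input (scaled_width x scaled_height).
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # "CENTER" padding: split the leftover space evenly on both sides.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(rgb, mean=MEAN, stddev=STDDEV):
    """Scale an 8-bit RGB pixel to [0, 1] (assumed), then standardize it."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, stddev))


def filter_detections(detections, threshold=0.3, topk=10):
    """Keep detections with score >= threshold, best first, at most topk."""
    kept = [d for d in detections if d["score"] >= threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:topk]
```

Under these assumptions, a 640×480 frame would be resized to 384×288 and padded by 64 pixels on the left and right to reach 512×288, while a 1920×1080 frame already matches the 16:9 input and needs no padding.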