PeopleDetector on RTSP Stream
=============================

This project uses the SiMa PePPi API to run a real-time people detection pipeline on a live RTSP video stream. It leverages a CenterNet-based model optimized for detecting human figures and streams the annotated results via UDP.

Purpose
-------

This pipeline showcases how to:

- Ingest live RTSP video using SiMa’s PePPi API.
- Run people detection using a CenterNet-based model on SiMa’s MLSoC.
- Annotate frames with bounding boxes and class labels.
- Stream the output to a specified host and port via UDP.

All inference is hardware-accelerated through SiMa’s MLSoC for efficient edge deployment.

Configuration Overview
----------------------

Settings are defined in ``project.yaml``. The following tables outline the input/output configuration and model-specific parameters.

Input/Output Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

================ ============================= ====================
Parameter        Description                   Example
================ ============================= ====================
``source.name``  Type of input source          ``"rtspsrc"``
``source.value`` RTSP video stream URL         ``""``
``udp_host``     Host IP for UDP output        ``""``
``port``         Port number for UDP stream    ``""``
``pipeline``     Inference pipeline to be used ``"PeopleDetector"``
================ ============================= ====================

Model Configuration (``Models[0]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======================= ================================================== =========================
Parameter               Description                                        Value
======================= ================================================== =========================
``name``                Model identifier                                   ``"pd"``
``targz``               Compressed model archive path                      ``""``
``label_file``          Path to class label file                           ``"labels.txt"``
``normalize``           Enable input normalization                         ``true``
``channel_mean``        Per-channel mean for input normalization           ``[0.408, 0.447, 0.470]``
``channel_stddev``      Per-channel stddev for input normalization         ``[0.289, 0.274, 0.278]``
``padding_type``        Padding strategy for input preprocessing           ``"BOTTOM_LEFT"``
``aspect_ratio``        Whether to maintain original image aspect ratio    ``true``
``topk``                Maximum number of detections returned per frame    ``10``
``detection_threshold`` Minimum confidence to qualify as a valid detection ``0.7``
``decode_type``         Postprocessing decode method used                  ``"centernet"``
======================= ================================================== =========================

Main Python Script
------------------

The script performs the following operations:

1. Loads configuration from ``project.yaml``.
2. Initializes a ``VideoReader`` for the RTSP stream and a ``VideoWriter`` for UDP output.
3. Loads the detection model with a SiMa ``MLSoCSession``.
4. Continuously:

   - Reads a frame
   - Runs inference
   - Annotates detected people
   - Streams the annotated frame over UDP

The application is packaged using ``mpk create`` and deployed to the target device using SiMa’s standard flow.

Model Details
-------------

- Download from `here `__.
- Model Type: CenterNet-based
- Target: People detection
- Normalization:

  - Mean: ``[0.408, 0.447, 0.470]``
  - Stddev: ``[0.289, 0.274, 0.278]``

- Detection Confidence Threshold: 0.7
- Output: Top 10 people detections per frame
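Putting the two configuration tables together, a ``project.yaml`` might look like the sketch below. The field nesting is an assumption inferred from the parameter names above, and the quoted placeholder values are hypothetical; consult the PePPi documentation for the exact schema.

.. code-block:: yaml

   # Hypothetical project.yaml sketch -- nesting inferred from the tables
   # above; placeholder values are illustrative, not real endpoints.
   source:
     name: "rtspsrc"
     value: "rtsp://<camera-ip>:554/stream"   # RTSP stream URL (placeholder)
   udp_host: "<receiver-ip>"                  # host for UDP output (placeholder)
   port: "<port>"                             # UDP port (placeholder)
   pipeline: "PeopleDetector"

   Models:
     - name: "pd"
       targz: ""                              # path to compressed model archive
       label_file: "labels.txt"
       normalize: true
       channel_mean: [0.408, 0.447, 0.470]
       channel_stddev: [0.289, 0.274, 0.278]
       padding_type: "BOTTOM_LEFT"
       aspect_ratio: true
       topk: 10
       detection_threshold: 0.7
       decode_type: "centernet"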
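The ``channel_mean`` and ``channel_stddev`` values are applied as a standard per-channel normalization, ``(x - mean) / stddev``. A minimal sketch of that arithmetic, assuming pixel values have already been scaled to [0, 1] (whether the PePPi preprocessor performs that scaling itself is an assumption, not stated in this document):

.. code-block:: python

   # Per-channel normalization sketch: (x - mean) / stddev.
   # Assumes RGB values already scaled to [0, 1]; whether PePPi divides
   # by 255 first is an assumption, not documented here.
   CHANNEL_MEAN = (0.408, 0.447, 0.470)
   CHANNEL_STDDEV = (0.289, 0.274, 0.278)

   def normalize_pixel(rgb):
       """Normalize one (r, g, b) pixel with components in [0, 1]."""
       return tuple(
           (value - mean) / std
           for value, mean, std in zip(rgb, CHANNEL_MEAN, CHANNEL_STDDEV)
       )

For example, a mid-gray pixel ``(0.5, 0.5, 0.5)`` normalizes to roughly ``(0.318, 0.193, 0.108)``.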
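The ``detection_threshold`` and ``topk`` parameters combine into a simple post-filter: discard detections below the confidence threshold, then keep at most the top-k by score. The real filtering happens inside the PePPi pipeline; this helper is only an illustration of the equivalent logic on ``(score, box)`` pairs:

.. code-block:: python

   # Illustrative post-filter matching detection_threshold=0.7 and topk=10.
   # The actual filtering is done by the PePPi pipeline itself.
   def filter_detections(detections, threshold=0.7, topk=10):
       """Keep detections scoring >= threshold, highest scores first."""
       kept = [d for d in detections if d[0] >= threshold]
       kept.sort(key=lambda d: d[0], reverse=True)
       return kept[:topk]

With ``detections = [(0.65, box_a), (0.95, box_b), (0.80, box_c)]``, the 0.65 hit is dropped and the survivors come back ordered 0.95, 0.80.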
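The steps of the main script can be summarized as the following Python-style pseudocode. ``VideoReader``, ``VideoWriter``, and ``MLSoCSession`` are named in this document, but every method name and argument below is an assumption; consult the PePPi API reference for the real signatures.

.. code-block:: python

   # Pseudocode sketch of the main loop -- method names are assumptions,
   # not the actual PePPi API.
   config = load_yaml("project.yaml")
   reader = VideoReader(config["source"])                    # RTSP input
   writer = VideoWriter(config["udp_host"], config["port"])  # UDP output
   session = MLSoCSession(config["Models"][0])               # loads the "pd" model

   while True:
       frame = reader.read()           # 1. read a frame from the stream
       people = session.run(frame)     # 2. hardware-accelerated inference
       annotated = draw_boxes(frame, people)  # 3. boxes + class labels
       writer.write(annotated)         # 4. stream the result over UDP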