
An open-source structured video analysis framework: VideoPipe

Author: Not bald programmer

In this era of ubiquitous video, we enjoy the convenience of personalized recommendations: every refresh surfaces video content that seems tailor-made. Have you ever wondered about the "intelligent screening" behind it? What technology lets a system accurately capture your interests in a vast sea of videos and find the right clips from just a few keywords? Beneath this seemingly simple browsing experience lies computers' in-depth analysis and understanding of video content.

How does a computer understand massive amounts of video?

Video is essentially a sequence of image frames played back at a certain frame rate, which produces the effect of continuous motion. Computer analysis of video can be broken down into three core steps:

"1. Decoding: Conversion of video to image frames"

The video is first decoded, a process that breaks down the continuous stream of motion into frames of static images.

"2. Analysis/Reasoning: The Magic of AI Algorithms"

The decoded image frames then move to the analysis phase, where AI comes into play. Using deep learning and other machine learning algorithms, computers can not only recognize basic elements in the images, such as objects, faces, and text, but also understand the scene context, detect actions, and even analyze emotions and intent. This makes it possible to tag video content, generate summaries, and extract topics.

"3. Code"

The image frames that have undergone specific processing (annotations, filters, special effects, and so on) are reassembled into a video.
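
To make these three steps concrete, here is a minimal sketch using OpenCV. The file names are placeholders, and stamping the frame index stands in for the real AI analysis step:

#include <opencv2/opencv.hpp>
#include <string>

int main() {
    // 1. Decoding: open the video and pull out frames one by one
    cv::VideoCapture cap("input.mp4");            // input path is a placeholder
    if (!cap.isOpened()) return -1;

    double fps = cap.get(cv::CAP_PROP_FPS);
    cv::Size size((int)cap.get(cv::CAP_PROP_FRAME_WIDTH),
                  (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT));

    // 3. Encoding: write processed frames back into a video container
    cv::VideoWriter writer("output.mp4",
                           cv::VideoWriter::fourcc('m', 'p', '4', 'v'),
                           fps, size);

    cv::Mat frame;
    int index = 0;
    while (cap.read(frame)) {
        // 2. Analysis: a real system would run AI inference here;
        //    drawing the frame index onto the image stands in for that step
        cv::putText(frame, "frame " + std::to_string(index++),
                    cv::Point(20, 40), cv::FONT_HERSHEY_SIMPLEX,
                    1.0, cv::Scalar(0, 255, 0), 2);
        writer.write(frame);
    }
    return 0;
}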


Although this may sound like just a few steps, many technical details and complex algorithms are involved. For example, how do you quickly deploy a trained AI image model to a real application scenario? For programmers who have never touched computer vision (hereinafter referred to as CV), or algorithm engineers who work purely on algorithms, implementing AI video analysis features can be difficult. Yet as video becomes ever more widespread in daily life, the need to process and analyze video data keeps growing.


Today, I would like to introduce an open-source structured video analysis framework, VideoPipe, which aims to make developing video analysis applications as easy as building a web application with Django.

https://github.com/sherlockchou86/VideoPipe

VideoPipe is a framework for video analysis and structuring, written in C++ with few dependencies and easy to integrate. It is designed as a pipeline in which nodes are independent of each other and can be combined freely, letting you build different types of video analysis applications such as video structuring, image search, face recognition, and behavior analysis in the traffic/security field (for example, traffic incident detection).


Introduction to VideoPipe

VideoPipe is similar to NVIDIA's DeepStream and Huawei's mxVision frameworks, but it's easier to use and more portable. It's written entirely in native C++ and relies on only a handful of popular third-party modules (such as OpenCV).

VideoPipe adopts a plug-in-oriented coding style: independent plug-ins (Node types in the framework) can be combined according to different needs to build different types of video analysis applications. You just need to prepare the model and understand how to parse its output; inference can then be implemented on different backends, such as OpenCV's cv::dnn module (the default), TensorRT, PaddleInference, or ONNXRuntime, whichever you prefer (a minimal inference sketch follows the feature list below). The following figure shows how VideoPipe works.

[Figure: how VideoPipe works]

As you can see, it offers the following features:

  • Stream read/push: supports multiple real-time streaming protocols, such as UDP, RTSP, and RTMP.
  • Video decoding/encoding: Integrates OpenCV and GStreamer libraries to provide high-performance video and image encoding and decoding capabilities, and supports hardware acceleration to ensure real-time and smooth video processing.
  • Deep learning-based algorithm inference: built-in support for a variety of deep learning models, including object detection, image classification, and feature extraction, providing the computing power for intelligent analysis of video content (see the inference sketch after this list).
  • Object tracking: integrates tracking algorithms such as IOU (Intersection over Union) matching and SORT (Simple Online and Realtime Tracking) for stable, accurate tracking of moving objects (the IOU computation is sketched after this list as well).
  • Behavior analysis (BA): builds on object tracking to analyze specific behaviors, such as traffic violations (line crossing, illegal parking) and crowd flow, providing a decision-making basis for traffic management and security monitoring.
  • Data broker: Efficiently forwards the analyzed structured data (such as JSON, XML, or custom formats) to a specified destination for subsequent data storage, analysis, or display.
  • Recording & Screenshots: Automatically record videos for a specific time period or capture keyframe screenshots as needed.
  • On-Screen Display (OSD): overlays model outputs on the video frames, such as drawing boxes around detected targets and annotating behavior analysis results, improving user interaction and system transparency.
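
As a taste of what "prepare the model and understand how to parse its output" means in practice, here is a minimal, hypothetical sketch of one backend, OpenCV's cv::dnn. The model path, input size, and normalization are assumptions, and a real model's preprocessing and output layout must come from its documentation; this is a generic classifier forward pass, not VideoPipe's own node code:

#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <cstdio>

int main() {
    // load a hypothetical ONNX classification model (path is a placeholder)
    cv::dnn::Net net = cv::dnn::readNet("./models/classifier.onnx");

    cv::Mat image = cv::imread("frame.jpg");
    // preprocessing: the scale factor and input size are assumptions,
    // check what your model actually expects
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0 / 255.0,
                                          cv::Size(224, 224),
                                          cv::Scalar(), true, false);
    net.setInput(blob);

    // parse the output: assume one score per class in a 1xN matrix
    cv::Mat scores = net.forward();
    cv::Point class_id;
    double confidence;
    cv::minMaxLoc(scores, nullptr, &confidence, nullptr, &class_id);

    std::printf("class %d, score %.3f\n", class_id.x, confidence);
    return 0;
}

Swapping in TensorRT or ONNXRuntime would change only this inference step; the surrounding pipeline stays the same.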
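The IOU matching used for tracking is also easy to picture: it scores how much two boxes overlap, and a tracker assigns each detection to the track whose last box overlaps it most. A minimal sketch with cv::Rect follows; the sample boxes are made up:

#include <opencv2/core.hpp>
#include <cstdio>

// IOU: intersection area divided by union area of two boxes
static double iou(const cv::Rect& a, const cv::Rect& b) {
    double inter = (a & b).area();   // cv::Rect overloads & as rectangle intersection
    double uni = a.area() + b.area() - inter;
    return uni > 0 ? inter / uni : 0.0;
}

int main() {
    // match a new detection against a track's last known box
    cv::Rect last_track(100, 100, 50, 80), detection(110, 105, 50, 80);
    std::printf("IOU = %.3f\n", iou(last_track, detection));
    return 0;
}

SORT builds on the same idea, adding a Kalman filter that predicts where each track's box should be before matching.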

Get started quickly

VideoPipe is not picky about hardware: whether on a high-end server equipped with a professional accelerator card or an ordinary CPU-only computer, it runs smoothly. The project also includes a number of detailed samples; the one below shows how to use the framework to quickly build a face recognition application.

/*
* Name: 1-1-N sample
* Full code: samples/1-1-N_sample.cpp
* Description: 1 video input, 1 video analysis task (face detection and recognition),
*              2 outputs (on-screen display / RTMP streaming)
* Note: prepare the model and video files yourself
*/

int main() {
    // logging configuration
    VP_SET_LOG_INCLUDE_CODE_LOCATION(false);
    VP_SET_LOG_INCLUDE_THREAD_ID(false);
    VP_LOGGER_INIT();

    // 1. Create the nodes
    // Video source node:
    // reads the video stream from a local file (./test_video/10.mp4)
    auto file_src_0 = std::make_shared<vp_nodes::vp_file_src_node>("file_src_0", 0, "./test_video/10.mp4", 0.6);
    // 2. Model inference nodes
    // Primary inference: face detection with the pre-trained model face_detection_yunet_2022mar.onnx
    auto yunet_face_detector_0 = std::make_shared<vp_nodes::vp_yunet_face_detector_node>("yunet_face_detector_0", "./models/face/face_detection_yunet_2022mar.onnx");
    // Secondary inference: face recognition, extracting face features with face_recognition_sface_2021dec.onnx
    auto sface_face_encoder_0 = std::make_shared<vp_nodes::vp_sface_feature_encoder_node>("sface_face_encoder_0", "./models/face/face_recognition_sface_2021dec.onnx");
    // 3. OSD node
    // draws the face recognition results onto the video frames
    auto osd_0 = std::make_shared<vp_nodes::vp_face_osd_node_v2>("osd_0");
    // displays the processed video on the local screen
    auto screen_des_0 = std::make_shared<vp_nodes::vp_screen_des_node>("screen_des_0", 0);
    // pushes the stream via RTMP to the given server (rtmp://192.168.77.60/live/10000)
    auto rtmp_des_0 = std::make_shared<vp_nodes::vp_rtmp_des_node>("rtmp_des_0", 0, "rtmp://192.168.77.60/live/10000");

    // Build the pipeline: connect the nodes in processing order to form a data processing chain.
    // Video data flows from the source node through face detection and face recognition
    // to the OSD node, and is then output to the screen and the RTMP stream at the same time.
    yunet_face_detector_0->attach_to({file_src_0});
    sface_face_encoder_0->attach_to({yunet_face_detector_0});
    osd_0->attach_to({sface_face_encoder_0});

    // the pipeline splits automatically; results go to the screen and the RTMP stream
    screen_des_0->attach_to({osd_0});
    rtmp_des_0->attach_to({osd_0});

    // start the pipeline
    file_src_0->start();

    // visualize the pipeline
    vp_utils::vp_analysis_board board({file_src_0});
    board.display();
}

As the code above shows, the VideoPipe framework abstracts video analysis/processing into a pipe: each processing step is a node in the pipeline. The processing flow is as follows:

  1. Video Reading (Node): Reads video data from a file or network stream and performs preliminary decoding processing to prepare it for subsequent analysis.
  2. Model Inference (Node): Encapsulates the inference process of deep learning models, making the integration of advanced features such as facial recognition straightforward and efficient.
  3. OSD (Node): visualizes the analysis results and graphically superimposes the recognized face information on the video frame.
  4. Build a pipeline: Connect nodes in a logical order to form a complete processing link.
  5. Startup and monitoring: the whole pipeline is started by starting its source node (here, the video file source). A built-in visualization board also lets developers monitor the pipeline's running state intuitively.

[Figure: pipeline status board, on-screen display window, and RTMP player output]

After the code runs, three windows appear, as shown in the figure above: the pipeline state diagram (auto-refreshing), the on-screen display result (GUI), and the player display result (RTMP). With that, you're up and running with VideoPipe!
