
Subway Detection Assistant πŸ€–

Structure

Backend: img.png

About ❔

An AI solution for detecting people who cross the safety line on a subway platform.

The project aims to enhance the safety of citizens on subway platforms. The solution is built on two models: the first addresses the Human Pose Estimation task, while the second performs segmentation. A distinctive feature of the solution is that the camera providing the input video stream does not need to be in a fixed position: the segmentation network is trained to detect the safety line from various perspectives, since the training dataset consists of snapshots taken from different angles.

To test the developed solution, I created a single-page web application where users can upload subway platform videos and receive output videos in which the people who have crossed the safety line are highlighted.

The results of this work can also be applied in processing real-time streaming video from subway surveillance cameras. To enhance citizen safety on the metro platform, a system of fines for crossing the safety line before the train arrives could be implemented. It would be sufficient to install several cameras along the tracks, focused on the platform, for personal identification purposes.

Example


Website: link

Recordings from the video camera: google drive

Implementation details

Human Pose Estimation 🧘🏻

Architecture πŸ“

I used a state-of-the-art model, YOLOv8, specifically its pose estimation variant, to determine the positions of people on the platform. Currently, I only use the bounding boxes around people, but I chose the pose estimation task over plain person detection because, in the future, information about key points of the human body (such as the feet) will improve the quality of the application. The model did not need additional training: the pre-trained weights available online already show excellent results.
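For illustration, a minimal inference sketch using the ultralytics API (the input file name is hypothetical, and the actual application may post-process the results differently):

```python
from ultralytics import YOLO

# Load the pre-trained pose checkpoint mentioned below.
model = YOLO("yolov8m-pose.pt")

# Run inference on a single frame (hypothetical file name).
results = model("platform_frame.jpg")
for r in results:
    boxes = r.boxes.xyxy        # bounding boxes around detected people
    keypoints = r.keypoints.xy  # body key points (e.g. feet) for future use
```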

Download the model: yolov8m-pose.pt

Segmentation πŸ”Ž

Data collection, data annotation and architecture πŸ’Ύ

For the segmentation architecture, I chose SegNet. The segmentation model identifies the segment of the platform from its edge to the safety line. To train this model, I recorded 22 videos on various platforms of the Moscow Metro and annotated 17 of them using the Roboflow tool, resulting in 640 annotated images. After augmentation (flipping images along the vertical axis), the dataset expanded to 1125 annotated images.
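The flip augmentation is straightforward to reproduce with OpenCV; a minimal sketch (file names are hypothetical), where the same flip must be applied to a frame and its mask:

```python
import cv2

# Load a frame and its segmentation mask (hypothetical file names).
image = cv2.imread("frame.jpg")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# flipCode=1 mirrors the image along the vertical axis.
flipped_image = cv2.flip(image, 1)
flipped_mask = cv2.flip(mask, 1)  # the mask must be flipped consistently

cv2.imwrite("frame_flipped.jpg", flipped_image)
cv2.imwrite("mask_flipped.png", flipped_mask)
```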

Dataset of recordings: metro-dataset

Dataset of frames on Roboflow: link

Model training 🦾

I trained the model for 45 epochs, gradually decreasing the learning rate after a certain number of epochs. Training notebook: Segmentation_1125.ipynb
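Schematically, the setup looks as follows, assuming a PyTorch SegNet with one-channel logits; `model` and `train_loader` are placeholders, and the schedule parameters are illustrative rather than taken from the notebook:

```python
import torch
from torch import nn, optim

# Assumed to be defined elsewhere (e.g. in the training notebook):
# model — a SegNet instance; train_loader — a DataLoader of (image, mask) pairs.
device = "cuda" if torch.cuda.is_available() else "cpu"
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# Decrease the learning rate 10x every 15 epochs (illustrative schedule).
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)

for epoch in range(45):
    for images, masks in train_loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)           # (N, 1, H, W) logits
        loss = criterion(logits, masks)  # masks take values in {0, 1}
        loss.backward()
        optimizer.step()
    scheduler.step()

torch.save(model.state_dict(), "segnet_bce_1125_45_epoch.pth")
```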

Download the model: segnet_bce_1125_45_epoch.pth

Algorithm

The detection task is handled by the YOLOv8 model in Human Pose Estimation mode, and platform edge segmentation is done by SegNet. Classification, i.e. determining whether a person is in the hazardous zone, relies on the following idea: for each person detected by YOLOv8, the bounding rectangle is taken, and the proportion of the rectangle's perimeter lying within the segmented area is computed. If this value exceeds a predefined threshold, the person is assumed to be on the platform edge; otherwise, they are considered to be in the safe zone. A sketch of this rule is given below.
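A minimal sketch of this rule, assuming a binary NumPy mask where 1 marks the hazardous segment and a bounding box in pixel coordinates (the threshold value is illustrative):

```python
import numpy as np

def crosses_safety_line(box, mask, threshold=0.3):
    """Return True if the fraction of the bounding-box perimeter
    lying inside the segmented hazardous area exceeds `threshold`."""
    h, w = mask.shape
    x1, y1, x2, y2 = (int(v) for v in box)
    # Clamp the rectangle to the frame boundaries.
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w - 1), min(y2, h - 1)
    # Collect the pixels lying on the rectangle's perimeter.
    perimeter = np.concatenate([
        mask[y1, x1:x2 + 1], mask[y2, x1:x2 + 1],
        mask[y1:y2 + 1, x1], mask[y1:y2 + 1, x2],
    ])
    # Fraction of the perimeter inside the hazardous area.
    return bool(perimeter.mean() > threshold) if perimeter.size else False
```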

For more precise results, one could build convex hulls around the segmented platform area (which may be disconnected), rely on the key points of a person's legs (which are sometimes invisible), and so on. However, compared to those alternatives, the presented solution is more efficient in terms of computational resources.

Video stream processing πŸ“Ή

The input video stream is first split into frames at the specified FPS using the OpenCV library. Then, depending on the selected operating mode of the application, the extracted frames are fed to the HPE model, the segmentation model, or both at once. The resulting output frames are concatenated into the final video, encoded according to the AVC (H.264) standard.
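A minimal sketch of the fragmentation step, assuming an uploaded file and a user-selected target FPS (both are hypothetical values):

```python
import cv2

# Open the uploaded recording (hypothetical file name).
cap = cv2.VideoCapture("platform_video.mp4")
source_fps = cap.get(cv2.CAP_PROP_FPS)
target_fps = 5  # illustrative value chosen in the web interface
step = max(int(round(source_fps / target_fps)), 1)

frames = []
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % step == 0:
        frames.append(frame)  # later fed to the HPE and/or segmentation models
    index += 1
cap.release()
```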

Note. The final video recording is created by calling the command:

`out = cv2.VideoWriter(output_video_path.absolute().as_posix(), cv2.VideoWriter_fourcc(*'avc1'), fps, frameSize)`

However, I encountered the following issues during the development of the application:

  1. On Windows, a DLL was missing (openh264-1.8.0-win64.dll), and the corresponding error message appeared in the terminal.

  2. On both Windows and Linux, I ran into codec issues:

    `[ERROR:0@…] global cap_ffmpeg_impl.hpp:3130 open Could not find encoder for codec_id=27, error: Encoder not found`

    `[ERROR:0@…] global cap_ffmpeg_impl.hpp:3208 open VIDEOIO/FFMPEG: Failed to initialize VideoWriter`

    To solve this problem, I used the OpenCV build provided by the conda package manager.

Web App 🌐

The developed interface allows the user to upload a video recording of a metro platform, select the FPS at which the uploaded recording is split into frames, and choose one of the video stream processing modes: fragmentation, segmentation, human pose estimation, or detection. The application runs on a virtual machine rented from Yandex Cloud.

Modes

Backend πŸ–₯

I chose FastAPI as the framework for processing requests due to its ease of use. Depending on the server, computations are performed either on the CPU or on the GPU. A schematic endpoint is sketched below.
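A hedged sketch of what such an endpoint could look like; the route, parameters, and the `run_pipeline` helper are assumptions for illustration, not the service's actual API:

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import FileResponse

app = FastAPI()

@app.post("/process")
async def process_video(file: UploadFile, fps: int = 5, mode: str = "detection"):
    # Save the uploaded recording to disk.
    input_path = f"/tmp/{file.filename}"
    with open(input_path, "wb") as f:
        f.write(await file.read())
    # run_pipeline is a hypothetical helper wrapping the algorithm described above.
    output_path = run_pipeline(input_path, fps=fps, mode=mode)
    return FileResponse(output_path, media_type="video/mp4")
```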

Frontend βš›οΈ

The frontend is implemented in React according to the following layout: design in Figma

Results πŸ“

The application solves the assigned task. The developed algorithm successfully detects objects on the platform and classifies them depending on whether they are in a hazardous zone or not.

As expected, the segmentation model performs an order of magnitude better on video recordings from the training dataset than on unseen recordings. The problem lies in the specifics of the safety line, which differs from platform to platform depending on the station. Therefore, for a stable solution at each station, preliminary labeling and further training of the neural network on the new annotated data are necessary.

An example of platform edge segmentation at a station absent from the training data:

TODO:

  • Add tests and quality metrics for the classifier that determines whether a person crossed the line or not
  • Add support for real-time streaming video processing
