
person_detection's Introduction

Detection of persons in a YouTube video


Objective

Given the URL of a YouTube video (here the Dior - Eau de Parfum commercial), generate a new video that shows the presence of humans by drawing bounding boxes around them.

Methodology

Overall principle

  1. split the video into frames;
  2. apply a detection model on every K-th frame;
  3. for each frame where no detection has been run, interpolate the bounding boxes;
  4. draw the bounding boxes found for each frame;
  5. recombine the frames into the final video.
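
As a rough illustration of steps 2 and 3, the sketch below computes which frame indices would be passed to the detector for a given stride K; every other frame would get interpolated boxes. The function name detection_schedule is hypothetical, and running the detector on the last frame as well (so that every frame lies between two detected frames) is an assumption, not necessarily what detect.py does.

```python
def detection_schedule(n_frames, stride):
    """Return the indices of the frames on which the detector is run."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    keyframes = list(range(0, n_frames, stride))
    # Assumption: also detect on the last frame, so that trailing frames
    # can be interpolated between two detected frames.
    if keyframes and keyframes[-1] != n_frames - 1:
        keyframes.append(n_frames - 1)
    return keyframes


if __name__ == "__main__":
    # With 10 frames and K = 4, detection runs on frames 0, 4, 8 and 9.
    print(detection_schedule(10, 4))
```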

Some details

👉 For the detection model, the script loads either RetinaNet or Faster R-CNN from torchvision.models.

👉 To reduce the processing time, the detection of persons is performed only on every K-th frame, where K is a parameter chosen by the user (argument --stride=K).

👉 To find the bounding boxes on frames where no detection is performed, the script interpolates boxes between two successive frames on which detection has been run. To decide whether two boxes on two different frames correspond to the same person, it first checks that they have a similar size. Among the candidates that pass this check, it selects the pair of boxes whose interpolation yields the highest score, where the score is the smallest IoU between successive interpolated boxes. Finally, it checks that this score is above some threshold (to avoid interpolating between boxes that are too far apart) and that the confidence level of at least one of the two boxes is high enough (to avoid displaying false positives).
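
The size check can be sketched as follows, assuming boxes are stored as (x1, y1, x2, y2) tuples, which is the convention of torchvision's detection models. The helper names iou, to_origin and similar_size are illustrative and are not taken from detect.py. Translating both boxes to the origin before computing the IoU makes the comparison depend on width and height only, not on position.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def to_origin(box):
    """Same-size box with its (x1, y1) corner moved to the origin."""
    x1, y1, x2, y2 = box
    return (0.0, 0.0, x2 - x1, y2 - y1)


def similar_size(box0, box1, min_iou):
    """True if the two boxes have similar width and height."""
    return iou(to_origin(box0), to_origin(box1)) >= min_iou


if __name__ == "__main__":
    # Same size, different position: passes the size check.
    print(similar_size((0, 0, 10, 10), (100, 100, 110, 110), 0.66))
    # One box is twice as wide and twice as tall: fails the size check.
    print(similar_size((0, 0, 10, 10), (0, 0, 20, 20), 0.66))
```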

The principle of the algorithm is described below, where T(box) is a box with the same size as box but with its lower-left corner (in Cartesian coordinates) translated to the origin, and conf(box) is the confidence level associated with a box:

input:  boxes0, boxes1 (boxes found on successive frames, frame0 and frame1,
        where detection has been run and ordered by decreasing confidence level)
output: final_boxes (list of bounding boxes for each frame between frame0 and frame1)

params: min_IoU, min_conf

for each box1 in boxes1:
    best_IoU <- 0
    best_box <- None
    for each box0 in boxes0:
        if IoU(T(box0), T(box1)) < min_IoU:
            continue
        interpolate(box0, box1)
        score <- min(IoU between successive interpolated boxes)
        if score > best_IoU:
            best_IoU <- score
            best_box <- box0
    if best_IoU > min_IoU and (conf(best_box) > min_conf or conf(box1) > min_conf):
        boxes0 <- boxes0 \ {best_box}
        add each box in interpolate(best_box, box1) to final_boxes
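
The pseudocode above could be translated into Python roughly as follows. This is a sketch under several assumptions rather than the code of detect.py: boxes are (x1, y1, x2, y2) tuples paired with their confidence, interpolation is linear, the two detected frames are `stride` frames apart, and the returned list includes both end frames. All function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def to_origin(box):
    """T(box): same size, (x1, y1) corner translated to the origin."""
    return (0.0, 0.0, box[2] - box[0], box[3] - box[1])


def interpolate(box0, box1, n_steps):
    """Linearly interpolated boxes for frames 0..n_steps (both ends included)."""
    return [
        tuple(a + (b - a) * k / n_steps for a, b in zip(box0, box1))
        for k in range(n_steps + 1)
    ]


def match_and_interpolate(boxes0, boxes1, stride, min_iou, min_conf):
    """boxes0, boxes1: lists of (box, conf) sorted by decreasing confidence.

    Returns one list of boxes per frame, from frame0 to frame1 included.
    """
    remaining = list(boxes0)
    final_boxes = [[] for _ in range(stride + 1)]
    for box1, conf1 in boxes1:
        best_iou, best = 0.0, None
        for candidate in remaining:
            box0, _ = candidate
            # Skip candidates whose size differs too much from box1.
            if iou(to_origin(box0), to_origin(box1)) < min_iou:
                continue
            path = interpolate(box0, box1, stride)
            # Score: the worst overlap between two consecutive interpolated boxes.
            score = min(iou(p, q) for p, q in zip(path, path[1:]))
            if score > best_iou:
                best_iou, best = score, candidate
        if best is None or best_iou <= min_iou:
            continue
        if best[1] > min_conf or conf1 > min_conf:
            remaining.remove(best)  # each box of frame0 is matched at most once
            for frame, box in enumerate(interpolate(best[0], box1, stride)):
                final_boxes[frame].append(box)
    return final_boxes


if __name__ == "__main__":
    boxes0 = [((0, 0, 10, 10), 0.9), ((100, 100, 120, 140), 0.4)]
    boxes1 = [((4, 0, 14, 10), 0.5), ((100, 100, 120, 140), 0.3)]
    for frame, boxes in enumerate(match_and_interpolate(boxes0, boxes1, 4, 0.66, 0.7)):
        print(frame, boxes)
```

In this example the first pair is kept, because the box on frame0 has a confidence above min_conf even though its match on frame1 does not. The second pair overlaps perfectly but is dropped, because neither of its boxes reaches min_conf.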

👉 Before this algorithm runs, a first selection filters out the bounding boxes whose confidence level is below some threshold eps.
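
This first selection could look like the sketch below, which also orders the surviving boxes by decreasing confidence, as the matching algorithm expects. The function name prefilter is hypothetical, and whether a box whose confidence is exactly eps is kept is an assumption.

```python
def prefilter(boxes, scores, eps):
    """Keep (box, score) pairs with score >= eps, highest confidence first."""
    kept = [(box, score) for box, score in zip(boxes, scores) if score >= eps]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 1, 1), (0, 0, 2, 2), (0, 0, 3, 3)]
    scores = [0.2, 0.5, 0.9]
    # The box with confidence 0.2 is dropped when eps = 0.3.
    print(prefilter(boxes, scores, 0.3))
```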

Some remarks

🚩 The above algorithm will display a box even if its confidence level is below min_conf, provided a similar box with a high confidence level is detected on the previous or the next frame where detection has been run. In other words, it decreases the number of false negative detections, at the cost of a possible increase in the number of false positives.

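🚩 On the other hand, if the primary concern is reducing the number of false positive detections, it is better to set eps = min_conf = 0.5 (or any suitable value). With equal thresholds, every box that survives the first selection already has a confidence level of at least min_conf, so no low-confidence box is ever displayed.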

How to run the detection

First, set up a Python virtual environment (named env here) with PyTorch and torchvision (see details here). The script has been tested with Python 3.8 and PyTorch 1.7. Then, install the required packages:

(env)$ python -m pip install -U -r requirements.txt

To generate the output video, activate the virtual environment and run the script:

(env)$ python detect.py --stride=4 --eps=0.3 --min_conf=0.7 --min_iou=0.66 --with_conf

The input video is automatically downloaded into the same folder as the script detect.py, and the output video is created at the same location. To try with another video, add --url='https://youtu.be/<xxx>'.

To use the GPU, add the argument --with_gpu and choose an adequate batch size (say 4) with --batch_size=4.

The default detection model is RetinaNet. To use Faster R-CNN instead, add the argument --model='fasterrcnn'.
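
For reference, the command-line options mentioned in this section could be declared with argparse roughly as follows. This is only a sketch of the interface: the option names come from the commands above, but the types, the help texts, the model name "retinanet" and all default values (taken from the example command as placeholders) are assumptions and may differ from what detect.py actually uses.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(description="Detect persons in a YouTube video")
    parser.add_argument("--url", type=str, default=None, help="URL of the input video")
    parser.add_argument("--stride", type=int, default=4, help="run detection on every K-th frame")
    parser.add_argument("--eps", type=float, default=0.3, help="threshold of the first selection")
    parser.add_argument("--min_conf", type=float, default=0.7, help="confidence needed by one box of a pair")
    parser.add_argument("--min_iou", type=float, default=0.66, help="minimal IoU for matching boxes")
    parser.add_argument("--with_conf", action="store_true", help="display confidence levels")
    parser.add_argument("--with_gpu", action="store_true", help="run the model on the GPU")
    parser.add_argument("--batch_size", type=int, default=1, help="number of frames per batch")
    parser.add_argument("--model", choices=["retinanet", "fasterrcnn"], default="retinanet")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```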
