faster-mobile-retinaface's Introduction

Face Detection @ 500-1000 FPS

100% Python3 reimplementation of RetinaFace, a solid single-shot face localisation framework published at CVPR 2020.

  • Replaced CUDA based anchor generator functions with NumPy APIs.
  • Cached runtime anchors in a dict to avoid recomputing them.
  • Optimized NMS algorithm through vector calculation methods.
  • Reduced FPN layers and anchor density for middle-close range detection.
  • Used low-level MXNet APIs to speed up the inference process.
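
The first two points can be illustrated with a minimal sketch. It is not the repository's code: the function names, the anchor sizes, and the cache key are all assumptions chosen for illustration. It builds anchors for a feature map with plain NumPy broadcasting and stores them in a dict keyed by the feature-map shape and stride, so a repeated input size costs only a lookup.

```python
import numpy as np

# Hypothetical cache: (height, width, stride) -> (H * W * A, 4) anchor array.
_anchor_cache = {}


def base_anchors(base_size=16, scales=(1.0, 2.0)):
    """Square anchors centred on the origin, one per scale (illustrative sizes)."""
    half = np.asarray(scales, dtype=np.float32) * base_size / 2.0
    return np.stack([-half, -half, half, half], axis=1)  # shape (A, 4)


def anchors_for(height, width, stride):
    """Return all anchors for a feature map, computing them only once per shape."""
    key = (height, width, stride)
    if key not in _anchor_cache:
        xs = np.arange(width, dtype=np.float32) * stride
        ys = np.arange(height, dtype=np.float32) * stride
        sx, sy = np.meshgrid(xs, ys)  # each of shape (H, W)
        shifts = np.stack([sx, sy, sx, sy], axis=-1).reshape(-1, 1, 4)
        # Broadcasting (H*W, 1, 4) + (1, A, 4) replaces an explicit double loop.
        _anchor_cache[key] = (shifts + base_anchors()[None]).reshape(-1, 4)
    return _anchor_cache[key]
```

A second call with the same arguments returns the cached array without any further computation.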

Getting Started

Requirements

  • Install GStreamer for reading videos (Optional)
  • MXNet >= 1.5.0 (preferably CUDA based package)
  • Python >= 3.6
  • opencv-python

While not required, for optimal performance, it is highly recommended to run the code using a CUDA enabled GPU.

Running for Video Files

gst-launch-1.0 -q filesrc location=$YOUR_FILE_PATH !\
  qtdemux ! h264parse ! avdec_h264 !\
  video/x-raw, width=640, height=480 ! videoconvert !\
  video/x-raw, format=BGR ! fdsink | python3 face_detector.py

Real-Time Capturing via Webcam

gst-launch-1.0 -q v4l2src device=/dev/video0 !\
  video/x-raw, width=640, height=480 ! videoconvert !\
  video/x-raw, format=BGR ! fdsink | python3 face_detector.py
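
Both pipelines write raw BGR frames to standard output, so the Python side has to cut the byte stream into fixed-size frames. The sketch below shows one way to do that. It assumes 8-bit BGR frames at the 640x480 size requested in the pipeline; `read_frames` is a hypothetical helper, not a function from face_detector.py.

```python
import numpy as np

WIDTH, HEIGHT, CHANNELS = 640, 480, 3  # must match the caps in the pipeline


def read_frames(stream, width=WIDTH, height=HEIGHT):
    """Yield (height, width, 3) uint8 BGR frames from a binary stream."""
    frame_bytes = width * height * CHANNELS
    while True:
        buf = stream.read(frame_bytes)
        if len(buf) < frame_bytes:  # EOF or a truncated final frame
            break
        yield np.frombuffer(buf, dtype=np.uint8).reshape(height, width, CHANNELS)


# Typical use when fed by `gst-launch-1.0 ... ! fdsink | python3 script.py`:
#     import sys
#     for frame in read_frames(sys.stdin.buffer):
#         ...  # run detection on `frame`
```

If the width or height in the pipeline changes, the same values have to be passed here, otherwise the frames will be misaligned.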

Some Tips

  • Be careful with ! and |: ! links elements inside the GStreamer pipeline, while | is the shell pipe that passes the raw frames to face_detector.py.
  • Decoding the H.264 (or other format) stream on the CPU can be expensive. I'd suggest using your NVIDIA GPU for decoding acceleration. See Issues#5 and nvbugs for more details.
  • For Jetson-Nano, follow Install MXNet on a Jetson to prepare your environment.

Methods and Experiments

For middle-close range face detection, removing some FPN layers and reducing the anchor density lowers the overall computational complexity. In addition, low-level APIs are used at the preprocessing stage to bypass unnecessary format checks. During inference, runtime anchors are cached to avoid repeated calculations. Moreover, a considerable speed-up is obtained at the post-processing stage through vectorized computation and an improved NMS algorithm.
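
To make the post-processing point concrete, here is a minimal sketch of standard greedy NMS written with NumPy vector operations. It is the textbook algorithm, not necessarily the optimized variant used in this project. The `nms` name, the `(N, 5)` layout of `[x1, y1, x2, y2, score]`, and the threshold value are assumptions.

```python
import numpy as np


def nms(dets, threshold=0.4):
    """Greedy NMS over an (N, 5) array; returns the indices of the kept boxes."""
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box, computed in one shot.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= threshold]  # drop boxes that overlap box i too much
    return keep
```

Each iteration compares the current best box against all remaining boxes with array operations, so the only Python-level loop runs once per kept box.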

Experiments were run on a GTX 1660 Ti with CUDA 10.2 on KDE-Ubuntu 19.10.

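| Scale | RetinaFace | Faster RetinaFace | Speed Up |
|-------|------------|-------------------|----------|
| 0.1   | 2.854ms    | 2.155ms           | 32%      |
| 0.4   | 3.481ms    | 2.916ms           | 19%      |
| 1.0   | 5.743ms    | 5.413ms           | 6.1%     |
| 2.0   | 22.351ms   | 20.599ms          | 8.5%     |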

Results for several scale factors at VGA resolution show that our method speeds up detection by up to 32%. As the actual resolution increases, feature extraction takes a significantly larger share of the measured time, which dilutes the acceleration.
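
The Speed Up column appears to be the ratio of the two timings minus one. That formula is inferred from the numbers rather than stated in the source; the short check below applies it to the four rows of timings reported above.

```python
# Timings from the benchmark: scale -> (RetinaFace ms, Faster RetinaFace ms).
timings_ms = {
    0.1: (2.854, 2.155),
    0.4: (3.481, 2.916),
    1.0: (5.743, 5.413),
    2.0: (22.351, 20.599),
}


def speed_up_percent(baseline_ms, faster_ms):
    """Relative speed-up in percent: how much more work fits in the same time."""
    return (baseline_ms / faster_ms - 1.0) * 100.0


for scale, (baseline, faster) in timings_ms.items():
    print(f"scale {scale}: {speed_up_percent(baseline, faster):.1f}%")
```

Under this assumed formula the results agree with the reported column to within rounding.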

| Platform      | Inference | Postprocess | Throughput Capacity |
|---------------|-----------|-------------|---------------------|
| 9750HQ+1660TI | 0.9ms     | 1.5ms       | 500~1000fps         |
| Jetson-Nano   | 4.6ms     | 11.4ms      | 80~200fps           |

Theoretically, throughput reaches its maximum when the queue is large enough.
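
As a rough back-of-the-envelope reading (an assumption, not the author's stated measurement method), a stage with latency t ms can process at most 1000 / t frames per second when it is kept busy by a queue. Applying this to the per-stage latencies reported above gives rates in the same range as the throughput column.

```python
def fps_from_latency(latency_ms):
    """Upper bound on frames per second for one stage that is never idle."""
    return 1000.0 / latency_ms


# Per-stage latencies in milliseconds, taken from the platform comparison.
stage_latency_ms = {
    "9750HQ+1660TI": {"inference": 0.9, "postprocess": 1.5},
    "Jetson-Nano": {"inference": 4.6, "postprocess": 11.4},
}

for platform, stages in stage_latency_ms.items():
    rates = ", ".join(
        f"{name}: {fps_from_latency(ms):.0f} fps" for name, ms in stages.items()
    )
    print(f"{platform} -> {rates}")
```

How close a real pipeline gets to these bounds depends on how well the stages overlap, which the source does not describe.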

Citation

@inproceedings{deng2019retinaface,
    title={RetinaFace: Single-stage Dense Face Localisation in the Wild},
    author={Deng, Jiankang and Guo, Jia and Zhou, Yuxiang and Yu, Jinke and Kotsia, Irene and Zafeiriou, Stefanos},
    booktitle={arxiv},
    year={2019}
}

faster-mobile-retinaface's Issues

Optimized NMS

@1996scarlet Hi, thanks for sharing. Would you mind explaining the differences between the optimized NMS and the original NMS? Thanks a lot.

landmarks

Great work! In order to do more specific post-processing, most pipelines require more than the bounding box. Would it be possible to have a branch where your detector returns landmarks as well? At least for eyes, nose, mouth. Many thanks!

Python 2.7 or 3.6

Hi 1996,
Which Python version, 2.7 or 3.6, are you using for this code?
Best regards,
PeterPham

how can i detect face in image in face_detect.py? I am a newbie

here my code

    fd = MxnetDetectionModel("weights/16and32", 0,
                             scale=.4, gpu=-1, margin=0.15)
    img = cv2.imread('./images/test.jpg')
    copy = np.array(img)
    detach = fd.detect(copy)

    for res in fd._nms_wrapper(detach):
        cv2.rectangle(img, (res[0], res[1]),(res[2], res[3]), (255, 255, 0))
    cv2.imshow('face',img)
    cv2.waitKey()

and error

inferance: 0.009732400998473167
<generator object BaseDetection.non_maximum_suppression at 0x7f73341e28b8>
Traceback (most recent call last):
  File "face_detector.py", line 282, in <module>
    for res in fd._nms_wrapper(detach):
  File "face_detector.py", line 71, in non_maximum_suppression
    x1, y1, x2, y2, scores = dets.T
AttributeError: 'generator' object has no attribute 'T'

Thanks for your help!!
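
The traceback suggests that the value passed to non_maximum_suppression is itself a generator, so a likely fix is to iterate over the result of fd.detect directly instead of passing it to _nms_wrapper again. This has not been verified against the repository. The standalone sketch below only reproduces the error pattern with hypothetical stand-in functions (`detect_stub`, `scores_of`) and shows the iteration that avoids it.

```python
import numpy as np


def detect_stub():
    """Stand-in for a detector that yields one (N, 5) array per image."""
    yield np.array([[0, 0, 10, 10, 0.9]], dtype=np.float32)


def scores_of(dets):
    """Stand-in for code that expects an ndarray, like the line in the traceback."""
    x1, y1, x2, y2, scores = dets.T
    return scores


# Passing the generator itself fails, because generators have no `.T` attribute.
try:
    scores_of(detect_stub())
except AttributeError as exc:
    message = str(exc)

# Iterating first hands each yielded array to the function, which works.
for dets in detect_stub():
    scores = scores_of(dets)
```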

Replace opencv with gstreamer

How can I change this code to use OpenCV instead of GStreamer?
What is the difference between GStreamer and OpenCV in CPU usage and speed?
My goal for this project is to process a large number of cameras at high FPS.
For example, 20 cameras.
But with more than 7 cameras, my CPU usage exceeds 98% and I cannot add another one.
How can I run this project on 20 cameras at a time?
The model takes about 490 MB of GPU memory for a single camera, so I probably won't run out of GPU resources. I think my problem is the CPU.
GPU: 1080 Ti (11 GB), CPU: Core i7-9700K
Or at least, how should I choose the system I need for this project?

WARNING: erroneous pipeline: no element "v4l2src"

gst-launch-1.0 -q v4l2src device=/dev/video0 ! video/x-raw, width=640, height=480 ! videoconvert ! video/x-raw, format=BGR ! fdsink | python3 face_detector.py
WARNING: erroneous pipeline: no element "v4l2src"
[23:49:41] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.5.0. Attempting to upgrade...
[23:49:41] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!

I'm trying to run the code, but it stops here and does not display the camera feed.
How can I fix this?
Thank you very much!!!

Great work and small issue

Hi Scarlet,

First of all, thanks for the great work.
I tried your algorithm yesterday and it was super simple to use.
I have some questions.

I can get the speed boost, but I cannot get the same result on WIDER FACE.
When I try to detect small and big faces at the same time, it doesn't seem to work.

Do you have a sample that we could test on in full HD?

Thanks

Issue from command line

Hi 1996,
When I run the command line from your code: gst-launch-1.0 -q v4l2src device=/dev/video0 ! video/x-raw, width=640, height=480 ! videoconvert ! video/x-raw, format=BGR ! fdsink | python3 face_detector.py
I get the following error:
[20:26:03] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.5.0. Attempting to upgrade...
[20:26:03] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
terminate called recursively
Aborted (core dumped)

Best regards,
PeterPham
