YOLO-Hand-Detection

Scene hand detection for real-world images.

Hand Detection Example

Idea

To detect hand gestures, we first have to detect the hand's position in space. This pre-trained network extracts hands from a 2D RGB image using the YOLOv3 neural network.

Models for this task already exist, mainly based on MobileNetSSD networks. The goal of this model is to support a wider range of images and to provide a more stable detector (hopefully 🙈).

Dataset

The first version of this network was trained on the CMU Hand DB dataset, which is free to access and download. Because the results were OK but not satisfying, I used that model to pre-annotate more images and then manually corrected the pre-annotations.

Because Handtracking by Victor Dibia uses the Egohands dataset, I included it in the training set as well.

In the end, the training set consists of the CMU Hand DB, the Egohands dataset, and my own annotated images (mainly of marathon runners), together called cross-hands.

Training

The training took about 10 hours on a single NVIDIA GTX 1080 Ti and used the default YOLOv3 architecture. I also trained its slim variant, YOLOv3-Tiny.

YOLOv3

Training Graph

Precision: 0.89 Recall: 0.85 F1-Score: 0.87 IoU: 69.8

YOLOv3-Tiny

Training Graph

Precision: 0.76 Recall: 0.69 F1-Score: 0.72 IoU: 53.67

YOLOv3-Tiny-PRN

The tiny version of YOLO has been improved by the Partial Residual Networks paper, so I trained YOLOv3-Tiny-PRN as well and share the results here too. It is interesting to see that the YOLOv3-Tiny-PRN performance comes close to the original YOLOv3!

Training Graph

Precision: 0.89 Recall: 0.79 F1-Score: 0.83 IoU: 68.47

YOLOv4-Tiny

With the recent release of YOLOv4, it was interesting to see how well it performs against its predecessor: same precision, but better recall and IoU.

Training Graph

Precision: 0.89 Recall: 0.89 F1-Score: 0.89 IoU: 91.48

Testing

I could not evaluate the model on a standard benchmark such as the Egohands test split, because I mixed the training and testing samples together and created my own test dataset out of them.

As soon as I have time, I will publish a comparison of my trained models against, for example, Handtracking.

Inferencing

The models have been trained on an image size of 416x416. It is also possible to run inference at a smaller input size to increase speed. A good performance/accuracy trade-off on CPUs has been found at an input size of 256x256.

The model itself is fully compatible with the OpenCV DNN module and ready to use.
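As a minimal sketch (not part of the repo's own code), the model can be loaded and run with OpenCV's dnn module roughly as follows. The file names follow the download script, and the parsing assumes the standard Darknet/YOLO output layout of [cx, cy, w, h, objectness, class scores...] with coordinates normalized to the frame size:

```python
import numpy as np


def parse_detections(outs, frame_w, frame_h, conf_threshold=0.5):
    """Convert raw YOLO output rows into (x, y, w, h, confidence) boxes.

    Each row is [cx, cy, w, h, objectness, class scores...]; box
    coordinates are normalized to [0, 1] relative to the frame size.
    """
    boxes = []
    for out in outs:
        for row in out:
            objectness = float(row[4])
            if objectness < conf_threshold:
                continue
            cx, cy = row[0] * frame_w, row[1] * frame_h
            w, h = row[2] * frame_w, row[3] * frame_h
            # Convert center-based coordinates to a top-left box.
            boxes.append((int(cx - w / 2), int(cy - h / 2),
                          int(w), int(h), objectness))
    return boxes


def detect_hands(image_path, size=256):
    """Run the cross-hands model on one image with OpenCV's dnn module."""
    # Lazy import so parse_detections stays usable without OpenCV installed.
    import cv2
    net = cv2.dnn.readNetFromDarknet("models/cross-hands.cfg",
                                     "models/cross-hands.weights")
    frame = cv2.imread(image_path)
    h, w = frame.shape[:2]
    # 256x256 trades a little accuracy for CPU speed, as described above.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (size, size),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(net.getUnconnectedOutLayersNames())
    return parse_detections(outs, w, h)
```

Since cross-hands has only one class (hand), the objectness score is used directly as the confidence here; a multi-class model would also multiply in the class score.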

Demo

To run the demo, please first install all the dependencies (requirements.txt) into a virtual environment and download the models and weights into the models folder (or run the shell script).

# mac / linux
cd models && sh ./download-models.sh

# windows
cd models && powershell .\download-models.ps1

Then run the following command to start a webcam detector with YOLOv3:

# with python 3
python demo_webcam.py

Or this one to run a webcam detector with YOLOv3-Tiny:

# with python 3
python demo_webcam.py -n tiny

For YOLOv3-Tiny-PRN use the following command:

# with python 3
python demo_webcam.py -n prn

For YOLOv4-Tiny use the following command:

# with python 3
python demo_webcam.py -n v4-tiny

Download

If you are interested in the CMU Hand DB results, please check the release section.

About

Trained by cansik; the datasets are described in the readme and fall under the terms and conditions of their owners.

All the demo images have been downloaded from unsplash.com:

Tim Marshall, Zachary Nelson, John Torcasio, Andy Falconer, Sherise, Alexis Brown

yolo-hand-detection's People

Contributors

cansik, djthegr8, roachsinai


yolo-hand-detection's Issues

How to retrain the model?

Congrats on your work; I've tested it and it performed well on my dataset. However, to get better results I need to fine-tune this pre-trained model with my own dataset, and I noticed that there isn't a training script included. So my question is: is there a way to retrain this model for fine-tuning?

How to use the CMU Hand DB?

Hi, thanks for sharing; it's a pretty great project!

But I have a question about how to use the CMU Hand dataset: I couldn't find any bounding-box labels for hands, and it only seems to contain the key points of the hand.

Thanks.

Conversion To TensorFlow or Keras

Hey,
So I wanted to do hand detection in the browser using TensorFlow.js, but converting to a TF.js model requires a TensorFlow or Keras model. Could you guide me in converting these models to TensorFlow?

No detection with webcam mode

I tried the normal and v4-tiny models with demo_webcam.py.
Everything is at its default, but no detection can be seen, not a single one, and of course I tried different angles and such, to no avail.
I am also a follower of YOLOv4, by the way.

Datasets: How to reconstruct

Hi!

You wrote

In the end, the training set consists of the CMU Hand DB, the Egohands dataset and my own trained images (mainly from marathon runners), called cross-hands.

How can we find the dataset to reconstruct the training?

Datasets

Any interest in uploading your datasets? I could contribute by training larger models (and YOLOv4) and adding them to this repo.

The demo with the v4-tiny model does not work

I used the command python demo_webcam.py -n v4-tiny, but the detector cannot find any hands. python demo_webcam.py -n v4-tiny -c 0 also does not work. How can I fix this?

How can I get every detected hand's coordinates

Hello, how can I get the x and y coordinates of every detected hand? In your code, only one x and y is given.
I want to get hand1: x1, y1; hand2: x2, y2. How can I get this?

Missing source code

Unfortunately there is only a README file but no source code. Is the source code somehow encoded in the README?
Excuse my stupid question, I am new to YOLO and hand tracking.

Failed to parse NetParameter file

When running the code in PyCharm using the command python demo_webcam.py, I receive this error:

cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-q0nmoxxv\opencv\modules\dnn\src\darknet\darknet_importer.cpp:207: error: (-212:Parsing error) Failed to parse NetParameter file: models/cross-hands.cfg in function 'cv::dnn::dnn4_v20200609::readNetFromDarknet'

Any way to resolve it?

ZeroDivisionError: integer division or modulo by zero in demo.py

I get this error when I use demo.py with an image that I took on my phone. Does the image need to be a specific size?

python demo.py -i test.jpg

loading yolo...
extracting tags for each image...
Traceback (most recent call last):
File "demo.py", line 78, in
print("AVG Confidence: %s Count: %s" % (round(conf_sum / detection_count, 2), detection_count))
ZeroDivisionError: integer division or modulo by zero
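For context, the traceback suggests the crash happens when no hands are detected, so the detection count used as divisor is zero and the averaging needs a guard. A minimal sketch of such a guard (hypothetical helper, not the repo's actual code):

```python
def average_confidence(conf_sum, detection_count):
    """Return the mean detection confidence rounded to two decimals,
    or None when nothing was detected, avoiding the ZeroDivisionError
    shown in the traceback above."""
    if detection_count == 0:
        return None
    return round(conf_sum / detection_count, 2)
```

The caller can then print "No hands detected" instead of the average when the helper returns None.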

Bounding box is sometimes only partially formed

Photos 2_8_2022 5_46_30 PM
I have used your model for detection on my own photos, but sometimes the box is incomplete, as in the attached photo.
What can I do?
Also, a few times hands are not recognized at all; what preprocessing can I do to handle this case?
Thanks a lot for this effort.
