
nsrmhand's Introduction

NSRMhand

Official implementation of the WACV 2020 paper Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation, in PyTorch.

Abstract

Hand pose estimation is more challenging than body pose estimation due to severe articulation, self-occlusion and high dexterity of the hand. Current approaches often rely on a popular body pose algorithm, such as the Convolutional Pose Machine (CPM), to learn 2D keypoint features. These algorithms cannot adequately address the unique challenges of hand pose estimation, because they are trained solely based on keypoint positions without seeking to explicitly model structural relationship between them. We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly. The structure learning is guided by synthetic hand mask representations, which are directly computed from keypoint positions, and is further strengthened by a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy of mask synthesis. We conduct extensive studies on two public datasets - OneHand 10k and CMU Panoptic Hand. Experimental results demonstrate that explicitly enforcing structure learning consistently improves pose estimation accuracy of CPM baseline models, by 1.17% on the first dataset and 4.01% on the second one.

Visualization of our proposed structure representations (LPM G1, LDM G6) and our network structure:

[Figure: LPM G1 and LDM G6 structure masks]

[Figure: network architecture]

Highlights

  • We propose a novel cascade structure regularization methodology for 2D hand pose estimation, which utilizes synthetic hand masks to guide keypoint structure learning.
  • We propose a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy for hand mask synthesis.

Running

Prepare

pytorch >= 1.0  
torchvision >= 0.2 
numpy  
matplotlib 

Inference

  1. Download our trained model (LPM G1&6) by running sh weights/download.sh, or download it directly from this Dropbox link

  2. For pose estimation on the demo hand image, run

python inference.py

We provide example images in the images/ folder. If everything is set up correctly, the output should look like the following:

[Figure: input hand image and estimated pose output]

Note: this model is trained only on the Panoptic hand dataset, so it may not work well on other scenes.

Training

  1. Please download the Panoptic Hand dataset from its official website and crop it based on a 2.2x ground-truth bounding box (see the cropping sketch after this list). For your convenience, you can download our preprocessed dataset from here. Please DO NOT duplicate it for any commercial purpose; the copyright still belongs to Panoptic. If you want to train on your own dataset, please format it following the data_sample/ folder.

  2. Specify your configuration in configs/xxx.json.
    You can also use the default parameter settings, but remember to change the data root.

  3. Train model by

python main.py xxx.json

For example, if you want to train LPM G1, you should run

python main.py LPM_G1.json
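For step 1, if you need to reproduce the crop yourself, the following is a minimal sketch. It assumes you have the 21 keypoints in pixel coordinates and that the crop is a square box 2.2x the size of the tight keypoint bounding box; it is not the authors' preprocessing script.

import numpy as np

def crop_hand(image, keypoints, scale=2.2):
    # image:     H x W x 3 numpy array (e.g. loaded with OpenCV or PIL)
    # keypoints: (21, 2) array of (x, y) pixel coordinates
    # scale:     expansion factor of the tight bounding box (2.2 per the step above)
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    side = scale * max(x_max - x_min, y_max - y_min)   # square crop around the box center
    x1, y1 = int(round(cx - side / 2)), int(round(cy - side / 2))
    x2, y2 = int(round(cx + side / 2)), int(round(cy + side / 2))
    h, w = image.shape[:2]
    x1, y1 = max(x1, 0), max(y1, 0)                    # clamp to image borders
    x2, y2 = min(x2, w), min(y2, h)
    crop = image[y1:y2, x1:x2]
    cropped_kpts = keypoints - np.array([x1, y1], dtype=np.float32)  # shift labels into crop frame
    return crop, cropped_kpts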

Notation

  • The most creative part of our model is the structure representation, which is generated from keypoints only.
    See dataset/hand_ldm.py and dataset/hand_lpm.py for details, and adapt them for other tasks (a sketch of the idea follows this list).

  • Since this is a multi-task learning problem, the weight and decay ratio between the keypoint confidence-map loss and the NSRM loss may vary across datasets; you may need to tune these parameters for your own dataset.

  • In our experiments and code, we only apply NSRM to CPM, but we believe it will also work with other pose estimation networks, such as Stacked Hourglass and HRNet.
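As a concrete illustration of the structure representation, below is a minimal sketch of one plausible limb probabilistic mask: a Gaussian of the pixel-to-segment distance for a single limb. The Gaussian form, the 46x46 label size, and sigma are assumptions for illustration; dataset/hand_lpm.py contains the actual representation used in this repository.

import numpy as np

def limb_probabilistic_mask(p1, p2, label_size=46, sigma=1.0):
    # Soft limb mask: exp(-d^2 / (2 * sigma^2)), where d is the distance from
    # each pixel of the label map to the segment joining keypoints p1 and p2.
    ys, xs = np.mgrid[0:label_size, 0:label_size].astype(np.float32)
    (x1, y1), (x2, y2) = p1, p2
    vx, vy = x2 - x1, y2 - y1
    length2 = vx * vx + vy * vy + 1e-8                  # squared limb length
    t = ((xs - x1) * vx + (ys - y1) * vy) / length2     # projection parameter per pixel
    t = np.clip(t, 0.0, 1.0)                            # clamp to the segment
    px, py = x1 + t * vx, y1 + t * vy                   # nearest point on the segment
    d2 = (xs - px) ** 2 + (ys - py) ** 2
    return np.exp(-d2 / (2.0 * sigma * sigma))

Per-finger or whole-hand group masks (the G1/G6 composition) can then be obtained by, for example, taking the pixel-wise maximum over the limbs in each group; this is one plausible composition, not necessarily the exact one used in the code.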

Citation

If you find this project useful for your research, please use the following BibTeX entry. Thank you!

@inproceedings{chen2020nonparametric,                    
  title={Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation},                 
  author={Chen, Yifei and Ma, Haoyu and Kong, Deying and Yan, Xiangyi and Wu, Jianbao and Fan, Wei and Xie, Xiaohui},             
  booktitle={The IEEE Winter Conference on Applications of Computer Vision},                     
  pages={381--390},                  
  year={2020}              
}            

nsrmhand's People

Contributors

HowieMa


nsrmhand's Issues

How to use NSRM in HR-Net

Hi @HowieMa,
Thanks for sharing your code; I am very interested in your ideas. You said in the README that NSRM can be used with HR-Net, but that network has no intermediate supervision, so how would a cascaded multi-task architecture learn the hand structure and keypoint representations jointly? Could you give some advice? Looking forward to your reply.

CMU Panoptic file

Hi,

Thank you for such an amazing project and repository!

I was wondering if you could provide the preprocessing file for the CMU Panoptic dataset, as I want to learn more about its preprocessing steps and the heatmap generation in your code.
Hoping for a positive response!

train dataset download

Thanks for this pretty project. Where can I download the training data? I am looking forward to your reply.

Model structure in code incompatible with the paper

In the paper the network layout is defined as 3 structure stages and 3 keypoint stages, while in the code the 6 stages are divided into 1 structure stage and 5 keypoint stages. What is the reason for this difference in the implementation, or is it merely a typo?

How to get 21 keypoints annotations after cropping the original image?

Hi @HowieMa
Thanks for your great work. I was confused about how you get the 21 keypoint annotations after cropping the original image. Is it manual annotation, or some transformation of the original CMU annotations? Since I want to train on other datasets, I want to know how you preprocessed the data. Looking forward to your reply.

accuracy in Panoptic

Thanks for this pretty project. I have a question about PCK accuracy. Using your project on the Panoptic data, the training step reports "Current Best EPOCH is : 32, PCK is : 0.9577505546445453", but the test step gives
"0.04": 0.5353691038114515,
"0.06": 0.5957357993770676,
"0.08": 0.6186944096586716,
"0.1": 0.6288090421603569,
"0.12": 0.6367080884950065
Testing some pictures, the results look poor. Hoping for your answer, thanks.

What do mask1, mask2 and mask3 mean?

When generating the limb structure, the code uses:
mask1 = cross <= 0 # 46 * 46
mask2 = cross >= length2
mask3 = 1 - mask1 | mask2

D2 = np.zeros((self.label_size, self.label_size))
D2 += mask1.astype('float32') * ((x - x1) * (x - x1) + (y - y1) * (y - y1))
D2 += mask2.astype('float32') * ((x - x2) * (x - x2) + (y - y2) * (y - y2))
D2 += mask3.astype('float32') * ((x - px) * (x - px) + (py - y) * (py - y))
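For context, the snippet reads like a standard piecewise computation of the squared distance from every pixel to the limb segment. Below is a self-contained sketch of that reading; the definitions of cross, length2, px and py are not shown in the issue and are assumed here, and the sketch writes the third mask simply as the complement of the first two, so it is an interpretation rather than the repository's exact code.

import numpy as np

label_size = 46
ys, xs = np.mgrid[0:label_size, 0:label_size].astype(np.float32)   # pixel grid (y, x)
x1, y1, x2, y2 = 10.0, 12.0, 30.0, 35.0                             # example limb endpoints

cross = (xs - x1) * (x2 - x1) + (ys - y1) * (y2 - y1)   # dot product with the limb vector
length2 = (x2 - x1) ** 2 + (y2 - y1) ** 2               # squared limb length

mask1 = cross <= 0                                       # projection falls before endpoint 1
mask2 = cross >= length2                                 # projection falls beyond endpoint 2
mask3 = ~(mask1 | mask2)                                 # projection falls inside the segment

t = cross / length2                                      # projection parameter
px, py = x1 + t * (x2 - x1), y1 + t * (y2 - y1)          # foot of the perpendicular

D2 = np.zeros((label_size, label_size), dtype=np.float32)
D2 += mask1 * ((xs - x1) ** 2 + (ys - y1) ** 2)          # squared distance to endpoint 1
D2 += mask2 * ((xs - x2) ** 2 + (ys - y2) ** 2)          # squared distance to endpoint 2
D2 += mask3 * ((xs - px) ** 2 + (ys - py) ** 2)          # squared distance to the projection point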

loss function

Thank you very much for your work. I have a problem: if nn.MSELoss() is used as the loss function and the heatmap output from the network is compared directly with the generated heatmap, the loss value becomes abnormally large. How do you deal with this problem?
I found that your code is:
criterion = nn.MSELoss(reduction='sum')
loss = criterion(pred, target)
return loss / (pred.shape[0] * 46.0 * 46.0)
Do you use this loss function to avoid an excessively large loss?
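For reference, the difference between the two reductions can be checked with a few lines of standard PyTorch (this illustrates library behaviour, not project-specific code): with reduction='mean', MSELoss also divides by the number of channels, which is why the raw magnitudes differ.

import torch
import torch.nn as nn

pred = torch.rand(8, 21, 46, 46)     # predicted heatmaps (B, C, H, W)
target = torch.rand(8, 21, 46, 46)   # ground-truth heatmaps

sum_loss = nn.MSELoss(reduction='sum')(pred, target)
normalized = sum_loss / (pred.shape[0] * 46.0 * 46.0)   # per sample, per pixel (channels still summed)

mean_loss = nn.MSELoss(reduction='mean')(pred, target)  # divides by B * C * H * W
print((normalized / mean_loss).item())                  # ratio is C = 21 for this shape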

demo output

I used your trained model and ran inference.py, but the output is different from yours, like this:
[Image: demo output]
What can I modify?

Thank you a lot!

Can the model predict hand bounding boxes on its own?

Hello and thank you for making the code available. I greatly appreciate your efforts.

According to your paper, the NSRMhand model only focuses on hand pose estimation. May I ask whether the proposed methodology is capable of both hand detection and hand landmark localization? Alternatively, does the model need the hand bounding-box coordinates, or can it predict the bounding boxes on its own? Furthermore, does the model always assume there is a hand in the input image?

Thank you in advance!

src.augmentation missing

Looks like the augmentation code is missing. And there is no augmentation done in the data loader.

Required dataset

Is only the Panoptic Hand dataset needed to train the model? The paper says that OneHand 10k is also needed, but we have not figured out how to configure it.

missing limb structure problems

It says that the ground truth of the limb representations for missing keypoints is set to zero maps, so the limb structure at missing keypoints will be missing too. Will this affect the subsequent pose-module regression? How can we get the better and more stable hand keypoints you prefer?

./CPM used in training?

Is the model in the ./CPM folder used anywhere in the training process of the limb model? There is quite a bit of duplicated code in ./CPM as well as in other parts of the code base, such as main.py and hand_ldm.py, and I got confused about which files are actually used in training.

How to run the OneHand10K dataset

@HowieMa
I don't know how to run the program on the OneHand10K dataset. Can you provide a workflow? Thank you very much!
There is cmuhand.py; can you also provide onehand.py and the other relevant files?

Encounter segfault when running inference.py

I ran inference.py on Ubuntu and encountered a segfault on this line:
117 state_dict = torch.load(args.resume)
Would you mind suggesting any possible cause for this?

MediaPipe hand landmark detection

Hi, thank you for your great work. Have you tested the MediaPipe hand-tracking demo? It works very well. I analyzed the MediaPipe hand-tracking model; it is very simple, but I cannot reach its precision. Can you give me some ideas about this? Thanks.

Question about LM loss

Hello, thanks for releasing the code. I really appreciate your work.

I have a question: when I train the code, the LM loss descends gradually at the beginning, but it ascends gradually after the 50th epoch. Have you ever met this problem? I look forward to your reply.

hand tightest bounding box?

Hi, I have a question about the hand's tightest bounding box. What is your definition of it? Is it the tightest box enclosing all hand joints, or is that box expanded by some length?
Thanks
