hope's Introduction

HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation

Code for the HOPE-Net paper (CVPR 2020), a graph convolutional model for Hand-Object Pose Estimation (HOPE).

The goal of Hand-Object Pose Estimation (HOPE) is to jointly estimate the poses of both the hand and a handled object. Our HOPE-Net model can estimate the 2D and 3D hand and object poses in real-time, given a single image.

Architecture

The architecture of HOPE-Net. The model starts with a ResNet that acts as the image encoder and predicts the initial 2D coordinates of the hand joints and object vertices. These coordinates, concatenated with the image features, are used as the node features of the input graph for a 3-layer graph convolution, which exploits neighboring nodes' features to estimate a better 2D pose. Finally, the 2D coordinates predicted in the previous step are passed to our Adaptive Graph U-Net to find the 3D coordinates of the hand and object.
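
A minimal sketch of this data flow (with hypothetical module and tensor names, not the repository's exact code), assuming 29 graph nodes (21 hand joints plus 8 object bounding-box corners) and an encoder that returns both an image feature vector and the initial 2D points:

import torch
import torch.nn as nn

class HopeNetSketch(nn.Module):
    def __init__(self, encoder, graphnet, graphunet, num_nodes=29):
        super().__init__()
        self.encoder = encoder      # image encoder (e.g. a ResNet) returning features and initial 2D points
        self.graphnet = graphnet    # 3-layer graph convolution over the hand+object graph
        self.graphunet = graphunet  # Adaptive Graph U-Net lifting 2D coordinates to 3D
        self.num_nodes = num_nodes  # 21 hand joints + 8 object corners

    def forward(self, images):
        features, points2d_init = self.encoder(images)               # (B, F), (B, 29, 2)
        features = features.unsqueeze(1).expand(-1, self.num_nodes, -1)
        in_features = torch.cat([points2d_init, features], dim=2)    # per node: 2D coords + image features
        points2d = self.graphnet(in_features)                        # refined 2D pose
        points3d = self.graphunet(points2d)                          # 3D pose of hand and object
        return points2d_init, points2d, points3d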

A schematic of our Adaptive Graph U-Net architecture, which is used to estimate 3D coordinates from 2D coordinates. In each pooling layer we roughly cut the number of nodes in half, while in each unpooling layer we double the number of nodes in the graph. The red arrows in the image are the skip-layer features, which are passed to the decoder and concatenated with the unpooled features.
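
For illustration, one simple way to implement a pooling layer that keeps roughly half of the nodes and an unpooling layer that restores them (a sketch only; the paper's adaptive pooling is learned and need not match this top-k formulation):

import torch
import torch.nn as nn

class GraphPoolSketch(nn.Module):
    # scores every node with a learned projection and keeps the top half
    def __init__(self, in_features):
        super().__init__()
        self.score = nn.Linear(in_features, 1)

    def forward(self, x):                                    # x: (B, N, F)
        k = max(1, x.size(1) // 2)
        idx = self.score(x).squeeze(-1).topk(k, dim=1).indices
        kept = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(2)))
        return kept, idx                                     # idx is needed later for unpooling

class GraphUnpoolSketch(nn.Module):
    # scatters pooled features back to their original node positions; the rest stay zero
    def forward(self, x, idx, n_nodes):                      # x: (B, k, F)
        out = x.new_zeros(x.size(0), n_nodes, x.size(2))
        out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, x.size(2)), x)
        return out                                           # concatenated with skip features in the decoder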

Datasets

To use the datasets from the paper, download the First-Person Hand Action Dataset and the HO-3D Dataset, update the root path in the make_data.py file located in each dataset folder, and run the make_data.py files to generate the .npy files, as in the example below.
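
For example, for the First-Person Hand Action data (the HO-3D folder is prepared the same way; the dataset root is wherever you extracted the data):

cd datasets/fhad
# edit make_data.py and set its root path to your local copy of the dataset, then:
python make_data.py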

Test Pretrained Model

First, download the First-Person Hand Action Dataset and generate the .npy files. Then download and extract the pretrained model with the commands below, and run the model using the pretrained weights.

GraphUNet

wget http://vision.soic.indiana.edu/wp/wp-content/uploads/graphunet.tar.gz
tar -xvf graphunet.tar.gz

python Graph.py \
  --input_file ./datasets/fhad/ \
  --test \
  --batch_size 64 \
  --model_def GraphUNet \
  --gpu \
  --gpu_number 0 \
  --pretrained_model ./checkpoints/graphunet/model-0.pkl

Citation

Please cite our paper if this code helps your research.

@InProceedings{Doosti_2020_CVPR,
author = {Bardia Doosti and Shujon Naha and Majid Mirbagheri and David Crandall},
title = {HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

hope's People

Contributors

bardiadoosti


hope's Issues

Nothing happens when I try to run the model using the pretrained weights.

I downloaded the dataset and generated the .npy files, and then downloaded the pretrained weights file model-0.pkl to run the testing code. But when I run the command python HOPE.py --input_file ./datasets/fhad/ --test --batch_size 64 --model_def HopeNet --gpu --gpu_number 0 --pretrained_model ./checkpoints/fhad/model-0.pkl, nothing happens; I can only see this in the terminal:
[screenshot of terminal output]

HopeNet

Is the pretrained model of HopeNet available? Thanks a lot!

Intrinsic/Extrinsic parameters between different datasets

Thank you for this great work. You said in your comment here (#15 (comment)) "Here the Adaptive Graph U-Net is exactly learning this transformation for a very specific camera and angle condition.", but in the paper, in the last paragraph of the introduction, you mention that you pretrained the 2D-to-3D GraphUNet on synthetic data (ObMan), which has totally different intrinsic/extrinsic parameters. Would you please clarify this?

Thank you again for your work.

Percentage of Correct Predictions

Thank you for this great work. Can you share the detailed data behind Figures 5 and 6? I would like to know the detailed results of this paper and of the compared papers, as a benchmark and reference. I really hope to hear from you.

RuntimeError: invalid argument 7: expected 3D tensor at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:489

I want to use it to test on a single 2D image, but I got this error. What is the problem?
Code here:

import torch
from PIL import Image
from torch.autograd import Variable

def predict(model, test_image_name):
    transform = image_transforms['test']      # same test-time transform as used for the dataset
    image = Image.open(test_image_name)
    image = transform(image)
    if torch.cuda.is_available():
        image = image.cuda()
    image = image.view(1, *image.size())      # add a batch dimension
    image = Variable(image)                   # Variable is a no-op in recent PyTorch versions
    with torch.no_grad():
        model.eval()
        out = model(image)
        print(out)

predict(model, "./pic/100.JPG")

Problem with the pre-trained model

The provided pre-trained model is not accurate. I tested it on several samples of the HOnnotate dataset, but the results totally deviate from what they should be; even the resulting test loss was very high.
I know that this model has been trained on the First-Person Hand Action dataset, but it should also give a close estimate on other datasets.
At first I thought the problem was in the pre-processing of the data before feeding it into the network, but this was not the case, since I managed to re-train the model on HOnnotate and got a better estimate on the same dataset used in training. However, I am still facing the same problem when I test on data that was not used in training.
I am afraid that the code is causing over-fitting.
Please advise.

End to end training

Hi bardiadoosti, I noticed that the currently provided pre-trained GraphUNet model only converts 2D point coordinates to 3D predictions, but I am more interested in the ResNet part, which converts RGB images to 2D point coordinates in the first place.
It would be greatly appreciated if you could add the pretrained ResNet model to GitHub, or the pretrained HOPE-Net weights.

Larger Test Error

Hi, I downloaded the FHAD dataset, ran make_data.py, and ran the command below:

$ python HOPE.py   --input_file ./datasets/fhad/   --test   --batch_size 16   --model_def HopeNet   --gpu   --gpu_number 0   --pretrained_model ./checkpoints/fhad/model-0.pkl

Then I get:

Test files loaded
HopeNet is loaded
Begin testing the network...
test error: 1609.94531

This is completely different from the results in your paper.
My environment: torch==1.5.0, torchvision==0.6.

environment

Hi Bardia Doosti! Thank you for your excellent work and for being willing to share it. I would like to know the environment configuration required by this code; I'm a beginner, so I hope you don't mind some basic questions. Looking forward to your reply, thank you!

Average error and PCK

Hi, I cloned the code and ran the pretrained model.
python HOPE.py --input_file ./datasets/fhad/ --test --batch_size 32 --model_def HopeNet --gpu --gpu_number 0 --pretrained_model ./checkpoints/fhad/model-0.pkl
The test loss is 253.
Then I calculated the average error and PCK of the pretrained model.
The average error is 33.6769 pixels.
For PCK, I got 14.78% @ 10 px, 42.79% @ 20 px, 64.81% @ 30 px, 77.59% @ 40 px, and 84.35% @ 50 px.
(torch==1.1.0, torchvision==0.3.0)
It seems different from your results in the paper.
Maybe I got something wrong. Can you provide the average error and PCK of the pretrained model?
Hoping for your reply. Thank you.
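
For reference, a minimal sketch (not part of the repository) of how the average pixel error and PCK at these thresholds can be computed from predicted and ground-truth 2D keypoints:

import numpy as np

def mean_error_and_pck(pred, gt, thresholds=(10, 20, 30, 40, 50)):
    # pred, gt: (N, K, 2) arrays of predicted / ground-truth 2D keypoints in pixels
    dist = np.linalg.norm(pred - gt, axis=-1)                # per-keypoint Euclidean error
    pck = {t: float((dist < t).mean()) for t in thresholds}  # fraction of keypoints within t pixels
    return float(dist.mean()), pck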

Val loss is much larger than training loss, why?

I have trained for some epochs (fewer than 1000) on the whole HopeNet and on the ResNet10 part separately. I find that the training loss starts to converge early on, but the validation loss falls much more slowly and is extremely large compared to the training loss, by up to 100 times.

The training settings are kept the same (batch size = 64, learning rate = 1e-3). I am not sure what happened; thanks for your help.

Test error is bigger than reported

Hi,

I know this issue has been raised in issues 13 and 17, and it doesn't seem that they have been addressed. I was hoping you could provide some feedback.

I followed your GitHub instructions to evaluate the pre-trained model. The test error I got is:

Test files loaded
HopeNet is loaded
Begin testing the network...
test error: 1042.98572

which is different from the test error 253 from issue 17.

When I run python make_data.py, there are a few sequences without video files (I am not sure if this is expected). See the log:

...
pour_milk
Error in ====Subject_6, pour_milk, 8====
Error in ====Subject_6, pour_milk, 7====
Error in ====Subject_6, pour_milk, 6====
Error in ====Subject_6, pour_milk, 9====
Error in ====Subject_6, pour_milk, 10====
open_juice_bottle
pour_juice_bottle
open_milk
Error in ====Subject_6, open_milk, 8====
Error in ====Subject_6, open_milk, 7====
Error in ====Subject_6, open_milk, 6====
Error in ====Subject_6, open_milk, 9====
Error in ====Subject_6, open_milk, 10====
put_salt
close_liquid_soap
Error in ====Subject_6, close_liquid_soap, 8====
Error in ====Subject_6, close_liquid_soap, 7====
Error in ====Subject_6, close_liquid_soap, 6====
Error in ====Subject_6, close_liquid_soap, 9====
Error in ====Subject_6, close_liquid_soap, 10====
open_liquid_soap
Error in ====Subject_6, open_liquid_soap, 8====
Error in ====Subject_6, open_liquid_soap, 7====
Error in ====Subject_6, open_liquid_soap, 6====
Error in ====Subject_6, open_liquid_soap, 5====
pour_liquid_soap
Error in ====Subject_6, pour_liquid_soap, 8====
Error in ====Subject_6, pour_liquid_soap, 7====
Error in ====Subject_6, pour_liquid_soap, 6====
Error in ====Subject_6, pour_liquid_soap, 5====
close_juice_bottle
close_milk
Error in ====Subject_6, close_milk, 8====
Error in ====Subject_6, close_milk, 7====
Error in ====Subject_6, close_milk, 6====
Error in ====Subject_6, close_milk, 9====
Error in ====Subject_6, close_milk, 10====
Subject_2
pour_milk
open_juice_bottle
pour_juice_bottle
open_milk
...

If I call model.eval() in HOPE.py, the error drops a bit, but I couldn't get it down to around 200:

Test files loaded
HopeNet is loaded
Begin testing the network...
test error: 969.59106

The test script that I used is:

#!/bin/bash

export CUDA_VISIBLE_DEVICES=0
python HOPE.py \
  --input_file ./datasets/fhad/ \
  --test \
  --batch_size 16 \
  --model_def HopeNet \
  --gpu \
  --gpu_number 0 \
  --pretrained_model ./checkpoints/fhad/model-0.pkl

I use PyTorch 1.4, torchvision 0.5, and Python 3.7.7.

How can I get a model that just estimates hand keypoints?

Your model is very effective.
I want a hand pose network that runs on a laptop or mobile phone and outputs the positions of hand keypoints.
How can I separate out your hand pose estimation model to build my app?
(My coding ability isn't particularly strong.)
I already have a detection model to find a specific hand gesture, but I still want a fast hand pose estimation model.
Thank you.

How to test with a camera or with an image?

The dataset you mentioned in the README is too large to download. I would like to ask whether the author has a quick test method. How can I test with an RGB camera or a single picture?
Thanks!

GraphUNet pretrained model

@bardiadoosti Hello, I failed to reproduce the results in your paper when training the GraphUNet network. Could you share the trained GraphUNet model? Looking forward to your reply! Thank you very much!

Error when changing ResNet50 to ResNet10 in the code?

@bardiadoosti Hello, I changed line 12 of the hopenet.py file in the models folder to Resnet10, and the following error occurred:

Train files loaded
Validation files loaded
HopeNet is loaded
Begin training the network...
Traceback (most recent call last):
  File "HOPE.py", line 107, in <module>
    outputs2d_init, outputs2d, outputs3d = model(inputs)
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/msl/code/HOPE-master/models/hopenet.py", line 21, in forward
    points2D = self.graphnet(in_features)
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/msl/code/HOPE-master/models/graphunet.py", line 179, in forward
    X_0 = self.gconv1(X, self.A_hat)
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/msl/code/HOPE-master/models/graphunet.py", line 44, in forward
    X = self.fc(torch.bmm(self.laplacian_batch(A_hat), X))
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/msl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 1372, in linear
    output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [464 x 514], m2: [2050 x 128] at /opt/conda/conda-bld/pytorch_1573049304260/work/aten/src/THC/generic/THCTensorMathBlas.cu:290
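
Reading the shapes in the error message, the per-node input to the first graph convolution is the backbone feature dimension plus the 2 initial coordinates, so that layer's input size has to match the chosen backbone. A rough sketch of the arithmetic (assuming the repository's ResNet10 variant outputs 512-dimensional features, which is consistent with the 514 above):

# per-node GraphNet input = backbone feature dim + 2 (the initial 2D coordinate)
expected_in = 2048 + 2   # = 2050, the declared layer input for a ResNet50 backbone (m2 in the error)
actual_in = 512 + 2      # = 514, what the ResNet10 features produce (m1 in the error)
# the first graph-convolution layer's input size therefore needs to be changed to match the new backbone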

About labels

Hello, I have some questions about your project.
2D init labels: the input image size is 224x224, but the labels stay in the original image's coordinate range (roughly 0 to 1920/1280); the labels are not rescaled to match the image resize. I have never seen this done elsewhere.
3D labels: the 3D labels in camera space are used directly. This means the network predicts absolute depth from a 2D image, which is ambiguous. Most 3D pose papers predict root-relative depth, or use two networks to predict the root-relative and absolute parts respectively.
Hoping for your reply. Thank you.
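
For reference, keeping the 2D labels in sync with a 224x224 resize would amount to something like the following sketch (assuming the original frame is orig_w x orig_h pixels; this is not the repository's code):

import numpy as np

def rescale_2d_labels(points2d, orig_w, orig_h, new_size=224):
    # points2d: (K, 2) array of pixel coordinates in the original frame
    scale = np.array([new_size / orig_w, new_size / orig_h])
    return points2d * scale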

Just a question

@bardiadoosti , great work! Thanks for sharing your codes!

I'd like to ask some questions. Can we use your pretrained model (which is trained on the FHAD dataset) on the HO-3D dataset, or on my own dataset (assuming the objects and hands in my dataset are of similar size to those in the FHAD dataset), and get good 3D poses of hands and objects? In other words, are you able to train on the FHAD and HO-3D datasets to generate a model that works for both datasets?

Thanks very much for your kind reply!

Weight of model

Hello, your research is very valuable. I am trying to apply it in the field of robotics, and I really need the model parameters of the whole HOPE-Net. I hope you are willing to share the weights you have trained; I would be very grateful.

`obman` pretrained model

Hi,

Thanks for sharing this awesome repo! I am wondering if you could provide the obman model from the pre-training?

Thanks

About the network structure

Thank you for the great project.
The network uses fully connected (linear) layers and regresses coordinates directly, with output shapes num_keypoints * 2 for 2D and num_keypoints * 3 for 3D respectively. However, direct coordinate regression was more common in earlier work, while most recent methods use heatmaps. Have you ever tried heatmaps in your project?

License

Could you provide a license for the repository?
I want to work on a project, and it requires a license for the code that I have used.

A question about GraphNet.

I notice that the graph structure (adjacency matrix) used in GraphNet and GraphUNet is an identity matrix, which means each node connects only to itself. I wonder if there is a misunderstanding here; I just want to know the reason.
Thank you.
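
For context, a generic graph convolution of the kind being discussed aggregates neighbor features through a (normalized) adjacency matrix before the linear projection, so with an identity adjacency the aggregation step reduces to a per-node linear layer. A small sketch (not the repository's exact layer):

import torch
import torch.nn as nn

class GraphConvSketch(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x, adj):                                 # x: (B, N, F), adj: (N, N)
        a_hat = adj + torch.eye(adj.size(0), device=x.device)  # add self-loops
        norm_adj = a_hat / a_hat.sum(dim=1, keepdim=True)      # row-normalize
        return self.fc(torch.matmul(norm_adj, x))              # aggregate neighbor features, then project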

The range or unit of the output result

Hi Bardia Doosti! Thank you for your excellent work and for being willing to share it. Recently I have been trying to predict the 21 keypoints of the hand, but I have some questions:

  1. When your network outputs 2D coordinates, is the unit pixels, or another unit of length?
  2. I tried to train on the HO3D dataset because I wanted to predict from the third-person perspective. Are your training steps for the HO3D dataset the same as for FHAD?

Looking forward to your reply, thank you.

Some weights and biases are missing in the GraphUNet pre-trained model

Hi bardiadoosti,

Thanks for updating the trained model and sharing it again. However, some weights are missing in the shared model. I ran the updated testing code and got the following error:

RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.pool1.fc.weight", "module.pool1.fc.bias", "module.pool2.fc.weight", "module.pool2.fc.bias", "module.pool3.fc.weight", "module.pool3.fc.bias", "module.pool4.fc.weight", "module.pool4.fc.bias", "module.pool5.fc.weight", "module.pool5.fc.bias".

It seems to me that the weights and biases of pooling layers 1 through 5 are missing in the updated trained model. Please let me know if I am wrong.

Thanks.

Details about pretraining on ObMan

Hi Bardia Doosti! Thank you for your excellent work. I am trying to re-implement your work and have some questions about the details of pretraining on the ObMan dataset. The whole network can be split into two main parts: ResNet10 + graph convolution (we call it part 1) and the Adaptive Graph U-Net. I think there are 3 options for pretraining on ObMan:

  1. Pretrain part 1 and the Graph U-Net separately. For part 1, use the images from ObMan as input and get 2D predictions; the loss is computed between part 1's output and the GT 2D points from ObMan. For part 2 (the Graph U-Net), use the GT 2D points as input; the loss is computed between the Graph U-Net's output and the GT 3D points.

  2. Pretrain the whole network end-to-end. That is, use the image as input, and compute the loss between the final 3D output and the GT 3D points.

  3. Pretrain only the Graph U-Net.

I don't know the exact pretraining method used in the paper; could you please clarify it? Also, could you provide other detailed parameters for pretraining, such as the learning rate, optimizer, and number of epochs? Detailed pretraining code would also help very much.

Thank you again for your work.
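
For reference, the second stage of option 1 boils down to a standard supervised loop of roughly this shape (a sketch under the question's assumptions, not the authors' code; the loader, learning rate, and epoch count are placeholders):

import torch
import torch.nn as nn

def pretrain_graphunet(graphunet, loader, epochs=100, lr=1e-3, device="cuda"):
    # loader yields (points2d, points3d) pairs: GT 2D inputs and GT 3D targets from ObMan
    opt = torch.optim.Adam(graphunet.parameters(), lr=lr)
    criterion = nn.MSELoss()
    graphunet.to(device).train()
    for _ in range(epochs):
        for points2d, points3d in loader:
            points2d, points3d = points2d.to(device), points3d.to(device)
            loss = criterion(graphunet(points2d), points3d)
            opt.zero_grad()
            loss.backward()
            opt.step()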

The pretrained Graph U-Net model cannot reproduce the results in the paper

Hi! @bardiadoosti
Thank you for your excellent work.
I tested the pretrained Graph U-Net model from http://vision.soic.indiana.edu/wp/wp-content/uploads/graphunet.tar.gz on the FPHA dataset by feeding in the GT 2D coordinates. I got the following results:
test mse loss: 702.49
test mean distance error(on 29 keypoints): 28.06 mm

However, in Tables 1 and 2 of the paper, the test mean distance error is 6.81 mm. I am really confused by the large gap.
Looking forward to your reply, and thank you again.

custom training

How can I perform custom training with other data, such as objects of different sizes and shapes? What do we need to do? @bardiadoosti

Using GT 2D keypoints for testing?

Hi,

I found that there are no pre-trained weights for HOPE-Net, only for the graph network, and that in Graph.py you use label2d directly for testing instead of the results from the network. Is this correct?
