salihkaragoz / pose-residual-network-pytorch

Code for the Pose Residual Network introduced in the paper 'MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network' (https://arxiv.org/abs/1807.04067)

License: Other

Python 99.40% Shell 0.60%

pytorch python human-pose-estimation deep-neural-networks human-behavior-understanding pose-estimation

pose-residual-network-pytorch's Introduction

Pose Residual Network

This repository contains a PyTorch implementation of the Pose Residual Network (PRN) presented in our ECCV 2018 paper:

Muhammed Kocabas, Salih Karagoz, Emre Akbas. MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network. In ECCV, 2018. arXiv:1807.04067

PRN is described in Section 3.2 of the paper.

Getting Started

We have tested our method on the COCO dataset.

Prerequisites

python
pytorch
numpy
tqdm
pycocotools
progress
scikit-image

Installing

Clone this repository: git clone https://github.com/salihkaragoz/pose-residual-network-pytorch.git
Install PyTorch
pip install -r src/requirements.txt
To download the COCO train2017 and val2017 annotations, run: bash data/coco.sh (data size: ~240 MB)

Training

python train.py

For more options look at opt.py

Testing

Download the pre-trained model
python test.py --test_cp=PathToPreTrainModel/PRN.pth.tar

Results

Results on COCO val2017 Ground Truth data.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.892
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.978
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.921
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.883
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.912
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.917
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.982
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.937
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.902
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.944

License

Citation

If you find this code useful for your research, please consider citing our paper:

@Inproceedings{kocabas18prn,
  Title          = {Multi{P}ose{N}et: Fast Multi-Person Pose Estimation using Pose Residual Network},
  Author         = {Kocabas, Muhammed and Karagoz, Salih and Akbas, Emre},
  Booktitle      = {European Conference on Computer Vision (ECCV)},
  Year           = {2018}
}

pose-residual-network-pytorch's Issues

More details about Fig. 2 in your paper

@salihkaragoz Thanks for your excellent work and for sharing the repo. I'm very interested in the Fig. 2 results of your paper and would like to reproduce them. Would you mind sharing the code you used to obtain the sample poses by clustering the structures learned by PRN?
Thanks.
Looking forward to your reply.

funny work

Licensing

Hello,

What is the license of this model ?

Thank you

Softmax across all keypoints?

class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

class PRN(nn.Module):
    def __init__(self,node_count,coeff):
        ...
        self.softmax   = nn.Softmax(dim=1)

    def forward(self, x):
        res = self.flatten(x)
        ...
        out = self.add(out,res)  # [N,H*W*C]
        out = self.softmax(out)
        out = out.view(out.size()[0],self.height, self.width, 17)

        return out
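
To make the question concrete: with the output flattened to [N, H*W*C], a softmax over dim=1 normalizes every position of every keypoint channel together, so the whole tensor sums to 1 per sample instead of each keypoint map summing to 1 on its own. The following is a minimal pure-Python sketch of that difference on a toy example with 2 pixels and 2 channels; the helper name and the toy values are illustrative assumptions, not code from the repository.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Toy heatmap logits laid out as [pixel][channel], i.e. H*W = 2 and C = 2.
# (Hypothetical values; the real model uses 17 keypoint channels.)
logits = [[2.0, 0.5],
          [1.0, 0.5]]

# Joint softmax: flatten everything, as a softmax over dim=1
# of a [N, H*W*C] tensor would do for one sample.
flat = [v for pixel in logits for v in pixel]
joint = softmax(flat)

# Per-channel softmax: normalize each keypoint channel over pixels separately.
num_channels = len(logits[0])
per_channel = [softmax([pixel[c] for pixel in logits])
               for c in range(num_channels)]

joint_total = sum(joint)                          # one unit of mass shared by all channels
channel_totals = [sum(ch) for ch in per_channel]  # one unit of mass per channel
```

Under the joint softmax, the probability mass assigned to one keypoint channel depends on the logits of all the other channels; whether that coupling is intended is what this issue asks.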

Pre-Trained model tar archive corrupted

@salihkaragoz The pre-trained model tar archive provided for reproducing the test results is corrupted:
https://drive.google.com/file/d/1OhdMllLGnpRAk6Wexw8LzXF_EHiolVj1/view?usp=sharing

Can you please provide a new one?

Thank you in advance.

Corrupted tar archive

Hello,
Although the name of the issue is self-explanatory, I'll add a few details here:

The archive is 806 MB, which seems a bit large, given that the .pth file of a trained RetinaNet is less than 200 MB.
The file can't be opened, which is a shame because I'd love to replicate your results :)

I hope you will be able to help. Have a nice day!
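
One quick check before concluding that the download is damaged is to see what kind of container the file actually is. Checkpoints named `.pth.tar` are frequently plain `torch.save` files rather than real tar archives, in which case a tar tool refuses them even though the file is intact; whether that applies to this particular file is an assumption. A minimal sketch using only the Python standard library (the function name `describe_container` is hypothetical):

```python
import tarfile
import zipfile

def describe_container(path):
    """Return 'tar', 'zip', or 'other' depending on how the file at `path` is packaged."""
    if tarfile.is_tarfile(path):
        return "tar"
    if zipfile.is_zipfile(path):
        # Recent torch.save versions write a zip-based format.
        return "zip"
    # Could be a legacy pickle-based checkpoint or a genuinely damaged download.
    return "other"
```

A result of `"zip"` or `"other"` suggests trying to load the file directly as a checkpoint instead of extracting it first.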

Keypoint Estimation Subnet

Does this code contain an implementation of the Keypoint Estimation Subnet? And how can a loss be added at each level of the K features in the Keypoint Estimation Subnet? Thanks!

Confusion about input and label

The COCO dataloader shows how the data is used for network training. But when I checked the dataloader, I found that the code building the weights and the output is essentially the same (dataloader.py, lines 43-64 and lines 75-95), and the network input is the weights variable returned by the gendata function. Why does the input come from the known keypoint positions rather than from the actual image data? This is definitely different from what you describe in the paper. Does it mean that you use the known labels to predict the known labels? Does that make sense? If I have misunderstood the code, please let me know.

    for j in range(17):
        if kpv[j] > 0:
            x0 = int((kpx[j] - x) * x_scale)
            y0 = int((kpy[j] - y) * y_scale)

            if x0 >= self.bbox_width and y0 >= self.bbox_height:
                output[self.bbox_height - 1, self.bbox_width - 1, j] = 1
            elif x0 >= self.bbox_width:
                output[y0, self.bbox_width - 1, j] = 1
            elif y0 >= self.bbox_height:
                try:
                    output[self.bbox_height - 1, x0, j] = 1
                except:
                    output[self.bbox_height - 1, 0, j] = 1
            elif x0 < 0 and y0 < 0:
                output[0, 0, j] = 1
            elif x0 < 0:
                output[y0, 0, j] = 1
            elif y0 < 0:
                output[0, x0, j] = 1
            else:
                output[y0, x0, j] = 1

    img_id = ann_data['image_id']
    img_data = coco.loadImgs(img_id)[0]
    ann_data = coco.loadAnns(coco.getAnnIds(img_data['id']))

    for ann in ann_data:
        kpx = ann['keypoints'][0::3]
        kpy = ann['keypoints'][1::3]
        kpv = ann['keypoints'][2::3]

        for j in range(17):
            if kpv[j] > 0:
                if (kpx[j] > bbox[0] - bbox[2] * self.threshold and kpx[j] < bbox[0] + bbox[2] * (1 + self.threshold)):
                    if (kpy[j] > bbox[1] - bbox[3] * self.threshold and kpy[j] < bbox[1] + bbox[3] * (1 + self.threshold)):
                        x0 = int((kpx[j] - x) * x_scale)
                        y0 = int((kpy[j] - y) * y_scale)

                        if x0 >= self.bbox_width and y0 >= self.bbox_height:
                            weights[self.bbox_height - 1, self.bbox_width - 1, j] = 1
                        elif x0 >= self.bbox_width:
                            weights[y0, self.bbox_width - 1, j] = 1
                        elif y0 >= self.bbox_height:
                            weights[self.bbox_height - 1, x0, j] = 1
                        elif x0 < 0 and y0 < 0:
                            weights[0, 0, j] = 1
                        elif x0 < 0:
                            weights[y0, 0, j] = 1
                        elif y0 < 0:
                            weights[0, x0, j] = 1
                        else:
                            weights[y0, x0, j] = 1

    for t in range(17):
        weights[:, :, t] = gaussian(weights[:, :, t])
    output = gaussian(output, sigma=2, mode='constant', multichannel=True)
    # weights = gaussian_multi_input_mp(weights)
    # output = gaussian_multi_output(output)
    return weights, output
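
The two branch ladders above appear to do the same thing: scale a keypoint into the bounding-box grid and snap out-of-range coordinates to the nearest edge. As far as the excerpt shows, `output` receives only the keypoints of the annotated person, while `weights` receives the keypoints of every annotation in the image that fall inside the thresholded box; that is the difference between the network input and its label. Assuming edge-snapping is the intent, each ladder can be sketched as a single clamp. `clamp_to_grid` and `to_grid` are hypothetical helpers, not functions from the repository, and they differ from the original in the cases where that code would index with a negative value (for example `x0 >= bbox_width` together with `y0 < 0`).

```python
def clamp_to_grid(x0, y0, width, height):
    """Snap integer grid coordinates into [0, width-1] x [0, height-1]."""
    col = min(max(x0, 0), width - 1)
    row = min(max(y0, 0), height - 1)
    return row, col

def to_grid(kpx, kpy, box_x, box_y, x_scale, y_scale, width, height):
    """Scale an image-space keypoint into the box grid, then clamp it."""
    x0 = int((kpx - box_x) * x_scale)
    y0 = int((kpy - box_y) * y_scale)
    return clamp_to_grid(x0, y0, width, height)
```

With such a helper, each ladder would reduce to `row, col = clamp_to_grid(x0, y0, self.bbox_width, self.bbox_height)` followed by a single assignment into `output` or `weights`.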

it's just a joke

Please give up trying. I checked the model; the kernels are just some linear layers of size 1024*34272.

Downloadable weights

Hi. I'm very interested in this implementation. For now I'm going to try training it myself.

But do you plan to publish downloadable weights that reach the scores reported in the paper? I'd be very interested.

Thanks.

Hey @VladislavZavadskyy,

Could you clearly describe your problem instead of making a groundless judgment? This repo isn't a full pipeline of the things that we introduced in our recent paper, just a demo of the main contribution to help people to understand the idea. If you can state your issue in a concise way, maybe we can guide you to correct resources.

Thanks,

Originally posted by @mkocabas in #14 (comment)

Yes, we would like you to describe exactly your training methodology, your preprocessing pipeline, and your hyperparameter tuning strategy. Or you could just release the pre-trained model that you have been promising for a long time.

run_webcam

@salihkaragoz Would it be possible to provide a script for running inference with a webcam?

release all the code

Thanks for sharing your code!
I am very interested in it and want to train your network from scratch. Can you release all the code (backbone, keypoint subnet, person detection subnet, and pose residual net)? Thank you very much!

help

Hello, I have been studying your paper recently and tried to run train.py on Linux, but it crashes with "core dumped". What should I do?

Zero division by bbox[3]

Hi,
I got a zero division error during training. When can bbox[3] be zero?
Please help me solve this issue, thanks a lot!

index created!
14%|██████████████████▊ | 9072/64115 [11:40<52:52, 17.35it/s]
Traceback (most recent call last):
  File "train.py", line 81, in <module>
    main(option)
  File "train.py", line 68, in main
    Evaluation(model, opt)
  File "~/pose_residual_network.pytorch/src/eval.py", line 221, in Evaluation
    y_scale = float(h) / math.ceil(b[3])
ZeroDivisionError: float division by zero
14%|██████████████████▌ | 9072/64115 [11:40<1:10:52, 12.95it/s]
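
The traceback points at `math.ceil(b[3])` evaluating to zero, which happens when the box height `b[3]` is 0. A minimal sketch of a guard, assuming that skipping such degenerate boxes is acceptable for the evaluation; `safe_scale` is a hypothetical helper and not part of eval.py:

```python
import math

def safe_scale(target_size, box_extent):
    """Return target_size / ceil(box_extent), or None for a degenerate box."""
    cells = math.ceil(box_extent)
    if cells <= 0:
        # Zero (or negative) width/height: the caller should skip this box.
        return None
    return float(target_size) / cells
```

In the evaluation loop this would read something like `y_scale = safe_scale(h, b[3])`, followed by `continue` when the result is `None`. Filtering such annotations out before evaluation would be another option.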

How to build an entire solution with video or image as input?

@salihkaragoz
Thanks for your excellent work. I read the code and found that it is a key part of the solution described in your paper.
Do you have an entire solution that takes a video or an image as input?
Thanks!

Is something missing?

Why can't I find the backbone and RPN?

Please add a licence

Question about segmentation and detection

I cannot find the code related to detection and segmentation in the dataloader or the network output. Where is it? Thanks.

Training results not as good as reported

Dear salihkaragoz,
Thanks for your code. The results I get by training with your code are not as good as those reported in your repository. Am I missing some options?

Thanks for your reply!

Total Step: 17372 | Total Epoch: 16
17372it [05:57, 48.62it/s] | Epoch: 15 Total: 1:59:03 | ETA: 18:15:13 | loss:0.00240877992474
------------Evaulation Started------------
loading annotations into memory...
Done (t=0.17s)
creating index...
index created!
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2693/2693 [01:22<00:00, 32.71it/s]
Loading and preparing results...
DONE (t=0.11s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
DONE (t=4.03s).
Accumulating evaluation results...
DONE (t=0.05s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.852
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.967
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.884
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.835
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.889
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.886

Question about Our keypoints + Our bbox

Hi @salihkaragoz, thank you for your novel work. I have a question about testing on real predictions.

PRN is trained on ground-truth keypoints and ground-truth boxes. When we test with our own predicted keypoints and boxes, do we need to train it again, or can we just use the same model from this repository? If we need to train it with Our keypoints + Our bbox, how should we prepare the input and the target?

Pre-trained model is different from the defined model [Solved]

Hi, I downloaded the code and the pre-trained model. Then I just ran the test:
RuntimeError: Error(s) in loading state_dict for PRN:
Missing key(s) in state_dict: "bneck.weight", "bneck.bias".
Unexpected key(s) in state_dict: "dens3.weight", "dens3.bias".

It looks like the model definition is different from the pre-trained model.
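
The error says that the checkpoint and the model class disagree on layer names: the model defines `bneck`, while the checkpoint contains `dens3`. A minimal sketch of how such differences could be listed before loading, using plain dictionaries to stand in for the two state dicts; the helper name `diff_keys` and the `dens1.weight` key are assumptions made for illustration, and only the `bneck` and `dens3` keys come from the error message.

```python
def diff_keys(model_state, checkpoint_state):
    """Return (missing, unexpected) key lists, mirroring the error message wording."""
    model_keys = set(model_state)
    ckpt_keys = set(checkpoint_state)
    missing = sorted(model_keys - ckpt_keys)      # defined by the model, absent from the file
    unexpected = sorted(ckpt_keys - model_keys)   # present in the file, unknown to the model
    return missing, unexpected

# Stand-in dictionaries; with PyTorch these would be the model's state dict
# and the state dict read from the checkpoint file. "dens1.weight" is made up.
model_state = {"dens1.weight": 0, "bneck.weight": 0, "bneck.bias": 0}
checkpoint_state = {"dens1.weight": 0, "dens3.weight": 0, "dens3.bias": 0}

missing, unexpected = diff_keys(model_state, checkpoint_state)
```

If the layer was only renamed, which the error message alone cannot confirm, renaming the keys in the checkpoint dictionary before loading would resolve the mismatch; if the tensor shapes differ as well, the checkpoint belongs to a different model definition.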

Found some weird code in eval.py

Hi, thanks for your work. While reading the code in this repository, I found something weird in eval.py. To get the predicted bbox_keypoints, you use the true keypoints to assign them. The code in question is around line 200 and line 205 of eval.py.

The peaks are the true keypoint coordinates, is that right? It seems that the true coordinates are used to assign the predicted bbox_keypoints. I think lines 209~220 in eval.py are the right way to get the actual predicted bbox_keypoints.

Maybe you can give me some advice about this, thanks.