
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks


Introduction

This repository contains the code and models for the following paper.

Dual networks based 3D Multi-Person Pose Estimation from Monocular Video
Yu Cheng, Bo Wang, Robby T. Tan
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Demo video

Watch the video

Updates

  • 05/18/2022 Link to the demo video of the TPAMI journal paper added to the README
  • 08/27/2021 Evaluation code for PCK and PCK_abs updated with a bone-length normalization option for dataset adaptation
  • 06/18/2021 Evaluation code for PCK (person-centric) and PCK_abs (camera-centric), together with the pre-trained model for the MuPoTS dataset, tested and released

Installation

Dependencies

PyTorch >= 1.5
Python >= 3.6

Create an environment:

conda create -n 3dmpp python=3.6
conda activate 3dmpp

Install the latest version of PyTorch (tested on PyTorch 1.5 - 1.7) for your OS and installed GPU driver by following the official install pytorch instructions. For example, the command for Linux with CUDA 11.0 is:

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
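
To confirm the install before going further, you can quickly check the PyTorch version and CUDA availability (a minimal sketch; these calls exist in all recent PyTorch releases):

import torch

# Print the installed PyTorch version and whether CUDA is usable.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))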

Install dependencies

pip install -r requirements.txt

Build the Fast Gaussian Map tool:

cd lib/fastgaus
python setup.py build_ext --inplace
cd ../..
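
To confirm the extension actually compiled, you can check that a compiled shared object appeared in lib/fastgaus (a minimal sketch; the exact extension filename depends on your platform and Python version):

import glob

# An in-place build should leave a compiled extension next to the sources.
built = glob.glob("lib/fastgaus/*.so") + glob.glob("lib/fastgaus/*.pyd")
print("Built extensions:", built if built else "none found, re-run the build step")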

Models and Testing Data

Pre-trained Models

Download the pre-trained model and the processed human keypoint files here, and unzip the downloaded zip file into this project's root directory. Two folders should appear afterwards: ./ckpts and ./mupots.

MuPoTS Dataset

The MuPoTS eval set is needed to reproduce the results reported in Table 3 of the main paper; it is available on the MuPoTS dataset website. Download the mupots-3d-eval.zip file, unzip it, and run get_mupots-3d.sh to download the dataset.

If you encounter an error like /bin/bash: bad interpreter ..., the script likely has Windows line endings; fix it with:

sed -i 's/\r$//' get-mupots-3d.sh

After the download is complete, a MultiPersonTestSet.zip (~5.6 GB) is available. Unzip it and move the MultiPersonTestSet folder to the root directory of the project to perform evaluation on the MuPoTS test set. You should now see the following directory structure.

${3D-Multi-Person-Pose_ROOT}
|-- ckpts              <-- the downloaded pre-trained Models
|-- lib
|-- MultiPersonTestSet <-- the newly added MuPoTS eval set
|-- mupots             <-- the downloaded processed human keypoint files
|-- util
|-- 3DMPP_framework.png
|-- calculate_mupots_btmup.py
|-- other python code, LICENSE, and README files
...
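
Before launching the evaluation scripts, a quick sanity check that the expected folders are in place can save a failed run (a minimal sketch; the folder names come from the tree above):

import os

# Folders the evaluation scripts expect to find in the project root (see the tree above).
required = ["ckpts", "mupots", "MultiPersonTestSet", "lib", "util"]
missing = [d for d in required if not os.path.isdir(d)]
print("Missing folders:", missing if missing else "none, you are good to go")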

Usage

MuPoTS dataset evaluation

3D Multi-Person Pose Estimation Evaluation on MuPoTS Dataset

The following table corresponds to Table 3 in the main paper, which reports quantitative evaluations on the MuPoTS-3D dataset (best performance shown in bold in the paper). Instructions to reproduce the PCK and PCK_abs results are given in the next section.

Group                              Methods                        PCK    PCK_abs
Person-centric (relative 3D pose)  Mehta et al., 3DV'18           65.0   N/A
Person-centric (relative 3D pose)  Rogez et al., IEEE TPAMI'19    70.6   N/A
Person-centric (relative 3D pose)  Mehta et al., ACM TOG'20       70.4   N/A
Person-centric (relative 3D pose)  Cheng et al., ICCV'19          74.6   N/A
Person-centric (relative 3D pose)  Cheng et al., AAAI'20          80.5   N/A
Camera-centric (absolute 3D pose)  Moon et al., ICCV'19           82.5   31.8
Camera-centric (absolute 3D pose)  Lin et al., ECCV'20            83.7   35.2
Camera-centric (absolute 3D pose)  Zhen et al., ECCV'20           80.5   38.7
Camera-centric (absolute 3D pose)  Li et al., ECCV'20             82.0   43.8
Camera-centric (absolute 3D pose)  Cheng et al., AAAI'21          87.5   45.7
Camera-centric (absolute 3D pose)  Our method                     89.6   48.0

Run evaluation on MuPoTS dataset with estimated 2D joints as input

We split the whole pipeline into several separate steps to make it clearer for users. Run the following scripts in order:

python calculate_mupots_topdown_pts.py
python calculate_mupots_topdown_depth.py
python calculate_mupots_btmup.py
python calculate_mupots_integrate.py

Please note that python calculate_mupots_btmup.py takes a while to run (30-40 minutes depending on your machine).

To evaluate the person-centric 3D multi-person pose estimation:

python eval_mupots_pck.py

After running the above code, the following PCK (person-centric, pelvis-based origin) value is expected, matching the number reported in Table 3 of the paper, PCK = 89 (in percent).

...
Seq: 18
Seq: 19
Seq: 20
PCK_MEAN: 0.8923134794267524

Note: If Procrustes analysis is used in eval_mupots_pck.py, the obtained value is slightly different (PCK_MEAN: 0.8994453169938017).

To evaluate camera-centric (i.e., camera coordinates) 3D multi-person pose estimation:

python eval_mupots_pck_abs.py

After running the above code, the following PCK_abs (camera-centric) value is expected, matching the number reported in Table 3 of the paper, PCK_abs = 48 (in percent).

...
Seq: 18
Seq: 19
Seq: 20
PCK_MEAN: 0.48030635566606195

Note: If Procrustes analysis is used in eval_mupots_pck_abs.py, the obtained value is slightly different (PCK_MEAN: 0.48514110933606175).
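
For intuition, both metrics count a joint as correct when its predicted 3D position falls within a distance threshold of the ground truth; PCK first re-centers both poses at the root (pelvis), while PCK_abs compares absolute camera-space coordinates directly. The following is only a minimal sketch of that idea, not the repository's evaluation code; the 150 mm threshold, the (J, 3) array shape, and the root joint index are assumptions.

import numpy as np

def pck(pred, gt, root_idx=14, absolute=False, thresh=150.0):
    # pred, gt: (J, 3) arrays of 3D joint positions in millimeters.
    # absolute=False: person-centric PCK, both poses re-centered at the root joint.
    # absolute=True: camera-centric PCK_abs, absolute coordinates compared directly.
    if not absolute:
        pred = pred - pred[root_idx:root_idx + 1]
        gt = gt - gt[root_idx:root_idx + 1]
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist < thresh).mean())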

Apply on your own video

To run the code on your own video, you need to generate p2d, affpts, and affb (as defined here), which correspond to the joints' locations, the joints' confidence values, and the bones' confidence values; a minimal packaging sketch follows the list below.

  • For p2d and affpts, any off-the-shelf 2D pose estimator can be used to extract the joints' locations and their confidence values.
  • For affb, a Part Affinity Fields model can be used to extract the bone confidence; example code is here.
  • Note that we use the keypoint definition of the H36M dataset, which is compatible with the CrowdPose dataset but differs from the COCO keypoint definition.
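
The sketch below shows one way such inputs might be packaged for a video; the array shapes, the 16-joint H36M-style ordering, the pickle layout, and the file name are illustrative assumptions, not the repository's exact format.

import pickle
import numpy as np

# Hypothetical outputs of an off-the-shelf 2D pose estimator for T frames
# and J = 16 H36M-style joints (all shapes are assumptions for illustration).
T, J = 100, 16
keypoints_xy = np.zeros((T, J, 2), dtype=np.float32)   # joints' 2D locations
keypoint_conf = np.ones((T, J), dtype=np.float32)      # joints' confidence
bone_conf = np.ones((T, J, J), dtype=np.float32) / J   # bones' confidence (e.g. from PAFs)

data = {
    "p2d": keypoints_xy,     # joints' location
    "affpts": keypoint_conf, # joints' confidence
    "affb": bone_conf,       # bones' confidence
}

with open("my_video_inputs.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(data, f)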

License

The code is released under the MIT license. See LICENSE for details.

Citation

If this work is useful for your research, please cite the following papers.

@article{cheng2022dual,
  title={Dual networks based 3D Multi-Person Pose Estimation from Monocular Video},
  author={Cheng, Yu and Wang, Bo and Tan, Robby},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  publisher={IEEE}
}
@InProceedings{Cheng_2021_CVPR,
    author    = {Cheng, Yu and Wang, Bo and Yang, Bo and Tan, Robby T.},
    title     = {Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {7649-7659}
}


3d-multi-person-pose's Issues

FileNotFoundError

There is an error FileNotFoundError: [Errno 2] No such file or directory: 'mupots/pred_inte/1.pkl' when I run eval_mupots_pck_abs.py. It looks like 'mupots' does not contain 'pred_inte'.

Code error

It seems that the final absolute xyz prediction depends on the xy of the GT, which makes it incomparable with other methods that predict the root xyz and the relative xyz and add them together to obtain the final joint xyz without using any information from the GT.

predP = predP + gt_p3d[k][:,14:15] * ratio # x,y is proportional to camera depth

affpts, affb calculation

Dear authors, thank you for sharing such great work with the public. While trying to run your code on a sample video, I ran into a problem when calculating the affpts and affb values required as input to your network (you calculated and provided them for the dataset you worked on). Although I found that affpts can be computed from confidence heatmaps and affb can be computed from PAFs, it is not clear where you obtain the PAFs. I checked the pose estimator you used for the keypoints, but it does not generate PAFs that I could use to compute the affb values. Could you please help with this issue?

About the top-down network

Hi,

I found your paper very interesting. I just can't wait until the code is released, so I am asking here.

The paper says that the TD network estimates all joints for every person inside a bounding box, but the GCN & TCN seem to produce one person per bounding box. How do you group or select the joints of a person in a bounding box before feeding the joint heatmap to the GCN & TCN? Or do you feed all joint heatmaps to the GCN & TCN? (I don't think that is possible.)

Also, the BU network uses the concatenation of joint heatmaps and the input frame as its input. But how? Is the number of input channels 3 (RGB) + 1 (heatmap)? There are several potential problems: the number of people in the input frame changes, which could lead to a varying number of input channels, and the joint heatmaps of the same person may overlap. Could you give more details about this?

Thank you!

Dropbox not working

Hi, I really appreciate your great work. Could you please share another link to download the pre-trained model? The Dropbox link seems to be unreachable.

Error in computing PCK_abs

Hi,
Congratulations for your work and thanks for releasing your code.
I think there is a bug in your evaluation pipeline, or I am missing something.
When computing PCK_abs, you first do a rigid alignment of the predicted pose to the ground-truth pose, and then you set the predicted root location to the ground-truth root location.
You should not do that; instead, you should directly compute the Euclidean distance for each joint between pred_pose and gt_pose, both expressed in the camera coordinate system.
Am I saying something wrong?
Best,

How to preprocess to get the pickle file?

Hi, thanks for sharing your work.
I have read the instructions, but could you tell me whether the p2d shape is (frame_length, x, y)?
Also, from the PAF link you provide, I cannot understand how to extract the affinities for Human3.6M, and how should your example pickle value affb = torch.ones(2,16,16) / 16 be interpreted?

Could you share your pickle-file preprocessing code?

Inter-person discriminator

Hi,
Kudos for the great work. I love the idea of the inter-person discriminator you describe in the paper. However, the following is not clear for me:

  1. How does the discriminator capture interaction between people? What does the architecture of D2 look like?
  2. Does D2 only accept two people, or could it accept more if there is a three-person interaction?
  3. If there are three people in the scene, do you use all possible permutations taking two people at a time?
  4. Does the order of the inputs Pa and Pb matter?
  5. In your adversarial loss you use the estimated joints Pa and Pb and their corresponding GT. If you have the GT correspondences, then why use a discriminator and not direct supervision?

I couldn't find any of this in the paper or the supplementary material. Could you elaborate on this, please?

Sorry for so many questions. Thank you in advance.

2D Multi Person Pose Estimator

Thanks for your great work! It gave me lots of inspiration.

I have some questions regarding your 2D Multi-Person Pose Estimator:

(1) How did you combine COCO and MuCo for training? Don't they have different keypoint annotation formats?
(2) How much AP did you achieve after training?

Thank you!

Question about plotting Figure 1 in the paper

I see that Figure 1 in the paper shows a ground grid. Could you tell me how you obtain the rotation relating the people in camera coordinates to world coordinates? I would really appreciate your help.

PCK_abs threshold

Hello,
First of all, thank you for publishing your code.

I am having some trouble understanding how absolute coordinates are evaluated on MuPoTS.
What threshold value is used when computing PCK_abs? Is it 150 mm, or the 250 mm that is used to compute AP_root?

Thanks in advance for your response and your time.

Focal length

Hi,
Kudos for the great work and for releasing the code.

You mention that you do not use camera parameters; however, in my understanding, if you can compute a ratio between the predicted root and the GT root, then you are predicting the absolute root, right? Or do you at least assume some fixed focal length? I could not find an exact focal-length value in the code. Is this right?

Thanks!

Train

Hi, thank you for the great work! May I know whether you plan to release the training code and documentation for this project?

Error

ImportError: cannot import name 'render_core_cython'

Frame names of the 3DPW subset with occlusion

Congratulations on the great work, guys!
I noticed that you extracted a 3DPW subset with occlusion (based on IoU) in your SuppMat.
Could you please provide the names of the frames that you used for evaluation?
Thanks in advance!
Yu
@3dpose @ddddwee1

model LICENSE?

Hello! Thank you so much for sharing the code and the model.

I would like to ask: under what license are the pre-trained models released? Can I use them in a commercial project?

Your reply is very much needed.
