Monocular, One-stage, Regression of Multiple 3D People

ROMP is a one-stage network for multi-person 3D mesh recovery from a single image.

Monocular, One-stage, Regression of Multiple 3D People,
Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei,
arXiv paper (arXiv 2008.12272)

Contact: [email protected]. Feel free to contact me for related questions or discussions!

  • Simple: ROMP simultaneously predicts the body center locations and the corresponding 3D body mesh parameters for all people at each pixel.

  • Fast: The ROMP ResNet-50 model runs at over 30 FPS on a 1070Ti GPU.

  • Strong: ROMP achieves superior performance on multiple challenging multi-person/occlusion benchmarks, including 3DPW, CMU Panoptic, and 3DOH50K.

  • Easy to use: We provide a user-friendly testing API and webcam demos.

News

2021/4/19: Added support for textured SMPL mesh using vedo. See visualization.md for details.
2021/3/30: Version 1.0. Rebuilt the code. Released the ResNet-50 version and the evaluation on 3DPW.
2020/11/26: Optimization for person-person occlusion. Small changes for video support.
2020/9/11: Real-time webcam demo using a local/remote server. Please refer to config_guide.md for details.
2020/9/4: Google Colab demo. Saving one npy file per image. Please refer to config_guide.md for details.

Try on Google Colab

Before installation, you can take a few minutes to try the prepared Google Colab demo.
It allows you to run the project in the cloud, free of charge.

Please refer to bug.md for known bugs. You are welcome to submit issues for related bugs.

Installation

Please refer to install.md for installation.

Demo

Currently, the released code is used to reproduce the demo results. Only 1-2 GB of GPU memory is needed.

To do this, run:

cd ROMP/src
sh run.sh
# if the shell script fails, run the following command instead:
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/single_image.yml

Results will be saved in ROMP/demo/images_results.
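
The direct command above can also be launched from Python, which may be convenient when running several configs in a row. The following is a minimal sketch, not part of the ROMP code base: the helper names `build_test_command` and `run_demo` are hypothetical, and the sketch only assembles the same `core/test.py` call with the `--gpu` and `--configs_yml` flags shown above.

```python
import os
import subprocess
import sys


def build_test_command(config_yml, gpu=0, visible_devices="0"):
    """Return (argv, env) mirroring the README's direct test command.

    Hypothetical helper: it only rebuilds the command line shown above.
    """
    # sys.executable is used in place of a bare `python` so that the child
    # process runs under the same interpreter as the caller.
    argv = [
        sys.executable,
        "core/test.py",
        f"--gpu={gpu}",
        f"--configs_yml={config_yml}",
    ]
    # Copy the current environment and set CUDA_VISIBLE_DEVICES, as the
    # shell command does with its `CUDA_VISIBLE_DEVICES=0` prefix.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(visible_devices))
    return argv, env


def run_demo(config_yml="configs/single_image.yml", src_dir="ROMP/src"):
    """Run the demo from `src_dir`, the directory the README says to cd into."""
    argv, env = build_test_command(config_yml)
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(argv, cwd=src_dir, env=env, check=True)
```

Calling `run_demo()` from the directory that contains `ROMP` would then be equivalent to the two shell lines above, assuming the repository layout matches the paths used in this README.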

Internet images

You can also run the code on arbitrary internet images by putting the images under ROMP/demo/images.
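
Copying images into that folder can be scripted. The sketch below is illustrative only: `stage_images` is a hypothetical helper, and the set of accepted file extensions is an assumption, since this README does not state which image formats the demo reads.

```python
import shutil
from pathlib import Path

# Assumption: illustrative extension list; the README does not specify
# which image formats the demo accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def stage_images(sources, demo_dir="ROMP/demo/images"):
    """Copy image files into the demo input folder and return the new paths."""
    target = Path(demo_dir)
    target.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, sources):
        if src.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that do not look like images
        # copy2 also preserves file metadata such as modification time
        staged.append(Path(shutil.copy2(src, target / src.name)))
    return staged
```

After staging, the demo command from the previous section can be run unchanged.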

Please refer to config_guide.md for how to save the estimated meshes, center maps, and parameter dicts.

Internet videos

You can also run the code on random internet videos.

To do this, first change the input_video_path in src/configs/video.yml to /path/to/your/video. For example, set

 video_or_frame: True
 input_video_path: '../demo/videos/sample_video.mp4' # None
 output_dir: '../demo/videos/sample_video_results/'

then run

cd ROMP/src
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/video.yml

Results will be saved to ../demo/videos/sample_video_results.
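
The config change described in this section can also be scripted. The following is a rough sketch under stated assumptions: `set_video_paths` is a hypothetical helper, it treats the config as plain text instead of parsing YAML, and it assumes the `input_video_path` and `output_dir` keys each sit on their own line as in the example above. Any trailing comment on a rewritten line is dropped, and paths containing a single quote are not handled.

```python
import re


def set_video_paths(config_text, video_path, output_dir):
    """Return config text with input_video_path and output_dir replaced."""
    replacements = {"input_video_path": video_path, "output_dir": output_dir}
    out_lines = []
    for line in config_text.splitlines():
        # Capture the indentation and the key name of a `key: value` line.
        match = re.match(r"^(\s*)(\w+)\s*:", line)
        if match and match.group(2) in replacements:
            indent, key = match.groups()
            # Rewrite the whole line, keeping the original indentation.
            line = f"{indent}{key}: '{replacements[key]}'"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

A typical use would be to read src/configs/video.yml, pass its text through `set_video_paths`, and write the result back before running the test command.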

Export to Blender FBX

Please refer to expert.md to export the results to fbx files for use in Blender. Currently, this function only supports single-person videos. Therefore, please test it with ../demo/videos/sample_video2_results/sample_video2.mp4, whose results will be saved to ../demo/videos/sample_video2_results.

Webcam

We also provide the webcam demo code, which can run in real time on a 1070Ti GPU / remote server.
Currently, limited by the visualization pipeline, the webcam visualization code only supports a single-person mesh.

To do this, run:

cd ROMP/src
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/webcam.yml
# or try to use the model with ResNet-50 as backbone.
CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/webcam_resnet.yml

Press Up/Down to end the demo. Please refer to config_guide.md for running the webcam demo on a remote server and for setting the mesh color or camera id.

Evaluation

Please refer to evaluation.md for evaluation on benchmarks.

TODO LIST

The code will be gradually open-sourced according to the following schedule:

  • demo code for internet images / videos / webcam
  • runtime optimization
  • benchmark evaluation
  • training

Citation

Please consider citing

@inproceedings{ROMP,
  title = {Monocular, One-stage, Regression of Multiple 3D People},
  author = {Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao},
  booktitle = {arxiv:2008.12272},
  month = {August},
  year = {2020}
}

Acknowledgement

We thank Peng Cheng for his constructive comments on Center map training.

Thanks to Marco Musy for his help in the textured SMPL visualization.

Here are some great resources we benefit from:

  • The SMPL models and layer are borrowed from the MPII SMPL-X model.
  • Webcam pipeline is borrowed from minimal-hand.
  • Some functions are borrowed from HMR-pytorch.
  • Some functions for data augmentation are borrowed from SPIN.
  • Synthetic occlusion is borrowed from synthetic-occlusion.
  • The evaluation code for the 3DPW dataset is taken from 3dpw-eval.
  • For fair comparison, the GT annotations of the 3DPW dataset are taken from VIBE.
  • 3D mesh visualization is supported by vedo and Open3D.
