SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Project Page | Paper | Youtube(6min) | Online Demo

Official code release for the CVPR 2024 paper SiTH.

What you can find in this repo:

Demo for reconstructing a fully textured 3D human from a single image in 2 minutes (tested on an RTX 3090 GPU)
A minimal script for fitting the SMPL-X model to an image.
A new evaluation benchmark for single-view 3D human reconstruction.
A Gradio demo for creating 3D humans with poses and text prompts.

[TODO] Training scripts for the diffusion model and the mesh reconstruction model.

If you find our code and paper useful, please cite them as:

@inproceedings{ho2024sith,
    title={SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion},
    author={Ho, Hsuan-I and Song, Jie and Hilliges, Otmar},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}

News

[April 24, 2024] Gradio demo for 3D human creation is now available.
[April 15, 2024] Release demo code, models, and the evaluation benchmark.

Installation

Our code has been tested with PyTorch 2.1.0, CUDA 12.1, and an RTX 3090 GPU.

Run the following command to install the required packages:

pip install -r requirements.txt

Quick Start

Download the checkpoint files into the checkpoints folder.

bash tools/download.sh

Download SMPL-X models and move them to the data/body_models folder. You should have the following data structure:

body_models
    └──smplx
        ├── SMPLX_NEUTRAL.pkl
        ├── SMPLX_NEUTRAL.npz
        ├── SMPLX_MALE.pkl
        ├── SMPLX_MALE.npz
        ├── SMPLX_FEMALE.pkl
        └── SMPLX_FEMALE.npz
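
As a quick sanity check before running the pipeline, a small script along these lines can confirm that the files listed above are in place. This is only a sketch: the helper name missing_smplx_files is hypothetical and not part of this repo, and the default path assumes you run it from the repository root.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    f"SMPLX_{gender}.{ext}"
    for gender in ("NEUTRAL", "MALE", "FEMALE")
    for ext in ("pkl", "npz")
]


def missing_smplx_files(body_models_dir="data/body_models"):
    """Return the expected SMPL-X files that are not present on disk."""
    smplx_dir = Path(body_models_dir) / "smplx"
    return [name for name in EXPECTED_FILES if not (smplx_dir / name).is_file()]


missing = missing_smplx_files()
if missing:
    print("Missing SMPL-X files:", ", ".join(missing))
else:
    print("All SMPL-X model files found.")
```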

Run the script for body fitting, back hallucination, and mesh reconstruction.

bash run.sh

Gradio Demo

We built an application that combines SiTH with ControlNet for 3D human creation. In the demo, users can create 3D humans with a few button clicks.

You can either try our Online Demo or launch the web UI locally. To run the demo on your local machine, run

python app.py

The web UI will then be available at http://127.0.0.1:7860/.

SiTH Pipeline

Data Preparation

You can prepare your own RGBA images and put them into the data/examples/rgba folder. For example, you can generate photos with OutfitAnyone and remove the background with Segment Anything or Clipdrop.

Run the script below to generate square, centered input images in the data/examples/images folder. The default size is 1024x1024. You can change the output with the --size and --ratio arguments.

python tools/centralize_rgba.py
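
To illustrate what centering a subject on a square canvas involves, here is a minimal pure-Python sketch. It is not the code in tools/centralize_rgba.py: the function names and the fill_ratio parameter are hypothetical, and reading fill_ratio as "fraction of the canvas covered by the subject's longer side" is an assumption that may not match how the script interprets --ratio.

```python
def alpha_bbox(alpha):
    """Bounding box (left, top, right, bottom) of non-zero alpha.

    `alpha` is a list of rows of alpha values; right/bottom are exclusive.
    """
    rows = [y for y, row in enumerate(alpha) if any(row)]
    if not rows:
        raise ValueError("image is fully transparent")
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def center_placement(bbox, size=1024, fill_ratio=0.85):
    """Scale factor and top-left paste offset that center bbox on a size x size canvas.

    fill_ratio is a hypothetical parameter: the share of the canvas side
    occupied by the longer side of the subject.
    """
    left, top, right, bottom = bbox
    width, height = right - left, bottom - top
    scale = size * fill_ratio / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return scale, ((size - new_w) // 2, (size - new_h) // 2)
```

With an image library of your choice, you would crop the RGBA image to the bounding box, resize it by the returned scale, and paste it at the returned offset on a transparent square canvas.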

Install and run OpenPose to get .json files of COCO-25 body, hand, and face keypoints. For example, we used the following command. Afterwards, your image folder should contain the same kinds of files as data/examples/images.

cd /path/to/openpose_dir

./build/examples/openpose/openpose.bin --image_dir /path/to/images_dir --write_json /path/to/images_dir --display 0 --net_resolution -1x544 --scale_number 3 --scale_gap 0.25 --hand --face --render_pose 0
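
If you want to inspect the keypoint files before fitting, the sketch below reads one OpenPose JSON file and groups the flat arrays into (x, y, confidence) triplets. The key names follow OpenPose's JSON output format; the helper name load_openpose_keypoints and the choice of taking the first detected person are assumptions for illustration, not something this repo prescribes.

```python
import json


def load_openpose_keypoints(json_path):
    """Read one OpenPose *_keypoints.json file as per-part (x, y, confidence) triplets."""
    with open(json_path) as f:
        data = json.load(f)
    if not data.get("people"):
        raise ValueError(f"no person detected in {json_path}")
    person = data["people"][0]  # assumption: the first detection is the subject

    def triplets(flat):
        # OpenPose stores keypoints as a flat list [x0, y0, c0, x1, y1, c1, ...].
        return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

    return {
        "body": triplets(person.get("pose_keypoints_2d", [])),
        "face": triplets(person.get("face_keypoints_2d", [])),
        "left_hand": triplets(person.get("hand_left_keypoints_2d", [])),
        "right_hand": triplets(person.get("hand_right_keypoints_2d", [])),
    }
```

A file with an empty "people" list usually means OpenPose did not detect anyone, in which case the image is worth re-checking before running the fitting step.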

SMPL-X Fitting

Next, we fit the SMPL-X body model to each input image and align it within the [-1, 1] cube. By default, we use the following command, which optimizes the global orientation, body shape, scale, and X,Y offset parameters.

python fit.py --opt_orient --opt_betas

There are also additional arguments and hyperparameters for customized fitting. For example, if the initial body pose is not well aligned, you can use the --opt_pose flag to optimize specific body joints. You can visualize the fitting results by adding the --debug flag.
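
To make the [-1, 1] cube convention concrete, the sketch below uniformly rescales a set of 3D points so that its bounding box fits inside that cube. This is not what fit.py does (the script optimizes scale and offset against the image); it only illustrates the target coordinate range, and the function name is hypothetical.

```python
def normalize_to_unit_cube(vertices, margin=1.0):
    """Uniformly scale and translate 3D points so they fit inside [-margin, margin]^3."""
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    # Use the longest bounding-box side so the aspect ratio is preserved.
    half_extent = max(hi - lo for lo, hi in zip(mins, maxs)) / 2
    if half_extent == 0:
        raise ValueError("degenerate point set")
    scale = margin / half_extent
    return [[(v[i] - center[i]) * scale for i in range(3)] for v in vertices]
```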

Back-view Hallucination

Given the front-view images and SMPL-X parameters, we generate back-view images with our image-conditioned diffusion model. The following command generates images in the data/examples/back_images folder.

python hallucinate.py --num_validation_image 8

Note that generative models are stochastic, so the script generates multiple candidate images per input. Choose the best candidate for each input and keep it in data/examples/back_images. There are several parameters you can play with:

--guidance_scale: Classifier-free guidance (CFG) scale.
--conditioning_scale: ControlNet conditioning scale.
--num_inference_steps: Denoising steps.
--pretrained_model_name_or_path: The default model is trained on 500 human scans. We also offer a newer model trained on 2000+ scans and more view angles. To use it, set this argument to hohs/SiTH-diffusion-2000.
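
For intuition about --guidance_scale, the sketch below shows the standard classifier-free guidance combination of an unconditional and a conditional noise prediction. It assumes hallucinate.py follows this common formulation, which is not confirmed here; the function name apply_cfg is hypothetical, and plain lists stand in for tensors.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional toward the conditional prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]
```

Under this formulation, a scale of 1 returns the conditional prediction unchanged, while larger values push the result further in the direction of the conditioning, typically trading diversity for adherence to the input.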

Textured Human Reconstruction

Before reconstructing the 3D meshes, make sure the following folders and images are ready.

data/examples
    ├──images
    |   ├── 000.png
    |   ├── 000_keypoints.json
    |   ...
    |
    ├──smplx
    |   ├── 000_smplx.obj
    |   ...
    |
    └──back_images
        ├── 000_00X.png
        ...
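
A small pre-flight check along these lines can catch missing inputs before reconstruction. It is a sketch based on the layout above: the helper name check_inputs is hypothetical, and it assumes back-view images are named <image name>_<index>.png, as the 000_00X.png pattern suggests.

```python
from pathlib import Path


def check_inputs(root="data/examples"):
    """Return a list of human-readable problems found in the reconstruction inputs."""
    root = Path(root)
    problems = []
    images = sorted((root / "images").glob("*.png"))
    if not images:
        problems.append(f"no .png images under {root / 'images'}")
    for image in images:
        name = image.stem  # e.g. "000"
        if not (root / "images" / f"{name}_keypoints.json").is_file():
            problems.append(f"{name}: missing keypoints .json")
        if not (root / "smplx" / f"{name}_smplx.obj").is_file():
            problems.append(f"{name}: missing SMPL-X mesh")
        if not list((root / "back_images").glob(f"{name}_*.png")):
            problems.append(f"{name}: missing back-view image")
    return problems
```

An empty list means every front-view image has its keypoints, fitted mesh, and at least one back-view candidate.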

The following command will reconstruct textured meshes under data/examples/meshes:

python reconstruct.py --test-folder data/examples --config recon/config.yaml --resume checkpoints/recon_model.pth

The default --grid-size for marching cubes is set to 512. If your images contain noisy segmentation borders, you can increase --erode-iter to shrink your segmentation mask.
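
To show how mask erosion shrinks a segmentation, here is a minimal pure-Python binary erosion. It is an illustration only: the 3x3 structuring element and the border handling are assumptions, and reconstruct.py may implement --erode-iter differently.

```python
def erode(mask, iterations=1):
    """Binary erosion with a 3x3 square element; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        eroded = [[0] * w for _ in range(h)]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                # A pixel survives only if its whole 3x3 neighborhood is foreground.
                if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                    eroded[y][x] = 1
        mask = eroded
    return mask
```

Each iteration peels one pixel off the mask boundary, which is why more iterations remove more of a noisy border at the cost of thin structures such as fingers or hair.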

Evaluation Benchmark

We created an evaluation benchmark using the CustomHumans dataset. Please apply for the dataset directly, and you will find the necessary files in the download link.

Note that we trained our models with 526 human scans provided in the THuman2.0 dataset and tested on 60 scans in the CustomHumans dataset. We used the default hyperparameters and commands suggested in run.sh. The evaluation scripts can be found here and here. You will need to install two additional packages for evaluation:

pip install torchmetrics[image] mediapipe

Single-view human 3D reconstruction benchmark

Methods	P-to-S (cm) ↓	S-to-P (cm) ↓	NC ↑	f-Score ↑
PIFu [Saito2019]	2.209	2.582	0.805	34.881
PIFuHD [Saito2020]	2.107	2.228	0.804	39.076
PaMIR [Zheng2021]	2.181	2.507	0.813	35.847
FOF [Feng2022]	2.079	2.644	0.808	36.013
2K2K [Han2023]	2.488	3.292	0.796	30.186
ICON* [Xiu2022]	2.256	2.795	0.791	30.437
ECON* [Xiu2023]	2.483	2.680	0.797	30.894
SiTH* (Ours)	1.871	2.045	0.826	37.029

*indicates methods trained on the same THuman2.0 dataset.
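
For readers unfamiliar with the f-Score column, the sketch below computes a brute-force, point-to-point version on two small point sets. It only approximates the idea: the benchmark's evaluation scripts measure distances against surfaces and use their own sampling and threshold, none of which is reproduced here, and the function names are hypothetical.

```python
import math


def nearest_distances(source, target):
    """For each point in source, the Euclidean distance to its nearest point in target."""
    return [min(math.dist(p, q) for q in target) for p in source]


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold, in percent."""
    precision = sum(d < threshold for d in nearest_distances(pred, gt)) / len(pred)
    recall = sum(d < threshold for d in nearest_distances(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)
```

Averaging the values returned by nearest_distances in each direction gives a point-to-point analogue of the two directional distance columns in the table.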

Back-view hallucination benchmark

Methods	SSIM ↑	LPIPS ↓	KID (×10⁻³) ↓	Joints Err. (pixel) ↓
Pix2PixHD [Wang2018]	0.816	0.141	86.2	53.1
DreamPose [Karras2023]	0.844	0.132	86.7	76.7
Zero-1-to-3 [Liu2023]	0.862	0.119	30.0	73.4
ControlNet [Zhang2023]	0.851	0.202	39.0	35.7
SiTH (Ours)	0.950	0.063	3.2	21.5

Acknowledgement

We used code from other great research projects, including occupancy_networks, pifuhd, kaolin-wisp, mmpose, smplx, SMPLer-X, and editable-humans.

We created all the videos using the aitviewer.

We sincerely thank the authors for their awesome work!

Contact

For any questions or problems, please open an issue or contact Hsuan-I Ho.
