
humannorm's Introduction

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

Official implementation of HumanNorm, a method for generating high-quality and realistic 3D Humans from prompts.

Xin Huang1*, Ruizhi Shao2*, Qi Zhang1, Hongwen Zhang2, Ying Feng1, Yebin Liu2, Qing Wang1
1Northwestern Polytechnical University, 2Tsinghua University, *Equal Contribution

CVPR 2024

teaser_low.mp4

Method Overview

Installation

This part is the same as for the original threestudio. Skip it if you have already set up the environment.

See installation.md for additional information, including installation via Docker.

  • You must have an NVIDIA graphics card with at least 20GB VRAM and have CUDA installed.
  • Install Python >= 3.8.
  • (Optional, Recommended) Create a virtual environment:
pip3 install virtualenv # skip this if virtualenv is already installed
python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
  • Install PyTorch >= 1.12. We have tested on torch1.12.1+cu113 and torch2.0.0+cu118, but other versions should also work fine.
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
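To quickly sanity-check the PyTorch install against your driver, you can run a short script like the one below (a generic check, not part of the HumanNorm repo):

```python
# Sanity-check the PyTorch + CUDA install (generic check, not HumanNorm-specific).
import torch

print(torch.__version__)          # e.g. 2.0.0+cu118
print(torch.cuda.is_available())  # should print True on a working CUDA setup
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # HumanNorm recommends at least 20GB of VRAM.
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
```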
  • (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
pip install ninja
  • Install dependencies:
pip install -r requirements.txt
  • (Optional) tiny-cuda-nn installation might require downgrading pip to 23.0.1

Download Finetuned Models

You can download our fine-tuned models from HuggingFace: Normal-adapted-model, Depth-adapted-model, Normal-aligned-model, and ControlNet. We provide a script to download these models.

./download_models.sh

After downloading, pretrained_models/ is structured as follows:

./pretrained_models
├── normal-adapted-sd1.5/
├── depth-adapted-sd1.5/
├── normal-aligned-sd1.5/
└── controlnet-normal-sd1.5/

Download Tets

You can download the predefined tetrahedra for DMTet by running:

sudo apt-get install git-lfs # install git-lfs
cd load/
sudo chmod +x download.sh
./download.sh

After downloading, load/ is structured as follows:

./load
├── lights/
├── shapes/
└── tets/
    ├── ...
    ├── 128_tets.npz
    ├── 256_tets.npz
    ├── 512_tets.npz
    └── ...

Quickstart

The scripts directory contains scripts for full-body, half-body, and head-only human generation; the configs directory contains the corresponding parameter settings. HumanNorm generates a 3D human in three steps: geometry generation, coarse texture generation, and fine texture generation. You can execute all three steps with a single script. For example:

./scripts/run_generation_full_body.sh

After generation, you will find the results of each step.

output.mp4

You can also modify the prompt in run_generation_full_body.sh to generate other models. The script looks like this:

#!/bin/bash
exp_root_dir="./outputs"
test_save_path="./outputs/rgb_cache"
timestamp="_20231223"
tag="curry"
prompt="a DSLR photo of Stephen Curry"

# Stage1: geometry generation
exp_name="stage1-geometry"
python launch.py \
    --config configs/humannorm-geometry-full.yaml \
    --train \
    timestamp=$timestamp \
    tag=$tag \
    name=$exp_name \
    exp_root_dir=$exp_root_dir \
    data.sampling_type="full_body" \
    system.prompt_processor.prompt="$prompt, black background, normal map" \
    system.prompt_processor_add.prompt="$prompt, black background, depth map" \
    system.prompt_processor.human_part_prompt=false \
    system.geometry.shape_init="mesh:./load/shapes/full_body.obj"

# Stage2: coarse texture generation
geometry_convert_from="$exp_root_dir/$exp_name/$tag$timestamp/ckpts/last.ckpt" 
exp_name="stage2-coarse-texture"
root_path="./outputs/$exp_name"
python launch.py \
    --config configs/humannorm-texture-coarse.yaml \
    --train \
    timestamp=$timestamp \
    tag=$tag \
    name=$exp_name \
    exp_root_dir=$exp_root_dir \
    system.geometry_convert_from=$geometry_convert_from \
    data.sampling_type="full_body" \
    data.test_save_path=$test_save_path \
    system.prompt_processor.prompt="$prompt" \
    system.prompt_processor.human_part_prompt=false

# Stage3: fine texture generation
ckpt_name="last.ckpt"
exp_name="stage3-fine-texture"
python launch.py \
    --config configs/humannorm-texture-fine.yaml \
    --train \
    system.geometry_convert_from=$geometry_convert_from \
    data.dataroot=$test_save_path \
    timestamp=$timestamp \
    tag=$tag \
    name=$exp_name \
    exp_root_dir=$exp_root_dir \
    resume="$root_path/$tag$timestamp/ckpts/$ckpt_name" \
    system.prompt_processor.prompt="$prompt" \
    system.prompt_processor.human_part_prompt=false
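The stages are chained through checkpoints saved under `$exp_root_dir/$exp_name/$tag$timestamp/ckpts/`. The path convention can be sketched in Python (a hypothetical helper mirroring the shell variables above, not part of the repo):

```python
def ckpt_path(exp_root_dir: str, exp_name: str, tag: str,
              timestamp: str, ckpt_name: str = "last.ckpt") -> str:
    """Build the checkpoint path used to hand one stage's result to the next."""
    # Note: tag and timestamp are concatenated directly, not joined with a slash,
    # matching "$tag$timestamp" in the shell script.
    return f"{exp_root_dir}/{exp_name}/{tag}{timestamp}/ckpts/{ckpt_name}"

# Example: the Stage 1 checkpoint consumed by Stage 2 in the script above.
print(ckpt_path("./outputs", "stage1-geometry", "curry", "_20231223"))
# ./outputs/stage1-geometry/curry_20231223/ckpts/last.ckpt
```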

Todo

  • Release the reorganized code.
  • Improve the quality of texture generation.
  • Release the finetuning code.

Citation

If you find our work useful in your research, please cite:

@article{huang2023humannorm,
  title={Humannorm: Learning normal diffusion model for high-quality and realistic 3d human generation},
  author={Huang, Xin and Shao, Ruizhi and Zhang, Qi and Zhang, Hongwen and Feng, Ying and Liu, Yebin and Wang, Qing},
  journal={arXiv preprint arXiv:2310.01406},
  year={2023}
}

Acknowledgments

Our project benefits from amazing open-source projects; we are grateful for their contributions.

humannorm's People

Contributors

xhuangcv


humannorm's Issues

normal map issues

The HumanNorm paper obtains normal maps from human-body datasets such as THUman. Have you looked at how "RichDreamer" generates its normal maps? If I obtained normals and depths from a synthetic human dataset in the RichDreamer way, would fine-tuning SD2.1 still be effective? Also, when I used a person wearing a red baseball cap as the prompt, the hands turned into baseballs. Is this a data issue or a framework issue?

Twindom dataset

Hi, how did you obtain the Twindom dataset? Is it publicly available? I could not find any information about this dataset on Twindom's website.

Questions about Avatars

I wonder why the avatars have such large limbs and faces. They look like SD Gundam figures. The geometry model was not trained on such data, so I don't know why they turn out this way.

Fine-tuning question

Could you provide the fine-tuning code? If I understand correctly, you generate images with SD1.5, estimate normals with a state-of-the-art normal-estimation algorithm (and likewise for depth), obtain paired data, and then fine-tune SD1.5 on each separately. In that case, shouldn't the choice of SD1.5 vs. SD2.1 or even SD3.0 affect the model's performance?

about the quality of the results

This is great work and I'm very interested in it. After studying it, I have a few questions:

  1. I noticed that the generated characters are very muscular, with thick calves for example. How can we remove this style and make them more realistic?
  2. What should I do if I want to control the identity of a generated character by inserting a LoRA? I tried simply loading the LoRA weights with StableDiffusionPipeline, but it did not work.
  3. Should prompts conform to a certain format? I tried the prompt "a DSLR photo of Asian girl, long shawl hair, bangs covering half her face" and got a low-quality textured mesh.
    I am looking forward to your reply. Thanks!

Wondering About SMPL Initial Pose

As far as I know, there are many canonical poses in use, such as T-pose and A-pose, but this one looks a little different. Does it give the best quality?

cuda issues

Hi, great work. Can an RTX 4080 (16GB) run this stably?

Design choice on DMTet

Hi, can you elaborate on the choice of DMTet as the 3D generation representation? It is a fairly old method, while there are more recent, advanced alternatives.

About generation time

Hi, this is a great work!
May I ask how long it takes to generate an avatar from a prompt from scratch, and how long the geometry generation and texture generation stages take, respectively?
Thanks a lot!
