NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads

Paper | Video | Project Page

Tobias Kirschstein, Shenhan Qian, Simon Giebenhain, Tim Walter and Matthias Nießner
SIGGRAPH 2023

1. Installation

1.1. Dependencies

  • PyTorch 2.0
  • nerfstudio
  • tinycudann

  1. Setup environment

    conda env create -f environment.yml
    conda activate nersemble
    

    which creates a new conda environment nersemble (Installation may take a while).

  2. Manually install tinycudann:

    pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
    

    (This also helps if you later get an error like ImportError: DLL load failed while importing _86_C: The specified procedure could not be found.)

  3. Install the nersemble package itself by running

    pip install -e .

    inside the cloned repository folder.

1.2. Environment Paths

All paths to data / models / renderings are defined by environment variables.
Please create a file at ~/.config/nersemble/.env in your home directory with the following content:

NERSEMBLE_DATA_PATH="..."
NERSEMBLE_MODELS_PATH="..."
NERSEMBLE_RENDERS_PATH="..."

Replace the ... with the paths where data / models / renderings should be stored on your machine.

  • NERSEMBLE_DATA_PATH: Location of the multi-view video dataset (See section 2 for how to obtain the dataset)
  • NERSEMBLE_MODELS_PATH: During training, model checkpoints and configs will be saved here
  • NERSEMBLE_RENDERS_PATH: Video renderings of trained models will be stored here
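
For example, on Linux the config file could be created like this (the paths below are placeholders; substitute your own):

# Placeholder locations -- replace with paths on your machine
mkdir -p ~/.config/nersemble
cat > ~/.config/nersemble/.env <<'EOF'
NERSEMBLE_DATA_PATH="/data/nersemble/dataset"
NERSEMBLE_MODELS_PATH="/data/nersemble/models"
NERSEMBLE_RENDERS_PATH="/data/nersemble/renders"
EOF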

If you do not like creating a config file in your home directory, you can instead hard-code the paths in env.py.

1.3. Troubleshooting

You may run into this error at the beginning of training:

\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here

\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here

This occurs during compilation of torch_efficient_distloss and can be solved by either training without distortion loss or by changing one line in the torch_efficient_distloss library (see sunset1995/torch_efficient_distloss#8).
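
If you go with the first workaround, the distortion loss can presumably be disabled by setting its weight to zero via the --lambda_dist_loss flag (the same flag used for sequence 97 in section 3.1), e.g.:

# Assumes a zero distortion-loss weight avoids the torch_efficient_distloss compilation
python scripts/train/train_nersemble.py $ID $SEQUENCE_NAME --name $NAME --lambda_dist_loss 0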

2. Dataset

Access to the dataset can be requested here.
To reproduce the experiments from the paper, only download the nersemble_XXX_YYY.zip files (there are 10 in total, one for each of the 10 sequences), as well as camera_params.zip. Extract these .zip files into NERSEMBLE_DATA_PATH.
Also, see src/nersemble/data_manager/multi_view_data.py for an explanation of the folder layout.
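
Assuming the archives were downloaded directly into NERSEMBLE_DATA_PATH, extraction could look like this:

# Extract camera parameters and all 10 sequence archives in place
cd "$NERSEMBLE_DATA_PATH"
unzip camera_params.zip
for f in nersemble_*.zip; do unzip "$f"; done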

3. Usage

3.1. Training

python scripts/train/train_nersemble.py $ID $SEQUENCE_NAME --name $NAME

where $ID is the ID of the participant in the dataset (e.g., 030) and $SEQUENCE_NAME is the name of the expression / emotion / sentence sequence (e.g., EXP-2-eyes). $NAME may optionally be used to annotate the checkpoint folder and the wandb experiment with a descriptive experiment name.

The training script will place model checkpoints and configuration in ${NERSEMBLE_MODELS_PATH}/nersemble/NERS-XXX-${name}/. The incremental run id XXX will be automatically determined.

GPU Requirements

Training takes roughly 1 day and requires at least an RTX A6000 GPU (48 GB VRAM). GPU memory requirements may be lowered by tweaking some of the following hyperparameters (an example invocation follows the list):

  • --max_n_samples_per_batch: Restricts how many ray samples are fed through the model at once (the default of 20 means 2^20 samples).
  • --n_hash_encodings: Number of hash encodings in the ensemble (default 32). Using 16 should give comparable quality (--latent_dim_time needs to be set to the same value).
  • --cone_angle: Use larger steps between ray samples for points farther away. The default value of 0 (no step size increase) gives the best quality. Try values up to 0.004.
  • --n_train_rays: Number of rays per batch (default 4096). Lower values can affect convergence.
  • --mlp_num_layers / --mlp_layer_width: Making the deformation field smaller should still provide reasonable performance.
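
As an illustration, a hypothetical reduced-memory invocation combining several of these flags could look as follows (the values are untested suggestions, not settings used in the paper):

# Hypothetical lower-memory settings -- tune to your GPU
python scripts/train/train_nersemble.py $ID $SEQUENCE_NAME --name $NAME \
    --max_n_samples_per_batch 19 \
    --n_hash_encodings 16 --latent_dim_time 16 \
    --n_train_rays 2048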

RAM requirements

By default, the training script caches loaded images in RAM, which can drive RAM usage up to 200 GB. RAM usage can be lowered via:

  • --max_cached_images (default 10k): Set to 0 to disable caching completely (see the example below).
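
For example, to disable image caching entirely:

python scripts/train/train_nersemble.py $ID $SEQUENCE_NAME --name $NAME --max_cached_images 0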

Special config for sequences 97 and 124

Due to the complex hair motion in sequence 97, we disable the occupancy grid acceleration structure from Instant NGP as well as the distortion loss:

python scripts/train/train_nersemble.py 97 HAIR --name $NAME --disable_occupancy_grid --lambda_dist_loss 0

We only train on a subset of sequence 124 (timesteps 95-570) and slightly prolong the warmup phase due to the complexity of the sequence:

python scripts/train/train_nersemble.py 124 FREE --name $NAME --start_timestep 95 --n_timesteps 475 --window_hash_encodings_begin 50000 --window_hash_encodings_end 100000

3.2. Evaluation

In the paper, all experiments are conducted by training on only 12 cameras and evaluating rendered images on 4 hold-out views (cameras 222200040, 220700191, 222200043 and 221501007).

  • For obtaining the reported PSNR, SSIM and LPIPS metrics (evaluated at 15 evenly spaced timesteps):

    python scripts/evaluate/evaluate_nersemble.py NERS-XXX

    where NERS-XXX is the run name obtained from running the training script above.

  • For obtaining the JOD video metric (evaluated at 24fps, takes much longer):

    python scripts/evaluate/evaluate_nersemble.py NERS-XXX --skip_timesteps 3 --max_eval_timesteps -1

The evaluation results will be printed in the terminal and persisted as a .json file in the model folder ${NERSEMBLE_MODELS_PATH}/nersemble/NERS-XXX-${name}/evaluation.

3.3. Rendering

From a trained model NERS-XXX, a circular trajectory (4s) may be rendered via:

python scripts/render/render_nersemble.py NERS-XXX

The resulting .mp4 file is stored in NERSEMBLE_RENDERS_PATH.

4. Trained Models

We provide one trained NeRSemble for each of the 10 sequences used in the paper:

Participant ID   Sequence                    Model
18               EMO-1-shout+laugh           NERS-9018
30               EXP-2-eyes                  NERS-9030
38               EXP-1-head                  NERS-9038
85               SEN-01-port_strong_smokey   NERS-9085
97               HAIR                        NERS-9097
124              FREE                        NERS-9124
175              EXP-6-tongue-1              NERS-9175
226              EXP-3-cheeks+nose           NERS-9226
227              EXP-5-mouth                 NERS-9227
240              EXP-4-lips                  NERS-9240

Simply put the downloaded model folders into ${NERSEMBLE_MODELS_PATH}/nersemble.
You can then use the evaluate_nersemble.py and render_nersemble.py scripts to obtain renderings or reproduce the official metrics below.
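
For example, after placing the folder for participant 30 as described above, its metrics should be reproducible via:

python scripts/evaluate/evaluate_nersemble.py NERS-9030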

5. Official metrics

Metrics averaged over all 10 sequences from the NVS benchmark (same 10 sequences as in the paper):

Model       PSNR    SSIM    LPIPS   JOD
NeRSemble   31.48   0.872   0.217   7.85

Note the following:

  • The metrics are slightly different from the paper due to the newer version of nerfstudio used in this repository
  • PSNR, SSIM and LPIPS are computed on only 15 evenly spaced timesteps (to make comparisons cheaper)
  • JOD is computed on every 3rd timestep (using --skip_timesteps 3 --max_eval_timesteps -1)
  • Metrics for sequence 97 were computed with --no_use_occupancy_grid_filtering (see the example command below)
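
For instance, the sequence-97 numbers might be reproduced with something like the following (assuming --no_use_occupancy_grid_filtering is accepted by the evaluation script):

# Assumes the flag belongs to evaluate_nersemble.py
python scripts/evaluate/evaluate_nersemble.py NERS-9097 --no_use_occupancy_grid_filtering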

If you find our code, dataset, or paper useful, please consider citing:

@article{kirschstein2023nersemble,
    author = {Kirschstein, Tobias and Qian, Shenhan and Giebenhain, Simon and Walter, Tim and Nie\ss{}ner, Matthias},
    title = {NeRSemble: Multi-View Radiance Field Reconstruction of Human Heads},
    year = {2023},
    issue_date = {August 2023},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    volume = {42},
    number = {4},
    issn = {0730-0301},
    url = {https://doi.org/10.1145/3592455},
    doi = {10.1145/3592455},
    journal = {ACM Trans. Graph.},
    month = {jul},
    articleno = {161},
    numpages = {14},
}

Contact Tobias Kirschstein for questions, comments, and bug reports, or open a GitHub issue.


nersemble's Issues

Training Requirement

Hello, this is really fascinating work.
I have a question. I understand that this work requires at least an A6000 with 48 GB of VRAM for training, but I currently only have a few 24 GB 3090 graphics cards. Can I use distributed training to solve this problem? If so, can you give me some guidance?

How to get the depth map

Thanks for your excellent work. How can I preprocess my videos if I want to train on my own datasets? Your paper mentions that the depth maps are computed with COLMAP. I want to make sure: is COLMAP enough?

Matting masks for free part?

Thank you for your excellent work. Could you please provide the alpha maps for the FREE part? All emotional segments, with the exception of this particular part, already have alpha maps for background removal. If they are not available, could we obtain the background images for each camera?

Best regards,

How to do color correction in other sequences?

Color correction is only provided for the 10 sequences used in the paper; for the majority of the other data, there is no such file.

I found in your paper that color correction involves a quite complex procedure, including face segmentation and optimal transport solving, which appears to be hard to reproduce.

I wonder if you could release (or point to) the code for this preprocessing, or release the correction data for the remaining sequences?

Thank you very much.

Coordinate System of Camera Matrix

Hi authors, in your dataset the camera extrinsics are stored in camera_params.json. Could you share which coordinate system those matrices are in? Are they in OpenGL, OpenCV, or PyTorch3D style?

Expressions Transfer

Great job with this! How would you suggest doing expression transfer between two different rendered human heads?

Pretrained Model

Thank you for sharing your code. I just wanted to know if you have pre-trained weights for NeRSemble. Please do share them if you have a pre-trained model.

DataManager compatible Extraction script

Hey Tobias,

First of all, I want to congratulate you on your awesome work.
I am wondering if you could share the script that turns the default distribution of the dataset (i.e., .mp4 files without the extracted .png multi-view images, COLMAP data, and alpha maps) into the version that is supported by your data manager?

As another idea, I think it would be nice to be able to extract the images and alpha maps on the fly from the .mp4 files directly instead of storing them; let me know if I can help you with this.

Thank you
BR

Hardware Requirements

Hello,

I'm interested in running your code to reproduce the results presented in your paper. Could you please specify the hardware requirements? Specifically, I'm interested in the following:

  • RAM requirements
  • GPU memory (VRAM) requirements
  • Any other hardware dependencies

Would it be possible to achieve the paper's results using an NVIDIA RTX 3090 or 4090?

Thanks!

About camera coordinate system

Hi @tobias-kirschstein, I'm confused about the camera extrinsics and intrinsics.
The paper wrote:

We estimate an individual extrinsic and a shared intrinsic camera matrix by employing a fine checkerboard in combination with a bundle adjustment optimization procedure.

I used COLMAP to estimate extrinsics and intrinsics to reproduce the reconstruction on another dataset, but I cannot get correct parameters.
So I ran an experiment on the NeRSemble data using COLMAP, and I got roughly the same numerical values, but the signs differ a lot.
I visualized the estimated results and they look the same as the camera array in the paper, so I suspect this might be a coordinate system problem.

COLMAP outputs should be in the OpenCV convention; what coordinate system does NeRSemble use?

Thank you.
