eg3d's Introduction

Efficient Geometry-aware 3D Generative Adversarial Networks (EG3D)
Official PyTorch implementation of the CVPR 2022 paper

Teaser image

Efficient Geometry-aware 3D Generative Adversarial Networks
Eric R. Chan*, Connor Z. Lin*, Matthew A. Chan*, Koki Nagano*, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein
* equal contribution

https://nvlabs.github.io/eg3d/

Abstract: Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Requirements

  • We recommend Linux for performance and compatibility reasons.
  • 1–8 high-end NVIDIA GPUs. We have done all testing and development using V100, RTX3090, and A100 GPUs.
  • 64-bit Python 3.8 and PyTorch 1.11.0 (or later). See https://pytorch.org for PyTorch install instructions.
  • CUDA toolkit 11.3 or later. (Why is a separate CUDA toolkit installation required? We use the custom CUDA extensions from the StyleGAN3 repo. Please see Troubleshooting).
  • Python libraries: see environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
    • cd eg3d
    • conda env create -f environment.yml
    • conda activate eg3d

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames. See Models for download links to pre-trained checkpoints.

Generating media

# Generate videos using pre-trained model

python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=2x2 \
    --network=networks/network_snapshot.pkl

# Generate the same 4 seeds in an interpolation sequence

python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=1x1 \
    --network=networks/network_snapshot.pkl

# Generate images and shapes (as .mrc files) using pre-trained model

python gen_samples.py --outdir=out --trunc=0.7 --shapes=true --seeds=0-3 \
    --network=networks/network_snapshot.pkl

We visualize our .mrc shape files with UCSF ChimeraX.

To visualize a shape in ChimeraX, do the following (a programmatic alternative is sketched after this list):

  1. Import the .mrc file with File > Open
  2. Find the selected shape in the Volume Viewer tool
    1. The Volume Viewer tool is located under Tools > Volume Data > Volume Viewer
  3. Change volume type to "Surface"
  4. Change step size to 1
  5. Change level set to 10
    1. Note that the optimal level varies by object, but is usually between 2 and 20. Individual adjustment may make certain shapes slightly sharper
  6. In the Lighting menu in the top bar, change lighting to "Full"
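
For a quick programmatic sanity check of a generated .mrc volume without opening ChimeraX, a minimal sketch is shown below. It assumes the optional mrcfile and scikit-image packages (which are not part of environment.yml) and an example output path:

import mrcfile
import numpy as np
from skimage import measure

# Load the exported density volume and extract an isosurface. level=10 roughly
# matches the ChimeraX setting above; the best value varies per object
# (typically between 2 and 20).
with mrcfile.open('out/seed0000.mrc') as mrc:   # example path
    volume = np.array(mrc.data)
verts, faces, normals, values = measure.marching_cubes(volume, level=10)
print(f'{len(verts)} vertices, {len(faces)} faces')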

Interactive visualization

This release contains an interactive model visualization tool that can be used to explore various characteristics of a trained model. To start it, run:

python visualizer.py

See the Visualizer Guide for a description of important options.

Using networks from Python

You can use pre-trained networks in your own Python code as follows:

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # latent codes
c = torch.cat([cam2world_pose.reshape(-1, 16), intrinsics.reshape(-1, 9)], 1) # camera parameters
img = G(z, c)['image']                           # NCHW, float32, dynamic range [-1, +1], no truncation

The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence.
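
The snippet above assumes cam2world_pose and intrinsics have already been constructed. The repo ships camera helpers in camera_utils.py (e.g. pose samplers); purely as an illustration, the 25-dimensional conditioning can also be assembled by hand. The radius and focal-length values below are approximations of the FFHQ setup rather than canonical constants:

import torch

# Rough illustration of the camera conditioning above. The pose uses an
# identity rotation purely for simplicity; real poses should come from the
# samplers in camera_utils.py. The radius (~2.7) and focal length (~4.26, in
# normalized image coordinates) are approximate FFHQ values.
cam2world_pose = torch.eye(4).cuda()
cam2world_pose[2, 3] = 2.7                      # camera pulled back along +z
intrinsics = torch.tensor([[4.26, 0.0,  0.5],
                           [0.0,  4.26, 0.5],
                           [0.0,  0.0,  1.0]]).cuda()
c = torch.cat([cam2world_pose.reshape(-1, 16), intrinsics.reshape(-1, 9)], 1)  # shape [1, 25]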

The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.

The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:

w = G.mapping(z, conditioning_params, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, camera_params)['image']

Please refer to gen_samples.py for a complete code example.

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. Each label is a 25-length list of floating point numbers, which is the concatenation of the flattened 4x4 camera extrinsic matrix and flattened 3x3 camera intrinsic matrix. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.
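
As a hedged illustration of the label layout described above (the file name and matrix values below are placeholders, not real poses):

import json
import numpy as np

# Placeholder example of a single dataset.json entry: 25 floats per image,
# i.e. the row-major flattened 4x4 extrinsic matrix followed by the flattened
# 3x3 intrinsic matrix. Real values come from the pose-extraction pipeline in
# dataset_preprocessing/.
extrinsic = np.eye(4)
intrinsic = np.array([[4.26, 0.0, 0.5],
                      [0.0, 4.26, 0.5],
                      [0.0, 0.0, 1.0]])
label = np.concatenate([extrinsic.flatten(), intrinsic.flatten()]).tolist()
assert len(label) == 25

with open('dataset.json', 'w') as f:
    json.dump({'labels': [['00000/img00000000.png', label]]}, f)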

FFHQ: Download and process the Flickr-Faces-HQ dataset using the following commands.

  1. Ensure the Deep3DFaceRecon_pytorch submodule is properly initialized
git submodule update --init --recursive
  2. Run the following commands:
cd dataset_preprocessing/ffhq
python runme.py

Optional: preprocessing in-the-wild portrait images. In case you want to crop in-the-wild face images and extract poses using Deep3DFaceRecon_pytorch in a way that aligns with the FFHQ data above and the provided checkpoints, run the following commands:

cd dataset_preprocessing/ffhq
python preprocess_in_the_wild.py --indir=INPUT_IMAGE_FOLDER

AFHQv2: Download and process the AFHQv2 dataset with the following.

  1. Download the AFHQv2 images zipfile from the StarGAN V2 repository
  2. Run the following commands:
cd dataset_preprocessing/afhq
python runme.py "path/to/downloaded/afhq.zip"

ShapeNet Cars: Download and process renderings of the cars category of ShapeNet using the following commands. NOTE: the following commands download renderings of the ShapeNet cars from the Scene Representation Networks repository.

cd dataset_preprocessing/shapenet
python runme.py

Training

You can train new networks using train.py. For example:

# Train with FFHQ from scratch with raw neural rendering resolution=64, using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=ffhq --data=~/datasets/FFHQ_512.zip \
  --gpus=8 --batch=32 --gamma=1 --gen_pose_cond=True

# Second stage finetuning of FFHQ to 128 neural rendering resolution (optional).
python train.py --outdir=~/training-runs --cfg=ffhq --data=~/datasets/FFHQ_512.zip \
  --resume=~/training-runs/ffhq_experiment_dir/network-snapshot-025000.pkl \
  --gpus=8 --batch=32 --gamma=1 --gen_pose_cond=True --neural_rendering_resolution_final=128

# Train with Shapenet from scratch, using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=shapenet --data=~/datasets/cars_train.zip \
  --gpus=8 --batch=32 --gamma=0.3

# Train with AFHQ, finetuning from FFHQ with ADA, using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=afhq --data=~/datasets/afhq.zip \
  --gpus=8 --batch=32 --gamma=5 --aug=ada --neural_rendering_resolution_final=128 --gen_pose_cond=True --gpc_reg_prob=0.8

Please see the Training Guide for a guide to setting up a training run on your own data.

Please see Models for recommended training configurations and download links for pre-trained checkpoints.

The results of each training run are saved to a newly created directory, for example ~/training-runs/00000-ffhq-ffhq512-gpus8-batch32-gamma1. The training loop exports network pickles (network-snapshot-<KIMG>.pkl) and random image grids (fakes<KIMG>.png) at regular intervals (controlled by --snap). For each exported pickle, it evaluates FID (controlled by --metrics) and logs the result in metric-fid50k_full.jsonl. It also records various statistics in training_stats.jsonl, as well as *.tfevents if TensorBoard is installed.

Quality metrics

By default, train.py automatically computes FID for each network pickle exported during training. We recommend inspecting metric-fid50k_full.jsonl (or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics=none to speed up the training slightly.
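
For example, a minimal sketch for skimming the logged FID values from Python is shown below; the field names follow the StyleGAN3-style logging this codebase inherits, so double-check them against your own metric-fid50k_full.jsonl:

import json
import os

# Print FID for every snapshot evaluated so far. Adjust the path to your run.
path = os.path.expanduser('~/training-runs/00000-ffhq-ffhq512-gpus8-batch32-gamma1/metric-fid50k_full.jsonl')
with open(path) as f:
    for line in f:
        entry = json.loads(line)
        print(entry['snapshot_pkl'], entry['results']['fid50k_full'])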

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=fid50k_full \
    --network=~/training-runs/network-snapshot-000000.pkl

# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq_512.zip \
    --network=ffhq-128.pkl

Note that the metrics can be quite expensive to compute (up to 1h), and many of them have an additional one-off cost for each new dataset (up to 30min). Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018

Citation

@inproceedings{Chan2022,
  author = {Eric R. Chan and Connor Z. Lin and Matthew A. Chan and Koki Nagano and Boxiao Pan and Shalini De Mello and Orazio Gallo and Leonidas Guibas and Jonathan Tremblay and Sameh Khamis and Tero Karras and Gordon Wetzstein},
  title = {Efficient Geometry-aware {3D} Generative Adversarial Networks},
  booktitle = {CVPR},
  year = {2022}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke, Jan Kautz, Jaewoo Seo, Jonathan Granskog, Simon Yuen, Alex Evans, Stan Birchfield, Alexander Bergman, and Joy Hsu for feedback on drafts, Alex Chan, Giap Nguyen, and Trevor Chan for help with diagrams, and Colette Kress and Bryan Catanzaro for allowing use of their photographs. This project was in part supported by Stanford HAI and a Samsung GRO. Koki Nagano and Eric Chan were partially supported by DARPA’s Semantic Forensics (SemaFor) contract (HR0011-20-3-0005). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Distribution Statement "A" (Approved for Public Release, Distribution Unlimited).

eg3d's People

Contributors

ericryanchan, matthew-a-chan, ryanrussell


eg3d's Issues

about mirror dataset

def flip_yaw(pose_matrix):
    flipped = pose_matrix.copy()
    flipped[0, 1] *= -1
    flipped[0, 2] *= -1
    flipped[1, 0] *= -1
    flipped[2, 0] *= -1
    flipped[0, 3] *= -1
    return flipped

should flipped[1, 0] change when the image is mirrored?

Data pre-processing problem

Hi, Thanks for your impressive work!

I have a problem with the data processing.
The data pre-processing script aligns and re-crops the in-the-wild FFHQ dataset.
I wonder how the current re-crop procedure differs from the original FFHQ crop procedure.
Since FFHQ in-the-wild is very large, can we obtain the same processed dataset from the already-cropped FFHQ images by directly predicting their camera coefficients?

Thanks in advance!

Training on ImageNet

Thanks for the impressive work! I wonder if eg3d can be trained on ImageNet to generate 3D images?

Question for label

Hello, I have a problem when using dataset_tool.py to prepare datasets, both the ones you provide and my own.
There are no labels in dataset.json, only ["label"].
Can anyone give me some instructions? Thank you!

Bad projection ?

Hello,

I have noticed something strange in the definitions of "generate_planes" and "project_onto_planes". If I have understood correctly, you define three transfer matrices in "generate_planes" that are used to project coordinates in "project_onto_planes" before keeping only the first two coordinates of the projection.


Problem:

B=2
N_rays=11
coordinates= torch.randn(B,N_rays,3)
planes = generate_planes()
out = project_onto_planes(planes, coordinates)

If we set P=coordinates[0][0], since out[:3,0,:] is the projection, it is supposed to return:
[P[0], P[1]]
[P[1], P[2]]
[P[2], P[0]]

However, I got:
[P[0], P[1]]
[P[0], P[2]]
[P[2], P[0]]

If you prefer, I have: [(X,Y), (X,Z), (Z,X)]


Reason:
If I am right, I have found the reason. You defined planes by the following matrices:
[[1, 0, 0],[0, 1, 0],[0, 0, 1]]
[[1, 0, 0],[0, 0, 1],[0, 1, 0]]
[[0, 0, 1],[1, 0, 0],[0, 1, 0]]

Let us call the matrices M1, M2 and M3. Their inverses are:

          [[1, 0, 0],
M1^{-1} = [0, 1, 0],
          [0, 0, 1]]

          [[1, 0, 0], 
M2^{-1} = [0, 0, 1],
          [0, 1, 0]]

          [[0, 1, 0],
M3^{-1} = [0, 0, 1],
          [1, 0, 0]]

If I have a point P=(X,Y,Z), I got:
P @ M1^{-1} = (X,Y,Z)
P @ M2^{-1} = (X,Z,Y)
P @ M3^{-1} = (Z,X,Y)

Then, if I keep only the two coordinates, I have: [(X,Y), (X,Z), (Z,X)]


Possible solution:
Update "generate_planes" to:

torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
              [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
              [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)

Do not hesitate to tell me if I am misunderstanding something.
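
For reference, a standalone sketch (plain PyTorch, independent of the repo's project_onto_planes implementation) that reproduces the check described above:

import torch

# Project a point P onto each plane by multiplying with the inverse plane
# matrix and keeping the first two coordinates, mirroring the issue's reasoning.
P = torch.tensor([1.0, 2.0, 3.0])  # (X, Y, Z)

original_planes = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                                [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
                                [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)
proposed_planes = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                                [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
                                [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)

for name, planes in [('original', original_planes), ('proposed', proposed_planes)]:
    projections = [(P @ torch.linalg.inv(M))[:2].tolist() for M in planes]
    print(name, projections)
# original -> [[1, 2], [1, 3], [3, 1]]   i.e. (X,Y), (X,Z), (Z,X)
# proposed -> [[1, 2], [2, 3], [3, 1]]   i.e. (X,Y), (Y,Z), (Z,X)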

add Gradio Web Demo for cvpr 2022 call for demos

Hi, would you be interested in adding eg3d to Hugging Face as a Gradio Web Demo for the CVPR 2022 Call for Demos? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models/datasets/Spaces (web demos) can be added to a user account or organization, similar to GitHub.

more info on CVPR call for demos: https://huggingface.co/CVPR

here is an example Gradio demo for the CVPR org: https://huggingface.co/spaces/CVPR/ml-talking-face

and here is a guide for adding web demo to the organization: https://huggingface.co/blog/gradio-spaces

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

Inversion script

Hi! Congratulations on your great work.
Could you provide a script for GAN inversion?
I wonder whether the model can reconstruct a 3D face from only a single input image.

Fix mapping network

Hi! I want to finetune the synthesis network while keeping the mapping network fixed. I have tried many solutions, including freezing the weights of the mapping network and detaching it from the computational graph, but they do not work. Interestingly, I found the issue could be avoided by using G instead of G_ema at inference time. I guess this is because G_ema is a running average of the weights of G: although the mapping network of G is fixed, it still changes in G_ema when the weights are averaged. Since using G leads to performance degradation, how can I obtain a finetuned G_ema whose mapping network is fixed?

FFHQ dataset camera pose labeling

"We use an off-the-shelf face detection and pose-extraction pipeline to both identify the face region and label the image with a pose"

Would it be possible to add the code to generate camera pose labels ("cameras.json") to this repo?

I think I have reached reasonable results, but there are a few questions.

Hi,

After days of training and debugging, my reimplementation has reached the following results (currently training on ~15M images):

sample0_128.mp4
sample0_512.mp4
sample1_128.mp4
sample1_512.mp4

There are a few changes I have made:
1. I use ReLU instead of softmax as the activation function for the hidden layer in the decoder architecture.
2. The neural rendering resolution is 128x128 from the start.
3. Blurring of images for the first 200K images is disabled.

There must be something wrong with my reimplementation or understanding, since if I stick to the paper on these points, my model diverges quickly. Besides, for simplicity, I use the default mixed precision for the generator backbone and discriminator from StyleGAN2, and the probability of randomly swapping the conditioning pose of the generator with another random pose is always set to 50%. Do these two things matter?

However, there are obviously problems with my demonstrations:
1. There is flickering and inconsistency between the low-resolution image and the image after the super-resolution module.
2. The results perform poorly at relatively large angles, showing blurring and protruding parts.

sample2_circulation.mp4

Thus, my question is: how do I solve these issues? Are they normal? If not, what should I check or do to improve the results?

Thanks!

About Geometry

Hi guys,

Which tools are used to visualize the geometry in this paper?

Few questions about the paper

Awesome work. I have a few questions about the paper.

  1. Was using a single StyleGAN as the backbone, instead of 3 separate lighter-weight StyleGANs, to generate the triplane deliberate? My thought was that the triplane is mostly not pixel-aligned, and it could be wasteful for the convolutions.

  2. The paper says the MLP uses a softmax activation. Do you mean softmax for the hidden layer or the output layer? What is the reason for using softmax?

  3. Has there been any exploration of using it with StyleGAN3-R?

  4. When will the code and weights be released?

ninja: build stopped: subcommand failed.

Dear all,

After I loaded the weights in the visualizer, the following problem appeared. Do you know what causes it? I am running PyTorch 1.12 with CUDA 11.7 and GCC 9, and also tested PyTorch 1.11 with CUDA 11.3. Is there a Docker image I can run directly with the GUI?

python visualizer.py 
Loading "https://api.ngc.nvidia.com/v2/models/nvidia/research/eg3d/versions/1/files/afhqcats512-128.pkl"... Done.
Setting up PyTorch plugin "bias_act_plugin"... Failed!

Traceback (most recent call last):
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cairs/Desktop/code/eg3d/eg3d/viz/renderer.py", line 143, in render
    self._render_impl(res, **args)
  File "/home/cairs/Desktop/code/eg3d/eg3d/viz/renderer.py", line 324, in _render_impl
    all_ws = G.mapping(z=all_zs, c=all_cs, truncation_psi=trunc_psi, truncation_cutoff=trunc_cutoff) - w_avg
  File "<string>", line 41, in mapping
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "<string>", line 246, in forward
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "<string>", line 124, in forward
  File "/home/cairs/Desktop/code/eg3d/eg3d/torch_utils/ops/bias_act.py", line 86, in bias_act
    if impl == 'cuda' and x.device.type == 'cuda' and _init():
  File "/home/cairs/Desktop/code/eg3d/eg3d/torch_utils/ops/bias_act.py", line 43, in _init
    _plugin = custom_ops.get_plugin(
  File "/home/cairs/Desktop/code/eg3d/eg3d/torch_utils/custom_ops.py", line 138, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, build_directory=cached_build_dir,
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'bias_act_plugin': [1/2] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/TH -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/THC -isystem /home/cairs/anaconda3/envs/eg3d/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/cairs/.cache/torch_extensions/py39_cu116/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-geforce-rtx-3090/bias_act.cu -o bias_act.cuda.o 
FAILED: bias_act.cuda.o 
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/TH -isystem /home/cairs/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/THC -isystem /home/cairs/anaconda3/envs/eg3d/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/cairs/.cache/torch_extensions/py39_cu116/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-geforce-rtx-3090/bias_act.cu -o bias_act.cuda.o 
/usr/include/c++/11/type_traits(1406): error: type name is not allowed

/usr/include/c++/11/type_traits(1406): error: type name is not allowed

/usr/include/c++/11/type_traits(1406): error: identifier "__is_same" is undefined

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long, std::is_same<int, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long, _Ret=int, _CharT=char, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6620): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long, std::is_same<long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long, _Ret=long, _CharT=char, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6625): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const unsigned long, std::is_same<unsigned long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=unsigned long, _Ret=unsigned long, _CharT=char, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6630): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long long, std::is_same<long long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long long, _Ret=long long, _CharT=char, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6635): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const unsigned long long, std::is_same<unsigned long long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=unsigned long long, _Ret=unsigned long long, _CharT=char, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6640): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const float, std::is_same<float, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=float, _Ret=float, _CharT=char, _Base=<>]" 
/usr/include/c++/11/bits/basic_string.h(6646): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const double, std::is_same<double, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=double, _Ret=double, _CharT=char, _Base=<>]" 
/usr/include/c++/11/bits/basic_string.h(6650): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long double, std::is_same<long double, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long double, _Ret=long double, _CharT=char, _Base=<>]" 
/usr/include/c++/11/bits/basic_string.h(6654): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long, std::is_same<int, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long, _Ret=int, _CharT=wchar_t, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6751): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long, std::is_same<long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long, _Ret=long, _CharT=wchar_t, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6756): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const unsigned long, std::is_same<unsigned long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=unsigned long, _Ret=unsigned long, _CharT=wchar_t, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6761): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long long, std::is_same<long long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long long, _Ret=long long, _CharT=wchar_t, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6766): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const unsigned long long, std::is_same<unsigned long long, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=unsigned long long, _Ret=unsigned long long, _CharT=wchar_t, _Base=<int>]" 
/usr/include/c++/11/bits/basic_string.h(6771): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const float, std::is_same<float, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=float, _Ret=float, _CharT=wchar_t, _Base=<>]" 
/usr/include/c++/11/bits/basic_string.h(6777): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const double, std::is_same<double, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=double, _Ret=double, _CharT=wchar_t, _Base=<>]" 
/usr/include/c++/11/bits/basic_string.h(6781): here

/usr/include/c++/11/ext/string_conversions.h(85): error: no instance of overloaded function "_Range_chk::_S_chk" matches the argument list
            argument types are: (const long double, std::is_same<long double, int>)
          detected during instantiation of "_Ret __gnu_cxx::__stoa(_TRet (*)(const _CharT *, _CharT **, _Base...), const char *, const _CharT *, std::size_t *, _Base...) [with _TRet=long double, _Ret=long double, _CharT=wchar_t, _Base=<>]" 
/usr/include/c++/11/bits/basic_string.h(6785): here

19 errors detected in the compilation of "/home/cairs/.cache/torch_extensions/py39_cu116/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-geforce-rtx-3090/bias_act.cu".
ninja: build stopped: subcommand failed.

Question about the FID of FFHQ dataset

Thank you for sharing this cool work!

I'd like to ask a question regarding the FID of the FFHQ dataset.
I noticed that this work uses different alignment and cropping for the FFHQ dataset. I wonder how this affects the performance of the model.
If one uses the alignment option from the original download script in the FFHQ repository and resizes the images, how much FID difference will it make?

Noisy geometry

Hello,

I load the .ply file as a mesh in a mesh viewer and I get some very noisy geometry of the face. In your paper, the geometry/shape results are not this noisy/spiky! The geometry was extracted with a voxel resolution of 512 and a sampling multiplier of 4.
[screenshot: noisy mesh]

Any suggestions on how to avoid this?

Question about the camera extrinsic matrix

Hi~
I found that the camera extrinsic matrix in dataset.json is a little strange.
Since, in the world coordinate system of eg3d, the y-axis points out of the screen and the z-axis points up, why is trans_y close to 0 and trans_z close to 2.6 for most faces? In my understanding, (trans_x, trans_y, trans_z) is the position of the camera in the world coordinate system. In that case, the camera would be above the head in most cases.

Issue with runme.py -- permission denied

Hello,

I'm having issues downloading the FFHQ "in-the-wild" images with runme.py. It seems like permission is denied for some content:

[screenshot of the permission-denied error]

Does anyone know how to deal with this? Thank you in advance!

Hope that this will be setup on Colab

It would be awesome if the inversion demo could be set up on Colab.
That way we could also try making 3D avatars of ourselves with just a single input image.

Training with sampled camera poses

Thank you for releasing the code of EG3D!

It seems that EG3D requires an off-the-shelf pose detector for the images. I am wondering if it is possible to train EG3D with sampled camera poses (as in pi-GAN). It would be nice if you could share some thoughts on this.

Cheers,
Wonbong

Are you able to release the code for generating camera poses?

Hi, thanks so much for releasing the amazing work.

However, the provided datasets are too large to download, and it seems that the poses were pre-computed.
I would like to run the code to extract the poses from the original FFHQ datasets.
I understand you refer to Deep3DFaceRecon_pytorch for that; however, do you have any instructions for obtaining the camera poses as defined in your codebase?

Thanks

Ask for ffhq-64.pkl

Hi! I noticed that you finetuned ffhq512-128.pkl and ffhqrebalanced512-64.pkl from ffhq-64.pkl, which is not included in the NGC Catalog.

Maybe the ffhq-64.pkl is a good starting point for finetuning?

Could you please upload ffhq-64.pkl?

Thanks!

How to generate mesh with texture

Thank you for your great work!
I am wondering how to generate a mesh with texture. Currently, I try to project the resulting mesh vertices, given the default camera parameters, back onto the generated image and obtain the vertices' colors from the image. However, something seems wrong with the projection. For example, in Blender, if I set the camera parameters the same as in eg3d's demo and import the generated mesh, the visible range of the mesh from the camera's perspective is inconsistent with the generated image. How can I fix this?
Also, is there a more elegant way to generate a mesh with texture in eg3d?

Issues with GLFW

Whenever I try to use the visualizer.py program, I stumble upon these errors:

/root/miniconda3/envs/eg3d/lib/python3.9/site-packages/glfw/__init__.py:834: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
  warnings.warn(message, GLFWError)
/root/miniconda3/envs/eg3d/lib/python3.9/site-packages/glfw/__init__.py:834: GLFWError: (65537) b'The GLFW library is not initialized'
  warnings.warn(message, GLFWError)
python3: /builds/florianrhiem/pyGLFW/glfw-3.3.4/src/input.c:832: glfwSetKeyCallback: Assertion `window != ((void *)0)' failed.
Aborted (core dumped)

I installed all the requirements specified in the environment.yml file, and I also tried some other versions of GLFW, but it didn't work (tested for redundancy). I have no experience with GLFW; could someone help me with this?

Custom StyleGAN3 model

I'm training a custom StyleGAN3 model. Will it work fine in eg3d, or will it need to be retrained?
Thanks

Details in EG3D Inversion

I released my EG3D inversion code for your reference, you can find it here: EG3D-projector.


Thanks for the impressive work!

As you mentioned in the paper, you use Pivotal Tuning Inversion to invert test images. The PTI finetunes the EG3D parameters based on the pivot latent code, which is obtained by optimization.
The pivot latent code is "w" or "w+"; however, it is correlated with the camera parameters that are fed to the mapping network. Will novel-view synthesis be affected by this camera-fixed latent code?

I also noticed that you set a hyper-parameter entangle = 'camera' in gen_videos.py; it seems that you considered this issue when rendering different views for a specific latent code. I tried 'condition' and 'both': the camera parameters input to the mapping network only control some unrelated semantic attributes (expression, clothes, ...).
I think the [zs, c] fed to the mapping network can be regarded as a latent code with shape [1, 512+25]; does c influence the camera view of the subsequent synthesis?

Now I have reproduced the PTI inversion of EG3D. Please see the video below. I input the re-aligned 00000.png and its camera parameters (from the dataset.json), then I optimize the latent code 'w' and use it as the pivot to finetune eg3d.

The result looks a little strange. I want to know if my implementation is consistent with yours!


I think I've figured out why the camera parameters that are input to the mapping network can't control the camera view, please refer to Generator Pose Conditioning.

00000_w.mp4

How to upsample

Hello,
I tried to use a dataset with 128x128 resolution to train my own model.
I have a question about how to upsample the result during training; right now I only have a model producing 128-resolution results.
Should I do it manually via the command line, or is it upsampled automatically after a certain number of iterations?
Can you give me some instructions? Thank you so much :)

Cuda memory error

I train on eight 24GB TITAN RTX cards. Strangely, when I set the batch size to 16 on 4 cards, training works fine. But when I set the batch size to 32 on 8 cards, the following error is reported. Why?

Traceback (most recent call last): [285/1697]
File "python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "eg3d/train.py", line 52, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "eg3d/training/training_loop.py", line 285, in training_loop
loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
File "eg3d/training/loss.py", line 121, in accumulate_gradients
gen_img, _gen_ws = self.run_G(gen_z, gen_c, swapping_prob=swapping_prob, neural_rendering_resolution=neural_rendering_resolution)
File "eg3d/training/loss.py", line 70, in run_G
gen_output = self.G.synthesis(ws, c, neural_rendering_resolution=neural_rendering_resolution, update_emas=update_emas)
File "eg3d/training/triplane.py", line 89, in synthesis
sr_image = self.superresolution(rgb_image, feature_image, ws, noise_mode=self.rendering_kwargs['superresolution_noise_mode'], **{k:synthesis_kwargs[k]
for k in synthesis_kwargs.keys() if k != 'noise_mode'})
File "python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "eg3d/training/superresolution.py", line 289, in forward
x, rgb = self.block1(x, rgb, ws, **block_kwargs)
File "python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "eg3d/training/networks_stylegan2.py", line 448, in forward
x = self.conv1(x, next(w_iter), fused_modconv=fused_modconv, **layer_kwargs)
File "python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "eg3d/training/networks_stylegan2.py", line 329, in forward
x = bias_act.bias_act(x, self.bias.to(x.dtype), act=self.activation, gain=act_gain, clamp=act_clamp)
File "eg3d/torch_utils/ops/bias_act.py", line 87, in bias_act
return _bias_act_cuda(dim=dim, act=act, alpha=alpha, gain=gain, clamp=clamp).apply(x, b)
File "eg3d/torch_utils/ops/bias_act.py", line 152, in forward
y = _plugin.bias_act(x, b, _null_tensor, _null_tensor, _null_tensor, 0, dim, spec.cuda_idx, alpha, gain, clamp)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.70 GiB total capacity; 7.11 GiB already allocated; 21.81 MiB free; 7.83 GiB rese
rved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Ma
nagement and PYTORCH_CUDA_ALLOC_CONF

Antialiased interpolation

Thanks for providing such impressive work!

I noticed that, in the repo, all the torch.nn.functional.interpolate calls use the argument antialias=True.
I wonder how much this setting influences the final results compared to using the normal upsampling call torch.nn.functional.interpolate(image_orig_tensor, size=(size, size), mode='bilinear', align_corners=False).

Thanks in advance!

ffhq data download

Thank you for your excellent work!
I'm reproducing the ffhq experiment,but in-the-wild-images is too large,and i can't understand the code in 'runme.py'.Could you please provide the zip of ffhq that eg3d can use directly?

Flickering in generated videos

Good work!
I tried running gen_videos.py to generate videos, and it seems that the interpolation in the results is not smooth, with some flickering. However, the flickering is not apparent in the demo videos on your project page.

Could you tell me why my results are not smooth?

interpolation.mp4

How much GPU Memory required?

Hi, I am trying to train this model on 12GB GPUs with "--gpus=8 --batch=8 --mbstd-group=1". But it gets stuck, and I noticed the memory on the first GPU was exhausted at some point. So I wonder:

  1. How much memory does it require to train eg3d with arguments "--gpus=8 --batch=32"?
  2. Are 16GB V100s okay for training eg3d?

Thank you!

Custom input image?

Hi, thanks for the great paper and open-source code. Can I use my own face image to change the face pose?

Extract exact yaw and pitch angles in degree?

Hi :-)

First of all, thanks for the amazing work!

I have a question regarding the yaw and pitch angles: is it possible to extract the exact angles (in degrees) for a generated face image?

How to fix the sunken face or hairs?

Hi, I tried to implement eg3d myself, and I find that some samples have sunken faces or hair. Why does this happen, and how can I solve it? The resolution is 128x128.

[example GIFs: 9_2, 9_3, 12_8]

about face alignment

Hi! Thanks for sharing your awesome work!

I created a projection module to project this image [input image: test] into this video, but when the alignment is a little bit wrong, the result is terrible.

[result images: yeah2, test_sr_only]

I think it is an alignment problem, or the extrinsic/intrinsic extraction code is the problem.

I have two questions:

  1. Did you align the face again after extracting the extrinsic parameters to get a correct head position?
  2. How do you extract the extrinsic/intrinsic parameters from this repo? I cannot find anything about it.

Thanks.

Type of Cameras for Pytorch3D rendering

Hello!

I am saving the camera parameters during image synthesis, and I reconstruct the meshes using the '.ply' data and PyTorch3D. However, I cannot get a good rendering because of issues with the camera pose. The exact problem is that I am not sure what type of camera you are using for your rendering. Should I use a perspective or orthographic camera?

How to pass an input?

Hi, I downloaded ffhqrebalanced512-128.pkl and then ran the command suggested in the README:

python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=2x2 \
    --network=ffhqrebalanced512-128.pkl

The output is a black video with some red lines around the edges that flicker a bit.

I'm wondering what the expected output is. Shouldn't I be passing some input image to be transformed? There is the .pkl file, but I thought that is just a model trained on hundreds of input faces. So shouldn't it be input + model => output?

About ffhq dataset download

Excuse me, I ran into a problem during data preparation because the FFHQ dataset is very large to download. I directly downloaded thumbnails128x128 from Google Drive, but it also seems to need JSON files when it runs. I downloaded the JSON as well, but I don't know how to merge the thumbnails and the JSON. Could you give me some suggestions or tell me how to obtain the data correctly? Looking forward to your reply.
