
CIPS-3D

This repository contains the code of the paper CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis. For 3D GAN inversion and editing of real images, please refer to this repository: https://github.com/PeterouZh/CIPS-3Dplusplus (CIPS-3D++).

Updates

✔️ (2022-1-5) The code has been refactored. Please refer to the scripts in exp/cips3d/bash. Please upgrade the tl2 package with pip install -I tl2.

✔️ (2021-11-26) The configuration files (yaml files) for training are being released.

✔️ (2021-10-27) All the code files have been released. The configuration files (yaml files) for training will be released next. For now, I have provided a GUI script and models to make it easy to experiment with network interpolation (see below). If you find any problems, please open an issue. Have fun with it.

✔️ (2021-10-25) Thank you for your kind attention. The repository has reached two hundred GitHub stars, so I will open-source the training code in the near future.

✔️ (2021-10-20) We are planning to publish the training code here in December, but if the repository reaches two hundred GitHub stars, I will move the date up. Stay tuned 🕙.

Demo videos

demo1.mp4
demo2.mp4
demo_animal_finetuned.mp4
demo3.mp4
demo4.mp4
demo5.mp4
Mirror symmetry problem

The mirror-symmetry problem refers to the sudden flip in the direction of the bangs near a yaw angle of pi/2. We propose an auxiliary discriminator to solve this problem (please see the paper).

Note that in the early stage of training, the auxiliary discriminator must influence the generator more strongly than the main discriminator does; if the main discriminator dominates the generator instead, the mirror-symmetry problem still occurs. In practice, progressive training guarantees this. We have trained from scratch many times, and adding the auxiliary discriminator reliably eliminates the mirror-symmetry problem. If you find any problems with this idea, please open an issue.
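To make the idea concrete, here is a minimal PyTorch-style sketch of how the two adversarial terms could be combined. It is not the repository's training code; the module names (g, d_main, d_aux) and the weighting schedule are illustrative assumptions.

import torch.nn.functional as F

def generator_loss(g, d_main, d_aux, z, step, warmup_steps=10_000):
    """Combine adversarial losses from the main and auxiliary discriminators.

    Early in training (step < warmup_steps) the auxiliary term is weighted
    heavily so that the auxiliary discriminator, not the main one, dominates
    the generator; otherwise the mirror-symmetry artifact can reappear.
    """
    fake = g(z)
    loss_main = F.softplus(-d_main(fake)).mean()  # non-saturating GAN loss
    loss_aux = F.softplus(-d_aux(fake)).mean()

    # Simple linear schedule (an assumption): start with a large auxiliary
    # weight and decay it toward 1 as training progresses.
    aux_weight = max(1.0, 10.0 * (1.0 - step / warmup_steps))
    return loss_main + aux_weight * loss_aux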

Prepare environment
git clone --recursive https://github.com/PeterouZh/CIPS-3D.git
cd CIPS-3D

# Create virtual environment
conda create -y --name cips3d python=3.6.7
conda activate cips3d

pip install torch==1.8.2+cu102 torchvision==0.9.2+cu102 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html

pip install --no-cache-dir -r requirements.txt
pip install -I tl2

pip install -e torch_fidelity_lib
pip install -e pytorch_ema_lib

Model interpolation (web demo)

Download the pre-trained checkpoints.

Execute this command:

streamlit run --server.port 8650 -- scripts/web_demo.py  \
  --outdir results/model_interpolation \
  --cfg_file configs/web_demo.yaml \
  --command model_interpolation

Then open the browser: http://your_ip_address:8650.

You can debug this script with this command:

python scripts/web_demo.py  \
  --outdir results/model_interpolation \
  --cfg_file configs/web_demo.yaml \
  --command model_interpolation \
  --debug True
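Under the hood, "model interpolation" (network interpolation) blends the weights of two generators trained on different domains, e.g. FFHQ and CartoonFaces, so that the mixed generator G' produces intermediate styles. Below is a minimal sketch of the idea, assuming the checkpoints store plain state dicts for generators with matching architectures; the file paths are placeholders, and the web demo exposes this interactively.

import copy
import torch

def interpolate_generators(state_a, state_b, alpha=0.5):
    """Linearly interpolate two generator state dicts: (1 - alpha) * A + alpha * B."""
    mixed = copy.deepcopy(state_a)
    for k in mixed:
        if k in state_b and torch.is_tensor(mixed[k]) and mixed[k].is_floating_point():
            mixed[k] = (1.0 - alpha) * state_a[k] + alpha * state_b[k]
    return mixed

# Usage (hypothetical checkpoint paths):
# G_ffhq = torch.load("ckpts/ffhq/G_ema.pth", map_location="cpu")
# G_cartoon = torch.load("ckpts/photo2cartoon/G_ema.pth", map_location="cpu")
# G_mixed = interpolate_generators(G_ffhq, G_cartoon, alpha=0.7)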

Pre-trained checkpoints

ffhq_exp

  • FFHQ_r256: train_ffhq_high-20220105_143314_190
  • AFHQ_r256: finetune_afhq-20220124_193407_473
  • CartoonFaces_r256: finetune_photo2cartoon-20220107_172255_454
Prepare dataset

FFHQ: Download FFHQ dataset images1024x1024 (89.1 GB)

# Downsampling images in advance to speed up training
python scripts/dataset_tool.py \
    --source=datasets/ffhq/images1024x1024 \
    --dest=datasets/ffhq/downsample_ffhq_256x256.zip \
    --width=256 --height=256

CartoonFaces: Download the photo2cartoon dataset

# Prepare training dataset.
python scripts/dataset_tool.py \
    --source=datasets/photo2cartoon/photo2cartoon \
    --dest=datasets/photo2cartoon/photo2cartoon_stylegan2.zip 
    

AFHQ: Download the AFHQ dataset

# Prepare training dataset.
python scripts/dataset_tool.py \
    --source=datasets/AFHQv2/AFHQv2 \
    --dest=datasets/AFHQv2/AFHQv2_stylegan2.zip 
    

Training

Please refer to the scripts in exp/cips3d/bash. I will release all the pre-trained models once reproduction is finished.

Running order:

  • exp/cips3d/bash/ffhq_exp:

    • train_ffhq_r32.sh -> train_ffhq_r64.sh -> train_ffhq_r128.sh -> train_ffhq_r256.sh
    • eval_fid.sh
  • exp/cips3d/bash/finetuning_exp: (requires the pre-trained models from the step above)

    • finetune_photo2cartoon.sh

Developing:

  • exp/cips3d/bash/ffhq_exp_v1:

  • exp/cips3d/bash/afhq_exp:

Bug fixes

  • If training hangs when using multiple GPUs, please upgrade tl2 via pip install -I tl2
Old readme

Note:

  • To ensure that this code is consistent with my original (messy) code, please follow the steps below to reproduce the results step by step.
  • The training script train_v16.py is messy, but I am not going to refactor it; after all, it still works stably.

Start training at 64x64

Training:

export CUDA_HOME=/usr/local/cuda-10.2/
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=.
python exp/dev/nerf_inr/scripts/train_v16.py \
    --port 8888 \
    --tl_config_file configs/train_ffhq.yaml \
    --tl_command train_ffhq \
    --tl_outdir results/train_ffhq \
    --tl_opts curriculum.new_attrs.image_list_file datasets/ffhq/images256x256_image_list.txt \
      D_first_layer_warmup True

Dummy training (for debug):

export CUDA_HOME=/usr/local/cuda-10.2/
export CUDA_VISIBLE_DEVICES=1
python exp/dev/nerf_inr/scripts/train_v16.py \
    --port 8888 \
    --tl_config_file configs/train_ffhq.yaml \
    --tl_command train_ffhq \
    --tl_outdir results/train_ffhq_debug \
    --tl_debug \
    --tl_opts curriculum.new_attrs.image_list_file datasets/ffhq/images256x256_image_list.txt \
      num_workers 0 num_images_real_eval 10 num_images_gen_eval 2 

When the FID of the 64x64 model reaches about 16, we move on to the next step: resuming training at 128x128. Let's wait for the training to finish (about two days or less).

Reproduced results: best_FID=15.27
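For reference, FID between a folder of generated samples and the real images can be computed with the torch-fidelity package (this repository installs a patched copy, torch_fidelity_lib). The snippet below is only a sketch using the upstream torch-fidelity API with placeholder paths; the actual evaluation here is driven by the training scripts and eval_fid.sh.

import torch_fidelity

# Placeholder paths: a folder of generated samples vs. a folder of real images.
metrics = torch_fidelity.calculate_metrics(
    input1="results/train_ffhq/generated_samples",
    input2="datasets/ffhq/images256x256",
    cuda=True,
    fid=True,
)
print(metrics["frechet_inception_distance"])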

Resume training at 128x128 from the 64x64 models

Training:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=.
python exp/dev/nerf_inr/scripts/train_v16.py \
    --port 8888 \
    --tl_config_file configs/train_ffhq.yaml \
    --tl_command train_ffhq_r128 \
    --tl_outdir results/train_ffhq \
    --tl_resume \
    --tl_resumedir results/train_ffhq \
    --tl_opts curriculum.new_attrs.image_list_file datasets/ffhq/images256x256_image_list.txt \
      D_first_layer_warmup True reset_best_fid True update_aux_every 16 d_reg_every 1 train_aux_img True

Dummy training (for debug):

export CUDA_HOME=/usr/local/cuda-10.2/
export CUDA_VISIBLE_DEVICES=1
python exp/dev/nerf_inr/scripts/train_v16.py \
    --port 8888 \
    --tl_config_file configs/train_ffhq.yaml \
    --tl_command train_ffhq_r128 \
    --tl_outdir results/train_ffhq \
    --tl_resume \
    --tl_resumedir results/train_ffhq \
    --tl_debug \
    --tl_opts curriculum.new_attrs.image_list_file datasets/ffhq/images256x256_image_list.txt \
      num_workers 0 num_images_real_eval 10 num_images_gen_eval 2 reset_best_fid True

When the FID of the 128x128 model reaches about 16, we start the next step.

Some hyperparameters may differ from the original experiment; I hope it still works as expected. Let's wait for the training to finish (it may take longer).

Resume training at 256x256 from the 128x128 models

Finetune INR Net

Citation

If you find our work useful in your research, please cite:


@article{zhou2021CIPS3D,
  title = {{{CIPS}}-{{3D}}: A {{3D}}-{{Aware Generator}} of {{GANs Based}} on {{Conditionally}}-{{Independent Pixel Synthesis}}},
  shorttitle = {{{CIPS}}-{{3D}}},
  author = {Zhou, Peng and Xie, Lingxi and Ni, Bingbing and Tian, Qi},
  year = {2021},
  eprint = {2110.09788},
  eprinttype = {arxiv},
}

Acknowledgments


Issues

The quality of generated images for FFHQ

Hello,

Thanks for sharing your source code and pre-trained weights. I am trying to generate high-quality images from the FFHQ pre-trained model. However, the quality of the generated images is not as good as stated in the paper, and I could not reproduce the results.

I am using the pre-trained weights from here https://github.com/PeterouZh/CIPS-3D/releases/tag/v0.0.2

The command I tried:
python exp/cips3d/scripts/sample_images.py --tl_config_file exp/cips3d/configs/ffhq_exp.yaml --tl_command sample_images

Generated images: (attached samples)

Do you have any idea regarding the problem?

Fine-tuning FFHQ curriculum

Can I have the .yaml config file for fine-tuning FFHQ?
I'm using your exp/cips3d/configs/finetune_exp.yaml (for AFHQ), and after 7 hours the FID is around 145 :(((.

A few questions

Dear Dr. Zhou,
Thanks for sharing your great work, and congratulations on finishing your Ph.D.! I have a few questions and hope for your reply.

1. I found a command in another issue (#31): python exp/cips3d/scripts/sample_images.py --tl_config_file exp/cips3d/configs/ffhq_exp.yaml --tl_command sample_images. However, I can't find those arguments in sample_images.py and am confused about how they are meant to be used. I also found some packages imported from the tl2 library but failed to find any documentation. Are there any instructions I am missing beyond the README?
2. I see two generator files in /CIPS-3D/exp/cips3d/models, generator.py and generator_v1.py. Which one should I use?
3. Which class in the generator files is the complete generator module? I want to run some inversion tests and am not sure whether it is GeneratorNerfINR. Also, are G_ema.pth and generator.pth in the checkpoint the corresponding generator parameters that I can load directly?
4. What is state_dict.pth in the checkpoint used for?

By the way, I think using Chinese would be more convenient for us. Thanks!

Question about the input of shallow nerf network

I know NeRF is a view-dependent synthesis method because of its direction input.
However, in your code I find that you don't use it. Why does CIPS-3D still work? Can novel views be synthesized with only the world coordinates as input? Why?

configuration environment issues

Hi, good job!

I have a problem, please help me.

pip install -e torch_fidelity_lib
ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode: /media/sdb/wd/test_code/CIPS-3D/torch_fidelity_lib

Question about camera postion

In the function sample_camera_positions, you generate a random camera position, but I am very confused about the last three lines.

output_points = torch.zeros((bs, 3), device=device) # (bs, 3)
output_points[:, 0:1] = r * torch.sin(phi) * torch.cos(theta) # x
output_points[:, 2:3] = r * torch.sin(phi) * torch.sin(theta) # z
output_points[:, 1:2] = r * torch.cos(phi) # y

I don't know what these lines mean; please help me.
I guess they perform a 3D rotation, but I am not sure.

Problem about reproducing the results

Hi, PeterouZh,

I'm reproducing your results at the same pace as you. Honestly speaking, this model takes about 40 hours to reach FID 15.97 at 64x64 with 8 A100 GPUs. When I change the resolution to 128x128, the FID rises to 23.58; I'm still training it, and it has only reached FID 20.03 so far.

How can this model reach the FID of 6.xx described in the paper? Are we missing something important? It looks like this model can only reach an FID above 10 at 256 resolution, because performance improves very slowly once the FID reaches 16 at 64x64.

By the way, I tried to reproduce your results a few weeks ago but ran into problems with moxing. Does moxing provide important tricks for this work?

Can I have the pre-trained discriminators file

Hi Zhou,
Thank you for reading my message. I am very impressed by your CIPS-3D work, and I would like to fine-tune it on my own dataset.
Could I have the files "results/CIPS-3D/ffhq_exp/train_ffhq_high-20220105_143314_190/ckptdir/best_fid"?
Best,
Guoxian

Can the pretrained model be used in finetune_photo2cartoon.sh?

I load the FFHQ pretrained model from the Pre-trained checkpoints section and set finetune_dir to that checkpoint in finetune_photo2cartoon.sh, but it does not seem to work. Can the pre-trained model be used in finetune_photo2cartoon.sh?


License

Please add a license file.

> I want to test some other images on your model, but I don't know how to do it. If I have an image sequence with pose data, how do I test it?

  1. Align the images in the way StyleGAN does. You can refer to this script: align_images.py.
  2. Project the aligned images into the W space, also known as GAN inversion (a sketch of this step is given after this answer). Unlike common 2D inversion, you should set an appropriate yaw/pitch/fov for the CIPS-3D generator so that the initial pose of G(w) is consistent with the image being inverted.
  3. After you get the w of the image, you can reconstruct images of different styles using G'(w). G' can be obtained by interpolating generators of different domains.

Hope this helps.

Originally posted by @PeterouZh in #7 (comment)
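A minimal sketch of the projection step (step 2 above): the generator interface used here (g.mapping, g.synthesis(w, yaw=..., pitch=...)) and the plain MSE objective are illustrative assumptions, not the repository's actual API; in practice a perceptual loss is usually added.

import torch
import torch.nn.functional as F

def project_to_w(g, target, yaw=0.0, pitch=0.0, steps=500, lr=0.01, device="cuda"):
    """Optimize a latent w so that G(w), rendered at a fixed pose, matches `target`."""
    g = g.to(device).eval()
    target = target.to(device)

    # Initialize w at the average latent, a common inversion heuristic.
    with torch.no_grad():
        z = torch.randn(10_000, g.z_dim, device=device)
        w_avg = g.mapping(z).mean(dim=0, keepdim=True)
    w = w_avg.clone().requires_grad_(True)

    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = g.synthesis(w, yaw=yaw, pitch=pitch)  # render at the chosen pose
        loss = F.mse_loss(img, target)              # add a perceptual loss in practice
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()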

Data format for training from scratch

Hi, I'm quite interested in the paper. I would like to know more about the data format for training the model from scratch. Do the images need to be different views of the same person from different cameras, or are random 2D images with no yaw/pitch information enough? Thank you.

How to add my own image to generate

Hello, the environment is set up, but I can only generate images with the default materials. I searched for a long time but couldn't find how to add my own image for generation. From a previous question it seems custom images were not supported before; are they supported now?

Two problems when training the model to generate 128x128

When I train the model to generate 64x64, everything is normal. However, when I train at 128x128, I get a weird result from the auxiliary output.


The first five rows show the output of the INR network, and the last five rows show the output of the auxiliary RGB layer. You can see that the result is abnormal, whereas at 64x64 the last five rows are also normal.
So I am wondering whether the author also encountered this problem.
By the way, I guess that when the resolution is increased, the discriminator also inserts some new layers before the original discriminator. I am not sure this can work when randomly-initialized layers are inserted before a pretrained network.

so I have two questions.

  1. Is the above result of the auxiliary RGB layer normal?
  2. Is it reasonable to insert new layers into a pretrained model?

Thanks.

How to train on my own dataset?

Hi, thanks for open-sourcing this awesome work. I would like to train the model on my own dataset. So far, I have pre-processed all images to 256x256 using scripts/dataset_tool.py. Here are the issues I ran into when trying to train on my own images:

  • How do I generate the image list? I used the following command to generate a list, but I am not sure whether it is correct; I did not actually see the datasets/ffhq/ffhq_256.txt file when training on the FFHQ dataset.
      python3 -m tl2.tools.get_data_list --source_dir datasets/my_images/downsample_ffhq_256x256/ --outfile datasets/my_images/ffhq_256.txt  --ext *.png
    
  • How do I change the yaml file ffhq_exp.yaml to point to my own dataset directory?
  • How do I pass hyperparameters to the model? I tried the training command from the old readme:
    export CUDA_HOME=/usr/local/cuda-10.2/
    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    export PYTHONPATH=.
    python exp/dev/nerf_inr/scripts/train_v16.py \
        --port 8888 \
        --tl_config_file configs/train_ffhq.yaml \
        --tl_command train_ffhq \
        --tl_outdir results/train_ffhq \
        --tl_opts curriculum.new_attrs.image_list_file datasets/ffhq/images256x256_image_list.txt \
          D_first_layer_warmup True
    
    But I'm not sure how to train on 32x32 images (I'd like a quick tryout) or how to change the batch_size, etc. I looked into the tl2 library but failed to find any documentation.
    Thanks for your time and any help would be appreciated!

add web demo/model to Huggingface

Hi, would you be interested in adding CIPS-3D to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

Can I put my face photo into your pre-trained web demo to generate a 3D video?

Hello, thank you for your contribution. I tried to run your web demo. I saw you say, "Thus current stylization is limited to randomly generated images. To edit a real image, we need to project the image to the latent space of the generator." So I can't import other face images to produce the effect shown in the demo videos? Thank you.

Why not train from scratch?

Hello, thank you for the open-source code.

In the README you explain that the training pipeline for high resolutions is 32 -> 64 -> 128 -> 256, where each stage is fine-tuned from the model obtained at the previous resolution.
This training strategy is certainly much easier than training directly, but have you tried training at 256 resolution directly? Could similar results be obtained by adjusting the training parameters?

CUDA error: out of memory

Hi,
I get a CUDA error: out of memory (even with batch size = 1) when I try to run the training script with this command:
CUDA_VISIBLE_DEVICES=2 python -c "import sys; sys.path.append('./'); from exp.tests.test_cips3d import Testing_ffhq_exp; Testing_ffhq_exp().test_train_ffhq(debug=False)" --tl_opts batch_size 1 img_size 32 total_iters 80000

I am running on a V100 GPU with 32 GB of memory. What should I do?
By the way, I really appreciate your work; it is a great paper. 👏


closed

Hi,

Thanks for the great work. I am trying to invert an image into w/z using the pretrained model. Would you release the pretrained discriminator to enable the inversion feature? Thanks.

Where to find full pretrained models?

The pre-trained model files only contain generator.pth, ema.pth, and ema2.pth files. Where can I find the other files in order to do stylization and transfer learning from CelebA onto another dataset?

Quantitative evaluation

Do you have quantitative evaluations of the model on some other datasets (compcar or afhq)? If so, how do they compare to StyleNeRF results?

How can I get an image resolution greater than 256?

Hi! You did a great job; thanks for such a great paper and for promptly publishing the CIPS-3D code.

I've already gotten good results with your pipeline, but only for images at 64x64 resolution. Now I'm waiting for the results of generating images at 128x128, and I will train further for higher-resolution images.

Do I understand correctly that in order to get 512x512 images, I need to convert the original FFHQ dataset again with your dataset_tool.py script, specifying a resize to 512? And after that, do I need to run the training pipeline with lower learning rates for the generator and discriminator?

Thanks!

Some details about Shallow 3D NeRF Network

In the second block of the shallow 3D NeRF network (Fig. 2), the input dim is 64, while the dim of the latent code from the mapping network is 128. How is the latent code from the mapping network converted from 128 to 64?

An error occurred while reproducing


I am very interested in your project, but an error occurred during reproduction. Why can't I open this website? Please give me some advice. Thank you!
