ha0tang / c2gan



[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Home Page: http://disi.unitn.it/~hao.tang/project/C2GAN.html

License: Other

Python 64.49% Shell 0.96% MATLAB 34.56%

image-translation keypoints landmarks facial-landmarks gans image-generation image-to-image-translation

c2gan's Introduction

Cycle-In-Cycle GANs
Installation
Dataset Preparation
Generating Images Using Pretrained Model
Train and Test New Models
Acknowledgments
Related Projects
Citation
Contributions

Cycle-In-Cycle GANs

| Conference Paper | Extended Paper | Project |
Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation
Hao Tang¹, Dan Xu², Gaowen Liu³, Wei Wang⁴, Nicu Sebe¹ and Yan Yan³
¹University of Trento, ²University of Oxford, ³Texas State University, ⁴EPFL
The repository offers the official implementation of our paper in PyTorch.

In the meantime, check out our related BMVC 2020 oral paper Bipartite Graph Reasoning GANs for Person Image Generation, ECCV 2020 paper XingGAN for Person Image Generation, and ICCV 2021 paper Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer.

C2GAN Framework

License

The code is released for academic research use only. For commercial use, please contact [email protected].

Installation

Clone this repo.

git clone https://github.com/Ha0Tang/C2GAN
cd C2GAN/

This code requires PyTorch 0.4.1+ and Python 3.6.9+. Please install the dependencies with:

pip install -r requirements.txt (for pip users)

./scripts/conda_deps.sh (for Conda users)

To reproduce the results reported in the paper, you will need an NVIDIA TITAN Xp GPU.
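Before installing anything else, it can help to confirm the stated minimum versions (PyTorch 0.4.1+, Python 3.6.9+). This is a minimal sketch: the helper names `parse_version` and `at_least` are hypothetical, and the parser only handles dotted numbers with an optional local suffix such as `+cu117`; it is not a full PEP 440 implementation.

```python
import sys

def parse_version(text):
    """Turn '1.13.1+cu117' or '0.4.1' into a tuple of ints, e.g. (1, 13, 1)."""
    core = text.split("+")[0]            # drop a local build suffix like '+cu117'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:                 # keep leading digits only ('0rc1' -> '0')
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def at_least(found, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

if __name__ == "__main__":
    py = "%d.%d.%d" % sys.version_info[:3]
    print("Python", py, "OK" if at_least(py, "3.6.9") else "too old")
    try:
        import torch
        print("PyTorch", torch.__version__,
              "OK" if at_least(torch.__version__, "0.4.1") else "too old")
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch is not installed")
```

Comparing integer tuples rather than raw strings avoids the classic trap where "3.10" sorts before "3.6" lexicographically.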

Dataset Preparation

For your convenience, we provide a download script:

bash ./datasets/download_c2gan_dataset.sh RaFD_image_landmark

RaFD_image_landmark: 3.0 GB

Alternatively, you can use ./scripts/convert_pts_to_figure.m to convert the generated .pts landmark files into figures.
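The conversion script itself is MATLAB. As a rough Python counterpart, the sketch below assumes the landmark files follow the common iBUG-style `.pts` layout (a `version` line, an `n_points` line, then one `x y` pair per line between `{` and `}`) and stamps each point as a small white square on a black canvas. The function names, canvas size, and dot radius are illustrative assumptions. The MATLAB script may draw landmarks differently, so use its output when reproducing the paper.

```python
import numpy as np

def read_pts(text):
    """Parse an iBUG-style .pts file body into an (N, 2) float array of (x, y)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("{") + 1
    end = lines.index("}")
    pts = [tuple(float(v) for v in ln.split()[:2]) for ln in lines[start:end]]
    return np.array(pts, dtype=np.float64)

def draw_landmarks(points, height, width, radius=1):
    """Rasterize landmarks as white squares on a black single-channel image."""
    canvas = np.zeros((height, width), dtype=np.uint8)
    for x, y in np.rint(points).astype(int):
        y0, y1 = max(y - radius, 0), min(y + radius + 1, height)
        x0, x1 = max(x - radius, 0), min(x + radius + 1, width)
        if y0 < y1 and x0 < x1:          # skip points that fall fully outside
            canvas[y0:y1, x0:x1] = 255
    return canvas
```

Usage: `draw_landmarks(read_pts(open("face.pts").read()), 256, 256)`, where `face.pts` is a placeholder file name.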

After the download has finished, organize the datasets as in this folder. Please cite their paper if you use the data.
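The exact aligned format is not spelled out here. Judging from the issue discussion below (a row of `[neutral_face_A, other_expression_B, other_landmark_B, neutral_landmark_A]`, and the training panels `real_A`, `real_B`, `input_C`, `input_D`), each sample appears to be four equally sized images concatenated horizontally. This NumPy sketch builds and splits such a row; the panel order and the helper names `make_row` and `split_row` are assumptions to verify against the downloaded files.

```python
import numpy as np

def make_row(face_a, face_b, landmark_b, landmark_a):
    """Concatenate four HxWxC images side by side into one H x 4W x C row."""
    panels = [face_a, face_b, landmark_b, landmark_a]
    shape = panels[0].shape
    if any(p.shape != shape for p in panels):
        raise ValueError("all four panels must have the same shape")
    return np.concatenate(panels, axis=1)

def split_row(row):
    """Inverse of make_row: cut an H x 4W x C row into four equal-width panels."""
    if row.shape[1] % 4 != 0:
        raise ValueError("row width must be divisible by 4")
    return np.split(row, 4, axis=1)
```

Keeping the split as the exact inverse of the concatenation makes it easy to spot-check that every stored row still decodes into a matching face/landmark pair.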

Generating Images Using Pretrained Model

You need to download a pretrained model (e.g., Radboud) with the following script:

bash ./scripts/download_c2gan_model.sh Radboud

The pretrained model is saved at ./checkpoints/{name}_pretrained/latest_net_G.pth.
Then generate the results using:

python test.py --dataroot ./datasets/Radboud --name Radboud_pretrained --model c2gan --which_model_netG unet_256 --which_direction AtoB --dataset_mode aligned --norm batch --gpu_ids 0 --batch 16;

The results will be saved at ./results/. Use --results_dir {directory_path_to_save_result} to specify the results directory.

For your own experiments, you might want to specify --which_model_netG, --norm, and --no_dropout to match the generator architecture of the trained model.
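If you script several runs, building the `test.py` invocation from a dictionary keeps the generator options in sync with the trained model. This is a minimal sketch: the flag names are copied from the command above, `build_test_command` is a hypothetical helper, and boolean `True` values are emitted as bare switches (e.g. `--no_dropout`).

```python
import shlex

def build_test_command(options):
    """Turn {'flag': value} into an argv list for test.py; True -> bare switch."""
    argv = ["python", "test.py"]
    for flag, value in options.items():
        if value is False or value is None:
            continue                     # omit disabled switches and unset options
        argv.append("--" + flag)
        if value is not True:
            argv.append(str(value))
    return argv

radboud = {
    "dataroot": "./datasets/Radboud",
    "name": "Radboud_pretrained",
    "model": "c2gan",
    "which_model_netG": "unet_256",
    "which_direction": "AtoB",
    "dataset_mode": "aligned",
    "norm": "batch",
    "gpu_ids": 0,
    "batch": 16,
    "results_dir": "./results/",
}

if __name__ == "__main__":
    cmd = build_test_command(radboud)
    print(" ".join(shlex.quote(a) for a in cmd))
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing an argv list (rather than one shell string) to `subprocess.run` avoids quoting problems with paths that contain spaces.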

Train and Test New Models

Download a dataset using the previous script (e.g., Radboud).
To view training results and loss plots, run python -m visdom.server and click the URL http://localhost:8097.
Train a model:

sh ./train_c2gan.sh

To see more intermediate results, check out ./checkpoints/Radboud_c2gan/web/index.html.
Test the model:

sh ./test_c2gan.sh

The test results will be saved to an HTML file here: ./results/Radboud_c2gan/latest_test/index.html.

Acknowledgments

This source code is inspired by Pix2pix and GestureGAN.

Related Projects

BiGraphGAN | XingGAN | GestureGAN | SelectionGAN | Guided-I2I-Translation-Papers

Citation

If you use this code for your research, please cite our paper.

C2GAN

@article{tang2021total,
  title={Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes},
  author={Tang, Hao and Sebe, Nicu},
  journal={IEEE Transactions on Multimedia (TMM)},
  year={2021}
}

@inproceedings{tang2019cycleincycle,
  title={Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation},
  author={Tang, Hao and Xu, Dan and Liu, Gaowen and Wang, Wei and Sebe, Nicu and Yan, Yan},
  booktitle={ACM MM},
  year={2019}
}

If you use the original BiGraphGAN, XingGAN, GestureGAN, or SelectionGAN models, please cite the following papers:

BiGraphGAN

@article{tang2022bipartite,
  title={Bipartite Graph Reasoning GANs for Person Pose and Facial Image Synthesis},
  author={Tang, Hao and Shao, Ling and Torr, Philip HS and Sebe, Nicu},
  journal={International Journal of Computer Vision (IJCV)},
  year={2022}
}

@inproceedings{tang2020bipartite,
  title={Bipartite Graph Reasoning GANs for Person Image Generation},
  author={Tang, Hao and Bai, Song and Torr, Philip HS and Sebe, Nicu},
  booktitle={BMVC},
  year={2020}
}

XingGAN

@inproceedings{tang2020xinggan,
  title={XingGAN for Person Image Generation},
  author={Tang, Hao and Bai, Song and Zhang, Li and Torr, Philip HS and Sebe, Nicu},
  booktitle={ECCV},
  year={2020}
}

GestureGAN

@article{tang2019unified,
  title={Unified Generative Adversarial Networks for Controllable Image-to-Image Translation},
  author={Tang, Hao and Liu, Hong and Sebe, Nicu},
  journal={IEEE Transactions on Image Processing (TIP)},
  year={2020}
}

@inproceedings{tang2018gesturegan,
  title={GestureGAN for Hand Gesture-to-Gesture Translation in the Wild},
  author={Tang, Hao and Wang, Wei and Xu, Dan and Yan, Yan and Sebe, Nicu},
  booktitle={ACM MM},
  year={2018}
}

SelectionGAN

@article{tang2022multi,
  title={Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation},
  author={Tang, Hao and Torr, Philip HS and Sebe, Nicu},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2022}
}

@inproceedings{tang2019multi,
  title={Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation},
  author={Tang, Hao and Xu, Dan and Sebe, Nicu and Wang, Yanzhi and Corso, Jason J and Yan, Yan},
  booktitle={CVPR},
  year={2019}
}

Contributions

If you have any questions, comments, or bug reports, feel free to open a GitHub issue, submit a pull request, or e-mail the author Hao Tang ([email protected]).

Collaborations

I'm always interested in meeting new people and hearing about potential collaborations. If you'd like to work together or get in contact with me, please email [email protected]. Some of our projects are listed here.

If you can do what you do best and be happy, you're further along in life than most people.


c2gan's Issues

An issue about data processing

I ran into trouble while reproducing your paper. Can you help me?

My question is about the processing of the Radboud dataset. I downloaded the Radboud dataset with your script and found that the number of face images is not equal to the number of landmark images.

I noticed what you mentioned in your paper: 'We first remove those images in which the face is not detected correctly by the public OpenFace software, leading to 5,628 training and 1,407 testing image pairs'. The training set and the test set add up to 7,035 images, which corresponds to the number of face images I downloaded (there are 1,005 images in each facial expression folder of the dataset; if all the face images are paired with landmarks, the total number of image pairs is exactly 7,035).

The problem is that you filtered out the images whose landmarks were not detected, so there are not enough landmark images to pair with the face images. So I'd like to ask how much data you actually used in your experiments. If 7,035 images were used, as the paper says, how did you handle the missing landmarks?
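The counts quoted in this issue are self-consistent: 7,035 / 1,005 = 7 expression folders, and 5,628 / 1,407 is exactly an 80/20 split of 7,035. A seeded 80/20 split like the following reproduces those sizes. It is only an illustration (the function name `split_pairs` and the seed are hypothetical) and does not recover the specific test pairs used in the paper.

```python
import random

def split_pairs(pair_ids, test_fraction=0.2, seed=0):
    """Shuffle a copy of pair_ids with a fixed seed and cut off a test subset."""
    ids = list(pair_ids)
    random.Random(seed).shuffle(ids)     # local RNG: does not touch global state
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]    # (train, test)

total = 7 * 1005                         # 7 expression folders x 1,005 images
train, test = split_pairs(range(total))
print(total, len(train), len(test))      # 7035 5628 1407
```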

Training for Market1501

Hi, thanks for the great paper and work.
I am trying to reproduce the results for Market1501 dataset. I use the same hyperparameters mentioned in the paper.

During training, after 14 epochs, the Keypoint generator stops producing any output for fake and recovered images. Have you experienced this? Any leads on how this can be fixed?
The paper states that OpenPose is used to produce body keypoints for the first few epochs. Please explain how this has been incorporated in the code; I do not see any pretrained OpenPose model being loaded in the training code.
More detail on how training was conducted for Market1501 (e.g., which models were used for netG and netD) would be helpful.

Thanks,
jysa01

Hi, I have read the paper and the code. How were the concatenated images in the train/test directories created? And how do I get the keypoint images?

Get wrong result during test phase

Hi, Hao Tang! I ran into difficulties again. Could you help me figure them out?

I tried to reproduce your experiment but got wrong results. I used the weights you provided to run the network. However, SSIM was only 0.82 (0.86 in your paper), while PSNR was 23.53 (21.9 in your paper). I checked the generated images in the results folder; they looked normal, and the input images were successfully translated into other expressions.

Here is my specific workflow:

Download the dataset and weights.
Pair each face with its landmark image, producing a row of images [neutral_face_A, other_expression_B, other_landmark_B, neutral_landmark_A].
Split the whole dataset to get 1407 images for the test set. (BTW, I tried different seeds to build the test set, but the SSIM values were all around 0.82.)
Run the test shell script with the default arguments you set.
Calculate SSIM and PSNR for each pair of images and take the average.

Do I need to train the network by myself to reproduce the experiment, or I just did something wrong during the test process?
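For the last step of the workflow above, per-pair PSNR and its mean can be computed as below. This NumPy sketch assumes 8-bit images (`data_range=255`); PSNR and especially SSIM depend on such details (value range, color vs. grayscale, SSIM window and filter settings), so implementation differences alone can shift the scores. SSIM is left to a library such as scikit-image's `skimage.metrics.structural_similarity` rather than reimplemented here.

```python
import numpy as np

def psnr(img_a, img_b, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

def mean_psnr(pairs, data_range=255.0):
    """Average PSNR over (generated, ground_truth) image pairs."""
    scores = [psnr(g, t, data_range) for g, t in pairs]
    return float(np.mean(scores))

black = np.zeros((4, 4), dtype=np.uint8)
gray = np.full((4, 4), 25.5)             # uniform error of 25.5 -> MSE 650.25
print(psnr(black, gray))                 # ≈ 20.0 dB
```

Converting to float64 before subtracting matters: subtracting two `uint8` arrays directly wraps around instead of going negative.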

Understanding model behavior for Recovered images during testing

Hello @Ha0Tang,
I tried to reproduce the results of the Keypoint guided person Image generation using your C2GAN code on Market1501 data.
I downloaded the data from https://www.kaggle.com/pengcw1/market-1501/data
and organized the data into train and test set as discussed in the paper.
I use the Resnet9_block model as the Generator network and the hyperparameters are chosen from the paper for training.

During testing, I get some results which indicate that the model is learning the target pose.
I am unable to explain why or how the recovered_A image looks exactly like a copy of real_A.
For instance, consider the sample below:

Note: The L1 loss and MSE loss are computed between the Real_A and recovered_A.

During testing,
The output Fake_B = netG(combined_realA_inputC) though not perfect, is believable
But in the case of Recovered_A = netG(combined_FakeB_inputD), it is surprising to me that the generator is able to output the texture on the shirt, even though only the images FakeB and inputD were provided as input.
The model has NOT been trained on this sample as both train and test sets are disjoint sets.
I spent some time analysing the code, but I am unable to explain this behavior. Is this expected from the model? If yes, what was the motivation? Kindly help me understand this behavior of the model.
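To make the data flow described in this issue concrete, here is a NumPy sketch of one image cycle: the same generator maps `[real_A, input_C]` to `fake_B`, then `[fake_B, input_D]` back to `recovered_A`, and an L1 term compares `recovered_A` with `real_A`. Concatenating along the channel axis and the `toy_generator` stand-in are assumptions for illustration only; the real networks are PyTorch modules in the repository.

```python
import numpy as np

def toy_generator(x):
    """Hypothetical stand-in for netG: 6-channel CHW input -> 3-channel output.
    A real generator is a learned network; this one just averages image and keypoints."""
    image, keypoints = x[:3], x[3:]
    return 0.5 * (image + keypoints)

def image_cycle(net_g, real_a, input_c, input_d):
    """One image cycle: A --(keypoints C)--> fake B --(keypoints D)--> recovered A."""
    fake_b = net_g(np.concatenate([real_a, input_c], axis=0))
    recovered_a = net_g(np.concatenate([fake_b, input_d], axis=0))
    cycle_l1 = float(np.mean(np.abs(recovered_a - real_a)))
    return fake_b, recovered_a, cycle_l1
```

Note that `recovered_A` only sees `fake_B` and `input_D`; any appearance detail it reproduces must therefore have been carried through `fake_B`.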

Thanks and best regards,
jysa01

Why not add L1 loss of Keypoint generator?

The whole loss consists of:
GAN loss (both keypoint and image)
Cycle loss (both keypoint and image)
L1 loss (image only)

I wonder why you did not add an L1 loss for the keypoint generator. Did you experiment with that?
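Schematically, the objective as summarized in this issue, plus the proposed extra term, can be written as follows. This is not the paper's exact formulation, and the weights $\lambda_1, \dots, \lambda_4$ are placeholders:

```latex
\mathcal{L} = \mathcal{L}_{GAN}^{img} + \mathcal{L}_{GAN}^{kp}
  + \lambda_{1}\,\mathcal{L}_{cyc}^{img} + \lambda_{2}\,\mathcal{L}_{cyc}^{kp}
  + \lambda_{3}\,\mathcal{L}_{L1}^{img}
  \qquad \text{proposed: } + \lambda_{4}\,\mathcal{L}_{L1}^{kp}
```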

Can't download pretrained model

I can't download the pretrained model with the command bash ./scripts/download_c2gan_model.sh Radboud
Do you have a link to download it?

Er, I downloaded the Radboud Faces Database, but how can I get images like those in the RaFD_landmark_figure folder?

Testing of images and train for high resolution images

Thanks for sharing the code. The concept is good.

I want to perform pose transfer on a source image. For this, I want to use the MVC dataset (https://github.com/MVC-Datasets/MVC), which contains images in 4 poses (straight view, left side view, right side view, back view). I preprocessed these images as follows:
1. Deleted non-human images using face detection.
2. Created keypoints for these images using OpenPose.

I want to train on these images such that
image A: straight view,
image B: left view.
I have the following questions, which would help me train:

Suppose I train on those two types of images (straight and left view). To get a left view of test_image, how can I provide its keypoints? (I don't have a left view of test_image; I want to generate it.)
How can I train on high-resolution images?

Thanks in Advance.
Regards,
Sandhya

abnormal output during training

Hi, I ran into a problem while training the network.

During training, since opt.display_freq=1, 8 images are generated at each step, where real_A, real_B, input_C, and input_D should correspond to the input of the network. However, I observed that they did not form a pair; see the output from the 12th epoch.
In this picture, real_A and real_B are obviously not the same person.

I have read the code carefully, but I can’t find the reason. Do you know why this happened?

Some range transformation is needed to solve this issue.

I guessed this could be related to the PyTorch version, but the same error appears on PyTorch 0.4.1 :(
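The comment does not say which transformation was needed. A common one in pix2pix-style code, which this repository builds on, maps generator outputs from the tanh range [-1, 1] back to 8-bit pixels before saving or displaying them. The NumPy sketch below shows that mapping and its inverse; the helper names are hypothetical, and unlike a plain `astype(np.uint8)` truncation it clips and rounds.

```python
import numpy as np

def to_uint8_image(chw):
    """Map a CHW float array in [-1, 1] to an HWC uint8 image in [0, 255]."""
    arr = np.asarray(chw, dtype=np.float64)
    arr = np.clip(arr, -1.0, 1.0)                # guard against slight overshoot
    hwc = np.transpose(arr, (1, 2, 0))
    return np.rint((hwc + 1.0) / 2.0 * 255.0).astype(np.uint8)

def to_model_range(hwc_uint8):
    """Inverse mapping: HWC uint8 in [0, 255] -> CHW float array in [-1, 1]."""
    arr = np.asarray(hwc_uint8, dtype=np.float64) / 255.0 * 2.0 - 1.0
    return np.transpose(arr, (2, 0, 1))
```

Applying the forward mapping twice, or skipping it, leaves images visibly washed out or dark, which is one thing worth ruling out when displayed panels look wrong.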