stargan-v2's Introduction

StarGAN v2 - Official PyTorch Implementation

StarGAN v2: Diverse Image Synthesis for Multiple Domains
Yunjey Choi*, Youngjung Uh*, Jaejun Yoo*, Jung-Woo Ha
In CVPR 2020. (* indicates equal contribution)

Paper: https://arxiv.org/abs/1912.01865
Video: https://youtu.be/0EVh5Ki4dIY

Abstract: A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address either of the issues, having limited diversity or multiple models for all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate our superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, high-quality animal faces with large inter- and intra-domain variations. The code, pre-trained models, and dataset are available at clovaai/stargan-v2.

Teaser video

Click the figure to watch the teaser video.


TensorFlow implementation

The TensorFlow implementation of StarGAN v2 by our team member junho can be found at clovaai/stargan-v2-tensorflow.

Software installation

Clone this repository:

git clone https://github.com/clovaai/stargan-v2.git
cd stargan-v2/

Install the dependencies:

conda create -n stargan-v2 python=3.6.7
conda activate stargan-v2
conda install -y pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.0 -c pytorch
conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge
pip install opencv-python==4.1.2.30 ffmpeg-python==0.2.0 scikit-image==0.16.2
pip install pillow==7.0.0 scipy==1.2.1 tqdm==4.43.0 munch==2.5.0

Datasets and pre-trained networks

We provide a script to download datasets used in StarGAN v2 and the corresponding pre-trained networks. The datasets and network checkpoints will be downloaded and stored in the data and expr/checkpoints directories, respectively.

CelebA-HQ. To download the CelebA-HQ dataset and the pre-trained network, run the following commands:

bash download.sh celeba-hq-dataset
bash download.sh pretrained-network-celeba-hq
bash download.sh wing

AFHQ. To download the AFHQ dataset and the pre-trained network, run the following commands:

bash download.sh afhq-dataset
bash download.sh pretrained-network-afhq

Generating interpolation videos

After downloading the pre-trained networks, you can synthesize output images reflecting diverse styles (e.g., hairstyle) of reference images. The following commands will save generated images and interpolation videos to the expr/results directory.

CelebA-HQ. To generate images and interpolation videos, run the following command:

python main.py --mode sample --num_domains 2 --resume_iter 100000 --w_hpf 1 \
               --checkpoint_dir expr/checkpoints/celeba_hq \
               --result_dir expr/results/celeba_hq \
               --src_dir assets/representative/celeba_hq/src \
               --ref_dir assets/representative/celeba_hq/ref

To transform a custom image, first crop the image manually so that the proportion of the image occupied by the face is similar to that of CelebA-HQ. Then, run the following command for additional fine rotation and cropping. All custom images in the inp_dir directory will be aligned and stored in the out_dir directory.

python main.py --mode align \
               --inp_dir assets/representative/custom/female \
               --out_dir assets/representative/celeba_hq/src/female

AFHQ. To generate images and interpolation videos, run the following command:

python main.py --mode sample --num_domains 3 --resume_iter 100000 --w_hpf 0 \
               --checkpoint_dir expr/checkpoints/afhq \
               --result_dir expr/results/afhq \
               --src_dir assets/representative/afhq/src \
               --ref_dir assets/representative/afhq/ref

Evaluation metrics

To evaluate StarGAN v2 using Fréchet Inception Distance (FID) and Learned Perceptual Image Patch Similarity (LPIPS), run the following commands:

# celeba-hq
python main.py --mode eval --num_domains 2 --w_hpf 1 \
               --resume_iter 100000 \
               --train_img_dir data/celeba_hq/train \
               --val_img_dir data/celeba_hq/val \
               --checkpoint_dir expr/checkpoints/celeba_hq \
               --eval_dir expr/eval/celeba_hq

# afhq
python main.py --mode eval --num_domains 3 --w_hpf 0 \
               --resume_iter 100000 \
               --train_img_dir data/afhq/train \
               --val_img_dir data/afhq/val \
               --checkpoint_dir expr/checkpoints/afhq \
               --eval_dir expr/eval/afhq

Note that the evaluation metrics are calculated using random latent vectors or reference images, both of which are selected by the seed number. In the paper, we reported the average of values from 10 measurements using different seed numbers. The following table shows the calculated values for both latent-guided and reference-guided synthesis.

Dataset FID (latent) LPIPS (latent) FID (reference) LPIPS (reference) Elapsed time
celeba-hq 13.73 ± 0.06 0.4515 ± 0.0006 23.84 ± 0.03 0.3880 ± 0.0001 49min 51s
afhq 16.18 ± 0.15 0.4501 ± 0.0007 19.78 ± 0.01 0.4315 ± 0.0002 64min 49s
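
As a rough illustration of that averaging procedure (not part of the repository), one could loop over seeds with a small driver script; the --seed flag is an assumption here, based on the default seed value printed in the argument namespace shown later on this page.

import subprocess

# Hypothetical driver: run the CelebA-HQ evaluation with 10 different seeds and
# keep each run's output in its own directory. The --seed flag is an assumption.
for seed in range(10):
    subprocess.run([
        "python", "main.py", "--mode", "eval", "--num_domains", "2",
        "--w_hpf", "1", "--resume_iter", "100000", "--seed", str(seed),
        "--train_img_dir", "data/celeba_hq/train",
        "--val_img_dir", "data/celeba_hq/val",
        "--checkpoint_dir", "expr/checkpoints/celeba_hq",
        "--eval_dir", f"expr/eval/celeba_hq_seed{seed}",
    ], check=True)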

Training networks

To train StarGAN v2 from scratch, run the following commands. Generated images and network checkpoints will be stored in the expr/samples and expr/checkpoints directories, respectively. Training takes about three days on a single Tesla V100 GPU. Please see here for training arguments and a description of them.

# celeba-hq
python main.py --mode train --num_domains 2 --w_hpf 1 \
               --lambda_reg 1 --lambda_sty 1 --lambda_ds 1 --lambda_cyc 1 \
               --train_img_dir data/celeba_hq/train \
               --val_img_dir data/celeba_hq/val

# afhq
python main.py --mode train --num_domains 3 --w_hpf 0 \
               --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
               --train_img_dir data/afhq/train \
               --val_img_dir data/afhq/val

Animal Faces-HQ dataset (AFHQ)

We release a new dataset of animal faces, Animal Faces-HQ (AFHQ), consisting of 15,000 high-quality images at 512×512 resolution. The figure above shows example images from the AFHQ dataset. The dataset includes three domains of cat, dog, and wildlife, each providing about 5,000 images. With multiple (three) domains and diverse images of various breeds per domain, AFHQ poses a challenging image-to-image translation problem. For each domain, we select 500 images as a test set and provide all remaining images as a training set. To download the dataset, run the following command:

bash download.sh afhq-dataset

[Update: 2021.07.01] We have rebuilt the original AFHQ dataset using high-quality resize filtering (i.e., Lanczos resampling). Please see the Clean-FID paper, which brings attention to the unfortunate software library situation for downsampling. We thank the Alias-Free GAN authors for their suggestion and contribution to the updated AFHQ dataset. If you use the updated dataset, we recommend citing not only our paper but also theirs.

The differences from the original dataset are as follows:

  • We resize the images using Lanczos resampling instead of nearest neighbor downsampling.
  • About 2% of the original images have been removed, so the updated set contains 15,803 images, whereas the original had 16,130.
  • Images are saved in PNG format to avoid compression artifacts. This makes the files larger than the original, but it is worth it.

To download the updated dataset, run the following command:

bash download.sh afhq-v2-dataset
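
As a quick sanity check that the download unpacked into the expected folder-per-domain layout, a small script like the following can count images per split and domain. The data/afhq/{train,val}/<domain> layout matches the training commands above; the exact domain folder names are an assumption, so this is only a sketch.

import os

# Count images per split and per domain under data/afhq/ (layout assumed from
# the --train_img_dir / --val_img_dir arguments used elsewhere in this README).
for split in ("train", "val"):
    root = os.path.join("data", "afhq", split)
    for domain in sorted(os.listdir(root)):
        n = len(os.listdir(os.path.join(root, domain)))
        print(f"{split}/{domain}: {n} images")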

License

The source code, pre-trained models, and dataset are available under the Creative Commons BY-NC 4.0 license by NAVER Corporation. You can use, copy, transform, and build upon the material for non-commercial purposes as long as you give appropriate credit by citing our paper and indicate if changes were made.

For business inquiries, please contact [email protected].
For technical and other inquiries, please contact [email protected].

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{choi2020starganv2,
  title={StarGAN v2: Diverse Image Synthesis for Multiple Domains},
  author={Yunjey Choi and Youngjung Uh and Jaejun Yoo and Jung-Woo Ha},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Acknowledgements

We would like to thank the full-time and visiting Clova AI Research (now NAVER AI Lab) members for their valuable feedback and an early review: especially Seongjoon Oh, Junsuk Choe, Muhammad Ferjad Naeem, and Kyungjune Baek. We also thank Alias-Free GAN authors for their contribution to the updated AFHQ dataset.

stargan-v2's People

Contributors

caffeinism, clovaaiadmin, us, yotamnitzan, youngjung, yunjey


stargan-v2's Issues

Heatmaps

Hello, nice work.
I have a couple of doubts regarding the heatmaps.

  1. Could you please elaborate on these values? Why resize and shift the heatmaps, and why those particular numbers for different regions of the face (x and x2)? In the main paper there is nothing about heatmaps or keypoints, so I am trying to understand the intuition.

  2. Are the wing.ckpt pre-trained weights the same as in this work, or do they differ in some way?

  3. Does CelebA-HQ work without heatmaps?

Thanks :)

Path error upon running the example you provided to transform a custom image

Probably caused by Windows; I edited line 58 of solver.py to point directly to the checkpoint instead.

Traceback (most recent call last):
File "main.py", line 182, in
main(args)
File "main.py", line 37, in main
solver = Solver(args)
File "D:\Documents\Desktop\StarGAN\core\solver.py", line 58, in init
self.ckptios = [CheckpointIO(ospj(args.checkpoint_dir, '{:06d}_nets_ema.ckpt'), **self.nets_ema)]
File "D:\Documents\Desktop\StarGAN\core\checkpoint.py", line 17, in init
os.makedirs(os.path.dirname(fname_template), exist_ok=True)
File "D:\Dev\Python\lib\os.py", line 220, in makedirs
mkdir(name, mode)
FileNotFoundError: [WinError 3] Path not found: '{:'

Segmentation fault when align custom images

Hi @yunjey, thanks for the great work!
I followed your instructions to manually crop my own image and ran the wing alignment, yet I get a segmentation fault without any further error message. Please help.

Error below:
UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Segmentation fault

[Q] keeping background from the content image

@yunjey @youngjung
First, I want to thank you for your good work, including the high-quality paper and very well-organized code.
StarGAN v2 can generate realistic synthetic images that follow the given reference images. However, the generated image has the background of the reference image. Do you have any ideas on how to preserve not only the pose and identity of the content image but also its background in the generated results?

update generator

When updating the generator, the discriminator parameters should be fixed, but I found that you did not fix them.

This part in solver.py (lines 239-241):
x_fake = nets.generator(x_real, s_trg, masks=masks)
out = nets.discriminator(x_fake, y_trg)
loss_adv = adv_loss(out, 1)

I think maybe the following is right:

x_fake = nets.generator(x_real, s_trg, masks=masks)
with torch.no_grad():
    out = nets.discriminator(x_fake, y_trg)

loss_adv = adv_loss(out, 1)

Could you tell me if this is right? Thanks.
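
For context, here is a minimal sketch of the standard pattern under discussion: the adversarial loss for G is computed through D, but only G's optimizer is stepped, so D's weights are not changed by this update even though they receive gradients (which are zeroed again before D's own update). This is an illustration of the general GAN recipe with toy modules and a placeholder loss, not the repository's exact code.

import torch
import torch.nn as nn

# Toy stand-ins for nets.generator / nets.discriminator.
G = nn.Linear(8, 8)
D = nn.Linear(8, 1)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.99))

x = torch.randn(4, 8)
out = D(G(x))                           # gradients flow through D into G
loss_adv = torch.relu(1 - out).mean()   # placeholder adversarial loss

opt_g.zero_grad()
loss_adv.backward()                     # D's parameters also receive .grad ...
opt_g.step()                            # ... but only G's optimizer steps, so D's
                                        # weights stay fixed; zeroing grads (as in
                                        # _reset_grad) clears them before D's update.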

IndexError: index 2 is out of bounds for dimension 1 with size 2

File "main.py", line 182, in
main(args)
File "main.py", line 59, in main
solver.train(loaders)
File "stargan-v2/core/solver.py", line 110, in train
nets, args, x_real, y_org, y_trg, z_trg=z_trg, masks=masks)
File "stargan-v2/core/solver.py", line 212, in compute_d_loss
s_trg = nets.mapping_network(z_trg, y_trg)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "stargan-v2/core/model.py", line 221, in forward
s = out[idx, y] # (batch, style_dim)
IndexError: index 2 is out of bounds for dimension 1 with size 2

How can I solve this problem?
Please help. The tensor shapes are as follows:
idx, y tensor([0, 1, 2, 3, 4, 5, 6, 7]) tensor([2, 1, 1, 1, 1, 1, 1, 1])
out.shape torch.Size([8, 2, 64])
idx.shape, y.shape torch.Size([8]) torch.Size([8])
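
For what it's worth, the shapes printed above suggest the mapping-network output has shape (batch, num_domains, style_dim) = (8, 2, 64), so a label value of 2 in the target labels requires at least 3 domains. A minimal sketch that illustrates the indexing (not the repository's code):

import torch

batch, num_domains, style_dim = 8, 2, 64
out = torch.randn(batch, num_domains, style_dim)   # like the mapping-network output
idx = torch.arange(batch)
y = torch.tensor([1, 1, 1, 1, 1, 1, 1, 1])         # valid labels are 0 .. num_domains-1

s = out[idx, y]                                     # (batch, style_dim), works fine
# y = torch.tensor([2, 1, 1, 1, 1, 1, 1, 1]) would raise the same IndexError,
# since a label of 2 needs --num_domains >= 3 (i.e. at least 3 domain sub-folders).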

Question in total GAN loss

Hi,

Thanks for your paper and code. I was wondering if you could explain why you defined your total GAN loss as: loss = loss_adv + args.lambda_sty * loss_sty - args.lambda_ds * loss_ds + args.lambda_cyc * loss_cyc. Why is the diversity loss subtracted while the style and cycle losses are added? What is the intuition behind that?

Thanks for your time!
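
For readers skimming this thread, the quoted line corresponds to the paper's full objective, where the diversity-sensitivity term enters with a minus sign because the generator is trained to maximize it:

\min_{G,F,E}\;\max_{D}\;\; \mathcal{L}_{adv} + \lambda_{sty}\,\mathcal{L}_{sty} - \lambda_{ds}\,\mathcal{L}_{ds} + \lambda_{cyc}\,\mathcal{L}_{cyc}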

Is there any rule for the number of images or the validation set size when using a custom dataset?

I re-ran the training with the provided dataset and training code, and I guess the previous errors were due to a mismatch in some 'number' between my custom dataset and AFHQ or CelebA.

Is there any mandatory, fixed number of files in each modality of the val folder, or in the representative folder?
I matched many numbers (number of domains, image size, etc.), but the numbers of images are quite small (train: 4-600 per domain, val: 100 per domain), and the images in the representative folders are also smaller than the examples.

multi-gpu training

Hi, I found that the actual training time was longer than the time mentioned in the paper. Could you release the multi-GPU code, or are there any tips for changing this code to multi-GPU? (I have tried to make the change, but there are some problems; maybe the Munch library does not support multi-GPU operation.)

Thanks.
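
A rough, untested sketch of one way to attempt multi-GPU training is to wrap each sub-network stored in the Munch of networks with nn.DataParallel. Network names follow the snippets quoted in these issues; this is not an official multi-GPU implementation.

import torch.nn as nn

def parallelize(nets):
    # Munch behaves like a dict, so each sub-network (generator, discriminator,
    # mapping_network, style_encoder, ...) can be wrapped in place.
    for name in list(nets.keys()):
        nets[name] = nn.DataParallel(nets[name])
    return nets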

Inference on GPU

I am trying to test the model, but it runs on the CPU and takes 16 GB of memory. How can we run the model on the GPU?
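
As a quick check (an illustration, not repository code), one can verify that PyTorch actually sees a GPU; if this prints False, the CUDA toolkit and driver versions from the installation section are the usual suspects.

import torch

print(torch.cuda.is_available())    # should print True if the GPU is usable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Each network would then need to be moved to this device, e.g. net.to(device).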

mode of sample and align

When I execute the command

python main.py --mode sample --num_domains 2 --resume_iter 100000 --w_hpf 1 \
               --checkpoint_dir expr/checkpoints/celeba_hq \
               --result_dir expr/results/celeba_hq \
               --src_dir assets/representative/celeba_hq/src \
               --ref_dir assets/representative/celeba_hq/ref

or

python main.py --mode align \
               --inp_dir assets/representative/custom/female \
               --out_dir assets/representative/celeba_hq/src/female

The following error occurred:
Traceback (most recent call last):
File "main.py", line 182, in
main(args)
File "main.py", line 37, in main
solver = Solver(args)
File "/root/work/stargan-v2/core/solver.py", line 34, in init
self.nets, self.nets_ema = build_model(args)
File "/root/work/stargan-v2/core/model.py", line 300, in build_model
fan = FAN(fname_pretrained=args.wing_path).eval()
File "/root/work/stargan-v2/core/wing.py", line 213, in init
self.load_pretrained_weights(fname_pretrained)
File "/root/work/stargan-v2/core/wing.py", line 217, in load_pretrained_weights
checkpoint = torch.load(fname) # map_location=torch.device('cpu'))
File "/root/anaconda3/envs/stargan-v2/lib/python3.6/site-packages/torch/serialization.py", line 526, in load
if _is_zipfile(opened_file):
File "/root/anaconda3/envs/stargan-v2/lib/python3.6/site-packages/torch/serialization.py", line 76, in _is_zipfile
if ord(magic_byte) != ord(read_byte):
TypeError: ord() expected a character, but string of length 0 found

I have installed all the dependencies and downloaded the corresponding datasets and checkpoints as described in the repository. Could you tell me how to solve this problem? Thanks a lot.

About AdaIN

Hello! I have a question about the AdaIN class.
In your implementation, you used (1 + gamma) * self.norm(x) + beta. Why don't you use gamma * self.norm(x) + beta? Thank you.
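
For readers of this thread, here is a minimal AdaIN sketch matching the expression quoted above (illustrative only, not necessarily the repository's exact module): the style vector predicts per-channel (gamma, beta), and with the (1 + gamma) form a zero-valued modulation leaves the normalized activations unchanged.

import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, style_dim, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.fc = nn.Linear(style_dim, num_features * 2)

    def forward(self, x, s):
        h = self.fc(s)                                   # (batch, 2 * num_features)
        gamma, beta = torch.chunk(h, chunks=2, dim=1)
        gamma = gamma.view(gamma.size(0), -1, 1, 1)
        beta = beta.view(beta.size(0), -1, 1, 1)
        return (1 + gamma) * self.norm(x) + beta

adain = AdaIN(style_dim=64, num_features=32)
y = adain(torch.randn(2, 32, 16, 16), torch.randn(2, 64))   # -> (2, 32, 16, 16)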

some problems about updating style encoder network

# train the generator

g_loss, g_losses_latent = compute_g_loss(nets, args, x_real, y_org, y_trg, z_trgs=[z_trg, z_trg2], masks=masks, face_mask=face_mask)
self._reset_grad()
g_loss.backward()
optims.generator.step()
optims.mapping_network.step()
optims.style_encoder.step()

g_loss, g_losses_ref = compute_g_loss(nets, args, x_real, y_org, y_trg, x_refs=[x_ref, x_ref2], masks=masks, face_mask=face_mask)
self._reset_grad()
g_loss.backward()
optims.generator.step()

Hi, I was very curious why the style encoder isn't updated when computing the loss with reference images.

about pre-trained model on AFHQ

I have downloaded the pre-trained models trained on the AFHQ dataset. I wonder why the ckpt file still contains the weights of the FAN network. Isn't it unused when training models on AFHQ?

--style_dim', type=int, default=64

--style_dim', type=int, default=64. It is a great honor to see the project developed by your team. I wonder how these 64 kinds of face style can be found. For example, I need to use this model to modify the hair color, face color, and skin color of a person in a photo.
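
For context (a hedged illustration, not an authoritative answer): style_dim=64 is the length of a continuous style vector rather than 64 discrete kinds of style. The printed namespace elsewhere on this page shows latent_dim=16, i.e. a 16-dimensional latent code is mapped to such a 64-dimensional style code per domain, and nearby codes give similar looks; this continuity is roughly what the interpolation videos in this README exploit.

import torch

style_dim = 64
s_a = torch.randn(1, style_dim)            # style code from one reference
s_b = torch.randn(1, style_dim)            # style code from another reference
# Because the style space is continuous, one can interpolate between codes:
for alpha in torch.linspace(0, 1, steps=5):
    s = (1 - alpha) * s_a + alpha * s_b    # intermediate style, shape (1, 64)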

Score numbers differ from the paper

Hello, the numbers you report here differ from the paper.

  1. Is the version of the paper using the heatmaps? If yes, do you have numbers without heatmaps? They would help a lot.
  2. Are these new numbers using the same experimental setup reported in the paper? Batch size, number of iterations, etc.?
  3. I also found that some of the architectures do not match the ones reported in the paper. For instance, the discriminator max_dim goes up to 512 (the paper says 1024), and the mapping network is different as well. Is there going to be an updated arXiv version?
  4. [EDIT] If AFHQ does not use heatmaps, why are the numbers also different from the paper?

Thank you.

Training hangs on fetching images and labels

Hi, when training with the AFHQ train script mentioned in your README, the code seems to hang for me on fetching a batch of images and labels. It never actually begins to train, as it gets stuck on line 101 of solver.py:

inputs = next(fetcher)

Any ideas about this problem?

Thanks in advance!

Sam

About new datasets

Hello, I would like to ask: if I use a new dataset, how do I prepare it, and how is the data in the assets folder selected? If I want to test an entire test dataset, can you provide some guidance?

Update step with reference images

Hi, I was wondering why the style encoder isn't updated when computing the losses with reference images.
Thank you for your work.

Generate Image resolution higher than 256

Is it possible to generate images with resolution 512 or 1024? I tried the img_size argument in main.py to change it to 512, yet I got the following errors. It seems like the model doesn't support other resolutions?

RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "encode.3.conv1x1.weight", "encode.7.conv1.weight", "encode.7.conv1.bias", "encode.7.conv2.weight", "encode.7.conv2.bias", "encode.7.norm1.weight", "encode.7.norm1.bias", "encode.7.norm2.weight", "encode.7.norm2.bias", "decode.7.conv1.weight", "decode.7.conv1.bias", "decode.7.conv2.weight", "decode.7.conv2.bias", "decode.7.norm1.fc.weight", "decode.7.norm1.fc.bias", "decode.7.norm2.fc.weight", "decode.7.norm2.fc.bias", "decode.7.conv1x1.weight".
size mismatch for from_rgb.weight: copying a param with shape torch.Size([64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 3, 3, 3]).
size mismatch for from_rgb.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).

Error

Getting this issue in Windows 10 CMD.
I tried uninstalling and reinstalling ffmpeg.
Nothing works.

Error message:

Traceback (most recent call last):
File "main.py", line 182, in
main(args)
File "main.py", line 37, in main
solver = Solver(args)
File "C:\Users\xxx\stargan-v2\core\solver.py", line 58, in init
self.ckptios = [CheckpointIO(ospj(args.checkpoint_dir, '{:06d}_nets_ema.ckpt'), **self.nets_ema)]
File "C:\Users\xxx\stargan-v2\core\checkpoint.py", line 17, in init
os.makedirs(os.path.dirname(fname_template), exist_ok=True)
File "C:\Users\xxx.conda\envs\stargan-v2\lib\os.py", line 220, in makedirs
mkdir(name, mode)
FileNotFoundError: [WinError 3] The system cannot find the path specified: '{:'

domain translation using latents

Is it possible to easily translate a source image from one domain to another using latent codes and not reference images?

I see a function translate_using_latent(nets, args, x_src, y_trg_list, z_trg_list, psi, filename) in utils.py (line 78), but it is never used and I am not sure how "y_trg_list" and "z_trg_list" are supposed to be constructed.
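
A hedged guess (not verified against utils.py) at how the two arguments might be built: y_trg_list as a list of LongTensors of target-domain indices, and z_trg_list as a stack of random latent codes with latent_dim=16, one set per output to generate.

import torch

num_domains, latent_dim, num_outs = 2, 16, 10
N = 1                                          # batch size of x_src

y_trg_list = [torch.tensor([d]).repeat(N) for d in range(num_domains)]
z_trg_list = torch.randn(num_outs, 1, latent_dim).repeat(1, N, 1)
# translate_using_latent(nets, args, x_src, y_trg_list, z_trg_list,
#                        psi=0.7, filename='expr/results/latent_sample.jpg')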

Please help me

I'm new to Python; I've been studying it for a while. I found your project by chance and it really impressed me. I want to figure out how everything works and test it myself, but following the instructions nothing works. When I run the command:
bash download.sh celeba-hq-dataset

download.sh: line 9:
StarGAN v2
Copyright (c) 2020-present NAVER Corp.

This work is licensed under the Creative Commons Attribution-NonCommercial
4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to
Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
: No such file or directory
download.sh: line 38: wget: command not found
unzip: cannot find or open ./data/celeba_hq.zip, ./data/celeba_hq.zip.zip or ./data/celeba_hq.zip.ZIP.
rm: ./data/celeba_hq.zip: No such file or directory

eval bug

When computing FID and LPIPS with mode = "latent", I get this bug:
UnboundLocalError: local variable 'loader_ref' referenced before assignment

HighPass

Hi, there are no explanations about HighPass in the paper. Could you tell me what role this function plays?
Thanks!
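
Not an authoritative answer, but for orientation: a high-pass filter keeps edges and fine detail while suppressing smooth regions. A typical 3x3 Laplacian-style kernel is sketched below; the exact weights and where wing.py applies them may differ, so treat this as an illustration only.

import torch
import torch.nn.functional as F

kernel = torch.tensor([[-1., -1., -1.],
                       [-1.,  8., -1.],
                       [-1., -1., -1.]]).view(1, 1, 3, 3)

x = torch.randn(1, 1, 64, 64)                   # e.g. a single-channel map
edges = F.conv2d(x, kernel, padding=1)          # emphasizes edges / fine detail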

Manage domain (attribute) as StarGANv1

If you look at StarGAN v1, one image can belong to multiple domains; it handles attributes via a txt file.
My dataset is set up that way now (every image belongs to multiple domains): the images are in one folder and the labels are saved in a CSV file. Should I save the images by domain? Is there any good way to solve my problem? Thanks.

Hello. Since you are Korean, I am also leaving the question in Korean in case you see it. As asked above, my images are currently in a single folder and each image belongs to multiple domains. (Photo A belongs to the female domain and at the same time to the blonde domain, so I want to use photo A for training both domains.) To store the data the way you suggest, the same photo would have to be saved in multiple folders, which is cumbersome and takes up a lot of space, so I am wondering whether there is a better way. Thank you. :)

Working on expr/results/celeba_hq/video_ref.mp4 Killed

Hey!

I am following the README tutorial, but when I run the command

python main.py --mode sample --num_domains 2 --resume_iter 100000 --w_hpf 1 \
               --checkpoint_dir expr/checkpoints/celeba_hq \
               --result_dir expr/results/celeba_hq \
               --src_dir assets/representative/celeba_hq/src \
               --ref_dir assets/representative/celeba_hq/ref

The video generation seems to go well, but after reaching 100% it just prints a "Killed" message and the video is not generated:

Working on expr/results/celeba_hq/reference.jpg...
/home/ubuntu/anaconda3/envs/stargan-v2/lib/python3.6/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Working on expr/results/celeba_hq/video_ref.mp4...
video_ref: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [04:01<00:00, 7.54s/it]
Killed

Curiosity about Adam Beta1=0

Hi,
first of all, congratulations on the paper. I am curious about your choice of Adam parameters. Could you give more insight into Beta1=0? Why don't you use momentum?

Thank you!
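
For reference, the argument namespace printed later on this page shows beta1=0.0, beta2=0.99, lr=1e-4 (f_lr=1e-6 for the mapping network) and weight_decay=1e-4, which in PyTorch corresponds to roughly the sketch below; beta1=0 disables the first-moment (momentum) term, a fairly common choice in GAN training. This is an illustration, not the repository's solver code.

import torch
import torch.nn as nn

net = nn.Linear(8, 8)                            # placeholder for one sub-network
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4, betas=(0.0, 0.99),
                             weight_decay=1e-4)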

Preserving identity

Hello,
Thanks for awesome codes and application.

I have a question about training many (6~8) domains, like the StarGAN v1 RaFD implementation on GitHub.

When I put all domains in separate folders and train with -ds 8, the resulting model totally mixes up the identity of the photo. The pre-trained CelebA-HQ model works far better at generating results, even when I put the photos of my domains in as male or female sources (i.e., it lets the person smile while preserving the identity).

Is this related to the changed GAN structure, and to gender being a far more vivid feature compared to expressions?

confusion about training custom and AFHQ data.

When I run the AFHQ training code

python main.py --mode train --num_domains 3 --w_hpf 0 \
                --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
                --train_img_dir data/afhq/train \
                --val_img_dir data/afhq/val

The printed namespace includes CelebA values (bold parts below).
Is this normal? I face the same phenomenon when training on my custom dataset, and suspect that it is a possible cause of errors.

Namespace(batch_size=8, beta1=0.0, beta2=0.99, checkpoint_dir='expr/checkpoints', ds_iter=100000, eval_dir='expr/eval', eval_every=50000, f_lr=1e-06, hidden_dim=512, img_size=256, inp_dir='**assets/representative/custom/female'**, lambda_cyc=1.0, lambda_ds=2.0, 
lambda_reg=1.0, lambda_sty=1.0, latent_dim=16, **lm_path='expr/checkpoints/celeba_lm_mean.npz'**, lr=0.0001, mode='train', num_domains=3, num_outs_per_domain=10, num_workers=4, **out_dir='assets/representative/celeba_hq/src/female',** print_every=10, randcrop_prob=0.5, **ref_dir='assets/representative/celeba_hq/ref**', result_dir='expr/results', resume_iter=0, sample_dir='expr/samples', sample_every=5000, save_every=10000, seed=777, **src_dir='assets/representative/celeba_hq/src',** style_dim=64, total_iters=100000, train_img_dir='data/afhq/train', val_batch_size=32, val_img_dir='data/afhq/val', w_hpf=0.0, weight_decay=0.0001, wing_path='expr/checkpoints/wing.ckpt')

PILLOW_VERSION is missing on pillow==7.0.0

Running the command to generate images fails with:

File "/home/bobi/.local/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION'
(stargan-v2) bobi@strix:~/Desktop/stargan-v2$ python 
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL
>>> PIL.PILLOW_VERSION
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'PIL' has no attribute 'PILLOW_VERSION

It works with:

pip install "pillow<7"

Get output image

Is there an option to get only the output images? When running the model, I am getting the video as the output.
I only want to save the generated images.

Thank you for open sourcing such an amazing work.

No module named 'core.data_loader'

When I start to train on my data and run "python3 main.py --mode train ..............", I get:

Traceback (most recent call last):
File "main.py", line 18, in
from core.data_loader import get_train_loader
ImportError: No module named 'core.data_loader'

Needs help!

Hi,

First, it's a great project, and thank you for sharing the documentation and code. However, I have encountered the two issues listed below. Thanks again for your help in advance!

  1. x264 cannot be installed?
    (stargan-v2) C:\Users\xxx\stargan-v2>conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge

CondaValueError: invalid package specification: x264=='1152.20180717

  2. When running the script in CMD on Windows 10, the error shows as below:

python main.py --mode sample --num_domains 2 --resume_iter 100000 --w_hpf 1 --checkpoint_dir expr/checkpoints/celeba_hq --result_dir expr/results/celeba_hq --src_dir assets/representative/celeba_hq/src --ref_dir assets/representative/celeba_hq/ref

Error message:
Traceback (most recent call last):
File "main.py", line 182, in
main(args)
File "main.py", line 37, in main
solver = Solver(args)
File "C:\Users\xxx\stargan-v2\core\solver.py", line 58, in init
self.ckptios = [CheckpointIO(ospj(args.checkpoint_dir, '{:06d}_nets_ema.ckpt'), **self.nets_ema)]
File "C:\Users\xxx\stargan-v2\core\checkpoint.py", line 17, in init
os.makedirs(os.path.dirname(fname_template), exist_ok=True)
File "C:\Users\xxx.conda\envs\stargan-v2\lib\os.py", line 220, in makedirs
mkdir(name, mode)
FileNotFoundError: [WinError 3] The system cannot find the path specified: '{:'

UnBoundLocalError

Similar to issue #11

After training, an eval error happens when using a dataset with 4 domains.

Should I adjust lambda_ds to #domains-1, or adjust another variable to fit the number of domains?

python main.py --mode train --num_domains 4 --w_hpf 0 \
               --lambda_reg 1 --lambda_sty 1 --lambda_ds 1 --lambda_cyc 1 \
               --train_img_dir data/custom/train \
               --val_img_dir data/custom/val
Calculating evaluation metrics...
Number of domains: 4
Preparing DataLoader for the evaluation phase...
Traceback (most recent call last):
  File "main.py", line 182, in <module>
    main(args)
  File "main.py", line 59, in main
    solver.train(loaders)
  File "/home/ipsych/ML/Stargan_v2/core/solver.py", line 170, in train
    calculate_metrics(nets_ema, args, i+1, mode='latent')
  File "/home/ipsych/.conda/envs/Pytorch_1_4_0/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ipsych/ML/Stargan_v2/metrics/eval.py", line 61, in calculate_metrics
    iter_ref = iter(loader_ref)
UnboundLocalError: local variable 'loader_ref' referenced before assignment

how to use different dataset

How do I use a different dataset? How should I arrange it inside the data folder?
For example, I want to transfer cat images to dogs.
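
A hedged sketch of the folder-per-domain layout that the training commands in this README imply: one sub-folder per domain under train/ and val/. The folder names below are examples, not fixed requirements.

import os

# Create an example layout: data/custom/{train,val}/{cat,dog}/
for split in ("train", "val"):
    for domain in ("cat", "dog"):
        os.makedirs(os.path.join("data", "custom", split, domain), exist_ok=True)

# Put cat images under data/custom/*/cat and dog images under data/custom/*/dog,
# then train with:  --num_domains 2 --train_img_dir data/custom/train
#                   --val_img_dir data/custom/val   (see the AFHQ command above).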

Questions about the batch size 4 model

Hello, according to the source code you provided, I kept the same parameters except for the batch size, which I changed to 4, and trained a model.
However, I found that the batch-size-4 model was quite different from the official model, especially in terms of hairstyle and style diversification.
Could you please help me analyze the reason? Is it the batch size, or are the parameters used by the official model different from the source code?
Thanks a million!

RuntimeError: CUDA out of memory

I'm running the training with the default --batch_size 8 and I get:

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 15.75 GiB total capacity; 14.58 GiB already allocated; 22.88 MiB free; 14.75 GiB reserved in total by PyTorch)

Server details:

  • GPU: 1 x NVIDIA Tesla V100
  • n1-highmem-4 (4 vCPU, 26 GB memory)

I am running this training on Google Cloud Platform.

different update frequencies of different modules

Hi~ thanks a lot for your awesome work!

I find that during each iteration, D is optimized twice:

d_loss, d_losses_latent = compute_d_loss(
    nets, args, x_real, y_org, y_trg, z_trg=z_trg, masks=masks)
self._reset_grad()
d_loss.backward()
optims.discriminator.step()

d_loss, d_losses_ref = compute_d_loss(
    nets, args, x_real, y_org, y_trg, x_ref=x_ref, masks=masks)
self._reset_grad()
d_loss.backward()
optims.discriminator.step()

G is also optimized twice:

g_loss, g_losses_latent = compute_g_loss(
    nets, args, x_real, y_org, y_trg, z_trgs=[z_trg, z_trg2], masks=masks)
self._reset_grad()
g_loss.backward()
optims.generator.step()
optims.mapping_network.step()
optims.style_encoder.step()

g_loss, g_losses_ref = compute_g_loss(
    nets, args, x_real, y_org, y_trg, x_refs=[x_ref, x_ref2], masks=masks)
self._reset_grad()
g_loss.backward()
optims.generator.step()

However, mapping_network and style_encoder are only optimized once.

Could you explain this to me? Many thanks again.
