
TransGAN's Introduction

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up

Code used for TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up.

Implementation

  • Gradient checkpointing via torch.utils.checkpoint (see the sketch after this list)
  • 16-bit precision training
  • Distributed training (faster!)
  • IS/FID evaluation
  • Gradient accumulation
  • Stronger data augmentation
  • Self-Modulation
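
As a reference for the first two items, here is a minimal sketch of how gradient checkpointing and 16-bit training are usually wired up in PyTorch (generic usage, not this repo's exact training loop; model, batch, compute_loss and optimizer are placeholders):

import torch
from torch.utils.checkpoint import checkpoint

def run_blocks(blocks, x, use_checkpoint=True):
    # Recompute each block's activations during backward instead of storing them.
    for blk in blocks:
        x = checkpoint(blk, x) if use_checkpoint else blk(x)
    return x

scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():           # run the forward pass in 16-bit precision
    loss = compute_loss(model, batch)     # placeholder for the actual GAN losses
scaler.scale(loss).backward()             # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()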

Guidance

CIFAR-10 training script

python exp/cifar_train.py

I disabled evaluation during the training job because it causes a strange bug. Please launch a separate evaluation job in parallel by copying the checkpoint path into the test script.

CIFAR-10 test

First download the CIFAR-10 checkpoint and put it in ./cifar_checkpoint. Then run the following script.

python exp/cifar_test.py

Main Pipeline

(Figure: main pipeline of TransGAN.)

Representative Visual Results

(Figure: CIFAR-10 visual results.)

README awaits further updates.

Acknowledgement

Codebase adapted from AutoGAN and pytorch-image-models.

Citation

If you find this repo helpful, please cite:

@article{jiang2021transgan,
  title={Transgan: Two pure transformers can make one strong gan, and that can scale up},
  author={Jiang, Yifan and Chang, Shiyu and Wang, Zhangyang},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

TransGAN's People

Contributors

yifanjiang19


TransGAN's Issues

The model code

I want to know if you could push the code of the generator and discriminator.

Linear Unflatten layer

As I understand it, your paper aims to completely remove convolutional layers. But in the code, the linear unflatten layer (which produces the RGB image) uses conv2d. Why do you use this? Is there any way to get the RGB image without conv2d?
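
Worth noting (an observation, not the authors' answer): if the conv in question is a 1×1 conv2d, it is mathematically a per-pixel linear layer applied across channels, so no spatial convolution is reintroduced. A quick check:

import torch
import torch.nn as nn

x = torch.randn(2, 384, 32, 32)                      # (B, C, H, W) feature grid
conv = nn.Conv2d(384, 3, kernel_size=1)              # 1x1 "to RGB" projection
lin = nn.Linear(384, 3)
lin.weight.data = conv.weight.data.view(3, 384)      # reuse the exact same parameters
lin.bias.data = conv.bias.data

out_conv = conv(x)
out_lin = lin(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(out_conv, out_lin, atol=1e-6))  # True: identical up to float error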

Training for CIFAR-10

Hello,

I've been having a lot of trouble running the most basic model training for CIFAR-10. Is this the correct usage?

python exps/cifar_train.py

The code appeared to stall, and rerunning a second time resulted in "Address already in use". I wasn't able to find options to run on a single GPU or without multiprocessing-distributed. I've tried editing cifar_train.py to specify only a single GPU, and otherwise calling torch.cuda.set_device(...), but the training never goes through.

I did have to change the version of tensorboard from requirements.txt, though I don't see how that would result in these issues. Do you have any advice on how to make the training run and complete?

Thank you!
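
A likely cause of the "Address already in use" error (generic torch.distributed behavior, not specific to this repo): a stale worker from the first, stalled run is still holding the rendezvous port from dist_url. Killing leftover processes, or pointing --dist-url at a free port, usually clears it, e.g.:

pkill -f cifar_train.py                                      # kill stale workers first
python exps/cifar_train.py --dist-url tcp://localhost:23457  # then rerun on a free port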

Checkpoint doesn't match

Commit 9442615 breaks the checkpoint loading:

RuntimeError: Error(s) in loading state_dict for DataParallel:
        Unexpected key(s) in state_dict: "module.to_rgb.0.weight", "module.to_rgb.0.bias", "module.to_rgb.0.running_mean", "module.to_rgb.0.running_var", "module.to_rgb.0.num_batches_tracked". 

Could you upload the updated checkpoint? Thanks.
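
A possible stopgap until a new checkpoint is uploaded (a sketch only; the checkpoint's dict layout and the 'generator_state_dict' key are assumptions, adjust to the actual file): drop the unexpected to_rgb keys and load non-strictly.

import torch

ckpt = torch.load('cifar_checkpoint.pth', map_location='cpu')
state_dict = ckpt['generator_state_dict']          # hypothetical key name
# Drop the keys the current model no longer defines (the old to_rgb BatchNorm).
state_dict = {k: v for k, v in state_dict.items()
              if not k.startswith('module.to_rgb.0.')}
gen_net.load_state_dict(state_dict, strict=False)  # gen_net: the DataParallel-wrapped model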

loss

Can you share the loss curve during training? I found that the losses do not converge during training.

training code

Hello! When will the training code be updated, or how should I write it myself? Could you please guide me? Thank you very much.

Error occurs when training on Cifar-10

Excuse me. I tried to train TransGAN on the CIFAR-10 dataset using the script exps/cifar_train.py, but it fails with RuntimeError: The size of tensor a (64) must match the size of tensor b (256) at non-singleton dimension 3, raised at attn = attn + relative_position_bias.unsqueeze(0) via real_validity = dis_net(real_imgs).
I printed the shapes of the tensors 'attn' and 'relative_position_bias': (64, 4, 64, 64) and (4, 256, 256). I haven't modified the discriminator code.
I don't know how to solve this problem.
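
For reference, a quick sanity check of where those shapes come from, assuming windowed attention with window size 8 over a 16×16 token grid (numbers inferred from the error message, not from the repo):

window_size = 8
tokens_per_window = window_size ** 2   # 64 -> matches attn: (windows*batch, 4 heads, 64, 64)
grid = 16
tokens_total = grid * grid             # 256 -> matches relative_position_bias: (4 heads, 256, 256)
# A bias indexed over all 256 tokens cannot be added to attention computed per
# 64-token window; the bias must be built per window (64 x 64) instead.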

The result FID on CIFAR10 could not be reimplemented.

Hi,

My environment configuration is the same as requirements.txt, except that tensorflow is not installed.

I run python exps/cifar_train.py with 4 GeForce GTX TITAN (12G).

The lowest FID is 10.72, which is much higher than the 9.26 reported in the paper.

My training log:
2021-08-01 11:50:07,095 Namespace(D_downsample='avg', accumulated_times=1, arch=None, baseline_decay=0.9, batch_size=16, beta1=0.0, beta2=0.99, bottom_width=8, channels=3, controller='controller', ctrl_lr=0.00035, ctrl_sample_batch=1, ctrl_step=30, d_act='gelu', d_depth=3, d_heads=4, d_lr=0.0001, d_mlp=4, d_norm='ln', d_spectral_norm=False, d_window_size=8, data_path='./data', dataset='cifar10', df_dim=384, diff_aug='translation,cutout,color', dis_batch_size=16, dis_model='ViT_custom_scale2', dist_backend='nccl', dist_url='tcp://localhost:14256', distributed=True, dropout=0.0, dynamic_reset_threshold=0.001, dynamic_reset_window=500, ema=0.9999, ema_kimg=500, ema_warmup=0.1, entropy_coeff=0.001, eval_batch_size=8, exp_name='cifar_train', fade_in=0.0, fid_stat='None', g_accumulated_times=1, g_act='gelu', g_depth='5,4,2', g_lr=0.0001, g_mlp=4, g_norm='ln', g_spectral_norm=False, g_window_size=8, gen_batch_size=32, gen_model='ViT_custom', gf_dim=1024, gpu=0, grow_step1=25, grow_step2=55, grow_steps=[0, 0], hid_size=100, img_size=32, init_type='xavier_uniform', latent_dim=256, latent_norm=False, load_path=None, loca_rank=-1, loss='wgangp-eps', lr_decay=False, max_epoch=2558.0, max_iter=500000, max_search_iter=90, ministd=False, multiprocessing_distributed=True, n_classes=0, n_critic=4, num_candidate=10, num_eval_imgs=20000, num_landmarks=64, num_workers=4, optimizer='adam', patch_size=2, path_helper={'prefix': 'logs/cifar_train_2021_08_01_11_50_07', 'ckpt_path': 'logs/cifar_train_2021_08_01_11_50_07/Model', 'log_path': 'logs/cifar_train_2021_08_01_11_50_07/Log', 'sample_path': 'logs/cifar_train_2021_08_01_11_50_07/Samples'}, phi=1.0, print_freq=50, random_seed=12345, rank=0, rl_num_eval_img=5000, seed=12345, shared_epoch=15, show=False, topk=5, val_freq=20, wd=0.001, world_size=4)
2021-08-01 16:00:22,196 => calculate inception score
2021-08-01 16:03:21,008 Inception score: 0, FID score: 74.7215576171875 || @ epoch 20.
2021-08-01 20:01:06,548 => calculate inception score
2021-08-01 20:04:05,234 Inception score: 0, FID score: 49.187530517578125 || @ epoch 40.
2021-08-02 00:01:34,926 => calculate inception score
2021-08-02 00:04:33,541 Inception score: 0, FID score: 41.36199951171875 || @ epoch 60.
2021-08-02 04:01:46,366 => calculate inception score
2021-08-02 04:04:45,042 Inception score: 0, FID score: 34.57147216796875 || @ epoch 80.
2021-08-02 08:02:00,014 => calculate inception score
2021-08-02 08:04:58,443 Inception score: 0, FID score: 28.511077880859375 || @ epoch 100.
2021-08-02 12:02:32,033 => calculate inception score
2021-08-02 12:05:30,596 Inception score: 0, FID score: 23.330780029296875 || @ epoch 120.
2021-08-02 16:02:47,431 => calculate inception score
2021-08-02 16:05:46,199 Inception score: 0, FID score: 19.77392578125 || @ epoch 140.
2021-08-02 20:09:35,037 => calculate inception score
2021-08-02 20:12:33,914 Inception score: 0, FID score: 16.3189697265625 || @ epoch 160.
2021-08-03 00:10:55,077 => calculate inception score
2021-08-03 00:13:53,645 Inception score: 0, FID score: 14.2860107421875 || @ epoch 180.
2021-08-03 04:12:10,663 => calculate inception score
2021-08-03 04:15:09,299 Inception score: 0, FID score: 13.2266845703125 || @ epoch 200.
2021-08-03 08:13:34,201 => calculate inception score
2021-08-03 08:16:32,938 Inception score: 0, FID score: 12.24041748046875 || @ epoch 220.
2021-08-03 12:14:29,228 => calculate inception score
2021-08-03 12:17:27,896 Inception score: 0, FID score: 11.8358154296875 || @ epoch 240.
2021-08-03 16:15:31,432 => calculate inception score
2021-08-03 16:18:51,338 Inception score: 0, FID score: 11.555419921875 || @ epoch 260.
2021-08-03 20:17:01,392 => calculate inception score
2021-08-03 20:20:00,398 Inception score: 0, FID score: 11.2257080078125 || @ epoch 280.
2021-08-04 00:17:57,224 => calculate inception score
2021-08-04 00:20:56,055 Inception score: 0, FID score: 10.96759033203125 || @ epoch 300.
2021-08-04 04:18:56,332 => calculate inception score
2021-08-04 04:21:55,526 Inception score: 0, FID score: 10.788848876953125 || @ epoch 320.
2021-08-04 08:19:56,183 => calculate inception score
2021-08-04 08:22:58,059 Inception score: 0, FID score: 10.72705078125 || @ epoch 340.
2021-08-04 12:20:55,906 => calculate inception score
2021-08-04 12:23:54,770 Inception score: 0, FID score: 11.013397216796875 || @ epoch 360.
2021-08-04 16:21:42,184 => calculate inception score
2021-08-04 16:24:40,722 Inception score: 0, FID score: 11.45196533203125 || @ epoch 380.
2021-08-04 20:22:26,842 => calculate inception score
2021-08-04 20:25:25,717 Inception score: 0, FID score: 11.9810791015625 || @ epoch 400.
2021-08-05 00:23:18,124 => calculate inception score
2021-08-05 00:26:17,206 Inception score: 0, FID score: 12.340728759765625 || @ epoch 420.
2021-08-05 04:24:06,863 => calculate inception score
2021-08-05 04:27:05,850 Inception score: 0, FID score: 13.225006103515625 || @ epoch 440.
2021-08-05 08:25:24,152 => calculate inception score
2021-08-05 08:28:23,146 Inception score: 0, FID score: 13.89190673828125 || @ epoch 460.
2021-08-05 12:27:29,588 => calculate inception score
2021-08-05 12:30:28,551 Inception score: 0, FID score: 14.103118896484375 || @ epoch 480.
2021-08-05 16:28:49,395 => calculate inception score
2021-08-05 16:31:48,632 Inception score: 0, FID score: 14.109130859375 || @ epoch 500.
2021-08-05 20:30:04,791 => calculate inception score
2021-08-05 20:33:03,678 Inception score: 0, FID score: 14.337677001953125 || @ epoch 520.
2021-08-06 00:31:27,460 => calculate inception score
2021-08-06 00:34:26,133 Inception score: 0, FID score: 14.567657470703125 || @ epoch 540.
2021-08-06 04:32:47,586 => calculate inception score
2021-08-06 04:35:46,843 Inception score: 0, FID score: 14.43280029296875 || @ epoch 560.
2021-08-06 08:34:08,444 => calculate inception score
2021-08-06 08:37:07,550 Inception score: 0, FID score: 14.98822021484375 || @ epoch 580.
2021-08-06 12:35:32,232 => calculate inception score
2021-08-06 12:38:31,877 Inception score: 0, FID score: 14.719512939453125 || @ epoch 600.
2021-08-06 16:37:05,458 => calculate inception score
2021-08-06 16:40:04,417 Inception score: 0, FID score: 14.872100830078125 || @ epoch 620.
2021-08-06 20:38:30,419 => calculate inception score
2021-08-06 20:41:29,706 Inception score: 0, FID score: 14.83355712890625 || @ epoch 640.
2021-08-07 00:39:53,802 => calculate inception score
2021-08-07 00:42:52,606 Inception score: 0, FID score: 15.34588623046875 || @ epoch 660.
2021-08-07 04:41:16,504 => calculate inception score
2021-08-07 04:44:14,945 Inception score: 0, FID score: 15.1998291015625 || @ epoch 680.
2021-08-07 08:42:53,671 => calculate inception score
2021-08-07 08:45:52,648 Inception score: 0, FID score: 15.45361328125 || @ epoch 700.

About the patchsize in the Generator

Dear author:
Thanks for your work. In the model_search code for CIFAR and the 256-resolution model, I cannot find where the patch size is used in the Generator; I only see it in the Discriminator. Can you tell me where it is in your code? Thank you very much.

models_search

In the "models_search" folder, why is there no "building_blocks_search"?

run Transgan with single gpu

Hello, your work is very exciting, but I need to modify your code to run it successfully on a single GPU. I have tried many times without success. May I ask for your advice?

Training on single GPU

What script should I use to train TransGAN (celeba_hq dataset) on a single GPU? I am using a PC with 4GB NVIDIA GTX 1650.
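
For both single-GPU questions above, a plausible starting point (inferred from the cfg.py flags quoted further down this page; an unverified recipe, not the authors' instructions): with the standard torch.distributed template, leaving out --multiprocessing-distributed and keeping world-size at its default should run a single process on one device, and the batch-size flags can be lowered for a small card:

CUDA_VISIBLE_DEVICES=0 python exps/cifar_train.py --gpu 0 -gen_bs 16 -dis_bs 8   # example values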

Self-Attention

Why, in the self-attention layer, is there a linear projection after multiplying the value matrix by the attention weights?
In this overview, as far as I can tell, it doesn't appear: https://arxiv.org/pdf/1906.01529.pdf.
In some implementations it appears and in some it doesn't; what is its impact on the attention?
Does it add weights to the attention for better expressiveness?

self.proj = nn.Linear(dim, dim)

in the Attention module:

import torch.nn as nn

# matmul and get_attn_mask are helper functions defined elsewhere in this repo.

class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., is_mask=0):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights
        self.scale = qk_scale or head_dim ** -0.5

        # Fused projection producing queries, keys and values in one matmul.
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        # Output projection applied after the heads are concatenated.
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)
        self.mat = matmul()
        self.is_mask = is_mask
        self.remove_mask = False
        # Precomputed local-attention masks for several token-grid sizes.
        self.mask_4 = get_attn_mask(is_mask, 4)
        self.mask_5 = get_attn_mask(is_mask, 5)
        self.mask_6 = get_attn_mask(is_mask, 6)
        self.mask_7 = get_attn_mask(is_mask, 7)
        self.mask_8 = get_attn_mask(is_mask, 8)
        self.mask_10 = get_attn_mask(is_mask, 10)
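
For context, this projection is the standard output matrix W^O of multi-head attention ("Attention Is All You Need"): after the per-head outputs are concatenated, one final linear layer mixes information across heads; without it, the heads would remain independent subspaces. A minimal forward pass in the usual ViT/timm style (an illustrative sketch, not this repo's exact forward, which additionally applies the masks above):

def forward(self, x):
    B, N, C = x.shape
    # (B, N, 3C) -> (3, B, heads, N, C // heads)
    qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
    q, k, v = qkv[0], qkv[1], qkv[2]
    attn = (q @ k.transpose(-2, -1)) * self.scale      # scaled dot-product scores
    attn = self.attn_drop(attn.softmax(dim=-1))
    x = (attn @ v).transpose(1, 2).reshape(B, N, C)    # concatenate the heads
    x = self.proj(x)                                   # W^O: mixes information across heads
    return self.proj_drop(x)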

Training script

Hello!
Thanks for your work on TransGAN, it's amazing! When do you plan to post the training scripts?

FID score

Why is the FID score always NaN on CIFAR-10? I also calculated the FID between two identical datasets, and it is NaN as well.
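
For reference, NaNs in FID typically come from the matrix square root of the covariance product. A minimal sketch of the standard computation in the pytorch-fid style, including the usual epsilon fallback when sqrtm returns non-finite values (mu1/sigma1 and mu2/sigma2 are the Inception statistics of the two image sets):

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    diff = mu1 - mu2
    # Matrix square root of sigma1 @ sigma2; this is the step that can go NaN.
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if not np.isfinite(covmean).all():
        # Nearly singular product: nudge the diagonals and retry.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)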

Generated Images Have Some Blocking Artifact

Due to the patch-wise generation of TransGAN, I found some blocking artifacts in your generations. I think the authors are already aware of this phenomenon. Are there any tricks to eliminate these artifacts?

Can this model generate 128 or 256 resolution images?

Can this model generate 128- or 256-resolution images? If so, what is the corresponding cfg?
I tried 128 with the parameter "bottom_width" set to 16, but a single GPU card with 24 GB seems not enough.

celeba

Your CelebA dataset comes in sizes 64×64, 128×128, and 256×256, so how do you process the data to get the different sizes?

Image sizes don't match

Commit 1c51d9f breaks the size of some of the layers in the checkpoint when running the STL10 test:

RuntimeError: Error(s) in loading state_dict for DataParallel:
        size mismatch for module.pos_embed: copying a param with shape torch.Size([1, 145, 384]) from checkpoint, the shape in current model is torch.Size([1, 2305, 384]).
        size mismatch for module.patch_embed.weight: copying a param with shape torch.Size([384, 3, 4, 4]) from checkpoint, the shape in current model is torch.Size([384, 3, 1, 1]).

Would you be able to upload the new checkpoint? Thank you.
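
If a matching checkpoint never appears, the pos_embed size mismatch can sometimes be bridged by 2-D interpolation of the grid portion of the embedding, a standard ViT trick (a sketch, assuming the first token is a class/extra token; the patch_embed mismatch, 4×4 vs 1×1 kernels, cannot be fixed this way):

import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_grid, new_grid):
    # pos_embed: (1, 1 + old_grid**2, dim); the first token is kept as-is.
    extra, grid = pos_embed[:, :1], pos_embed[:, 1:]
    dim = grid.shape[-1]
    grid = grid.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_grid, new_grid), mode='bicubic', align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([extra, grid], dim=1)

# here: 145 = 1 + 12*12 tokens -> 2305 = 1 + 48*48 tokens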

Where is implementation of MT-CT?

Hello, when I looked at the generator training code in functions.py, I did not see the part corresponding to multi-task co-training (MT-CT). Did I miss it?

Thanks!

About the role of function “get_attn_mask”.

Thanks for your work!

While looking at the code in model/Celeba64_TransGAN.py, I noticed that the function "get_attn_mask" seems to play a role in the training process. Can you point out the specific role of this function?

Thank you~~

Training fails

I've tried using the function train in functions.py, and training seems to fail:

  File "functional.py", line 86, in adam
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: output with shape [768] doesn't match the broadcast shape [3, 48, 1, 768]

Question about GAN training.

Hi, thanks for the work. Following the training code in functions.py, it seems that you did not freeze D when training G. When dis_optimizer.step() runs, the gradients from the G step would also be used to update D's parameters. I wonder whether I missed something, or whether this is a bug. Thanks a lot!
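
For reference, the usual pattern that makes this safe even without freezing D (a generic sketch of alternating GAN updates, not this repo's exact functions.py; d_loss_fn/g_loss_fn are placeholders): as long as dis_optimizer.zero_grad() runs at the start of every D step, any gradients that leaked into D during the G update are discarded before D is actually stepped.

# Discriminator step
dis_optimizer.zero_grad()                      # discards stale grads from the last G step
d_loss = d_loss_fn(dis_net(real_imgs), dis_net(fake_imgs.detach()))
d_loss.backward()
dis_optimizer.step()

# Generator step
gen_optimizer.zero_grad()
g_loss = g_loss_fn(dis_net(gen_net(z)))        # backward() fills D's .grad as well
g_loss.backward()
gen_optimizer.step()                           # but only G's parameters are updated here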

Model Size

Could you please report the model size of the proposed TransGAN, compared with StyleGAN and StyleGAN2 at resolution 256?

train

Hi, I'm trying to train the model on CelebA 64×64. As you said, the mask plays a big role in the training stage; could you please tell me how to set the "is_mask" argument? Also, could you please tell me how to set "drop", "attn_drop" and "drop_path" when initializing the "Block" class? Thank you very much!

Question

May I ask why the mask is used in the Transformer Encoder? It seems that the original implementation only uses the mask in the Transformer Decoder. Thanks.

Is the function `get_attn_mask` the same one used in the paper on arxiv?

Hi,
When I try to recreate the attention mask using the function get_attn_mask in models/TransGAN_8_8_G2_1.py or models/TransGAN_8_8_1.py, I end up with the image below.
The image on the left is the output of get_attn_mask with N=32*32, w=25.
The image on the right reshapes the mask's 495th row (or pixel) into a 32x32 image (like in the paper).
The grey color represents that pixel.

As is visible, the outputs do not match the paper for any value of w.

I appreciate any help that you can provide. Thanks!
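
For anyone else trying to reproduce this, below is one plausible construction of a grid-local attention mask (my own guess at what get_attn_mask might compute, not verified against the repo): a token may attend to another only if their 2-D grid coordinates differ by less than w in both axes.

import torch

def local_attn_mask(grid, w):
    # grid*grid tokens laid out on a 2-D lattice
    ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing='ij')
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)   # (N, 2)
    diff = (coords[:, None, :] - coords[None, :, :]).abs()      # (N, N, 2) coordinate gaps
    return (diff < w).all(dim=-1)                               # (N, N) boolean mask

mask = local_attn_mask(32, 5)            # N = 32*32 tokens
row_img = mask[495].reshape(32, 32)      # the 495th pixel's receptive field, as in the paper figure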

(PS: Added the image from the paper for reference.)

celeba image size

In your dataset, what is the size of the CelebA data you use?

Using Conv?

Hello, thank you for uploading the code for your awesome work.

I'm looking at the code and there is a convolutional layer in the generator (self.deconv). Is there a reason for using it?
In the paper it says there are no convolutional layers, so I'm a little confused.

Thank you

RuntimeError: The size of tensor a (5) must match the size of tensor b (4097) at non-singleton dimension 1

Before the first iteration, I run into the error in the title. What should I do?

And here is my cfg.py:

import argparse

def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training')
    parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training')
    parser.add_argument('--loca_rank', default=-1, type=int, help='node rank for distributed training')
    parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, help='url used to set up distributed training')
    parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
    parser.add_argument('--seed', default=12345, type=int, help='seed for initializing training')
    parser.add_argument('--gpu', default=0, type=int, help='GPU id to use')
    parser.add_argument('--multiprocessing-distributed', action='store_true',
                        help='Use multi-processing distributed training to launch N processes per node, '
                             'which has N GPUs. This is the fastest way to use PyTorch for either single '
                             'node or multi node data parallel training')
    parser.add_argument('--max_epoch', type=int, default=200, help='number of epochs of training')
    parser.add_argument('--max_iter', type=int, default=10, help='set the max iteration number')
    parser.add_argument('-gen_bs', '--gen_batch_size', type=int, default=4, help='size of the batches')
    parser.add_argument('-dis_bs', '--dis_batch_size', type=int, default=4, help='size of the batches')
    parser.add_argument('--g_lr', type=float, default=0.0002, help='adam: gen learning rate')
    parser.add_argument('--wd', type=float, default=0, help='adamw: gen weight decay')
    parser.add_argument('--d_lr', type=float, default=0.0002, help='adam: disc learning rate')
    parser.add_argument('--ctrl_lr', type=float, default=3.5e-4, help='adam: ctrl learning rate')
    parser.add_argument('--lr_decay', action='store_true', help='learning rate decay or not')
    parser.add_argument('--beta1', type=float, default=0.0, help='adam: decay of first order momentum of gradient')
    parser.add_argument('--beta2', type=float, default=0.9, help='adam: decay of second order momentum of gradient')
    parser.add_argument('--num_workers', type=int, default=0, help='number of cpu threads to use during batch generation')
    parser.add_argument('--latent_dim', type=int, default=128, help='dimensionality of the latent space')
    parser.add_argument('--img_size', type=int, default=256, help='size of each image dimension')
    parser.add_argument('--channels', type=int, default=3, help='number of image channels')
    parser.add_argument('--n_critic', type=int, default=1, help='number of training steps for discriminator per iter')
    parser.add_argument('--val_freq', type=int, default=20, help='interval between each validation')
    parser.add_argument('--print_freq', type=int, default=100, help='interval between each verbose')
    parser.add_argument('--load_path', type=str, help='The reload model path')
    parser.add_argument('--exp_name', type=str, default='Test', help='The name of exp')
    parser.add_argument('--d_spectral_norm', type=str2bool, default=False, help='add spectral_norm on discriminator?')
    parser.add_argument('--g_spectral_norm', type=str2bool, default=False, help='add spectral_norm on generator?')
    parser.add_argument('--dataset', type=str, default='cifar10', help='dataset type')
    parser.add_argument('--data_path', type=str, default='./data', help='The path of data set')
    parser.add_argument('--init_type', type=str, default='normal', choices=['normal', 'orth', 'xavier_uniform', 'false'], help='The init type')
    parser.add_argument('--gf_dim', type=int, default=64, help='The base channel num of gen')
    parser.add_argument('--df_dim', type=int, default=64, help='The base channel num of disc')
    parser.add_argument('--gen_model', type=str, default='ViT_custom_rp', help='path of gen model')
    parser.add_argument('--dis_model', type=str, default='ViT_custom_rp', help='path of dis model')
    parser.add_argument('--controller', type=str, default='controller', help='path of controller')
    parser.add_argument('--eval_batch_size', type=int, default=100)
    parser.add_argument('--num_eval_imgs', type=int, default=50000)
    parser.add_argument('--bottom_width', type=int, default=4, help='the base resolution of the GAN')
    parser.add_argument('--random_seed', type=int, default=12345)

    # search
    parser.add_argument('--shared_epoch', type=int, default=15, help='the number of epoch to train the shared gan at each search iteration')
    parser.add_argument('--grow_step1', type=int, default=25, help='which iteration to grow the image size from 8 to 16')
    parser.add_argument('--grow_step2', type=int, default=55, help='which iteration to grow the image size from 16 to 32')
    parser.add_argument('--max_search_iter', type=int, default=90, help='max search iterations of this algorithm')
    parser.add_argument('--ctrl_step', type=int, default=30, help='number of steps to train the controller at each search iteration')
    parser.add_argument('--ctrl_sample_batch', type=int, default=1, help='sample size of controller of each step')
    parser.add_argument('--hid_size', type=int, default=100, help='the size of hidden vector')
    parser.add_argument('--baseline_decay', type=float, default=0.9, help='baseline decay rate in RL')
    parser.add_argument('--rl_num_eval_img', type=int, default=5000, help='number of images to be sampled in order to get the reward')
    parser.add_argument('--num_candidate', type=int, default=10, help='number of candidate architectures to be sampled')
    parser.add_argument('--topk', type=int, default=5, help='preserve topk models architectures after each stage')
    parser.add_argument('--entropy_coeff', type=float, default=1e-3, help='to encourage the exploration')
    parser.add_argument('--dynamic_reset_threshold', type=float, default=1e-3, help='var threshold')
    parser.add_argument('--dynamic_reset_window', type=int, default=500, help='the window size')
    parser.add_argument('--arch', nargs='+', type=int, help='the vector of a discovered architecture')
    parser.add_argument('--optimizer', type=str, default='adam', help='optimizer')
    parser.add_argument('--loss', type=str, default='hinge', help='loss function')
    parser.add_argument('--n_classes', type=int, default=0, help='classes')
    parser.add_argument('--phi', type=float, default=1, help='wgan-gp phi')
    parser.add_argument('--grow_steps', nargs='+', type=int, default=[50, 100, 150], help='iterations at which to grow the image size')
    parser.add_argument('--D_downsample', type=str, default='avg', help='downsampling type')
    parser.add_argument('--fade_in', type=float, default=1, help='fade in step')
    parser.add_argument('--d_depth', type=int, default=7, help='Discriminator Depth')
    parser.add_argument('--g_depth', type=str, default='5,4,2', help='Generator Depth')
    parser.add_argument('--g_norm', type=str, default='ln', help='Generator Normalization')
    parser.add_argument('--d_norm', type=str, default='ln', help='Discriminator Normalization')
    parser.add_argument('--g_act', type=str, default='gelu', help='Generator activation Layer')
    parser.add_argument('--d_act', type=str, default='gelu', help='Discriminator activation layer')
    parser.add_argument('--patch_size', type=int, default=4, help='patch size')
    parser.add_argument('--fid_stat', type=str, default='./fid_stat/fid_camera.npz', help='path of the FID statistics file')
    parser.add_argument('--diff_aug', type=str, default='None', help='differentiable augmentation type')
    parser.add_argument('--accumulated_times', type=int, default=1, help='gradient accumulation')
    parser.add_argument('--g_accumulated_times', type=int, default=1, help='gradient accumulation')
    parser.add_argument('--num_landmarks', type=int, default=64, help='number of landmarks')
    parser.add_argument('--d_heads', type=int, default=4, help='number of heads')
    parser.add_argument('--dropout', type=float, default=0., help='dropout ratio')
    parser.add_argument('--ema', type=float, default=0.995, help='ema')
    parser.add_argument('--ema_warmup', type=float, default=0., help='ema warm up')
    parser.add_argument('--ema_kimg', type=int, default=500, help='ema thousand images')
    parser.add_argument('--latent_norm', action='store_true', help='latent vector normalization')
    parser.add_argument('--ministd', action='store_true', help='mini batch std')
    parser.add_argument('--g_mlp', type=int, default=4, help='generator mlp ratio')
    parser.add_argument('--d_mlp', type=int, default=4, help='discriminator mlp ratio')
    parser.add_argument('--g_window_size', type=int, default=8, help='generator window size')
    parser.add_argument('--d_window_size', type=int, default=8, help='discriminator window size')
    parser.add_argument('--show', action='store_true', help='show')

    opt = parser.parse_args()

    return opt

fid

What is 'fid_stats_celeba_hq_256.npz'? How do I obtain it? And is computing the FID statistics necessary?
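
Files like this typically hold precomputed Inception statistics (the mean and covariance of pool3 activations) over the real dataset, in the pytorch-fid convention. A sketch of how such an .npz is usually produced (the 'mu'/'sigma' key names are an assumption about this repo; acts is a placeholder for the computed activations):

import numpy as np

# acts: (N, 2048) InceptionV3 pool3 activations computed over the real images
mu = np.mean(acts, axis=0)
sigma = np.cov(acts, rowvar=False)
np.savez('fid_stats_celeba_hq_256.npz', mu=mu, sigma=sigma)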

Evolution of the generations during training

Hello, I was wondering if you have available anywhere some samples of the GAN's outputs at different stages of training: before training, after N epochs, and so on. This would give an idea of how the model reaches its goal and what to expect during training, so I can tell whether I'm way off the path. I have been trying to build something very similar with limited success, and I always have to fall back to some convolutional layers (like putting a Conv2D after every attention layer) to get any relevant results...

Thanks!

Question about Output of dimension

I tried to use the generator in ViT_scale3_local_new_rp.py, but the output looks like noise.
Meanwhile, I also find that the dimensions of the different stages differ from the paper.
Would you mind helping me solve this problem?
