igul222 / improved_wgan_training

Code for reproducing experiments in "Improved Training of Wasserstein GANs"

License: MIT License

Python 100.00%

improved_wgan_training's People

ml-lab jfsantos johndpope mbavar vyraun bkjackson scitator sxjscience mlpanda bityangke dylandrover mylearning2017 jdc08161063 wanjinchang liuzhenhzong stevenlol sunjieee zhangyuancv hundred06 benjamesbabala dreadlord1984 yangerkun ieee820 wmengti canoefzh nonvolatilememory collector-m hma02 kingstorm byzhang kevinwenya kingofoz xingdi-eric-yuan leezqcst chongyang915 einsnull northerntree coocoky hdnse4798 furyphoenix livey sahpat229 tigerneil ieswxia hblu maozhiqiang agneselee gxlcliqi sunxingxingtf zhangxujinsh co9olguy chunniunai220ml sssxzzz phoenixdai wolfhu leiup dapeng2018 liuchenxjtu xjwxjw multipath appierys pandamax nanfengpo kekedan gwding tony32769 aalitaiga c1a1o1 ljwdust liaoheping frizfealer iwtw pingchunzhang ieyer xi-studio tsc2017 lim0606 eternallovelin hito0512 yushanshan05 liupeng89 zhangzhaofeng brianherman jadore801120 notlaughinggirl cookiegg michaelfeng87 vseledkin niumeng07 denizokt xushao dltlqqns jsupancic decade2014 hcg2008 xiaojingyi iij0 mengxiaomao hades210 shuolongbj

improved_wgan_training's Issues

gradients = tf.gradients(disc_interpolates, [interpolates])[0]

gradients = tf.gradients(disc_interpolates, [interpolates])[0]: what does the [0] at the end mean? I am really confused by this index. Thank you very much!
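tf.gradients(ys, xs) returns a Python list with one gradient tensor per entry of xs. Because xs here is the one-element list [interpolates], the trailing [0] simply unwraps the gradient with respect to interpolates. The sketch below mimics that list-returning convention in NumPy; numeric_gradients is a hypothetical finite-difference stand-in, not a TensorFlow function.

```python
import numpy as np

def numeric_gradients(f, xs, eps=1e-6):
    """Hypothetical stand-in for tf.gradients: returns a LIST holding
    d(sum f)/dx for each array x in xs, estimated by central differences."""
    grads = []
    for k, x in enumerate(xs):
        g = np.zeros_like(x, dtype=float)
        for idx in np.ndindex(x.shape):
            plus = [a.astype(float) for a in xs]
            minus = [a.astype(float) for a in xs]
            plus[k][idx] += eps
            minus[k][idx] -= eps
            g[idx] = (np.sum(f(*plus)) - np.sum(f(*minus))) / (2 * eps)
        grads.append(g)
    return grads

# A toy "critic" of one input: sum of squares per row.
critic = lambda x: np.sum(x ** 2, axis=1)
interpolates = np.array([[1.0, 2.0], [3.0, -1.0]])

result = numeric_gradients(critic, [interpolates])
print(type(result), len(result))   # a list of length 1
gradients = result[0]              # same role as the trailing [0]
print(gradients)                   # approximately 2 * interpolates
```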

problems with running the gan_toy.py

Hi, I am running gan_toy.py on Python 3 and I got the problems below. I don't know whether this happens only because I am running the code on a different version of Python or because I don't have a GPU. Please help me, thanks a lot!

no gradient error when run gan_cifar_resnet.py with BN

When I run gan_cifar_resnet.py with NORMALIZATION_D=True (my TensorFlow version is 1.2.0), the error No gradient defined for operation 'gradients/FusedBatchNorm_62_grad/FusedBatchNormGrad' occurs, as below:

Can you tell me how to solve it?
Thanks!

How to train on 1000x1000 images with multi-GPU settings?

Hi,

I tried a couple of times to increase the resolution of the input images, but failed due to errors I could not debug. Has anyone already tried images larger than 64x64?

About the graph of the gradients with respect to inputs

I'm not familiar with TensorFlow, and I want to port this code to PyTorch. But I found that the graph of the gradients with respect to the interpolated data doesn't connect to the graph of the net; it's just a single variable. So if I add the gradient penalty to the loss, it won't work, because the error won't backpropagate to the net. In TensorFlow, does the graph of the gradients connect to the graph of the net?
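The gradient with respect to the input is itself a function of the critic's weights, so a penalty built from it does carry gradient back into the critic. In PyTorch this requires asking for a differentiable gradient, e.g. torch.autograd.grad(..., create_graph=True). The NumPy sketch below assumes a hypothetical linear critic D(x) = x @ w + b, for which everything can be written by hand: the input gradient is w, so the penalty (||w|| - 1)^2 depends directly on the parameters. Its analytic gradient with respect to w is checked against finite differences.

```python
import numpy as np

def penalty(w):
    # For D(x) = x @ w + b, grad_x D(x) = w for every x,
    # so the slope is ||w|| and the penalty is (||w|| - 1)^2.
    return (np.linalg.norm(w) - 1.0) ** 2

def penalty_grad_w(w):
    # Analytic d/dw (||w|| - 1)^2 = 2 (||w|| - 1) * w / ||w||
    n = np.linalg.norm(w)
    return 2.0 * (n - 1.0) * w / n

w = np.array([0.5, -2.0, 1.5])
eps = 1e-6
numeric = np.array([
    (penalty(w + eps * e) - penalty(w - eps * e)) / (2 * eps)
    for e in np.eye(len(w))
])
print(penalty_grad_w(w), numeric)   # the two should agree closely
```

A nonzero result is the point: the penalty term moves the critic's parameters, which is what a connected graph (or create_graph=True) provides.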

Multi GPU training type

Is the multi-GPU training implemented in this project asynchronous SGD or synchronous SGD? Does TensorFlow automatically protect against stale gradients? I'm just wondering for one of my own projects. I have implemented sync SGD using the multi-GPU CIFAR example, and another version using a technique similar to the one in this repo. The one from this repo appears to be far more adjustable and ends up being faster, since the gradients are left to TF to set up. Do you know whether it is async or sync SGD? Please let me know if you do.
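For context: if each GPU computes the loss on its own slice of the batch, and the per-device losses are averaged into a single cost minimized by one optimizer, then every update uses gradients computed from the same parameter values. That is synchronous SGD. Whether this repo's multi-GPU code follows exactly this pattern is an assumption here. The NumPy sketch below uses a hypothetical least-squares model to show that averaging equal-sized shard gradients reproduces the full-batch gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

def mse_grad(X, y, w):
    # Gradient of mean((X @ w - y)^2) with respect to w
    return 2.0 * X.T @ (X @ w - y) / len(y)

full = mse_grad(X, y, w)
# Split the batch across two hypothetical "devices" of equal size
shards = zip(np.split(X, 2), np.split(y, 2))
synced = np.mean([mse_grad(Xs, ys, w) for Xs, ys in shards], axis=0)
print(np.allclose(full, synced))  # True
```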

Use different generated fake images to calculate D_loss and G_loss?

I see that when calculating D_loss you use the fake images generated here: https://github.com/igul222/improved_wgan_training/blob/master/gan_cifar_resnet.py#L192 , but when calculating G_loss you use other generated images from here: https://github.com/igul222/improved_wgan_training/blob/master/gan_cifar_resnet.py#L293 .

Since I'm new to GANs and this differs from other implementations, could anyone tell me whether this implementation (using different generated fake images to calculate D_loss and G_loss) is correct? And what is the difference between the two approaches?

Cannot reproduce the inception score in the paper

Hi,

Excellent work! I am trying to reproduce the 8.42 +- 0.1 inception score from the paper by running gan_cifar_resnet.py for 100000 iterations (which takes about 3 days). In the end I got an inception score of 8.15 +- 0.08. Is this due to the hyperparameters? Do you have a suggested hyperparameter setting for reproducing the experiment?

Thanks!

iter 93999 inception_50k_std 0.108622521162 acc_fake 0.916953146458 time 1.41457112956 dev_cost 1.73758101463 inception_50k 8.30711174011 cost -1.07328498363 acgan 0.00449406914413 acc_real 0.998765647411 wgan -1.07777905464
iter 94999 inception_50k_std 0.0919005274773 acc_fake 0.917750000954 time 1.41755345893 dev_cost 1.74964499474 inception_50k 8.24363517761 cost -1.08436715603 acgan 0.00362730911002 acc_real 0.999234378338 wgan -1.08799433708
iter 95999 inception_50k_std 0.0791481882334 acc_fake 0.918562471867 time 1.42350446486 dev_cost 1.75195538998 inception_50k 8.18746566772 cost -1.09005200863 acgan 0.00292201270349 acc_real 0.999484360218 wgan -1.09297394753
iter 96999 inception_50k_std 0.0875578373671 acc_fake 0.917046904564 time 1.42211779284 dev_cost 1.7690885067 inception_50k 8.21474838257 cost -1.09408867359 acgan 0.0020965943113 acc_real 0.999828100204 wgan -1.09618532658
iter 97999 inception_50k_std 0.0880940034986 acc_fake 0.917890608311 time 1.42067415953 dev_cost 1.7812871933 inception_50k 8.22119998932 cost -1.09451711178 acgan 0.00177419034299 acc_real 0.999937474728 wgan -1.09629142284
iter 98999 inception_50k_std 0.116663098335 acc_fake 0.916062474251 time 1.42149197221 dev_cost 1.7917330265 inception_50k 8.21636009216 cost -1.10520339012 acgan 0.00143045117147 acc_real 0.999953150749 wgan -1.10663378239
iter 99999 inception_50k_std 0.0829349905252 acc_fake 0.917703151703 time 1.41688520479 dev_cost 1.80293142796 inception_50k 8.14605140686 cost -1.10709547997 acgan 0.00111517717596 acc_real 1.0 wgan -1.10821044445

High discriminator values at edges

Hey, I have been trying to make the improved WGAN work on my dataset (using the toy examples), but I am getting weird behaviour: the discriminator learns really high values at the edges of my data space, even though there are almost no samples there.

Do you have any intuition as to what might cause this?
In practice, the generator seems to learn to sample from the correct distribution, but for some reason the discriminator assigns huge values to areas where it hasn't seen any real or fake samples.

conditional wgan

Could you release your source code of conditional WGAN-GP for cifar 10 (as shown in Figure 5 in your updated v2 manuscript)? Thanks.

I want to reproduce this method in Torch

I want to reproduce this method in Torch, but I don't know how to compute the gradient of the gradient penalty to update D's parameters.
Any suggestions?

Contour plotting needs transposition

Hey, I noticed a bug in your contour plotting in this function: https://github.com/igul222/improved_wgan_training/blob/master/gan_toy.py#L163. You have to transpose the height values for the contour using .transpose(),
i.e. plt.contour(x,y,disc_map.reshape((len(x), len(y))).transpose())

The reason the results look ok for the gaussian toy examples is because they are symmetrical. Try removing a few gaussians from the 8gaussian example like this and see what happens:

    elif DATASET == '8gaussians':
        scale = 2.
        centers = [
            (1,0),
            #(-1,0),
            #(0,1),
            (0,-1),
            (1./np.sqrt(2), 1./np.sqrt(2)),
            (1./np.sqrt(2), -1./np.sqrt(2)),
            #(-1./np.sqrt(2), 1./np.sqrt(2)),
            #(-1./np.sqrt(2), -1./np.sqrt(2))
        ]

This might improve your results on the swiss roll as well since it's not perfectly symmetrical.

Had me scratching my head for a bit on my own asymmetric dataset, wondering why the contours looked so wrong, haha.

Cheers
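Why the transpose is needed: plt.contour(x, y, Z) expects Z[row, col] to hold the value at (x[col], y[row]). Suppose the evaluation grid is built with x varying along the first axis, which is an assumption about how gan_toy.py lays out its grid. The reshaped map is then indexed [ix, iy] instead. The NumPy-only sketch below uses an asymmetric stand-in for the critic to make the difference visible, comparing against np.meshgrid's row = y, column = x layout.

```python
import numpy as np

N = 5
x = y = np.linspace(-3, 3, N)

# Grid built with x varying along the FIRST axis (assumed layout).
points = np.zeros((N, N, 2))
points[:, :, 0] = x[:, None]
points[:, :, 1] = y[None, :]
points = points.reshape(-1, 2)

def asymmetric(p):
    # Stand-in for the critic: depends only on the x coordinate
    return p[:, 0]

Z_wrong = asymmetric(points).reshape(N, N)   # indexed [ix, iy]
Z_right = Z_wrong.T                          # indexed [iy, ix], as contour expects

# Reference layout: np.meshgrid(x, y) gives arrays indexed [iy, ix]
X, Y = np.meshgrid(x, y)
print(np.allclose(Z_right, X), np.allclose(Z_wrong, X))  # True False
```

With a symmetric function (e.g. the full 8-Gaussian mixture) Z_wrong and Z_right coincide, which is why the bug stays hidden there.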

SubpixelConv2D or Deconv2D?

@igul222 Thanks for your good code!

I am confused about why you use SubpixelConv2D rather than Deconv2D for shortcut upsampling. Does SubpixelConv2D give better results?
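For reference, sub-pixel upsampling is built on a depth-to-space rearrangement, which moves blocks of channels into spatial positions. The NumPy sketch below implements that rearrangement in NHWC layout, following the index convention of TensorFlow's depth_to_space op. It also shows one property: if the input is first tiled 4 times along the channel axis, a 2x depth-to-space reduces to nearest-neighbour upsampling. Whether the repo's SubpixelConv2D tiles its input this way is an assumption, not something this sketch confirms.

```python
import numpy as np

def depth_to_space_nhwc(x, block):
    """NumPy depth-to-space (pixel shuffle), NHWC layout:
    out[b, h*block + i, w*block + j, c] = x[b, h, w, (i*block + j)*C + c]"""
    B, H, W, D = x.shape
    C = D // (block * block)
    x = x.reshape(B, H, W, block, block, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)          # (B, H, i, W, j, C)
    return x.reshape(B, H * block, W * block, C)

# Single pixel with 4 channels -> 2x2 image with 1 channel
x = np.array([1, 2, 3, 4]).reshape(1, 1, 1, 4)
print(depth_to_space_nhwc(x, 2)[0, :, :, 0])   # 2x2 array [[1, 2], [3, 4]]

# Tiling the channels 4x before depth_to_space == nearest-neighbour upsampling
img = np.random.default_rng(0).normal(size=(2, 3, 3, 5))
tiled = np.concatenate([img] * 4, axis=-1)
up = img.repeat(2, axis=1).repeat(2, axis=2)
print(np.allclose(depth_to_space_nhwc(tiled, 2), up))  # True
```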

Issues with inference when using BatchNorm

I'm trying to run some experiments using your model defined in gan_cifar_resnet.py.
However, when doing inference I've noticed variations in samples that should be identical (e.g. when interpolating between two fixed endpoints in latent space, the images generated at those endpoints don't stay exactly the same, as they should). I suspect this is because of BatchNorm, which in the standard implementation does not distinguish between training and inference and keeps updating its internal values during inference.
I've tried passing the is_training parameter to lib.ops.batchnorm.Batchnorm(), and also tried switching to the commented-out "standard version", to no avail.
When I pass a constant boolean tensor to the is_training parameter and set update_moving_stats=False, it runs, but I get completely "overblown" output images (very bright, mostly primary colors).

Can someone tell me how to do this properly?

On another note, I've also noticed that the "vanilla-conditional" implementation does not work, because the conditional version of layernorm is missing. How would I go about using it?
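The first symptom is what one would expect if batch norm keeps using batch statistics at inference time, since each sample's output then depends on the rest of the batch. The NumPy sketch below shows the standard train/inference split. BatchNorm is a hypothetical class, not the repo's lib.ops.batchnorm. It updates moving averages only in training mode and uses them in inference mode, so a given input produces the same inference-mode output regardless of its batchmates.

```python
import numpy as np

class BatchNorm:
    """Hypothetical per-feature batch norm over axis 0 with moving statistics."""
    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = np.ones(dim), np.zeros(dim)
        self.moving_mean, self.moving_var = np.zeros(dim), np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, is_training):
        if is_training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Update moving statistics only while training
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

rng = np.random.default_rng(0)
bn = BatchNorm(4)
for _ in range(200):                      # "training" on shifted data
    bn(rng.normal(3.0, 2.0, size=(64, 4)), is_training=True)

z = rng.normal(size=(1, 4))              # the sample we care about
batch_a = np.vstack([z, rng.normal(size=(7, 4))])
batch_b = np.vstack([z, 10 * rng.normal(size=(7, 4))])

infer_a, infer_b = bn(batch_a, False)[0], bn(batch_b, False)[0]
train_a, train_b = bn(batch_a, True)[0], bn(batch_b, True)[0]
print(np.allclose(infer_a, infer_b), np.allclose(train_a, train_b))  # True False
```

The sketch also hints at one possible cause of the overblown images. If the moving statistics are never updated during training, inference normalizes with their initial values (mean 0, variance 1) rather than the statistics the network was trained with.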

cifar10 IOError: [Errno 2] No such directory: 'cifar10/data_batch_1'

My cifar dataset:
batches.meta.txt data_batch_2.bin data_batch_4.bin readme.html
data_batch_1.bin data_batch_3.bin data_batch_5.bin test_batch.bin

When I run python gan_cifar.py, I get this error:
Traceback (most recent call last):
  File "gan_cifar.py", line 171, in <module>
    train_gen, dev_gen = lib.cifar10.load(BATCH_SIZE, data_dir=DATA_DIR)
  File "/mnt/data1/daniel/codes/GAN/improved_wgan_training/tflib/cifar10.py", line 32, in load
    cifar_generator(['data_batch_1','data_batch_2','data_batch_3','data_batch_4','data_batch_5'], batch_size, data_dir),
  File "/mnt/data1/daniel/codes/GAN/improved_wgan_training/tflib/cifar10.py", line 17, in cifar_generator
    all_data.append(unpickle(data_dir + '/' + filename))
  File "/mnt/data1/daniel/codes/GAN/improved_wgan_training/tflib/cifar10.py", line 9, in unpickle
    fo = open(file, 'rb')
IOError: [Errno 2]No such directory: '/mnt/data1/daniel/codes/GAN/improved_wgan_training/data/cifar10/data_batch_1'
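The listing shows .bin files, which belong to the binary distribution of CIFAR-10. The traceback, however, shows the loader opening data_dir + '/' + filename for names such as data_batch_1 and unpickling them, which matches the Python (pickled) distribution of the dataset. The standard-library check below reports which expected files are missing from DATA_DIR. The helper name is hypothetical, and test_batch is included on the assumption that it is loaded for the dev split.

```python
import os
import tempfile

# Names from the traceback; 'test_batch' is assumed to be used for the dev split.
EXPECTED = ['data_batch_%d' % i for i in range(1, 6)] + ['test_batch']

def missing_cifar_batches(data_dir):
    """Return the expected pickled CIFAR-10 batch files absent from data_dir."""
    return [name for name in EXPECTED
            if not os.path.isfile(os.path.join(data_dir, name))]

# Demo: a directory holding only binary-version files reports all as missing.
with tempfile.TemporaryDirectory() as d:
    for name in ['data_batch_1.bin', 'test_batch.bin']:
        open(os.path.join(d, name), 'wb').close()
    missing = missing_cifar_batches(d)
    print(missing)
```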

Inconsistency in interpolation.

Hi! Awesome work :)

I've noticed that in all files (except gan_toy.py) you use interpolates = real_data + (alpha*differences) rather than interpolates = (1-alpha)*real_data + (alpha*differences). I don't know whether the results in the paper were obtained with this inconsistency or whether it changes anything, but I wanted to mention it and ask whether this is a bug.
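For what it is worth, real_data + alpha*differences with differences = fake_data - real_data is algebraically the same as (1-alpha)*real_data + alpha*fake_data, which is a point on the segment between the two samples. The expression quoted above combines (1-alpha)*real_data with alpha*differences, which is a different quantity. A quick NumPy check of both statements:

```python
import numpy as np

rng = np.random.default_rng(0)
real_data = rng.normal(size=(4, 3))
fake_data = rng.normal(size=(4, 3))
alpha = rng.uniform(size=(4, 1))         # one alpha per sample

differences = fake_data - real_data
repo_form = real_data + alpha * differences
convex_form = (1 - alpha) * real_data + alpha * fake_data
quoted_form = (1 - alpha) * real_data + alpha * differences

print(np.allclose(repo_form, convex_form))   # True
print(np.allclose(repo_form, quoted_form))   # False
```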

Out Of Memory Error On Gan_Cifar Inception Scoring

Hi, I am having an issue where every time the get_inception_scores() method is called, the GPU I am using (GTX 950) runs out of memory. Do you have any suggestions for changes I could make so that I can still see the inception scores? Would it be possible to compute them in batches instead of all at once?
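One way to keep memory bounded is to run the classifier over the samples in fixed-size chunks and compute the score only afterwards, from the stacked class probabilities. The NumPy sketch below assumes a hypothetical predict_probs(batch) function that returns softmax outputs for one chunk. The score follows the usual definition: exp of the mean KL divergence between p(y|x) and the marginal p(y), averaged over splits.

```python
import numpy as np

def batched_predict(predict_probs, samples, batch_size):
    """Run a (hypothetical) classifier chunk by chunk and stack the results."""
    outs = [predict_probs(samples[i:i + batch_size])
            for i in range(0, len(samples), batch_size)]
    return np.concatenate(outs, axis=0)

def inception_score(probs, splits=10, eps=1e-12):
    """exp(E_x[KL(p(y|x) || p(y))]), mean and std over `splits` chunks."""
    scores = []
    for part in np.array_split(probs, splits):
        p_y = part.mean(axis=0, keepdims=True)
        kl = np.sum(part * (np.log(part + eps) - np.log(p_y + eps)), axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))

# Toy classifier: confident one-hot predictions spread evenly over 10 classes.
def predict_probs(batch):
    return np.eye(10)[batch % 10]

samples = np.arange(1000)
probs = batched_predict(predict_probs, samples, batch_size=64)
print(inception_score(probs, splits=10))   # close to (10.0, 0.0)
```

Only one chunk of images has to pass through the network at a time. The probability matrix that is kept is small (number of samples x number of classes).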

A question about the structure of resnet

Hi, thanks for your code.
I have a question about the structure of the ResNet. I found that the residual block's output is shortcut + (0.3*output) instead of shortcut + output. Is there a theoretical basis for this, or is it an empirical choice? It differs from the original ResNet.

The code is easy to read, but there is one place I do not understand: gan_64x64.py line 530, _dev_disc_cost = session.run(disc_cost, feed_dict={all_real_data_conv: _data}). Shouldn't it be _dev_disc_cost = session.run(disc_cost, feed_dict={all_real_data_conv: images})?
Thanks

I wonder why my discriminator's loss is sometimes positive

When I use my own data, the discriminator loss is positive. What should I do?

Gradient penalty on the generator

It seems that the gradient penalty term also has non-zero gradient w.r.t. the generator parameters. Shouldn't the penalty term be added to both the discriminator and the generator objective to ensure that the mini-max procedure finds a saddle point?

Decreasing Learning Rate Improves Convergence

Hey @igul222, thanks again for releasing this code. It has really helped me experiment with WGANs. I noticed in your paper's protocol that you do NOT decrease the learning rate over the 200k iterations.

However, I have found that after these first 200k iterations, if you decrease the discriminator's learning rate, it can find lower minima that define a more accurate Wasserstein distance. Was there a reason you did not do this in your paper? Does it break any theoretical assumptions?

CPU BiasOp only supports NHWC.

Any solution to the error below?
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
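The message indicates that this CPU kernel only accepts channels-last (NHWC) tensors, while the repo's ops request data_format='NCHW' (visible in the tracebacks further down this page). A common workaround is to run the ops in NHWC and transpose at the boundaries. This also means reordering stride lists from [1, 1, s, s] to [1, s, s, 1]. The NumPy sketch below shows those layout conversions; the helper names are hypothetical.

```python
import numpy as np

def nchw_to_nhwc(x):
    # (batch, channels, height, width) -> (batch, height, width, channels)
    return x.transpose(0, 2, 3, 1)

def nhwc_to_nchw(x):
    # (batch, height, width, channels) -> (batch, channels, height, width)
    return x.transpose(0, 3, 1, 2)

def strides_nchw_to_nhwc(strides):
    # [1, 1, sh, sw] (NCHW) -> [1, sh, sw, 1] (NHWC)
    n, c, h, w = strides
    return [n, h, w, c]

x = np.zeros((64, 3, 32, 32))                 # batch, channels, height, width
print(nchw_to_nhwc(x).shape)                  # (64, 32, 32, 3)
print(strides_nchw_to_nhwc([1, 1, 2, 2]))     # [1, 2, 2, 1]
```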

Help with convergence

Hi all,

My team is running WGAN with gradient penalty with a ResNet generator/discriminator as in the code. We're trying to train the GAN to generate image data: 64x64 black-and-white spectrograms (i.e. one channel instead of three).
However, our training runs keep suffering from what looks like mode collapse. We get samples like:

both early on and after many iterations. We've played around with learning rates, the LAMBDA gradient-penalty parameter, the number of residual blocks, and the number of critic iterations per generator iteration. We even tried essentially pretraining the critic to ensure it was close to optimal (at least at distinguishing noise from valid samples). We've double-checked our data feeding to make sure that the real images are what they should be and are normalized correctly, in the same way as in the ImageNet example.

Any suggestions? We've been stuck for a while :(

Interpreting Critic Loss To Improve Convergence

Hey @igul222, thanks again for your help with comments earlier.

I've been rereading both WGAN papers, and I understand that the critic loss is supposed to estimate the Wasserstein distance. Below are a few ideas I had for improving convergence.

Suppose you have a training run where the critic loss starts at -170.0 and then tapers to -30.0 by generator iteration 200k. Despite more training and a lower learning rate, it stays at -30.0.

Is it correct to interpret this finding as a problem with the generator? Shouldn't the generator architecture be designed so that this -30.0 approaches 0?

Another way of asking this: to improve convergence, isn't it clear that the generator is at fault? The only way I could see the critic being at fault is if the Wasserstein distance approaches 0. In that case, the critic is probably not fully capturing the Wasserstein distance.

Following this line of thinking, isn't it wise to overpower the generator (increase the generator's number of layers or dimensionality) until you reach a Wasserstein distance of 0? Once you reach a distance of roughly 0, it is then justifiable to enlarge the critic's architecture.

Same image is produced when input size is 1

Hi,

Thanks for sharing this repo.
For my experiments, I used gan_cifar.py, trained it for 50k iterations in wgan mode, and was able to generate varied images.
On the other hand, when I give an input vector of size (1,128) sampled randomly from a normal distribution, the same sample is produced regardless of the input.

Any idea why this happens, or directions for solving this issue, would be great.

Thanks in advance!
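One possible explanation assumes the generator's batch norm still computes statistics from the current batch when sampling. With a batch of one, each feature's batch mean equals the value itself and the batch variance is zero. The normalized activation is therefore 0, and the layer outputs its shift parameter beta for every input. A NumPy sketch of that effect:

```python
import numpy as np

def batchnorm_train_mode(x, gamma, beta, eps=1e-5):
    # Normalizes with statistics of the current batch (axis 0)
    mean, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

gamma, beta = np.full(128, 2.0), np.linspace(-1, 1, 128)
rng = np.random.default_rng(0)
out1 = batchnorm_train_mode(rng.normal(size=(1, 128)), gamma, beta)
out2 = batchnorm_train_mode(rng.normal(size=(1, 128)), gamma, beta)
print(np.allclose(out1, beta), np.allclose(out1, out2))   # True True
```

If this is the cause, sampling in larger batches or using stored (moving) statistics at inference time avoids the collapse.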

Is the gradient penalty loss problematic when input image is large?

Hi,
Here the sum of squares is computed
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))

However, when the input image is extremely large, the gradients will have a huge number of dimensions, so it seems possible that the resulting slopes would be extremely large compared to 1.
Is this the case?
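For a fixed per-pixel gradient, the norm grows with the square root of the number of input dimensions. A critic with the same per-pixel sensitivity therefore has a larger slope on larger images. The penalty targets the total norm, though, so training pushes the critic toward per-pixel gradients on the order of 1/sqrt(d). Note also that reduction_indices=[1] sums over all input dimensions only if gradients has been flattened to [batch, features]. The NumPy illustration below uses a hypothetical constant per-pixel gradient of 0.01:

```python
import numpy as np

def slopes(gradients):
    # Per-sample L2 norm over every non-batch axis (flatten first)
    flat = gradients.reshape(len(gradients), -1)
    return np.sqrt(np.sum(np.square(flat), axis=1))

results = {}
for shape in [(3, 64, 64), (3, 512, 512)]:
    d = int(np.prod(shape))
    same_pixel_grad = np.full((2,) + shape, 0.01)
    unit_norm_grad = np.full((2,) + shape, 1.0 / np.sqrt(d))
    results[shape] = (slopes(same_pixel_grad)[0], slopes(unit_norm_grad)[0])
    print(shape, results[shape])   # (0.01 * sqrt(d), 1.0)
```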

G loss suddenly increases a lot?

@igul222

I have implemented WGAN-GP myself. In my application, G loss sometimes suddenly increases a lot,

But D loss is still stable

And the gradient penalty is also stable

Any idea? Thanks!

Improve Convergence By Tracking Historical Generated Outputs

Hey @igul222 , just wanted to add a small update.

I've found that if you add the historical buffer as Apple detailed here, you can increase performance slightly and converge faster.

Basically, when you train the discriminator, half the inputs are freshly generated and the other half come from a buffer containing previously generated samples. This allows the discriminator to "encompass" a wider range of fake examples.

For WGAN, however, I thought this technique would hurt the discriminator, since we are estimating the Wasserstein distance. In practice it has only helped me, and I wanted to let you know.

It is critical that when you get new generated samples, you randomly select some samples in the buffer and replace them with the newly generated ones. Otherwise, the Wasserstein distance conveyed by the discriminator is highly inaccurate.
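A minimal sketch of the buffer described above, in plain Python with hypothetical names. Each critic batch mixes half fresh samples with half drawn from the buffer. Once the buffer is full, newly generated samples overwrite randomly chosen entries.

```python
import random

class HistoryBuffer:
    """Hypothetical buffer of previously generated samples."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def mix(self, fresh):
        """Return a critic batch: half fresh samples, half from history."""
        half = len(fresh) // 2
        if len(self.items) >= half:
            batch = fresh[:half] + self.rng.sample(self.items, half)
        else:
            batch = list(fresh)          # not enough history yet
        self._add(fresh)
        return batch

    def _add(self, fresh):
        for x in fresh:
            if len(self.items) < self.capacity:
                self.items.append(x)
            else:
                # Replace a randomly chosen old sample with a new one
                self.items[self.rng.randrange(self.capacity)] = x

buf = HistoryBuffer(capacity=8)
for step in range(5):
    fresh = [(step, i) for i in range(4)]   # stand-ins for generated samples
    batch = buf.mix(fresh)
print(len(buf.items), len(batch))   # 8 4
```

The history half is sampled before the new samples are added, so it always contains samples from earlier generator steps.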

"python gan_64x64.py" met errors

I've downloaded ImageNet small dataset (train_64x64.tar and valid_64x64.tar) and modified DATA_DIR in gan_64x64.py. I've also fixed a potential bug at line 116 (lib.concat -> tf.concat). But I still got the following error:

Traceback (most recent call last):
  File "gan_64x64.py", line 477, in <module>
    fake_data = Generator(BATCH_SIZE/len(DEVICES))
  File "gan_64x64.py", line 210, in GoodGenerator
    output = ResidualBlock('Generator.Res3', 2*dim, 2*dim, 3, output, resample='up')
  File "gan_64x64.py", line 186, in ResidualBlock
    he_init=False, biases=True, inputs=inputs)
  File "gan_64x64.py", line 120, in UpsampleConv
    output = lib.ops.conv2d.Conv2D(name, input_dim, output_dim, filter_size, output, he_init=he_init, biases=biases)
  File "/data1/home/weixue/cv/gan/improved_wgan_training/tflib/ops/conv2d.py", line 111, in Conv2D
    data_format='NCHW'
  File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
    data_format=data_format, name=name)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2329, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1717, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1667, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 256 and 128 for 'Generator.Res3.Shortcut/Conv2D' (op: 'Conv2D') with input shapes: [64,256,32,32], [1,1,128,128].

It seems that the source code is still evolving. Is git "master" in a runnable state?

Some questions about WGAN-GP with AC-GAN conditioning and conditional batchnorm

I'm interested in WGAN-GP with AC-GAN conditioning added, but there is no description of this model in the paper. The code is in gan_cifar_resnet.py. This model uses a conditional batchnorm, but the explanation in the code is too brief for me to understand it well. Could you provide some information or a paper about conditional batchnorm, @igul222? In addition, I want it to generate 128x128-pixel images. Can you give me some advice?

Looking forward to your reply.

How to interpret the losses?

When I tried wgan-gp on my own problems, sometimes I got very unbalanced losses (e.g. the loss of discriminator is high, but the loss of generator is around 0. See this). What does this mean? Does it mean the generator is too good?

how to adapt gan_cifar_resnet.py to Resnet101 Implementation

Thanks to the author for the wonderful theoretical analysis and beautiful model implementation. I have three questions about the improved WGAN model with a ResNet-101 discriminator.

First, according to my understanding, the improved WGAN implementation in gan_cifar_resnet.py actually uses a discriminator ResNet of 14 layers (4 residual blocks * 3 = 12 layers, plus the input and output layers) and a generator ResNet of 15 layers (3 residual blocks * 3 = 9 layers, plus the input layer, one conv layer, and the output layer). Is there any error in my understanding?

Second, if my understanding is right, how can I modify the model to use a ResNet-101 discriminator?

Third, is there any difference between your ResNet implementation and tensorflow.contrib.slim.nets.resnet_v1? Can I replace your ResNet with tensorflow.contrib.slim.nets.resnet_v1 or others?

Thanks !

How to compute second-order partial derivatives in a non-graph-based framework

I have noticed that this work is implemented in TensorFlow, where the graph of the gradient can be constructed. I wonder how to compute second-order partial derivatives with non-graph-based deep learning frameworks like Torch or PyTorch. It seems impossible to optimize the norm of the gradient with these frameworks.

Also, computing the gradient of the norm of the gradient involves the product of a Jacobian matrix and the gradient, so the computation may be expensive. How efficient is improved WGAN at computing this gradient?
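On the efficiency question, the extra term is a Hessian-vector (more generally, Jacobian-transpose-vector) product, which never requires forming the full matrix. For example, the gradient of ||grad_x D||^2 with respect to x is 2 * H * grad_x D, where H is the Hessian of D. Frameworks obtain such products by differentiating through the gradient computation. A simpler way to see that two gradient evaluations suffice is the finite-difference identity H v ≈ (grad f(x + eps*v) - grad f(x - eps*v)) / (2*eps). The NumPy sketch below demonstrates it on a hypothetical quadratic f(x) = 0.5 * x^T A x, whose Hessian is A.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M + M.T                        # symmetric, so the Hessian of f is A

def grad_f(x):
    # f(x) = 0.5 * x^T A x  =>  grad f(x) = A x
    return A @ x

def hvp(grad_fn, x, v, eps=1e-5):
    # Uses only two gradient evaluations; no Hessian matrix is built here
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

x, v = rng.normal(size=5), rng.normal(size=5)
print(np.allclose(hvp(grad_f, x, v), A @ v))   # True
```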

Why do the generated samples contain identical pictures? Thank you very much!

I use the ImageNet 64x64 dataset, and after running the program, many of the generated samples in one batch (shown together in one big picture) are identical. In the picture I uploaded, why are there four wolves? I think there should be only one!

Is the output normal?

Hi @igul222
Please see the output from the command of "python gan_toy.py":
iter 99 disc cost -0.811114370823
iter 199 disc cost -0.382090866566
iter 299 disc cost -0.788745164871
iter 399 disc cost -1.34008586407
iter 499 disc cost -1.38578748703
iter 599 disc cost -1.25771450996
iter 699 disc cost -1.0593495369
iter 799 disc cost -0.865160703659
iter 899 disc cost -0.686692357063
iter 999 disc cost -0.57303828001
iter 1099 disc cost -0.454357266426

You see, the disc cost is negative. Is this normal?

Thanks,
Yingjun
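A negative value is expected here. Assume the usual WGAN-GP critic loss, disc_cost = mean(D(fake)) - mean(D(real)) + LAMBDA * gradient_penalty. The cost is then negative whenever the critic scores real samples higher than fake ones by more than the penalty term. Negating the first two terms gives the critic's estimate of the Wasserstein distance. The toy NumPy computation below uses made-up critic outputs and a made-up LAMBDA:

```python
import numpy as np

LAMBDA = 0.1                               # toy value, for illustration only
disc_real = np.array([1.2, 0.9, 1.5])      # made-up critic scores on real data
disc_fake = np.array([-0.3, 0.1, -0.5])    # made-up critic scores on fake data
gradient_penalty = 0.05

wasserstein_estimate = disc_real.mean() - disc_fake.mean()
disc_cost = disc_fake.mean() - disc_real.mean() + LAMBDA * gradient_penalty
print(wasserstein_estimate, disc_cost)     # positive, negative
```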

Clarifications on code

What's the output of Discriminator(interpolates)[0] in the code below from gan_language.py?
Knowing that gen_cost = -tf.reduce_mean(Discriminator(fake_inputs)), I assume that Discriminator(interpolates)[0] returns the discriminator's evaluation of the first element of interpolates, though this doesn't seem to make any sense.

gradients = tf.gradients(Discriminator(interpolates)[0], [interpolates])[0]  
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1,2]))  
gradient_penalty = tf.reduce_mean((slopes-1.)**2)  
disc_cost += LAMBDA*gradient_penalty
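tf.gradients differentiates the sum of its ys. Passing the full batch of critic outputs therefore yields each sample's own input gradient, as long as samples do not interact (e.g. there is no batch norm in the critic). Indexing the output first would differentiate only the selected element. What Discriminator(...)[0] actually selects depends on what Discriminator returns in gan_language.py, and this sketch does not assume either way. The NumPy illustration below uses a hypothetical per-sample linear critic and a finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=(4, 3))     # 4 interpolated samples, 3 features

def critic(x):
    return x @ w                 # one score per sample

def grad_wrt_x(scalar_fn, x, eps=1e-6):
    """Finite-difference gradient of a scalar function of x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x)
        e[idx] = eps
        g[idx] = (scalar_fn(x + e) - scalar_fn(x - e)) / (2 * eps)
    return g

g_sum = grad_wrt_x(lambda z: critic(z).sum(), x)   # what tf.gradients does
g_first = grad_wrt_x(lambda z: critic(z)[0], x)    # only the first sample
print(np.allclose(g_sum, np.tile(w, (4, 1))))      # True: every row gets w
print(np.allclose(g_first[1:], 0))                 # True: other rows get 0
```

If the penalty used the second form, only one sample per batch would be constrained.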

gan_language.py has some reference errors

Hi I wanted to try out your very promising language model but I get the following errors when trying to run gan_language.py

Traceback (most recent call last):
  File "gan_language.py", line 89, in <module>
    fake_inputs = Generator(BATCH_SIZE)
  File "gan_language.py", line 64, in Generator
    output = tf.reshape(output, [-1, SETTINGS['dim_g'], SEQ_LEN])
NameError: global name 'SETTINGS' is not defined

To solve this I simply used DIM instead of SETTINGS['dim_g'] but I'm not sure whether this is the right value. When I then try to run again I get the following error

Traceback (most recent call last):
  File "gan_language.py", line 130, in <module>
    true_char_ngram_lms = [data_tools.NgramLanguageModel(i+1, lines[10*BATCH_SIZE:], tokenize=False) for i in xrange(4)]
NameError: name 'data_tools' is not defined

I believe that data_tools should be replaced with language_helpers

Typo in gan_language.py

https://github.com/igul222/improved_wgan_training/blob/master/gan_language.py#L89

Remove "_"

Fine-tuning Inception model before calculating inception-score for CIFAR-10 data

Is the 'inception net' final layer set to 10 classes and fine-tuned with CIFAR-10 data before the inception scores for 'gan_cifar' are calculated ?

implementation on LSUN bedroom dataset

I am trying to use your improved WGAN code to generate LSUN bedroom pictures. However, my generated pictures are not as good as yours, and I think the reason may lie in the network architecture or the hyperparameters. I was wondering whether you could post the specific implementation of WGAN for that dataset; I would appreciate it.

How to reproduce ResNet LSUN 128px experiment?

Could you please release the code for ResNet generator on LSUN 128px? Thanks!

Potential inconsistencies in calculation of gradient penalty between code and ArXiv paper

I could be wrong, but it seems like the calculation for the gradient penalty is not the same across different code examples in this repo. In the paper, I believe the calculation is shown in line 6 in Algorithm 1 (page 4 in ArXiv paper) -- that line suggests the second of the 2 options is correct. However, most code examples seem to use the first option below.

Option 1

In gan_mnist.py (Line 143-144), gan_64x64.py (495-496), gan_language.py (104-105), gan_cifar.py (130-131), and gan_cifar_resnet.py (260-261):

differences = fake_data - real_data
interpolates = real_data + (alpha*differences)

# After rearranging, equivalent to: 
# real_data + alpha*fake_data - alpha*real_data

Option 2

In gan_toy.py (Line 77) and ArXiv paper (Algorithm 1, line 6 on page 4):

interpolates = alpha*real_data + ((1-alpha)*fake_data)

# After rearranging, equivalent to: 
# fake_data + alpha*real_data - alpha*fake_data

real_data and fake_data seem to be transposed between the two options. Am I missing something?
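Rearranged, Option 1 is (1-alpha)*real_data + alpha*fake_data, and Option 2 is the same expression with alpha replaced by 1-alpha. Because alpha is drawn uniformly from [0, 1] and 1-alpha has that same distribution, both options sample interpolation points from the same distribution along each real/fake segment. They should therefore be interchangeable in expectation. A NumPy check of both facts:

```python
import numpy as np

rng = np.random.default_rng(0)
real_data = rng.normal(size=(5, 2))
fake_data = rng.normal(size=(5, 2))
alpha = rng.uniform(size=(5, 1))

option1 = real_data + alpha * (fake_data - real_data)
option2 = alpha * real_data + (1 - alpha) * fake_data

# Option 2 with alpha equals Option 1 with (1 - alpha)
option1_flipped = real_data + (1 - alpha) * (fake_data - real_data)
print(np.allclose(option2, option1_flipped))        # True

# 1 - alpha is again Uniform[0, 1]: compare empirical quantiles
a = rng.uniform(size=200_000)
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(a, qs), np.quantile(1 - a, qs))   # both approximately qs
```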

Please see the result of gan_64x64.py

Hi @igul222
Please see the generated samples at iteration 199999. Is the result good? I am not sure what is being generated. :-)
At the final iteration 199999, the train disc cost is -1.49 and the dev disc cost is -1.6. Is this good? I am not sure how to choose the best model among all iterations.

Thanks,
Yingjun

Tracking failures for WGAN-GP?

I wonder if the tips in https://github.com/soumith/ganhacks work under WGAN-GP as well, namely those in Sec. 10. Specifically I would like to confirm if the following is correct:

D loss is a large negative value: failure mode.
If loss of G steadily decreases (or D(G(z)) steadily increases), then G is fooling D with garbage.

Dimension Error in gan_language.py

Hi, guys
I'm running gan_language.py on CPU with Python 2.7 and TF 1.0, since our GPU workstation doesn't work.
I have fixed the NHWC problem as said in Issue 11, except the last comment about
1 gan_mnist.py the output tensor in Discriminator should be changed from output = tf.reshape(inputs, [-1, 1, 28, 28]) to output = tf.reshape(inputs, [-1, 28, 28, 1])
2 conv2d.py from strides = [1, 1, stride, stride] to strides = [1, stride, stride, 1]

Then I get the following error:
ValueError: Dimensions must be equal, but are 32 and 512 for 'Generator.1.1/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [64,1,512,32], [1,5,512,512]
Can anyone tell me how to fix the problem? Thank you very much!

MNIST example is missing batchnorm for DCGAN

In the MNIST example the batch normalization is only added for mode 'wgan', which seems to break the example when run in mode 'dcgan'. Adding batch normalization in mode 'dcgan' too fixes it.

Error in running gan_language.py

Hi, I have the following error in running gan_language.py. Any solution to this? I am using tensorflow v0.9

loaded 10000000 lines in dataset
Traceback (most recent call last):
  File "gan_language.py", line 89, in <module>
    fake_inputs = Generator(BATCH_SIZE)
  File "gan_language.py", line 65, in Generator
    output = ResBlock('Generator.1', output)
  File "gan_language.py", line 56, in ResBlock
    output = lib.ops.conv1d.Conv1D(name+'.1', DIM, DIM, 5, output)
  File "/auto/cmb-panasas2/ylu465/program/deeplearning/improved_wgan_training-master/tflib/ops/conv1d.py", line 93, in Conv1D
    data_format='NCHW'
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1204, in conv1d
    data_format=data_format)
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 394, in conv2d
    data_format=data_format, name=name)
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2319, in create_op
    set_shapes_for_outputs(ret)
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1711, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 230, in conv2d_shape
    input_shape[3].assert_is_compatible_with(filter_shape[2])
  File "/auto/cmb-panasas2/ylu465/anaconda/envs/dl_env/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 108, in assert_is_compatible_with
    % (self, other))
ValueError: Dimensions 1 and 512 are not compatible

WGAN-GP test on the CelebA dataset.

I tested WGAN-GP on the CelebA dataset.
But the quality of the generated images is worse than with the original DCGAN.
Starting from WGAN with the DCGAN generator and discriminator, I only changed the code below:


# gradient penalty
differences = self.fake_images - self.images
alpha = tf.random_uniform(shape=[self.batch_size, 1], minval=0., maxval=1.)
interpolates = self.images + (alpha*differences)
gradients = tf.gradients(self.critic(interpolates, True), [interpolates])[0]
# L2 norm
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))
gradient_penalty = tf.reduce_mean((slopes - 1.)**2)

What could be the reason?
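One thing worth checking in the snippet above, stated as an assumption since the shape of self.images is not shown. If the images are 4-D, e.g. [batch, height, width, channels], an alpha of shape [batch_size, 1] does not broadcast per sample. TensorFlow, like NumPy, broadcasts from the trailing axis, so when the width happens to equal the batch size there is no shape error at all. Likewise, reducing only over axis 1 does not give the full per-sample gradient norm. The NumPy sketch below contrasts the two alpha shapes and computes the norm over axes 1 to 3.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W, C = 4, 5, 4, 3                  # width == batch size on purpose
images = rng.normal(size=(B, H, W, C))
fake_images = rng.normal(size=(B, H, W, C))
differences = fake_images - images

alpha_2d = rng.uniform(size=(B, 1))
alpha_4d = alpha_2d.reshape(B, 1, 1, 1)  # one alpha per sample

# (B, 1) broadcasts as (1, 1, B, 1): alpha varies along WIDTH, not batch
wrong = images + alpha_2d * differences
right = images + alpha_4d * differences
print(np.allclose(wrong, right))          # False, yet no shape error

def slopes(gradients):
    # Per-sample L2 norm over height, width and channels
    return np.sqrt(np.sum(np.square(gradients), axis=(1, 2, 3)))

print(slopes(np.ones((B, H, W, C))))      # sqrt(H*W*C) for each sample
```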

Number of critic iterations

I am working on a 2D case similar to your toy examples but with a more complex distribution. I noticed big improvements in the contours (i.e. the energy surface learned by the discriminator) when increasing the critic iterations from 5 to 50.

I really think that 5 critic iterations is too low. I see you also use 5 iterations in the other examples, like CIFAR and MNIST, which does not show the full potential of the network. The critic should be given more time to converge.

After only 400 generator iterations, I am already getting better results on the Swiss roll than those reported in the paper.

Problems with Replacing ReLU with ELU

Hi, I have been experimenting with the repo, and lately I have been switching out the ReLU activations in gan_cifar.py for ELU activations. However, even after varying the lambda value, I have not been able to get any convergence. I am wondering whether ELU activations pose theoretical issues that are incompatible with WGAN-GP (e.g. being more non-linear, with a wider variance in slope values than ReLU or leaky ReLU), or whether ELU should work with WGAN-GP (i.e. has your team gotten any models running with ELU activations?). Thank you!