Comments (11)
from improved_wgan_training.
I have also experienced the same effect and ended up reducing the learning rate to compensate for it.
I experienced the same effect, but reducing the learning rate had no effect on the issue for me.
I observed that W fluctuated and then diverged when I trained only the critic network. Any thoughts?
@hiwonjoon, have you tried using weight norm in your conv1d? Have you also tried decreasing beta1?
@NickShahML Can you explain why decreasing beta1 should help?
My (very rough, hand-wavy) intuition: beta1 is a momentum term. If you think of momentum as using past gradients as an estimator for the current gradient, it follows that momentum might not be helpful on loss surfaces with sharp curvature. The gradient penalty introduces a lot of this through multiplicative interactions between weights in the loss fn, which sometimes makes optimization with momentum less stable. (ELUs seem to be tricky to optimize for similar reasons.) Note that none of this means you can't make it work; you'd just need to drop the learning rate so much that it's probably not worth it.
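To make the "past gradients as an estimator" intuition concrete, here is a minimal sketch of Adam's bias-corrected first-moment estimate on a toy gradient sequence whose sign flips abruptly. The function name `first_moment_trace` and the gradient values are made up for illustration; this is not code from the repository and it ignores Adam's second-moment scaling.

```python
def first_moment_trace(grads, beta1):
    """Return Adam's bias-corrected first-moment estimate after each step."""
    m = 0.0
    trace = []
    for t, g in enumerate(grads, start=1):
        # Exponential moving average of past gradients (the "momentum" part).
        m = beta1 * m + (1.0 - beta1) * g
        # Standard Adam bias correction for the zero-initialised average.
        trace.append(m / (1.0 - beta1 ** t))
    return trace


# Hypothetical gradients: the sign flips at step 6, mimicking sharp curvature.
grads = [1.0] * 5 + [-1.0] * 3

high = first_moment_trace(grads, beta1=0.9)  # heavy reliance on past gradients
low = first_moment_trace(grads, beta1=0.0)   # no momentum: tracks the current gradient

for step, (g, h, l) in enumerate(zip(grads, high, low), start=1):
    print(f"step {step}: grad={g:+.1f}  beta1=0.9 -> {h:+.3f}  beta1=0.0 -> {l:+.3f}")
```

With `beta1=0.9` the estimate still points in the old direction for several steps after the flip, while `beta1=0.0` follows the current gradient exactly. That lag is the kind of instability the comment above is gesturing at.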
Yeah, I've found that dropping the learning rate with ELU does work, though you have to drop it so much that it isn't worth it. You could try SELU instead, but I've experienced the same effect with it.
For curiosity's sake, would SELU eliminate the need for normalization in the discriminator? @NickShahML
@Jiaming-Liu I don't know if SELU would necessarily eliminate the need to normalize, but in theory it should.
@rkjones4 There is a theoretical reason. Because the gradient penalty is added to the objective during critic training, the resulting gradient update contains second-order derivatives of the network's activation functions. If those second derivatives are discontinuous, training can collapse. Remember that ELU has a discontinuous second derivative. This discontinuity distorts the objective by producing strange behaviour in the gradient penalty.
Have a look at the latest version of the paper: https://arxiv.org/pdf/1704.00028v1.pdf
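The discontinuity can be checked numerically. The sketch below estimates the second derivative of ELU (with alpha = 1) just left and right of zero using central finite differences. Softplus is included only as a smooth contrast; it is my own choice for illustration and is not mentioned elsewhere in this thread. The helper names are hypothetical.

```python
import math


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softplus(x):
    """Softplus activation, log(1 + exp(x)); smooth everywhere."""
    return math.log1p(math.exp(x))


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Evaluate slightly to each side of zero so the stencil never straddles the kink.
left, right = -1e-2, 1e-2

elu_left = second_derivative(elu, left)
elu_right = second_derivative(elu, right)
sp_left = second_derivative(softplus, left)
sp_right = second_derivative(softplus, right)

print(f"ELU''      left={elu_left:.4f}  right={elu_right:.4f}")
print(f"softplus'' left={sp_left:.4f}  right={sp_right:.4f}")
```

For ELU the estimate is close to 1 on the negative side and close to 0 on the positive side, so the second derivative jumps at zero. For softplus it is about 0.25 on both sides.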