Comments (8)
This makes sense! But I'm not sure you can draw that conclusion. Think of the generator as a student and the critic as a teacher; the student/generator does some work and the teacher/critic points out its flaws so that the generator can improve. The architectures of the generator and critic each limit the functions they can compute; in our analogy, the student is only capable of learning some things and the teacher is only capable of teaching some things. The critic loss (or estimated Wasserstein distance) is a measure of how bad the teacher thinks the student's work is.
Now suppose the student, by modifying its weights, could produce (say) better spelling but not better punctuation on its essays. And suppose the teacher, by modifying its weights, could detect punctuation flaws but not spelling flaws. Then there's a mismatch, and the estimated Wasserstein distance flattens out: no matter what the student does, it cannot improve its work in the teacher's eyes, and the teacher keeps telling it to fix its punctuation, to no avail. In this case, improving either the student or the teacher will help. If we improve the teacher's capacity so it can notice spelling flaws, the estimated Wasserstein distance will rise above 30, but the gradients will actually help the student learn something it is capable of learning: it will modify its weights to produce better spelling. Similarly, if we improve the student's capacity so that it can change its weights to punctuate better, the gradients from the teacher will suddenly become useful and the Wasserstein distance will go down.
tl;dr -- if the estimated Wasserstein distance is close to 0, it's safe to say you should improve the critic architecture before the generator architecture. But if not, my intuition is that gains might be made by improving either architecture, and the estimated Wasserstein distance alone may not be enough to tell you where to focus your efforts.
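For concreteness, the "estimated Wasserstein distance" being discussed is just the gap between the critic's mean score on real samples and its mean score on generated samples; the critic's loss is its negation. A minimal numpy sketch (the scores here are synthetic, just to show how a loss stuck at -30 corresponds to a distance stuck at 30):

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: the negative of the estimated Wasserstein distance.

    The critic is trained to maximize E[f(real)] - E[f(fake)], so its loss
    is the negation; a loss plateauing at a constant (e.g. -30) means the
    estimated distance is plateauing at that constant.
    """
    return np.mean(fake_scores) - np.mean(real_scores)

# Toy example: the critic scores reals around +15 and fakes around -15,
# so the estimated Wasserstein distance plateaus near 30.
rng = np.random.default_rng(0)
real = 15 + rng.normal(size=1000)
fake = -15 + rng.normal(size=1000)
loss = critic_loss(real, fake)  # close to -30
```

Whether that plateau reflects a generator limitation or a critic limitation is exactly the question above; the number alone doesn't distinguish the two.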
from improved_wgan_training.
Thanks @alex-lew for the insightful commentary. In many of my natural language generation tasks, I have used very large generators and critics, but the critic's loss always seems to converge to a fixed negative constant (e.g. -30).
However, my biggest issue with WGAN is that I know the generator's architecture can do better. By training it with maximum likelihood you can get good grammar and punctuation, which is a struggle to achieve with a regular WGAN.
From this finding, you would think it would have to be the critic's fault. But I have confirmed that, when trained with sigmoid cross-entropy, the discriminator can easily distinguish between the two.
So this leads me to my final conclusion: both the generator and discriminator architectures are sufficient; it is the WGAN design itself that is flawed. Don't get me wrong, WGAN is an amazing breakthrough, but something is still crucially wrong. Cramer GAN and other papers suggest alternatives, but something is still not right.
You would also think that you could generate really good images of faces with huge generators and critics, but you can't. It may be too little training data, but I think there is something inherently wrong with WGAN.
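The diagnostic described above (training a separate discriminator with sigmoid cross-entropy to check whether it can separate real from generated samples) can be sketched as follows; the logits and function names here are illustrative, not from the repo:

```python
import numpy as np

def bce_discriminator_loss(real_logits, fake_logits):
    """Standard (non-Wasserstein) GAN discriminator loss:
    sigmoid cross-entropy with label 1 for real and 0 for fake."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    real_term = -np.log(sigmoid(real_logits) + 1e-12)
    fake_term = -np.log(1.0 - sigmoid(fake_logits) + 1e-12)
    return np.mean(real_term) + np.mean(fake_term)

# If the discriminator separates the two classes cleanly (large positive
# logits on real samples, large negative on fake ones), this loss is
# near zero -- the "easily distinguish" case described above.
real_logits = np.full(100, 8.0)
fake_logits = np.full(100, -8.0)
loss = bce_discriminator_loss(real_logits, fake_logits)
```

A near-zero cross-entropy here shows the discriminator architecture has the capacity to tell the distributions apart, even if the WGAN critic's estimated distance has plateaued.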
@NickShahML @alex-lew Hello, I ran into the same issue. In my case, I trained a WGAN-GP model on a 3D voxel dataset, updating the generator once and the discriminator five times per iteration. After 20k iterations, I found that the gap between fake_loss and real_loss was stuck at a fixed value, about 30. Although d_loss (the sum of fake_loss, real_loss, and gradient_penalty_loss) was decreasing very slowly, the fake/real gap was almost invariant; only gradient_penalty_loss kept decreasing, to less than 10. Since the gradient penalty uses lambda = 10 and targets a gradient norm of 1, the penalty term should ideally approach 0, yet it plateaued well above that. So I think the training has trouble converging.
What do you think?
@li-zemin Hey! I'm in the exact same situation as you - the loss stabilises at -30.
Did you manage to improve this?
Cheers!
Hi. My loss dropped down to -0.17, but if I train further the loss starts increasing.
Should I consider -0.17 the convergence point? Any input would be appreciated.
Hi! I ran into the same issue. Did you ever manage to fix it?
Thank you!
Actually, this is not an issue! This is just how a WGAN trains. During training, the loss keeps dropping up to a certain point, then starts rising. That turning point is the convergence point, meaning the model is trained, and training should be stopped there (just before the value starts to rise).
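The stopping rule described above (stop once the loss bottoms out and begins to rise) can be sketched as a simple early-stopping check; the function name and patience value are illustrative, not from the repo:

```python
def should_stop(loss_history, patience=5):
    """Stop when the critic loss has not improved (decreased) for
    `patience` consecutive recorded values, i.e. the loss curve has
    turned upward past its minimum."""
    if len(loss_history) <= patience:
        return False
    best = min(loss_history[:-patience])
    # Stop only if every recent value fails to beat the earlier minimum.
    return all(loss >= best for loss in loss_history[-patience:])

# Loss drops, bottoms out near -30, then rises: stop after the turn.
history = [-5, -12, -20, -26, -29, -30, -29.5, -28, -27, -26, -25]
stop = should_stop(history)  # True: the curve has turned upward
```

In practice you would also checkpoint the model at each new minimum, so that "stopping" means restoring the weights from just before the loss started rising.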
The following paper implements WGAN-div and compares it with other WGANs: https://www.nature.com/articles/s41598-022-22882-x.
Refer to the supplementary material as well; the paper also explains the implementation of WGAN.