Comments (11)

igul222 commented on July 20, 2024

from improved_wgan_training.

NickShahML commented on July 20, 2024

I have also experienced the same effect and ended up reducing the learning rate to compensate for it.

hiwonjoon commented on July 20, 2024

I also experienced the same effect, but reducing the learning rate had no effect on this issue.
I observed that W fluctuated and then diverged even when I trained only the critic network. Any thoughts?

NickShahML commented on July 20, 2024

@hiwonjoon, have you tried using weight norm in your conv1d? Have you also tried decreasing beta1?

LynnHo commented on July 20, 2024

@NickShahML Can you explain why decreasing beta1 should help?

igul222 commented on July 20, 2024

my (very rough, hand-wavy) intuition: beta1 is a momentum term. if you think of momentum as using past gradients as an estimator for the current gradient, it follows that momentum might not be helpful on loss surfaces with sharp curvature, where the gradient changes quickly from step to step. gradient penalty introduces a lot of sharp curvature through multiplicative interactions between weights in the loss fn. this sometimes makes optimization with momentum less stable. (ELUs seem to be tricky to optimize for similar reasons.) note that none of this means you can't make it work; you'd just need to drop the learning rate so much that it's probably not worth it.
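The momentum intuition above can be made concrete with a toy example. This is only a sketch under stated assumptions, not an experiment from this thread: it uses the standard Adam first-moment update with bias correction, and the gradient sequence, the helper names (`adam_first_moment`, `steps_to_track`), and the "half of the new gradient" threshold are all hypothetical choices made for illustration. It counts how many steps the moving average needs to catch up after the gradient abruptly flips sign.

```python
def adam_first_moment(grads, beta1):
    """Return Adam's bias-corrected first-moment estimate after each step."""
    m = 0.0
    estimates = []
    for t, g in enumerate(grads, start=1):
        # Exponential moving average of past gradients (the "momentum" part).
        m = beta1 * m + (1.0 - beta1) * g
        # Standard Adam bias correction for the zero initialisation of m.
        estimates.append(m / (1.0 - beta1 ** t))
    return estimates


def steps_to_track(grads, beta1, flip_index, fraction=0.5):
    """Count steps after flip_index until the estimate reaches
    `fraction` of the new gradient (same sign, enough magnitude)."""
    estimates = adam_first_moment(grads, beta1)
    pairs = zip(grads[flip_index:], estimates[flip_index:])
    for lag, (g, m_hat) in enumerate(pairs):
        if m_hat / g >= fraction:
            return lag
    return None


# Toy gradient sequence (an assumption): +1 for 20 steps, then a flip to -1.
grads = [1.0] * 20 + [-1.0] * 20

lag_high = steps_to_track(grads, beta1=0.9, flip_index=20)
lag_low = steps_to_track(grads, beta1=0.5, flip_index=20)
lag_none = steps_to_track(grads, beta1=0.0, flip_index=20)
print("beta1=0.9:", lag_high, "| beta1=0.5:", lag_low, "| beta1=0.0:", lag_none)
```

In this toy setting a larger beta1 keeps pointing in the stale direction for more steps after the flip, which is one way to read the suggestion to decrease beta1 when the loss surface changes sharply.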

NickShahML commented on July 20, 2024

Yea, I've found that dropping the learning rate with ELU does work, though you have to drop it so much that it isn't worth it. You could try SELU instead, but I've experienced the same effect.

Jiaming-Liu commented on July 20, 2024

For curiosity's sake, would SELU eliminate the need for normalization in the Discriminator? @NickShahML

NickShahML commented on July 20, 2024

@Jiaming-Liu I don't know if SELU would necessarily eliminate the need to normalize but in theory it should.

jglombitza commented on July 20, 2024

@rkjones4 There is a theoretical reason. When the gradient penalty is added to the objective during critic training, the resulting gradient update contains second-order derivatives of the network's activation functions. If those second-order derivatives are discontinuous, this can lead to a collapse of the training. Remember that ELU has a discontinuous second-order derivative. This discontinuity ruins the objective by producing strange behaviour in the gradient penalty.
Just have a look at the latest version of: https://arxiv.org/pdf/1704.00028v1.pdf
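The claim about ELU's second derivative can be checked by hand. The sketch below assumes the standard ELU definition with alpha = 1 (x for x > 0, alpha * (exp(x) - 1) otherwise); the function names and the choice of `eps` are illustrative and do not come from the repository. It compares the derivatives just left and just right of zero.

```python
import math


def elu(x, alpha=1.0):
    """Standard ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_first_derivative(x, alpha=1.0):
    """Analytic first derivative of ELU."""
    return 1.0 if x > 0 else alpha * math.exp(x)


def elu_second_derivative(x, alpha=1.0):
    """Analytic second derivative of ELU."""
    return 0.0 if x > 0 else alpha * math.exp(x)


eps = 1e-6  # small offset on either side of zero (arbitrary choice)

# Gap between the right-hand and left-hand values near x = 0.
first_jump = abs(elu_first_derivative(eps) - elu_first_derivative(-eps))
second_jump = abs(elu_second_derivative(eps) - elu_second_derivative(-eps))

print("jump in first derivative near 0: ", first_jump)
print("jump in second derivative near 0:", second_jump)
```

With alpha = 1 the first derivative matches on both sides of zero, while the second derivative drops from about alpha to 0. That jump is the discontinuity that enters the critic update through the gradient penalty.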
