Problems with Replacing ReLU with eLU (improved_wgan_training, closed)

igul222 commented on July 20, 2024

Problems with Replacing ReLU with eLU

from improved_wgan_training.

Comments (11)

igul222 commented on July 20, 2024

No theoretical reason it shouldn't work that I'm aware of, so decreasing the learning rate and/or setting beta1=0 should help.
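To make the two suggested knobs concrete, here is a minimal scalar sketch of one Adam update, which is the optimizer the name beta1 usually refers to. This is an assumption: the thread does not quote the training script's optimizer code, and the defaults lr=1e-4, beta1=0.5, beta2=0.9 below are illustrative, not values confirmed here.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.5, beta2=0.9, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch, not the repo's code).

    t is the 1-based step count used for bias correction.
    """
    # First moment: exponential moving average of past gradients (the momentum term).
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment: exponential moving average of squared gradients.
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Step size is proportional to lr.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

With beta1=0 the first moment m is just the current gradient, so nothing carries over from earlier steps; lowering lr scales each individual step down.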


From: rkjones4
Sent: Wednesday, April 26, 2017 12:14:17 AM
To: igul222/improved_wgan_training
Subject: [igul222/improved_wgan_training] Problems with Switching out ReLU for eLU (#16)

Hi, I have been experimenting with the repo, and lately I have been swapping out the ReLU activations in gan_cifar.py for ELU activations. However, even after varying the lambda value, I have not been able to get any convergence. I am wondering whether ELU activations pose theoretical issues that make them incompatible with WGAN-GP (e.g. they are more non-linear and have a wider range of slopes than ReLU or leaky ReLU), or whether ELU should be able to work with WGAN-GP (i.e. has your team gotten any models running that use ELU activations?). Thank you!
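To make the activation comparison in the question concrete, a small sketch of the first and second derivatives of ReLU and ELU. It assumes alpha = 1 for ELU (a common default; the thread does not say which alpha was used). On the negative side ELU's slope varies continuously in (0, alpha], and its second derivative is non-zero there and drops to 0 at the origin, whereas ReLU's second derivative is 0 away from x = 0.

```python
import math

def relu_d1(x):
    # ReLU slope: 0 for x < 0, 1 for x > 0 (undefined at exactly 0; 0 is returned there).
    return 1.0 if x > 0 else 0.0

def relu_d2(x):
    # ReLU is piecewise linear: second derivative is 0 except at x = 0, where it is undefined.
    return 0.0

def elu_d1(x, alpha=1.0):
    # ELU slope: 1 for x > 0, alpha * e^x for x <= 0 (continuous at 0 when alpha = 1).
    return 1.0 if x > 0 else alpha * math.exp(x)

def elu_d2(x, alpha=1.0):
    # ELU curvature: 0 for x > 0, alpha * e^x for x < 0, so it jumps from alpha to 0 at x = 0.
    return 0.0 if x > 0 else alpha * math.exp(x)

for x in (-2.0, -0.5, -1e-9, 1e-9, 0.5):
    print(f"x={x:+.1e}  relu'={relu_d1(x):.3f}  elu'={elu_d1(x):.3f}  "
          f"relu''={relu_d2(x):.3f}  elu''={elu_d2(x):.3f}")
```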


NickShahML commented on July 20, 2024

I have also experienced the same effect and ended up reducing the learning rate to compensate for it.


hiwonjoon commented on July 20, 2024

I also experienced the same effect. Reducing the learning rate does not have any effect on this issue.
I observed that W fluctuates and diverges even when I train only the critic network. Any thoughts?


NickShahML commented on July 20, 2024

@hiwonjoon, have you tried using weight norm in your conv1d layers? Have you also tried decreasing beta1?
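Weight normalization reparameterizes each weight vector as a direction times a learned scale, w = g * v / ||v||. Below is a minimal per-filter sketch in plain Python; how the repo's own conv1d helper applies it (if at all) is not shown in this thread, so the function name `weight_norm` and the list-of-filters layout are hypothetical.

```python
import math

def weight_norm(v, g):
    """Hypothetical helper: w[k] = g[k] * v[k] / ||v[k]|| for each output filter k.

    v: list of filters, each a flat list of raw weights.
    g: list of per-filter scales (learned separately from the direction v).
    """
    w = []
    for vk, gk in zip(v, g):
        norm = math.sqrt(sum(x * x for x in vk))  # L2 norm of this filter's raw weights
        w.append([gk * x / norm for x in vk])      # unit direction rescaled by g[k]
    return w
```

Each resulting filter has norm exactly g[k], so the scale of a filter is decoupled from its direction.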


LynnHo commented on July 20, 2024

@NickShahML Can you explain why decreasing beta1 should help?


igul222 commented on July 20, 2024

My (very rough, hand-wavy) intuition: beta1 is a momentum term. If you think of momentum as using past gradients as an estimator for the current gradient, it follows that momentum might not be helpful on loss surfaces with sharp curvature. The gradient penalty introduces a lot of this through multiplicative interactions between weights in the loss function, which sometimes makes optimization with momentum less stable. (ELUs seem to be tricky to optimize for similar reasons.) Note that none of this means you can't make it work; you'd just need to drop the learning rate so much that it's probably not worth it.


NickShahML commented on July 20, 2024

Yeah, I've found that dropping the learning rate with ELU does work, though you have to drop it so much that it isn't worth it. You could try SELU instead, but I've experienced the same effect with it.


Jiaming-Liu commented on July 20, 2024

For curiosity's sake, would SELU eliminate the need for normalization in the discriminator? @NickShahML


NickShahML commented on July 20, 2024

@Jiaming-Liu I don't know if SELU would necessarily eliminate the need to normalize, but in theory it should.
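The "in theory" part refers to SELU's self-normalizing property: its constants are chosen so that standard-normal pre-activations produce outputs with mean 0 and variance 1. The sketch below checks only that fixed point empirically for a single activation. In a real network the property also assumes LeCun-normal initialization and no other layers pushing activations out of that regime, and nothing here tests a WGAN-GP discriminator.

```python
import math
import random

# Standard SELU constants (alpha and scale/lambda).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # Scaled ELU: scale * x for x > 0, scale * alpha * (e^x - 1) otherwise.
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def output_moments(n=200_000, seed=0):
    """Empirical mean and variance of selu(z) for z ~ N(0, 1)."""
    rng = random.Random(seed)
    ys = [selu(rng.gauss(0.0, 1.0)) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return mean, var

print(output_moments())  # both close to (0, 1), up to sampling noise
```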


jglombitza commented on July 20, 2024

@rkjones4 There is a theoretical reason. When the gradient penalty is added to the critic's objective, the resulting gradient update contains terms involving second-order derivatives of the network's activation functions. If those second-order derivatives are discontinuous, this can lead to a collapse of training. ELU has a discontinuous second-order derivative (it jumps at zero), and this discontinuity ruins the objective by producing erratic behaviour in the gradient penalty.
Have a look at the latest version of: https://arxiv.org/pdf/1704.00028v1.pdf
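A worked toy case of the second-order argument, assuming a critic with a single hidden unit, D(x) = w_2 σ(w_1 x), a scalar input x, and the penalty written as λ(|∂D/∂x| - 1)² (the one-dimensional form of the gradient-norm penalty). Differentiating the penalty with respect to w_1 brings in σ'':

```latex
D(x) = w_2\,\sigma(w_1 x), \qquad
g(x) = \frac{\partial D}{\partial x} = w_1 w_2\,\sigma'(w_1 x), \qquad
P(x) = \lambda\,\bigl(|g(x)| - 1\bigr)^2

\frac{\partial P}{\partial w_1}
  = 2\lambda\,\bigl(|g(x)| - 1\bigr)\,\operatorname{sign}\bigl(g(x)\bigr)\,
    w_2\Bigl[\sigma'(w_1 x) + w_1 x\,\sigma''(w_1 x)\Bigr]
```

For ReLU, σ'' is zero everywhere except at 0, so the bracket reduces to σ'(w_1 x) almost everywhere. For ELU, σ''(u) = α e^u for u < 0 and 0 for u > 0, so this gradient jumps as w_1 x crosses zero.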


igul222 commented on July 20, 2024

There’s a note on this in appendix D of the paper. My suggestion is just to use ReLU and not worry about it, but if you really want something like ELU, (softplus(2x+2)/2) - 1 is very close to ELU but smooth, so it works :)
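A sketch of the smooth ELU substitute f(x) = softplus(2x+2)/2 - 1, where softplus(y) = log(1 + e^y). Softplus is what gives the ELU-like limits (f(x) tends to x for large x and to -1 for very negative x). The function names are hypothetical, and the comparison grid [-5, 5] is an arbitrary choice for illustration.

```python
import math

def softplus(y):
    # Numerically stable log(1 + e^y).
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))

def sigmoid(y):
    # Numerically stable logistic function (derivative of softplus).
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

def smooth_elu(x):
    # (softplus(2x + 2) / 2) - 1: approaches x as x -> +inf and -1 as x -> -inf.
    return softplus(2.0 * x + 2.0) / 2.0 - 1.0

def smooth_elu_d2(x):
    # f'(x) = sigmoid(2x + 2), so f''(x) = 2 * s * (1 - s): continuous everywhere.
    s = sigmoid(2.0 * x + 2.0)
    return 2.0 * s * (1.0 - s)

def elu(x):
    # Standard ELU with alpha = 1, for comparison.
    return x if x > 0 else math.exp(x) - 1.0

# Largest gap between the two curves on a grid over [-5, 5].
gap = max(abs(smooth_elu(i / 100) - elu(i / 100)) for i in range(-500, 501))
print(gap)
```

Unlike ELU, whose second derivative jumps at 0, this function's second derivative is continuous, which is the property the gradient penalty needs.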


(Sent by email in reply to JGlombitza's comment above, written Wed, Jun 6, 2018 at 9:37 AM.)
