Proper CUDNN RNN initialisation about speechbrain HOT 6 CLOSED

speechbrain commented on May 9, 2024

Proper CUDNN RNN initialisation

from speechbrain.

Comments (6)

JianyuanZhong commented on May 9, 2024 1

I think allowing initialization of the CNNs in CRDNN would be useful. Since the rectified activation units in the CNN blocks produce unbounded outputs, a bad initialization could lead to very large or very small variance in their output distribution. This could be problematic when building a very deep network...
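
To illustrate the variance argument, here is a minimal pure-Python sketch (not code from SpeechBrain). It pushes a random vector through a stack of ReLU layers, using dense layers as a stand-in for convolutions, and compares two arbitrary fixed weight scales with the He-style scale sqrt(2 / fan_in). The function name `output_scale` and all sizes are assumptions chosen for illustration.

```python
import math
import random


def output_scale(depth, width, weight_std, seed=0):
    """Root-mean-square activation after `depth` ReLU layers of size `width`."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Dense layer as a stand-in for a convolution:
        # each output unit sums `width` randomly weighted inputs, then applies ReLU.
        x = [
            max(0.0, sum(rng.gauss(0.0, weight_std) * xi for xi in x))
            for _ in range(width)
        ]
    return math.sqrt(sum(v * v for v in x) / width)


depth, width = 20, 64  # illustrative sizes, not taken from CRDNN
too_small = output_scale(depth, width, weight_std=0.05)
too_large = output_scale(depth, width, weight_std=0.5)
he_scaled = output_scale(depth, width, weight_std=math.sqrt(2.0 / width))

print(f"std=0.05        -> RMS activation {too_small:.3e}")
print(f"std=0.5         -> RMS activation {too_large:.3e}")
print(f"std=sqrt(2/fan) -> RMS activation {he_scaled:.3e}")
```

In this toy setting the fixed scales shrink or blow up the signal geometrically with depth, while the fan-in-dependent scale keeps it at roughly the same order of magnitude as the input.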

mravanelli commented on May 9, 2024

We can do it. Actually, in my experience initialization doesn't play a crucial role. For instance, in the current RNN class we already added orthogonal init for the recurrent connections and the performance was the same. What really makes a big difference is batch normalization and recurrent dropout (which cannot be added properly to the cudnn models). Feel free to open a pull request that changes the RNN class if there is evidence of some performance improvement...
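
For readers unfamiliar with orthogonal init on recurrent connections, the following is a rough sketch of how it could be applied to a cuDNN-backed `torch.nn.LSTM`. It is not the SpeechBrain RNN class and not the code from any pull request; the helper name `init_lstm_`, the Xavier choice for input weights, and the zero biases are all assumptions made for illustration.

```python
import torch
from torch import nn


def init_lstm_(lstm: nn.LSTM) -> None:
    """Hypothetical helper: re-initialise an nn.LSTM in place."""
    with torch.no_grad():
        for name, param in lstm.named_parameters():
            if "weight_hh" in name:
                # The four gate matrices are stacked along dim 0;
                # make each (hidden x hidden) block orthogonal on its own.
                for block in param.chunk(4, dim=0):
                    nn.init.orthogonal_(block)
            elif "weight_ih" in name:
                # Assumption: Xavier/Glorot for input-to-hidden weights.
                nn.init.xavier_uniform_(param)
            elif "bias" in name:
                # Assumption: start all biases at zero.
                nn.init.zeros_(param)


# Illustrative sizes only.
lstm = nn.LSTM(input_size=40, hidden_size=64, num_layers=2)
init_lstm_(lstm)
```

Because the parameters are modified in place, the module keeps using the same tensors afterwards, so the helper can be called once right after the layer is constructed.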

TParcollet commented on May 9, 2024

It's not so much a matter of performance as a matter of trainability. A wrong initialisation scheme might just make convergence impossible, especially with a larger number of neurons. Indeed, because of the summation, you'll end up in the dead spot of the tanh function, and then ... well, you won't converge. @jjery2243542 seems to have this problem with 2048 neurons per LSTM layer.
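
The summation effect can be reproduced with a small pure-Python sketch (an illustration under assumed values, not a measurement on the actual model). It draws pre-activations as sums over 2048 weighted unit-variance inputs and reports the average derivative of tanh, once with an arbitrary fixed weight standard deviation and once with a standard deviation that shrinks with the layer width. The function name `mean_tanh_grad` and the value 0.1 are hypothetical.

```python
import math
import random


def mean_tanh_grad(fan_in, weight_std, trials=200, seed=0):
    """Average tanh derivative over pre-activations that sum `fan_in` terms."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Pre-activation: sum of products of unit-variance inputs and random weights.
        z = sum(
            rng.gauss(0.0, 1.0) * rng.gauss(0.0, weight_std)
            for _ in range(fan_in)
        )
        total += 1.0 - math.tanh(z) ** 2  # derivative of tanh at z
    return total / trials


fan_in = 2048  # neurons per layer, as mentioned in the thread
fixed = mean_tanh_grad(fan_in, weight_std=0.1)
scaled = mean_tanh_grad(fan_in, weight_std=1.0 / math.sqrt(fan_in))

print(f"fixed std 0.1:        mean tanh' = {fixed:.3f}")
print(f"std = 1/sqrt(fan_in): mean tanh' = {scaled:.3f}")
```

With the fixed standard deviation the pre-activations spread far into the flat tails of tanh, so the average derivative is much smaller than with the width-aware scale, which is the loss of gradient signal described above.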

mravanelli commented on May 9, 2024

I see, let's see whether that fixes it then!

jjery2243542 commented on May 9, 2024

But with batch norm that will not be the case, I think. We can still do it.

jjery2243542 commented on May 9, 2024

I did the RNN initialization in #110.
