Comments (6)
I think we should allow custom initialization of the CNNs in CRDNN. Since the rectified activation units in the CNN blocks produce unbounded outputs, a bad initialization could lead to very large or very small variance in their output distribution. This could be problematic when one is building a very deep network...
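To make the suggestion concrete, here is a minimal sketch of what such an initialization could look like in plain PyTorch. It assumes He (Kaiming) initialization is the scheme in question, which the comment above does not state, and the small `nn.Sequential` stack is a hypothetical stand-in for the CNN blocks, not the actual CRDNN code.

```python
import torch
import torch.nn as nn


def init_conv_kaiming(module: nn.Module) -> None:
    """Apply He (Kaiming) normal init to every Conv2d found in `module`."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            # Weight std is sqrt(2 / fan_in), chosen for ReLU activations.
            nn.init.kaiming_normal_(m.weight, mode="fan_in", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)


# Hypothetical stand-in for the CNN blocks (not the real CRDNN layers).
torch.manual_seed(0)
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)
init_conv_kaiming(cnn)

# Compare input and output variance on random data.
x = torch.randn(8, 1, 40, 100)
with torch.no_grad():
    y = cnn(x)
print(f"input var: {x.var().item():.3f}, output var: {y.var().item():.3f}")
```

Passing the model through a helper like this keeps the choice of scheme in one place, so it could be swapped or disabled without touching the layer definitions.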
We can do it. Actually, in my experience initialization doesn't play a crucial role. For instance, in the current RNN class we already added orthogonal init for recurrent connections and the performance was the same. What really makes a big difference is batch normalization and recurrent dropout (which cannot be added properly to the cudnn models). Feel free to open a pull request that changes the RNN class if there is evidence of some performance improvement...
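For readers unfamiliar with orthogonal init for recurrent connections, the idea can be sketched as follows in plain PyTorch. This is an illustration only, not the code in the SpeechBrain RNN class; the helper name `init_recurrent_orthogonal` and the layer sizes are made up for the example.

```python
import torch
import torch.nn as nn


def init_recurrent_orthogonal(rnn: nn.Module) -> None:
    """Orthogonally initialize the hidden-to-hidden weights of an RNN."""
    for name, param in rnn.named_parameters():
        if "weight_hh" in name:
            # For an LSTM this matrix stacks the 4 gates: (4 * hidden, hidden).
            # It is initialized as a whole here, not gate by gate.
            nn.init.orthogonal_(param)


torch.manual_seed(0)
lstm = nn.LSTM(input_size=40, hidden_size=64, num_layers=2, batch_first=True)
init_recurrent_orthogonal(lstm)

# The columns of the stacked recurrent matrix are now orthonormal,
# so W^T W should be close to the identity.
w = lstm.weight_hh_l0.detach()
gram = w.t() @ w
print(torch.allclose(gram, torch.eye(64), atol=1e-4))
```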
It's not so much a matter of performance as a matter of trainability. A wrong initialisation scheme might just make convergence impossible, especially with a larger number of neurons. Indeed, because of the summation, you'll end up in the dead (saturated) region of the tanh function, and then ... well, you won't converge. @jjery2243542 seems to have this problem with 2048 neurons per LSTM layer.
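The summation argument can be checked with a toy experiment. The sketch below uses a single tanh unit fed by independent unit-variance Gaussian inputs, which is an assumption made for illustration and much simpler than a real LSTM layer. It compares a fixed weight standard deviation (0.1, an arbitrary value) with one scaled as 1/sqrt(n), and counts how often the unit lands in the saturated region.

```python
import math
import random


def saturated_fraction(n_inputs, weight_std, trials=200, threshold=0.99, seed=0):
    """Fraction of trials in which |tanh(sum_i w_i * x_i)| exceeds `threshold`."""
    rng = random.Random(seed)
    saturated = 0
    for _ in range(trials):
        # Pre-activation: weighted sum over n_inputs Gaussian inputs.
        z = sum(
            rng.gauss(0.0, weight_std) * rng.gauss(0.0, 1.0)
            for _ in range(n_inputs)
        )
        if abs(math.tanh(z)) > threshold:
            saturated += 1
    return saturated / trials


n = 2048
# Fixed weight std: the pre-activation variance grows linearly with n.
fixed = saturated_fraction(n, weight_std=0.1)
# Std scaled by 1/sqrt(n): the pre-activation variance stays near 1.
scaled = saturated_fraction(n, weight_std=1.0 / math.sqrt(n))
print(f"saturated with fixed std: {fixed:.2f}, with scaled std: {scaled:.2f}")
```

Where tanh is saturated its gradient is close to zero, which is why a unit stuck there barely updates.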
But with batch norm that should not be the case, I think. We can still do it.
I did the RNN initialization in #110.