Comments (17)
Open a new issue (maybe something like "better convergence with a custom dataset"); I think people will find this useful!
from deepspeech.torch.
`calculateInputSizes` calculates the real size of each sample in the audio tensor so we can ignore the padding in the gradient cost calculation (found in the CTCCriterion).
If each sample of your output has the same length, a way around this would be to do something like:

```lua
sizes:resize(outputs:size(1)):fill(outputs:size(2))
```

Hopefully this helps!
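In Python terms, the one-liner above just fills the sizes tensor with one constant output length per sample. A minimal sketch of the same idea (`constant_sizes` is an illustrative name, not a function from the repo):

```python
def constant_sizes(batch_size, output_length):
    """Equivalent of Torch's sizes:resize(N):fill(T): assume every
    sample in the batch has the same number of output frames."""
    return [output_length] * batch_size

# One entry per sample, all equal to the shared output length.
sizes = constant_sizes(4, 100)
```

This only works when there is no padding to ignore; with variable-length samples you need the real per-sample lengths instead.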
I have variable-length image samples (same height, varying widths), so the alternate trick won't work. What should be passed as the sizes parameter to the CTCCriterion for the loss calculation? (here) From what you suggest, it is the sequence length of the input samples. Can you please confirm?
So, in my case of images, since I pass a column strip of the image at each timestep, sizes would be the width of each image in the batch after it has been passed through the SpatialConv layer?
Sorry for the late response!
From what I can tell, you will not need to touch `calculateInputSizes`. It calculates the sizes with respect to the convolution layers, not the raw input. So as long as the input is given in a format similar to how the audio data is currently given, it should automatically calculate the sizes to pass to the gradient calculation.
And just to confirm: it is the true length of the input samples AFTER going through the convolutional layers (which reduce the number of timesteps, which is why this is necessary).
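The timestep reduction comes from the standard convolution output-size formula, applied once per conv layer. A hedged sketch (the kernel/stride values below are illustrative, not necessarily the repo's exact configuration):

```python
def conv_output_length(input_length, kernel_size, stride, padding=0):
    """Temporal size after one conv layer:
    floor((L + 2*padding - kernel_size) / stride) + 1."""
    return (input_length + 2 * padding - kernel_size) // stride + 1

def sizes_after_convs(input_lengths):
    """Map each sample's raw length through a hypothetical two-layer
    conv front end to get the per-sample sizes the CTC loss needs."""
    out = []
    for length in input_lengths:
        length = conv_output_length(length, kernel_size=11, stride=2)
        length = conv_output_length(length, kernel_size=11, stride=1)
        out.append(length)
    return out
```

These post-convolution lengths, not the raw input lengths, are what the sizes parameter to the CTC criterion should carry.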
Thanks @SeanNaren :)
`calculateInputSizes` is really a pretty neat hack! Turns out my problem was the noisy samples in my dataset, which had an image width less than the width of the convolution kernels I was using. Simply removing these corrupted samples from the dataset did the job for me. Thanks again!
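Screening for such samples can be a simple width check before building the dataset: anything narrower than the first kernel produces zero output frames and breaks the CTC loss. A sketch (illustrative names, not repo code):

```python
def drop_narrow_samples(widths, min_width):
    """Keep only samples at least as wide as the first conv kernel;
    narrower samples yield no output timesteps for CTC."""
    return [w for w in widths if w >= min_width]

# With an 11-wide first kernel, samples of width 5 and 3 are dropped.
kept = drop_narrow_samples([5, 20, 11, 3], min_width=11)
```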
Ah, that is a good point! I think it would be nice to add this somewhere appropriate in the documentation; I ran into the same issue a lot when training these models!
@SeanNaren I can send you a PR once I get the code working properly myself. The model currently trains in a weird fashion for me: the training loss keeps fluctuating between really small values and inf :/ (take a look at the train logs below). Any tips on what might be going wrong? I am checking whether this is indeed exploding gradients (not hopeful it is, since the loss shouldn't come back to non-inf values once it has exploded, right?)
```
Training Epoch: 3 Average Loss: 3.046032 Average Validation WER: inf Average Validation CER: inf
[======= 5892/5892 ==================================>] Tot: 1h13m | Step: 729ms
Training Epoch: 4 Average Loss: 2.324838 Average Validation WER: inf Average Validation CER: inf
[======= 5892/5892 ==================================>] Tot: 1h16m | Step: 698ms
Training Epoch: 5 Average Loss: 1.797586 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h9m | Step: 693ms
Training Epoch: 6 Average Loss: -inf Average Validation WER: inf Average Validation CER: inf
[======= 5892/5892 ==================================>] Tot: 1h9m | Step: 760ms
Training Epoch: 7 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h9m | Step: 719ms
Training Epoch: 8 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h12m | Step: 749ms
Training Epoch: 9 Average Loss: 0.579901 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h11m | Step: 705ms
Training Epoch: 10 Average Loss: 0.420499 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h12m | Step: 766ms
Training Epoch: 11 Average Loss: 0.287849 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h12m | Step: 706ms
Training Epoch: 12 Average Loss: 0.192960 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h13m | Step: 834ms
Training Epoch: 13 Average Loss: 0.122787 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h12m | Step: 710ms
Training Epoch: 14 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h7m | Step: 506ms
Training Epoch: 15 Average Loss: 0.042043 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 43m38s | Step: 481ms
Training Epoch: 16 Average Loss: 0.023819 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 43m41s | Step: 464ms
Training Epoch: 17 Average Loss: 0.010227 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 43m51s | Step: 418ms
Training Epoch: 18 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 44m14s | Step: 484ms
Training Epoch: 19 Average Loss: 0.005311 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 46m42s | Step: 493ms
Training Epoch: 20 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
..
..
..
Training Epoch: 33 Average Loss: 0.000093 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 53m33s | Step: 530ms
Training Epoch: 34 Average Loss: 0.000966 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 53m15s | Step: 570ms
Training Epoch: 35 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 53m49s | Step: 521ms
Training Epoch: 36 Average Loss: 0.000915 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 54m17s | Step: 530ms
Training Epoch: 37 Average Loss: -0.000312 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 54m24s | Step: 552ms
Training Epoch: 38 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 49m14s | Step: 509ms
Training Epoch: 39 Average Loss: -0.000470 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 52m59s | Step: 599ms
Training Epoch: 40 Average Loss: 0.000786 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 57m7s | Step: 504ms
Training Epoch: 41 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 52m26s | Step: 457ms
Training Epoch: 42 Average Loss: -0.000240 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 50m47s | Step: 539ms
Training Epoch: 43 Average Loss: 0.000231 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 51m42s | Step: 558ms
Training Epoch: 44 Average Loss: 0.000756 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 51m53s | Step: 599ms
Training Epoch: 45 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h39s | Step: 852ms
Training Epoch: 46 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h22m | Step: 1s105ms
Training Epoch: 47 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 52m24s | Step: 533ms
Training Epoch: 48 Average Loss: -0.000156 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 1h2m | Step: 486ms
Training Epoch: 49 Average Loss: 0.000695 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 51m34s | Step: 469ms
Training Epoch: 50 Average Loss: 0.000689 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 52m41s | Step: 493ms
Training Epoch: 51 Average Loss: -inf Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 52m34s | Step: 613ms
Training Epoch: 52 Average Loss: 0.000671 Average Validation WER: nan Average Validation CER: nan
[======= 5892/5892 ==================================>] Tot: 53m42s | Step: 512ms
Training Epoch: 53 Average Loss: 0.000359 Average Validation WER: nan Average Validation CER: nan
```
Those are some fun losses! Have you tried changing `cutoff` to a lower value like 100?
@SeanNaren I haven't tried that yet, on it. Btw, by cutoff you mean the maxNorm, right? For normalizing gradients?
Sorry, exactly! That is what I meant :)
From the tests I've done, lowering the maxNorm helps prevent gradients from exploding!
@SeanNaren I've tested running the code while linearly bringing the maxNorm down to a value as low as 10, but I still face the nan losses and inf WER/CER issue. From your experience, should I keep going lower, or is this probably not the bug/parameter I should be tuning? Please help.
Also, I have tried reducing the number of RNN hidden layers to 3 instead of the original 7. Still no positive signs though.
This goes against the grain of DS2, but could you try using cudnn.LSTMs instead of RNNs? Try to keep the number of parameters around 80 million. LSTMs might help, since they carry a lot of improvements over the standard recurrent net!
@SeanNaren will simply changing this line do the trick here? I.e. replacing that line with `self.rnn = cudnn.LSTM(outputDim, outputDim, 1)`.
I see that there are BLSTM implementations also available, so just confirming.
Ah, my apologies, that would be a bit strange. I'd suggest doing this in the DeepSpeechModel.lua class instead.
Change:

```lua
local function RNNModule(inputDim, hiddenDim, opt)
    if opt.nGPU > 0 then
        require 'BatchBRNNReLU'
        return cudnn.BatchBRNNReLU(inputDim, hiddenDim)
    else
        require 'rnn'
        return nn.SeqBRNN(inputDim, hiddenDim)
    end
end
```

to something like:

```lua
local function RNNModule(inputDim, hiddenDim, opt)
    require 'cudnn'
    local rnn = nn.Sequential()
    rnn:add(cudnn.BLSTM(inputDim, hiddenDim, 1))
    rnn:add(nn.View(-1, 2, hiddenDim):setNumInputDims(2)) -- split the two directions; have to sum activations
    rnn:add(nn.Sum(3))
    return rnn
end
```
I would suggest changing the hidden size to around 700, as the default would be pretty large for an LSTM!
Thanks a lot for the clarification! Will update with results 😄
@SeanNaren Can you tell me what role the hidden dimension plays in `rnn:add(nn.View(-1, 2, hiddenDim):setNumInputDims(2))`? What value does it signify?
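For context, the View/Sum pair splits each timestep's bidirectional output (which has 2 * hiddenDim features, forward half then backward half) into the two directions and sums them back to hiddenDim features. A plain-Python sketch of that operation on a single frame (illustrative names, assuming the forward-then-backward layout):

```python
def sum_directions(frame, hidden_dim):
    """Mimic nn.View(-1, 2, hiddenDim) + nn.Sum(3) on one timestep:
    split the 2*hidden_dim features into forward/backward halves
    and add them elementwise."""
    assert len(frame) == 2 * hidden_dim
    forward, backward = frame[:hidden_dim], frame[hidden_dim:]
    return [f + b for f, b in zip(forward, backward)]

# With hidden_dim=2, the frame [1, 2, 3, 4] splits into [1, 2] and
# [3, 4], which sum to [4, 6].
summed = sum_directions([1, 2, 3, 4], hidden_dim=2)
```

So the dimension passed to nn.View is the per-direction hidden size, used to carve the concatenated bidirectional activations back apart before summing.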