
deepspeech.torch's Issues

Running on small GPU memory

I am using a Quadro K420 GPU, which has 192 cores and 1024 MB of memory. I already ran the old code but hit an out-of-memory issue, which I discussed with you earlier. Have you modified the code for GPUs with small memory?

ctchelpers missing

Hey Sean,
it seems your ctchelpers repo, which is needed to run the deepspeech net, is missing; mainly:

ctchelpers/Linear3D.lua
ctchelpers/CombineDimensions.lua

BatchNormalization error

[==================== 948/948 ================>] Tot: 13s751ms | Step: 14ms
Training Epoch: 1
lua: /home/../distro/install/share/lua/5.1/nn/Container.lua:67:
In 10 module of nn.Sequential:
/home/../distro/install/share/lua/5.1/nn/BatchNormalization.lua:80: got 11-feature tensor, expected 800
stack traceback:
[C]: in function 'assert'
/home/../distro/install/share/lua/5.1/nn/BatchNormalization.lua:80: in function 'checkInputDim'
/home/q/xingxing.tang/distro/install/share/lua/5.1/nn/BatchNormalization.lua:102: in function </home/q/xingxing.tang/distro/install/share/lua/5.1/nn/BatchNormalization.lua:101>
(tail call): ?
[C]: in function 'xpcall'
/home/q/../distro/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/home/q/../distro/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/../distro/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:84: in function 'opfunc'
/home/../distro/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:100: in function 'trainNetwork'
AN4CTCTrain.lua:41: in main chunk
[C]: ?

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/../distro/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/../distro/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/q/xingxing.tang/distro/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:84: in function 'opfunc'
/home/..distro/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:100: in function 'trainNetwork'
AN4CTCTrain.lua:41: in main chunk
[C]: ?
This is the error I get when I run lua AN4CTCTrain.lua.

Use of function "calculateInputSizes(sizes)" in DeepSpeechModel.lua?

@SeanNaren I would like to know what exactly the function calculateInputSizes is for. I am using my own image data for a scene-text task (I have updated the spatial-conv params accordingly).

It looks like it calculates the size of the tensors obtained after passing the inputs through the two spatial-conv layers. However, this function is called just before the forward-backward passes (here), and that 'sizes' parameter is passed to the CTC criterion.

AFAIK, the sizes passed to the CTC criterion are the sizes of the target labels (as shown here) [NOTE: I might be getting this wrong. I posted the PR to update the CTC readme documentation, so if I'm getting this all wrong, I need to update that readme too :P ]. So shouldn't the size-calculation code be something like the one below? (Note that I take 'targets' as input instead of 'sizes' as previously.)

local function calculateInputSizes(targets)
    local sizes = torch.Tensor(#targets)
    for i = 1, #targets do
        sizes[i] = #targets[i] -- length of each target label sequence
    end
    return sizes
end

Please let me know what is going wrong here. I get an error saying...

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-8514/cutorch/lib/THC/THCGeneral.c line=676 error=11 : invalid argument
/users/mohit.jain/torch/install/bin/luajit: ...it.jain/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:74: cuda runtime error (11) : invalid argument at /tmp/luarocks_cutorch-scm-1-8514/cutorch/lib/THC/THCGeneral.c:676
stack traceback:
[C]: in function 'resize'
...it.jain/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:77: in function 'inverseInterleave'
...it.jain/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:53: in function 'backward'
./Network.lua:147: in function 'opfunc'
/users/mohit.jain/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:166: in function 'trainNetwork'
AN4CTCTrain.lua:42: in main chunk
[C]: in function 'dofile'
...jain/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

If I go to CTCCriterion.lua at line 74, I see that it simply creates a new tensor: local result = tensor.new():resize(sizes):zero(). Using the original calculateInputSizes function, my sizes tensor has negative values, and hence CUDA out-of-memory errors are thrown. If I instead use my variation of calculateInputSizes, I get the invalid-argument error stated above. Please help.
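
For context on what the original function likely computes: the number of time steps that survive a strided convolution is floor((T - kW) / dW) + 1, applied once per conv layer. A minimal sketch, assuming placeholder kernel/stride values (the 11, 2 and 11, 1 below are illustrative, not necessarily this model's parameters):

require 'torch'

-- Generic sketch: per-utterance time steps left after two conv layers.
-- kW (kernel width) and dW (stride) are placeholders; substitute the
-- model's actual spatial-conv parameters.
local function convOutputLength(T, kW, dW)
    return math.floor((T - kW) / dW) + 1
end

local function timeStepsAfterConvs(sizes) -- sizes: tensor of raw time lengths
    local out = torch.Tensor(sizes:size(1))
    for i = 1, sizes:size(1) do
        out[i] = convOutputLength(convOutputLength(sizes[i], 11, 2), 11, 1)
    end
    return out
end

If the original function computes something like this, a too-short input could indeed yield negative sizes, matching the negative values described above.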

Using the code on a CPU

Hi,

Thanks for sharing the code.

Is it possible to run the same code on a CPU instead of a GPU (just for testing)? If so, what changes are needed, and in which files? Any help is appreciated.
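
Not an answer from the thread, but the generic Torch pattern for a CPU-only run is to build the model from plain nn modules (not cudnn) and keep tensors off the GPU; a minimal sketch under that assumption (the repo's own CPU switch may differ):

require 'nn'

-- Generic CPU-only Torch pattern (illustrative).
torch.setdefaulttensortype('torch.FloatTensor')
local model = nn.Sequential():add(nn.Linear(10, 5)) -- nn layers, no cudnn
local input = torch.rand(2, 10)                     -- never call :cuda()
local output = model:forward(input)
print(output:size())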

Issue with nn.bottle dependency

Hi Sean,

I am trying to refactor my code to match your updated changes. However, I am facing an issue with Bottle being a 'nil value'. Below is the log:
./DeepSpeechModel.lua:36: attempt to call field 'Bottle' (a nil value)
stack traceback:
./DeepSpeechModel.lua:36: in function <./DeepSpeechModel.lua:14>
./Network.lua:73: in function 'prepSpeechModel'
./Network.lua:57: in function 'init'
Libri_Train.lua:39: in main chunk
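
For anyone hitting the same nil value: nn.Bottle was added to torch/nn relatively late, so an outdated nn install is the usual suspect. A quick check (generic Torch, not a fix confirmed in this thread):

require 'nn'

-- If this prints nil, the installed nn predates Bottle;
-- updating with `luarocks install nn` should provide it.
print(nn.Bottle)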

SGD Bug!

Hi, training is interrupted if the sum of gradOutput equals zero. I added a gradOutput check to fix this error:

    local function feval(x_new)
        local inputsCPU, targets = dataset:nextData()
        -- transfer over to GPU
        inputs:resize(inputsCPU:size()):copy(inputsCPU)
        gradParameters:zero()
        local predictions = self.model:forward(inputs)
        local loss = ctcCriterion:forward(predictions, targets)
        self.model:zeroGradParameters()
        local gradOutput = ctcCriterion:backward(predictions, targets)
        -- skip the model's backward pass on an all-zero gradient,
        -- which otherwise interrupts training
        if gradOutput:sum() ~= 0 then
            self.model:backward(inputs, gradOutput)
        end
        return loss, gradParameters
    end

Dataloading improvements

I want to generalise the dataset loader so that it's easier to add your own logic to create an LMDB dataset.

  • Generalise the text parsing and wav conversion into two separate functions
  • Implement AN4 in a generalised fashion, use helper scripts to create AN4 dataset
  • Add documentation to show how to do data preparation
  • Update technical documentation

Question about AN4 reference example

Hi, Thanks for sharing this wonderful reference!

I have been playing a bit with the AN4 example, and I had a question/comment: the AN4 corpus is extremely small as far as speech corpora go, and I was wondering if maybe the default parameters are a bit over-specified.

Turning them down a bit to:

  • '-hiddenSize', 750
  • '-nbOfHiddenLayers', 6

I end up with the following loss graph and final error rates:

  • Average WER: 3.52
  • Average CER: 1.25

(attached: train_20161016_072323 log)

These final results varied somewhat over 15 or so trials, from a lower bound of WER 2.5 / CER 0.95 to an upper bound of WER 6.78 / CER 2.23, but were consistently a bit lower than the current baseline.

Do you think this is still a consequence of noise/seed, or does this hypothesis make sense?

Interestingly, while we cannot directly compare them, it is really cool to see that these values are well inside the same ballpark as those currently reported in the Kaldi AN4 reference results:

  • https://github.com/kaldi-asr/kaldi/blob/master/egs/an4/s5/RESULTS
    • †Note, however, that any comparison cannot be taken too seriously: the best Kaldi reference in this example, at WER: 5.95, does not employ the DNN training stages [it appears to be triphone, GMM only], and I am not 100% certain that the test and train partitions are the same in both setups.

Getting very poor accuracy on the CMU ARCTIC database

Hi,
I trained on the CMU ARCTIC database and I am getting very low accuracy. The results are below:
[==================== 439/439 ================>] Tot: 19s426ms | Step: 45ms
Without Spellcheck WER : 97.05 percent
[==================== 439/439 ================>] Tot: 2m8s | Step: 321ms
With context based Spellcheck WER : 96.51 percent
So, to improve the result, is it possible to modify the learning parameters? Also, I have a doubt: what is the use of the lexicon dictionary and the language model?

More time taken per epoch in the multi-GPU case

I am using an EC2 g2.8xlarge instance, which has 4 GPUs. I have changed nGPU = 4 in AN4CTCTrain.lua.

I see that:

  • nGPU = 4: an epoch takes 55 s on average
  • nGPU = 1: an epoch takes 48 s on average

Am I doing something wrong?

Option to replace RNNs with LSTMs

Performance seems to be better with LSTMs than with RNNs; however, this strays from the DS2 structure. GRUs are very finicky to get to converge, whereas LSTMs are much more robust when it comes to convergence.

If the validation scores improve using LSTMs, I'll add an option to switch the RNNs to BLSTMs from the cuDNN package.
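
For reference, cudnn.torch already exposes a bidirectional LSTM; a minimal construction sketch (generic cudnn.torch usage with illustrative dimensions, not this repo's code):

require 'cudnn'

-- Bidirectional LSTM: the two directions are concatenated,
-- so the output feature dim is 2 * hiddenSize.
local blstm = cudnn.BLSTM(120, 400, 1)      -- inputSize, hiddenSize, numLayers
local input = torch.rand(50, 4, 120):cuda() -- seqLen x batch x inputSize
local output = blstm:forward(input)         -- 50 x 4 x 800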

Error in training AN4 data

I am getting this error while training on the data:

lua: ...enspark/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
...orch/install/share/lua/5.1/nn/BatchNormalization.lua:80: got 11-feature tensor, expected 400
stack traceback:
[C]: in function 'assert'
...orch/install/share/lua/5.1/nn/BatchNormalization.lua:80: in function 'checkInputDim'
...orch/install/share/lua/5.1/nn/BatchNormalization.lua:102: in function <...orch/install/share/lua/5.1/nn/BatchNormalization.lua:101>
(tail call): ?
[C]: in function 'xpcall'
...enspark/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
...nspark/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...nspark/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:95: in function 'opfunc'
...e/censpark/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:112: in function 'trainNetwork'
AN4CTCTrain.lua:40: in main chunk
[C]: ?

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
...enspark/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...nspark/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...nspark/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:95: in function 'opfunc'
...e/censpark/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:112: in function 'trainNetwork'
AN4CTCTrain.lua:40: in main chunk
[C]: ?

CPU mode doesn't work

Just changing nGPU = -1 in AN4CTCTrain.lua doesn't work. There was a NetworkCPU.lua before; is that deprecated?

Possible typo in ConvertAN4ToWav.sh

The last argument in the call to sox has one too many % chars and should probably read:
"${sample%.*}.wav"
since it looks like you are truncating one suffix and replacing it with another.

Error: [read_audio] Unknown length at /tmp/luarocks_audio-0.1-0-5658/lua---audio/generic/sox.c:45

I get an error when I run either AN4CTCTest or AN4CTCTrain. The error details follow:

lua AN4CTCTest.lua
lua: ...e/sherrie/torch/install/share/lua/5.1/audio/init.lua:56: [read_audio] Unknown length at /tmp/luarocks_audio-0.1-0-5658/lua---audio/generic/sox.c:45
stack traceback:
[C]: in function 'load'
...e/sherrie/torch/install/share/lua/5.1/audio/init.lua:56: in function 'load'
./AudioData.lua:105: in function 'an4Dataset'
./AudioData.lua:67: in function 'retrieveAN4TestDataSet'
AN4CTCTest.lua:112: in main chunk
[C]: ?

Could you help me?

Support for Lua5.2?

This is not really an issue; I need some help. I see that you use Baidu's warp-ctc. Can you tell me what version of Lua you have with your torch? [ th> print(_VERSION) ] I recently upgraded my Lua version from 5.1 to 5.2 in torch, to tackle some of the memory limitations of Lua 5.1, and am unable to install warp-ctc thereafter.

Reference Issue : baidu-research/warp-ctc#36

Any help/inputs would be much appreciated! :D

Possible error in docs

audio input: 128 x 500 Tensor -- the audio data (frequency x time)
truth text: 'deep speech is cool' -- The truth text used in evaluations/validation
truth label: {4,5,5,16,27,19,16,5,5,3,8,27,9,19,3,15,12} -- The label used in training

I believe the truth label should be:

{ 4,5,5,16,27,19,16,5,5,3,8,27,9,19,27,3,15,15,12 }

Notice the second 15 in the second-to-last position (and the extra 27 for the space before 'cool'). Unless 'oo' is only supposed to be represented by a single 15.
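
The mapping in the docs is consistent with a = 1 .. z = 26 and space = 27; a quick sketch to regenerate the label and check (my own verification code, not from the repo):

-- Rebuild the label for 'deep speech is cool' assuming a=1..z=26, space=27.
local function toLabel(text)
    local label = {}
    for c in text:gmatch('.') do
        table.insert(label, c == ' ' and 27 or (c:byte() - ('a'):byte() + 1))
    end
    return label
end
print(table.concat(toLabel('deep speech is cool'), ','))
-- prints 4,5,5,16,27,19,16,5,5,3,8,27,9,19,27,3,15,15,12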

V2 Branch

There are a few issues I'll be addressing in this branch:

  • Fix the current branch; it does not converge as it did a few commits ago
  • Convert AN4Train/AN4Test to use cmd to input arguments, create bash scripts to run these
  • Expose parameters of DeepSpeechModel to allow easier customisation
  • Revise current flow of objects such as Evaluator and Model Evaluator
  • Add training graph with CER/WER
  • Add example prediction script
  • Add WER/CER results in README

Hang in data preparation (LibriSpeech)

I hit a problem at the following line (Ubuntu 16.04, kernel 4.4.0-38-generic, Lua 5.1.5):

$ th FormatLibriSpeech.lua -rootPath LibriSpeech -newPath libri_dataset -threads 8
....................................... 34/2703 .....................................]  ETA: 14s840ms | Step: 5ms
^C

It gets stuck there. Below is the debugger output:

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f180fc29740 (LWP 168437) "luajit" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7f176b510700 (LWP 168443) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  3    Thread 0x7f176ad0f700 (LWP 168444) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  4    Thread 0x7f176a50e700 (LWP 168445) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  5    Thread 0x7f1769d0d700 (LWP 168446) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  6    Thread 0x7f176950c700 (LWP 168447) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  7    Thread 0x7f1768d0b700 (LWP 168448) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  8    Thread 0x7f1763fff700 (LWP 168449) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
  9    Thread 0x7f17637fe700 (LWP 168450) "luajit" __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
(gdb) quit

With "-threads 1" I can get through it. It looks like a race condition/deadlock.

A couple of suggestions

I noticed an opportunity to do a bit of divide and conquer: you can generate a table of labels for both the training and testing transcripts by consolidating the duplicated code into a routine such as the following:

local function convertLineToLabels(labels, line)
    local label = {}
    line = string.lower(line)
    -- Remove: leading whitespace, BOS (<s>), EOS (</s>), fileid, trailing whitespace
    line = line:gsub('^%s+', ''):gsub('<s>', ''):gsub('</s>', ''):gsub('%(.+%)', ''):gsub('%s+$', '')
    for i = 1, #line do
        local character = line:sub(i, i)
        table.insert(label, alphabetMapping[character])
    end
    table.insert(labels, label)
end

I chained together regular expressions to delete everything but the desired text, for both the AN4 training and testing transcripts. I have no idea why Lua implemented its own pattern syntax when there are perfectly good precedents; someone above my pay grade will have to answer that one.
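
A hypothetical usage sketch for the routine above (the alphabetMapping contents and the transcript path are my assumptions, not taken from the repo):

-- Hypothetical driver for convertLineToLabels (defined above).
-- alphabetMapping maps single characters to integer labels.
alphabetMapping = { [' '] = 27 }
for i = 0, 25 do alphabetMapping[string.char(97 + i)] = i + 1 end -- a=1..z=26

local labels = {}
for line in io.lines('etc/an4_train.transcription') do -- assumed path
    convertLineToLabels(labels, line)
end
print(#labels .. ' transcript lines converted')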

My sense is that for both the training and testing data sets you can fill in the same data structure with triplets of {audio, truth text, labels}; this would make AudioData.lua more compact, and also be a useful template for other projects, such as VoxForge or TED-LIUM. Also, you can move the various parameters such as window size, stride, set sizes, and file locations into AudioData.lua. Fwiw, it is in practice a class, tied to the AN4 corpus, which supplies a table of the form {audio, truth text, labels} to the training, testing, and batch modules. Then, when you deal with other speech corpora, you can write a class which prepares their data and offers it up in the same format.

To extend the horizons a little bit: if you use CMU's lexicon, you can try training your nets to generate phonemes and look at the phoneme error rate, which is arguably a better measure of how well the net is doing; there are some tricks of the trade you can leverage to turn phoneme sequences into correctly spelled word sequences, but that will have to be the topic of another conversation. Cheers from bat cave, CC

Errors when running the instructions

I am getting the following error when running prepare.sh in the prepare_an4 folder of CTC.

Shyamals-iMac-174:prepare_an4 shyamalchandra$ sudo ./prepare.sh 
rm: an4_raw.bigendian.tar.gz: No such file or directory
ln: ./Mapper.lua: File exists
ROOT_FOLDER: an4
Converting raw an4 dataset...
Generating Indices...
Generating LMDB...
/Users/shyamalchandra/torch/install/bin/luajit: ...hyamalchandra/torch/install/share/lua/5.1/trepl/init.lua:384: .../shyamalchandra/torch/install/share/lua/5.1/lmdb/ffi.lua:175: dlopen(liblmdb.dylib, 5): image not found
stack traceback:
    [C]: in function 'error'
    ...hyamalchandra/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    /Users/shyamalchandra/CTCSpeechRecognition/Utils.lua:5: in main chunk
    [C]: in function 'require'
    generateLMDB.lua:1: in main chunk
    [C]: in function 'dofile'
    ...ndra/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010ce1fcf0

What should I do next?

Module 'BatchRNNReLU' not found

Hi,

I was able to successfully train the model with the AN4 data. However, after running the testing script, I ran into the module error below:

module 'BatchRNNReLU' not found:
No LuaRocks module found for BatchRNNReLU
no field package.preload['BatchRNNReLU']

I clearly see the module within the folder. I was hoping you could help me out.

Thanks

result of AN4

I am new to speech and torch. I just want to check whether I ran your example correctly.
It took about 28 minutes using one GTX 970 GPU and produced a model of size 940M.
The WER is 89.13, and 83.31 with content-based spell check.
Do these look right? Hope to get any help.
Thank you for your wonderful work.

lmdb Segmentation fault (core dumped)

In Utility.lua, is there a reason to use txn:abort()? It logically throws a Segmentation fault (core dumped) when kept in place and used to create the LMDBs:

txn:commit()
--> txn:abort()
db:close()

Using a custom dataset with deepspeech codes

NOTE: This is a continuation thread for any future readers who stumble upon similar issues. Before you start off here, do give the conversation on this issue a read.

I am trying to use the deepspeech model to train for scene-text tasks on images. So far, I have been able to convert my data to the LMDB format expected by the code and run the training scripts, but the error acts really goofy and keeps skipping between inf/nan/positive/negative values. Initial trials included limiting the MaxNorm of the gradients to stop them exploding, but that didn't help. The next attempt was to replace the original vanilla RNNs of DeepSpeech2 with LSTM layers in the hope of limiting the gradient explosion. To do so, one needs to change the RNNModule class in DeepSpeech.lua, as pointed out by @SeanNaren below.

Change:

local function RNNModule(inputDim, hiddenDim, opt)
    if opt.nGPU > 0 then
        require 'BatchBRNNReLU'
        return cudnn.BatchBRNNReLU(inputDim, hiddenDim)
    else
        require 'rnn'
        return nn.SeqBRNN(inputDim, hiddenDim)
    end
end

to something like:

local function RNNModule(inputDim, hiddenDim, opt)
    require 'cudnn'
    local rnn = nn.Sequential()
    rnn:add(cudnn.BLSTM(inputDim, hiddenDim, 1))
    rnn:add(nn.View(-1, 2, outputDim):setNumInputDims(2)) -- have to sum activations
    rnn:add(nn.Sum(3))
    return rnn
end

@SeanNaren: can you help me understand what outputDim signifies in the changed code? Are the output dims different from the hidden dims?

"nan" value of average loss when training using LibriSpeech corpus

Hi @SeanNaren, I just get a "nan" value of average loss when training on the LibriSpeech corpus (screenshot from 2016-07-29 attached). I think the problem comes from indexer:nxt_inds() in Loader.lua. When:

self.lmdb_size = self.cnt + self.batch_size - 1

then:

self.cnt = self.batch_size - (self.lmdb_size - self.cnt) = 1

and the following code will cause unexpected issues, since self.cnt - 1 = 0:

for i = 1, self.cnt - 1 do
    table.insert(inds, i)
end

Is that right?
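
If this analysis is right, the empty range comes from the counter wrapping to 1 mid-batch; a minimal sketch of a modulo-based wrap-around that cannot produce an empty index list (illustrative, not the repo's Loader):

-- Illustrative wrap-around batch indexing over 1..lmdbSize.
local function nextInds(cnt, batchSize, lmdbSize)
    local inds = {}
    for i = 0, batchSize - 1 do
        table.insert(inds, ((cnt - 1 + i) % lmdbSize) + 1)
    end
    local nextCnt = ((cnt - 1 + batchSize) % lmdbSize) + 1
    return inds, nextCnt
end

local inds, cnt = nextInds(9, 3, 10)
print(table.concat(inds, ','), cnt) -- 9,10,1   2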

Multi-gpu support

Currently the branch doesn't support multi-gpu. I'll be fixing this ASAP.

AN4 decoding result

Hi,
Does the model get a reasonable WER? It would be better to put the decoding WER info into a file (e.g. RESULTS) and commit it.

Recently, I wrapped cudnn-RNN and warp-ctc in C++, but the training process is abnormal.
I wrote test code for RecurrentForwardTraining; the output of the cudnn LSTM differs from the result of the formula (which has a typo) in Step 1: Optimizing a Single Iteration, in the blog post here. So I want to know the progress here (there was an error loading torch when I ran the code; I have not had time to solve it).

Best,
Feiteng

Cuda out of memory error while training on LibriSpeech dataset

I am trying to train the model on the LibriSpeech dev-clean dataset, where my train split = 2503 and val split = 200. I reduced my val split thinking this might be the issue. Based on the memory consumption (which I checked using nvidia-smi), I think all the training data is loaded at once, and so is the validation, right? Did anyone face this issue?
The stack trace follows:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6130/cutorch/lib/THC/generic/THCStorage.cu line=40 error=2 : out of memory
/home/sbp3624/torch/install/bin/luajit: /home/sbp3624/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 2 module of nn.Sequential:
In 1 module of nn.Sequential:
In 5 module of cudnn.BatchBRNNReLU:
/home/sbp3624/torch/install/share/lua/5.1/cudnn/RNN.lua:308: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6130/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
    [C]: in function 'resize'
    /home/sbp3624/torch/install/share/lua/5.1/cudnn/RNN.lua:308: in function </home/sbp3624/torch/install/share/lua/5.1/cudnn/RNN.lua:262>
    [C]: in function 'xpcall'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/sbp3624/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/sbp3624/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./ModelEvaluator.lua:70: in function 'runEvaluation'
    ./Network.lua:78: in function 'testNetwork'
    ./Network.lua:170: in function 'trainNetwork'
    Train.lua:42: in main chunk
    [C]: in function 'dofile'
    ...3624/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/sbp3624/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./ModelEvaluator.lua:70: in function 'runEvaluation'
    ./Network.lua:78: in function 'testNetwork'
    ./Network.lua:170: in function 'trainNetwork'
    Train.lua:42: in main chunk
    [C]: in function 'dofile'
    ...3624/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Create pre-trained model on Librispeech

There have been a lot of attempts to get the model trained on LibriSpeech's 1k hours of training data, so providing a trained model (as well as the steps to replicate it) would, I think, be very beneficial.

  • Prepare dataset on file system
  • Create LMDB and make modifications to support this
  • Train till convergence
  • Add model to repo and document

Stretch goal:

  • Allow loading of previous model and training from these weights

Why not using nn.SeqBRNN ?

This is not actually an issue, but I was wondering: why not use nn.SeqBRNN in building the deepspeech model instead of MaskRNN and ReverseMaskRNN? Thank you.
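
For reference, a generic usage sketch of the rnn package's nn.SeqBRNN that the question refers to (dimensions illustrative; as far as I recall it merges the two directions by default, so the output feature dim equals hiddenSize):

require 'rnn'

-- SeqBRNN(inputSize, hiddenSize): input is seqLen x batch x inputSize
-- unless batchFirst is set.
local brnn = nn.SeqBRNN(26, 40)
local input = torch.rand(5, 2, 26) -- 5 time steps, batch of 2
local output = brnn:forward(input)
print(output:size())               -- typically 5 x 2 x 40 (directions merged)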

Issues to Run on GPU

Hi,
Thanks for your support up to now. We are also running on a GPU: an entry-level NVIDIA Quadro K420, which has 192 CUDA cores and 1024 MB of total memory. I installed all the dependencies mentioned in your README.md, but I am facing the following error. I re-checked the dependencies after the error, but nothing changed.

"Training Epoch: 1
lua: /root/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetFilterNdDescriptor)
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:45: in function 'resetWeightDescriptors'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:358: in function <...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
(tail call): ?
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:95: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:111: in function 'trainNetwork'
AN4CTCTrain.lua:40: in main chunk
[C]: ?
"

Please support ...

problem with installing audio lib

After I installed the first two packages, fftw3-dev and sox, I ran luarocks install https://raw.githubusercontent.com/soumith/lua---audio/master/audio-0.1-0.rockspec, and it errors like this:

Scanning dependencies of target audio
[ 50%] Building C object CMakeFiles/audio.dir/audio.c.o
Linking C shared module libaudio.so
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libfftw3.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [libaudio.so] Error 1
make[1]: *** [CMakeFiles/audio.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

I tried to figure this out but failed.

What are possible improvements to the performance of this project?

I've trained on the LibriSpeech train-clean-100 dataset for 20 epochs and got these results:

Training Epoch: 20 Average Loss: 2.504252 Average Validation WER: 40.84 Average Validation CER: 3.60

It doesn't seem maximally optimized compared to the paper (they report WER 29.23 using 1% of their dataset, about 120 hours).

But the result seems encouraging for catching up to their performance by adopting a few techniques, like SortaGrad or language models.

Could you suggest which parts of the paper are not implemented here, or other possible improvements? Then I could work on improving the performance of this project!

Using a Custom Dataset

Hi Sean,

I was trying to tweak your code to incorporate the Librispeech dataset, which has ~1000 hours of data (http://www.openslr.org/12/). However, for some reason the loss shows 'nan' after a few epochs. It is most likely something to do with how I am processing the data. I was hoping you might have some advice.

CTCCriterion.lua:36: assertion failed!

Hi @SeanNaren, I generated the LMDB for the AN4 dataset according to the prepare_an4/prepare.sh script. Then I ran th AN4CTCTrain.lua and got this error:

preparing sorted indices..
found previously saved inds..
/home/kongchang/nfs/torch/install/bin/luajit: ...ang/nfs/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:36: assertion failed!
stack traceback:
        [C]: in function 'assert'
        ...ang/nfs/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:36: in function 'forward'
        ./Network.lua:127: in function 'opfunc'
        .../kongchang/nfs/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
        ./Network.lua:145: in function 'trainNetwork'
        AN4CTCTrain.lua:44: in main chunk
        [C]: in function 'dofile'
        .../nfs/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

I located this error: CTCCriterion.lua:36 is an assert, assert(acts:nDimension() == 2), so I printed the value, which is 0. Do you have any idea?

Error when following the data preparation instructions (LibriSpeech)

The following fails:

$ th MakeLMDB.lua -rootPath prepare_datasets/libri_dataset -lmdbPath prepare_datasets/libri_lmdb -windowSize 0.02 -stride 0.01 -sampleRate 16000 -processes 8
luajit: MakeLMDB.lua:62: attempt to index field 'file' (a nil value)
stack traceback:
        MakeLMDB.lua:62: in function 'code'
        MakeLMDB.lua:170: in function 'f'
        (command line):4: in main chunk
        [C]: at 0x00405d50
MakeLMDB.lua:127: attempt to index local 'vec' (a nil value)
<parallel#000>  closing session

because at some point it tries:

$ find -L prepare_datasets/libri_dataset/train -type f -name '*.sph'

and the file extension in libri_dataset is .flac, not .sph. Adding the extension parameter solves the problem:

$ th MakeLMDB.lua -rootPath prepare_datasets/libri_dataset  -lmdbPath prepare_datasets/libri_lmdb -windowSize 0.02 -stride 0.01 -sampleRate 16000 -processes 8 -audioExtension flac

I also suggest changing "-processes 8" to "-processes $(nproc)".

BGRU error

When I change BGRU to GRU, a new error appears:

[==================== 948/948 ================>] Tot: 13s848ms | Step: 14ms
Training Epoch: 1
lua: /home/q/xingxing.tang/torch/install/share/lua/5.1/nn/Container.lua:67:
In 12 module of nn.Sequential:
/home/q/xingxing.tang/torch/install/share/lua/5.1/torch/Tensor.lua:462: Wrong size for view. Input size: 22x11x400. Output size: 242x800
stack traceback:
[C]: in function 'error'
/home//torch/install/share/lua/5.1/torch/Tensor.lua:462: in function 'view'
/home/g/torch/install/share/lua/5.1/ctchelpers/Linear3D.lua:70: in function </home/g/torch/install/share/lua/5.1/ctchelpers/Linear3D.lua:40>
(tail call): ?
[C]: in function 'xpcall'
/home/g/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/homng/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/q/xg/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:65: in function 'opfunc'
/home/q/xng/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:81: in function 'trainNetwork'
AN4CTCTrain.lua:45: in main chunk
[C]: ?

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:

    [C]: ?

Mini-batching of audio data

I'll be implementing mini-batching via bucketing similar sizes, then padding to the largest tensor size in the batch.

Once this is implemented I will run evaluation to make sure this does not harm the word error rate.
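
A minimal sketch of the idea, assuming 1-D feature tensors and hypothetical helper names (not the eventual implementation):

-- Sketch: bucket similar-length samples, pad each bucket to its longest tensor.
local function makeBatches(samples, batchSize)
    table.sort(samples, function(a, b) return a:size(1) < b:size(1) end)
    local batches = {}
    for i = 1, #samples, batchSize do
        local last = math.min(i + batchSize - 1, #samples)
        local maxLen = samples[last]:size(1) -- longest tensor in this bucket
        local batch = torch.zeros(last - i + 1, maxLen) -- zero padding
        for j = i, last do
            batch[j - i + 1]:narrow(1, 1, samples[j]:size(1)):copy(samples[j])
        end
        table.insert(batches, batch)
    end
    return batches
end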

Training error when training AN4 phonemes with ARCTIC data

lua AN4CTCTrain.lua
[==================== 1000/1000 ==============>] Tot: 10s819ms | Step: 10ms
[==================== 131/131 ================>] Tot: 1s417ms | Step: 10ms
Training Epoch: 1
*** Error in `lua': realloc(): invalid next size: 0x00000001408cc890 ***73ms
Aborted (core dumped)

How to find the phoneme recognition rate from this code?

Hi,
Thanks for sharing this code. I want the phoneme recognition error rate from this code, so where do I modify the code to get it?
Also, I have not found language models being used anywhere in this code, though one is in the etc folder of the AN4 dataset.
One more thing: I am doing experiments with this. Is it possible to modify the spectrogram, or can I directly input a spectrogram generated from MATLAB into this code?

Regards
---Binil---

"You must pass the size of each sequence in the batch as a tensor" bug when running.

Here is the full error message:

/opt/zbstudio/bin/linux/x64/lua: /home/pdat/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:24: You must pass the size of each sequence in the batch as a tensor
stack traceback:
[C]: in function 'assert'
/home/pdat/torch/install/share/lua/5.1/nnx/CTCCriterion.lua:24: in function 'forward'
./Network.lua:79: in function 'opfunc'
/home/pdat/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:102: in function 'trainNetwork'
...dat/torch/workspace/CTCSpeechRecognition/AN4CTCTrain.lua:47: in main chunk
[C]: at 0x004054e8

I then checked CTCCriterion.lua in the nnx package and saw that its 'forward' function requires three parameters.
Any help fixing this issue would be much appreciated.
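
For context, a minimal sketch of the three-argument call that assertion guards (the shapes, the 27-symbol alphabet, and the interleaving convention are illustrative and depend on the nnx version):

require 'nnx'

local ctc = nn.CTCCriterion()
-- acts: 2-D activations (time steps interleaved across the batch) x alphabet;
-- targets: one label table per utterance;
-- sizes: per-utterance number of time steps, passed as a tensor.
local acts = torch.rand(2, 27)
local targets = { {1}, {2} }
local sizes = torch.Tensor({1, 1})
local loss = ctc:forward(acts, targets, sizes)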

GRU?

What would it take to add a GRU variation to compare with LSTMs?
