hawkaaron / rnn-transducer


MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks

Python 82.47% Shell 17.53%

end-to-end asr transducers mxnet rnn-transducer timit speech-recognition sequence-transduction rnnt-joint rnnt-model

rnn-transducer's Introduction

End-to-End Speech Recognition using RNN-Transducer

File description

eval.py: decoding with the RNNT joint model
model.py: RNNT model, which contains the acoustic / phoneme model
model2012.py: RNNT model following Graves 2012
seq2seq/*: seq2seq with attention
rnnt_np.py: RNNT loss function implemented for MXNet, supporting both the Symbol and Gluon APIs; refers to the PyTorch implementation
DataLoader.py: data processing
train.py: RNNT training script; can be initialized from CTC and PM models
train_ctc.py: CTC training script
train_att.py: attention training script

Directory description

conf: kaldi feature extraction config

Reference Paper

RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
RNNT joint (Graves 2013): Speech Recognition with Deep Recurrent Neural Networks
E2E criterion comparison (Baidu 2017): Exploring Neural Transducers for End-to-End Speech Recognition
Seq2Seq-Attention: Attention-Based Models for Speech Recognition

Run

Compile RNNT loss: follow the instructions in here to compile MXNet with RNNT loss.
Extract features: link the Kaldi TIMIT example dirs (local, steps, utils), execute run.sh to extract 40-dim fbank features, then run feature_transform.sh to get the 123-dim features described in Graves 2013.
Train the RNNT model:

python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule

Evaluation

By default, evaluation supports only the RNNT model.

Greedy decoding:

python eval.py <path to best model parameters> --bi

Beam search:

python eval.py <path to best model parameters> --bi --beam <beam size>

Results

CTC

Decode	PER
greedy	20.36
beam 100	20.03

Transducer

Decode	PER
greedy	20.74
beam 40	19.84
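For readers unfamiliar with the metric, the sketch below shows how a phoneme error rate is commonly computed: total edit distance between reference and hypothesis phoneme sequences, divided by the total reference length. This assumes the standard definition; the function names are hypothetical and this is not the scoring code used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """PER in percent over a list of (reference, hypothesis) pairs."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return 100.0 * total_errors / total_ref
```

Summing errors over the whole test set before dividing weights long utterances more than short ones, which is the usual convention for corpus-level error rates.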

Requirements

Python 3.6
MXNet 1.1.0
numpy 1.14

TODO

beam search acceleration
Seq2Seq with attention

rnn-transducer's Issues

DataLoader.py returns these errors...

################################################################################

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

################################################################################

{'': 0, 'sil': 1, 'aa': 2, 'ae': 3, 'ah': 4, 'ao': 5, 'aw': 6, 'ax': 7, 'ay': 8, 'b': 9, 'ch': 10, 'cl': 11, 'd': 12, 'dh': 13, 'dx': 14, 'eh': 15, 'el': 16, 'en': 17, 'epi': 18, 'er': 19, 'ey': 20, 'f': 21, 'g': 22, 'hh': 23, 'ih': 24, 'ix': 25, 'iy': 26, 'jh': 27, 'k': 28, 'l': 29, 'm': 30, 'n': 31, 'ng': 32, 'ow': 33, 'oy': 34, 'p': 35, 'r': 36, 's': 37, 'sh': 38, 't': 39, 'th': 40, 'uh': 41, 'uw': 42, 'v': 43, 'vcl': 44, 'w': 45, 'y': 46, 'z': 47, 'zh': 48, '#0': 49, '#1': 50}
/home/wxt/kaldi/src/featbin/copy-feats scp:data/train/feats.scp ark:-
/home/wxt/kaldi/src/featbin/apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp ark:- ark:-
/home/wxt/kaldi/src/featbin/add-deltas --delta-order=2 ark:- ark:-
/home/wxt/kaldi/src/nnetbin/nnet-forward data/final.feature_transform ark:- ark:-
Traceback (most recent call last):
File "/home/wxt/kaldi/egs/timit/s5/DataLoader.py", line 118, in
SequentialLoader('train')._dump()
File "/home/wxt/kaldi/egs/timit/s5/DataLoader.py", line 63, in _dump
with open('data-npy/'+k+'.y', 'wb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data-npy/FAEM0_SI1392.y'
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/wxt/RNNTgraves2013/venv/lib/python3.6/site-packages/kaldi_io/kaldi_io.py", line 97, in cleanup
raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
kaldi_io.kaldi_io.SubprocessFailed: cmd /home/wxt/kaldi/src/featbin/copy-feats scp:data/train/feats.scp ark:- | /home/wxt/kaldi/src/featbin/apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp ark:- ark:- | /home/wxt/kaldi/src/featbin/add-deltas --delta-order=2 ark:- ark:- | /home/wxt/kaldi/src/nnetbin/nnet-forward data/final.feature_transform ark:- ark:- returned 141 !
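A note on the traceback above: opening 'data-npy/FAEM0_SI1392.y' for writing raises FileNotFoundError when the parent directory does not exist. The return code 141 is commonly 128 + SIGPIPE, which would be consistent with the Kaldi pipeline being cut off once the Python reader failed. A minimal sketch of a workaround, assuming the missing directory is the only problem (the helper name is hypothetical and not part of DataLoader.py):

```python
import os


def open_for_dump(out_dir, utt_id, suffix):
    """Open '<out_dir>/<utt_id><suffix>' for binary writing.

    Creates out_dir first, so a missing directory such as 'data-npy'
    no longer raises FileNotFoundError.
    """
    os.makedirs(out_dir, exist_ok=True)
    return open(os.path.join(out_dir, utt_id + suffix), 'wb')
```

Running `mkdir data-npy` once before calling the dump step should have the same effect.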

What is the license for the code/model in this repository?

Can you please add a LICENSE file to your repository stating the license of the files?

A problem about getting 123 dim feature

speech@speech-All-Series:/media/speech/speech1/实验/RNN-Transducer-graves2013$ ./feature_transform.sh

feat-to-dim 'ark:copy-feats scp:data/train/feats.scp ark:- | apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |' -
copy-feats scp:data/train/feats.scp ark:-
apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp ark:- ark:-
add-deltas --delta-order=2 ark:- ark:-
WARNING (feat-to-dim[5.5.719~1-22a4]:Close():kaldi-io.cc:515) Pipe copy-feats scp:data/train/feats.scp ark:- | apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | had nonzero return status 36096
nnet-initialize: symbol lookup error: /home/speech/kaldi/src/lib/libkaldi-cudamatrix.so: undefined symbol: _ZN5kaldi16g_cuda_allocatorE

Run the testing process on the GPU

Hi @HawkAaron, have you tried running the testing process on the GPU? Beam search takes a long time when the beam size is large. I tried to modify the code, but ran into some errors. I changed context = mx.cpu(0) to context = mx.gpu(0).

Loss decreases but SER increases

Hello, I trained RNNT on a Chinese speech recognition corpus of more than 300 hours (the encoder was pretrained, but the decoder was randomly initialized). Over dozens of epochs, the loss first dropped quickly from more than 1000 to 60, then slowly to just above 20, but the SER at inference rose from 2 to 20. Is this normal? It seems that you mentioned this phenomenon elsewhere.
Thank you very much!

Phoneme Mapping

It seems you map all the labels to 48 classes.

RNN-Transducer/DataLoader.py

Line 6 in 786fa75

with open('data/lang/phones.txt', 'r') as f:

But the number of classes is 62 in the Transducer setting. Is that correct? Thanks!
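One way to check how many classes the symbol table actually defines is to parse it and count the entries. The sketch below assumes the usual Kaldi layout of one "symbol index" pair per line; the function name is hypothetical and this is not the parsing code from DataLoader.py.

```python
def load_symbol_table(lines):
    """Parse 'symbol index' lines into a dict, skipping malformed lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or malformed line
        symbol, index = parts
        table[symbol] = int(index)
    return table
```

For example, `len(load_symbol_table(open('data/lang/phones.txt')))` gives the number of symbols in the file, which can then be compared against the vocabulary size the model was built with.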

Your greedy decode implementation is wrong.

I think your transducer greedy decode implementation is wrong.
Here is my PyTorch implementation.

Training epochs

Hi @HawkAaron ,

I have trained for over 100 epochs so far and the training loss is around -200000 and decreasing. Since I am not familiar with the RNNTLoss, the loss seemed quite large to me and I am not sure if it's converging.

I also ran the evaluation to get a first look at the PER, which is around 97% for greedy search (beam search took much longer).

Did 200 epochs get you around 20% PER? I just want to know if my training is on the right track.

Thanks and your feedback would be really appreciated!

Why should we use log_softmax when training?

@HawkAaron Why do we use log_softmax when training here? When training the CTC model, no activation function such as softmax or log_softmax is applied here.

Is there something different?
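For context on the question above: the transducer loss is defined over normalized log-probabilities, so the scores must be normalized somewhere, either by the caller or inside the loss operator. Which of the two this repository's RNNT and CTC operators do is not stated here, so treat the following only as an illustration of what log_softmax computes. It is a plain-Python sketch with a hypothetical function name, not MXNet's implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]
```

Working in the log domain lets the forward-backward recursion add log-probabilities instead of multiplying very small numbers.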

Where does this come from? self.loss = gluon.loss.RNNTLoss(blank_label=blank)

Hello,

I am trying to run your code and discovered that you use gluon.loss.RNNTLoss(), which does not exist. Could you clarify where this function comes from? Did you mean from warprnnt_pytorch import RNNTLoss?

Thanks.

mxnet.base.MXNetError: [11:42:46] src/operator/contrib/./rnnt_loss-inl.h:218: Check failed: dshape[0] == lshape[0] (474 vs. 1) The batch size for the labels and data arrays must be the same.

When I run the code, I get the error above. I printed the sizes of the variables ytu and ys; they are 1x474x60x62 and 1x59. Is there something wrong?

Traceback (most recent call last):
File "train.py", line 133, in
train()
File "train.py", line 99, in train
loss = model(xs, ys, xlen, ylen)
File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/mxnet/gluon/block.py", line 360, in call
Exception in thread Thread-1:
Traceback (most recent call last):
File "/disk2/dongsq/environments/python3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/disk2/dongsq/environments/python3.6/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/kaldi_io/kaldi_io.py", line 82, in cleanup
raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
kaldi_io.kaldi_io.SubprocessFailed: cmd copy-feats scp:data/train/feats.scp ark:- | apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | nnet-forward data/final.feature_transform ark:- ark:- returned 141 !

return self.forward(*args)

File "/disk2/dongsq/asr/RNN-Transducer/model.py", line 87, in forward
loss = self.loss(ytu, ys, xlen, ylen)
File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/mxnet/gluon/block.py", line 360, in call
return self.forward(*args)
File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/mxnet/gluon/block.py", line 575, in forward
return self.hybrid_forward(ndarray, x, *args, **params)
File "/disk2/dongsq/asr/RNN-Transducer/rnnt_mx.py", line 24, in hybrid_forward
blank_label=self.blank_label)
File "", line 75, in RNNTLoss
File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
ctypes.byref(out_stypes)))
File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/mxnet/base.py", line 146, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:42:46] src/operator/contrib/./rnnt_loss-inl.h:218: Check failed: dshape[0] == lshape[0] (474 vs. 1) The batch size for the labels and data arrays must be the same.
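A quick sanity check for this kind of error is to compare the shapes before calling the loss. The sketch below assumes the layout (batch, T, U+1, vocab) for the joint output and (batch, U) for the labels, as used by common RNNT loss implementations; whether the operator compiled here expects that layout is an assumption, and the function name is hypothetical.

```python
def check_rnnt_shapes(acts_shape, labels_shape):
    """Validate (B, T, U+1, V) activations against (B, U) labels."""
    if len(acts_shape) != 4 or len(labels_shape) != 2:
        raise ValueError('expected 4-D activations and 2-D labels')
    batch, _, u_plus_1, _ = acts_shape
    if labels_shape[0] != batch:
        raise ValueError('batch size mismatch: %d vs. %d'
                         % (batch, labels_shape[0]))
    if labels_shape[1] + 1 != u_plus_1:
        raise ValueError('label length %d does not match U+1 = %d'
                         % (labels_shape[1], u_plus_1))
    return True
```

The reported shapes, (1, 474, 60, 62) and (1, 59), pass this check. One possible explanation, which is only a guess, is that the operator reads the dimensions in a different order than the Python code provides them.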

A question about vocab_size

Hi, thanks for your work. I am a newbie and would like to know how to build the vocab file, and whether the vocab file should include blank. Looking forward to your reply.

src/operator/rnn.cc:254:30: error: ‘kCuDNNDropoutDesc’ is not a member of ‘mxnet::ResourceRequest’

CentOS 7 + CUDA8.0 + CuDNN 6.0.21

When I rebuilt MXNet from source with make, I got the error above. I could not find anything about it on the Internet. Can you help me?

g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/disk2/dongsq/asr/mxnet/3rdparty/mshadow/ -I/disk2/dongsq/asr/mxnet/3rdparty/dmlc-core/include -fPIC -I/disk2/dongsq/asr/mxnet/3rdparty/tvm/nnvm/include -I/disk2/dongsq/asr/mxnet/3rdparty/dlpack/include -I/disk2/dongsq/asr/mxnet/3rdparty/tvm/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -I/disk2/dongsq/asr/mxnet/3rdparty/mkldnn/build/install/include -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_MKLDNN=1 -DUSE_MKL=1 -I/disk2/dongsq/asr/mxnet/src/operator/nn/mkldnn/ -I/disk2/dongsq/asr/mxnet/3rdparty/mkldnn/build/install/include -DMXNET_USE_OPENCV=1 -I/disk2/dongsq/environments/OpenCV-3.4.6/include/opencv -I/disk2/dongsq/environments/OpenCV-3.4.6/include -fopenmp -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_LAPACK -DMSHADOW_USE_CUDNN=1 -I/disk2/dongsq/asr/mxnet/3rdparty/nvidia_cub -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_LIBJPEG_TURBO=0 -MMD -c src/operator/convolution_v1.cc -o build/src/operator/convolution_v1.o
g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/disk2/dongsq/asr/mxnet/3rdparty/mshadow/ -I/disk2/dongsq/asr/mxnet/3rdparty/dmlc-core/include -fPIC -I/disk2/dongsq/asr/mxnet/3rdparty/tvm/nnvm/include -I/disk2/dongsq/asr/mxnet/3rdparty/dlpack/include -I/disk2/dongsq/asr/mxnet/3rdparty/tvm/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -I/disk2/dongsq/asr/mxnet/3rdparty/mkldnn/build/install/include -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_MKLDNN=1 -DUSE_MKL=1 -I/disk2/dongsq/asr/mxnet/src/operator/nn/mkldnn/ -I/disk2/dongsq/asr/mxnet/3rdparty/mkldnn/build/install/include -DMXNET_USE_OPENCV=1 -I/disk2/dongsq/environments/OpenCV-3.4.6/include/opencv -I/disk2/dongsq/environments/OpenCV-3.4.6/include -fopenmp -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_LAPACK -DMSHADOW_USE_CUDNN=1 -I/disk2/dongsq/asr/mxnet/3rdparty/nvidia_cub -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_LIBJPEG_TURBO=0 -MMD -c src/operator/spatial_transformer.cc -o build/src/operator/spatial_transformer.o
src/operator/rnn.cc: In lambda function:
src/operator/rnn.cc:254:30: error: ‘kCuDNNDropoutDesc’ is not a member of ‘mxnet::ResourceRequest’
request.emplace_back(ResourceRequest::kCuDNNDropoutDesc);
^

Pre-training LM

Hi @HawkAaron ,

I am trying to pre-train the acoustic and prediction models separately. Could you share how you trained the prediction model independently of the RNN-T model, and which data set you used for that?

Thanks!

A Question About Padding

Hi, I am new to ASR and RNN-Transducer. I have a question after reading the code. Why pad the label sequence with its last value instead of a special token <pad>, as is usually done in NLP? The inputs corresponding to the padding part are zero-valued.

Looking forward to your reply~
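To make the two padding choices in the question concrete, here is a small sketch. It assumes that the true lengths are passed to the loss alongside the padded labels (the tracebacks elsewhere on this page show a call of the form model(xs, ys, xlen, ylen)), in which case a loss that respects those lengths should not depend on the pad value. The function name is hypothetical and this is not the code from DataLoader.py.

```python
def pad_labels(seqs, pad_value=None):
    """Pad non-empty label sequences to a common length.

    With pad_value=None each sequence is padded with its own last label;
    otherwise the given pad_value is used. Returns (padded, lengths).
    """
    max_len = max(len(s) for s in seqs)
    padded, lengths = [], []
    for s in seqs:
        fill = s[-1] if pad_value is None else pad_value
        padded.append(list(s) + [fill] * (max_len - len(s)))
        lengths.append(len(s))
    return padded, lengths
```

Padding with an existing label keeps every entry a valid class index, which avoids reserving an extra <pad> symbol in the vocabulary.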

Model structure JSON file

Hi @HawkAaron ,

Thanks for your help with training the network. I am trying to extract the model structure JSON file along with the corresponding parameter files. Since I am new to MXNet and HybridBlock was not used to build the network, do you have a workaround for saving the model structure?

I understand this is not RNN-T related issue and I would appreciate it a lot if you could offer some insights.

Thanks.

question in greedy decoder

Hi @HawkAaron ,

I don't quite understand why you have vocab_size-1 in your model.py greedy_decode code
line 70: y = mx.nd.zeros((1, 1, self.vocab_size-1)) # first zero vector
Could you tell me what vocab you're excluding here?

Thanks!
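One possible reading of vocab_size-1, not confirmed by the author: in Graves 2012 the prediction network input is a one-hot vector over the non-blank labels only, and the start of the sequence is encoded as an all-zero vector of the same length. The sketch below illustrates that encoding, assuming blank has index 0; the function name and the index convention are assumptions.

```python
def one_hot_without_blank(label, vocab_size, blank=0):
    """Encode a label as a one-hot vector of length vocab_size - 1.

    label=None stands for the start of the sequence and gives all zeros.
    Assumes blank has index 0, so labels 1..vocab_size-1 map to
    positions 0..vocab_size-2.
    """
    vec = [0.0] * (vocab_size - 1)
    if label is None:
        return vec
    if label == blank:
        raise ValueError('blank is never fed to the prediction network')
    vec[label - 1] = 1.0
    return vec
```

Under this reading, the excluded entry is the blank symbol, since blank is only ever an output of the joint network and never an input to the prediction network.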

greedy_decode gives no output

Hi, during testing, both greedy decoding and beam search give an empty output. The corresponding screenshots are below.

The performance gets worse when I use a bigger model

@HawkAaron I tried to train the transducer on my own dataset. When I increased hidden_size from 250 to 512 and num_layers from 3 to 5, the performance became worse. Why? A bigger model should overfit more easily, shouldn't it?

MXNet with Version 1.1.0 doesn't contain the install files (/docs/install), Version 1.4.x doesn't work (Shape Error)

An error is thrown on version 1.4.x
In file included from src/operator/contrib/rnnt_loss.cc:27:0: src/operator/contrib/./rnnt_loss-inl.h: In member function 'virtual bool mxnet::op::RNNTLossProp::InferShape(std::vector<mxnet::TShape>*, std::vector<mxnet::TShape>*, std::vector<mxnet::TShape>*) const': src/operator/contrib/./rnnt_loss-inl.h:230:20: error: no matching function for call to 'mxnet::TShape::TShape(int)' TShape oshape(1); ^ In file included from include/mxnet/./base.h:38:0, from include/mxnet/operator.h:38, from src/operator/contrib/./rnnt_loss-inl.h:32, from src/operator/contrib/rnnt_loss.cc:27: include/mxnet/./tuple.h:519:10: note: candidate: template<int dim> mxnet::TShape::TShape(mshadow::Shape<ndim>&&) inline TShape(mshadow::Shape<dim> &&s) {// NOLINT(*) ^ include/mxnet/./tuple.h:519:10: note: template argument deduction/substitution failed: In file included from src/operator/contrib/rnnt_loss.cc:27:0: src/operator/contrib/./rnnt_loss-inl.h:230:20: note: mismatched types 'mshadow::Shape<ndim>' and 'int' TShape oshape(1); ^ In file included from include/mxnet/./base.h:38:0, from include/mxnet/operator.h:38, from src/operator/contrib/./rnnt_loss-inl.h:32, from src/operator/contrib/rnnt_loss.cc:27: include/mxnet/./tuple.h:514:10: note: candidate: template<int dim> mxnet::TShape::TShape(const mshadow::Shape<ndim>&) inline TShape(const mshadow::Shape<dim> &s) {// NOLINT(*) ^ include/mxnet/./tuple.h:514:10: note: template argument deduction/substitution failed: In file included from src/operator/contrib/rnnt_loss.cc:27:0: src/operator/contrib/./rnnt_loss-inl.h:230:20: note: mismatched types 'const mshadow::Shape<ndim>' and 'int' TShape oshape(1); ^ In file included from include/mxnet/./base.h:38:0, from include/mxnet/operator.h:38, from src/operator/contrib/./rnnt_loss-inl.h:32, from src/operator/contrib/rnnt_loss.cc:27: include/mxnet/./tuple.h:449:10: note: candidate: template<class RandomAccessIterator, typename std::enable_if<std::is_same<typename 
std::iterator_traits<_Iterator>::iterator_category, std::random_access_iterator_tag>::value, int>::type <anonymous> > mxnet::TShape::TShape(RandomAccessIterator, RandomAccessIterator) inline TShape(RandomAccessIterator begin, ^ include/mxnet/./tuple.h:449:10: note: template argument deduction/substitution failed: In file included from src/operator/contrib/rnnt_loss.cc:27:0: src/operator/contrib/./rnnt_loss-inl.h:230:20: note: candidate expects 2 arguments, 1 provided TShape oshape(1); ^ In file included from include/mxnet/./base.h:38:0, from include/mxnet/operator.h:38, from src/operator/contrib/./rnnt_loss-inl.h:32, from src/operator/contrib/rnnt_loss.cc:27: include/mxnet/./tuple.h:434:10: note: candidate: mxnet::TShape::TShape(mxnet::Tuple<long int>&&) inline TShape(Tuple<dim_t>&& s) { // NOLINT(*) ^ include/mxnet/./tuple.h:434:10: note: no known conversion for argument 1 from 'int' to 'mxnet::Tuple<long int>&&' include/mxnet/./tuple.h:427:10: note: candidate: mxnet::TShape::TShape(std::initializer_list<long int>) inline TShape(std::initializer_list<dim_t> init) { ^ include/mxnet/./tuple.h:427:10: note: no known conversion for argument 1 from 'int' to 'std::initializer_list<long int>' include/mxnet/./tuple.h:416:10: note: candidate: mxnet::TShape::TShape(const mxnet::Tuple<long int>&) inline TShape(const Tuple<dim_t>& s) { // NOLINT(*) ^ include/mxnet/./tuple.h:416:10: note: no known conversion for argument 1 from 'int' to 'const mxnet::Tuple<long int>&' include/mxnet/./tuple.h:406:10: note: candidate: mxnet::TShape::TShape(int, dim_t) inline TShape(const int ndim, const dim_t value) { // NOLINT(*) ^ include/mxnet/./tuple.h:406:10: note: candidate expects 2 arguments, 1 provided include/mxnet/./tuple.h:398:3: note: candidate: mxnet::TShape::TShape() TShape() { ^ include/mxnet/./tuple.h:398:3: note: candidate expects 0 arguments, 1 provided include/mxnet/./tuple.h:395:7: note: candidate: mxnet::TShape::TShape(const mxnet::TShape&) class TShape : public 
Tuple<dim_t> { ^ include/mxnet/./tuple.h:395:7: note: no known conversion for argument 1 from 'int' to 'const mxnet::TShape&' include/mxnet/./tuple.h:395:7: note: candidate: mxnet::TShape::TShape(mxnet::TShape&&) include/mxnet/./tuple.h:395:7: note: no known conversion for argument 1 from 'int' to 'mxnet::TShape&&'

On version 1.1.0, the install file ./install_mxnet_ubuntu_python.sh no longer exists.

mxnet.base.MXNetError: [16:48:44] ../src/operator/./rnn-inl.h:507: RNN on GPU is only available for cuDNN at the moment.

RNNTLoss issue with model 2013

Hi @HawkAaron ,

I was trying out your joint network but ran into an argument error in the RNNTLoss function. Below is the error message:

Traceback (most recent call last):
File "train_char_wsj.py", line 260, in
train()
File "train_char_wsj.py", line 182, in train
loss = model(xs, ys, xlen, ylen)
File "/home/zichengr/anaconda3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/gluon/block.py", line 540, in call
out = self.forward(*args)
File "/home/zichengr/RNN_T_mxnet/RNN-Transducer/modelX_char_wsj.py", line 87, in forward
loss = self.loss(ytu, ys, xlen, ylen)
File "/home/zichengr/anaconda3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/gluon/block.py", line 540, in call
out = self.forward(*args)
File "/home/zichengr/anaconda3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/gluon/block.py", line 917, in forward
return self.hybrid_forward(ndarray, x, *args, **params)
File "/home/zichengr/RNN_T_mxnet/RNN-Transducer/rnnt_mx.py", line 18, in hybrid_forward
blank_label=self.blank_label)
File "", line 80, in RNNTLoss
File "/home/zichengr/anaconda3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
ctypes.byref(out_stypes)))
File "/home/zichengr/anaconda3/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [04:48:42] src/c_api/../imperative/imperative_utils.h:347: Check failed: num_inputs == infered_num_inputs (4 vs. 5) Operator _contrib_RNNTLoss expects 5 inputs, but got 4 instead.

Stack trace returned 10 entries:
[bt] (0) /home/zichengr/incubator-mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x179) [0x7f6604491729]
[bt] (1) /home/zichengr/incubator-mxnet/lib/libmxnet.so(mxnet::imperative::SetNumOutputs(nnvm::Op const*, nnvm::NodeAttrs const&, int const&, int*, int*)+0xdfb) [0x7f6606e6022b]
[bt] (2) /home/zichengr/incubator-mxnet/lib/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0xd71) [0x7f6606e5cce1]
[bt] (3) /home/zichengr/incubator-mxnet/lib/libmxnet.so(MXImperativeInvokeEx+0x426) [0x7f6606e5eba6]
[bt] (4) /home/zichengr/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f66787baec0]
[bt] (5) /home/zichengr/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f66787ba87d]
[bt] (6) /home/zichengr/anaconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f66789cfe2e]
[bt] (7) /home/zichengr/anaconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x12865) [0x7f66789d0865]
[bt] (8) /home/zichengr/anaconda3/bin/python3(_PyObject_FastCallDict+0x8b) [0x55fc017edd7b]
[bt] (9) /home/zichengr/anaconda3/bin/python3(+0x19e7ce) [0x55fc0187d7ce]

Do you have any clue where this comes from?

Thanks!

Dataset and run.sh

Excuse me, could you show how to use the Kaldi-timit scripts in your source code?

Where is model2012 from?

Hi @HawkAaron ,

I was trying to run eval.py and found that there is no module named 'model2012'. Could you let me know where I can import this module from?

Thanks!

Why add a while loop when decoding on the GPU?

I want to know why a while loop is added when decoding on the GPU. I introduced an embedding layer into the prediction network to replace the one-hot input. However, decoding now gets stuck in the while loop and never exits.
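For reference, a transducer greedy decoder usually has an inner loop per frame: it keeps emitting labels until the joint network predicts blank, then moves to the next frame. If a modified prediction network never leads to blank, that loop does not terminate. A common safeguard is to cap the number of symbols emitted per frame. The sketch below shows the idea with hypothetical callables predict and joint; it is not the decoder from model.py.

```python
def greedy_decode(frames, predict, joint, blank=0, max_symbols=10):
    """Greedy transducer decoding with a per-frame emission cap.

    predict(label, state) -> (pred_out, new_state); label=None is the start.
    joint(frame, pred_out) -> list of scores, one per output class.
    """
    hyp = []
    pred_out, state = predict(None, None)
    for frame in frames:
        emitted = 0
        while emitted < max_symbols:  # guard against an endless loop
            scores = joint(frame, pred_out)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == blank:
                break  # blank: advance to the next frame
            hyp.append(best)
            pred_out, state = predict(best, state)
            emitted += 1
    return hyp
```

The cap only hides the symptom. If decoding regularly hits it, the more likely cause is a mismatch between how the prediction network was trained and how it is fed at decode time.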

Model 2012 reproduce

Hi, thanks for your work.
The training code is for the CTC model. Did you get the RNN-T result (greedy | 20.74) by just switching the imported model to the Transducer model, with all other hyperparameters the same?
