rnn-transducer

End-to-End Speech Recognition using RNN-Transducer

File description

  • eval.py: decoding with the RNNT joint model
  • model.py: RNNT model, containing the acoustic and phoneme models
  • model2012.py: RNNT model following Graves2012
  • seq2seq/*: seq2seq with attention
  • rnnt_np.py: RNNT loss function implemented on MXNet, supporting both the symbol and gluon APIs; refers to the PyTorch implementation
  • DataLoader.py: data processing
  • train.py: RNNT training script; can be initialized from CTC and PM models
  • train_ctc.py: CTC training script
  • train_att.py: attention training script
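
For readers unfamiliar with what an RNNT loss computes, here is a minimal NumPy sketch of the forward (alpha) recursion over the transducer lattice. It is not the code in rnnt_np.py; the function name `rnnt_forward_nll`, the `(T, U + 1, V)` input layout, and the blank index 0 are assumptions made for illustration.

```python
import numpy as np

def rnnt_forward_nll(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` summed over all alignments.

    log_probs: array of shape (T, U + 1, V) holding log-softmax outputs of
               the joint network for every (frame t, label position u).
    labels:    sequence of U target label ids (no blanks).
    """
    T, U1, _ = log_probs.shape
    U = len(labels)
    assert U1 == U + 1, "second axis must have len(labels) + 1 entries"

    # alpha[t, u]: log-probability of having emitted labels[:u] by frame t.
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            candidates = []
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                candidates.append(alpha[t - 1, u] + log_probs[t - 1, u, blank])
            if u > 0:  # arrive by emitting labels[u - 1] at (t, u - 1)
                candidates.append(alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]])
            alpha[t, u] = np.logaddexp.reduce(candidates)

    # Every alignment ends with a blank emitted from the last lattice node.
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])

# Toy usage: uniform distribution over 3 symbols, 2 frames, 1 target label.
uniform = np.full((2, 2, 3), np.log(1.0 / 3.0))
loss = rnnt_forward_nll(uniform, labels=[1])
```

A training implementation would also need the backward pass to obtain gradients; this sketch only evaluates the loss value.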

Directory description

  • conf: kaldi feature extraction config

Reference Paper

Run

  • Compile RNNT loss: follow the instructions here to compile MXNet with RNNT loss.

  • Extract features: link the Kaldi TIMIT example directories (local, steps, utils), execute run.sh to extract 40-dimensional fbank features, then run feature_transform.sh to get the 123-dimensional features described in Graves2013.

  • Train RNNT model:

python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
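
As background for the feature extraction step, the sketch below shows one way a 123-dimensional vector can arise: 41 static coefficients (assumed here to be the 40 fbank coefficients plus one energy term) stacked with their first and second temporal derivatives. This is an illustration in NumPy, not the contents of feature_transform.sh; the helper names `deltas` and `add_deltas` and the regression window of 2 are assumptions.

```python
import numpy as np

def deltas(feats, window=2):
    """Regression-based temporal derivative of a (T, D) feature matrix."""
    T = feats.shape[0]
    # Repeat the edge frames so every frame has `window` neighbours each side.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + T]    # feats[t + n]
        behind = padded[window - n : window - n + T]   # feats[t - n]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(static):
    """Stack static features with first and second derivatives: D -> 3 * D."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.concatenate([static, d1, d2], axis=1)

# Toy usage: 100 frames of 41 static coefficients (40 fbank + energy, assumed).
static = np.random.randn(100, 41)
features = add_deltas(static)  # shape (100, 123)
```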

Evaluation

By default, evaluation is only for RNNT.

  • Greedy decoding:
python eval.py <path to best model parameters> --bi
  • Beam search:
python eval.py <path to best model parameters> --bi --beam <beam size>
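
To illustrate what greedy decoding means for a transducer, here is a small self-contained sketch. It does not reproduce eval.py: `joint_scores` is a hypothetical callable standing in for the acoustic model, phoneme model, and joint network, and the blank index 0 and the per-frame symbol cap are assumptions.

```python
import numpy as np

def greedy_decode(joint_scores, num_frames, blank=0, max_symbols_per_frame=10):
    """Greedy transducer decoding.

    joint_scores(t, prefix) -> 1-D array of scores over the vocabulary for
    frame t given the labels emitted so far (hypothetical interface).
    """
    hyp = []
    for t in range(num_frames):
        # Keep emitting at this frame until blank wins, then advance in time.
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(joint_scores(t, hyp)))
            if k == blank:
                break
            hyp.append(k)
    return hyp

def toy_joint(t, prefix):
    """Toy scorer that prefers label 2 at frame 0 and label 3 at frame 2."""
    schedule = [(0, 2), (2, 3)]  # (frame, label) pairs, emitted in order
    scores = np.full(4, -5.0)
    if len(prefix) < len(schedule) and schedule[len(prefix)][0] == t:
        scores[schedule[len(prefix)][1]] = 0.0
    else:
        scores[0] = 0.0  # otherwise blank is the best choice
    return scores

decoded = greedy_decode(toy_joint, num_frames=4)
```

Beam search keeps several prefixes alive instead of committing to the single best symbol at each step, which is why it can reach a lower error rate at a higher decoding cost.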

Results

  • CTC

    Decode     PER
    greedy     20.36
    beam 100   20.03
  • Transducer

    Decode     PER
    greedy     20.74
    beam 40    19.84
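
Assuming PER here denotes phoneme error rate, as is usual for TIMIT, it is typically computed as the total edit distance between reference and decoded phoneme sequences divided by the total reference length. The sketch below shows that computation; the function names are illustrative and the scoring script used for the numbers above may differ in details such as phoneme mapping.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (sub, ins, del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """PER in percent over a corpus of reference / hypothesis sequences."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total

# Toy usage: one substitution and one deletion against a 5-phoneme reference.
refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
per = phoneme_error_rate(refs, hyps)
```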

Requirements

  • Python 3.6
  • MXNet 1.1.0
  • numpy 1.14

TODO

  • Beam search acceleration
  • Seq2Seq with attention
