vimalmanohar / pychain-old
PyTorch implementation of LF-MMI for End-to-end ASR
Create a new class (maybe derived from Function) with the following arguments:
A struct containing various options relevant to LF-MMI training:
SparseTensor of size N x N x D, where:
N = number of states in the denominator graph
D = number of pdfs
T = chunk size
B = minibatch size
A sparse tensor representation of the transition probabilities of the arcs in the denominator graph. Depending on the implementation, we might require these to be stored either as real probabilities or in log space.
Tensor of size N x 1
A tensor containing the initial probabilities of the states in the denominator graph. This is also needed for the leaky-HMM computation.
Tensor of size D x T x B
A tensor of neural network activations.
Tensor of size D x T x B
Gradients computed by the network. In a Python implementation, these will be computed via autograd's backward() function.
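The interface above could be sketched as a custom autograd Function. All class and variable names here are illustrative, and the forward/backward bodies are placeholders; the real implementation would run the leaky-HMM forward-backward recursions over the denominator graph:

```python
import torch


class ChainDenominatorFunction(torch.autograd.Function):
    """Illustrative skeleton of the LF-MMI denominator objective.

    transitions would be the (N x N x D) sparse transition tensor and
    initial_probs the (N x 1) initial-state probabilities described above;
    dense placeholders are used here.
    """

    @staticmethod
    def forward(ctx, nnet_output, transitions, initial_probs):
        # nnet_output: D x T x B tensor of neural network activations.
        # A real implementation runs the (leaky-HMM) forward recursion here
        # and saves the per-frame alphas needed by backward().
        ctx.save_for_backward(nnet_output, transitions, initial_probs)
        # Placeholder objective: sum of activations, standing in for the
        # denominator log-likelihood.
        return nnet_output.sum()

    @staticmethod
    def backward(ctx, grad_output):
        nnet_output, transitions, initial_probs = ctx.saved_tensors
        # A real implementation derives state/pdf occupation probabilities
        # from the backward recursion; this matches the placeholder forward.
        grad = grad_output * torch.ones_like(nnet_output)
        # No gradients are needed w.r.t. the graph tensors.
        return grad, None, None
```

A network's output of shape D x T x B can then be passed through `ChainDenominatorFunction.apply(...)` and the returned scalar used in the usual `loss.backward()` flow.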
DataLoader
A class to load MFCC features and e2e-format FSTs from disk into (features Tensor, FST) pairs and create minibatches.
We can follow the way DeepSpeech does this, i.e., use an SCP-like file containing the file paths, so that the features and FSTs are read at the time the minibatch is created. This should be implemented so that it can return a single (features, fst) pair read from disk when used as a Python iterator.
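A minimal sketch of such a dataset, assuming an SCP-like line format of `utt_id feat_path fst_path` (the actual format and the feature/FST loading functions are left as hooks and are not specified by the source):

```python
class ScpExampleReader:
    """Illustrative dataset reading (features, fst) pairs lazily from disk.

    Assumes each SCP line is 'utt_id feat_path fst_path'; the actual
    on-disk formats are read by the injected loader callables, so the
    expensive I/O happens only when an item is requested (i.e., at
    minibatch-creation time).
    """

    def __init__(self, scp_path, read_feats, read_fst):
        self.entries = []
        with open(scp_path) as f:
            for line in f:
                utt_id, feat_path, fst_path = line.split()
                self.entries.append((utt_id, feat_path, fst_path))
        self.read_feats = read_feats  # e.g. loads an MFCC feature matrix
        self.read_fst = read_fst      # e.g. loads an e2e-format FST

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, i):
        utt_id, feat_path, fst_path = self.entries[i]
        return self.read_feats(feat_path), self.read_fst(fst_path)
```

With `__len__` and `__getitem__` defined, this can be wrapped directly by a PyTorch `DataLoader` together with a custom sampler and collate function.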
Sampler to create minibatches based on the length of the utterances, so that similar-length utterances are batched together.
This needs to be moved to a separate module outside the loss function module.
It has access to a list of utterances sorted by length so that same-size batches can be created easily.
[Note from Dan: we could consider padding with silence or speed-perturbing (like Hossein does) to make sure the utterance lengths are not all distinct. Also, I'd like to be able to support chunks of utterances, but this is not a hurry right now. You may need a mechanism to use different minibatch sizes for different utterance lengths, if the utterance lengths are very variable.]
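The sampler idea above can be sketched as a simple length-bucketing scheme over the sorted utterance list; this is only an illustration, not the project's exact batching policy, and the variable-batch-size mechanism from the note is not implemented here:

```python
import random


def length_bucketed_batches(lengths, batch_size, seed=0):
    """Group utterance indices into minibatches of similar length.

    lengths: sequence mapping utterance index -> number of frames.
    Sorting by length first means each contiguous slice contains
    similar-length utterances; shuffling the batch order afterwards
    avoids always presenting short utterances first.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size]
               for i in range(0, len(order), batch_size)]
    rng = random.Random(seed)
    rng.shuffle(batches)  # shuffle batch order, keep buckets intact
    return batches
```

Such a function can back a custom `torch.utils.data.Sampler` that yields one pre-built batch of indices at a time.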
Implement the denominator computation in C++ instead of Python as in #1.
This will involve writing CUDA kernels such as the ones already in Kaldi (https://github.com/kaldi-asr/kaldi/blob/master/src/chain/chain-denominator.cc and https://github.com/kaldi-asr/kaldi/blob/master/src/chain/chain-kernels.cu). However, this may not actually be necessary if the Python version is already fast enough.
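As a reference point for profiling the Python version, the core per-frame step of the denominator forward pass can be written with dense PyTorch ops. This is a deliberately simplified, non-leaky, linear-space sketch that collapses the sparse N x N x D graph into a dense N x N matrix and takes per-frame state likelihoods directly; it renormalizes alpha each frame for numerical stability, similar in spirit to the scaling used in Kaldi's chain-denominator.cc:

```python
import torch


def dense_forward_loglike(init_probs, trans, probs):
    """Simplified dense forward recursion for the denominator graph.

    init_probs: (N,) initial state probabilities.
    trans: (N, N) transition probabilities (dense stand-in for the
           sparse graph; the pdf dimension is folded into probs).
    probs: (T, N) per-frame state pseudo-likelihoods.
    Returns the total log-likelihood, accumulating the log of the
    per-frame renormalization factors.
    """
    alpha = init_probs.clone()
    total = 0.0
    for t in range(probs.shape[0]):
        alpha = (alpha @ trans) * probs[t]  # one forward step
        scale = alpha.sum()
        total = total + torch.log(scale)    # keep the discarded mass
        alpha = alpha / scale               # renormalize for stability
    return total
```

Timing loops like this one against the batch sizes of interest should show whether a C++/CUDA rewrite is actually needed.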