
LAS_Mandarin_PyTorch


This code is a PyTorch implementation of the paper Listen, Attend and Spell, a well-known end-to-end automatic speech recognition (ASR) model.

It also provides a pretrained Chinese Mandarin ASR model.

  • Dataset
  • Usage
    • generate vocab file
    • training
    • test
    • infer
  • Demo

Listen-Attend-Spell

From the Google blog post "Improving End-to-End Models For Speech Recognition":

The LAS architecture consists of 3 components. The listener encoder component, which is similar to a standard AM, takes a time-frequency representation of the input speech signal, x, and uses a set of neural network layers to map the input to a higher-level feature representation, h^enc. The output of the encoder is passed to an attender, which uses h^enc to learn an alignment between input features x and predicted subword units {y_n, …, y_0}, where each subword is typically a grapheme or wordpiece. Finally, the output of the attention module is passed to the speller (i.e., decoder), similar to an LM, that produces a probability distribution over a set of hypothesized words.
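To make the attender's role concrete, here is a minimal sketch of one attention step in plain Python. It uses a simple dot-product score between the speller state and each listener output frame, then a softmax to get alignment weights and a weighted sum to get the context vector. This is an illustrative simplification, not this repository's attention module: the original LAS paper scores with learned MLP projections of the state and the encoder features, which are omitted here, and `attend` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Vector, h_enc: List[Vector]) -> Tuple[Vector, Vector]:
    """One attention step (simplified, dot-product scoring).

    Scores every encoder frame against the speller state, normalises the
    scores into alignment weights, and returns (context, weights).
    """
    # Dot-product score between the speller state and each listener frame.
    scores = [sum(s * h for s, h in zip(decoder_state, frame)) for frame in h_enc]
    weights = softmax(scores)
    dim = len(h_enc[0])
    # Context vector: alignment-weighted average of the encoder frames.
    context = [sum(w * frame[d] for w, frame in zip(weights, h_enc)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    h_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 encoder frames, dim 2
    context, weights = attend([2.0, 0.0], h_enc)
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Frames whose features agree with the current speller state receive most of the weight, so the context vector fed to the speller is dominated by the acoustically relevant part of the utterance.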

Components of the LAS End-to-End Model.



This repository contains:

  1. model code implementing the paper.
  2. a vocab-file generation script, which you can use to build the vocab file for your dataset.
  3. training scripts to train the model.
  4. testing scripts to evaluate the model.



Requirements

pip install -r requirements.txt

Usage

preprocess

First, generate a vocab file from your dataset's transcript file; see generate_vocab_file.py for reference. If you want to train on AISHELL data, you can use generate_vocab_file_aishell.py directly:

python generate_vocab_file_aishell.py --input_file $DATA_DIR/data_aishell/transcript_v0.8.txt --output_file ./aishell_vocab.txt --mode character --vocab_size 5000

This creates a vocab file named aishell_vocab.txt in the current directory.
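For other corpora, the following is a minimal sketch of what character-mode vocabulary building (as in `--mode character --vocab_size 5000`) can look like. It is not the repository's script. It assumes AISHELL-style transcript lines of the form `<utterance_id> <word> <word> ...` and a hypothetical set of special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`); check generate_vocab_file.py for the tokens and file layout the model actually expects.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical special tokens; the repository's vocab file may use different ones.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_char_vocab(lines: Iterable[str], vocab_size: int) -> List[str]:
    """Build a character-level vocabulary from transcript lines.

    Assumes '<utterance_id> <word> <word> ...' lines. Keeps the most frequent
    characters so the vocabulary, special tokens included, has at most
    vocab_size entries.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue  # skip empty or ID-only lines
        # Drop the utterance ID and the word-segmentation spaces.
        counts.update("".join(parts[1:]))
    budget = max(vocab_size - len(SPECIAL_TOKENS), 0)
    # most_common sorts by frequency; ties keep first-seen order.
    chars = [ch for ch, _ in counts.most_common(budget)]
    return SPECIAL_TOKENS + chars


def write_vocab(vocab: List[str], path: str) -> None:
    """Write one token per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")


if __name__ == "__main__":
    sample = ["utt_0001 而 对 楼市 成交", "utt_0002 楼市 回暖"]
    print(build_char_vocab(sample, vocab_size=10))
```

Capping by frequency means rare characters fall outside the vocabulary and would be mapped to `<unk>` during training.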

train

Before training, you need to write the dataset code for your corpus in the dataset package.

If you want to use my AISHELL dataset code, also update the transcript file path in data/aishell.py, line 26:

src_file = "/data/Speech/SLR33/data_aishell/" + "transcript/aishell_transcript_v0.8.txt"
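On the transcript side, dataset code mainly has to turn each transcript line into a sequence of token IDs. A minimal sketch, assuming AISHELL-style `<utterance_id> <word> ...` lines and a token-to-ID dict that contains `<sos>`, `<eos>` and `<unk>` entries; `parse_transcript_lines`, `load_transcripts` and `encode` are hypothetical helpers, not functions in this repository.

```python
from typing import Dict, Iterable, List


def parse_transcript_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance ID -> transcript text with word spaces removed.

    Assumes one '<utterance_id> <word> <word> ...' entry per line.
    """
    transcripts = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) >= 2:
            transcripts[parts[0]] = "".join(parts[1:])
    return transcripts


def load_transcripts(path: str) -> Dict[str, str]:
    """Read a transcript file such as the one src_file points to."""
    with open(path, encoding="utf-8") as f:
        return parse_transcript_lines(f)


def encode(text: str, token_to_id: Dict[str, int], unk: str = "<unk>",
           sos: str = "<sos>", eos: str = "<eos>") -> List[int]:
    """Convert characters to IDs, framed by <sos> ... <eos>."""
    unk_id = token_to_id[unk]
    ids = [token_to_id.get(ch, unk_id) for ch in text]  # unseen chars -> <unk>
    return [token_to_id[sos]] + ids + [token_to_id[eos]]
```

The `<sos>`/`<eos>` framing gives the speller an explicit start symbol to condition on and a stop symbol to learn when the utterance ends.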

When ready, start training:

python main.py --config ./config/aishell_asr_example_lstm4atthead1.yaml

You can write your own config file, using config/aishell_asr_example_lstm4atthead1.yaml as a reference. In particular, set the corpus path and vocab_file variables.

test

python main.py --config ./config/aishell_asr_example_lstm4atthead1.yaml --test
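Mandarin ASR systems are commonly scored by character error rate (CER): the edit distance between hypothesis and reference characters, divided by the number of reference characters. The sketch below shows that computation; it is an assumption about how results are typically measured, and the repository's `--test` mode may report a different metric.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance with unit-cost substitution, insertion, deletion."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference char r
                curr[j - 1] + 1,         # insertion of hypothesis char h
                prev[j - 1] + (r != h),  # substitution, free when chars match
            )
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total if total else 0.0


if __name__ == "__main__":
    print(cer(["楼市回暖", "成交"], ["楼市回落", "成交量"]))
```

Summing edits and reference lengths over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.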

Pretrained

English

Chinese Mandarin

A pretrained model trained on the AISHELL dataset is available for download from Google Drive.


Demo

Run inference:

python infer.py
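At inference time the speller emits one token per step until it produces an end-of-sentence token. The simplest strategy is greedy decoding, sketched below; LAS decoders often use beam search instead (greedy is the beam-width-1 case), and this is not necessarily what infer.py does. The `step` callable is a hypothetical stand-in for one speller step: given the tokens decoded so far, it returns log-probabilities over the vocabulary.

```python
from typing import Callable, List, Sequence

# Hypothetical step function: prefix of token IDs -> log-probs for the next token.
# In a real LAS model this would run one speller step attending over h_enc.
StepFn = Callable[[List[int]], Sequence[float]]


def greedy_decode(step: StepFn, sos_id: int, eos_id: int, max_len: int = 100) -> List[int]:
    """Take the most likely token at each step until <eos> or max_len tokens."""
    tokens = [sos_id]
    for _ in range(max_len):
        log_probs = step(tokens)
        next_id = max(range(len(log_probs)), key=lambda i: log_probs[i])
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the leading <sos>


if __name__ == "__main__":
    script = [4, 5, 2]  # toy model: emits token 4, then 5, then <eos> (ID 2)

    def toy_step(prefix: List[int]) -> List[float]:
        target = script[len(prefix) - 1]
        return [0.0 if i == target else -10.0 for i in range(6)]

    print(greedy_decode(toy_step, sos_id=1, eos_id=2))
```

The `max_len` cap guards against a model that never emits `<eos>`; the returned IDs can then be mapped back to characters through the vocab file.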

Reference

  1. Listen, Attend and Spell, W Chan et al.
  2. Neural Machine Translation of Rare Words with Subword Units, R Sennrich et al.
  3. Attention-Based Models for Speech Recognition, J Chorowski et al.
  4. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A Graves et al.
  5. Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning, S Kim et al.
  6. Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM, T Hori et al.

License

MIT © Kun
