
speech-transformer-pytorch_lightning

End-to-end Chinese-English code-switching speech recognition in PyTorch


## This is a mixed project that borrows from many excellent recently open-sourced projects.
With pytorch-lightning, experiments can be carried out easily.
I will try to make every calculation batched and clean
(for example, adding BOS & EOS to batched targets, and SpecAugment). Ideas are welcome in the issues,
and discussion is encouraged. (This project is still being built and reorganized.)
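As an illustration of the batched target preparation mentioned above, here is a minimal plain-Python sketch of adding BOS and EOS to a right-padded batch of token ids. The special-token ids (`PAD_ID`, `BOS_ID`, `EOS_ID`) and the function name `add_bos_eos` are hypothetical; the project itself works on tensors, where the same logic would typically be written with tensor indexing instead of a Python loop.

```python
from typing import List, Tuple

# Hypothetical special-token ids; the project's actual vocabulary layout may differ.
PAD_ID = 0
BOS_ID = 1
EOS_ID = 2


def add_bos_eos(
    targets: List[List[int]], lengths: List[int]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Build decoder input (BOS + tokens) and decoder target (tokens + EOS).

    `targets` is a right-padded batch and `lengths` holds the true length of each row.
    Both returned batches are padded with PAD_ID to max(lengths) + 1.
    """
    max_len = max(lengths) + 1
    dec_in, dec_out = [], []
    for row, n in zip(targets, lengths):
        tokens = row[:n]  # drop the existing padding
        inp = [BOS_ID] + tokens
        out = tokens + [EOS_ID]
        dec_in.append(inp + [PAD_ID] * (max_len - len(inp)))
        dec_out.append(out + [PAD_ID] * (max_len - len(out)))
    return dec_in, dec_out
```

For example, `add_bos_eos([[5, 6, 7], [8, 9, 0]], [3, 2])` returns `[[1, 5, 6, 7], [1, 8, 9, 0]]` as decoder input and `[[5, 6, 7, 2], [8, 9, 2, 0]]` as decoder target, so EOS always lands right after the last real token rather than after the padding.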

project features:

    joint attention & CTC beam search decoding with an RNN LM
    multi-dataset training
    16-bit training using PyTorch Lightning
    Chinese-char level & English-word level tokenizer
    SentencePiece tokenizer for English
    rnn_lm training
    label smoothing
    customized transformer encoder and decoder, see: src/model/modules/transformer_encoder...
    *ReZero transformer, to address convergence problems with half precision and to improve speed

features:

    log fbank with subsampling
    speed augmentation
    SpecAugment on GPU as a layer in the model
    customized feature filtering, see remove_empty_line_2d in src/loader/utils/build_fbank
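The SpecAugment item can be sketched as frequency and time masking on a (frames x mel bins) feature matrix. This plain-Python version runs on CPU lists, whereas the project applies SpecAugment on GPU as a model layer; the mask counts, the maximum widths (`max_freq_width=27`, `max_time_width=40`) and the zero fill value are hypothetical defaults.

```python
import random
from typing import List, Optional


def spec_augment(
    feats: List[List[float]],
    num_freq_masks: int = 2,
    max_freq_width: int = 27,
    num_time_masks: int = 2,
    max_time_width: int = 40,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a copy of `feats` (frames x mel bins) with random frequency and time bands zeroed."""
    rng = rng or random.Random()
    out = [row[:] for row in feats]  # never modify the caller's features
    n_frames = len(out)
    n_bins = len(out[0]) if n_frames else 0
    # Frequency masks: zero a band of consecutive mel bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for row in out:
            for f in range(start, start + width):
                row[f] = 0.0
    # Time masks: zero a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_bins
    return out
```

Passing a seeded `random.Random` makes the masks reproducible, which helps when debugging the augmentation pipeline.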

optimizer:

    Ranger
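Ranger combines RAdam with Lookahead (both are in the references below). The Lookahead part can be sketched as follows: the inner optimizer takes `k` fast steps, then the slow weights move a fraction `alpha` toward the fast weights and the fast weights are reset to them. To keep the sketch short and dependency-free, plain SGD stands in for RAdam; `lookahead_sgd` and its defaults are hypothetical.

```python
from typing import Callable, List


def lookahead_sgd(
    grad_fn: Callable[[List[float]], List[float]],
    params: List[float],
    lr: float = 0.1,
    k: int = 5,
    alpha: float = 0.5,
    steps: int = 100,
) -> List[float]:
    """Lookahead around a plain-SGD inner optimizer (Ranger uses RAdam there instead).

    Returns the fast weights, i.e. what the model would hold after `steps` updates.
    """
    fast = list(params)
    slow = list(params)
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        fast = [p - lr * g for p, g in zip(fast, grads)]  # inner (fast) update
        if step % k == 0:
            # Slow weights take a partial step toward the fast weights...
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            # ...and the fast weights restart from the new slow weights.
            fast = list(slow)
    return fast
```

For example, minimizing `(x - 3)**2` with `lookahead_sgd(lambda p: [2 * (p[0] - 3.0)], [0.0])` ends very close to `3.0`.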

model:

    ReZero transformer
    restricted encoder field
    better mask (may be a little slower than in other projects, but effective)
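A minimal sketch of two of the model items, assuming "restricted encoder field" means each encoder frame may only attend to a fixed window of `left` past and `right` future frames; here that window is combined with a padding mask built from the utterance `length`. It also shows the ReZero residual `x + alpha * F(x)`, where `alpha` is a learnable scalar initialized to zero so that each block starts as the identity. Function and parameter names are hypothetical, and plain lists stand in for tensors.

```python
from typing import Callable, List


def restricted_attention_mask(
    n_frames: int, length: int, left: int, right: int
) -> List[List[bool]]:
    """mask[i][j] is True when frame i may attend to frame j.

    Frame j must be a real (non-padded) frame, j < length, and lie inside the
    local window i - left <= j <= i + right.
    """
    return [
        [j < length and i - left <= j <= i + right for j in range(n_frames)]
        for i in range(n_frames)
    ]


def rezero_residual(
    x: List[float], sublayer: Callable[[List[float]], List[float]], alpha: float
) -> List[float]:
    """ReZero residual connection: x + alpha * sublayer(x), with no layer norm."""
    return [xi + alpha * yi for xi, yi in zip(x, sublayer(x))]
```

With `alpha = 0.0` the block returns its input unchanged, which is the property ReZero relies on for stable early training.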

loss:

    lambda * CE loss + (1 - lambda) * CTC loss + code-switch loss
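The loss can be sketched for a single decoder position: a label-smoothed cross-entropy (label smoothing is among the listed features), then the weighted sum above. How the code-switch loss is computed is not described here, so `joint_loss` simply takes it as a precomputed number; `lam=0.7` and the smoothing scheme (non-target mass spread uniformly over the other classes) are assumptions.

```python
import math
from typing import List


def label_smoothing_ce(logits: List[float], target: int, smoothing: float = 0.1) -> float:
    """Cross-entropy against a smoothed target distribution (needs at least two classes).

    The target class gets probability 1 - smoothing; the rest is spread uniformly
    over the other classes.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))  # stable log-sum-exp
    log_probs = [v - log_z for v in logits]
    off = smoothing / (len(logits) - 1)
    return -sum(
        ((1.0 - smoothing) if c == target else off) * lp
        for c, lp in enumerate(log_probs)
    )


def joint_loss(ce: float, ctc: float, cs: float, lam: float = 0.7) -> float:
    """lambda * CE + (1 - lambda) * CTC + code-switch loss."""
    return lam * ce + (1.0 - lam) * ctc + cs
```

With `smoothing=0.0` this reduces to ordinary cross-entropy, and with uniform logits it equals `log(num_classes)` regardless of the target.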

requirements:

    see docker/


references:

    https://github.com/ZhengkunTian/OpenTransformer
    https://github.com/espnet/espnet
    https://github.com/jadore801120/attention-is-all-you-need-pytorch
    https://github.com/alphadl/lookahead.pytorch
    https://github.com/LiyuanLucasLiu/RAdam
    https://github.com/vahidk/tfrecord
    https://github.com/kaituoxu/Speech-Transformer
    https://github.com/majumderb/rezero
    https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer



data:

    aishell1 170h
    aishell2 1000h
    magic data 750h
    prime 100h (not used)
    stcmd 100h (not used)
    datatang 200h
    datatang 500h
    datatang mix 200h
    librispeech 960h

training stages:

    english -> eng(sub) + mix + chinese -> chinese + mix -> mix
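The staged schedule can be written down as data, one list of dataset groups per stage. The group names (`eng_sub` for the `eng(sub)` part) and the helper `stage_manifest` are hypothetical, and the sketch only shows which data each stage draws from, not how checkpoints carry over between stages.

```python
from typing import Dict, List

# One entry per stage, in training order; names are hypothetical dataset-group keys.
TRAIN_STAGES: List[List[str]] = [
    ["english"],
    ["eng_sub", "mix", "chinese"],
    ["chinese", "mix"],
    ["mix"],
]


def stage_manifest(stage: int, manifests: Dict[str, List[str]]) -> List[str]:
    """Concatenate the utterance lists of every dataset group used in `stage` (0-based)."""
    return [utt for name in TRAIN_STAGES[stage] for utt in manifests[name]]
```

Keeping the schedule in one list makes it easy to see, and change, which corpora each stage mixes.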

contributors:

    tongjinle123


