Introduction

The icefall project contains speech-related recipes for various datasets using k2-fsa and lhotse.

You can use sherpa, sherpa-ncnn, or sherpa-onnx to deploy models trained in icefall. These frameworks also support models that are not included in icefall; please refer to their respective documentation for more details.

You can try pre-trained models in your browser, without downloading or installing anything, by visiting this Hugging Face space. Please refer to the documentation for more details.

Installation

Please refer to the documentation for installation instructions.

Recipes

Please refer to the documentation for more details.

ASR: Automatic Speech Recognition

Supported Datasets

More datasets will be added in the future.

Supported Models

The LibriSpeech recipe supports the most comprehensive set of models; you are welcome to try them out.

CTC

  • TDNN LSTM CTC
  • Conformer CTC
  • Zipformer CTC

MMI

  • Conformer MMI
  • Zipformer MMI

Transducer

  • Conformer-based Encoder
  • LSTM-based Encoder
  • Zipformer-based Encoder
  • LSTM-based Predictor
  • Stateless Predictor

Whisper

If you would like to contribute to icefall, please refer to the contributing guide for more details.

We would like to highlight the performance of some of the recipes here.

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
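
The summary line above reads as 1 error (a single deletion) out of 240 reference words. A minimal sketch of that arithmetic follows, assuming the usual definition WER = (insertions + deletions + substitutions) / reference words. The function name is illustrative and is not part of icefall.

```python
def error_rate_percent(insertions: int, deletions: int,
                       substitutions: int, ref_len: int) -> float:
    """Return (ins + del + sub) / ref_len as a percentage."""
    if ref_len <= 0:
        raise ValueError("ref_len must be positive")
    return 100.0 * (insertions + deletions + substitutions) / ref_len


# Counts taken from the summary line: 1 / 240, 0 ins, 1 del, 0 sub
wer = error_rate_percent(insertions=0, deletions=1, substitutions=0, ref_len=240)
print(f"%WER {wer:.2f}%")
```

With so few reference words, a single error moves the reported number by about 0.42 percentage points, so small differences on this test set should be read with care.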

We provide a Colab notebook for this recipe: Open In Colab

Please see RESULTS.md for the latest results.

     test-clean  test-other
WER  2.42        5.73

We provide a Colab notebook to test the pre-trained model: Open In Colab

     test-clean  test-other
WER  6.59        17.69

We provide a Colab notebook to test the pre-trained model: Open In Colab

               test-clean  test-other
greedy_search  3.07        7.51

We provide a Colab notebook to test the pre-trained model: Open In Colab

                                    test-clean  test-other
modified_beam_search (beam_size=4)  2.56        6.27

We provide a Colab notebook to test the pre-trained model: Open In Colab

WER (modified_beam_search beam_size=4 unless further stated)

  1. LibriSpeech-960hr

Encoder          Params  test-clean  test-other  epochs  devices
Zipformer        65.5M   2.21        4.79        50      4 32G-V100
Zipformer-small  23.2M   2.42        5.73        50      2 32G-V100
Zipformer-large  148.4M  2.06        4.63        50      4 32G-V100
Zipformer-large  148.4M  2.00        4.38        174     8 80G-A100

  2. LibriSpeech-960hr + GigaSpeech

Encoder    Params  test-clean  test-other
Zipformer  65.5M   1.78        4.08

  3. LibriSpeech-960hr + GigaSpeech + CommonVoice

Encoder    Params  test-clean  test-other
Zipformer  65.5M   1.90        3.98

     Dev    Test
WER  10.47  10.58

Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss

                      Dev    Test
greedy_search         10.51  10.73
fast_beam_search      10.50  10.69
modified_beam_search  10.40  10.51

                      Dev    Test
greedy_search         10.31  10.50
fast_beam_search      10.26  10.48
modified_beam_search  10.25  10.38

     test
CER  10.16

We provide a Colab notebook to test the pre-trained model: Open In Colab

     test
CER  4.38

We provide a Colab notebook to test the pre-trained model: Open In Colab

WER (modified_beam_search beam_size=4)

Encoder          Params  dev   test  epochs
Zipformer        73.4M   4.13  4.40  55
Zipformer-small  30.2M   4.40  4.67  55
Zipformer-large  157.3M  4.03  4.28  56

1 Trained with all subsets:

     test
CER  29.08

We provide a Colab notebook to test the pre-trained model: Open In Colab

     TEST
PER  19.71%

We provide a Colab notebook to test the pre-trained model: Open In Colab

     TEST
PER  17.66%

We provide a Colab notebook to test the pre-trained model: Open In Colab

                                    dev   test
modified_beam_search (beam_size=4)  6.91  6.33

We provide a Colab notebook to test the pre-trained model: Open In Colab

                                    dev   test
modified_beam_search (beam_size=4)  6.77  6.14

We provide a Colab notebook to test the pre-trained model: Open In Colab

                      Dev   Test
greedy_search         5.53  6.59
fast_beam_search      5.30  6.34
modified_beam_search  5.27  6.33

We provide a Colab notebook to test the pre-trained model: Open In Colab

                      Dev   Test-Net  Test-Meeting
greedy_search         7.80  8.75      13.49
fast_beam_search      7.94  8.74      13.80
modified_beam_search  7.76  8.71      13.41

We provide a Colab notebook to test the pre-trained model: Open In Colab

                      Dev   Test-Net  Test-Meeting
greedy_search         8.78  10.12     16.16
fast_beam_search      9.01  10.47     16.28
modified_beam_search  8.53  9.95      15.81

                      Eval   Test-Net
greedy_search         31.77  34.66
fast_beam_search      31.39  33.02
modified_beam_search  30.38  34.25

We provide a Colab notebook to test the pre-trained model: Open In Colab

The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese, en: English):

decoding-method       dev   dev_zh  dev_en  test  test_zh  test_en
greedy_search         7.30  6.48    19.19   7.39  6.66     19.13
fast_beam_search      7.18  6.39    18.90   7.27  6.55     18.77
modified_beam_search  7.15  6.35    18.95   7.22  6.50     18.70
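
The table above mixes two metrics: CER counts errors over characters, while WER counts errors over words. The following is a minimal sketch of the difference using a plain Levenshtein distance. It is an illustration under that assumption, not icefall's scoring code, and the function names and sample sentences are made up for the example.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# English: tokens are whitespace-separated words, giving WER.
en_ref = "the cat sat on the mat".split()
en_hyp = "the cat sat on mat".split()
wer = error_rate(en_ref, en_hyp)

# Chinese: tokens are individual characters, giving CER.
zh_ref = list("今天天气很好")
zh_hyp = list("今天天汽很好")
cer = error_rate(zh_ref, zh_hyp)

print(f"WER {wer:.2f}%  CER {cer:.2f}%")
```

Only the tokenization differs between the two calls; the distance computation is the same.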

We provide a Colab notebook to test the pre-trained model: Open In Colab

TTS: Text-to-Speech

Supported Datasets

Supported Models

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.

Please refer to the documentation for how to do this.

We also provide a Colab notebook that shows how to run a torch scripted model in k2 with C++: Open In Colab
