Giter Club home page Giter Club logo

automatic-speech-recognition-with-vue's Introduction

总体架构

前端使用Vue,后端使用Falsk,总体架构如下:


流程图

# tree -L 1
├── LICENSE
├── MANIFEST.in
├── README.md
├── README_en.md
├── Vue                          # 前端Vue
├── automatic_speech_recognition # 模型的构建
├── checkpoint                   # 保存checkpoint
├── data                         # 训练数据
├── environment-gpu.yml
├── environment.yml
├── examples
├── flask_server                 # 后端Flask
├── img
├── predict.py                   # 模型预测
├── papers                       # 参考论文
├── run.sh
├── setup.py
├── tests
├── train.py                     # 模型的训练
├── transfer
└── utils

Automatic Speech Recognition

使用tensorflow==2.1.0,预测过程如下所示,注意音频需要是16kHz。

import automatic_speech_recognition as asr

file = 'to/test/sample.wav'  # sample rate 16 kHz, and 16 bit depth
sample = asr.utils.read_audio(file)
pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])

模型支持英语(感谢Open Seq2Seq)。下表列出了英语基准LibriSpeech dev-clean数据集的评估结果。作为参考,DeepSpeech(Mozilla)的WER约为7.5%,而最新技术(RWTH Aachen University)的WER约为2.3%(最新评估结果可在此处找到)。他们俩都使用外部语言模型来提高结(也就是在Encoder-Decoder之上加一个语言模型)。相比之下,人类在这里达到了5.83%的WER(LibriSpeech dev-clean)

Model Name Decoder WER-dev
deepspeech2 greedy 6.71

模型的训练过程如下所示:

import numpy as np
import tensorflow as tf
import automatic_speech_recognition as asr
# dataset = asr.dataset.Audio.from_csv('data/train.csv', batch_size=32)
# dev_dataset = asr.dataset.Audio.from_csv('dev.csv', batch_size=32)
dataset = asr.dataset.Audio.from_csv('data/TIMIT/timit_test.csv', batch_size=2)
dev_dataset = asr.dataset.Audio.from_csv(
    'data/TIMIT/timit_test.csv', batch_size=2)
alphabet = asr.text.Alphabet(lang='en')
features_extractor = asr.features.FilterBanks(
    features_num=160,
    winlen=0.02,
    winstep=0.01,
    winfunc=np.hanning
)
model = asr.model.get_deepspeech2(
    input_dim=160,
    output_dim=29,
    rnn_units=800,
    is_mixed_precision=False
)
# model.load_weights(
#     'automatic_speech_recognition/load/models/en-deepspeech2-weights-0.1.h5')
optimizer = tf.optimizers.Adam(
    lr=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8
)
decoder = asr.decoder.GreedyDecoder()
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder
)
pipeline.fit(dataset, dev_dataset, epochs=1)
pipeline.save('./checkpoint')

test_dataset = asr.dataset.Audio.from_csv(
    'data/TIMIT/timit_test.csv', batch_size=1)
wer, cer = asr.evaluate.calculate_error_rates(pipeline, test_dataset)
print(f'WER: {wer}   CER: {cer}')

参考

The fundamental repositories:

automatic-speech-recognition-with-vue's People

Contributors

sunlanchang avatar dependabot[bot] avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.