Giter Club home page Giter Club logo

speech-transformer-1's Introduction

Speech Transformer

Introduction

This is a PyTorch re-implementation of Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition.

Dataset

Aishell is an open-source Chinese Mandarin speech corpus published by Beijing Shell Shell Technology Co.,Ltd.

400 people from different accent areas in China are invited to participate in the recording, which is conducted in a quiet indoor environment using high fidelity microphone and downsampled to 16kHz. The manual transcription accuracy is above 95%, through professional speech annotation and strict quality inspection. The data is free for academic use. We hope to provide moderate amount of data for new researchers in the field of speech recognition.

@inproceedings{aishell_2017,
  title={AIShell-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline},
  author={Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng},
  booktitle={Oriental COCOSDA 2017},
  pages={Submitted},
  year={2017}
}

In data folder, download speech data and transcripts:

$ wget http://www.openslr.org/resources/33/data_aishell.tgz

Performance

Evaluate with 7176 audios in Aishell test set:

$ python test.py

Results

Model CER Download
Speech Transformer 11.5 Link

Dependency

  • Python 3.6.8
  • PyTorch 1.3.0

Usage

Data Pre-processing

Extract data_aishell.tgz:

$ python extract.py

Extract wav files into train/dev/test folders:

$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;

Scan transcript data, generate features:

$ python pre_process.py

Now the folder structure under data folder is sth. like:

data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle

Train

$ python train.py

If you want to visualize during training, run in your terminal:

$ tensorboard --logdir runs

Demo

Please download the pretrained model then run:

$ python demo.py

It picks 10 random test examples and recognize them like these:

Audio Out GT
audio_0.wav 我国的经济处在爬破过凯的重要公考
我国的经济处在爬破过凯的重要公口
我国的经济处在盘破过凯的重要公考
我国的经济处在爬破过凯的重要公靠
我国的经济处在爬坡过凯的重要公考
我国的经济处在爬坡过坎的重要关口
audio_1.wav 完善主地承包经一全流市市场
完善主地承包经一全六市市场
完善主地承包经营全流市市场
完善主地承包经一权流市市场
完善主地承包经营全六市市场
完善土地承包经营权流转市场
audio_2.wav 临长各类设施使用年限
严长各类设施使用年限
延长各类设施使用年限
很长各类设施使用年限
难长各类设施使用年限
延长各类设施使用年限
audio_3.wav 苹果此举是为了节约用电量
苹果此举是是了节约用电量
苹果此举是为了解约用电量
苹果此举是为了节约用电令
苹果此举只为了节约用电量
苹果此举是为了节约用电量
audio_4.wav 反他们也可以有机会参与体育运动
让他们也可以有机会参与体育运动
反她们也可以有机会参与体育运动
范他们也可以有机会参与体育运动
但他们也可以有机会参与体育运动
让他们也可以有机会参与体育运动
audio_5.wav 陈言希穿着粉色上衣
陈闫希穿着粉色上衣
陈延希穿着粉色上衣
陈言琪穿着粉色上衣
陈演希穿着粉色上衣
陈妍希穿着粉色上衣
audio_6.wav 说起自己的伴女大下
说起自己的伴理大下
说起自己的半女大下
说起自己的办女大下
说起自己的半理大下
说起自己的伴侣大侠
audio_7.wav 每日经济新闻记者注意到
每日经济新闻记者朱意到
每日经济新闻记者注一到
每日经济新闻记者注注到
每日经济新闻记者注以到
每日经济新闻记者注意到
audio_8.wav 这是今年五月份以来库存环比增幅幅小了一次
这是今年五月份以来库存环比增幅最小了一次
这是今年五月份以来库存环比增幅幅小的一次
这是今年五月份以来库存环比增幅最小的一次
这是今年五月份以来库存环比增幅幅小小一次
这是今年五月份以来库存环比增幅最小的一次
audio_9.wav 一个人的精使生命就将走向摔老
一个连的精使生命就将走向摔老
一个人的金使生命就将走向摔老
一个人的坚使生命就将走向摔老
一个连的金使生命就将走向摔老
一个人的精神生命就将走向衰老

speech-transformer-1's People

Contributors

foamliu avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.