abc_asr's Introduction

Multi-modal Speech Recognition for ABCS Corpus

This repository is the official implementation of "End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus", published in TASLP 2023.

Installation

  1. If you only need the model modules, first run

    pip install espnet
    

    You can then use the modules in abc_asr/model.

  2. If you want to run the full experiments, you first need a correct installation of ESPnet and Kaldi. See Installation.

    Next, run

    pip install -r requirements.txt
    

    to install the required packages.
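
Before going further, it can help to confirm that the packages are visible to the interpreter you plan to use. The following is a minimal sketch, not part of this repository: `missing_modules` is a hypothetical helper name, and the only module checked is `espnet`, taken from the pip command above. It handles top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "espnet" comes from the pip command above; extend the list as needed.
    missing = missing_modules(["espnet"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

An empty result only shows that the module can be located; it does not verify that ESPnet or Kaldi were built correctly.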

Data Preparation

  1. Download dataset.

    Download the ABCS Corpus here: Links.

    Download the noisy air conducted data (ns_air_data.zip) here: [Onedrive] or [Baidu Cloud]

    Unzip the noisy data into ABCS's directory:

    unzip -d <ABCS dir>/Audio/ ns_air_data.zip
    
  2. Execute the data preparation script.

    For inference only:

    python3 data_prep.py --dataset_root <ABCS dir> --test
    

    For full experiments:

    python3 data_prep.py --dataset_root <ABCS dir>
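
If the `unzip` command from step 1 is not available (for example on Windows), the extraction can be done from Python instead. This is a rough sketch under the assumption that the archive should land in `<ABCS dir>/Audio/`, as in the command above; `extract_noisy_data` is a hypothetical helper and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_noisy_data(zip_path, abcs_dir):
    """Extract the archive into <abcs_dir>/Audio/ and return that directory."""
    target = Path(abcs_dir) / "Audio"
    # Create the Audio directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (paths are placeholders):
# extract_noisy_data("ns_air_data.zip", "/path/to/ABCS")
```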
    

Inference

  1. Ensure that Kaldi and ESPnet are properly installed in your environment. Next, set the third line of test.sh to your ESPnet root:

    export ESPNETROOT=<Your Espnet Root>
    
  2. Download the model parameters file here: [Onedrive] or [Baidu Cloud]. Then move it into the results directory:

    mv model.acc.best <Your Path>/abc_asr/results
    
  3. Run

    bash test.sh
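
The three steps above can be checked before launching the script. The sketch below is an illustration only: `preflight` is a hypothetical helper, and it assumes that test.sh sits at the top of the repository checkout, that its third line has the form `export ESPNETROOT=...`, and that the model file was moved to results/model.acc.best.

```python
from pathlib import Path


def preflight(repo_dir):
    """Return a list of problems found; an empty list means none were found."""
    repo = Path(repo_dir)
    problems = []

    script = repo / "test.sh"
    if not script.is_file():
        problems.append("test.sh not found")
    else:
        lines = script.read_text().splitlines()
        # Step 1 refers to the third line of test.sh.
        third = lines[2].strip() if len(lines) >= 3 else ""
        if not third.startswith("export ESPNETROOT="):
            problems.append("line 3 of test.sh does not export ESPNETROOT")
        else:
            root = third.split("=", 1)[1].strip().strip("\"'")
            if not Path(root).is_dir():
                problems.append(f"ESPNETROOT is not a directory: {root}")

    # Step 2 moves the downloaded parameters into the results directory.
    if not (repo / "results" / "model.acc.best").is_file():
        problems.append("results/model.acc.best not found")

    return problems
```

Calling `preflight("<Your Path>/abc_asr")` and printing the result gives a quick list of what is still missing; it does not check that ESPnet or Kaldi actually work.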
    

Results (CER %)

Model             SNR=-5dB  SNR=0dB  SNR=5dB  SNR=10dB  SNR=15dB  SNR=20dB  Clean
The proposed MMT  17.5      14.9     11.8     9.4       7.9       7.1       6.7
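
For readers unfamiliar with the metric, character error rate (CER) is the character-level edit distance between the reference and the recognized text, divided by the reference length. The sketch below illustrates that standard definition only. It is not the scoring code used to produce the numbers above, which may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference character missing
                curr[j - 1] + 1,     # insertion: extra hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent for one reference/hypothesis pair."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

For a whole test set, the usual convention is to sum the edit distances and the reference lengths over all utterances before dividing, rather than averaging per-utterance rates.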

TODO

The training pipeline.

Citing

If you found this code helpful, please consider citing it as follows:

@ARTICLE{9961873,
  author={Wang, Mou and Chen, Junqi and Zhang, Xiao-Lei and Rahardja, Susanto},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus}, 
  year={2023},
  volume={31},
  number={},
  pages={513-524},
  keywords={Speech recognition;Speech processing;Signal to noise ratio;Spectrogram;Headphones;Microphones;Synchronization;Speech recognition;multi-modal speech processing;bone conduction;air- and bone-conducted speech corpus},
  doi={10.1109/TASLP.2022.3224305}}

abc_asr's Issues

some questions

When I ran the step "python3 data_prep --dataset_root", I hit the problem shown below. How can I solve it? I couldn't find CDPR in the directory. There is also another issue: the requirements list thread>=2.0.0. What should I install? Thank you for sharing the code and dataset. I am really interested in this project and hope to replicate it myself. I am a beginner and hope to receive your answer!

Traceback (most recent call last):
  File "data_prep.py", line 12, in <module>
    from modules.DataGenerator import BoneConductDataGenerator, BoneConductNSDataGenerator
  File "/mnt/e/code/abc_asr-master/preprocessing/modules/DataGenerator/BoneConductDataGenerator.py", line 1, in <module>
    from CDPR.modules.DataGenerator.data_generator import DataGenerator as InterFace
ModuleNotFoundError: No module named 'CDPR'
