Vietnamese Speech-to-Text 😄

Introduction

In this project, I aimed to develop an end-to-end automatic speech recognition system, using different frameworks and services such as PyTorch Lightning, Flask, Docker, uWSGI, Nginx, and AWS to deploy the project and extend it into a simple production scenario. I focused on Vietnamese, but the project can easily be modified for other languages.

For the model, I used several well-known architectures (DeepSpeech, Conformer CTC, and RNN-Transducer). As usual, I'd love to write some notes about the models I used 🙋, but I don't have much time right now, so I will add them in a few days...

Setup

Data preparation

The project uses VIVOS, a widely used public Vietnamese speech dataset. Download it and put it into ./local/data, following the folder structure below:

└───VASR
    ├───local
    │   ├───data
    │   ├───ckpts
    │   ├───configs
    │   ├───outputs
    │   │   └───version_3        
    │   │       └───checkpoints  
    │   ├───src
    │   │   ├───augs
    │   │   ├───datasets
    │   │   ├───engine
    │   │   ├───losses
    │   │   ├───metrics
    │   │   ├───models
    │   │   │   ├───conformer
    │   │   │   │   └───base
    │   │   │   └───deepspeech
    │   │   ├───optimizers
    │   │   └───utils
    │   └───tools
    └───server
        ├───flask
        │   └───templates
        └───nginx
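
Before training or deploying, it can help to check that the folders from the tree above are in place. The following is only a minimal sketch: the helper name `missing_dirs` and the choice of which folders to check are assumptions made for illustration, not part of the project.

```python
from pathlib import Path

# A subset of the folders shown in the tree above.
# Which ones to check is an assumption made for this sketch.
EXPECTED_DIRS = [
    "local/data",
    "local/ckpts",
    "local/configs",
    "local/src",
    "local/tools",
    "server/flask",
    "server/nginx",
]


def missing_dirs(project_root):
    """Return the expected sub-folders that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs("VASR")` from the parent folder of the project returns an empty list when every listed folder exists, and otherwise the relative paths that still need to be created.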

Checkpoints

The needed checkpoints are not published yet (the download link is still a placeholder). Once they are available, download them and put them into the project following the folder structure above.

EC2 service

First, if you are new to AWS, create an account and open the EC2 service. Launch a new instance, choosing a suitable instance type and resources, then create a new key pair for logging into the server. Once the instance is running, a dashboard will show up; copy the instance's public DNS and run the following in a terminal:

chmod 400 speech_recognition.pem
scp -i speech_recognition.pem -r server/ ubuntu@<paste here>:~

Now run ssh -i speech_recognition.pem ubuntu@<paste here> to log into the server!
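
If you redeploy often, the three commands above can be assembled from the key file and the public DNS instead of being retyped. This is a rough sketch only: the function name `ec2_commands` and its parameters are hypothetical, and the defaults (`ubuntu` user, `server/` folder) are simply copied from the commands above.

```python
def ec2_commands(key_file, public_dns, user="ubuntu", local_dir="server/"):
    """Build the chmod, scp and ssh commands as argument lists.

    Nothing is executed here; the lists can be passed to subprocess.run.
    """
    target = f"{user}@{public_dns}"
    return [
        # Restrict the key file permissions, otherwise ssh refuses to use it.
        ["chmod", "400", key_file],
        # Copy the server folder into the remote home directory.
        ["scp", "-i", key_file, "-r", local_dir, f"{target}:~"],
        # Open an interactive session on the instance.
        ["ssh", "-i", key_file, target],
    ]
```

Each returned list can be run with `subprocess.run(cmd, check=True)`; passing argument lists rather than one shell string avoids quoting problems if the key path contains spaces.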

Tools

Several tools are needed to run the final web app; just follow the instructions:

Denoise

Because the VIVOS dataset is very clean, the trained model is hard to apply to noisy datasets and real-life cases, so a denoising tool is needed. I'm developing it separately from this project and plan to use it through a public API... It should be done soon.

Train and Evaluate

Modify the config file and simply run:

export PYTHONPATH=/home/path_to_project/VASR/local
python3 tools/train.py
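
The same launch can be scripted from Python, for example when starting several runs in a row. This is a sketch under assumptions: the helper names are hypothetical, the script is assumed to be started from inside the local folder (where the tree above places tools), and unlike the export above it keeps any existing PYTHONPATH entries instead of overwriting them.

```python
import os
import subprocess
import sys


def training_env(project_local, base_env=None):
    """Return a copy of the environment with project_local first on PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        # Assumption: keep earlier entries rather than replacing them.
        env["PYTHONPATH"] = os.pathsep.join([project_local, existing])
    else:
        env["PYTHONPATH"] = project_local
    return env


def run_training(project_local):
    """Run tools/train.py with the working directory set to project_local."""
    return subprocess.run(
        [sys.executable, "tools/train.py"],
        cwd=project_local,
        env=training_env(project_local),
        check=True,  # raise if training exits with a non-zero status
    )
```

For instance, `run_training("/home/path_to_project/VASR/local")` mirrors the two commands above, with the placeholder path replaced by your own.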

Manage artifacts

TensorBoard is used here:

tensorboard --logdir <path_to_log_folder> --load_fast true

Web Demo

Finally, after you log into the server, you just need to run:

chmod +x init.sh
./init.sh

Future work

Lots of work remains to be done; it will take time haha:

  • Develop denoising tool
  • Enjoy the benefits of semi-supervised learning
  • Build speech augmentation modules
  • Try more approaches, such as fine-tuning wav2vec-based models
  • Develop a better web user interface in a new branch with Streamlit

References

Many thanks to the authors of the related papers and great implementations that this project is mainly based on.
