Giter Club home page Giter Club logo

sova-asr's Introduction

SOVA ASR

SOVA ASR is a fast speech recognition solution based on Wav2Letter architecture. It is designed as a REST API service and it can be customized (both code and models) for your needs.

Installation

The easiest way to deploy the service is via docker-compose, so you have to install Docker and docker-compose first. Here's a brief instruction for Ubuntu:

Docker installation

  • Install Docker:
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
$ sudo usermod -aG docker $(whoami)

In order to run docker commands without sudo you might need to relogin.

  • Install docker-compose:
$ sudo curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
  • (Optional) If you're planning on using CUDA run these commands:
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
$ sudo apt-get update
$ sudo apt-get install nvidia-container-runtime

Add the following content to the file /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Restart the service:

$ sudo systemctl restart docker.service

Build and deploy

In order to run service with pretrained models you will have to download http://dataset.sova.ai/SOVA-ASR/data.tar.gz.

  • Clone the repository, download the pretrained models archive and extract the contents into the project folder:
$ git clone --recursive https://github.com/sovaai/sova-asr.git
$ cd sova-asr/
$ wget http://dataset.sova.ai/SOVA-ASR/data.tar.gz
$ tar -xvf data.tar.gz && rm data.tar.gz
  • Build docker image

    • If you're planning on using GPU (it is required for training and can be used for inference): build sova-asr image using the following command:
    $ sudo docker-compose build
    • If you're planning on using CPU only: modify Dockerfile, docker-compose.yml (remove the runtime and environment sections) and config.ini (cpu should be set to 0) and build sova-asr image:
    $ sudo docker-compose build
  • Run web service in a docker container

    $ sudo docker-compose up -d sova-asr

Testing

To test the service you can send a POST request:

$ curl --request POST 'http://localhost:8888/asr' --form 'audio_blob=@"data/test.wav"'

Finetuning acoustic model

If you want to finetune the acoustic model you can set hyperparameters and paths to your own train and validation manifest files and run the training service.

  • Set training options in Train section of config.ini. Train and validation csv manifest files should contain comma-separated audio file paths and reference texts in each line. For instance:
    data/audio/000000.wav,добрый день
    data/audio/000001.wav,как ваши дела
    ...
  • Run training in docker container:
    $ sudo docker-compose up -d sova-asr-train

Customizations

If you want to train your own acoustic model refer to PuzzleLib tutorials. Check KenLM documentation for building your own language model. This repository was tested on Ubuntu 18.04 and has pre-built .so Trie decoder files for Python 3.6 running inside the Docker container, for modifications you can get your own .so files using Wav2Letter++ code for building Python bindings. Otherwise you can use a standard Greedy decoder (set in config.ini).

sova-asr's People

Contributors

sxdxfan avatar zubarevegor avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.