Automatic Speech Recognition (ASR) - DeepSpeech - Kabyle

This project develops a working Speech-To-Text module for the Kabyle language using Mozilla DeepSpeech, which can be used in any audio processing pipeline. Mozilla DeepSpeech is an open-source automatic speech recognition (ASR) toolkit based on Baidu's Deep Speech research paper. The DeepSpeech project uses Google's TensorFlow to make the implementation easier.

This README is written for DeepSpeech v0.8.2. Refer to Mozilla DeepSpeech for the latest updates.

Get Started

Prerequisites

  • A running setup of NVIDIA Docker
  • A host directory with at least 100 GB of free space for training and intermediate data
  • The host directory must be writable by the trainer user (uid 999), which is defined in the Dockerfile
  • The Kabyle Common Voice dataset kab.tar.gz from https://voice.mozilla.org/kab/datasets, placed inside the sources/ subdirectory of your host directory
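
The directory-related prerequisites above can be checked before a long training run. The following is only a minimal sketch: the function name check_host_dir is hypothetical and not part of this project, and the free-space check only warns because the 100 GB figure is a recommendation from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): checks a host directory
# against the prerequisites listed above.
check_host_dir() {
    host_dir=$1

    # The host directory itself must exist.
    if [ ! -d "$host_dir" ]; then
        echo "missing directory: $host_dir"
        return 1
    fi

    status=0

    # The dataset archive is expected inside the sources/ subdirectory.
    if [ ! -f "$host_dir/sources/kab.tar.gz" ]; then
        echo "missing dataset: $host_dir/sources/kab.tar.gz"
        status=1
    fi

    # Free space, in GB, on the filesystem that holds the directory.
    # df -Pk prints one line per filesystem, in 1 KB blocks; column 4 is "available".
    free_kb=$(df -Pk "$host_dir" | awk 'NR == 2 { print $4 }')
    free_gb=$((free_kb / 1024 / 1024))
    if [ "$free_gb" -lt 100 ]; then
        echo "warning: only ${free_gb} GB free, at least 100 GB is recommended"
    fi

    return "$status"
}
```

Calling check_host_dir PATH-TO-HOST-DIRECTORY returns a non-zero status when the directory or the dataset is missing. The helper does not check ownership; one common way to make the directory writable by uid 999 is sudo chown -R 999 PATH-TO-HOST-DIRECTORY, which is an assumption about your setup rather than a step documented by this project.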

Build the image

docker build -t dskabyle .

Parameters

Parameters for the model:

  • batch_size : ( default 96 ) to specify the number of elements in a batch for the training, dev and test datasets
  • n_hidden : ( default 2048 ) to specify the layer width to use when initializing layers
  • epochs : ( default 75 ) to specify the number of epochs to run training for
  • learning_rate : ( default 0.001 ) to define the learning rate of Adam optimizer
  • dropout : ( default 0.05 ) to define the applied dropout rate for feedforward layers
  • lm_alpha : ( default 0.66 ) to define the alpha hyperparameter of the CTC decoder (language model weight)
  • lm_beta : ( default 1.45 ) to define the beta hyperparameter of the CTC decoder (word insertion weight)
  • beam_width : ( default 500 ) to define the beam width used in the CTC decoder when building candidate transcriptions
  • early_stop : ( default 1 ) to enable early stopping during training, which helps avoid overfitting
  • duplicate_sentence_count : ( default 1 ) to specify the maximum number of times a sentence can appear in the common-voice corpus

These training parameters can always be modified at runtime using Docker environment variables.

Run the image

Once your host directory contains the required dataset, mount it when running the Docker image. It will contain the intermediate files, the checkpoints, and the final generated model files.

docker run --tty --mount type=bind,src=PATH-TO-HOST-DIRECTORY,dst=/mnt dskabyle

To override a parameter, pass it as an environment variable. For instance, the following command runs the image with a different number of epochs.

docker run --tty \
  --mount type=bind,src=PATH-TO-HOST-DIRECTORY,dst=/mnt \
  -e EPOCHS=50 \
  dskabyle
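
When several parameters are overridden at once, the command line can be assembled from a list of NAME=VALUE pairs. The following is only a sketch: build_run_command is a hypothetical helper, /data/kabyle is a placeholder path, and BATCH_SIZE is an assumed variable name (only EPOCHS appears in this README, so check the Dockerfile for the exact names). The helper prints the command instead of running it, and it does not handle paths containing spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): prints a docker run command
# with one -e flag per NAME=VALUE override.
build_run_command() {
    host_dir=$1
    shift
    cmd="docker run --tty --mount type=bind,src=$host_dir,dst=/mnt"
    for override in "$@"; do
        cmd="$cmd -e $override"
    done
    # The image name comes last, after all options.
    echo "$cmd dskabyle"
}

# Example with a placeholder path and an assumed BATCH_SIZE variable name.
build_run_command /data/kabyle EPOCHS=50 BATCH_SIZE=64
# prints: docker run --tty --mount type=bind,src=/data/kabyle,dst=/mnt -e EPOCHS=50 -e BATCH_SIZE=64 dskabyle
```

Printing the command first makes it easy to review the overrides before starting a run that may take many hours.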

Models and results

Models and intermediate training data are stored in your host directory.

The models/ subdirectory contains the generated model output_graph.pb. However, this file must be loaded entirely into memory when running inference.

Therefore, additional models are generated as well: one in tflite format, to be run on mobile devices, and one in mmap-able format, whose data can be read directly from disk.
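
As an illustration of how a generated model might be used, the sketch below wraps the deepspeech command-line client that ships with the DeepSpeech Python package. It rests on several assumptions: the client is installed on the host (for example with pip install deepspeech==0.8.2), the audio is a WAV file in the format the model expects (typically 16 kHz mono), and transcribe is a hypothetical helper name that is not part of this project.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): transcribes one WAV file
# with the deepspeech command-line client.
transcribe() {
    model=$1
    audio=$2

    # Fail early with a clear message instead of a client error.
    if [ ! -f "$model" ]; then
        echo "model not found: $model" >&2
        return 1
    fi
    if [ ! -f "$audio" ]; then
        echo "audio file not found: $audio" >&2
        return 1
    fi

    # --model and --audio are options of the deepspeech client.
    deepspeech --model "$model" --audio "$audio"
}
```

A call such as transcribe PATH-TO-HOST-DIRECTORY/models/output_graph.pb recording.wav would print the transcription, where recording.wav is a placeholder for your own audio file.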

Intermediate training data are kept in the checkpoints/ subdirectory. Checkpoints allow training to be interrupted and resumed later.

For further information, see the DeepSpeech documentation.
