DeepSpeech Server

Key Features

This is an HTTP server that can be used to test the Mozilla DeepSpeech project. To run it, you need an environment with DeepSpeech installed and a model.

Installation

You first need to install deepspeech. Depending on your system you can use the CPU package:

pip3 install deepspeech

Or the GPU package:

pip3 install deepspeech-gpu

Then you can install the deepspeech server from the sources:

python3 setup.py install

The server is also available on PyPI, so you can install it with pip:

pip3 install deepspeech-server

Note that the server requires Python 3.5 or later.

Starting the server

deepspeech-server --config config.json

You can use deepspeech without training a model yourself. Mozilla provides pre-trained models on the release page of the project (see the assets section of the release notes):

https://github.com/mozilla/DeepSpeech/releases

Once you have downloaded a pre-trained model, you can untar it and use the sample configuration file directly:

cp config.sample.json config.json
deepspeech-server --config config.json

Server configuration

The server is configured with a JSON file, passed with the "--config" argument. It has the following structure:

{
  "deepspeech": {
    "model": "models/output_graph.pb",
    "alphabet": "models/alphabet.txt",
    "lm": "models/lm.binary",
    "trie": "models/trie",
    "features": {
      "n_features": 26,
      "n_context": 9,
      "beam_width": 500,
      "lm_alpha": 0.75,
      "lm_beta": 1.85
    }
  },
  "server": {
    "http": {
      "host": "0.0.0.0",
      "port": 8080,
      "request_max_size": 1048576
    }
  },
  "log": {
    "level": [
      { "logger": "deepspeech_server", "level": "DEBUG"}
    ]
  }
}

The configuration file contains several sections and sub-sections.

deepspeech section configuration

The "deepspeech" section contains the configuration of the deepspeech engine:

model is the protobuf model that was generated by deepspeech.

alphabet is the alphabet dictionary (as available in the "data" directory of the DeepSpeech sources).

lm is the language model.

trie is the trie file.

features contains the features settings that have been used to train the model. This field can be set to null to keep the default settings.
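Before starting the server, it can help to confirm that the files named in the "deepspeech" section are actually present. The sketch below is not part of deepspeech-server: the helper name missing_model_files is hypothetical, and it assumes that all four file entries are required and that relative paths resolve against a base directory you pass in.

```python
from pathlib import Path

# Keys of the "deepspeech" section that point at files on disk.
# Treating all four as required is an assumption of this sketch.
FILE_KEYS = ("model", "alphabet", "lm", "trie")


def missing_model_files(config, base_dir="."):
    """Return the file keys whose configured path is absent under base_dir."""
    section = config.get("deepspeech", {})
    missing = []
    for key in FILE_KEYS:
        path = section.get(key)
        # A missing entry and a path that is not a regular file both count.
        if path is None or not (Path(base_dir) / path).is_file():
            missing.append(key)
    return missing
```

Calling missing_model_files(json.load(open("config.json"))) from the directory that contains the untarred models returns an empty list when every file is in place.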

The "server" section contains the configuration of the access part, with one subsection per protocol:

http section configuration

request_max_size (default value: 1048576, i.e. 1 MiB) is the maximum payload size allowed by the server. A request whose payload exceeds this threshold is rejected with a "413: Request Entity Too Large" error.

host (default value: "0.0.0.0") is the listen address of the HTTP server.

port (default value: 8080) is the listening port of the HTTP server.
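The defaults listed above can be expressed as a small merge over a parsed configuration. This is an illustrative sketch, not the server's own code: http_settings and fits_in_request are hypothetical helper names, and the size check follows the wording above, where only payloads strictly larger than request_max_size are rejected.

```python
# Defaults as documented for the "http" subsection.
HTTP_DEFAULTS = {
    "host": "0.0.0.0",
    "port": 8080,
    "request_max_size": 1048576,  # 1024 * 1024 bytes = 1 MiB
}


def http_settings(config):
    """Merge the "server" / "http" subsection of a parsed config over the defaults."""
    http = config.get("server", {}).get("http", {})
    settings = dict(HTTP_DEFAULTS)
    settings.update(http)
    return settings


def fits_in_request(payload_size, settings):
    """True if a payload of payload_size bytes does not exceed request_max_size."""
    return payload_size <= settings["request_max_size"]
```

A client can use fits_in_request(os.path.getsize("testfile.wav"), settings) to detect an oversized file before sending it and receiving a 413 response.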

log section configuration

The log section sets the log levels of the server. It contains a list of log entries, each giving the name of a logger and its level. Both follow the conventions of the Python logging module.
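To illustrate how such entries map onto the Python logging module, here is a minimal sketch. The function apply_log_levels is hypothetical and only shows the convention; it is not taken from the server's implementation.

```python
import logging


def apply_log_levels(log_section):
    """Set the level of each logger listed in a parsed "log" section."""
    for entry in log_section.get("level", []):
        # Logger.setLevel accepts level names such as "DEBUG" or "INFO".
        logging.getLogger(entry["logger"]).setLevel(entry["level"])
```

With the sample configuration, this sets the "deepspeech_server" logger to DEBUG and leaves every other logger unchanged.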

Using the server

Inference on the model is done via HTTP POST requests, for example with the following curl command:

curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
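The same request can be sent from Python using only the standard library. This is a sketch under assumptions: build_stt_request and transcribe are hypothetical helper names, the URL matches the curl example and the default port, and the response body is returned as raw text because its exact format is not described here.

```python
import urllib.request


def build_stt_request(audio_bytes, url="http://localhost:8080/stt"):
    """Build a POST request carrying the raw audio bytes as its body."""
    return urllib.request.Request(url, data=audio_bytes, method="POST")


def transcribe(wav_path, url="http://localhost:8080/stt"):
    """Send a WAV file to the server and return the response body as text."""
    with open(wav_path, "rb") as f:
        request = build_stt_request(f.read(), url)
    # urlopen raises urllib.error.HTTPError for error statuses such as 413.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, print(transcribe("testfile.wav")) is the counterpart of the curl command above.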
