google-speech-v2's Introduction

Google Speech API v2:

NOTICE

Google has since launched its official Google Cloud Speech API. I strongly recommend looking over there.

Host:

https://www.google.com/speech-api/v2/recognize

Parameters

output: json (xml is not supported)

lang: any valid locale (en-us, nl-be, fr-fr, etc.)

key: Please get one from the Google Developers Console

Key is not optional.

app: optional

You can specify an optional query string called app, which returns some extra transcripts for some reason.

client: optional, seems to do nothing in particular
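As a rough illustration of how the parameters above fit together, here is a minimal Python sketch that assembles the query string. The function name `build_url` and its defaults are made up for this example; only the parameter names and the endpoint come from this document.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_url(key, lang="en-us", app=None, client=None):
    """Return the recognize URL with the query parameters listed above.

    build_url is a hypothetical helper, not part of any Google library.
    """
    if not key:
        raise ValueError("key is not optional")
    # output is always json because xml is not supported.
    params = {"output": "json", "lang": lang, "key": key}
    # app and client are optional, so only send them when provided.
    if app is not None:
        params["app"] = app
    if client is not None:
        params["client"] = client
    return ENDPOINT + "?" + urlencode(params)


print(build_url("yourkey"))
```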

Data:

FLAC

FLAC file; 44100 Hz, 32-bit float, exported with Audacity. Check the audio folder in this repository for some hilarious examples.

Channels       : 2
Sample Rate    : 44100
Precision      : 32-bit
Sample Encoding: 32-bit Float

16-bit PCM

The following audio options are confirmed working for 16-bit PCM sample encoding:

Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Sample Encoding: 16-bit Signed Integer PCM

One-line sox recording command:

rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav

Headers:

Content-Type:

Content-Type: audio/x-flac; rate=44100;

Set the rate to match the sample rate of the FLAC file (generally 44100 Hz); other rates are supported as well.

Content-Type: audio/l16; rate=16000; is also supported, with a rate of either 44100 Hz or 16000 Hz, for files encoded as 16-bit signed-integer LPCM.

NOTE: Make sure the rate in your header matches the sample rate you used for your audio capture.
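One way to avoid a mismatch between the header and the recording is to read the sample rate from the file itself. The sketch below does that for WAV files with Python's standard `wave` module. The helper name `l16_content_type` is hypothetical, and the check for 16-bit samples is an assumption based on the audio options listed in this document.

```python
import wave


def l16_content_type(wav_file):
    """Build an audio/l16 Content-Type value from a WAV file's own rate.

    wav_file may be a path or an open binary file object.
    l16_content_type is a hypothetical helper for illustration only.
    """
    with wave.open(wav_file, "rb") as wav:
        # 2 bytes per sample means 16-bit audio.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wav.getframerate()
    return f"audio/l16; rate={rate};"
```

For a file recorded with the sox command shown earlier, this should give `audio/l16; rate=16000;`.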

User-Agent:

Not required, but for spoofing purposes you can use one of Chrome’s User-Agent strings.

Response:

When Google is 100% confident in its transcription, it will return the following object:

{
   "result":[
      {
         "alternative":[
            {
               "transcript":"good morning Google how are you feeling today"
            }
         ],
         "final":true
      }
   ],
   "result_index":0
}

When it's doubtful, it adds a confidence parameter for you. It also seems to add multiple transcripts for some reason.

{
  "result":[
    {
      "alternative":[
        {
          "transcript":"this is a test",
          "confidence":0.97321892
        },
        {
          "transcript":"this is a test for"
        }
      ],
      "final":true
    }
  ],
  "result_index":0
}
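To show how the two response shapes above might be handled in one place, here is a small Python sketch that pulls out the first alternative and its confidence when present. The function name `best_transcript` is invented for this example, and it assumes the body is a single JSON object shaped like the samples above.

```python
import json


def best_transcript(body):
    """Return (transcript, confidence) for the first alternative, or None.

    Assumes body is one JSON object shaped like the samples above.
    best_transcript is a hypothetical helper, not an official API.
    """
    data = json.loads(body)
    results = data.get("result", [])
    if not results:
        return None
    alternatives = results[0].get("alternative", [])
    if not alternatives:
        return None
    first = alternatives[0]
    # confidence is missing in the first sample, so fall back to None.
    return first["transcript"], first.get("confidence")


sample = """
{"result": [{"alternative": [
    {"transcript": "this is a test", "confidence": 0.97321892},
    {"transcript": "this is a test for"}
], "final": true}], "result_index": 0}
"""
print(best_transcript(sample))
```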

Example

Install sox

On OS X with Homebrew installed:

brew install sox

Record audio

rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav

Send the request

curl -X POST \
--data-binary @'audio/hello (16bit PCM).wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'

Or for FLAC encoded audio:

curl -X POST \
--data-binary @audio/good-morning-google.flac \
--header 'Content-Type: audio/x-flac; rate=44100;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
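For readers who prefer a script over curl, the same request could be sketched in Python with only the standard library. The names `build_request` and `recognize` are hypothetical, and `yourkey` is a placeholder as in the curl commands above. This is an untested sketch against the live service, not a reference client.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_request(audio, content_type, key, lang="en-us"):
    """Build the POST request without sending it (hypothetical helper)."""
    query = urlencode({"output": "json", "lang": lang, "key": key})
    return Request(
        f"{ENDPOINT}?{query}",
        data=audio,  # raw audio bytes, like curl --data-binary
        headers={"Content-Type": content_type},
        method="POST",
    )


def recognize(path, content_type, key, lang="en-us"):
    """Send the audio file at path and return the response body as text."""
    with open(path, "rb") as audio_file:
        request = build_request(audio_file.read(), content_type, key, lang)
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call such as `recognize("audio/good-morning-google.flac", "audio/x-flac; rate=44100;", "yourkey")` would mirror the FLAC curl command.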

Caveats

Here are a few caveats to know about, should you decide to use this API in a production environment (which I don't recommend).

  • The API only accepts up to ~10-15 seconds of audio.
  • With a Speech API key you generate yourself, you can only make 50 requests per day.
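Given the length limit above, one possible workaround is to cut longer recordings into short pieces before sending them. The sketch below splits raw 16-bit PCM bytes into chunks of at most 10 seconds. The function name `split_pcm` and the 10-second default are assumptions for illustration, and note that each chunk sent counts against the daily request limit.

```python
def split_pcm(pcm, rate=16000, sample_width=2, channels=1, max_seconds=10):
    """Split raw PCM bytes into chunks of at most max_seconds each.

    split_pcm is a hypothetical helper. Defaults match the 16-bit PCM
    settings listed earlier (mono, 16000 Hz, 2 bytes per sample).
    """
    bytes_per_second = rate * sample_width * channels
    # A whole number of seconds keeps every cut on a sample boundary.
    chunk_size = bytes_per_second * max_seconds
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]


# 25 seconds of silence at the default settings.
silence = b"\x00" * (16000 * 2 * 25)
print([len(chunk) for chunk in split_pcm(silence)])
```

Cutting at fixed offsets can split a word in half, so transcripts near the chunk boundaries may be less reliable.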

Contributors

gillesdemey, bryant1410
