
License

Google Cloud / Dialogflow - Self Service Kiosk Demo

Open in Cloud Shell

A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud Speech-to-Text using WebSockets.

An airport self-service kiosk demo that shows how microphone streaming from a web application to GCP works.

It makes use of the following GCP resources:

  • Dialogflow & Knowledge Bases
  • Speech to Text
  • Text to Speech
  • Translate API
  • (optionally) App Engine Flex

In this demo you can start recording your voice; the kiosk displays the answers on screen and synthesizes the speech.
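To give an idea of the overall flow, below is a minimal client-side sketch of how the browser microphone can be captured and streamed over a WebSocket. It uses the Web Audio API and socket.io; the audio-chunk event name and server URL are placeholders for illustration, not the exact code from this repo.

    // Minimal sketch: capture microphone audio in the browser and stream raw
    // PCM chunks to a server over socket.io. The 'audio-chunk' event name and
    // the server URL are illustrative placeholders.
    import io from 'socket.io-client';

    const socket = io('https://localhost:8080');

    async function startStreaming(): Promise<void> {
      // Ask the user for microphone access (requires HTTPS or localhost).
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

      const audioContext = new AudioContext();
      const source = audioContext.createMediaStreamSource(stream);

      // ScriptProcessorNode is deprecated but keeps the example short;
      // AudioWorklet is the modern alternative.
      const processor = audioContext.createScriptProcessor(4096, 1, 1);
      source.connect(processor);
      processor.connect(audioContext.destination);

      processor.onaudioprocess = (event: AudioProcessingEvent) => {
        // Float32 samples in the range [-1, 1]; the server converts these
        // to LINEAR16 before handing them to Dialogflow or Speech-to-Text.
        const samples = event.inputBuffer.getChannelData(0);
        socket.emit('audio-chunk', samples.buffer);
      };
    }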


Live demo

A working demo can be found here: http://selfservicedesk.appspot.com/

Blog posts

I wrote extensive blog articles on how to set up your streaming project. Want to learn exactly how this code works? Start here:

Blog 1: Introduction to the GCP conversational AI components, and integrating your own voice AI in a web app.
Blog 2: Building a client-side web application which streams audio from a browser microphone to a server.
Blog 3: Building a web server which receives a browser microphone stream and uses Dialogflow or the Speech to Text API for retrieving text results.
Blog 4: Getting Audio Data from Text (Text to Speech) and play it in your browser.

Slides & Video

There's a presentation and a video that accompany the tutorial.

Slidedeck AudioStreaming

Setup Local Environment

Get a Node.js environment

  1. apt-get install nodejs -y

  2. apt-get install npm -y

Get an Angular environment

  1. sudo npm install -g @angular/cli

Clone Repo

  1. git clone https://github.com/dialogflow/selfservicekiosk-audio-streaming.git selfservicekiosk

  2. Set the PROJECT_ID variable: export PROJECT_ID=[gcp-project-id]

  3. Set the project: gcloud config set project $PROJECT_ID

  4. Download the service account key (see the example gcloud commands below).

  5. Assign the key to environment var: GOOGLE_APPLICATION_CREDENTIALS

LINUX/MAC: export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json
WIN: set GOOGLE_APPLICATION_CREDENTIALS=c:\path\to\service_account.json
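If you still need to create the service account and its key, the following gcloud commands are one way to do it. The account name sa-streaming and the role shown are placeholders; grant the roles your project actually needs (e.g. Dialogflow, Speech, Text-to-Speech access).

    gcloud iam service-accounts create sa-streaming
    gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:sa-streaming@$PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/dialogflow.admin"
    gcloud iam service-accounts keys create service_account.json \
      --iam-account=sa-streaming@$PROJECT_ID.iam.gserviceaccount.com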

  6. Login: gcloud auth login

  7. Open server/env.txt, change the environment variables, and rename the file to server/.env

  8. Enable APIs:

 gcloud services enable \
 appengineflex.googleapis.com \
 containerregistry.googleapis.com \
 cloudbuild.googleapis.com \
 cloudtrace.googleapis.com \
 dialogflow.googleapis.com \
 logging.googleapis.com \
 monitoring.googleapis.com \
 sourcerepo.googleapis.com \
 speech.googleapis.com \
 mediatranslation.googleapis.com \
 texttospeech.googleapis.com \
 translate.googleapis.com
  9. Build the client-side Angular app:

    cd client && sudo npm install
    npm run-script build
    
  10. Start the server TypeScript app, which is exposed on port 8080 (a minimal sketch of the server wiring follows this list):

    cd ../server && sudo npm install
    npm run-script watch
    
  11. Browse to http://localhost:8080
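As referenced above, here is a minimal sketch of what the server wiring looks like: a socket.io server that receives audio chunks and pipes them into a Dialogflow streaming detectIntent call. Event names and variable names are illustrative assumptions, not the exact code from the server folder.

    // Sketch: receive audio chunks over socket.io and stream them to Dialogflow.
    import * as http from 'http';
    import { Server } from 'socket.io';
    import * as df from '@google-cloud/dialogflow';

    const projectId = process.env.PROJECT_ID as string;
    const sessionsClient = new df.SessionsClient();

    const httpServer = http.createServer();
    const io = new Server(httpServer);

    io.on('connection', socket => {
      const sessionPath = sessionsClient.projectAgentSessionPath(projectId, socket.id);

      // Bidirectional gRPC stream: the first write carries the audio config,
      // subsequent writes carry raw audio buffers.
      const detectStream = sessionsClient
        .streamingDetectIntent()
        .on('data', (response: any) => {
          if (response.queryResult) {
            socket.emit('bot-response', response.queryResult);
          }
        })
        .on('error', console.error);

      detectStream.write({
        session: sessionPath,
        queryInput: {
          audioConfig: {
            audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
            sampleRateHertz: 16000,
            languageCode: 'en-US',
          },
        },
      });

      socket.on('audio-chunk', (chunk: ArrayBuffer) => {
        // NOTE: assumes the client already sends LINEAR16 audio; the real app
        // converts/downsamples the browser's Float32 samples first.
        detectStream.write({ inputAudio: Buffer.from(chunk) });
      });

      socket.on('disconnect', () => detectStream.end());
    });

    httpServer.listen(8080);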

Setup Dialogflow

  1. Create a Dialogflow agent at: http://console.dialogflow.com

  2. Zip the contents of the dialogflow folder from this repo.

  3. Click Settings > Import and upload the Dialogflow agent zip you just created.

  4. Caution: Knowledge connector settings are not currently included when exporting, importing, or restoring agents.

    Make sure you have enabled Beta features in settings.

    1. Select Knowledge from the left menu.
    2. Create a Knowledge Base: Airports
    3. Add the following Knowledge Base FAQs, as text/html documents:
    4. As a response it requires the following custom payload:

        {
          "knowledgebase": true,
          "QUESTION": "$Knowledge.Question[1]",
          "ANSWER": "$Knowledge.Answer[1]"
        }

    5. And to make the Text to Speech version of the answer work, add the following Text SSML response:

        $Knowledge.Answer[1]

Deploy with App Engine Flex

This demo makes heavy use of WebSockets, and the microphone getUserMedia() HTML5 API requires the page to be served over HTTPS. Therefore, I deploy this demo with a custom runtime, so I can include my own Dockerfile.
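For reference, a custom runtime deployment needs roughly the following two files. Treat this as a hedged sketch: the actual app.yaml and Dockerfile in this repo may differ (Node version, environment variables, scaling settings).

    # app.yaml (sketch): App Engine Flex with a custom runtime
    runtime: custom
    env: flex
    env_variables:
      PROJECT_ID: 'your-gcp-project-id'

    # Dockerfile (sketch): Node image that serves the built app on port 8080
    FROM node:12
    WORKDIR /app
    COPY . .
    RUN npm install && npm run-script build
    EXPOSE 8080
    CMD ["npm", "start"]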

  1. Edit the app.yaml to tweak the environment variables. Set the correct Project ID.

  2. Deploy with: gcloud app deploy

  3. Browse: gcloud app browse

Examples

The self-service kiosk is a full end-to-end application. To showcase smaller building blocks, I've created 6 small demos. Here's how you can get them running:

  1. Install the required libraries by running the following command from the examples folder:

    npm install

  2. Start the simpleserver node app:

    npm --EXAMPLE=1 --PORT=8080 --PROJECT_ID=[your-gcp-project-id] run start

To switch to the various examples, edit the EXAMPLE variable to one of these:

  • Example 1: Dialogflow Speech Intent Detection
  • Example 2: Dialogflow Speech Detection through streaming
  • Example 3: Dialogflow Speech Intent Detection with Text to Speech output
  • Example 4: Speech to Text Transcribe Recognize Call
  • Example 5: Speech to Text Transcribe Streaming Recognize
  • Example 6: Text to Speech in a browser
  3. Browse to http://localhost:8080. Open the browser inspector to preview the Dialogflow results object.

The code for these examples, covering the different Dialogflow & STT calls, can be found in simpleserver.js; example1.html - example5.html show the client-side implementations. A simplified sketch of a streaming transcribe call follows below.
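As mentioned, the core of the streaming transcribe call (example 5) boils down to roughly the following sketch. It assumes LINEAR16 audio at 16 kHz and uses illustrative function names; it is not a verbatim copy of simpleserver.js.

    // Sketch of a Speech-to-Text streaming recognize call (cf. example 5).
    import { SpeechClient } from '@google-cloud/speech';

    const speechClient = new SpeechClient();

    function createTranscribeStream(onTranscript: (text: string) => void) {
      const recognizeStream = speechClient
        .streamingRecognize({
          config: {
            encoding: 'LINEAR16',
            sampleRateHertz: 16000,
            languageCode: 'en-US',
          },
          interimResults: true, // stream partial transcripts while the user speaks
        })
        .on('error', console.error)
        .on('data', data => {
          const result = data.results[0];
          if (result && result.alternatives[0]) {
            onTranscript(result.alternatives[0].transcript);
          }
        });

      return recognizeStream; // write Buffer audio chunks into this duplex stream
    }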

License

Apache 2.0

This is not an official Google product.

