ibm / watson-speech-translator

Use Watson Speech to Text, Language Translator, and Text to Speech in a web app with React components

Home Page: https://developer.ibm.com/technologies/artificial-intelligence/patterns/build-a-real-time-translation-service-with-watson-api-kit

License: Apache License 2.0

JavaScript 94.73% CSS 5.27%
watson-speech speech-translator ibm-cloud-pak language-translation watson-services audio voices react-components microphone vocalization

watson-speech-translator's Introduction

WARNING: This repository is no longer maintained

This repository will not be updated. The repository will be kept available in read-only mode.

Create a language translator app with voice input and output

In this code pattern, we will create a language translator web app. Built with React components and a Node.js server, the app will capture audio input and stream it to a Watson Speech to Text service. As the input speech is transcribed, it will also be sent to a Watson Language Translator service to be translated into the language you select. Both the transcribed and translated text will be displayed by the app in real time. Each completed phrase will be sent to Watson Text to Speech to be spoken in your choice of locale-specific voices.

The best way to understand the difference between real-time transcription/translation and "completed phrase" vocalization is to try it out. You'll notice that the text is updated as words and phrases are completed and become better understood in context. To avoid backtracking or overlapping audio, only completed phrases are vocalized. These are typically short sentences or utterances where a pause indicates a break.
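
The split between text that is still changing and completed phrases can be sketched as follows. This is a minimal illustration, not the app's actual code: it assumes each Speech to Text result has the shape `{ final, alternatives: [{ transcript }] }`, and the function name `splitResults` is hypothetical.

```javascript
// Sketch: separate completed phrases from interim text.
// Assumed input shape: [{ final: boolean, alternatives: [{ transcript: string }] }]
function splitResults(results) {
  const completed = []; // final phrases: safe to vocalize
  let interim = '';     // latest non-final text: display only, may still change
  for (const result of results) {
    const best = result.alternatives && result.alternatives[0];
    if (!best) continue; // skip results without a transcript
    const text = best.transcript.trim();
    if (result.final) {
      completed.push(text);
    } else {
      interim = text;
    }
  }
  return { completed, interim };
}

// Hypothetical sample data for illustration.
const sample = [
  { final: true, alternatives: [{ transcript: 'hello world ' }] },
  { final: false, alternatives: [{ transcript: 'how are ' }] },
];
console.log(splitResults(sample));
```

Only the `completed` entries would be passed on for vocalization; the `interim` text would be shown and then overwritten as recognition improves.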

For the best live experience, wear headphones to listen to the translated version of what your microphone is listening to. Alternatively, you can use the toggle buttons to record and transcribe first without translating. When ready, select a language and voice and then enable translation (and speech).

When you have completed this code pattern, you will understand how to:

  • Stream audio to Speech to Text using a WebSocket
  • Use Language Translator with a REST API
  • Retrieve and play audio from Text to Speech using a REST API
  • Integrate Speech to Text, Language Translator, and Text to Speech in a web app
  • Use React components and a Node.js server
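
As a rough sketch of the first item, streaming over a WebSocket amounts to a JSON start message, binary audio chunks, and a JSON stop message. The helper names below are hypothetical, the access token is assumed to be issued by the Node.js server, and the content type is an assumption that must match the audio actually captured. The app itself may rely on an SDK instead of raw messages.

```javascript
// Build the recognize WebSocket URL from the service's HTTPS URL (assumed input).
function recognizeUrl(serviceUrl, accessToken, model) {
  const base = serviceUrl.replace(/^http/, 'ws').replace(/\/+$/, '');
  const url = new URL(base + '/v1/recognize');
  url.searchParams.set('access_token', accessToken);
  url.searchParams.set('model', model);
  return url.toString();
}

// Send a start message, the audio chunks, then a stop message.
// `socket` is any object with a send() method, such as an open WebSocket.
function sendAudio(socket, chunks, contentType) {
  socket.send(JSON.stringify({
    action: 'start',
    'content-type': contentType,
    interim_results: true, // ask for text that updates while the user speaks
  }));
  for (const chunk of chunks) socket.send(chunk);
  socket.send(JSON.stringify({ action: 'stop' }));
}

// Demonstration with a fake socket that records what would be sent.
const fakeSocket = { sent: [], send(message) { this.sent.push(message); } };
sendAudio(fakeSocket, [new Uint8Array([1, 2, 3])], 'audio/webm');
console.log(recognizeUrl('https://example.test/instances/abc', 'TOKEN', 'en-US_BroadbandModel'));
console.log(fakeSocket.sent.length); // start + 1 chunk + stop
```

In a browser, `fakeSocket` would be replaced by `new WebSocket(recognizeUrl(...))`, with `sendAudio` called once the socket's `open` event fires.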

NOTE: This code pattern includes instructions for running Watson services on IBM Cloud or with the Watson API Kit on IBM Cloud Pak for Data. Click here for more information about IBM Cloud Pak for Data.

architecture

Flow

  1. User presses the microphone button and captures the input audio.
  2. The audio is streamed to Speech to Text using a WebSocket.
  3. The transcribed text from Speech to Text is displayed and updated.
  4. The transcribed text is sent to Language Translator and the translated text is displayed and updated.
  5. Completed phrases are sent to Text to Speech and the result audio is automatically played.
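
Step 4 can be illustrated with a sketch of a Language Translator REST request. This is an assumption-laden example rather than the server's actual code: the helper names are hypothetical, the version date is one example value, and the Basic authentication with the `apikey` user applies to IBM Cloud API keys only.

```javascript
// Sketch: build (but do not send) a translate request for the REST API.
function buildTranslateRequest({ url, apikey, text, source, target }) {
  const endpoint = `${url.replace(/\/+$/, '')}/v3/translate?version=2018-05-01`;
  const auth = Buffer.from(`apikey:${apikey}`).toString('base64');
  return {
    endpoint,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${auth}`,
      },
      // The service accepts an array of strings to translate.
      body: JSON.stringify({ text: [text], source, target }),
    },
  };
}

// Pull the first translated string out of a response body.
function firstTranslation(responseBody) {
  return responseBody.translations[0].translation;
}

// Placeholder URL and key for illustration only.
const request = buildTranslateRequest({
  url: 'https://example.test',
  apikey: 'KEY',
  text: 'Hello',
  source: 'en',
  target: 'es',
});
console.log(request.endpoint);
```

On a recent Node.js version, the request could be sent with `fetch(request.endpoint, request.options)` and the parsed JSON passed to `firstTranslation`.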

Steps

  1. Create the Watson services
  2. Deploy the server
  3. Use the web app

Create the Watson services

Note: You can skip this step if you will be using the Deploy to Cloud Foundry on IBM Cloud button below. That option automatically creates the services and binds them (providing their credentials) to the application.

Provision the following services:

  • Speech to Text
  • Language Translator
  • Text to Speech

The instructions will depend on whether you are provisioning services using IBM Cloud Pak for Data or on IBM Cloud.

Click to expand one:

IBM Cloud Pak for Data

Use the following instructions for each of the three services.

Install and provision service instances

The services are not available by default. An administrator must install them on the IBM Cloud Pak for Data platform, and you must be given access to each service. To determine whether a service is installed, click the Services icon and check whether the service is enabled.

Gather credentials

  1. For production use, create a user to use for authentication. From the main navigation menu (☰), select Administer > Manage users and then + New user.
  2. From the main navigation menu (☰), select My instances.
  3. On the Provisioned instances tab, find your service instance, and then hover over the last column to find and click the ellipses icon. Choose View details.
  4. Copy the URL to use as the {SERVICE_NAME}_URL when you configure credentials.
  5. Optionally, copy the Bearer token to use in development testing only. It is not recommended to use the bearer token except during testing and development because that token does not expire.
  6. Use the Menu and select Users and + Add user to grant your user access to this service instance. This is the user name (and password) you will use when you configure credentials to allow the Node.js server to authenticate.

IBM Cloud

Create the service instances

  • If you do not have an IBM Cloud account, register for a free trial account here.
  • Click here to create a Speech to Text instance.
  • Click here to create a Language Translator instance.
  • Click here to create a Text to Speech instance.

Gather credentials

  1. From the main navigation menu (☰), select Resource list to find your services under Services.
  2. Click on each service to find the Manage view where you can collect the API Key and URL to use for each service when you configure credentials.
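
Before deploying, it can help to check that the gathered credentials are actually set. The sketch below follows the `{SERVICE_NAME}_URL` pattern mentioned above; the `_APIKEY`, `_USERNAME`, and `_PASSWORD` suffixes and the function name are assumptions, so adjust them to whatever your deployment instructions specify.

```javascript
// The three services used by the app, as environment variable prefixes.
const SERVICES = ['SPEECH_TO_TEXT', 'LANGUAGE_TRANSLATOR', 'TEXT_TO_SPEECH'];

// Return the names of expected variables that are missing or empty.
// Default suffixes assume IBM Cloud (URL plus API key).
function missingCredentials(env, suffixes = ['URL', 'APIKEY']) {
  const missing = [];
  for (const service of SERVICES) {
    for (const suffix of suffixes) {
      const name = `${service}_${suffix}`;
      if (!env[name]) missing.push(name);
    }
  }
  return missing;
}

console.log(missingCredentials(process.env));
```

For IBM Cloud Pak for Data, where a user name and password are used, the same check could be called as `missingCredentials(process.env, ['URL', 'USERNAME', 'PASSWORD'])`.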

Deploy the server

Click on one of the options below for instructions on deploying the Node.js server.

  • local
  • openshift
  • cf

Use the web app

NOTE: The app was developed using Chrome on macOS. Browser compatibility issues are still being worked out.

  1. Browse to your app URL

    • Use the URL provided at the end of your selected deployment option.
  2. Select a speech recognition model

    • The drop-down will be populated with models supported by your Speech to Text service.
  3. Select an output language and voice

    • The drop-down will only include voices that are supported by your Text to Speech service. The list is also filtered to only show languages that can be translated from the source language using Language Translator.
  4. Use the Speech to Text toggle

    • Use the Speech to Text button (which becomes Stop Listening) to begin recording audio and streaming it to Speech to Text. Press the button again to stop listening/streaming.
  5. Use the Language Translation toggle

    • The Language Translation button (which becomes Stop Translating) is also a toggle. You can leave it enabled to translate while transcribing, or use it after you see the transcribed text that you'd like to translate and say.
  6. Disable Text to Speech

    • By default, the app automatically uses Text to Speech to read the translated output. The checkbox allows you to disable Text to Speech.
  7. Changing the language and voice

    • If you change the voice while language translation is enabled, any current transcribed text will be re-translated (and spoken if enabled).
  8. Resetting the transcribed text

    • The transcribed text will be cleared when you do any of the following:

      • Press Speech to Text to restart listening
      • Refresh the page
      • Change the speech recognition model
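
The voice filtering described in step 3 can be sketched as follows. This is an illustration under assumptions, not the app's code: it assumes voices look like `{ name, language }`, translation models look like `{ source, target }`, and that language codes begin with a two-letter base language. Whether the app also lists voices in the source language is not covered here.

```javascript
// 'en-US_BroadbandModel' -> 'en', 'es-ES' -> 'es'
function baseLanguage(code) {
  return code.split(/[-_]/)[0];
}

// Keep only voices whose language can be reached from the selected
// speech recognition model's language via an available translation model.
function translatableVoices(voices, translationModels, sttModel) {
  const source = baseLanguage(sttModel);
  const targets = new Set(
    translationModels.filter((m) => m.source === source).map((m) => m.target)
  );
  return voices.filter((voice) => targets.has(baseLanguage(voice.language)));
}

// Hypothetical sample data for illustration.
const voices = [
  { name: 'es-ES_EnriqueV3Voice', language: 'es-ES' },
  { name: 'ja-JP_EmiV3Voice', language: 'ja-JP' },
];
const models = [
  { source: 'en', target: 'es' },
  { source: 'es', target: 'en' },
];
console.log(translatableVoices(voices, models, 'en-US_BroadbandModel'));
```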

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ

watson-speech-translator's People

Contributors

dependabot[bot], markstur, rhagarty, sanjeevghimire, stevemar

watson-speech-translator's Issues

Speech recognition model not loading anything

I followed all the steps with no errors in the process. When I click on the app URL, the page loads but gets stuck where the speech recognition drop-down has to load the models supported by the Speech to Text service. I checked the Speech to Text service's activity feed and it says "an instance of the app crashed: APP/PROC/WEB: Exited with status 137 (out of memory)".
I stopped and restarted the Cloud Foundry app but with the same result. The app's instance seems to be using the entire 256 MB of memory allocated.
Why does it generate the "out of memory" error? How can this issue be solved?

Thank you

Reading the audio from the stream

I would like to use an audio stream received over a WebSocket as the input to the Speech to Text service, instead of input from the microphone.

Is that possible and is there any example of how to do that?

Thank you in advance.
