
amazon-ivs-auto-captions-web-demo's Introduction

Amazon IVS Auto-captions Web demo

A demo web application that shows how you can use Amazon IVS in conjunction with Amazon Transcribe to deliver real-time captions for live streams. This demo also shows how Amazon Translate can be used to deliver auto-translated captions to viewers (optional during deployment).

Auto-captions demo

This project is intended for educational purposes only, not for production usage.

This is a serverless web application, leveraging Amazon IVS, Amazon Transcribe, Amazon ECS, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon S3 and Amazon CloudFront. The web user interface is a single page application built using React.js and the Amazon IVS Player. The demo showcases how you can add real-time live captioning to an Amazon IVS stream using Amazon Transcribe. It also showcases how to configure image overlays to appear on top of the video player based on specific keywords, using TimedMetadata. This demo uses Amazon API Gateway WebSockets to deliver the captions to the connected clients, which are then used as a WebVTT track.
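To illustrate the captions path described above, a caption message received over the WebSocket can be turned into a WebVTT cue before being attached to the player's text track. The sketch below assumes a message shape of { text, startTime, endTime } with times in seconds; the demo's actual message format may differ.

```javascript
// Convert seconds to a WebVTT timestamp, e.g. 1.5 -> "00:00:01.500".
function toVttTimestamp(seconds) {
  const h = String(Math.floor(seconds / 3600)).padStart(2, '0');
  const m = String(Math.floor((seconds % 3600) / 60)).padStart(2, '0');
  const s = (seconds % 60).toFixed(3).padStart(6, '0');
  return `${h}:${m}:${s}`;
}

// Build a WebVTT cue block from the assumed caption message shape.
function toVttCue({ text, startTime, endTime }) {
  return `${toVttTimestamp(startTime)} --> ${toVttTimestamp(endTime)}\n${text}`;
}
```

In the browser, each cue would then be added to a text track on the video element (for example via `track.addCue(new VTTCue(startTime, endTime, text))`).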


Getting Started

⚠️ Please note that this demo is experimental, and should only be used for educational purposes.

⚠️ Deploying this demo application in your AWS account will create and consume AWS resources, which will cost money.

To get the demo running in your own AWS account, follow these instructions.

  1. If you do not have an AWS account, please see How do I create and activate a new Amazon Web Services account?
  2. Log in to the AWS console if you are not already logged in. Note: If you are logged in as an IAM user, ensure your account has permissions to create and manage the necessary resources and components for this application.
  3. Follow the instructions for deploying to AWS.

Deploying to AWS


Architecture



⚠️ Known issues and limitations

  • The solution was built for demonstration purposes only and not for production use.
  • The solution requires streaming to an ECS container instead of directly to Amazon IVS, which may add points of failure and additional latency.
  • The solution is currently limited to roughly 200 concurrent viewers (this limitation comes from the captions delivery mechanism, not Amazon IVS). Beyond about 200 connected clients, the time needed to deliver captions to every connection exceeds the Lambda function's timeout (set at 3 seconds), and captions are not delivered at all. A possible way to overcome this limitation is to replace the WebSocket infrastructure (built on top of API Gateway, Lambda and DynamoDB) with a custom WebSocket server implementation running in Amazon ECS and AWS Fargate. Read more here.
  • The solution's client-side caption syncing mechanism currently relies on an undocumented Player API. This API may be changed or deprecated in the future without notice.
  • In Firefox, captions may appear very close to the bottom border of the video when there are 4 or more rows of captions.
  • The solution was only tested in us-west-2 (Oregon) and us-east-1 (N. Virginia) regions. Additional regions may be supported depending on service availability.
  • You may explore using this demo as an alternative, which has fewer limitations.
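The custom WebSocket server mentioned in the limitations above would replace the per-connection Lambda fan-out with a single in-process broadcast loop. A minimal sketch follows; the client objects mimic the socket interface of the 'ws' npm package (readyState/send), and the message shape is an assumption.

```javascript
// readyState value for an open connection (as in the browser and 'ws' APIs).
const OPEN = 1;

// Send one serialized caption to every open client in a single pass,
// instead of invoking a Lambda per connection (which exceeds the 3s
// timeout as the audience grows). Returns the number of deliveries.
function broadcastCaption(clients, caption) {
  const payload = JSON.stringify(caption);
  let delivered = 0;
  for (const client of clients) {
    if (client.readyState === OPEN) {
      client.send(payload);
      delivered++;
    }
  }
  return delivered;
}
```

With the 'ws' package, `clients` would be `wss.clients`, and `broadcastCaption` would be called whenever the transcribe server produces a new caption.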

Estimated costs

Deploying this solution in your AWS account will create and consume AWS resources, which will cost money.

Below is a table with estimated costs for scenarios with 1, 10, and 100 viewers, each receiving 1080p video for 1 hour with four translations enabled.

Note: These costs are estimates and may vary depending on multiple factors such as (but not limited to) region, number of viewers, stream duration, the number of captions in the video, whether the Translate feature is enabled, and the number of translations that are activated. The estimated prices are in US dollars and do not include taxes.


Service                        1 viewer   10 viewers   100 viewers
Amazon Translate                  30.78        30.78         30.78
Elastic Container Service          2.27         2.27          2.27
Interactive Video Service          2.15         3.50         17.00
Transcribe                         0.73         0.73          0.73
CloudWatch                         0.09         0.09          0.09
DynamoDB                           0.02         0.25          2.50
API Gateway                        0.02         0.25          2.50
Elastic Container Registry         0.09         0.09          0.09
Lambda                             0.00         0.02          0.25
S3                                 0.00         0.00          0.00
CloudFront                         0.00         0.00          0.02
Total estimated cost              36.15        37.98         56.23
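As a quick sanity check, the per-service figures sum to the stated totals (values copied from the table above; this is plain arithmetic, not a pricing calculator):

```javascript
// Per-service cost estimates (USD) for 1, 10, and 100 viewers.
const costs = {
  translate:  [30.78, 30.78, 30.78],
  ecs:        [2.27, 2.27, 2.27],
  ivs:        [2.15, 3.50, 17.00],
  transcribe: [0.73, 0.73, 0.73],
  cloudwatch: [0.09, 0.09, 0.09],
  dynamodb:   [0.02, 0.25, 2.50],
  apigw:      [0.02, 0.25, 2.50],
  ecr:        [0.09, 0.09, 0.09],
  lambda:     [0.00, 0.02, 0.25],
  s3:         [0.00, 0.00, 0.00],
  cloudfront: [0.00, 0.00, 0.02],
};

// Sum each column and round to cents to avoid floating-point drift.
const totals = [0, 1, 2].map((i) =>
  Math.round(Object.values(costs).reduce((sum, row) => sum + row[i], 0) * 100) / 100
);
// totals matches the "Total estimated cost" row: 36.15, 37.98, 56.23
```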

About Amazon IVS

amazon-ivs-auto-captions-web-demo's People

Contributors

amazon-auto, dependabot[bot], mboulin


amazon-ivs-auto-captions-web-demo's Issues

How to support more users?

In the limitations, it says "Current solution has a maximum limit of 200 users connected at the same time to a given stream with the same captions language selected." If we want to support more users, what should we do? Thanks!

Can't change transcribe to another language

We want to transcribe Chinese: people speak Chinese and the speech gets transcribed.
We tried following this suggestion to change it:
"This demo was built with English (en-US) as the language used for auto-captions.
You can change the default language/locale by editing LANGUAGE_CODE, DEFAULT_LANGUAGE_CODE and
MEDIA_SAMPLE_RATE_HERTZ in the constant.js file.

You can see a list of supported languages by Amazon Transcribe here: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html"

The diff is as follows.

  1. In ./serverless/transcribe-server/src/constants.js,
    change
    LANGUAGE_CODE: process.env.LANGUAGE_CODE ?? "en-US",
    to
    LANGUAGE_CODE: process.env.LANGUAGE_CODE ?? "zh-CN",

    change
    MEDIA_SAMPLE_RATE_HERTZ:
    process.env.LANGUAGE_CODE == "en-US" || process.env.LANGUAGE_CODE == "es-US"
    to
    MEDIA_SAMPLE_RATE_HERTZ:
    process.env.LANGUAGE_CODE == "zh-CN" || process.env.LANGUAGE_CODE == "zh-CN"

    change
    DEFAULT_LANGUAGE_CODE: "en",
    to
    DEFAULT_LANGUAGE_CODE: "zh",

  2. In ./serverless/translate-server/src/constants.js,
    change
    DEFAULT_SOURCE_LANGUAGE_CODE: "en",
    to
    DEFAULT_SOURCE_LANGUAGE_CODE: "zh",

  3. In ./deployment/translate-languages.json,
    add
    "English [en]": true,

  4. In ./serverless/transcribe-server/build/task-definition-dev.json,
    change
    "value": "en-US"
    to
    "value": "zh-CN"

  5. In ./serverless/transcribe-server/build/task-definition-prod.json,
    change
    "value": "en-US"
    to
    "value": "zh-CN"

But it didn't seem to work.
Did I miss any file that needs to be changed?
I also tried changing only ./serverless/transcribe-server/src/constants.js, and that didn't work either.
Do you have any idea how I can make it work with another language?
Thanks a lot!
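For reference, the locale-related constants touched by the diff above might look like this when consolidated. This is a sketch only; the 44100/16000 Hz split and the file layout are assumptions for illustration, not the repo's actual code.

```javascript
// Hypothetical consolidated view of the locale constants edited above.
// The sample-rate values (44100 vs 16000 Hz) are illustrative assumptions.
const LANGUAGE_CODE = process.env.LANGUAGE_CODE ?? "zh-CN";

const MEDIA_SAMPLE_RATE_HERTZ =
  LANGUAGE_CODE === "en-US" || LANGUAGE_CODE === "es-US" ? 44100 : 16000;

const DEFAULT_LANGUAGE_CODE = "zh";

module.exports = { LANGUAGE_CODE, MEDIA_SAMPLE_RATE_HERTZ, DEFAULT_LANGUAGE_CODE };
```

Note that the condition in the quoted diff compares against "zh-CN" twice, which makes the second comparison redundant; the intent is presumably to select the sample rate that matches the new locale.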

Failed to build images

Hi team. Please add this line to the Dockerfile for the Stream Server:

RUN echo "deb http://archive.debian.org/debian stretch main" > /etc/apt/sources.list

Or change the base image please.

Regards!

Audio data from microphone stream to Transcribe service

I'm trying to adapt the transcription example so that an incoming microphone stream is forwarded to the Transcribe service, but I can't work out how to pipe the data from the incoming socket into the AudioStream async iterator. Node.js code:

io.on('connection', function (socket) {
  console.log(`Client connected [id=${socket.id}]`);
  socket.emit('server-ready', `AWS STT Server ready [id=${socket.id}]`);

  socket.on('stt-start', async function () {
    await startRecognitionStream();
  });

  socket.on('stt-end', function () {
    stopRecognitionStream();
  });

  socket.on('stt-data', function (data) {
    // Incoming microphone audio data
    // How to get this into the transcribe AudioStream???
  });

  const transcribeIterator = async function* () {
    for await (const chunk of ???) {
      console.log(chunk);
      yield {
        AudioEvent: {
          AudioChunk: chunk,
        },
      };
    }
  };

  async function startRecognitionStream() {
    speechClient = new TranscribeStreamingClient({
      region: process.env.AWS_DEFAULT_REGION,
      credentials: {
        accessKeyId: process.env.AWS_ACCESS_KEY_ID,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
      },
    });
    
    const startStreamTranscriptionCommand = new StartStreamTranscriptionCommand({
      LanguageCode: 'en-GB',
      MediaSampleRateHertz: 16000,
      MediaEncoding: 'pcm',
      AudioStream: transcribeIterator(),
    });

    const startStreamTranscriptionCommandOutput = await speechClient.send(startStreamTranscriptionCommand);

    for await (const transcriptionEvent of startStreamTranscriptionCommandOutput.TranscriptResultStream) {
      if (transcriptionEvent.TranscriptEvent.Transcript) {
        const results = transcriptionEvent.TranscriptEvent.Transcript.Results;
        console.log(results);
      }
    }
  }
});

Can anyone help?

Doesn't seem to work as is

Hi,

Tried the code. It doesn't seem to work as is.
We have some questions.

  1. It looks like the download URL for pcre-8.44.tar.gz is no longer available; the following change is needed.
    In ./serverless/stream-server/Dockerfile, change
    wget https://ftp.pcre.org/pub/pcre/pcre-8.44.tar.gz && \
    to
    wget https://sourceforge.net/projects/pcre/files/pcre/8.44/pcre-8.44.tar.gz && \

  2. In aws configure, how should we set "Default output format"?

  3. With change 1 listed above applied, when running "bash deploy.sh", do you have any idea what this error means and how I can fix it?
    Waiter StackCreateComplete failed: Waiter encountered a terminal failure state: For expression "Stacks[].StackStatus" we matched expected path: "ROLLBACK_COMPLETE" at least once

Generating environment variables file for Player App...
node:internal/fs/utils:344
    throw err;
    ^

Error: ENOENT: no such file or directory, open 'stack.json'
    at Object.openSync (node:fs:585:3)
    at Object.readFileSync (node:fs:453:35)
    at Object.<anonymous> (/Users/tianyu/Downloads/amazon-ivs-auto-captions-web-demo-2.0.2/deployment/generate-player-app-env-vars.js:13:33)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47 {
  errno: -2,
  syscall: 'open',
  code: 'ENOENT',
  path: 'stack.json'
}

Thanks a lot!

Is there a discussion group for these examples?

I apologize if this is not the correct place to ask. I glanced at AWS re:Post and tried googling, and nothing obvious appeared.

I'm playing with the code in this example and have some questions about design choices, e.g. using web sockets for the communications between the translate and transcribe servers and the send transcription lambda, instead of say SNS.

I don't want to bother the developers (beyond asking this question) as:

  • This project is an illustrative example and, as pointed out, not a production-ready service,
  • So I don't expect each design choice to have been fully examined,
  • And I'd rather they create more of these types of things than waste their time defending design choices

But I also wouldn't mind discussing why certain things were done the way they were, as a way to figure out best practices.

So, can you point me to a discussion group?

Thanks.

Failed with alternative approach using websocket-server

I am using the alternative approach with a websocket-server.
After creating the websocket-server file, I removed the unneeded code and modified the CloudFormation template and the related bash file to use the websocket-server.
I was able to create the ECR image, but the deployment is failing with the error below:
(screenshot not included)

Could anyone, please provide the cloudformation.yaml for an alternative approach?

JS SDK v3? Can I still use v2?

NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
