guest271314 / speechsynthesisrecorder

Get audio output from window.speechSynthesis.speak() call as ArrayBuffer, AudioBuffer, Blob, MediaSource, MediaStream, ReadableStream, other object or data types

speech-api speech-synthesis mediarecorder audio

speechsynthesisrecorder's Introduction

SpeechSynthesisRecorder.js

Use navigator.mediaDevices.getUserMedia() and MediaRecorder to get audio output from a window.speechSynthesis.speak() call as ArrayBuffer, AudioBuffer, Blob, MediaSource, ReadableStream, or other object or data types. See "MediaStream, ArrayBuffer, Blob audio result from speak() for recording?".

Install

Add the following script tag

<script type="text/javascript" src="https://unpkg.com/[email protected]/SpeechSynthesisRecorder.js"></script>

or npm install

$ npm install --save speech-synthesis-recorder

Usage

At the navigator.mediaDevices.getUserMedia() prompt, select the Monitor of Built-in Audio Analog Stereo option instead of the Built-in Audio Analog Stereo option.

let ttsRecorder = new SpeechSynthesisRecorder({
  text: "The revolution will not be televised", 
  utteranceOptions: {
    voice: "english-us espeak",
    lang: "en-US",
    pitch: .75,
    rate: 1
  }
});

ArrayBuffer

ttsRecorder.start()
  // `tts` : `SpeechSynthesisRecorder` instance, `data` : audio as `dataType` or method call result
  .then(tts => tts.arrayBuffer())
  .then(({tts, data}) => {
    // do stuff with `ArrayBuffer`, `AudioBuffer`, `Blob`,
    // `MediaSource`, `MediaStream`, `ReadableStream`
    // `data` : `ArrayBuffer`
    tts.audioNode.src = URL.createObjectURL(new Blob([data], {type:tts.mimeType}));
    tts.audioNode.title = tts.utterance.text;
    tts.audioNode.onloadedmetadata = () => {
      console.log(tts.audioNode.duration);
      tts.audioNode.play();
    }
  })

AudioBuffer

ttsRecorder.start()
  .then(tts => tts.audioBuffer())
  .then(({tts, data}) => {
    // `data` : `AudioBuffer`
    let source = tts.audioContext.createBufferSource();
    source.buffer = data;
    source.connect(tts.audioContext.destination);
    source.start()
  })

Blob

ttsRecorder.start()
  .then(tts => tts.blob())
  .then(({tts, data}) => {
    // `data` : `Blob`
    tts.audioNode.src = URL.createObjectURL(data);
    tts.audioNode.title = tts.utterance.text;
    tts.audioNode.onloadedmetadata = () => {
      console.log(tts.audioNode.duration);
      tts.audioNode.play();
    }
  })

ReadableStream

ttsRecorder.start()
  .then(tts => tts.readableStream())
  .then(({tts, data}) => {
    // `data` : `ReadableStream`
    console.log(tts, data);
    data.getReader().read().then(({value, done}) => {
      tts.audioNode.src = URL.createObjectURL(value[0]);
      tts.audioNode.title = tts.utterance.text;
      tts.audioNode.onloadedmetadata = () => {
        console.log(tts.audioNode.duration);
        tts.audioNode.play();
      }
    })
  })

MediaSource

ttsRecorder.start()
  .then(tts => tts.mediaSource())
  .then(({tts, data}) => {
    console.log(tts, data);
    // `data` : `MediaSource`
    // a `MediaSource` is attached to a media element through an object URL
    tts.audioNode.src = URL.createObjectURL(data);
    tts.audioNode.title = tts.utterance.text;
    tts.audioNode.onloadedmetadata = () => {
      console.log(tts.audioNode.duration);
      tts.audioNode.play();
    }
  })

MediaStream

let ttsRecorder = new SpeechSynthesisRecorder({
  text: "The revolution will not be televised", 
  utteranceOptions: {
    voice: "english-us espeak",
    lang: "en-US",
    pitch: .75,
    rate: 1
  }, 
  dataType:"mediaStream"
});
ttsRecorder.start()
  .then(({tts, data}) => {
    // `data` : `MediaStream`
    // do stuff with active `MediaStream`
  })
  .catch(err => console.log(err))

Demo

plnkr

speechsynthesisrecorder's Issues

Why do we need to use navigator.mediaDevices.getUserMedia() and MediaRecorder to get audio output of window.speechSynthesis.speak()?

Why do we need to use navigator.mediaDevices.getUserMedia() and MediaRecorder() to get audio output of window.speechSynthesis.speak()?

Can the recorded audioOutput be saved to the file system?

Just wondering if there is an option for saving the recording to the file system.
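
One way this might be done in a browser, sketched under the assumption that `data` is the `Blob` resolved by `tts.blob()`: create an object URL and click a temporary `<a download>` element. The helper names `extensionFor` and `saveBlob` are hypothetical and are not part of SpeechSynthesisRecorder.js.

```javascript
// Hypothetical helpers; not part of SpeechSynthesisRecorder.js.

// Derive a file extension from a MIME type such as "audio/webm; codecs=opus".
function extensionFor(mimeType) {
  const subtype = String(mimeType).split(";")[0].split("/")[1];
  return subtype ? subtype.trim() : "bin";
}

// Offer `blob` as a download by clicking a temporary <a download> element.
// `doc` defaults to the page's document; it is a parameter so the helper
// can be exercised with a stand-in object.
function saveBlob(blob, basename, doc = globalThis.document) {
  const url = URL.createObjectURL(blob);
  const a = doc.createElement("a");
  a.href = url;
  a.download = `${basename}.${extensionFor(blob.type)}`;
  a.click();
  // Release the object URL after the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
  return a.download;
}
```

Possible usage: `ttsRecorder.start().then(tts => tts.blob()).then(({data}) => saveBlob(data, "speech"))`. The browser decides where the file is stored; the page itself does not get a file system path.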

Stop or disable the MediaStream from navigator.mediaDevices.getUserMedia() when playback or recording completes

We do not need the MediaStream from navigator.mediaDevices.getUserMedia() once MediaRecorder stops recording or once the SpeechSynthesisUtterance end event has been dispatched.
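
A minimal sketch of releasing the stream, using the same `getTracks()` and `track.stop()` calls that appear elsewhere in this project. The helper name `stopStream` is hypothetical.

```javascript
// Hypothetical helper; not part of SpeechSynthesisRecorder.js.
// Stops every track of a MediaStream so the browser can release the
// capture device. Returns the number of tracks that were stopped.
function stopStream(stream) {
  const tracks = stream.getTracks();
  tracks.forEach(track => track.stop());
  return tracks.length;
}
```

It could be called from a `MediaRecorder` `onstop` handler or a `SpeechSynthesisUtterance` `onend` handler, for example `recorder.onstop = () => stopStream(stream)`, assuming `recorder` and `stream` are the objects in use.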

Issue with recorded media at Firefox

Choppy playback of recorded media at Firefox.

Saving audio w/o using speaker and microphone

@guest271314 I don't understand the JS code well.
Is there an easy way to save the output audio file without first playing and then recording?
Also, is there a python wrapper around this library that I could not find?

Not working on latest versions of chrome 71

This does not work on Chrome 71: from Chrome 66 onwards, an AudioContext can only be started after a user gesture, for example a button click. I made that change by adding a button onclick handler, but then hit a DOMException in the start method.
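
A minimal sketch of deferring construction until a user gesture. The helper name `runOnFirstClick` is hypothetical, and this only addresses the gesture requirement; it is not known to resolve the DOMException reported above.

```javascript
// Hypothetical helper; not part of SpeechSynthesisRecorder.js.
// Runs `task` the first time `target` receives a "click" event, so that
// anything `task` constructs (for example an AudioContext) is created
// inside a user gesture. The listener removes itself after one call.
function runOnFirstClick(target, task) {
  target.addEventListener("click", () => task(), { once: true });
}
```

In a page this might be used as `runOnFirstClick(button, () => new SpeechSynthesisRecorder({text: "..."}).start())`, where `button` is assumed to be an existing element.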

audioBuffer() not working in chrome

.then(ab => this.audioContext.decodeAudioData(ab))

Uncaught (in promise) TypeError: Failed to execute 'decodeAudioData' on 'BaseAudioContext': parameter 1 is not of type 'ArrayBuffer'.
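
The error message indicates that the value passed to `decodeAudioData()` was not an `ArrayBuffer`. A minimal sketch of normalizing the input first, assuming the value is a `Blob` or a typed array; the helper name `toArrayBuffer` is hypothetical.

```javascript
// Hypothetical helper; not part of SpeechSynthesisRecorder.js.
// Resolves to an ArrayBuffer for the input kinds handled here:
// ArrayBuffer, typed array / DataView, or Blob.
async function toArrayBuffer(data) {
  if (data instanceof ArrayBuffer) return data;
  if (ArrayBuffer.isView(data)) {
    // Copy only the bytes that the view covers.
    return data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength);
  }
  if (typeof Blob !== "undefined" && data instanceof Blob) {
    return data.arrayBuffer();
  }
  throw new TypeError("Expected ArrayBuffer, ArrayBufferView, or Blob");
}
```

Possible usage: `toArrayBuffer(data).then(ab => audioContext.decodeAudioData(ab))`, where `audioContext` is assumed to be an existing `AudioContext`.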

'audiooutput' does not mean system audio output

        .then(stream => navigator.mediaDevices.enumerateDevices()
        .then(devices => {
          const audiooutput = devices.find(device => device.kind == "audiooutput");
          stream.getTracks().forEach(track => track.stop())
          if (audiooutput) {
            const constraints = {
              deviceId: {
                exact: audiooutput.deviceId
              }
            };
            return navigator.mediaDevices.getUserMedia({
              audio: constraints
            });
          }
          return navigator.mediaDevices.getUserMedia({
            audio: true
          });
        }))

does not actually select an audio output device https://bugs.chromium.org/p/chromium/issues/detail?id=1114422#c7.
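
A minimal sketch of an alternative selection, under the assumption that the system exposes its monitor device as an `audioinput` whose label starts with "Monitor of" (as in the Usage note above). The helper name `findMonitorInput` is hypothetical, and a browser that does not list monitor devices will simply yield `undefined`.

```javascript
// Hypothetical helper; not part of SpeechSynthesisRecorder.js.
// `devices` is the array resolved by navigator.mediaDevices.enumerateDevices().
// Device labels are empty until the user has granted media permission.
function findMonitorInput(devices) {
  return devices.find(
    device => device.kind === "audioinput" && /^Monitor of /i.test(device.label)
  );
}
```

If a device is found, it could be requested with `navigator.mediaDevices.getUserMedia({audio: {deviceId: {exact: monitor.deviceId}}})`, where `monitor` is the returned entry.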

Chromium does not support capture of monitor devices by default

Chromium does not support capture of monitor devices by default https://bugs.chromium.org/p/chromium/issues/detail?id=1114422.

Workarounds https://github.com/guest271314/captureSystemAudio.

Firefox Uncaught (in promise) NavigatorUserMediaError {name: "TrackStartError", message: "", constraintName: ""}

Steps to reproduce:

Call navigator.getUserMedia({audio:true})
Set Monitor Built-in Audio Analog Stereo at RecordStream option at Recording tab of system Sound Settings
Call MediaRecorder with MediaStream from navigator.getUserMedia({audio:true}) call as

Operating System: Linux 4.8.0-54-lowlatency

Firefox version: 53.0.3 (32-bit)

Actual results:

The resulting Blob of recorded media, when played at an HTMLMediaElement, contains reverb and input from the system microphone. When the page is refreshed and permission is granted again for user media, the error Uncaught (in promise) NavigatorUserMediaError {name: "TrackStartError", message: "", constraintName: ""} is thrown.

The RecordStream option that was set to Monitor of Built-in Audio Analog Stereo is removed from the OS Sound Settings GUI.

Refreshing again and repeating the steps above results in Firefox closing.

If both Chromium and Firefox are tried using above settings, after Firefox closes Chromium receives error Uncaught (in promise) NavigatorUserMediaError {name: "TrackStartError", message: "", constraintName: ""}.

Expected results:

At Chromium 58, MediaRecorder records the output to speakers, without reverb or input from the system microphone, and does not remove the option from the system Sound Settings or close the browser.

https://bugzilla.mozilla.org/show_bug.cgi?id=1373364

Microsoft "Natural" voices are not captured

When setting utteranceOptions.voice to a "Natural" voice, the resulting audio contains only silence.

For example, these are the default voices that exist on an unconfigured installation of Microsoft Edge:

Microsoft Edge Voices

Microsoft David - English (United States)
Microsoft Mark - English (United States)
Microsoft Zira - English (United States)
Microsoft Natasha Online (Natural) - English (Australia)
Microsoft William Online (Natural) - English (Australia)
Microsoft Clara Online (Natural) - English (Canada)
Microsoft Liam Online (Natural) - English (Canada)
Microsoft Sam Online (Natural) - English (Hongkong)
Microsoft Yan Online (Natural) - English (Hongkong)
Microsoft Neerja Online (Natural) - English (India) (Preview)
Microsoft Neerja Online (Natural) - English (India)
Microsoft Prabhat Online (Natural) - English (India)
Microsoft Connor Online (Natural) - English (Ireland)
Microsoft Emily Online (Natural) - English (Ireland)
Microsoft Asilia Online (Natural) - English (Kenya)
Microsoft Chilemba Online (Natural) - English (Kenya)
Microsoft Mitchell Online (Natural) - English (New Zealand)
Microsoft Molly Online (Natural) - English (New Zealand)
Microsoft Abeo Online (Natural) - English (Nigeria)
Microsoft Ezinne Online (Natural) - English (Nigeria)
Microsoft James Online (Natural) - English (Philippines)
Microsoft Rosa Online (Natural) - English (Philippines)
Microsoft Luna Online (Natural) - English (Singapore)
Microsoft Wayne Online (Natural) - English (Singapore)
Microsoft Leah Online (Natural) - English (South Africa)
Microsoft Luke Online (Natural) - English (South Africa)
Microsoft Elimu Online (Natural) - English (Tanzania)
Microsoft Imani Online (Natural) - English (Tanzania)
Microsoft Libby Online (Natural) - English (United Kingdom)
Microsoft Maisie Online (Natural) - English (United Kingdom)
Microsoft Ryan Online (Natural) - English (United Kingdom)
Microsoft Sonia Online (Natural) - English (United Kingdom)
Microsoft Thomas Online (Natural) - English (United Kingdom)
Microsoft Aria Online (Natural) - English (United States)
Microsoft Ana Online (Natural) - English (United States)
Microsoft Christopher Online (Natural) - English (United States)
Microsoft Eric Online (Natural) - English (United States)
Microsoft Guy Online (Natural) - English (United States)
Microsoft Jenny Online (Natural) - English (United States)
Microsoft Michelle Online (Natural) - English (United States)
Microsoft Roger Online (Natural) - English (United States)
Microsoft Steffan Online (Natural) - English (United States)

The first 3 voices record as expected, but none of the subsequent "Natural" voices are captured.

Is there an additional step that must be taken in order for these voices to be captured?

Uncaught TypeError: Failed to set the 'volume' property on 'SpeechSynthesisUtterance': The provided float value is non-finite.

Error in Chrome 83:

SpeechSynthesisRecorder.js:45 Uncaught TypeError: Failed to set the 'volume' property on 'SpeechSynthesisUtterance': The provided float value is non-finite.
    at Function.assign (<anonymous>)
    at new SpeechSynthesisRecorder (SpeechSynthesisRecorder.js:45)
    at <anonymous>:1:1

Code that was run:

new SpeechSynthesisRecorder({
    text: 'The revolution will not be televised',
    utteranceOptions: {
        voice: 'english-us espeak',
        lang: 'en-US',
        pitch: 0.75,
        rate: 1,
    },
})
    .start()
    .then((tts) => tts.blob())
    .then(({ tts, data }) => {
        // `data` : `Blob`
        tts.audioNode.src = URL.createObjectURL(data);
        tts.audioNode.title = tts.utterance.text;
        tts.audioNode.onloadedmetadata = () => {
            console.log(tts.audioNode.duration);
            tts.audioNode.play();
        };
    });
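
The stack trace points at `Object.assign` in the constructor, and the options above do not set `volume`. One plausible cause, which is an assumption and not confirmed from the library source, is that an undefined `volume` is assigned to the utterance; `SpeechSynthesisUtterance.volume` rejects non-finite values. A minimal sketch of filtering such entries before assignment, with the hypothetical helper name `definedOptions`:

```javascript
// Hypothetical helper; not part of SpeechSynthesisRecorder.js.
// Returns a copy of `options` without undefined entries and without
// numeric entries that are NaN or infinite.
function definedOptions(options) {
  return Object.fromEntries(
    Object.entries(options).filter(([, value]) =>
      value !== undefined && (typeof value !== "number" || Number.isFinite(value))
    )
  );
}
```

Possible usage: `Object.assign(utterance, definedOptions({lang, pitch, rate, volume}))`, where `utterance` is assumed to be a `SpeechSynthesisUtterance` and the other names are the caller's option values.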

Do not add elements to the DOM

Here you are adding an audio element to the DOM. IMO, an API like this should be less opinionated, in fact the whole audioNode property could be dropped.

This is again recording from microphone, not from audiooutput device

Since this was not working on the latest Chrome 71, I downgraded to Chrome 60. I see that this program is recording from the microphone instead of from speechSynthesis.speak(). I think the reason is that both audioinput and audiooutput have the same deviceId="default". So how can I make it record from speak()?