
Comments (4)

github-actions commented on June 30, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.

from azure-sdk-for-net.

MikeAlhayek commented on June 30, 2024

@robch here is how the recording is captured in JavaScript and sent to the SignalR hub that calls the service above:

navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
        const mediaRecorder = new MediaRecorder(stream, {
            mimeType: "audio/ogg; codecs=opus",
        });

        const subject = new signalR.Subject();

        mediaRecorder.addEventListener("dataavailable", async e => {
            // Convert the audio Blob to a base64 string so it can be sent
            // over SignalR as text.
            const uint8Array = new Uint8Array(await e.data.arrayBuffer());

            const binaryString = uint8Array.reduce((str, byte) => str + String.fromCharCode(byte), '');

            const base64 = btoa(binaryString);
            subject.next(base64);
        });

        // When recording stops, complete the stream so the hub calls StopContinuousRecognitionAsync.
        mediaRecorder.addEventListener("stop", () => {
            subject.complete();
        });

        // When recording starts, send the subject stream to the SignalR hub.
        mediaRecorder.addEventListener("start", () => {
            connection.send('UploadStream', sessionId, currentRecordingId, subject);
        });

        // Toggle recording when the record button is clicked.
        recordButton.addEventListener("click", () => {
            if (mediaRecorder.state === "recording") {
                mediaRecorder.stop();
            } else {
                mediaRecorder.start(1000);
            }
        });

    }).catch(err => {
        // If the user denies permission to record audio, then display an error.
        console.log('Error: ' + err);
        alert('You must allow Microphone access to use this feature.');
    });


MikeAlhayek commented on June 30, 2024

Alternatively, I tried using RecognizeOnceAsync() instead of the continuous recognizer, as shown in the code below. The request times out every time.

public async Task<string> GetTextAsync(Stream stream, AudioInterpreterTextContext context = null)
{
    ArgumentNullException.ThrowIfNull(stream);

    stream.Position = 0;
    byte[] bytes = null;

    if (stream is not MemoryStream memoryStream)
    {
        memoryStream = new MemoryStream();
        await stream.CopyToAsync(memoryStream);
        bytes = memoryStream.ToArray();

        memoryStream.Dispose();
    }

    bytes ??= memoryStream.ToArray();

    var index = FindHeaderEndIndex(bytes);

    if (index > -1)
    {
        stream.Position = index + 1;
    }
    else
    {
        stream.Position = 0;
    }

    var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);

    using var audioStream = AudioInputStream.CreatePushStream(format);

    // Do we have to write the bytes in chunks here? Not sure why we can't just do audioStream.Write(bytes) instead of the next 14 lines.
    using (var binaryReader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
    {
        byte[] readBytes;
        do
        {
            readBytes = binaryReader.ReadBytes(_bufferSize);

            if (readBytes.Length == 0)
            {
                break;
            }
            audioStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);
    }

    // Signal end-of-stream so RecognizeOnceAsync does not wait for more audio.
    audioStream.Close();

    var speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);

    if (context != null && context.Data.TryGetValue("Language", out var lang))
    {
        speechConfig.SpeechRecognitionLanguage = lang?.ToString();
    }

    using var audioConfig = AudioConfig.FromStreamInput(audioStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

    if (speechRecognitionResult.Reason == ResultReason.RecognizedSpeech)
    {
        return speechRecognitionResult.Text;
    }

    LogErrors(speechRecognitionResult);

    return null;
}

Here is the debug trace for the above approach:

2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_ERROR:  base_gstreamer.cpp:211 Error from GStreamer: Source: oggdemux
Message: Could not demultiplex stream.
DebugInfo: ../ext/ogg/gstoggdemux.c(4776): gst_ogg_demux_send_event (): /GstPipeline:pipeline/GstOggDemux:oggdemux:
EOS before finding a chain

 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_INFO:  blocking_read_write_buffer.h:127 WaitUntilBytesAvailable: available=0; required=3200 writeZero=true ...
 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_SCOPE_EXIT:  base_gstreamer.cpp:186 BaseGstreamer::HandleGstMessageError
 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_ERROR:  create_object_helpers.h:21 site does not support ISpxObjectFactory
 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_THROW_HR:  create_object_helpers.h:22 hr = 0x14
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212461ms SPX_TRACE_ERROR:  exception.cpp:123 About to throw Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE) 
[CALL STACK BEGIN]

    > audio_config_get_audio_processing_options
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - configthreadlocale
    - BaseThreadInitThunk
    - RtlUserThreadStart

[CALL STACK END]

2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_ENTER:  thread_service.cpp:45 CSpxThreadService::Term
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT:  thread_service.cpp:45 CSpxThreadService::Term
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT:  audio_pump.cpp:173 *** AudioPump THREAD stopped! ***
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR:  audio_pump.cpp:472 [0000025BF2822E90]CSpxAudioPump::PumpThread(): exception caught during pumping, Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE)
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR:  create_object_helpers.h:21 site does not support ISpxObjectFactory
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_THROW_HR:  create_object_helpers.h:22 hr = 0x14
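
The "EOS before finding a chain" error from oggdemux generally means the demuxer never saw a valid Ogg page before the data ran out. Every Ogg page begins with the 4-byte capture pattern "OggS", so one quick sanity check is whether the bytes handed to the push stream still start at a page boundary (skipping past FindHeaderEndIndex could break that). A sketch in JavaScript; `startsWithOggPage` is a hypothetical helper, not SDK API:

```javascript
// Every Ogg page begins with the capture pattern "OggS" (0x4F 0x67 0x67 0x53).
// If the buffer fed to the decoder does not start with it, the demuxer cannot
// locate the first page and fails with "EOS before finding a chain".
const OGG_CAPTURE_PATTERN = [0x4f, 0x67, 0x67, 0x53]; // "OggS"

function startsWithOggPage(bytes) {
    return OGG_CAPTURE_PATTERN.every((b, i) => bytes[i] === b);
}
```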
 


MikeAlhayek commented on June 30, 2024

I changed the recorder settings for testing purposes by passing the following constraints object to navigator.mediaDevices.getUserMedia:

{
    audio: {
        autoGainControl: false,
        channelCount: 1,
        echoCancellation: false,
        latency: 0,
        noiseSuppression: false,
        sampleSize: 16
    },
    video: false
}
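
For experimentation, these settings can be kept in a small factory so individual constraints are easy to override per test run. Purely illustrative; `buildCaptureConstraints` is a hypothetical name and the getUserMedia call only runs in a browser:

```javascript
// Hypothetical factory for the capture constraints shown above, with
// per-call overrides for the audio track settings.
function buildCaptureConstraints(overrides = {}) {
    return {
        audio: {
            autoGainControl: false,
            channelCount: 1,
            echoCancellation: false,
            latency: 0,
            noiseSuppression: false,
            sampleSize: 16,
            ...overrides,
        },
        video: false,
    };
}

// Usage (browser only):
// navigator.mediaDevices.getUserMedia(buildCaptureConstraints())
//     .then(stream => { /* attach MediaRecorder as above */ });
```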

I am able to write the bytes to a local file and play it back with no problem. Here is the metadata for the saved file:

file_type: OPUS
file_type_extension: opus
mime_type: audio/ogg
opus_version: 1
audio_channels: 1
sample_rate: 16000
output_gain: 1 
codec_name: opus
codec_long_name: Opus (Opus Interactive Audio Codec) 
sample_rate: 48000 
channels: 1
channel_layout: mono 
duration: 4.76
size: 8648
bit_rate: 14534 

Still no success converting the audio bytes to text using the SDK.

