Type of issue Code doesn't work Deion</h3

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

I provided a repo for this issue <a class="issue-link js-issue-link" data-error-text="

Unable to use UGG OPUS stream in a continuous recognition using C# SDK about azure-sdk-for-net HOT 5 OPEN

MikeAlhayek commented on September 23, 2024

Unable to use UGG OPUS stream in a continuous recognition using C# SDK

from azure-sdk-for-net.

Comments (5)

github-actions commented on September 23, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.

from azure-sdk-for-net.

MikeAlhayek commented on September 23, 2024

@robch here is how the recording is captured using JavaScript and send to the SignalR hub to call the service above

navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
        const mediaRecorder = new MediaRecorder(stream, {
            mimeType: "audio/ogg; codecs=opus",
        });

        const subject = new signalR.Subject();

        mediaRecorder.addEventListener("dataavailable", async e => {
            // convert blob to base64 to send to the SignalR as string
            const uint8Array = new Uint8Array(await e.data.arrayBuffer());

            const binaryString = uint8Array.reduce((str, byte) => str + String.fromCharCode(byte), '');

            var base64 = btoa(binaryString);
            subject.next(base64);
        });

        // When the recording stops, complete the request for the SignalR hub so that the StopContinuousRecognitionAsync is called.
        mediaRecorder.addEventListener("stop", () => {
            subject.complete();
        });

        // when the recording starts, send the subject to the SignalR hub/server
        mediaRecorder.addEventListener("start", () => {
            connection.send('UploadStream', sessionId, currentRecordingId, subject);
        });

        // start recording when the record button is clicked
        recordButton.addEventListener("click", () => {
            if (mediaRecorder.state == "recording") {
                mediaRecorder.stop();
            } else {
                mediaRecorder.start(1000);
            }
        });

    }).catch(err => {
        // If the user denies permission to record audio, then display an error.
        console.log('Error: ' + err);
        alert('You must allow Microphone access to use this feature.');
    });

from azure-sdk-for-net.

MikeAlhayek commented on September 23, 2024

Alternatively, I tried to use RecognizeOnceAsync() instead of continuous recognizer as you can see in code below. The request times out every time.

public async Task<string> GetTextAsync(Stream stream, AudioInterpreterTextContext context = null)
{
    ArgumentNullException.ThrowIfNull(stream);

    stream.Position = 0;
    byte[] bytes = null;

    if (stream is not MemoryStream memoryStream)
    {
        memoryStream = new MemoryStream();
        await stream.CopyToAsync(memoryStream);
        bytes = memoryStream.ToArray();

        memoryStream.Dispose();
    }

    bytes ??= memoryStream.ToArray();

    var index = FindHeaderEndIndex(bytes);

    if (index > -1)
    {
        stream.Position = index + 1;
    }
    else
    {
        stream.Position = 0;
    }

    var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);

    using var audioStream = AudioInputStream.CreatePushStream(format);

    // Do we have to write the bytes in chunks here? Not sure why can't we do audioStream.Write(bytes) instead of the next 14 lines.
    using (var binaryReader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
    {
        byte[] readBytes;
        do
        {
            readBytes = binaryReader.ReadBytes(_bufferSize);

            if (readBytes.Length == 0)
            {
                break;
            }
            audioStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);
    }

    var speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);

    if (context != null && context.Data.TryGetValue("Language", out var lang))
    {
        speechConfig.SpeechRecognitionLanguage = lang?.ToString();
    }

    using var audioConfig = AudioConfig.FromStreamInput(audioStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

    if (speechRecognitionResult.Reason == ResultReason.RecognizedSpeech)
    {
        return speechRecognitionResult.Text;
    }

    LogErrors(speechRecognitionResult);

    return null;
}

Here is the debug trace for the above approach

024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_ERROR:  base_gstreamer.cpp:211 Error from GStreamer: Source: oggdemux
Message: Could not demultiplex stream.
DebugInfo: ../ext/ogg/gstoggdemux.c(4776): gst_ogg_demux_send_event (): /GstPipeline:pipeline/GstOggDemux:oggdemux:
EOS before finding a chain

 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_INFO:  blocking_read_write_buffer.h:127 WaitUntilBytesAvailable: available=0; required=3200 writeZero=true ...
 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_SCOPE_EXIT:  base_gstreamer.cpp:186 BaseGstreamer::HandleGstMessageError
 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_ERROR:  create_object_helpers.h:21 site does not support ISpxObjectFactory
 
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_THROW_HR:  create_object_helpers.h:22 hr = 0x14
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212461ms SPX_TRACE_ERROR:  exception.cpp:123 About to throw Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE) 
[CALL STACK BEGIN]

    > audio_config_get_audio_processing_options
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - configthreadlocale
    - BaseThreadInitThunk
    - RtlUserThreadStart

[CALL STACK END]

2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_ENTER:  thread_service.cpp:45 CSpxThreadService::Term
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT:  thread_service.cpp:45 CSpxThreadService::Term
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT:  audio_pump.cpp:173 *** AudioPump THREAD stopped! ***
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR:  audio_pump.cpp:472 [0000025BF2822E90]CSpxAudioPump::PumpThread(): exception caught during pumping, Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE)
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR:  create_object_helpers.h:21 site does not support ISpxObjectFactory
 
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_THROW_HR:  create_object_helpers.h:22 hr = 0x14

from azure-sdk-for-net.

MikeAlhayek commented on September 23, 2024

I changed the recorder settings for testing purpuses by passing the following object to the navigator.mediaDevices.getUserMedia object.

{
  audio: {
    autoGainControl: false,
    channelCount: 1,
    echoCancellation: false,
    latency: 0,
    noiseSuppression: false,
    sampleSize: 16
}, video: false
}

I am able to write the bites to a local file and I am able to play the file with no problem. Here are the metadata info for the saved file

file_type: OPUS
file_type_extension: opus
mime_type: audio/ogg
opus_version: 1
audio_channels: 1
sample_rate: 16000
output_gain: 1 
codec_name: opus
codec_long_name: Opus (Opus Interactive Audio Codec) 
sample_rate: 48000 
channels: 1
channel_layout: mono 
duration: 4.76
size: 8648
bit_rate: 14534

Still with no success into converting the audio bytes to test using the SDK.

from azure-sdk-for-net.

MikeAlhayek commented on September 23, 2024

I provided a repo for this issue Azure-Samples/cognitive-services-speech-sdk#2387 . Also, this repo can provide a good sample once I get it to work.

from azure-sdk-for-net.

Unable to use UGG OPUS stream in a continuous recognition using C# SDK about azure-sdk-for-net HOT 5 OPEN

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent