Comments (5)
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.
from azure-sdk-for-net.
@robch Here is how the recording is captured in JavaScript and sent to the SignalR hub, which then calls the service above:
navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
        const mediaRecorder = new MediaRecorder(stream, {
            mimeType: "audio/ogg; codecs=opus",
        });
        const subject = new signalR.Subject();

        mediaRecorder.addEventListener("dataavailable", async e => {
            // Convert the blob to base64 so it can be sent to SignalR as a string.
            const uint8Array = new Uint8Array(await e.data.arrayBuffer());
            const binaryString = uint8Array.reduce((str, byte) => str + String.fromCharCode(byte), '');
            const base64 = btoa(binaryString);
            subject.next(base64);
        });

        // When the recording stops, complete the request for the SignalR hub so that StopContinuousRecognitionAsync is called.
        mediaRecorder.addEventListener("stop", () => {
            subject.complete();
        });

        // When the recording starts, send the subject to the SignalR hub/server.
        mediaRecorder.addEventListener("start", () => {
            connection.send('UploadStream', sessionId, currentRecordingId, subject);
        });

        // Toggle recording when the record button is clicked.
        recordButton.addEventListener("click", () => {
            if (mediaRecorder.state === "recording") {
                mediaRecorder.stop();
            } else {
                mediaRecorder.start(1000);
            }
        });
    }).catch(err => {
        // If the user denies permission to record audio, display an error.
        console.log('Error: ' + err);
        alert('You must allow Microphone access to use this feature.');
    });
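The blob-to-base64 step in the `dataavailable` handler can be isolated and tested on its own. Below is a minimal sketch, not the code from the post: `bytesToBase64` and `base64ToBytes` are hypothetical helper names, and it assumes the global `btoa`/`atob` functions are available (browsers and recent Node.js versions). It encodes in chunks instead of appending one character per byte, and the decode helper lets you confirm that the bytes survive the round trip unchanged.

```javascript
// Hypothetical helpers for illustration; names are not from the original post.

// Encode a Uint8Array as base64, building the binary string in chunks
// so that large recordings do not append one character at a time.
function bytesToBase64(bytes, chunkSize = 0x8000) {
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Decode a base64 string back into a Uint8Array.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

Decoding a chunk on the client with `base64ToBytes` and comparing it to the original bytes is a quick way to rule out the encoding step as the source of corruption.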
Alternatively, I tried to use RecognizeOnceAsync() instead of the continuous recognizer, as shown in the code below. The request times out every time.
public async Task<string> GetTextAsync(Stream stream, AudioInterpreterTextContext context = null)
{
    ArgumentNullException.ThrowIfNull(stream);

    stream.Position = 0;
    byte[] bytes = null;
    if (stream is not MemoryStream memoryStream)
    {
        memoryStream = new MemoryStream();
        await stream.CopyToAsync(memoryStream);
        bytes = memoryStream.ToArray();
        memoryStream.Dispose();
    }
    bytes ??= memoryStream.ToArray();

    var index = FindHeaderEndIndex(bytes);
    if (index > -1)
    {
        stream.Position = index + 1;
    }
    else
    {
        stream.Position = 0;
    }

    var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);
    using var audioStream = AudioInputStream.CreatePushStream(format);
    // Do we have to write the bytes in chunks here? Not sure why can't we do audioStream.Write(bytes) instead of the next 14 lines.
    using (var binaryReader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
    {
        byte[] readBytes;
        do
        {
            readBytes = binaryReader.ReadBytes(_bufferSize);
            if (readBytes.Length == 0)
            {
                break;
            }
            audioStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);
    }

    var speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);
    if (context != null && context.Data.TryGetValue("Language", out var lang))
    {
        speechConfig.SpeechRecognitionLanguage = lang?.ToString();
    }

    using var audioConfig = AudioConfig.FromStreamInput(audioStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
    if (speechRecognitionResult.Reason == ResultReason.RecognizedSpeech)
    {
        return speechRecognitionResult.Text;
    }

    LogErrors(speechRecognitionResult);
    return null;
}
Here is the debug trace for the above approach:
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_ERROR: base_gstreamer.cpp:211 Error from GStreamer: Source: oggdemux
Message: Could not demultiplex stream.
DebugInfo: ../ext/ogg/gstoggdemux.c(4776): gst_ogg_demux_send_event (): /GstPipeline:pipeline/GstOggDemux:oggdemux:
EOS before finding a chain
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_INFO: blocking_read_write_buffer.h:127 WaitUntilBytesAvailable: available=0; required=3200 writeZero=true ...
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_SCOPE_EXIT: base_gstreamer.cpp:186 BaseGstreamer::HandleGstMessageError
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_ERROR: create_object_helpers.h:21 site does not support ISpxObjectFactory
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_THROW_HR: create_object_helpers.h:22 hr = 0x14
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212461ms SPX_TRACE_ERROR: exception.cpp:123 About to throw Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE)
[CALL STACK BEGIN]
> audio_config_get_audio_processing_options
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- configthreadlocale
- BaseThreadInitThunk
- RtlUserThreadStart
[CALL STACK END]
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_ENTER: thread_service.cpp:45 CSpxThreadService::Term
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT: thread_service.cpp:45 CSpxThreadService::Term
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT: audio_pump.cpp:173 *** AudioPump THREAD stopped! ***
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR: audio_pump.cpp:472 [0000025BF2822E90]CSpxAudioPump::PumpThread(): exception caught during pumping, Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE)
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR: create_object_helpers.h:21 site does not support ISpxObjectFactory
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_THROW_HR: create_object_helpers.h:22 hr = 0x14
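The GStreamer message "EOS before finding a chain" suggests the Ogg demuxer reached the end of the stream without recognizing the start of an Ogg stream. One thing that may be worth checking, as an assumption rather than a confirmed diagnosis, is whether the bytes handed to the push stream still begin with an Ogg page that carries the Opus identification header. The sketch below does that check; `startsWithAscii` and `inspectOggStart` are hypothetical names, not SDK functions. It relies only on the Ogg page layout: a 4-byte "OggS" capture pattern, the segment count at byte 26, the segment table after the 27-byte fixed header, then the payload, which for Ogg Opus starts with "OpusHead".

```javascript
// Hypothetical diagnostic helpers; not part of the Speech SDK or the original post.

// Return true if `bytes` contains the ASCII `text` starting at `offset`.
function startsWithAscii(bytes, offset, text) {
  if (offset + text.length > bytes.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

// Inspect the first bytes of a buffer and report whether it starts with an
// Ogg page and whether that page's payload starts with the OpusHead packet.
function inspectOggStart(bytes) {
  if (!startsWithAscii(bytes, 0, "OggS")) {
    return { oggPage: false, opusHead: false };
  }
  if (bytes.length < 27) {
    return { oggPage: true, opusHead: false };
  }
  const segmentCount = bytes[26];          // number of entries in the segment table
  const payloadOffset = 27 + segmentCount; // fixed header plus segment table
  return {
    oggPage: true,
    opusHead: startsWithAscii(bytes, payloadOffset, "OpusHead"),
  };
}
```

If this reports `oggPage: false` for the data that reaches the recognizer, the container header may have been dropped somewhere along the way, for example by advancing `stream.Position` past it or by sending a later recorder chunk without the first one.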
I changed the recorder settings for testing purposes by passing the following constraints object to navigator.mediaDevices.getUserMedia:
{
    audio: {
        autoGainControl: false,
        channelCount: 1,
        echoCancellation: false,
        latency: 0,
        noiseSuppression: false,
        sampleSize: 16
    },
    video: false
}
I am able to write the bytes to a local file and play the file with no problem. Here is the metadata for the saved file:
file_type: OPUS
file_type_extension: opus
mime_type: audio/ogg
opus_version: 1
audio_channels: 1
sample_rate: 16000
output_gain: 1
codec_name: opus
codec_long_name: Opus (Opus Interactive Audio Codec)
sample_rate: 48000
channels: 1
channel_layout: mono
duration: 4.76
size: 8648
bit_rate: 14534
I still have had no success converting the audio bytes to text using the SDK.
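The metadata above lists two sample rates, 16000 and 48000. A likely explanation, stated here as an assumption, is that 16000 is the input sample rate recorded in the Opus identification header while 48000 is the rate at which tools report Opus for decoding. The sketch below reads the header fields directly so the values can be compared with the tool output; `parseOpusHead` is a hypothetical name, and the field layout follows the Ogg Opus identification header (magic, version, channel count, pre-skip, input sample rate, output gain, mapping family, with multi-byte fields in little-endian order).

```javascript
// Hypothetical helper for illustration; not part of the original post.

// Parse the fixed part of an OpusHead packet (the payload of the first Ogg page).
// `packet` is a Uint8Array that starts at the "OpusHead" magic.
function parseOpusHead(packet) {
  if (packet.length < 19) {
    throw new Error("OpusHead packet is too short");
  }
  const magic = String.fromCharCode(...packet.subarray(0, 8));
  if (magic !== "OpusHead") {
    throw new Error("not an OpusHead packet");
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    version: view.getUint8(8),
    channels: view.getUint8(9),
    preSkip: view.getUint16(10, true),         // little-endian
    inputSampleRate: view.getUint32(12, true), // rate of the original input, informational
    outputGain: view.getInt16(16, true),
    mappingFamily: view.getUint8(18),
  };
}
```

For a file written by the recorder, the packet can be taken from the first page by skipping the 27-byte page header and its segment table.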
I provided a repo for this issue in Azure-Samples/cognitive-services-speech-sdk#2387. That repo could also serve as a good sample once I get it to work.