Comments (11)
@kimurakoki Thank you for using the JS Speech SDK, and for writing this up. I can check whether Azure Service Principal tokens are usable with Speech service resources, but I don't believe they are. I'd suggest porting the REST call that fetches the auth token (using a Cognitive Services subscription key and region) from JS to Python, and returning that token value.
from cognitive-services-speech-sdk-js.
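The REST call mentioned above can be sketched in Python roughly as follows. This is a minimal, stdlib-only sketch of the regional STS token exchange (subscription key in, short-lived bearer token out); the `subscription_key` and `region` values are placeholders you would supply yourself:

```python
import urllib.request

def issue_token_url(region: str) -> str:
    # Regional STS endpoint that exchanges a subscription key for a
    # short-lived authorization token.
    return f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"

def fetch_speech_token(subscription_key: str, region: str) -> str:
    # POST with the subscription key in the Ocp-Apim-Subscription-Key
    # header and an empty body; the response body is the token itself.
    req = urllib.request.Request(
        issue_token_url(region),
        method="POST",
        data=b"",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Length": "0",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

The returned token is what you would pass to `SpeechConfig.fromAuthorizationToken` on the JS side; note these tokens expire after roughly ten minutes, so a long-running client needs to re-fetch and update `authorizationToken` periodically.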
@kimurakoki You should be able to use the JS SDK to authenticate via an Entra Token against a Speech Resource...
I just used this JS in Node to use a ClientSecretCredential and was successful.
```javascript
(async function () {
  "use strict";

  // pull in the required packages.
  var sdk = require("microsoft-cognitiveservices-speech-sdk");
  var azIdentity = require("@azure/identity");
  var fs = require("fs");

  // replace with your own service region (e.g., "westus"), the name of the
  // file you want to run through the speech recognizer, and your Entra
  // (AAD) application details.
  var serviceRegion = "YourRegion"; // e.g., "westus"
  var filename = "YourAudioFile.wav"; // 16000 Hz, Mono
  var clientSecret = "YourClientSecret";
  var tenantId = "YourTenantId";
  var clientId = "YourClientId";
  var aadResourceId = "YourResourceId";

  var cred = new azIdentity.ClientSecretCredential(tenantId, clientId, clientSecret);
  var token = await cred.getToken("https://cognitiveservices.azure.com/.default");
  var speechToken = "aad#" + aadResourceId + "#" + token.token;

  // create the push stream we need for the speech sdk.
  var pushStream = sdk.AudioInputStream.createPushStream();

  // open the file and push it to the push stream.
  fs.createReadStream(filename).on("data", function (arrayBuffer) {
    pushStream.write(arrayBuffer.slice());
  }).on("end", function () {
    pushStream.close();
  });

  // we are done with the setup
  console.log("Now recognizing from: " + filename);
  console.log("Token: " + speechToken);

  // now create the audio config pointing to our stream and
  // the speech config specifying the language.
  var audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
  var speechConfig = sdk.SpeechConfig.fromAuthorizationToken(speechToken, serviceRegion);

  // setting the recognition language to English.
  speechConfig.speechRecognitionLanguage = "en-US";

  // create the speech recognizer.
  var recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  // start the recognizer and wait for a result.
  recognizer.recognizeOnceAsync(
    function (result) {
      console.log(result);
      recognizer.close();
      recognizer = undefined;
    },
    function (err) {
      console.trace("err - " + err);
      recognizer.close();
      recognizer = undefined;
    });
}());
```
Can you share some of your JS code that is hitting the error?
I had this problem and just found the solution: the User role.
The documentation for the Speech Service SDK is difficult to follow; things are often unclear, and you have to dig through code samples, Stack Overflow, and ChatGPT to fix a problem.
Documentation exists, but it is either inaccurate or more theoretical than practical, and it doesn't account for edge cases, so when you're stuck with a problem you're facing a wall, not knowing what to do.
Thank you @kimurakoki
@glharper Currently, we have transitioned from JavaScript to Python, setting the subscription key as an environment variable, and generating authorization tokens via REST to utilize voice input. However, this approach necessitates rotating the subscription key if it ever gets compromised. Hence, we were exploring the possibility of using Azure Service Principal tokens. We've also contacted support and were informed that it might be feasible, which led us to raise this issue as a potential bug. Thank you for your response.
I'm working on a project where the frontend is in JavaScript and the backend in Python, running on EKS. We've integrated EKS service accounts with Azure Managed Identities to authenticate EKS pods when accessing Azure resources.
The following is a snippet from our Python REST API, which provides a token through the /api/generate-token endpoint:
```python
# Python code for generating the token
from azure.identity import ClientAssertionCredential

def get_azure_service_account_token():
    # Your logic to retrieve the Azure Service Account Token
    pass

credential = ClientAssertionCredential(
    tenant_id="Your Tenant ID",
    client_id="Your Client ID",
    func=get_azure_service_account_token,
)
token_response = credential.get_token(
    "https://cognitiveservices.azure.com/.default"
)

resourceId = "Your Resource ID"
region = "Your Region"

# Generate the authorization token
authorizationToken = "aad#" + resourceId + "#" + token_response.token
# Or, alternatively
authorizationToken = token_response.token
```
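As a side note on the two alternatives above, the Speech SDK expects the composite `aad#<resourceId>#<accessToken>` form for Entra (AAD) auth. A small sketch (helper names are mine, not from the SDK) of building and sanity-checking that shape:

```python
def build_speech_auth_token(resource_id: str, aad_access_token: str) -> str:
    # resource_id is the full ARM resource ID of the Speech resource, e.g.
    # "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<name>"
    return "aad#" + resource_id + "#" + aad_access_token

def looks_like_aad_speech_token(token: str) -> bool:
    # A bare AAD access token (the "alternatively" branch above) fails
    # this check, since it has no "aad#" prefix or resource ID segment.
    parts = token.split("#", 2)
    return len(parts) == 3 and parts[0] == "aad" and all(parts[1:])
```

Logging the result of a check like this before handing the token to the frontend makes "token didn't start with aad#" failures easy to catch early.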
We retrieve this token in our JavaScript frontend as follows:
```javascript
async getAzureSpeechToken() {
  const response = await fetch('/api/generate-token', {
    method: "GET",
  });
  const result = await response.json();
  return {
    token: result.token,
    region: result.region,
  };
};

async startRecognize() {
  try {
    const { token, region } = await this.getAzureSpeechToken();
    const speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(token, region);
    speechConfig.speechRecognitionLanguage = "ja-JP";
    const recognizer = new SpeechSDK.SpeechRecognizer(speechConfig);
    recognizer.startContinuousRecognitionAsync();
    // Event handlers and other logic here...
  } catch (e) {
    console.error(e);
  }
}
```
However, I'm encountering the following error:
WebSocket connection to 'wss://japaneast.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?language=ja-JP&format=simple&Authorization=[REDACTED]' failed:
Could you provide some insights or suggestions to resolve this issue?
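When a WebSocket handshake fails like this with token auth, one low-tech debugging step is to decode the *unverified* payload of the access token (the part after the second `#` in an `aad#...` value is a JWT) and confirm the audience claim matches the Cognitive Services resource. A stdlib-only sketch, for debugging only, never for validation:

```python
import base64
import json

def jwt_payload(jwt: str) -> dict:
    """Decode the unverified payload segment of a JWT (debugging only)."""
    b64 = jwt.split(".")[1]
    b64 += "=" * (-len(b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(b64))
```

For a token acquired with the `https://cognitiveservices.azure.com/.default` scope, you would expect the decoded `aud` claim to point at the Cognitive Services resource; a different audience usually means the wrong scope was requested.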
Maybe this is just some sample code and I'm not that fluent in Python, but...
```python
# Generate the authorization token
authorizationToken = "aad#" + resourceId + "#" + token_response.token
# Or, alternatively
authorizationToken = token_response.token
```
Doesn't that just overwrite the formatted value of the token with the non-formatted value?
Have you printed out the token that the Speech SDK is getting and seen it's formatted correctly?
I apologize for any confusion caused by my previous message. I wanted to clarify that in my code, I've actually tried both patterns for generating the authorizationToken. However, in practice, only one of these methods is used at a time, not both simultaneously. The two lines of code were meant to show alternative ways to format the token, but only one format is implemented in the actual code.
@kimurakoki Thanks, just wanted to make sure it wasn't that.
I went and found the service logs from the initial connection ID in this issue, and the logs indicate the token passed didn't start with "aad#".
Just wanted to drop a quick thank you for your help with this issue. We figured it out - turns out we needed the Cognitive Services Speech User role, not the Cognitive Services Contributor.
Also, just a heads-up on why we got mixed up. We thought "Cognitive Services Contributor" was fine based on the info at Microsoft's Speech Service RBAC doc. Plus, "Cognitive Services Speech User" wasn't listed in the Azure RBAC built-in roles, so that added to the confusion.
Anyway, all sorted now. Thanks again!
Cheers,
@kimurakoki In the end, are you returning the access token issued with the Cognitive Services scope (as in your example), or do you call the Cognitive Services Speech token endpoint afterwards to issue an additional, more limited token?
@ievgennaida Here is an example of how to generate a speech token for Azure Cognitive Services. This Python function dynamically selects the appropriate credentials based on the environment (local vs. managed identity in Azure) and retrieves an authorization token for Azure's speech services:
```python
def generate_speech_token(self) -> TokenResponse:
    credential = (
        DefaultAzureCredential()
        if is_local()
        else ClientAssertionCredential(
            tenant_id=get_azure_tenant_id(),
            client_id=get_azure_managed_identity_client_id(),
            func=get_azure_service_account_token,
        )
    )
    token_response = credential.get_token(
        "https://cognitiveservices.azure.com/.default"
    )
    authorizationToken = (
        "aad#" + get_azure_speech_resouce_id() + "#" + token_response.token
    )
    return TokenResponse(
        token=authorizationToken,
        region=get_azure_speech_region(),
    )
```
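The `TokenResponse` type isn't defined in the snippet above; a minimal assumed shape, serialized the way the frontend's `getAzureSpeechToken()` helper expects (`token` and `region` fields), might look like this:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class TokenResponse:
    token: str   # "aad#<resourceId>#<accessToken>"
    region: str  # e.g. "japaneast"

def token_response_json(resp: TokenResponse) -> str:
    # JSON body returned by the /api/generate-token endpoint.
    return json.dumps(asdict(resp))
```

Any serialization works, of course, as long as the field names match what the frontend reads out of `response.json()`.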