Comments (11)

glharper commented on June 19, 2024 2

@kimurakoki Thank you for using the JS Speech SDK, and for writing this up. I can check whether Azure Service Principal tokens are usable by Speech service resources, but I don't believe they are. I'd suggest porting the REST call that fetches the auth token (using a Cognitive Services subscription key and region) from JS to Python, and returning that token value.
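
For reference, here is a minimal sketch of that REST call in Python. It assumes the regional `issueToken` endpoint of the Speech service and uses only the standard library. The function names `build_issue_token_request` and `fetch_speech_token` are illustrative and not part of any SDK.

```python
import urllib.request


def build_issue_token_request(subscription_key: str, region: str) -> urllib.request.Request:
    """Build the POST request that exchanges a subscription key for a short-lived token."""
    # Assumed endpoint shape: https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the subscription key travels in the header
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


def fetch_speech_token(subscription_key: str, region: str, timeout: float = 10.0) -> str:
    """Send the request and return the token text from the response body."""
    request = build_issue_token_request(subscription_key, region)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The backend would call `fetch_speech_token(key, region)` and return the result to the browser together with the region, so the subscription key itself never leaves the server.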

from cognitive-services-speech-sdk-js.

rhurey commented on June 19, 2024 2

@kimurakoki You should be able to use the JS SDK to authenticate via an Entra Token against a Speech Resource...

I just used this JS in Node to use a ClientSecretCredential and was successful.

(async function() {
  // <code>
  "use strict";
  
  // pull in the required packages.
  var sdk = require("microsoft-cognitiveservices-speech-sdk");
  var azIdentity = require("@azure/identity");
  var fs = require("fs");
  
  // replace with your own service region (e.g., "westus"),
  // the name of the file you want to run through the
  // speech recognizer, your Entra app credentials, and
  // the resource ID of your Speech resource.
  var serviceRegion = "YourRegion"; // e.g., "westus"
  var filename = "YourAudioFile.wav"; // 16000 Hz, Mono
  var clientSecret = "YourClientSecret";
  var tenantId = "YourTenantId";
  var clientId = "YourClientId";
  var aadResourceId = "YourResourceId";

  var cred = new azIdentity.ClientSecretCredential(tenantId, clientId, clientSecret);
  var token = await cred.getToken("https://cognitiveservices.azure.com/.default");

  var speechToken = "aad#" + aadResourceId + "#" + token.token;

  // create the push stream we need for the speech sdk.
  var pushStream = sdk.AudioInputStream.createPushStream();
  
  // open the file and push it to the push stream.
  fs.createReadStream(filename).on('data', function(arrayBuffer) {
    pushStream.write(arrayBuffer.slice());
  }).on('end', function() {
    pushStream.close();
  });
  
  // we are done with the setup
  console.log("Now recognizing from: " + filename);
  console.log("Token: " + speechToken);

  // now create the audio-config pointing to our stream and
  // the speech config specifying the language.
  var audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
  var speechConfig = sdk.SpeechConfig.fromAuthorizationToken(speechToken, serviceRegion);
  
  // setting the recognition language to English.
  speechConfig.speechRecognitionLanguage = "en-US";
  
  // create the speech recognizer.
  var recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
  
  // start the recognizer and wait for a result.
  recognizer.recognizeOnceAsync(
    function (result) {
      console.log(result);
  
      recognizer.close();
      recognizer = undefined;
    },
    function (err) {
      console.trace("err - " + err);
  
      recognizer.close();
      recognizer = undefined;
    });
  // </code>
  
}());

Can you share some of your JS code that is hitting the error?

paschaldev commented on June 19, 2024 1

I had this problem and just found the solution: the Cognitive Services Speech User role.

The documentation for the Speech Service SDK is difficult to follow. Sometimes things are not clear, and you have to dig through code samples, Stack Overflow, and ChatGPT to fix a problem.

There is documentation, but it is either inaccurate or more theoretical than practical. It doesn't account for edge cases, so when you're stuck on a problem, you hit a wall, not knowing what to do.

Thank you @kimurakoki

kimurakoki commented on June 19, 2024

@glharper Currently, we have transitioned from JavaScript to Python, setting the subscription key as an environment variable, and generating authorization tokens via REST to utilize voice input. However, this approach necessitates rotating the subscription key if it ever gets compromised. Hence, we were exploring the possibility of using Azure Service Principal tokens. We've also contacted support and were informed that it might be feasible, which led us to raise this issue as a potential bug. Thank you for your response.

kimurakoki commented on June 19, 2024

@rhurey

I'm working on a project where the frontend is in JavaScript and the backend in Python, running on EKS. We've integrated EKS service accounts with Azure Managed Identities to authenticate EKS pods when accessing Azure resources.

The following is a snippet from our Python REST API, which provides a token through the /api/generate-token endpoint:

# Python code for generating the token
from azure.identity import ClientAssertionCredential

def get_azure_service_account_token():
    # Your logic to retrieve the Azure Service Account Token
    pass

credential = ClientAssertionCredential(
    tenant_id="Your Tenant ID",
    client_id="Your Client ID",
    func=get_azure_service_account_token,
)

token_response = credential.get_token(
    "https://cognitiveservices.azure.com/.default"
)

resourceId = "Your Resource ID"
region = "Your Region"
# Generate the authorization token
authorizationToken = "aad#" + resourceId + "#" + token_response.token
# Or, alternatively
authorizationToken = token_response.token

We retrieve this token in our JavaScript frontend as follows:

async getAzureSpeechToken() {
  const response = await fetch('/api/generate-token', {
    method: "GET",
  });
  const result = await response.json();
  return {
    token: result.token,
    region: result.region,
  };
};

async startRecognize() {
  try {
    const { token, region } = await this.getAzureSpeechToken();
    const speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(token, region);
    speechConfig.speechRecognitionLanguage = "ja-JP";
    const recognizer = new SpeechSDK.SpeechRecognizer(speechConfig);
    recognizer.startContinuousRecognitionAsync();

    // Event handlers and other logic here...

  } catch (e) {
    console.error(e);
  }
}

However, I'm encountering the following error:

WebSocket connection to 'wss://japaneast.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?language=ja-JP&format=simple&Authorization=[REDACTED]' failed:

Could you provide some insights or suggestions to resolve this issue?

rhurey commented on June 19, 2024

Maybe this is just some sample code and I'm not that fluent in Python, but...

# Generate the authorization token
authorizationToken = "aad#" + resourceId + "#" + token_response.token
# Or, alternatively
authorizationToken = token_response.token

Doesn't that just overwrite the formatted value of the token with the non-formatted value?

Have you printed out the token that the Speech SDK is getting and seen it's formatted correctly?

kimurakoki commented on June 19, 2024

@rhurey

I apologize for any confusion caused by my previous message. I wanted to clarify that in my code, I've actually tried both patterns for generating the authorizationToken. However, in practice, only one of these methods is used at a time, not both simultaneously. The two lines of code were meant to show alternative ways to format the token, but only one format is implemented in the actual code.

rhurey commented on June 19, 2024

@kimurakoki Thanks, just wanted to make sure it wasn't that.

I went and found the service logs for the initial connection ID in this issue, and they indicate that the token passed didn't start with "aad#".
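
A quick way to rule this out on the backend is to build the token through a single helper and check its shape before returning it. The sketch below assumes the `aad#<resourceId>#<token>` format used in the samples above. Both function names are hypothetical, not part of the Speech SDK or `azure-identity`.

```python
def format_aad_speech_token(resource_id: str, entra_token: str) -> str:
    """Combine the Speech resource ID and the Entra access token into one string."""
    if not resource_id or not entra_token:
        raise ValueError("resource_id and entra_token must both be non-empty")
    return f"aad#{resource_id}#{entra_token}"


def looks_like_aad_speech_token(value: str) -> bool:
    """Return True if value has the shape aad#<resourceId>#<token>."""
    # Split into at most three parts: the prefix, the resource ID, and the token.
    parts = value.split("#", 2)
    return len(parts) == 3 and parts[0] == "aad" and all(parts[1:])
```

Logging the result of `looks_like_aad_speech_token(token)` rather than the token itself confirms the format without writing a credential to the logs.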

kimurakoki commented on June 19, 2024

Hey @rhurey, @glharper

Just wanted to drop a quick thank you for your help with this issue. We figured it out: it turns out we needed the Cognitive Services Speech User role, not Cognitive Services Contributor.

Also, just a heads-up on why we got mixed up. We thought "Cognitive Services Contributor" was fine based on Microsoft's Speech Service RBAC doc. Plus, "Cognitive Services Speech User" wasn't listed in the Azure RBAC built-in roles, which added to the confusion.

Anyway, all sorted now. Thanks again!

Cheers,

ievgennaida commented on June 19, 2024

@kimurakoki In the end, are you returning the access token issued for the Cognitive Services scope (as in your example), or do you call the Cognitive Services speech URL afterwards to issue an additional, more limited token?

kimurakoki commented on June 19, 2024

@ievgennaida Here is an example of how to generate a speech token for Azure Cognitive Services. This Python function dynamically selects the appropriate credentials based on the environment (local vs. managed identity in Azure) and retrieves an authorization token for Azure's speech services:

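from azure.identity import ClientAssertionCredential, DefaultAzureCredential

# is_local(), the get_azure_*() helpers, and TokenResponse are
# application-specific and defined elsewhere.
def generate_speech_token(self) -> TokenResponse: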
    credential = (
        DefaultAzureCredential()
        if is_local()
        else ClientAssertionCredential(
            tenant_id=get_azure_tenant_id(),
            client_id=get_azure_managed_identity_client_id(),
            func=get_azure_service_account_token,
        )
    )

    token_response = credential.get_token(
        "https://cognitiveservices.azure.com/.default"
    )

    authorizationToken = (
        "aad#" + get_azure_speech_resouce_id() + "#" + token_response.token
    )
    return TokenResponse(
        token=authorizationToken,
        region=get_azure_speech_region(),
    )
