Comments (7)

nshmyrev commented on June 4, 2024

> ASR takes a lot of RAM, compute, and battery, so it's not realistic to do on the ASG.

It can be optimized. What are the available resources on ASG?

from wearableintelligencesystem.

CaydenPierce commented on June 4, 2024

Hey @nshmyrev , thanks for checking this out.

ASG - Android Smart Glasses

The current ASG hardware is a Vuzix Blade with specs:

  • Quad Core ARM Cortex-A53
  • 1GB RAM
  • Android 5.1.1

Important considerations:

  • battery life - this is paramount. Whatever's cheaper on battery (streaming audio over WiFi or BT vs ASR) will probably win
  • ASR accuracy/WER - this is used for live conversations in noisy environments. We were using Google because DeepSpeech just wasn't accurate enough. We realize that Vosk achieves significantly better performance with larger models, and we hope to use larger models on the ASP (Android Smart Phone)

nshmyrev commented on June 4, 2024

> The current ASG hardware is a Vuzix Blade with specs:

Well, you can definitely run at least keyword activation on that. Something like https://github.com/ARM-software/ML-KWS-for-MCU should help and takes very few resources. The rest depends on the app: whether you want to recognize just a few commands or more serious queries.

> ASR accuracy/WER - this is used for live conversations in noisy environments. We were using Google because DeepSpeech just wasn't accurate enough. We realize that Vosk achieves significantly better performance with larger models, and we hope to use larger models on the ASP (Android Smart Phone)

Ok, if you need help on this let me know. I wanted to work with Vuzix on that but they never responded to my queries somehow.

CaydenPierce commented on June 4, 2024

> Ok, if you need help on this let me know. I wanted to work with Vuzix on that but they never responded to my queries somehow.

Great, thanks, we could certainly use some help in getting the best possible accuracy (lowest WER).

Since we will be streaming audio from ASG to ASP either way, it makes sense battery-wise and compute-wise to do ASR on ASP.
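To put a rough number on the streaming side, here is a minimal sketch. It assumes 16 kHz, 16-bit, mono PCM, which is a common ASR input format but is not stated in this thread. The function names and the 100 ms chunk size are illustrative only.

```python
SAMPLE_RATE_HZ = 16_000  # assumed sample rate, not taken from the thread
BYTES_PER_SAMPLE = 2     # assumed 16-bit PCM
CHANNELS = 1             # assumed mono


def bytes_per_second(sample_rate=SAMPLE_RATE_HZ,
                     sample_width=BYTES_PER_SAMPLE,
                     channels=CHANNELS):
    """Raw PCM data rate in bytes per second."""
    return sample_rate * sample_width * channels


def chunk_pcm(pcm: bytes, chunk_ms: int = 100):
    """Yield successive chunks of raw PCM, each covering chunk_ms of audio.

    Chunks are aligned to whole audio frames; the last chunk may be shorter.
    """
    frames_per_chunk = SAMPLE_RATE_HZ * chunk_ms // 1000
    chunk_size = frames_per_chunk * BYTES_PER_SAMPLE * CHANNELS
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


if __name__ == "__main__":
    one_second = bytes(bytes_per_second())  # one second of silence
    chunks = list(chunk_pcm(one_second))
    print(bytes_per_second() * 8 / 1000, "kbit/s")
    print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

Under these assumptions the uncompressed stream is 32,000 bytes per second (256 kbit/s). Whether sending that over WiFi or BT costs less battery than on-device ASR is something that would have to be measured on the actual hardware.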

I've seen incredible results with the vosk-model-en-us-0.22 and good results with the vosk-model-small-en-us-0.15. How reasonable would it be to get the larger model (or something in between) with better accuracy going on a modern Android smart phone?

We're also looking into better sensors - the microphone used has a drastic effect on the WER. Have you tried different mics and found any that are ideal?

Happy to move this to an issue on the Vosk repo. First priority is getting the whole pipeline working, but we'll soon want to optimize.

Thanks @nshmyrev

CaydenPierce commented on June 4, 2024

Tested both vosk-model-en-us-0.22 and vosk-model-en-us-0.22-lgraph from https://alphacephei.com/vosk/models on Android, and both work! This is on my private fork which will be merged in the next few days.

The larger model vosk-model-en-us-0.22 wouldn't build with Gradle (OOM, even with an 8 GB build allowance). The build works in Bazel, though it takes 10 minutes for vosk-model-en-us-0.22, so we will want to make the model a download separate from the APK.

@nshmyrev
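The idea of shipping the model separately from the APK boils down to "download and unpack on first use, then reuse the local copy". Below is a minimal sketch of that logic in Python. On Android the equivalent would live in the app code, and this is not how the project implements it. The function name `ensure_model`, the directory layout, and the assumption that the model arrives as a zip archive are all illustrative.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def ensure_model(model_dir: Path, archive_url: str) -> Path:
    """Return model_dir, downloading and unpacking the archive on first use.

    archive_url is a placeholder; pass whatever URL actually hosts the model.
    """
    if model_dir.is_dir() and any(model_dir.iterdir()):
        return model_dir  # already installed, nothing to download

    model_dir.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=model_dir.parent) as tmp:
        archive = Path(tmp) / "model.zip"
        with urllib.request.urlopen(archive_url) as resp, open(archive, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream to disk, not into RAM

        unpacked = Path(tmp) / "unpacked"
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(unpacked)

        # Archives often wrap everything in one top-level folder; unwrap it.
        entries = list(unpacked.iterdir())
        root = entries[0] if len(entries) == 1 and entries[0].is_dir() else unpacked

        if model_dir.exists():
            shutil.rmtree(model_dir)  # remove an empty or partial install
        shutil.move(str(root), str(model_dir))
    return model_dir
```

Unpacking into a temporary directory and moving the result into place at the end means an interrupted download does not leave a half-installed model behind.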

nshmyrev commented on June 4, 2024

> The larger model vosk-model-en-us-0.22 wouldn't build with Gradle (OOM, even with an 8 GB build allowance). The build works in Bazel, though it takes 10 minutes for vosk-model-en-us-0.22, so we will want to make the model a download separate from the APK.

The big model is certainly not for Android. The lgraph version should be OK.

CaydenPierce commented on June 4, 2024

Successful for all steps. Vosk is very high quality ASR, even with the small model.

A few things we need and will follow up with in future issues:

  • omnidirectional microphone that picks up far-field data - we want to be able to transcribe conversations, but right now we are only transcribing the user because the microphone on the Vuzix Blade is directional and near-field
  • custom vocabulary for wake words and commands
  • contextual ASR - using the context and the user's vocabulary to judge what is more likely ASR output amongst a probability distribution of outputs
  • English grammar ASR - using proper grammar / making linguistic sense to judge the likely ASR output amongst a probability distribution of outputs
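The contextual-ASR item in the list above can be pictured as rescoring an n-best list. The sketch below is only an illustration of that idea under stated assumptions: the recognizer is assumed to provide several scored hypotheses where a higher score is better, and the `Hypothesis` class, the `rescore` function, and the flat per-word bonus are all hypothetical rather than part of Vosk or this project.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    score: float  # recognizer score; higher is better (assumed convention)


def rescore(hypotheses, context_words, bonus=0.5):
    """Pick the hypothesis with the best score after a context-word bonus.

    Each word that appears in context_words adds `bonus` to the score.
    Raises ValueError if `hypotheses` is empty.
    """
    context = {w.lower() for w in context_words}

    def adjusted(h):
        hits = sum(1 for w in h.text.lower().split() if w in context)
        return h.score + bonus * hits

    return max(hypotheses, key=adjusted)


if __name__ == "__main__":
    nbest = [
        Hypothesis("open view six settings", -1.0),
        Hypothesis("open vuzix settings", -1.2),
    ]
    # With "vuzix" in the user's vocabulary, the second hypothesis wins.
    print(rescore(nbest, {"vuzix"}).text)
```

A grammar-aware variant would replace the flat bonus with a language-model score for each hypothesis; the selection step stays the same.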
