ASR takes a lot of RAM, compute, and battery, so it's not realistic to do on the ASG.
It can be optimized. What are the available resources on ASG?
from wearableintelligencesystem.
Hey @nshmyrev , thanks for checking this out.
ASG - Android Smart Glasses
The current ASG hardware is a Vuzix Blade with specs:
- Quad Core ARM Cortex-A53
- 1GB RAM
- Android 5.1.1
Important considerations:
- battery life - this is paramount. Whatever's cheaper on battery (streaming audio over WiFi or BT vs ASR) will probably win
- ASR accuracy/WER - this is used for live conversations in noisy environments. We were using Google because DeepSpeech just wasn't accurate enough. We realize that Vosk achieves significantly better accuracy with larger models, and we're hoping to run larger models on the ASP (Android Smart Phone).
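To put a rough number on the streaming side of the battery trade-off above, here is a minimal sketch of the raw audio data rate. It assumes uncompressed 16 kHz, 16-bit, mono PCM, which is an assumption for illustration and not a figure from this thread; a compressed codec would need less.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def megabytes_per_hour(bitrate_bps: int) -> float:
    """Data volume for one hour of continuous streaming, in megabytes (10^6 bytes)."""
    return bitrate_bps / 8 * 3600 / 1_000_000


# Assumed format: 16 kHz, 16-bit, mono.
rate = pcm_bitrate_bps(16_000, 16, 1)
print(rate)                      # 256000 bits per second
print(megabytes_per_hour(rate))  # about 115.2 MB per hour
```

This only covers the amount of data to move. The actual battery cost of sending it over WiFi or BT versus running ASR on the glasses would still have to be measured on the device.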
> The current ASG hardware is a Vuzix Blade with specs:
Well, you definitely can run at least keyword activation on that. Something like https://github.com/ARM-software/ML-KWS-for-MCU should help and take very few resources. The rest depends on the app if you want to recognize just a few commands or more serious queries.
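For the "just a few commands" case, one possible approach (an assumption here, not something proposed in this reply) is to restrict a Vosk recognizer to a fixed phrase list by passing a JSON grammar as the third argument to `KaldiRecognizer`. The phrases, the `build_grammar` and `make_command_recognizer` helpers, and the model path are all hypothetical. Note that grammar restriction is only expected to work with models that support dynamic graphs, such as the small models.

```python
import json


def build_grammar(phrases):
    """Build the JSON phrase list for a grammar-restricted recognizer.

    "[unk]" is appended so that out-of-grammar speech can map to an
    unknown token instead of being forced onto a command.
    """
    cleaned = [p.strip().lower() for p in phrases if p.strip()]
    return json.dumps(cleaned + ["[unk]"])


def make_command_recognizer(model_path, phrases, sample_rate=16000.0):
    """Create a recognizer limited to `phrases` (requires the vosk package)."""
    # Imported lazily so build_grammar can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate, build_grammar(phrases))


# Hypothetical wake word and command:
print(build_grammar(["Hey Computer", "take a photo"]))
```

Audio would then be fed to the recognizer with `AcceptWaveform` and read back with `Result` or `FinalResult`, as in ordinary Vosk usage.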
> ASR accuracy/WER - this is used for live conversations in noisy environments - we were using Google because DeepSpeech just wasn't accurate enough - realize that Vosk is achieving significantly better performance with larger models, and hoping to use larger models on ASP (Android Smart Phone)
Ok, if you need help on this let me know. I wanted to work with Vuzix on that but they never responded to my queries somehow.
> Ok, if you need help on this let me know. I wanted to work with Vuzix on that but they never responded to my queries somehow.
Great, thanks, we could certainly use some help getting the highest possible accuracy (lowest WER).
Since we will be streaming audio from ASG to ASP either way, it makes sense battery-wise and compute-wise to do ASR on ASP.
I've seen incredible results with the vosk-model-en-us-0.22 and good results with the vosk-model-small-en-us-0.15. How reasonable would it be to get the larger model (or something in between) with better accuracy running on a modern Android smart phone?
We're also looking into better sensors - the microphone used has a drastic effect on the WER. Have you tried different mics and found any that are ideal?
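When comparing microphones or models, WER can be computed from a reference transcript and the ASR output. The following is a minimal sketch using word-level edit distance; the `word_error_rate` helper and the example sentences are illustrative and not taken from this project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Two substituted words out of four reference words.
print(word_error_rate("turn on the lights", "turn of the light"))  # 0.5
```

Running the same recorded conversation through each microphone and scoring both transcripts against one hand-made reference gives a like-for-like comparison.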
Happy to move this to an issue on Vosk repo. First priority is getting the whole pipeline working, but we'll soon want to optimize.
Thanks @nshmyrev
Tested both vosk-model-en-us-0.22 and vosk-model-en-us-0.22-lgraph from https://alphacephei.com/vosk/models on Android, and both work! This is on my private fork which will be merged in the next few days.
The larger model vosk-model-en-us-0.22 wouldn't build with Gradle (OOM, even with an 8 GB build allowance), but the build works in Bazel. The build takes 10 minutes with vosk-model-en-us-0.22 though, so we will want to make the model a separate download rather than bundling it in the APK.
> The larger model vosk-model-en-us-0.22 wouldn't build with Gradle (OOM, even with 8gb build allowance). But build works in Bazel. It takes 10 minutes for vosk-model-en-us-0.22 though, so will want to make this something to download separate from the APK.
Big model is certainly not for Android. The lgraph version should be ok.
Successful for all steps. Vosk is very high quality ASR, even with the small model.
A few things we need and will follow up with in future issues:
- omnidirectional microphone that picks up far-field data - we want to be able to transcribe conversations, but right now we are only transcribing the user because the microphone on the Vuzix Blade is directional and near-field
- custom vocabulary for wake words and commands
- contextual ASR - using the context and the user's vocabulary to judge what is more likely ASR output amongst a probability distribution of outputs
- English grammar ASR - using proper grammar / linguistic sense to judge the most likely ASR output amongst a probability distribution of outputs
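The contextual ASR idea in the list above can be illustrated with a small rescoring sketch: given several candidate transcripts with scores, boost the candidates that contain words from the user's vocabulary and pick the best. The `Hypothesis` type, the scores, the `bonus` value, and the example phrases are all hypothetical; this is not how Vosk or this project does it.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    score: float  # hypothetical recognizer score; higher means more likely


def rescore(hypotheses, user_vocab, bonus=1.0):
    """Return the hypothesis with the best score after a per-word vocabulary bonus."""
    vocab = {word.lower() for word in user_vocab}

    def adjusted(hyp):
        hits = sum(1 for word in hyp.text.lower().split() if word in vocab)
        return hyp.score + bonus * hits

    return max(hypotheses, key=adjusted)


candidates = [
    Hypothesis("open music settings", -4.0),
    Hypothesis("open vuzix settings", -4.5),
]

# Without context the recognizer's top candidate wins.
print(rescore(candidates, user_vocab=[]).text)         # open music settings
# With "vuzix" in the user's vocabulary, the second candidate wins.
print(rescore(candidates, user_vocab=["vuzix"]).text)  # open vuzix settings
```

A grammar-based variant could use the same structure, replacing the vocabulary bonus with a score from a language model.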