emexlabs / wearableintelligencesystem

Wearable computing software framework for intelligence augmentation research and applications. Easily build smart glasses apps, relying on built-in voice commands, speech recognition, computer vision, UI, sensors, smart phone connection, NLP, facial recognition, database, cloud connection, and more. This repo is in beta.

License: MIT License

Java 12.41% Starlark 7.32% Dockerfile 0.04% Shell 0.21% Makefile 0.01% SCSS 0.01% Python 4.37% C++ 71.69% HTML 0.04% JavaScript 0.05% Objective-C 1.09% Objective-C++ 1.27% C 0.65% Kotlin 0.87%
smartglass smartglasses hci human-computer-interaction ai wearable wearables wearable-computing android machine-learning

wearableintelligencesystem's Introduction

Wearable Intelligence System

ARCHIVED. The WIS has been reorganized and upgraded to be faster, cleaner, and support more smart glasses as the SmartGlassesManager: https://github.com/TeamOpenSmartGlasses/SmartGlassesManager/

The Wearable Intelligence System (WIS) is the homepage for your smart glasses, with a host of built-in apps, voice controls, always-available HUD information, an app launcher, and more. The WIS makes building smart glasses applications easy. There are a number of powerful and fashionable smart glasses being released (2022-24), and the WIS gives you an interface and apps to make those glasses useful. The WIS is like your phone homescreen or your computer desktop, combined with a smart assistant.

Beta Version Video

Wearable Intelligence System Beta Release Demo

Early Alpha Version Video

Wearable Intelligence System alpha version Demo

What It Can Do Now

User Features

  • Search - Search the web with voice, see immediate results on your HUD.
  • Ask Questions - Ask an intelligent voice assistant any question, see answers on your HUD.
  • Live Translation - Translate live foreign language speech into your native language, and silently read the output on the screen.
  • Remember More - Memory tools to expand your memory with information recording and recall.
  • Visual Search - Define anything that you see. Find images with similar content to your point-of-view (POV) camera image.
  • Name Memorizer - Never forget a name again, with HUD notifications when you see a familiar face.
  • Live Captions - Improve understanding and retention in conversations, meetings, lectures, etc. with live closed captions overlaid on your vision at all times.
  • Autociter / Wearable Referencer - Auto-associative voice search through a personal database, send relevant links to conversation partners over SMS.

Developer Use

The WIS can do much more if you are a researcher, engineer, scientist, or DIY hobbyist, because it's a software framework that makes it easy to build smart glasses applications and experiments. Check out the Documentation for more information.

How To Use

You will need two pieces of hardware to run the system:

  • ASP - Android Smart Phone
  • ASG - Android Smart Glasses (Supported: Vuzix Blade)

Voice Commands

All voice commands must be preceded by a wake word: a phrase that "wakes up" the system and starts it listening for commands.

Wakeword

The wake word is "hey computer".

Voice Commands

Say "hey computer" to see available commands.

Some of the available voice commands:
  • search for <query> - search the web for anything, see the intelligently chosen top result
  • question <query> - ask a question to an intelligence assistant
  • run visual search - use a POV image to search the web for anything that you see around you
  • save speech <note> - save any voice note to your cache of notes. This can be used to save ideas, thoughts, notes, reminders, etc.
  • save speech tag <tagname> <note> - save any voice note to your cache of notes, filed under the tag <tagname>
  • run speech translate <language> - live translate the given language into English
  • run live life captions - display live closed captions
  • run blank screen - blank the screen

Abbreviations

ASP - Android Smart Phone
ASG - Android Smart Glasses
GLBOX - GNU/Linux 'Single Board Computer'/Laptop

Install / Use

First Time Setup

  1. On your Android smart phone, download the "Wearable Intelligence System" app:
  2. On your smart glasses, download the "Wearable Intelligence System" app:
    • Launch the "Wearable Intelligence System" app on your smart phone
    • Accept permissions.
    • Tap "Start WiFi Hotspot", turn on your WiFi hotspot (configuring a password if necessary), then tap "Back" to return
  3. Connect smart glasses WiFi to the smart phone WiFi hotspot
  4. Enable mobile data (or wifi sharing) on Android smart phone
  5. Start "Wearable Intelligence System" application on smart glasses
    • The phone connection icon will be green if the glasses are connected to your phone. If you speak, you'll see a live transcript on the smart glasses screen.
    • On the Android smart phone, go to "Memory Tools" -> "Memory Stream" and you will see live transcripts
  6. Setup complete.

Normal Use

Here's how to launch the system after you've already done the initial setup above:

  1. Launch "WIS" app on smart phone
  2. Enable mobile hotspot on smart phone with the "Start WiFi Hotspot" button
  3. Connect Android smart glasses to Android smart phone WiFi hotspot.
  4. Launch "WIS" app on smart glasses.
  5. Verify the system is running: the "Smart Glasses Connection Indicator" icon turns white on the smart glasses HUD.

Documentation / Developers

The docs are hosted on this repo's Wiki.

Authors

The system is fully Open Source and built by this growing list of contributors:

We are actively growing a community that builds cognitive augmentation technologies together.

The Wearable Intelligence System was started at Emex Labs by Cayden Pierce.

wearableintelligencesystem's People

Contributors

caydenpierce, chuoling, stairs1, thisisvaze


wearableintelligencesystem's Issues

ASP application

The current ASP application is basically a souped-up MediaPipe example running our custom MediaPipe Perception Pipeline.

We need to make it reliably run on the ASP for many hours/days at a time, in the background, always available to receive a connection and/or requests from the ASG.

TODO

  • get the app editing/building in Android Studio - not essential, but will help development speed
  • get the current MediaPipe inferencing and ASG socket code running in an Android Service - determine the best type of service (Background, Foreground, Bound, etc.) to do the job. Likely needs a Foreground service like here
  • start saving the incoming image/video stream from the ASG into a file structure that makes sense
  • set up a database on Android with SQLite or Realm
  • save image data (file location, timestamp, metadata) in the database
  • set up a WebSocket connection on the ASP that always runs in the background - this will accept sensor data and function requests, and push data to the ASG (see the sketch below)
  • stream audio data from ASG to ASP - either over a raw TCP socket or over the new WebSocket above
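
The ASP implementation would be in Java, but the message flow for the always-on WebSocket (accept sensor data and function requests, push responses back to the ASG) can be sketched in Python with the websockets package. The message types, field names, and port below are assumptions for illustration, not the real protocol.

    import asyncio
    import json

    import websockets  # pip install websockets


    async def handler(websocket, path):
        # accept sensor data and function requests from the ASG on one connection
        async for raw in websocket:
            msg = json.loads(raw)
            if msg.get("type") == "sensor":
                # e.g. persist the frame / reading (see the database TODOs above)
                print("sensor data:", msg.get("sensor"), msg.get("timestamp"))
            elif msg.get("type") == "request":
                # e.g. the ASG asks for a visual search; push the result back
                await websocket.send(json.dumps({"type": "response",
                                                 "id": msg.get("id"),
                                                 "result": "ok"}))


    async def main():
        async with websockets.serve(handler, "0.0.0.0", 8887):
            await asyncio.Future()  # run forever

    asyncio.run(main())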

Mapping - Map overlay, voice command directions

Pulling out one's phone all the time while navigating a new place is difficult, dangerous, and forces one to stop mid-journey. This is made worse if the individual is operating a vehicle (bicycle, car), where they must disengage altogether.

The current idea:

  1. The user issues a voice command asking for directions to their desired location: "directions to x"
  2. The system pulls up a map centered on the user's current position, with the route highlighted. The system also outputs speech to give the user turn-by-turn directions.

Implementation

OsmAnd Mapping app for Android combined with OsmAnd-Api

OsmAnd~ app - https://f-droid.org/en/packages/net.osmand.plus/ (this is like the Google Play OsmAnd+ version, but ~ for devs)

Osm-and-api: https://github.com/osmandapp/osmand-api-demo/
^ This demo app embeds OsmAnd in itself - it doesn't start the OsmAnd activity with arguments (like launching a program that takes over); instead, the OsmAnd maps show up directly in the parent app/activity.

Use OsmAndApi to call OsmAnd and display OsmAnd maps in the app, via "Navigation" -> "Navigate and Search".

See the osmand-api-demo folder in the osmand-api-demo repo and read the README.md in that folder for instructions on how to use the osmand-api.

Audio/Visual/Generic-Sensor low-latency internet streaming

We need a low-latency, reliable method of streaming audio data, visual data, and generic sensory data (e.g. GPS, accelerometer, EEG, heart rate, etc.) over the internet.

The state-of-the-art appears to be WebRTC:
- app streaming WebRTC from Vuzix M400: https://github.com/dalkofjac/smartglassy-demo-foi
- android WebRTC: https://github.com/renyuzhuo/WebRTC-Android-Learn

Keep in mind that the Vuzix Blade and other Android smart glasses we are using now run old Android versions (the Vuzix Blade runs Android 5.1), so we may be limited in the libraries we can use on the ASG side.

Write docs

There are a lot of docs here across the 4 READMEs, but there are some problems.

Problems

  • there are 4 READMEs
  • the text wall makes it hard to find what you need when you need it
  • the system has significantly evolved and thus some of the older README content isn't applicable, and some new information is missing

Solutions

  • migrate docs to Read the Docs - https://docs.readthedocs.io/en/stable/tutorial/
  • Write a "Before you start" guide
  • Write a "Setup" guide for those just setting up using APK download
  • Write a "Developer" setup guide for setting up ASP environment, ASG environment, and GLBOX

KeyError: 'summary', calling the web service

Calling the web service, it responds with an error:
KeyError: 'summary'
Steps to reproduce:
  1. After downloading the DEV branch, start the main_webserver.py service
  2. Call http://127.0.0.1:5000/semantic_web_speech with a client (e.g. Postman)
  3. Pass {"text": "Giovanni Capuozzo"} in the request body
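
For reference, the same call as the Postman request above, as a minimal Python client:

    import requests

    resp = requests.post(
        "http://127.0.0.1:5000/semantic_web_speech",
        json={"text": "Giovanni Capuozzo"},
    )
    print(resp.status_code, resp.text)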

Proposed fix:
main_tools.py, lines 68-76


        # limit summary
        for entity in entity_list:
            try:
                entity["summary"] = " ".join(entity["summary"].split(" ")[:self.summary_limit]) + "..."
            except KeyError:
                entity["summary"] = ""
                # entity["summary"] = "summary not available"

Image Search

Use voice command to search the web for text and receive a GridView of images displayed.

Voice Command on ASP

Note: this relies on completion of #11

This is part of the move to no longer streaming sensor data over the internet.

The entire voice command system is currently written in Python on the GLBOX. However, text parsing is not very complicated, so porting it to Android should be straightforward, and doing so will avoid the latency and wastefulness of transcribing locally and then streaming the results to the web for processing. A minimal parsing sketch follows the TODO list below.

TODO

  • parse wake words
  • parse commands
  • parse arguments
  • implement wolfram search function
  • implement azure photo search function
  • implement wikipedia search function
  • run functions based on command
  • allow some wake words to also be commands
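
A minimal sketch of the parsing flow (wake word, then command, then argument). The command phrases match the voice commands listed in the README; the handler functions are placeholders for the real Wolfram/Azure/Wikipedia functions.

    WAKE_WORDS = ("hey computer",)

    def search(query):
        print(f"searching the web for: {query}")

    def question(query):
        print(f"asking the assistant: {query}")

    # command phrase -> handler that receives the remaining text as its argument
    COMMANDS = {
        "search for": search,
        "question": question,
    }

    def parse_transcript(transcript):
        """Return (handler, argument) for a transcript, or None if nothing matched."""
        text = transcript.lower().strip()
        # 1. parse the wake word
        for wake in WAKE_WORDS:
            if text.startswith(wake):
                text = text[len(wake):].strip()
                break
        else:
            return None  # no wake word heard, ignore
        # 2. parse the command and 3. its argument
        for phrase, handler in COMMANDS.items():
            if text.startswith(phrase):
                return handler, text[len(phrase):].strip()
        return None

    match = parse_transcript("hey computer search for smart glasses")
    if match:
        handler, argument = match
        handler(argument)  # run the function selected by the command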

Memory Expansion Tools

Build a system that improves users' memory and thinking by recording everything a user experiences and creating a searchable, computable, associative knowledge representation of that information.

Contact [email protected] for more details.

Multi-user

Make the system work for multiple users at once:

  • implement a "listener" that only starts up services (speech to text, etc.) when a connection is made and kills services when no connection has been made for n seconds - can be done with a master REST server and containerized instances of the main.py system in ./gnu_linux_box (a minimal idle-timeout sketch follows this list)
  • modify "listener" to be able to host multiple sessions from different users that identify themselves with a unique key/username/JWT/login
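
A minimal idle-timeout sketch of the listener idea, assuming hypothetical start_services/stop_services hooks (e.g. spinning up or stopping a containerized main.py instance per user):

    import threading
    import time

    SESSION_TIMEOUT_S = 30  # stop a user's services after n seconds without a connection

    def start_services(user_id):
        print(f"starting speech-to-text (etc.) for {user_id}")

    def stop_services(user_id):
        print(f"stopping services for {user_id}")

    class SessionListener:
        def __init__(self):
            self.last_seen = {}          # user_id -> last activity timestamp
            self.lock = threading.Lock()

        def touch(self, user_id):
            """Call on every authenticated connection/message from a user."""
            with self.lock:
                if user_id not in self.last_seen:
                    start_services(user_id)
                self.last_seen[user_id] = time.time()

        def reap_idle(self):
            """Background loop: stop services for users idle past the timeout."""
            while True:
                time.sleep(5)
                now = time.time()
                with self.lock:
                    for user_id, seen in list(self.last_seen.items()):
                        if now - seen > SESSION_TIMEOUT_S:
                            stop_services(user_id)
                            del self.last_seen[user_id]

    listener = SessionListener()
    threading.Thread(target=listener.reap_idle, daemon=True).start()
    listener.touch("user-123")  # e.g. called by the master REST server on each request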

Mobility - GLBOX shrink or disappear

An issue with the usability, wearability, and mobility of the system is the need to carry the GLBOX laptop around everywhere.

It's heavy, bulky, requires a bag to carry, and thus isn't highly mobile.

There are two main options here (please feel free to suggest other options):

  1. Move the GLBOX to a smaller wearable that fits in pocket

    • all SBCs that I know of with enough compute are too big (e.g. a Raspberry Pi 4 is no better than a laptop, as it won't fit in a pocket)
    • running GNU/Linux on Android phone hardware is possible but buggy and unsupported, and would probably cause a lot of tangential issues
    • a Pi Zero W (or similar GNU/Linux SBC) is small and low power and could do it, but they are so computationally weak that we would have to stream video to the cloud or the ASP anyway for any intense computer vision
  2. Move GLBOX to the cloud

    • requires implementing audio streaming from ASG to GLBOX over internet
    • introduces latency on transcription and command response
    • will use a lot of data and means the system will only work with an internet connection
    • better long term - users don't have to buy 2 pieces of hardware, and unlimited, high-speed data is a reasonable assumption for 5 years out

Decision

2, move GLBOX to the cloud

Why?

For a number of reasons, #2 makes more sense:

  • requiring another piece of hardware is a huge deterrent to use
    • expensive
    • another thing to carry
    • another thing to keep charged
    • increased system complexity and mental load
  • no current mobile GNU/Linux implementation is all three of:
    1. Powerful enough to run heavy compute at small size and power
    2. Proper form factor to fit in a pocket (smartphone form factor)
    3. Reliable
      - in the future this may be possible with a mobile GNU/Linux system (GNU/Linux running on mobile chipsets), but current implementations (Ubuntu Touch, postmarketOS, etc.) are not ready for production
  • cloud implementation is more agile, can constantly make changes/push features/etc.

How?

The GLBOX code is already a Python socket server running on GNU/Linux, so moving it to the cloud is almost already done. We just need to modify a few things to make it work on a cloud server instead of a wearable server.

MVP

  • audio capture on the ASG that is streamed to the GLBOX (as opposed to the current scheme, where audio streams directly to the GLBOX via Bluetooth)
  • finish implementing "heart beats" (pings) on both the GLBOX and ASG sides (almost done; a minimal ping/pong sketch follows this list)
  • switch from current LAN advertising scheme to referencing GLBOX with domain name
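
The heartbeat itself is simple. A rough sketch of the two sides, assuming a line-based ping over the existing socket (the ASG side would be Java in practice; interval and timeout values are placeholders):

    import socket
    import time

    PING_INTERVAL_S = 5    # how often the ASG pings
    PING_TIMEOUT_S = 15    # GLBOX considers the link dead after this much silence

    def asg_heartbeat_loop(sock):
        """ASG side: send a ping on a fixed interval."""
        while True:
            sock.sendall(b"ping\n")
            time.sleep(PING_INTERVAL_S)

    def glbox_wait_for_ping(sock):
        """GLBOX side: return False if no data arrives before the timeout."""
        sock.settimeout(PING_TIMEOUT_S)
        try:
            return sock.recv(16) != b""
        except socket.timeout:
            return False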

Further

  • finish implementing a streaming VAD that turns the speech-to-text service off when there is no voice, turns it back on when a voice is detected, and holds a circular buffer of audio so no transcriptions are lost (sketched below)
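
A minimal sketch of that VAD logic on the GLBOX, assuming the webrtcvad package and 30 ms frames of 16 kHz, 16-bit mono PCM; the start/stop/send hooks are placeholders for the real speech-to-text service:

    import collections
    import webrtcvad  # pip install webrtcvad

    SAMPLE_RATE = 16000
    PRE_ROLL_FRAMES = 10  # keep ~300 ms of audio from before speech was detected

    vad = webrtcvad.Vad(2)                            # aggressiveness 0-3
    ring = collections.deque(maxlen=PRE_ROLL_FRAMES)  # circular buffer of recent frames
    in_speech = False

    def start_speech_to_text():
        print("speech-to-text on")

    def stop_speech_to_text():
        print("speech-to-text off")

    def send_to_speech_to_text(frame):
        pass  # forward the frame to the running ASR service

    def on_audio_frame(frame):
        """Feed successive 30 ms PCM frames from the ASG audio stream."""
        global in_speech
        voiced = vad.is_speech(frame, SAMPLE_RATE)
        if voiced and not in_speech:
            in_speech = True
            start_speech_to_text()
            for buffered in ring:        # replay the buffer so no words are lost
                send_to_speech_to_text(buffered)
            ring.clear()
        if in_speech:
            send_to_speech_to_text(frame)
            if not voiced:               # (a real version would wait out a short hangover)
                in_speech = False
                stop_speech_to_text()
        else:
            ring.append(frame)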

Vision Upgrades - Thermal Vision, Night Vision, Eyes-in-back-head

Give the user the ability to see in all kinds of extended sensory spaces - augmediated vision [1].

The most straightforward way to do this is to buy USB cameras, connect these directly to the ASG via USB, and stream a live view of that camera to the display.

Deliverables

  • Get USB camera connected and streaming to ASG
  • Get live viewfinder view of the camera output on ASG screen
  • Integrate this view into the WIS such that it can be pulled up with a single voice command - "switch modes extended vision"
  • integrate (connect, stream, view, attach to hardware) regular camera - eyes in the back of the head
  • integrate thermal vision
  • integrate binocular vision
  • integrate magnifying vision
  • integrate audio vision

References

[1]: Steve Mann - https://spectrum.ieee.org/steve-mann-my-augmediated-life, https://www.instructables.com/Augmented-Reality-Eyeglass-With-Thermal-Vision-Bui/

Live Life Captions - enhanced interface

  • use ring mouse to implement upgraded interface with text - scroll through words
  • ability to define any word that has appeared in the stream
  • implement a user vocabulary and auto-define words that are rare and haven't been seen before

Sensors - retrieve, stream, and save

We want to collect data from a number of sensors on the ASG and ASP and save them in a database.

Some sensors are sparse in time and can be saved directly to the database (e.g. GPS). Others are time series signals and must be saved to files that are referenced by the database (e.g. video).

Sensors

  • Audio: pull from the microphone, encode audio on the ASG, stream to the ASP, save locally (how to save - in file chunks?)
  • Video: get from the ASG camera in the background (AndroidHiddenCamera), stream to the ASP, save in file chunks
  • Accelerometer, Compass, Gyroscope, Ambient Light Sensor: ASG streams to the ASP and saves; ASP pulls its own and saves (head and body sensors of the user)
  • GPS: pull from the Android phone - no point in getting it from the ASG

Video

  • find and implement better, encrypted video streaming from ASG to ASP
  • save in file chunks (e.g. 30-second blocks) and save a reference to each chunk with metadata in the database (see the sketch at the end of this section)

The state-of-the-art appears to be WebRTC:

Keep in mind that the Vuzix Blade and other Android smart glasses we are using now run old Android versions (the Vuzix Blade runs Android 5.1), so we may be limited in the libraries we can use on the ASG side.

Audio

  • stream encrypted raw data (or encoded with AAC or similar) from ASG to ASP
  • save in file chunks (e.g. 30 second blocks) and save reference to each chunk with metadata in database

Other

  • get GPS every n seconds on ASP, save to database
  • Accelerometer, Magnetometer, Gyroscope - get on ASG, stream to ASP, save to chunked files, and reference in database. Also pull on ASP, save to chunked files, and reference in database
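
The ASP would use Android's SQLite (or Realm), but the chunk-and-reference scheme described above for video, audio, and the other time-series sensors can be sketched with Python's sqlite3. The schema, column names, and file path are illustrative only:

    import sqlite3
    import time

    db = sqlite3.connect("wis_sensors.db")
    db.execute("""
        CREATE TABLE IF NOT EXISTS chunks (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            sensor    TEXT NOT NULL,   -- e.g. 'video', 'audio', 'accelerometer'
            file_path TEXT NOT NULL,   -- chunk file written by the streaming code
            start_ts  REAL NOT NULL,   -- unix timestamp of the first sample
            end_ts    REAL NOT NULL,   -- unix timestamp of the last sample
            metadata  TEXT             -- free-form JSON (codec, device, etc.)
        )
    """)

    def save_chunk_reference(sensor, file_path, start_ts, end_ts, metadata=None):
        db.execute(
            "INSERT INTO chunks (sensor, file_path, start_ts, end_ts, metadata) "
            "VALUES (?, ?, ?, ?, ?)",
            (sensor, file_path, start_ts, end_ts, metadata),
        )
        db.commit()

    # example: register a 30-second video chunk that was just written to disk
    now = time.time()
    save_chunk_reference("video", "/sdcard/wis/video/chunk_000123.mp4", now - 30, now)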

ASR on ASP

Part of the move away from streaming sensor data over the internet.

Relies on implementation of #10

ASR on Android

We need to be able to transcribe speech to text locally. ASR takes a lot of RAM, compute, and battery, so it's not realistic to do on the ASG. Streaming audio 8 hours a day, every day, to the internet takes too much data. This means it must happen on the ASP.

After considerable research, the best option for doing this on Android seems to be Vosk: https://github.com/alphacep/vosk-api (a minimal streaming sketch follows the TODO list below).

TODO (after completion of #10)

  • get Vosk Android API demo working and test locally on ASP: https://github.com/alphacep/vosk-android-demo
  • pull Vosk Android libs into ASP app
  • run Vosk on incoming audio stream from ASG and receive transcriptions
  • make transcriptions available to rest of ASP application
  • stream transcriptions to ASG
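
A minimal sketch of the streaming recognizer with the Vosk Python API (the ASP port would use the vosk-android libs instead); the model path and WAV file stand in for the downloaded model and the incoming ASG audio stream:

    import wave
    from vosk import Model, KaldiRecognizer  # pip install vosk

    model = Model("model")               # path to an unpacked Vosk model directory
    rec = KaldiRecognizer(model, 16000)  # expects 16 kHz, 16-bit mono PCM

    wf = wave.open("asg_audio.wav", "rb")
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):     # end of an utterance: full transcription
            print(rec.Result())
        else:                            # mid-utterance: partial transcription
            print(rec.PartialResult())
    print(rec.FinalResult())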

Add user accounts and secure endpoints with JWT

  • adding user accounts, a frontend to make new accounts
  • saving transcripts with userids
  • JWT to secure endpoints
    JWT tokens: we need to expose a login endpoint, http://127.0.0.1:5000/login, to which the username and password are passed (e.g. {"user": "simonexyz", "password": "my_password"}; all of this must already be encrypted according to a predefined client/server key). If the user exists in the database, a token is returned to the user, with which they can sign and send messages. A session must have a maximum duration, after which it expires, another token must be generated, and the login renewed as in the first step. It would also be useful to set up a gateway that all the APIs point to and that verifies authentication; from there requests are routed to the various services. (A minimal sketch follows below.)
    Example of authentication: https://www.youtube.com/watch?v=LKveAwao9HA
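
A minimal sketch of the login endpoint and a token check with Flask and PyJWT; the secret key, user store, and protected route are placeholders, not the real service:

    import datetime

    import jwt                    # pip install PyJWT
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    SECRET_KEY = "change-me"                      # placeholder server-side secret
    USERS = {"simonexyz": "my_password"}          # stand-in for the real user database
    TOKEN_LIFETIME = datetime.timedelta(hours=1)  # session expiry

    @app.route("/login", methods=["POST"])
    def login():
        body = request.get_json(force=True)
        if USERS.get(body.get("user")) != body.get("password"):
            return jsonify({"error": "invalid credentials"}), 401
        token = jwt.encode(
            {"sub": body["user"],
             "exp": datetime.datetime.utcnow() + TOKEN_LIFETIME},
            SECRET_KEY,
            algorithm="HS256",
        )
        return jsonify({"token": token})

    @app.route("/protected", methods=["POST"])
    def protected():
        # gateway-style check: every secured endpoint verifies the token first
        auth = request.headers.get("Authorization", "")
        try:
            claims = jwt.decode(auth.replace("Bearer ", "", 1), SECRET_KEY,
                                algorithms=["HS256"])
        except jwt.PyJWTError:
            return jsonify({"error": "invalid or expired token"}), 401
        return jsonify({"user": claims["sub"]})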
