emexlabs / wearableintelligencesystem

Wearable computing software framework for intelligence augmentation research and applications. Easily build smart glasses apps, relying on built-in voice commands, speech recognition, computer vision, UI, sensors, smart phone connection, NLP, facial recognition, database, cloud connection, and more. This repo is in beta.

License: MIT License

Java 12.41% Starlark 7.32% Dockerfile 0.04% Shell 0.21% Makefile 0.01% SCSS 0.01% Python 4.37% C++ 71.69% HTML 0.04% JavaScript 0.05% Objective-C 1.09% Objective-C++ 1.27% C 0.65% Kotlin 0.87%
smartglass smartglasses hci human-computer-interaction ai wearable wearables wearable-computing android machine-learning

wearableintelligencesystem's Introduction

Wearable Intelligence System

ARCHIVED. The WIS has been reorganized and upgraded to be faster, cleaner, and support more smart glasses as the SmartGlassesManager: https://github.com/TeamOpenSmartGlasses/SmartGlassesManager/

The Wearable Intelligence System (WIS) is the homepage for your smart glasses, with a host of built-in apps, voice controls, always-available HUD information, an app launcher, and more. The WIS makes building smart glasses applications easy. There are a number of powerful and fashionable smart glasses being released (2022-24), and the WIS gives you an interface and apps to make those glasses useful. The WIS is like your phone homescreen or your computer desktop, combined with a smart assistant.

Beta Version Video

Wearable Intelligence System Beta Release Demo

Early Alpha Version Video

Wearable Intelligence System alpha version Demo

What It Can Do Now

User Features

  • Search - Search the web with voice, see immediate results on your HUD.
  • Ask Questions - Ask an intelligent voice assistant any question, see answers on your HUD.
  • Live Translation - Translate live foreign language speech into your native language, and silently read the output on the screen.
  • Remember More - Memory tools to expand your memory with information recording and recall.
  • Visual Search - Define anything that you see. Find images with similar content to your point-of-view (POV) camera image.
  • Name Memorizer - Never forget a name again, with HUD notifications when you see a familiar face.
  • Live Captions - Improve understanding and retention in conversations, meetings, lectures, etc. with live closed captions overlaid on your vision at all times.
  • Autociter / Wearable Referencer - Auto-associative voice search through a personal database, send relevant links to conversation partners over SMS.

Developer Use

The WIS can do much more if you are a researcher, engineer, scientist, or DIY hobbyist, because it's a software framework that makes it easy to build smart glasses applications and experiments. Check out the Documentation for more information.

How To Use

You will need two pieces of hardware to run the system:

  • ASP - Android Smart Phone
  • ASG - Android Smart Glasses (Supported: Vuzix Blade)

Voice Commands

All voice commands must be preceded by a wake word: a phrase that "wakes up" the system and starts it listening for commands.

Wakeword

The wake word is "hey computer".

Voice Commands

Say "hey computer" to see available commands.

Some of the available voice commands:
  • search for <query> - search the web for anything, see the intelligently chosen top result
  • question <query> - ask a question to an intelligence assistant
  • run visual search - use a POV image to search the web for anything that you see around you
  • save speech <note> - save any voice note to your cache of notes. This can be used to save ideas, thoughts, notes, reminders, etc.
  • save speech tag <tagname> <note> - save any voice note to your cache of notes, filed under the tag <tagname>
  • run speech translate <language> - live translate the given language into English
  • run live life captions - display live closed captions
  • run blank screen - blank the screen

Abbreviations

ASP - Android Smart Phone
ASG - Android Smart Glasses
GLBOX - GNU/Linux 'Single Board Computer'/Laptop

Install / Use

First Time Setup

  1. On your Android smart phone, download the "Wearable Intelligence System" app:
  2. On your smart glasses, download the "Wearable Intelligence System" app:
    • Launch the "Wearable Intelligence System" app on your smart phone
    • Accept permissions.
    • Tap "Start WiFi Hotspot", turn on your WiFi hotspot (configuring a password if necessary), then tap "Back" to return
  3. Connect smart glasses WiFi to the smart phone WiFi hotspot
  4. Enable mobile data (or wifi sharing) on Android smart phone
  5. Start "Wearable Intelligence System" application on smart glasses
    • The phone connection icon will be green if the glasses are connected to your phone. If you speak, you'll see a live transcript on the smart glasses screen.
    • On the Android smart phone, go to "Memory Tools" -> "Memory Stream" and you will see live transcripts
  6. Setup complete.

Normal Use

Here's how to launch the system after you've already done the initial setup above:

  1. Launch "WIS" app on smart phone
  2. Enable mobile hotspot on smart phone with the "Start WiFi Hotspot" button
  3. Connect Android smart glasses to Android smart phone WiFi hotspot.
  4. Launch "WIS" app on smart glasses.
  5. Verify the system is running: the "Smart Glasses Connection Indicator" icon turns white on the smart glasses HUD.

Documentation / Developers

The docs are hosted on this repo's Wiki.

Authors

The system is fully Open Source and built by this growing list of contributors:

We are actively growing a community that builds cognitive augmentation technologies together.

The Wearable Intelligence System was started at Emex Labs by Cayden Pierce.

wearableintelligencesystem's People

Contributors

caydenpierce, chuoling, stairs1, thisisvaze


wearableintelligencesystem's Issues

ASP application

The current ASP application is basically a souped-up MediaPipe example running our custom MediaPipe Perception Pipeline.

We need to make it reliably run on the ASP for many hours/days at a time, in the background, always available to receive a connection and/or requests from the ASG.

TODO

  • get the app editing/building in Android Studio - not essential, but will help development speed
  • get the current MediaPipe inferencing and ASG socket code running in an Android Service - determine the best type of service (Background, Foreground, Bound, etc.) to do the job. Likely needs a Foreground service like here
  • start saving the incoming image/video stream from the ASG into a file structure that makes sense
  • set up a database on Android with SQLite or Realm
  • save image data (file location, timestamp, metadata) in the database
  • set up a WebSocket connection on the ASP that always runs in the background - this will accept sensor data and function requests, and push data to the ASG (see the sketch below)
  • stream audio data from ASG to ASP - either over a raw TCP socket or over the new WebSocket above
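
The ASP implementation would be in Java, but the message flow for the always-on WebSocket (accept sensor data and function requests, push responses back to the ASG) can be sketched in Python with the websockets package. The message types, field names, and port below are assumptions for illustration, not the real protocol.

    import asyncio
    import json

    import websockets  # pip install websockets


    async def handler(websocket, path):
        # accept sensor data and function requests from the ASG on one connection
        async for raw in websocket:
            msg = json.loads(raw)
            if msg.get("type") == "sensor":
                # e.g. persist the frame / reading (see the database TODOs above)
                print("sensor data:", msg.get("sensor"), msg.get("timestamp"))
            elif msg.get("type") == "request":
                # e.g. the ASG asks for a visual search; push the result back
                await websocket.send(json.dumps({"type": "response",
                                                 "id": msg.get("id"),
                                                 "result": "ok"}))


    async def main():
        async with websockets.serve(handler, "0.0.0.0", 8887):
            await asyncio.Future()  # run forever

    asyncio.run(main())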

Mapping - Map overlay, voice command directions

Pulling out one's phone all the time while navigating a new place is difficult, dangerous, and forces one to stop mid-journey. This is made worse if the individual is operating a vehicle (bicycle, car), where they must disengage altogether.

The current idea:

  1. The user issues a voice command asking for directions to their desired location: "directions to x"
  2. The system pulls up a map centered on the user's current position, with the route highlighted. The system also outputs speech to give the user turn-by-turn directions.

Implementation

OsmAnd Mapping app for Android combined with OsmAnd-Api

OsmAnd~ app - https://f-droid.org/en/packages/net.osmand.plus/ (this is like the Google Play OsmAnd+ version, but ~ for devs)

Osm-and-api: https://github.com/osmandapp/osmand-api-demo/
^ This demo app embeds OsmAnd in itself - it doesn't start the OsmAnd activity with arguments (like launching a program that takes over); instead, the OsmAnd maps show up directly in the parent app/activity.

Use OsmAndApi to call OsmAnd and display OsmAnd maps in the app, via "Navigation" -> "Navigate and Search".

See the osmand-api-demo folder in the osmand-api-demo repo and read the README.md in that folder for instructions on how to use the osmand-api.

Audio/Visual/Generic-Sensor low-latency internet streaming

We need a low-latency, reliable method of streaming audio data, visual data, and generic sensory data (e.g. GPS, accelerometer, EEG, heart rate, etc.) over the internet.

The state-of-the-art appears to be WebRTC:
- app streaming WebRTC from Vuzix M400: https://github.com/dalkofjac/smartglassy-demo-foi
- android WebRTC: https://github.com/renyuzhuo/WebRTC-Android-Learn

Keep in mind that the Vuzix Blade and other Android smart glasses we are using now run old Android versions (the Vuzix Blade runs Android 5.1), so we may be limited in the libraries we can use on the ASG side.

Write docs

There are a lot of docs here across the 4 READMEs, but there are some problems.

Problems

  • there are 4 READMEs
  • the text wall makes it hard to find what you need when you need it
  • the system has significantly evolved and thus some of the older README content isn't applicable, and some new information is missing

Solutions

  • migrate docs to Read the Docs - https://docs.readthedocs.io/en/stable/tutorial/
  • Write a "Before you start" guide
  • Write a "Setup" guide for those just setting up using APK download
  • Write a "Developer" setup guide for setting up ASP environment, ASG environment, and GLBOX

KeyError: 'summary', calling the web service

Calling the web service, it responds with an error:
KeyError: 'summary'
Steps to reproduce:
  1. After downloading the DEV branch, start the main_webserver.py service
  2. Call http://127.0.0.1:5000/semantic_web_speech with a client (e.g. Postman)
  3. Pass {"text": "Giovanni Capuozzo"} in the request body
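
For reference, the same call as the Postman request above, as a minimal Python client:

    import requests

    resp = requests.post(
        "http://127.0.0.1:5000/semantic_web_speech",
        json={"text": "Giovanni Capuozzo"},
    )
    print(resp.status_code, resp.text)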

Proposed fix:
main_tools.py, lines 68-76


        # limit summary
        for entity in entity_list:
            try:
                entity["summary"] = " ".join(entity["summary"].split(" ")[:self.summary_limit]) + "..."
            except KeyError:
                entity["summary"] = ""
                # entity["summary"] = "summary not available"

Image Search

Use voice command to search the web for text and receive a GridView of images displayed.

Voice Command on ASP

Note: this relies on completion of #11

This is part of the move to no longer streaming sensor data over the internet.

The entire voice command system is currently written in Python on the GLBOX. However, text parsing is not very complicated, so porting it to Android should be straightforward, and doing so will avoid the latency and wastefulness of transcribing locally and then streaming the results to the web for processing. A minimal parsing sketch follows the TODO list below.

TODO

  • parse wake words
  • parse commands
  • parse arguments
  • implement wolfram search function
  • implement azure photo search function
  • implement wikipedia search function
  • run functions based on command
  • allow some wake words to also be commands
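
A minimal sketch of the parsing flow (wake word, then command, then argument). The command phrases match the voice commands listed in the README; the handler functions are placeholders for the real Wolfram/Azure/Wikipedia functions.

    WAKE_WORDS = ("hey computer",)

    def search(query):
        print(f"searching the web for: {query}")

    def question(query):
        print(f"asking the assistant: {query}")

    # command phrase -> handler that receives the remaining text as its argument
    COMMANDS = {
        "search for": search,
        "question": question,
    }

    def parse_transcript(transcript):
        """Return (handler, argument) for a transcript, or None if nothing matched."""
        text = transcript.lower().strip()
        # 1. parse the wake word
        for wake in WAKE_WORDS:
            if text.startswith(wake):
                text = text[len(wake):].strip()
                break
        else:
            return None  # no wake word heard, ignore
        # 2. parse the command and 3. its argument
        for phrase, handler in COMMANDS.items():
            if text.startswith(phrase):
                return handler, text[len(phrase):].strip()
        return None

    match = parse_transcript("hey computer search for smart glasses")
    if match:
        handler, argument = match
        handler(argument)  # run the function selected by the command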

Memory Expansion Tools

Build a system that improves users' memory and thinking by recording everything a user experiences and creating a searchable, computable, associative knowledge representation of that information.

Contact [email protected] for more details.

Multi-user

Make the system work for multiple users at once:

  • implement a "listener" that only starts up services (speech to text, etc.) when a connection is made and kills services when no connection has been made for n seconds - can be done with a master REST server and containerized instances of the main.py system in ./gnu_linux_box (a minimal idle-timeout sketch follows this list)
  • modify "listener" to be able to host multiple sessions from different users that identify themselves with a unique key/username/JWT/login
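
A minimal idle-timeout sketch of the listener idea, assuming hypothetical start_services/stop_services hooks (e.g. spinning up or stopping a containerized main.py instance per user):

    import threading
    import time

    SESSION_TIMEOUT_S = 30  # stop a user's services after n seconds without a connection

    def start_services(user_id):
        print(f"starting speech-to-text (etc.) for {user_id}")

    def stop_services(user_id):
        print(f"stopping services for {user_id}")

    class SessionListener:
        def __init__(self):
            self.last_seen = {}          # user_id -> last activity timestamp
            self.lock = threading.Lock()

        def touch(self, user_id):
            """Call on every authenticated connection/message from a user."""
            with self.lock:
                if user_id not in self.last_seen:
                    start_services(user_id)
                self.last_seen[user_id] = time.time()

        def reap_idle(self):
            """Background loop: stop services for users idle past the timeout."""
            while True:
                time.sleep(5)
                now = time.time()
                with self.lock:
                    for user_id, seen in list(self.last_seen.items()):
                        if now - seen > SESSION_TIMEOUT_S:
                            stop_services(user_id)
                            del self.last_seen[user_id]

    listener = SessionListener()
    threading.Thread(target=listener.reap_idle, daemon=True).start()
    listener.touch("user-123")  # e.g. called by the master REST server on each request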

Mobility - GLBOX shrink or disappear

An issue with the usability, wearability, and mobility of the system is the need to carry the GLBOX laptop around everywhere.

It's heavy, bulky, requires a bag to carry, and thus isn't highly mobile.

There are two main options here (please feel free to suggest other options):

  1. Move the GLBOX to a smaller wearable that fits in pocket

    • all SBCs that I know of with enough compute are too big (e.g. a Raspberry Pi 4 is no better than a laptop, as it won't fit in a pocket)
    • running GNU/Linux on Android phone hardware is possible but buggy and unsupported, and would probably cause a lot of tangential issues
    • a Pi Zero W (or similar GNU/Linux SBC) is small and low power and could do it, but they are so computationally weak that we would have to stream video to the cloud or the ASP anyway for any intense computer vision
  2. Move GLBOX to the cloud

    • requires implementing audio streaming from ASG to GLBOX over internet
    • introduces latency on transcription and command response
    • will use a lot of data and means the system will only work with an internet connection
    • better long term - users don't have to buy 2 pieces of hardware, and unlimited, high-speed data is a reasonable assumption for 5 years out

Decision

2, move GLBOX to the cloud

Why?

For a number of reasons, #2 makes more sense:

  • requiring another piece of hardware is a huge deterrent to use
    • expensive
    • another thing to carry
    • another thing to keep charged
    • increased system complexity and mental load
  • no current mobile GNU/Linux implementation is all three of:
    1. Powerful enough to run heavy compute at small size and power
    2. Proper form factor to fit in a pocket (smartphone form factor)
    3. Reliable
      - in the future this may be possible with a mobile GNU/Linux system (GNU/Linux running on mobile chipsets), but current implementations (Ubuntu Touch, postmarketOS, etc.) are not ready for production
  • cloud implementation is more agile, can constantly make changes/push features/etc.

How?

The GLBOX code is already a Python socket server running on GNU/Linux, so moving it to the cloud is almost already done. We just need to modify a few things to make it work on a cloud server instead of a wearable server.

MVP

  • audio capture on the ASG that is streamed to the GLBOX (as opposed to the current scheme, where audio streams directly to the GLBOX via Bluetooth)
  • finish implementing "heart beats" (pings) on both the GLBOX and ASG sides (almost done; a minimal ping/pong sketch follows this list)
  • switch from current LAN advertising scheme to referencing GLBOX with domain name
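
The heartbeat itself is simple. A rough sketch of the two sides, assuming a line-based ping over the existing socket (the ASG side would be Java in practice; interval and timeout values are placeholders):

    import socket
    import time

    PING_INTERVAL_S = 5    # how often the ASG pings
    PING_TIMEOUT_S = 15    # GLBOX considers the link dead after this much silence

    def asg_heartbeat_loop(sock):
        """ASG side: send a ping on a fixed interval."""
        while True:
            sock.sendall(b"ping\n")
            time.sleep(PING_INTERVAL_S)

    def glbox_wait_for_ping(sock):
        """GLBOX side: return False if no data arrives before the timeout."""
        sock.settimeout(PING_TIMEOUT_S)
        try:
            return sock.recv(16) != b""
        except socket.timeout:
            return False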

Further

  • finish implementing a streaming VAD that turns the speech-to-text service off when there is no voice, turns it back on when a voice is detected, and holds a circular buffer of audio so no transcriptions are lost (sketched below)
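
A minimal sketch of that VAD logic on the GLBOX, assuming the webrtcvad package and 30 ms frames of 16 kHz, 16-bit mono PCM; the start/stop/send hooks are placeholders for the real speech-to-text service:

    import collections
    import webrtcvad  # pip install webrtcvad

    SAMPLE_RATE = 16000
    PRE_ROLL_FRAMES = 10  # keep ~300 ms of audio from before speech was detected

    vad = webrtcvad.Vad(2)                            # aggressiveness 0-3
    ring = collections.deque(maxlen=PRE_ROLL_FRAMES)  # circular buffer of recent frames
    in_speech = False

    def start_speech_to_text():
        print("speech-to-text on")

    def stop_speech_to_text():
        print("speech-to-text off")

    def send_to_speech_to_text(frame):
        pass  # forward the frame to the running ASR service

    def on_audio_frame(frame):
        """Feed successive 30 ms PCM frames from the ASG audio stream."""
        global in_speech
        voiced = vad.is_speech(frame, SAMPLE_RATE)
        if voiced and not in_speech:
            in_speech = True
            start_speech_to_text()
            for buffered in ring:        # replay the buffer so no words are lost
                send_to_speech_to_text(buffered)
            ring.clear()
        if in_speech:
            send_to_speech_to_text(frame)
            if not voiced:               # (a real version would wait out a short hangover)
                in_speech = False
                stop_speech_to_text()
        else:
            ring.append(frame)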

Vision Upgrades - Thermal Vision, Night Vision, Eyes-in-back-head

Give the user the ability to see in all kinds of extended sensory spaces - augmediated vision [1].

The most straightforward way to do this is to buy USB cameras, connect these directly to the ASG via USB, and stream a live view of that camera to the display.

Deliverables

  • Get USB camera connected and streaming to ASG
  • Get live viewfinder view of the camera output on ASG screen
  • Integrate this view into the WIS such that it can be pulled up with a single voice command - "switch modes extended vision"
  • integrate (connect, stream, view, attach to hardware) regular camera - eyes in the back of the head
  • integrate thermal vision
  • integrate binocular vision
  • integrate magnifying vision
  • integrate audio vision

References

[1]: Steve Mann - https://spectrum.ieee.org/steve-mann-my-augmediated-life, https://www.instructables.com/Augmented-Reality-Eyeglass-With-Thermal-Vision-Bui/

Live Life Captions - enhanced interface

  • use ring mouse to implement upgraded interface with text - scroll through words
  • ability to define any word that has appeared in the stream
  • implement a user vocabulary and auto-define words that are rare and haven't been seen before

Sensors - retrieve, stream, and save

We want to collect data from a number of sensors on the ASG and ASP and save them in a database.

Some sensors are sparse in time and can be saved directly to the database (e.g. GPS). Others are time series signals and must be saved to files that are referenced by the database (e.g. video).

Sensors

  • Audio: pull from the microphone, encode audio on the ASG, stream to the ASP, save locally (how to save - in file chunks?)
  • Video: get from the ASG camera in the background (AndroidHiddenCamera), stream to the ASP, save in file chunks
  • Accelerometer, Compass, Gyroscope, Ambient Light Sensor: ASG streams to the ASP and saves; ASP pulls its own and saves (head and body sensors of the user)
  • GPS: pull from the Android phone - no point in getting it from the ASG

Video

  • find and implement better, encrypted video streaming from ASG to ASP
  • save in file chunks (e.g. 30-second blocks) and save a reference to each chunk with metadata in the database (see the sketch at the end of this section)

The state-of-the-art appears to be WebRTC:

Keep in mind that the Vuzix Blade and other Android smart glasses we are using now run old Android versions (the Vuzix Blade runs Android 5.1), so we may be limited in the libraries we can use on the ASG side.

Audio

  • stream encrypted raw data (or encoded with AAC or similar) from ASG to ASP
  • save in file chunks (e.g. 30 second blocks) and save reference to each chunk with metadata in database

Other

  • get GPS every n seconds on ASP, save to database
  • Accelerometer, Magnetometer, Gyroscope - get on ASG, stream to ASP, save to chunked files, and reference in database. Also pull on ASP, save to chunked files, and reference in database
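
The ASP would use Android's SQLite (or Realm), but the chunk-and-reference scheme described above for video, audio, and the other time-series sensors can be sketched with Python's sqlite3. The schema, column names, and file path are illustrative only:

    import sqlite3
    import time

    db = sqlite3.connect("wis_sensors.db")
    db.execute("""
        CREATE TABLE IF NOT EXISTS chunks (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            sensor    TEXT NOT NULL,   -- e.g. 'video', 'audio', 'accelerometer'
            file_path TEXT NOT NULL,   -- chunk file written by the streaming code
            start_ts  REAL NOT NULL,   -- unix timestamp of the first sample
            end_ts    REAL NOT NULL,   -- unix timestamp of the last sample
            metadata  TEXT             -- free-form JSON (codec, device, etc.)
        )
    """)

    def save_chunk_reference(sensor, file_path, start_ts, end_ts, metadata=None):
        db.execute(
            "INSERT INTO chunks (sensor, file_path, start_ts, end_ts, metadata) "
            "VALUES (?, ?, ?, ?, ?)",
            (sensor, file_path, start_ts, end_ts, metadata),
        )
        db.commit()

    # example: register a 30-second video chunk that was just written to disk
    now = time.time()
    save_chunk_reference("video", "/sdcard/wis/video/chunk_000123.mp4", now - 30, now)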

ASR on ASP

Part of the move away from streaming sensor data over the internet.

Relies on implementation of #10

ASR on Android

We need to be able to transcribe speech to text locally. ASR takes a lot of RAM, compute, and battery, so it's not realistic to do on the ASG. Streaming audio 8 hours a day, every day, to the internet takes too much data. This means it must happen on the ASP.

After considerable research, the best option for doing this on Android seems to be Vosk: https://github.com/alphacep/vosk-api (a minimal streaming sketch follows the TODO list below).

TODO (after completion of #10)

  • get Vosk Android API demo working and test locally on ASP: https://github.com/alphacep/vosk-android-demo
  • pull Vosk Android libs into ASP app
  • run Vosk on incoming audio stream from ASG and receive transcriptions
  • make transcriptions available to rest of ASP application
  • stream transcriptions to ASG
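
A minimal sketch of the streaming recognizer with the Vosk Python API (the ASP port would use the vosk-android libs instead); the model path and WAV file stand in for the downloaded model and the incoming ASG audio stream:

    import wave
    from vosk import Model, KaldiRecognizer  # pip install vosk

    model = Model("model")               # path to an unpacked Vosk model directory
    rec = KaldiRecognizer(model, 16000)  # expects 16 kHz, 16-bit mono PCM

    wf = wave.open("asg_audio.wav", "rb")
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):     # end of an utterance: full transcription
            print(rec.Result())
        else:                            # mid-utterance: partial transcription
            print(rec.PartialResult())
    print(rec.FinalResult())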

Add user accounts and secure endpoints with JWT

  • adding user accounts, a frontend to make new accounts
  • saving transcripts with userids
  • JWT to secure endpoints
    JWT tokens: we need to expose a login endpoint, http://127.0.0.1:5000/login, to which the username and password are passed (e.g. {"user": "simonexyz", "password": "my_password"}; all of this must already be encrypted according to a predefined client/server key). If the user exists in the database, a token is returned to the user, with which they can sign and send messages. A session must have a maximum duration, after which it expires, another token must be generated, and the login renewed as in the first step. It would also be useful to set up a gateway that all the APIs point to and that verifies authentication; from there requests are routed to the various services. (A minimal sketch follows below.)
    Example of authentication: https://www.youtube.com/watch?v=LKveAwao9HA
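
A minimal sketch of the login endpoint and a token check with Flask and PyJWT; the secret key, user store, and protected route are placeholders, not the real service:

    import datetime

    import jwt                    # pip install PyJWT
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    SECRET_KEY = "change-me"                      # placeholder server-side secret
    USERS = {"simonexyz": "my_password"}          # stand-in for the real user database
    TOKEN_LIFETIME = datetime.timedelta(hours=1)  # session expiry

    @app.route("/login", methods=["POST"])
    def login():
        body = request.get_json(force=True)
        if USERS.get(body.get("user")) != body.get("password"):
            return jsonify({"error": "invalid credentials"}), 401
        token = jwt.encode(
            {"sub": body["user"],
             "exp": datetime.datetime.utcnow() + TOKEN_LIFETIME},
            SECRET_KEY,
            algorithm="HS256",
        )
        return jsonify({"token": token})

    @app.route("/protected", methods=["POST"])
    def protected():
        # gateway-style check: every secured endpoint verifies the token first
        auth = request.headers.get("Authorization", "")
        try:
            claims = jwt.decode(auth.replace("Bearer ", "", 1), SECRET_KEY,
                                algorithms=["HS256"])
        except jwt.PyJWTError:
            return jsonify({"error": "invalid or expired token"}), 401
        return jsonify({"user": claims["sub"]})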
