Live Transcription for Augmented Reality Glasses

scribear.github.io's Introduction

ScribeAR

For all other documents and references, see our Box folder. If you can't access Box with your Illinois account, go to https://cloud-dashboard.illinois.edu/cbdash/ and turn U of I Box on.
RampUpTest
WebUIGuide
For all communications, see Slack.
Replace/update this readme as necessary.

Resources

Setup

Download Node.js at nodejs.org. Node comes with npm (Node Package Manager), which we will use for running code on your machine and deploying it to GitHub Pages. Make sure node and npm are in your PATH, then run node -v and npm -v to confirm that both are installed.

Here are some relevant files worth knowing about. These files are present in any React repository. Most of them are initialized automatically with the command npx create-react-app.

.gitignore lists files that won't be committed to Git. You will likely never touch any of the files listed there (except for notes.txt, which you may create in your local directory to keep any personal notes).
node_modules holds the installed packages (dependencies) that the project needs to build and run. This folder is huge and that’s why we don’t commit it to Git (i.e. it's listed in .gitignore). Don’t touch node_modules directly unless you know what you’re doing.
package.json provides information that (a) npm uses to run the build, (b) npm uses to update our individual node_modules folders, and (c) shows us things like home URL and version number. JSON stands for JavaScript Object Notation and is essentially a JS-style object with key-value pairs.
- The most important parts of this file are:
  - dependencies, which specifies the dependencies and version numbers we are using in our project.
  - scripts, which allows us to use shorthand for some common commands.
- Note that package.json is committed to Git.
  - When you want to add a new dependency to the project, run npm install <dependencyname> and npm will automatically update node_modules and package.json.
  - When you push a commit, you will include this file with any dependencies and scripts that you added (i.e. it's not in .gitignore).
  - When you pull a commit, it may contain updated dependencies or scripts. This information will be stored in this file. Run npm install to apply these changes locally to your node_modules.
    - After your first time cloning the directory, you should be able to just run npm install, which will take a few minutes to create the node_modules folder. Then, if you run npm start, it should load the page in a browser.
  - Essentially, node_modules contains a whole lot of code. But package.json dictates what node_modules should contain.
public directory holds index.html, the home HTML file. It is automatically connected to src/App.tsx and you will rarely need to touch the HTML or anything else in public. manifest.json is the web app manifest: it holds metadata, such as the app name and icons, that browsers use when the site is installed on a device.
- Any external CSS libraries you want to use should be linked in the index.html file. (Right now there aren't any.)
- All images should be stored in the public directory and referenced in the project as though they are in the same folder, i.e. simply as ./imgname.jpg.
src directory is where 99% of the work is done. App.tsx, the top overarching file, lives here. All components are stored in src/components and most of the code you write is somewhere in here.

When you push a commit, if the commit is ready to deploy to GitHub Pages, also run npm run deploy and the site at our URL will be updated within a couple of minutes.

Relevant Technologies

I prefer to learn through videos so most of the links I'm posting will be videos. I will also post some documentation, which is useful, but imo only as a reference, not a way to learn something new.

ReactJS is a JavaScript library for frontend development, but you can also think of it as a template for organizing code for a web page. You will want to become comfortable with JavaScript if you are not already.

The core idea of React is to separate code into components. For example, on most websites, a header is a component; a sidebar is a component; buttons can be components nested in other components, and so on. This is useful because it allows us to reuse code when we need the same type of components multiple times. It also allows us to render only one component at a time, using much less overhead than re-rendering a whole site. React handles component rendering automatically to optimize it for the page. For example, the Recognition component renders multiple times per second as it gets Speech Recognition results (because we keep updating the component's recognition variable). We don’t want the whole page to re-render every time this happens so we split it into components.
React uses TSX for the actual elements on the page. TSX is a version of JSX that uses TypeScript instead of JavaScript. While JavaScript and TypeScript are similar, I would definitely recommend having a good grasp of the differences between the two before trying to add code to this project.
State is generally maintained directly in components, except when a global state manager like Redux is used.
- Data can be passed down from parent to child components through props, which are kind of like arguments or parameters passed to the child. However, data cannot be passed up from a child to a parent. This is why React is said to have a unidirectional data flow. One workaround is to pass a function down as a prop, which the child can call to be executed by the parent. This is done in Options, which passes functions to the OnOff, etc. components in order to reuse OnOff for different purposes if necessary.
- Components can modify their own state (via the function setState), and they can modify their children’s props, but they can’t modify their own props.
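
To make the callback workaround concrete, here is a minimal, framework-free sketch that models components as plain functions. The names `OnOffProps`, `renderOnOff`, and `renderOptions` are hypothetical and only loosely mirror Options and OnOff; the real components are React components written in TSX.

```typescript
// Hypothetical props for a reusable on/off button.
interface OnOffProps {
  label: string;
  isOn: boolean;
  // Callback supplied by the parent; the child calls it instead of
  // touching the parent's state directly.
  onToggle: () => void;
}

// "Child": receives props and returns a description of what it would render.
function renderOnOff(props: OnOffProps): { text: string; click: () => void } {
  return {
    text: `${props.label}: ${props.isOn ? "on" : "off"}`,
    click: props.onToggle,
  };
}

// "Parent": owns the state and passes data and a callback down.
function renderOptions(state: { captionsOn: boolean }) {
  return renderOnOff({
    label: "Captions",
    isOn: state.captionsOn,
    onToggle: () => {
      state.captionsOn = !state.captionsOn; // only the parent changes its state
    },
  });
}

const optionsState = { captionsOn: false };
const firstRender = renderOptions(optionsState);
firstRender.click(); // the child asks the parent to change state
const secondRender = renderOptions(optionsState); // the parent re-renders the child
```

Data still flows only downward: the child never writes to the parent's state, it just calls the function it was given.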

Here is a look at the component tree for the site so far. If you expand the src/components folder of the code, you’ll see the same structure. Of course, the component tree will change over time.

├ App
├─── API
├─────── WebspeechRecognition
├─────── AzureRecognition
├─── SIDEBAR
├─────── Display
├─────── Phrase
├─────── SpeechToText
├─────── Visualization
├─── TOPBAR
├─────── PickApi
├─────── FullScreen
├─────── MenuHide

Redux is a JavaScript library used to store global state. It is particularly useful alongside React because it solves many of the difficulties created by React’s unidirectional data flow.

Without a global state manager, state can only be stored in components and passed down from one component to another (recall React's unidirectional data flow). So in React alone, if you want to pass state up to a parent component, you have to declare a function in the parent component and pass it down to the child as a prop. This can get very tedious and lead to over-rendering when you have multiple levels of components. A common problem with this is prop drilling, which is when you have to pass props through many levels of components when the components in the middle have no need for the data. For example, the buttons are stored in OnOff, PlusMinus, and Record, but most of the data they control is needed across the entire page. Redux is a great tool to store state globally and avoid tedious/inefficient prop drilling.
In the site, Redux is only used for the options buttons as described above. You can check out the redux directory to see how it’s set up. OnOff, PlusMinus, and Record set state. App and Captions use this state.
Keep in mind that the React-Redux hooks (useSelector, useDispatch) cannot be used inside React class components. As a workaround, Captions (functional component) gets global state from Redux and passes it down to Recognition (class component) as props.
This video is a great resource for learning Redux in conjunction with React. Everything I know about Redux came from this video, so obviously I’m still at a beginner level with it. Still, the video covers everything you need to know to understand how the site currently uses Redux for global state. Use the video to try to understand what is going on in Options and its child components. Options does not invoke Redux but it does pass functions to its children to be used by Redux.
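
As a rough illustration of the reducer/dispatch idea described above, here is a small hand-rolled store. It does not use the redux library, and the state fields and action names (`textSize`, `recording`, `INCREMENT_TEXT_SIZE`, and so on) are assumptions made for the sketch, not the names used in the redux directory.

```typescript
// Hypothetical slice of global state for the options buttons.
interface OptionsState {
  textSize: number;
  recording: boolean;
}

type OptionsAction =
  | { type: "INCREMENT_TEXT_SIZE" }
  | { type: "DECREMENT_TEXT_SIZE" }
  | { type: "TOGGLE_RECORDING" };

const initialOptions: OptionsState = { textSize: 6, recording: false };

// A reducer is a pure function: (state, action) -> new state.
function optionsReducer(state: OptionsState, action: OptionsAction): OptionsState {
  switch (action.type) {
    case "INCREMENT_TEXT_SIZE":
      return { ...state, textSize: state.textSize + 1 };
    case "DECREMENT_TEXT_SIZE":
      return { ...state, textSize: state.textSize - 1 };
    case "TOGGLE_RECORDING":
      return { ...state, recording: !state.recording };
    default:
      return state;
  }
}

// Tiny stand-in for a Redux store: holds state, dispatches actions,
// and notifies subscribers after each change.
function createTinyStore(reducer: typeof optionsReducer, initial: OptionsState) {
  let current = initial;
  const listeners: Array<() => void> = [];
  return {
    getState: () => current,
    dispatch(action: OptionsAction): void {
      current = reducer(current, action);
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: () => void): void {
      listeners.push(listener);
    },
  };
}

const optionsStore = createTinyStore(optionsReducer, initialOptions);
let notifications = 0;
optionsStore.subscribe(() => {
  notifications += 1;
});
optionsStore.dispatch({ type: "INCREMENT_TEXT_SIZE" }); // e.g. sent by PlusMinus
optionsStore.dispatch({ type: "TOGGLE_RECORDING" }); // e.g. sent by Record
```

Any component that subscribes can read the latest state with `getState()`, which is what removes the need to drill props through intermediate components.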

Speech Recognition

We currently use three APIs:

Webspeech: the Web Speech API is provided directly by the browser, so it is very simple to get working. It runs asynchronously to the rest of the code, and because of this, having it communicate with everything else can be a little tricky. The best way I have found to communicate with it is through refs, which React provides with React.useRef().

Azure: another API we use, and slightly more difficult to work with because it requires authentication with a key and region. It also runs asynchronously, so we use React.useRef() to communicate with it as well. Azure is much more exciting because a lot of its capabilities are pretty cutting edge, and many of the updates we have planned involve implementing Azure features.

StreamText: a website that we simply render in an iframe. There is almost no coding involved, and because it uses an iframe, there is also very little communication possible. Anyone looking to help with StreamText would probably need to get comfortable with XML requests.
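
The ref approach mentioned for Webspeech and Azure can be sketched without React. A ref is just a mutable box with a `current` field, so an asynchronous callback that reads the box at call time sees the latest value instead of the value it captured when it was created. `makeRef` and `fakeRecognizer` are hypothetical helpers written for this sketch; in the project the box would come from React.useRef().

```typescript
// Same shape as the object returned by React.useRef(): a mutable box.
interface MutableRef<T> {
  current: T;
}

function makeRef<T>(initial: T): MutableRef<T> {
  return { current: initial };
}

// Hypothetical stand-in for a recognizer that reports a result later.
// It returns a function that simulates a result arriving.
function fakeRecognizer(onResult: (text: string) => void): () => void {
  return () => onResult("hello world");
}

let listening = true; // plain value, copied when captured below
const listeningRef = makeRef(true); // ref, read at call time

const capturedListening = listening; // what a stale closure would see
const seenResults: string[] = [];

const emitResult = fakeRecognizer((text) => {
  seenResults.push(
    `captured=${capturedListening} ref=${listeningRef.current} ${text}`,
  );
});

// The user pauses captions before the next result arrives.
listening = false;
listeningRef.current = false;

emitResult();
// seenResults[0] is "captured=true ref=false hello world"
```

The captured copy still says `true`, while the ref reflects the pause. That is the reason a callback registered once with a long-lived recognizer should read settings through a ref.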

scribear.github.io's People

ap-ack jiaming1999 xinyuliu5566 sxinyu17 coolhands25 sicongzhang113 williamfoster3 monty7 joannah2 tugoph 18nanma cheng-wang2002 socksham lrq3000 aaronhyzhang derinsozen

scribear.github.io's Issues

Mozilla logo transparency and check mark

The Mozilla icon (webspeech) background isn't transparent. Making it transparent would improve appearance.

The green checkmark is also off-center vertically from the other icons.

SRT file functionality

Be able to select a local SRT captioning file and play it through ScribeAR.
Be able to change offsets to line up captions with media.
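
One possible starting point for this feature is a small parser that turns SRT text into timed cues and shifts them by a user-chosen offset. This is a sketch under assumptions: the names `SrtCue`, `parseSrt`, and `cueAt` are made up here, and the parser handles only the basic SRT layout (index line, `start --> end` line, text lines).

```typescript
interface SrtCue {
  startMs: number;
  endMs: number;
  text: string;
}

// Convert "HH:MM:SS,mmm" to milliseconds.
function srtTimeToMs(stamp: string): number {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Parse SRT text into cues, shifting every cue by offsetMs.
function parseSrt(srt: string, offsetMs = 0): SrtCue[] {
  const cues: SrtCue[] = [];
  const blocks = srt.replace(/\r\n/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // skip blocks without a timing line
    const timingLine = lines[timingIndex] ?? "";
    const [startStamp = "", endStamp = ""] = timingLine.split("-->");
    cues.push({
      startMs: srtTimeToMs(startStamp) + offsetMs,
      endMs: srtTimeToMs(endStamp) + offsetMs,
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

// Find the cue that should be on screen at a given playback time.
function cueAt(cues: SrtCue[], timeMs: number): SrtCue | undefined {
  return cues.find((cue) => timeMs >= cue.startMs && timeMs < cue.endMs);
}

const sampleSrt = `1
00:00:01,000 --> 00:00:03,500
Welcome to the lecture.

2
00:00:04,000 --> 00:00:06,000
Today we cover recursion.`;

const shiftedCues = parseSrt(sampleSrt, 500); // captions appear 0.5 s later
```

A playback loop could call `cueAt` with the elapsed time and display the returned text, and re-run `parseSrt` with a new offset whenever the user adjusts it.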

Plus/Minus buttons -> Sliders with text input

Plus/Minus buttons should be turned into sliders for ease of use. If the user wants to set the setting to a large value, clicking the plus/minus button is not feasible since it only goes up by preset increments. The slider will also allow us to set recommended values within a certain range.

To accommodate different user needs, we can allow the user to set the value directly by typing it in. The user can click the text value next to the slider to set a custom value. An example of this is the image below.

(Edit Nov 25, 2020): Could add example text under the slider to show the size of the text as the slider changes, even if there is no text visible from recognition.

Confusing api choice button

Maybe it can be something other than this:

Menu buttons should stay static as you scroll

The top of the menu ( < Main Menu > <) should stay static when scrolling so the user doesn't get lost in the menus.

OAuth login asks for too many permissions

This is a lot. Also I think being able to link to university SSOs is better.

Reducing the redux actions, adding more to redux states

Currently, our redux actions only carry a "type" field and our reducers switch based on this field. Our reducers check for this "type" field when checking what needs to be done based on the action. All of the switch case statements will check the action.type anytime an action is called from anywhere, which could slow our processing.

If we add additional fields, and leave the "type" field to specify which reducer the action is meant for, we can reduce the amount of checks that actions need to go through.

Additionally, we can add more properties to our states to simplify our actions and processing.
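
One way to read this proposal is sketched below: `type` names the reducer an action is meant for, and extra fields say what to do, so every other reducer returns after a single comparison. The slice names (`textSize`, `lineWidth`) and the `op`/`amount` fields are assumptions made for illustration, not the project's actual action shape.

```typescript
// Hypothetical action shape: `type` names the target reducer,
// and the extra fields describe the change.
interface SliceAction {
  type: "textSize" | "lineWidth";
  op: "add" | "set";
  amount: number;
}

// Builds a numeric reducer that ignores actions meant for other slices.
function makeNumberReducer(slice: SliceAction["type"]) {
  return (state: number, action: SliceAction): number => {
    if (action.type !== slice) return state; // one check, then bail out
    return action.op === "add" ? state + action.amount : action.amount;
  };
}

const textSizeReducer = makeNumberReducer("textSize");
const lineWidthReducer = makeNumberReducer("lineWidth");

const bumpedSize = textSizeReducer(6, { type: "textSize", op: "add", amount: 2 });
const untouchedWidth = lineWidthReducer(10, { type: "textSize", op: "add", amount: 2 });
const directSize = textSizeReducer(6, { type: "textSize", op: "set", amount: 12 });
```

With this shape, a single `add` action with a negative `amount` can replace separate increment and decrement actions.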

Body background color not changing when switching from light to dark mode

Idea: Pulling captions out of Zoom call

One of my classes this semester has a Zoom call synced with the class. The advantage of that is there is automatic captioning via otter.ai that could be turned into.

Adding preview text for text size changes

Adding some preview text when the text size is changed will help the user change the settings before any text has been transcribed.

Unused code

There are a lot of unused components and imports in our project. We should delete unused code or archive it in some way if we plan on using it later.

Add documentation/blog post comparing different types of mics

So far I've tried different types of microphones, and we can document this somewhere for people. Here are my experiences so far:

Lapel Mic

Pros:

- Picks up the professor's voice pretty well
- Small form factor

Cons:

- The professor often has to wear multiple mics now.
- Requires the professor to do some work too.
- The professor has to make sure the mic is on.
- Has a USB connector dongle (instead of using something like Bluetooth)
- Slightly fragile

Laptop Mic

Pros:

- Doesn't cost anything.
- No setup
- Picks up both the professor and the students nearby.

Cons:

- Bad sound quality (-> meh captions)

Omnidirectional mic

(I tried one lecture with Colin's mic, not sure what model it is but it's in reference to that)

Pros:

- Picked up a lot of the professor talking
- Picked up a good amount of other students
- Bluetooth wireless so I can sit anywhere

Cons:

- Not as good as lapel mic on professor
- Huge form factor, doesn't even fit in my bag so I have to carry it briefcase style
- Takes 5-10 minutes to set up before lecture start
- Sometimes unsure if connected correctly. Colin recommended putting it on mute to check and it worked but I had to go back and forth a lot.
- Expensive

Default visualization location change

Right now, the audio visualization is defaulted to be at the very left of the screen. This hides the circular visualization under the menu, making the user wonder if the visualization works. If we move the visualization to the middle or the right by default, we can eliminate this confusion.

Webspeech not working on master branch

Webspeech doesn't output captions on the current master branch.

Improve keyboard use of the site

Ideally, I should be able to use the site entirely with keyboard gestures. That way I don't have to drag my mouse all the way to my glasses during class every time I need to change anything (breaks immersion).

Close side menu when clicking on the background

Side menu should close when clicking out of the menu instead of having to click on the button.

Azure model customization

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-train-model

This can be both general, as well as course specific.

Improve recognition accuracy on industry-specific vocabulary and grammar, like medical terminology or IT jargon

This takes in text input. We can scrape slides/syllabus/textbooks for better transcription of CS specific terms.

Define the phonetic and displayed form of a word or term that has nonstandard pronunciation, like product names or acronyms

This probably won't be too often, but maybe some words like Sequel -> SQL can be fixed.

Improve recognition accuracy on speaking styles, accents, or specific background noises

This is audio+text input. For standard American accents, the baseline models are fine, but for specific professors we can fine tune the model. Generally there might be some captioned classes from previous semester (via DRES or similar) that can be put in as trained datasets.

Automatic building and publishing through GitHub Actions

Bottom text cutoff

Basically it goes to the bottom of the screen. So the bottoms of letters like ys and gs get cut out. Make sure the box works with all the available text sizes.

Menu buttons move depending on text in between

The back/forward buttons should stay static to allow users to page through the menus without having to move the mouse.

Firefox Compatibility Error

There's some sort of compatibility issue that might be linked to the API that we're using with speech recognition. There's a type error on line 4 as shown below.

TypeError: SpeechRecognition is not a constructor

./src/components/Captions/Recognition/index.js
src/components/Captions/Recognition/index.js:4

1 | import React from 'react'
2 |
3 | const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
4 | const recognition = new SpeechRecognition()
5 | recognition.lang = 'en-US'
6 | recognition.continuous = false
7 | recognition.interimResults = true
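
A possible guard for this error is to check that a constructor exists before calling `new`. The sketch below takes the global object as a parameter so it can run outside a browser; `SpeechGlobals` and `createRecognizer` are hypothetical names, and in the app the argument would be `window`.

```typescript
// Minimal shape of the global object we care about (an assumption for this sketch).
interface SpeechGlobals {
  SpeechRecognition?: new () => object;
  webkitSpeechRecognition?: new () => object;
}

// Returns a recognizer instance, or null when neither constructor is exposed.
function createRecognizer(globals: SpeechGlobals): object | null {
  const Ctor = globals.SpeechRecognition || globals.webkitSpeechRecognition;
  if (typeof Ctor !== "function") {
    return null; // caller can show a "not supported in this browser" message
  }
  return new Ctor();
}

// Simulated environments so the sketch runs without a browser.
class FakeRecognition {
  lang = "en-US";
}
const supportedRecognizer = createRecognizer({ webkitSpeechRecognition: FakeRecognition });
const unsupportedRecognizer = createRecognizer({});
```

When the result is `null`, the UI could fall back to another API such as Azure instead of crashing at import time.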

Streamtext text cut off

The text when StreamText is activated gets cut off when it reaches the bottom of the screen. The last line is only half visible.

Refactor filenames -> Messy imports

Since we refactored the filenames of the components, we have some messy imports such as

import ToggleButton from '../../ToggleButton/ToggleButton.js'

We can keep the original imports:

import ToggleButton from '../../ToggleButton/'

if we add an index.js file like so:

"use strict";

var _interopRequireDefault = require("@babel/runtime/helpers/interopRequireDefault");

Object.defineProperty(exports, "__esModule", {
  value: true
});
Object.defineProperty(exports, "default", {
  enumerable: true,
  get: function get() {
    return _ToggleButton.default;
  }
});

var _ToggleButton = _interopRequireDefault(require("./ToggleButton"));

(From material-ui components)

Some themes are unreadable

The text should be white instead of black. Some of the themes have black text with a black background, making it impossible to read.

Make click regions for the sources not just the text name but the entire row

Only clicking the inner part actually pulls the menu. If you click the outer part (but still the highlighted row), it does nothing. Probably just make the outer row the click action caller.

Switch paragraph view for line view

I think having all the transcripts in a paragraph can get hard to follow, especially if it's constantly being live corrected (Azure does this more than Mozilla). I'm not sure if this is easy, but even a very rough heuristic (split every 10 words), could help.
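
The rough heuristic suggested above (split every 10 words) could look like the sketch below. `splitIntoLines` is a hypothetical helper, and the default of 10 words per line is just the number from this issue.

```typescript
// Break a running transcript into lines of at most `wordsPerLine` words.
function splitIntoLines(transcript: string, wordsPerLine = 10): string[] {
  if (wordsPerLine < 1) throw new Error("wordsPerLine must be at least 1");
  const words = transcript.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerLine) {
    lines.push(words.slice(i, i + wordsPerLine).join(" "));
  }
  return lines;
}

const captionLines = splitIntoLines(
  "one two three four five six seven eight nine ten eleven twelve",
);
// captionLines has two entries: the first ten words, then "eleven twelve"
```

Because live corrections can change the word count, earlier line breaks may shift as a result is revised. Splitting on finalized results only would avoid that.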

Sidebar Menu heading starts on left edge of screen

The audio visualization box flickers when hovering over it without visualization present

Current white screen

I added files from my version and now I got a rendering failure.

Pure black background for Moverio

For normal screens, having a #000000 bg color is too stark, but for Moverio the black turns into transparent. This is useful for seeing what's going on since the current background is slightly visible.

Glasses mode

Using ScribeAR with AR glasses is difficult due to the small screen size shrinking our app. The text on the menu is difficult to read and navigate because of this. We should have a toggleable glasses mode that will:

Increase text size for all of the text in the menus
Increase button size
Possibly streamline the menu by reducing the number of menu options, such as Azure setup
Possibly provide the most-used buttons outside of the sidebar menu, such as at the top of the screen. Text size changes would be good for this. The buttons can appear when the user hovers over a specific area and fade away after.

Azure captions config panel had enter button misaligned

It's not breaking, but it still hides some fields.

Tutorial modal background disappears

The modal disappears and leaves the overlay on some parts of the screen but not others.

I'm running Google Chrome 85.0.4183.83 on Ubuntu 19.10.

Fix webspeech for now

The whole Captions folder has been replaced, and the previous Captions folder is now named CaptionsNew. We need to find out what's wrong with that folder.

Title of page says "React App" instead of "ScribeAR"

Azure API disconnection

This happens when I'm trying this in lecture. The connection lasts about 5-10 minutes before it either:

- suddenly says I've got the wrong region, and then I have to reload
- just freezes the captioning. Reloading fixes it as well.

In either case I haven't been able to figure out why this is happening and console logs didn't show anything helpful.

Remove caption area scrollbar

On the AR glasses it looks quite weird since it's just a floating bar in the middle of the air.

Update documentation on APIs

From the main README:

Speech Recognition

Sooner than later, we're going to switch to a different speech recognition engine so I'll keep this short. We are currently using the Web Speech API, which is handled directly through the browser.

This could also be a good place to cover the pros and cons of each API, to help users pick one.

OAuth login redirects to a localhost page

Probably something just hard coded somewhere.

Banner changes to "Welcome to ScribeAR" when choosing audio visualization

Split reducers into component reducers

Currently, all of our reducers are located in one large file and are combined in that file. We should split these reducers into component reducers and store them in the component folders that they belong to. This should make our structure a little more modular, since we would only need to add import statements when we create new components.
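
A sketch of what the split could look like, written without the redux library: each slice reducer would live in its component's folder, and one small root function combines them. The slice names and action strings are hypothetical.

```typescript
type PlainAction = { type: string };
type SliceReducer<S> = (state: S | undefined, action: PlainAction) => S;

// Would live next to a (hypothetical) text-size component.
const textSizeSlice: SliceReducer<number> = (state = 6, action) =>
  action.type === "textSize/increment" ? state + 1 : state;

// Would live next to a (hypothetical) record-button component.
const recordingSlice: SliceReducer<boolean> = (state = false, action) =>
  action.type === "recording/toggle" ? !state : state;

interface RootState {
  textSize: number;
  recording: boolean;
}

// The root file only imports the slices and wires each one to its key.
function rootReducer(state: Partial<RootState> = {}, action: PlainAction): RootState {
  return {
    textSize: textSizeSlice(state.textSize, action),
    recording: recordingSlice(state.recording, action),
  };
}

const initialRoot = rootReducer(undefined, { type: "@@init" });
const afterToggle = rootReducer(initialRoot, { type: "recording/toggle" });
```

Adding a new component then means adding one import and one line in `rootReducer`, which matches the modularity goal described above.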

Clear local storage button

Users should be able to clear local storage, like their Azure key, using website functionality instead of using the browser's local storage tools.
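
A button handler for this could be as small as the sketch below. The key names (`azureKey`, `azureRegion`) are assumptions, since the issue does not say which keys the site stores. The function takes the storage object as a parameter so it can be exercised with an in-memory stand-in; in the browser you would pass `window.localStorage`.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key names; the real keys used by the site may differ.
const SAVED_SETTING_KEYS = ["azureKey", "azureRegion"];

// Remove only our own keys rather than wiping everything in storage.
function clearSavedSettings(
  store: KeyValueStore,
  keys: string[] = SAVED_SETTING_KEYS,
): void {
  for (const key of keys) {
    store.removeItem(key);
  }
}

// In-memory stand-in so the sketch runs outside a browser.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
    removeItem: (key) => {
      data.delete(key);
    },
  };
}

const memoryStore = createMemoryStore();
memoryStore.setItem("azureKey", "example-key");
memoryStore.setItem("unrelated", "keep-me");
clearSavedSettings(memoryStore);
```

Removing a fixed list of keys leaves data stored by anything else on the same origin untouched.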

Ability to auto hide or remove header

Once I set up the captioning service I usually inspect element and delete the <head> node when I'm watching lecture because it's blue and stands out. Maybe it can be visible on mouse hover (might make it less accessible) or have a keyboard shortcut to make it visible.

I think the top bar should be hidden as much as possible.