
scribear.github.io's Introduction


ScribeAR

Resources

Setup

Download Node.js at nodejs.org. Node comes with npm (Node Package Manager), which we will use for running code on your machine and deploying it to GitHub Pages. Make sure node and npm are in your PATH, then run node -v and npm -v to confirm that both are installed.

Here are some relevant files worth knowing about. These files are present in any React repository. Most of them are initialized automatically with the command npx create-react-app.

  • .gitignore lists files that won't be committed to Git. You will likely never touch any of the files listed there (except for notes.txt, which you may create in your local directory to keep any personal notes).
  • node_modules holds the installed dependency packages that our code runs on. This folder is huge and that’s why we don’t commit it to Git (i.e. it's listed in .gitignore). Don’t touch node_modules directly unless you know what you’re doing.
  • package.json provides information that (a) npm uses to run the build, (b) npm uses to update our individual node_modules folders, and (c) shows us things like home URL and version number. JSON stands for JavaScript Object Notation and is essentially a JS-style object with key-value pairs.
    • The most important parts of this file are:
      • dependencies, which specifies the dependencies and version numbers we are using in our project.
      • scripts, which allows us to use shorthand for some common commands.
    • Note that package.json is committed to Git.
      • When you want to add a new dependency to the project, run npm install <dependencyname> and npm will automatically update node_modules and package.json.
      • When you push a commit, you will include this file with any dependencies and scripts that you added (i.e. it's not in .gitignore).
      • When you pull a commit, it may contain updated dependencies or scripts. This information will be stored in this file. Run npm install to apply these changes locally to your node_modules. (npm update also works, but it may additionally move packages to newer versions allowed by package.json.)
        • After your first time cloning the directory, you should be able to just run npm install, which will take a few minutes to create the node_modules folder. Then npm start should load the page in a browser.
      • Essentially, node_modules contains a whole lot of code. But package.json dictates what node_modules should contain.
  • public directory holds index.html, the home HTML file. It is automatically connected to src/App.tsx and you will rarely need to touch the HTML or anything else in public. manifest.json is the web app manifest: it holds metadata such as the app's name, icons, and theme color, which browsers use when the site is installed as an app.
    • Any external CSS libraries you want to use should be linked in the index.html file. (Right now there aren't any.)
    • All images should be stored in the public directory and referenced in the project as though they are in the same folder, i.e. simply as ./imgname.jpg.
  • src directory is where 99% of the work is done. App.tsx, the top overarching file, lives here. All components are stored in src/components and most of the code you write is somewhere in here.

When you push a commit that is ready to deploy to GitHub Pages, also run npm run deploy and the site at our URL will be updated within a couple of minutes.

Relevant Technologies

I prefer to learn through videos so most of the links I'm posting will be videos. I will also post some documentation, which is useful, but imo only as a reference, not a way to learn something new.

ReactJS is a JavaScript library for frontend development, but you can also think of it as a template for organizing code for a web page. You will want to become comfortable with JavaScript if you are not already.

  • The core idea of React is to separate code into components. For example, on most websites, a header is a component; a sidebar is a component; buttons can be components nested in other components, and so on. This is useful because it allows us to reuse code when we need the same type of components multiple times. It also allows us to render only one component at a time, using much less overhead than re-rendering a whole site. React handles component rendering automatically to optimize it for the page. For example, the Recognition component renders multiple times per second as it gets Speech Recognition results (because we keep updating the component's recognition variable). We don’t want the whole page to re-render every time this happens so we split it into components.
  • React uses TSX for the actual elements on the page. TSX is a version of JSX which uses TypeScript instead of JavaScript. While JavaScript and TypeScript are similar, I would definitely recommend having a good grasp of the differences between the two before trying to add code to this project.
  • State is generally maintained directly in components, except when a global state manager like Redux is used.
    • Data can be passed down from parent to child components through props, which are kind of like arguments or parameters passed to the child. However, data cannot be passed up from a child to a parent. This is why React is said to have a unidirectional data flow. One workaround is to pass a function down as a prop, which the child can call to be executed by the parent. This is done in Options, which passes functions to the OnOff, etc. components in order to reuse OnOff for different purposes if necessary.
    • Components can modify their own state (via the function setState), and they can modify their children’s props, but they can’t modify their own props.
  • Here is a look at the component tree for the site so far. If you expand the src/components folder of the code, you’ll see the same structure. Of course, the component tree will change over time.
    App
    ├─── API
    │    ├─── WebspeechRecognition
    │    └─── AzureRecognition
    ├─── SIDEBAR
    │    ├─── Display
    │    ├─── Phrase
    │    ├─── SpeechToText
    │    └─── Visualization
    └─── TOPBAR
         ├─── PickApi
         ├─── FullScreen
         └─── MenuHide
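
The "pass a function down as a prop" workaround described above can be sketched without React at all. This is only an illustration: OnOffProps, onOff, and OptionsModel are hypothetical stand-ins, not the project's real components. The point it shows is that the parent owns the state and the child only calls the callback it was handed.

```typescript
// Hypothetical props for an OnOff-style toggle (not the project's real types).
interface OnOffProps {
  label: string;
  isOn: boolean;
  onToggle: () => void; // callback owned by the parent
}

// Stand-in for a child component: it reads its props and, when "clicked",
// calls the function the parent passed down. It never changes its own props.
function onOff(props: OnOffProps): { text: string; click: () => void } {
  return {
    text: `${props.label}: ${props.isOn ? "on" : "off"}`,
    click: props.onToggle,
  };
}

// Stand-in for a parent such as Options: it owns the state and hands down
// both the current value and a way to request a change.
class OptionsModel {
  private captionsOn = false;

  render(): { text: string; click: () => void } {
    return onOff({
      label: "Captions",
      isOn: this.captionsOn,
      onToggle: () => {
        this.captionsOn = !this.captionsOn;
      },
    });
  }
}
```

Because onOff only depends on its props, the same child can be reused with a different label and a different callback, which is the reuse that Options relies on.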
    

Redux is a JavaScript library used to store global state. It is particularly useful alongside React because it solves many of the difficulties created by React’s unidirectional data flow.

  • Without a global state manager, state can only be stored in components and passed down from one component to another (recall React's unidirectional data flow). So in React alone, if you want to pass state up to a parent component, you have to declare a function in the parent component and pass it down to the child as a prop. This can get very tedious and lead to over-rendering when you have multiple levels of components. A common problem with this is prop drilling, which is when you have to pass props through many levels of components when the components in the middle have no need for the data. For example, the buttons are stored in OnOff, PlusMinus, and Record, but most of the data they control is needed across the entire page. Redux is a great tool to store state globally and avoid tedious/inefficient prop drilling.
  • In the site, Redux is only used for the options buttons as described above. You can check out the redux directory to see how it’s set up. OnOff, PlusMinus, and Record set state. App and Captions use this state.
  • Keep in mind that React-Redux hooks such as useSelector only work in function components, not in class components. As a workaround, Captions (functional component) gets global state from Redux and passes it down to Recognition (class component) as props.
  • This video is a great resource for learning Redux in conjunction with React. Everything I know about Redux came from this video, so obviously I’m still at a beginner level with it. Still, the video covers everything you need to know to understand how the site currently uses Redux for global state. Use the video to try to understand what is going on in Options and its child components. Options does not invoke Redux but it does pass functions to its children to be used by Redux.
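
To make the idea of a global store concrete, here is a minimal hand-rolled sketch of the pattern Redux implements. It does not use the Redux library, and the state fields and action names (textSize, recording, INCREMENT_TEXT_SIZE, and so on) are assumptions made up for illustration, not the ones in our redux directory.

```typescript
// Hypothetical global state for the options buttons.
interface OptionsState {
  textSize: number;
  recording: boolean;
}

// Hypothetical actions; each one carries only a "type" field.
type OptionsAction =
  | { type: "INCREMENT_TEXT_SIZE" }
  | { type: "DECREMENT_TEXT_SIZE" }
  | { type: "TOGGLE_RECORDING" };

const initialState: OptionsState = { textSize: 6, recording: false };

// A reducer never mutates state; it returns the next state for an action.
function optionsReducer(
  state: OptionsState = initialState,
  action: OptionsAction
): OptionsState {
  switch (action.type) {
    case "INCREMENT_TEXT_SIZE":
      return { ...state, textSize: state.textSize + 1 };
    case "DECREMENT_TEXT_SIZE":
      return { ...state, textSize: state.textSize - 1 };
    case "TOGGLE_RECORDING":
      return { ...state, recording: !state.recording };
    default:
      return state;
  }
}

// A tiny store written by hand to show the mechanics: one state value,
// changed only through dispatch, with subscribers notified after each change.
function createTinyStore<S, A>(reducer: (state: S, action: A) => S, initial: S) {
  let state = initial;
  const listeners: Array<() => void> = [];
  return {
    getState: (): S => state,
    dispatch: (action: A): void => {
      state = reducer(state, action);
      listeners.forEach((listener) => listener());
    },
    subscribe: (listener: () => void): void => {
      listeners.push(listener);
    },
  };
}
```

Any component can read getState() or call dispatch() without props being threaded through the components in between, which is what removes the prop drilling.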

Speech Recognition

We currently use three APIs:

Webspeech: the Web Speech API is built into the browser, so it is very simple to get working. It runs asynchronously to the rest of the code, and because of this, having it communicate with everything else can be a little tricky. The best way I have found to communicate with it is through refs, which React offers via React.useRef().

Azure: another API we use. It is slightly more difficult to work with, as it requires a key and a region for authentication. It also runs asynchronously, so we use React.useRef() to communicate with it as well. Azure is much more exciting because a lot of its capabilities are pretty cutting edge, and many of the updates we have planned involve implementing Azure features.

StreamText: a website that we simply render in an iframe. There is almost no coding involved and, because it uses an iframe, there is also very little communication possible. Anyone looking to help with StreamText would probably need to get comfortable with XML requests.

scribear.github.io's People

Contributors

williamfoster3, yunwang-yunw3, angrave, heaper327, yuxuanjerrychen01, ap-ack, jiaming1999, jonili99, 18nanma, ammpr, timurjavid, faerryn, aashiagrawal, dependabot[bot], joannah2, sxinyuliu, abh1t, hilkiu2, sicongzhang113


scribear.github.io's Issues

Mozilla logo transparency and check mark

icon

The Mozilla icon (webspeech) background isn't transparent. Making it transparent would improve appearance.

The green checkmark is also off-center vertically from the other icons.

SRT file functionality

  • Be able to select a local SRT captioning file and play it through ScribeAR.
  • Be able to change offsets to line up captions with media.
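
A rough sketch of what the two bullets above could involve: parse the SRT text into cues, then shift every cue by a user-chosen offset. The names Cue, parseSrt, and shiftCues are hypothetical, and the parser only handles plain cues (an index line, a timing line, then text), so treat it as a starting point rather than a full SRT implementation.

```typescript
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Convert "HH:MM:SS,mmm" into milliseconds.
function parseTimestamp(stamp: string): number {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [h = 0, m = 0, s = 0, ms = 0] = match.slice(1).map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// Split the file into blank-line-separated blocks and read each one as a cue.
function parseSrt(source: string): Cue[] {
  return source
    .replace(/\r\n/g, "\n")
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => {
      const lines = block.split("\n");
      const timingIndex = lines.findIndex((line) => line.includes("-->"));
      if (timingIndex === -1) throw new Error(`Cue has no timing line: ${block}`);
      const [start = "", end = ""] = (lines[timingIndex] ?? "").split("-->");
      return {
        startMs: parseTimestamp(start),
        endMs: parseTimestamp(end),
        text: lines.slice(timingIndex + 1).join("\n"),
      };
    });
}

// Apply an offset (positive = later, negative = earlier), never going below 0.
function shiftCues(cues: Cue[], offsetMs: number): Cue[] {
  return cues.map((cue) => ({
    ...cue,
    startMs: Math.max(0, cue.startMs + offsetMs),
    endMs: Math.max(0, cue.endMs + offsetMs),
  }));
}
```

Keeping the offset as a separate step means the original cues stay untouched, so the user can keep adjusting the offset without reloading the file.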

Plus/Minus buttons -> Sliders with text input

slider
slider2

Plus/Minus buttons should be turned into sliders for ease of use. If the user wants to set the setting to a large value, clicking the plus/minus button is not feasible since it only goes up by preset increments. The slider will also allow us to set recommended values within a certain range.

To accommodate different user needs, we can allow the user to set the value directly by typing it in. The user can click the text value next to the slider to set a custom value. An example of this is the image below.

slider3

(Edit Nov 25, 2020): Could add example text under the slider to show the size of the text as the slider changes, even if there is no text visible from recognition.
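
One possible shape for the logic behind this, assuming the typed value is allowed to go beyond the slider's recommended range while the slider thumb stays inside it. SliderRange, parseTypedValue, and sliderPosition are hypothetical names, and the rule "must be a positive number" is an assumption about what counts as valid input.

```typescript
// Recommended range shown on the slider (hypothetical shape).
interface SliderRange {
  min: number;
  max: number;
}

// Read what the user typed; fall back to the current value if it is not
// a positive finite number.
function parseTypedValue(input: string, current: number): number {
  const trimmed = input.trim();
  const value = Number(trimmed);
  return trimmed !== "" && Number.isFinite(value) && value > 0 ? value : current;
}

// Where to draw the slider thumb: the real value, clamped to the range.
function sliderPosition(value: number, range: SliderRange): number {
  return Math.min(range.max, Math.max(range.min, value));
}
```

For example, with a recommended range of 8 to 32, typing 48 would keep 48 as the setting while the thumb sits at the right end of the slider.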

Reducing the redux actions, adding more to redux states

Currently, our Redux actions carry only a "type" field, and our reducers switch on this field to decide what needs to be done. Every reducer's switch statement checks action.type whenever an action is dispatched from anywhere, which could slow our processing.

type

If we add additional fields, and leave the "type" field to specify which reducer the action is meant for, we can reduce the amount of checks that actions need to go through.

Additionally, we can add more properties to our states to simplify our actions and processing.
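
One way to read this proposal, sketched in plain TypeScript without the Redux library: let "type" name the setting an action is meant for and carry the amount in an extra field, so a single branch replaces separate increment and decrement cases. The setting names and the delta field are assumptions for illustration.

```typescript
// Hypothetical numeric settings kept in global state.
type Setting = "textSize" | "lineWidth";
type SettingsState = Record<Setting, number>;

// "type" says which setting the action targets; "delta" says by how much.
interface SettingsAction {
  type: Setting;
  delta: number;
}

// One lookup by key instead of one switch case per button.
function settingsReducer(state: SettingsState, action: SettingsAction): SettingsState {
  return { ...state, [action.type]: state[action.type] + action.delta };
}
```

With this shape a plus button dispatches { type: "textSize", delta: 1 } and a minus button dispatches the same action with delta: -1.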

Idea: Pulling captions out of Zoom call

One of my classes this semester has a Zoom call synced with the class. The advantage of that is there is automatic captioning via otter.ai that could be turned into.

Unused code

There are a lot of unused components and imports in our project. We should delete unused code or archive it in some way if we plan on using it later.

Add documentation/blog post comparing different types of mics

So far I've tried different types of microphones, and we can document this somewhere for people. Here are my experiences so far:

Lapel Mic

Pros:

  • Picks up the professor's voice pretty well
  • Small form factor

Cons:

  • The professor often has to wear multiple mics now.
  • Requires the professor to do some work too.
  • The professor has to make sure the mic is on.
  • Has a usb connector dongle (instead of using something like bluetooth)
  • Slightly fragile

Laptop Mic

Pros:

  • Doesn't cost anything.
  • No setup
  • Picks up both the professor and the students around.

Cons:

  • Bad sound quality (-> meh captions)

Omnidirectional mic

(I tried one lecture with Colin's mic, not sure what model it is but it's in reference to that)

Pros:

  • Picked up a lot of the professor talking
  • Picked up a good amount of other students
  • Bluetooth wireless so I can sit anywhere

Cons:

  • Not as good as lapel mic on professor
  • Huge form factor, doesn't even fit in my bag so I have to carry it briefcase style
  • Takes 5-10 minutes to set up before lecture start
  • Sometimes unsure if connected correctly. Colin recommended putting it on mute to check and it worked but I had to go back and forth a lot.
  • Expensive

Default visualization location change

Right now, the audio visualization is defaulted to be at the very left of the screen. This hides the circular visualization under the menu, making the user wonder if the visualization works. If we move the visualization to the middle or the right by default, we can eliminate this confusion.

visualizationbehindmenu

Improve keyboard use of the site

Ideally, I should be able to use the site entirely with keyboard gestures. That way I don't have to drag my mouse all the way to my glasses during class every time I need to change anything (breaks immersion).

Azure model customization

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-train-model

This can be both general, as well as course specific.

Improve recognition accuracy on industry-specific vocabulary and grammar, like medical terminology or IT jargon

This takes in text input. We can scrape slides, syllabi, and textbooks for better transcription of CS-specific terms.

Define the phonetic and displayed form of a word or term that has nonstandard pronunciation, like product names or acronyms

This probably won't be too often, but maybe some words like Sequel -> SQL can be fixed.

Improve recognition accuracy on speaking styles, accents, or specific background noises

This is audio+text input. For standard American accents, the baseline models are fine, but for specific professors we can fine-tune the model. Generally there might be some captioned classes from previous semesters (via DRES or similar) that can be used as training datasets.

Bottom text cutoff

image

The text runs all the way to the bottom of the screen, so the bottoms of letters like y and g get cut off. Make sure the box works with all the available text sizes.

Firefox Compatibility Error

There's some sort of compatibility issue that might be linked to the API that we're using with speech recognition. There's a type error on line 4 as shown below.

TypeError: SpeechRecognition is not a constructor

./src/components/Captions/Recognition/index.js
src/components/Captions/Recognition/index.js:4

1 | import React from 'react'
2 |
3 | const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
4 | const recognition = new SpeechRecognition()
5 | recognition.lang = 'en-US'
6 | recognition.continuous = false
7 | recognition.interimResults = true
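
The error is consistent with both window.SpeechRecognition and window.webkitSpeechRecognition being undefined in that browser, in which case new SpeechRecognition() throws exactly this TypeError. A possible guard is sketched below. The types and createRecognition are hypothetical helpers, and the globals are passed in as a parameter so the function can be exercised outside a browser.

```typescript
// Minimal shape of the recognition object used in the snippet above.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
}

type RecognitionCtor = new () => RecognitionLike;

// The two globals the original code reads from window.
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Return a configured recognizer, or null when the browser offers neither
// constructor, instead of throwing at module load time.
function createRecognition(globals: SpeechGlobals): RecognitionLike | null {
  const Ctor = globals.SpeechRecognition || globals.webkitSpeechRecognition;
  if (!Ctor) return null;
  const recognition = new Ctor();
  recognition.lang = "en-US";
  recognition.continuous = false;
  recognition.interimResults = true;
  return recognition;
}
```

In the app this would be called with the browser's window object, and a null result could be used to show an "unsupported browser" message or to fall back to another API such as Azure.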

Streamtext text cut off

cut_off

The text when StreamText is activated gets cut off when it reaches the bottom of the screen. The last line is only half visible.

Refactor filenames -> Messy imports

Since we refactored the filenames of the components, we have some messy imports such as

import ToggleButton from '../../ToggleButton/ToggleButton.js'

We can keep the original imports:

import ToggleButton from '../../ToggleButton/'

if we add an index.js file like so:

"use strict";

var _interopRequireDefault = require("@babel/runtime/helpers/interopRequireDefault");

Object.defineProperty(exports, "__esModule", {
  value: true
});
Object.defineProperty(exports, "default", {
  enumerable: true,
  get: function get() {
    return _ToggleButton.default;
  }
});

var _ToggleButton = _interopRequireDefault(require("./ToggleButton"));

(From material-ui components)

Some themes are unreadable

The text should be white instead of black. Some of the themes have black text with a black background, making it impossible to read.

black theme

Switch paragraph view for line view

I think having all the transcripts in a paragraph can get hard to follow, especially if it's constantly being live corrected (Azure does this more than Mozilla). I'm not sure if this is easy, but even a very rough heuristic (split every 10 words), could help.
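
The rough heuristic mentioned above (split every 10 words) could look like the sketch below. splitIntoLines is a hypothetical helper and is not tied to how the captions are currently rendered.

```typescript
// Break a transcript into lines of at most wordsPerLine words each.
function splitIntoLines(transcript: string, wordsPerLine = 10): string[] {
  if (!Number.isInteger(wordsPerLine) || wordsPerLine < 1) {
    throw new RangeError("wordsPerLine must be a positive integer");
  }
  const words = transcript.split(/\s+/).filter((word) => word.length > 0);
  const lines: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerLine) {
    lines.push(words.slice(i, i + wordsPerLine).join(" "));
  }
  return lines;
}
```

Since only the last line changes while someone is still speaking, earlier lines stay put even when the recognizer keeps appending words.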

Pure black background for Moverio

For normal screens, having a #000000 bg color is too stark, but for Moverio the black turns into transparent. This is useful for seeing what's going on since the current background is slightly visible.

Glasses mode

Using ScribeAR with AR glasses is difficult due to the small screen size shrinking our app. The text on the menu is difficult to read and navigate because of this. We should have a toggleable glasses mode that will:

  • Increase text size for all of the text in the menus
  • Increase button size
  • Possibly streamline the menu by reducing the number of menu options such as Azure setup
  • Possibly provide the most used buttons outside of the sidebar menu, such as at the top of the screen. Text size changes would be good for this. The buttons can appear when the user hovers over a specific area and fade away after.

Fix webspeech for now

We replaced the whole Captions folder, and the previous Captions folder is now named CaptionsNew. We need to find out what's wrong with that folder.

Azure API disconnection

This happens when I'm trying this in lecture. The connection lasts about 5-10 minutes, after which it either:

  • says I've gotten the wrong region all of a sudden and then I have to reload
  • just freezes the captioning. Reloading fixes it as well.

In either case I haven't been able to figure out why this is happening and console logs didn't show anything helpful.

Update documentation on APIs

From the main README:

Speech Recognition

Sooner than later, we're going to switch to a different speech recognition engine so I'll keep this short. We are currently using the Web Speech API, which is handled directly through the browser.

This would also be a good place to cover the pros and cons of each API so users can pick one.

Split reducers into component reducers

Currently, all of our reducers are located in one large file and are combined in that file. We should split these reducers into component reducers and store them in the component folders that they belong to. This should make our structure a little more modular, since we would only need to add import statements when we create new components.

reducers
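
A minimal sketch of what the split could look like, written in plain TypeScript rather than with the Redux library. The slice names, action types, and file paths in the comments are hypothetical. Each component folder would export its own small reducer, and one root reducer would only wire them together.

```typescript
interface Action {
  type: string;
}

// Would live in something like src/components/TextSize/reducer.ts (hypothetical path).
function textSizeReducer(state: number, action: Action): number {
  switch (action.type) {
    case "textSize/increase":
      return state + 1;
    case "textSize/decrease":
      return state - 1;
    default:
      return state;
  }
}

// Would live in something like src/components/Record/reducer.ts (hypothetical path).
function recordingReducer(state: boolean, action: Action): boolean {
  return action.type === "record/toggle" ? !state : state;
}

interface RootState {
  textSize: number;
  recording: boolean;
}

// The root reducer only delegates: each slice of state goes to its own reducer.
function rootReducer(state: RootState, action: Action): RootState {
  return {
    textSize: textSizeReducer(state.textSize, action),
    recording: recordingReducer(state.recording, action),
  };
}
```

Redux ships a combineReducers helper that does this delegation, so in the real project the root file would mostly be a list of imports passed to it.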

Clear local storage button

Users should be able to clear local storage, like their azure key, using website functionality instead of using the browser local storage.
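
The handler behind such a button could be as small as the sketch below. The key names are assumptions (the real ones are whatever the app writes to local storage), and the store is passed in as a parameter so that only the app's own entries are removed and the function can be tried outside a browser.

```typescript
// Only the part of the Web Storage API this function needs.
interface RemovableStore {
  removeItem(key: string): void;
}

// Hypothetical key names; replace with the keys the app actually uses.
const SAVED_SETTING_KEYS = ["azureKey", "azureRegion"];

// Remove the app's saved entries and leave everything else alone.
function clearSavedSettings(
  store: RemovableStore,
  keys: string[] = SAVED_SETTING_KEYS
): void {
  for (const key of keys) {
    store.removeItem(key);
  }
}
```

In the browser the button's click handler would call clearSavedSettings(window.localStorage).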

Ability to auto hide or remove header

Once I set up the captioning service I usually inspect element and delete the <head> node when I'm watching lecture because it's blue and stands out. Maybe it can be visible on mouse hover (might make it less accessible) or have a keyboard shortcut to make it visible.

I think the top bar should be hidden as much as possible.

Save keys from last use

For example if I'm using Azure with a region and key, it should have those saved and autoconnect.
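
A sketch of the save and restore half of this, assuming the credentials are kept as one JSON entry. AzureCredentials, the storage key name, and both function names are hypothetical; the store parameter matches the getItem and setItem methods of the browser's localStorage.

```typescript
interface AzureCredentials {
  key: string;
  region: string;
}

// Only the part of the Web Storage API these functions need.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key name.
const CREDENTIALS_KEY = "azureCredentials";

function saveCredentials(store: KeyValueStore, credentials: AzureCredentials): void {
  store.setItem(CREDENTIALS_KEY, JSON.stringify(credentials));
}

// Return the saved credentials, or null if nothing usable is stored.
function loadCredentials(store: KeyValueStore): AzureCredentials | null {
  const raw = store.getItem(CREDENTIALS_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed?.key === "string" && typeof parsed?.region === "string") {
      return { key: parsed.key, region: parsed.region };
    }
  } catch {
    // Corrupt entry: treat it the same as no saved credentials.
  }
  return null;
}
```

On startup the app could call loadCredentials(window.localStorage) and, when it returns a value, connect to Azure without asking again.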

Autoscroll doesn't work anymore

On v6.5 the captions don't autoscroll. This is a major impact on usability and I don't think I could actually use it for lecture till this is fixed.
