Synviz

Devpost submission

Synviz is an IoT device that uses state-of-the-art artificial intelligence to decode text from the movement of a speaker's mouth.

Inspiration

There were two primary sources of inspiration. The first was a paper by University of Oxford researchers, who proposed a state-of-the-art deep learning pipeline for extracting spoken language from video. The paper can be found here. The repo for the model used as a base template can be found here.

The second source of inspiration is an existing product on the market, Focals by North. Focals are smart glasses that aim to put the important parts of your life right in front of you through a projected heads-up display. We thought it would be a great idea to build on a platform like this by adding a camera and using artificial intelligence to gain valuable insights about what you see, which, in our case, means deciphering speech from visual input.

Pipeline Overview

  • The user presses the button on the glasses to start a recording
  • The user presses the button again to stop the recording
  • The data is passed to a Google Cloud Platform bucket as an mp4 file
  • Simultaneously, the glasses ping the Flask backend server to let it know there's something to be processed
  • The backend downloads the video file
  • The backend runs the video through a Haar Cascade classifier to detect a face
  • The video is cropped so that it tracks the mouth of the speaker
  • The cropped video is fed through a transformer network to get a transcript
  • The backend passes the transcript and file URL to the frontend through a socket
  • The frontend displays the transcript it got from the backend, and also allows playback of the mp4 file found on Google Cloud Platform
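
The face detection and mouth cropping steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes OpenCV's stock `haarcascade_frontalface_default.xml` cascade, treats the largest detected face as the speaker, and estimates the mouth as the lower-middle part of the face box. The helper names (`Box`, `mouth_box`, `largest_face`, `crop_mouth_frames`) and the crop fractions are hypothetical.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def mouth_box(face: Box, frame_w: int, frame_h: int) -> Box:
    """Estimate the mouth region as the lower-middle part of a face box.

    The fractions below are illustrative guesses, not values from the project.
    """
    x = face.x + face.w // 4          # skip the left quarter of the face
    y = face.y + (2 * face.h) // 3    # start two thirds of the way down
    w = face.w // 2                   # keep the middle half horizontally
    h = face.h // 3                   # keep the bottom third vertically
    # Clamp to the frame so the crop never indexes outside the image.
    x = max(0, min(x, frame_w - 1))
    y = max(0, min(y, frame_h - 1))
    w = min(w, frame_w - x)
    h = min(h, frame_h - y)
    return Box(x, y, w, h)


def largest_face(faces) -> Optional[Box]:
    """Pick the biggest detection, assuming it belongs to the speaker."""
    boxes = [Box(*map(int, f)) for f in faces]
    return max(boxes, key=lambda b: b.w * b.h, default=None)


def crop_mouth_frames(video_path: str):
    """Yield mouth crops from a video using OpenCV's frontal-face cascade."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    capture = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of the video
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            face = largest_face(faces)
            if face is None:
                continue  # no face in this frame; skip it
            m = mouth_box(face, frame.shape[1], frame.shape[0])
            yield frame[m.y:m.y + m.h, m.x:m.x + m.w]
    finally:
        capture.release()
```

In a setup like this, the crops yielded by `crop_mouth_frames` would be resized and stacked into the input expected by the lip-reading network. A dedicated facial-landmark detector would likely track the mouth more precisely than a fixed fraction of the face box.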

Use Cases

  • Individuals who are hard of hearing or deaf
  • Noisy environments where automatic speech recognition is difficult
  • Combined with speech recognition for ultra-accurate, real-time transcripts
  • Language learners who want a transcript or translation

Social Impact

This hack can help in situations where communication is difficult. One of the most promising use cases combines this technology with automatic speech recognition. All-in-one solutions for real-time transcription and translation are becoming more viable as technology progresses. This proof of concept is another piece that would improve human-computer interaction.

Next Steps

With a larger on-board battery, a 5G network connection, and a more powerful compute server, we believe it will be possible to achieve near real-time transcription from a video feed. This could be implemented on an existing platform like North's Focals, which would give it promising business appeal.

The Team

Contributors

w29ahmed, willyumlu, sinclairhudson
