Giter Club home page Giter Club logo

pyh2r's Introduction

Hear to Read (pyh2r) ๐Ÿ—ฃ๏ธ๐Ÿ“œ

Hear to Read is a Flask server that uses OpenAI Whisper (offline) to transscipe voice messages into readable text.

๐Ÿค” Why?

I'am a fighting voice messengers for years, they are the wild west of communication. People talking sometimes forget what they were saying halfway through.

"Where is the relevant information!?", i asked! man ey!

Nobody has time to listen to a 3min voice message, just to know that my brother thinks of visiting my parents maybe next weekend and if I want to join, if they mgith go there when the weather is good enought, because the last days it was not that good and the kids like the outside... - Nope sorry - my eyes can scrim through text much faster than my ears can listen to a voice message. So I decided to fight back and created this project.

I've also brewed up a simple Telegram bot implementation, serving as an easy-to-use front end. Simply forward those voice messages to the bot, and voila! It replies with clear text. ๐Ÿš€

๐Ÿ’ก How It Works

  • Voice messages are sent to the Telegram bot.
  • Telegram Bot forwards the voice message to the Flask server.
  • Flask server uses OpenAI Whisper to transcribe the voice message.
  • And finally, the Flask server sends the transcribed text back to the Telegram bot, which replies to the user.

๐Ÿš€ Quick Start with Docker

cd <repo>
docker build -f docker/Dockerfile -t pyh2r .
docker run -d \
    --name hear_to_read \
    -e PYH2R_TELEGRAM_BOT_API_TOKEN=YOUR_TELEGRAM_BOT_API_TOKEN \ # required
    -e PYH2R_STORE_AUDIO_FILES=1 \ # Optional, enables storing audio files
    -v ./uploaded_files:/app/uploaded_files \ # Persistent store uploaded files
    pyh2r

Performance

Two things to note:

  • OpenAI Whisper is an offline model, depending on your hardware and the length of the voice message, it might some time.
  • OpenAI Whisper is not perfect
  • OpenAI Whisper comes in differenct flavors, the one used here is the smallest one, so it might not be as accurate as the bigger ones.
  • When you got a nvidia GPU, speed will massively increase! just add --gpus all to the docker run command, make sure you got nvidia-docker setup.

Runtime numbers

Model Subjective quality Runtime on Pi-CM4 Runtime on vServer
tiny poor, but enough for me 346.94 18.08
small good 253.89 20.12
base good 265.60 16.65
medium better 250.23 16.76
large-v3 best 291.82 16.69

Notes:

  • 30s voice message, while walking fast with background noise
  • Pi-CM4: Raspberry Pi Compute Module 4
  • vServer: [email protected] old Xeon VM
  • Only one experiment, no warmup, no average

pyh2r's People

Contributors

fwarmuth avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.