speech-to-text

Real-time transcription using faster-whisper

architecture

Audio input from a microphone is captured with sounddevice. Silero VAD (Voice Activity Detection) detects the silent parts, and the speech between them is treated as a single piece of audio data. This audio data is then converted to text using faster-whisper.
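The segmentation step can be sketched as follows. This is a minimal illustration, not the project's code: a simple energy threshold stands in for Silero VAD, and the names `split_on_silence`, `is_speech_by_energy`, and `max_silent_frames` are hypothetical. Each yielded buffer is what would be handed to faster-whisper as one utterance.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one fixed-size chunk of mono samples in [-1.0, 1.0]


def is_speech_by_energy(frame: Frame, threshold: float = 0.01) -> bool:
    """Stand-in for a VAD model: mean squared amplitude above a threshold."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold


def split_on_silence(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool] = is_speech_by_energy,
    max_silent_frames: int = 2,
) -> Iterator[List[float]]:
    """Yield one flat sample list per utterance.

    An utterance ends once `max_silent_frames` consecutive non-speech
    frames follow some speech. Non-speech frames are not buffered.
    """
    buffer: List[float] = []
    silent_run = 0
    for frame in frames:
        if is_speech(frame):
            buffer.extend(frame)
            silent_run = 0
        elif buffer:
            silent_run += 1
            if silent_run >= max_silent_frames:
                yield buffer
                buffer = []
                silent_run = 0
    if buffer:  # flush speech still buffered when the stream ends
        yield buffer
```

Passing a different `is_speech` callable is where a real VAD model would plug in; the buffering logic stays the same.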

The HTML-based GUI lets you check the transcription results and configure faster-whisper in detail.

Transcription speed

If the sentences are well separated, transcription takes less than a second.

Large-v2 model
Executed with CUDA 11.7 on an NVIDIA GeForce RTX 3060 12GB.

Installation

  1. pip install .

Usage

  1. python -m speech_to_text
  2. Select "App Settings" and configure the settings.
  3. Select "Model Settings" and configure the settings.
  4. Select "Transcribe Settings" and configure the settings.
  5. Select "VAD Settings" and configure the settings.
  6. Start Transcription

If you use the OpenAI API for text proofreading, set OPENAI_API_KEY as an environment variable.
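Before starting the app, you may want to confirm that the variable is visible to Python. The helper below is a small sketch; `get_openai_api_key` is a hypothetical name and is not part of this project.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key from OPENAI_API_KEY, or None if it is unset or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    if get_openai_api_key() is None:
        print("OPENAI_API_KEY is not set; text proofreading needs it.")
    else:
        print("OPENAI_API_KEY is set.")
```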

Notes

  • If you select local_model in "Model size or path", the model with the same name in the local folder is used.

Demo

demo

News

2023-06-26

  • Add generation of audio files from input sound.
  • Add synchronization of audio files with transcription.
    Audio playback and text highlighting are linked.

2023-06-29

  • Add transcription from audio files (WAV format only).

2023-07-03

  • Add sending of transcription results from a WebSocket server to a WebSocket client.
    Example use: displaying subtitles in a live stream.

2023-07-05

  • Add generation of SRT files from transcription results.
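For reference, SRT cues are a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text, separated by blank lines. The sketch below shows one way to render them; `srt_timestamp`, `to_srt`, and the `(start, end, text)` tuple layout are assumptions for illustration, not this project's implementation.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)
```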

2023-07-08

  • Add support for mp3, ogg, and other audio files.
    The supported formats depend on what Soundfile can read.
  • Add setting to include non-speech data in buffer.
    While this will increase memory usage, it will improve transcription accuracy.

2023-07-09

  • Add non-speech threshold setting.

2023-07-11

  • Add text proofreading option via the OpenAI API.
    Transcription results can be proofread.

2023-07-12

  • Add synchronized audio and word highlighting.
    Available when Word Timestamps is true.
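Word-level highlighting comes down to finding which word covers the current playback position. The following is a sketch under assumptions: the real GUI is HTML-based, and `word_at` and the `(start, end, text)` tuple layout are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)


def word_at(words: List[Word], playback_time: float) -> Optional[int]:
    """Return the index of the word being spoken at `playback_time`.

    `words` must be sorted by start time. Returns None during gaps
    between words and before the first word.
    """
    starts = [start for start, _, _ in words]
    i = bisect_right(starts, playback_time) - 1
    if i >= 0 and playback_time < words[i][1]:
        return i
    return None
```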

2023-10-01

  • Add support for repetition_penalty and no_repeat_ngram_size in transcribe_settings.
  • Update packages.
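As a rough illustration, both options are keyword arguments of faster-whisper's `WhisperModel.transcribe`. The dictionary layout, the values, and the `transcribe_file` helper below are assumptions made for this sketch; the settings file this project actually uses may be organized differently.

```python
# Illustrative values only; 1.0 and 0 would leave both options disabled.
transcribe_settings = {
    "language": "en",
    "beam_size": 5,
    "word_timestamps": True,
    "repetition_penalty": 1.1,   # > 1.0 penalizes previously generated tokens
    "no_repeat_ngram_size": 3,   # prevent repeating any 3-gram; 0 disables
}


def transcribe_file(path: str, settings: dict) -> list:
    """Transcribe one audio file and return the segment texts."""
    # Imported lazily so the settings above can be inspected without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v2")
    segments, _info = model.transcribe(path, **settings)
    return [segment.text for segment in segments]
```

Calling `transcribe_file("sample.wav", transcribe_settings)` would load the model and run the transcription; `sample.wav` is a placeholder path.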

2023-11-27

  • Add support for the "large-v3" model.
  • Update faster-whisper requirement to include the latest version "0.10.0".

Todo

  • Save and load previous settings.

  • Use Silero VAD

  • Allow local parameters to be set from the GUI.

  • Support additional options in faster-whisper 0.8.0
