
gustavostz / whisper-clip

67 stars · 2 watchers · 7 forks · 2.58 MB

WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words into written text, ready to be pasted wherever you need it. This application harnesses the power of OpenAI’s Whisper for free.

Home Page: https://openai.com/research/whisper

License: MIT License

Python 100.00%
audio-processing audio-transcription clipboard openai productivity productivity-tools python speech-recognition speech-to-text whisper

whisper-clip's Introduction

WhisperClip: One-Click Audio Transcription

[Demo GIF: Example using WhisperClip]

WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words into written text, ready to be pasted wherever you need it. This application harnesses the power of OpenAI's Whisper for free, making transcription more accessible and convenient.

Table of Contents

  • Features
  • Installation
  • Choosing the Right Model
  • Usage
  • Configuration
  • Feedback
  • Acknowledgments

Features

  • Record audio with a simple click.
  • Automatically transcribe audio using Whisper (free).
  • Option to save transcriptions directly to the clipboard.

Installation

Prerequisites

  • Python 3.8 or higher
  • A CUDA-capable GPU is highly recommended for better performance, but it is not required; WhisperClip can also run on the CPU.

Setting Up the Environment

  1. Clone the repository:

    git clone https://github.com/gustavostz/whisper-clip.git
    cd whisper-clip
    
  2. Install PyTorch if you don't have it already. Refer to PyTorch's website for installation instructions (see the example commands just after this list).

  3. Install the required dependencies:

    pip install -r requirements.txt
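
For reference, PyTorch's install selector produces commands like the ones below. These are examples only; the exact command depends on your OS, package manager, and CUDA version, so prefer whatever command PyTorch's website generates for your setup:

    # CPU-only build
    pip install torch

    # Example CUDA build; check PyTorch's website for the index URL matching your CUDA version
    pip install torch --index-url https://download.pytorch.org/whl/cu121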
    

Choosing the Right Model

Based on your GPU's VRAM, choose the appropriate Whisper model for optimal performance. Below is a table of available models with their required VRAM and relative speed:

Size      Required VRAM    Relative speed
tiny      ~1 GB            ~32x
base      ~1 GB            ~16x
small     ~2 GB            ~6x
medium    ~5 GB            ~2x
large     ~10 GB           1x

For English-only applications, .en models (e.g., tiny.en, base.en) tend to perform better.

To change the model, modify the model_name variable in config.json to the desired model name.
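
WhisperClip uses OpenAI's Whisper library for the actual transcription, and the model name selects which checkpoint is downloaded and loaded. As a rough illustration of what that name controls (not WhisperClip's own code), loading and using a model with the whisper package looks like this:

    import whisper

    # The name corresponds to the model_name value in config.json,
    # e.g. "tiny", "base.en", "small", "medium", or "large".
    model = whisper.load_model("base.en")

    # Transcribe an audio file; result["text"] holds the transcription.
    result = model.transcribe("recording.wav")
    print(result["text"])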

Usage

Run the application:

    python main.py

  • Click the microphone button to start and stop recording.
  • If "Save to Clipboard" is checked, the transcription will be copied to your clipboard automatically.
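
To make the overall flow concrete, here is a minimal, self-contained sketch of the record → transcribe → clipboard pipeline. It is not WhisperClip's actual implementation; it assumes the sounddevice, soundfile, openai-whisper, and pyperclip packages are installed and uses a fixed-length recording for simplicity:

    import sounddevice as sd
    import soundfile as sf
    import whisper
    import pyperclip

    SAMPLE_RATE = 16000      # 16 kHz mono matches Whisper's native sample rate
    DURATION_SECONDS = 5     # fixed-length clip, just for this sketch

    # Record a short clip from the default input device.
    audio = sd.rec(int(DURATION_SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()

    # Write a temporary WAV file that Whisper can read.
    sf.write("recording.wav", audio, SAMPLE_RATE)

    # Transcribe and copy the result to the clipboard.
    model = whisper.load_model("base.en")
    text = model.transcribe("recording.wav")["text"].strip()
    pyperclip.copy(text)
    print("Copied to clipboard:", text)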

Configuration

  • The default shortcut for toggling recording is Alt+Shift+R. You can modify this in the config.json file.
  • You can also change the Whisper model used for transcription in the config.json file.
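
The exact contents of config.json aren't shown here, so the snippet below is a hypothetical sketch of how such a file might be read and how a shortcut like Alt+Shift+R could be bound as a global hotkey with the third-party keyboard package (WhisperClip's real key names and hotkey mechanism may differ):

    import json
    import keyboard  # third-party package; used here only for illustration

    # Hypothetical config.json:
    # {
    #     "model_name": "base.en",
    #     "shortcut": "alt+shift+r"
    # }
    with open("config.json", "r", encoding="utf-8") as f:
        config = json.load(f)

    def toggle_recording():
        print("Recording toggled")  # placeholder for the real start/stop logic

    # Register the configured shortcut as a global hotkey and keep the script alive.
    keyboard.add_hotkey(config.get("shortcut", "alt+shift+r"), toggle_recording)
    keyboard.wait()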

Feedback

If there's interest in a more user-friendly, executable version of WhisperClip, I'd be happy to consider creating one. Your feedback and suggestions are welcome! Just let me know through the GitHub issues.

Acknowledgments

This project uses OpenAI's Whisper for audio transcription.

whisper-clip's People

Contributors

gustavostz


whisper-clip's Issues

Unable to run normally after clicking the button

I followed the README steps before doing anything else. After running python main.py, the UI with the microphone button pops up, but when I click the microphone I get the following errors:

    python main.py
    Exception in thread Thread-6 (record_audio):
    Traceback (most recent call last):
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
        self.run()
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1010, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Codes\whisper-clip\audio_recorder.py", line 96, in record_audio
        with sd.InputStream(callback=self.audio_callback):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\site-packages\sounddevice.py", line 1421, in __init__
        _StreamBase.__init__(self, kind='input', wrap_callback='array',
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\site-packages\sounddevice.py", line 817, in __init__
        _get_stream_parameters(kind, device, channels, dtype, latency,
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\site-packages\sounddevice.py", line 2660, in _get_stream_parameters
        info = query_devices(device)
               ^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\site-packages\sounddevice.py", line 569, in query_devices
        raise PortAudioError(f'Error querying device {device}')
    sounddevice.PortAudioError: Error querying device -1
    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Users\qaz21\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py", line 1967, in __call__
        return self.func(*args)
               ^^^^^^^^^^^^^^^^
      File "C:\Codes\whisper-clip\audio_recorder.py", line 54, in toggle_recording
        self.stop_recording()
      File "C:\Codes\whisper-clip\audio_recorder.py", line 69, in stop_recording
        audio_data = np.concatenate(self.recordings)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: need at least one array to concatenate

The same pair of tracebacks (Thread-7) repeats when the microphone is clicked a second time.
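
The PortAudioError: Error querying device -1 above typically means PortAudio could not find a usable default input device (for example, no microphone is enabled, or microphone access is blocked by the operating system). A minimal sketch for checking which input devices sounddevice can see, independent of WhisperClip's own code:

    import sounddevice as sd

    # List every audio device PortAudio can see, with its index.
    print(sd.query_devices())

    # The current default (input, output) device indices; -1 here would match the error above.
    print("Default devices:", sd.default.device)

    # Verify that a specific input device (index 1 here, chosen only for illustration) is usable.
    sd.check_input_settings(device=1)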

Create a Video Demonstration of WhisperClip Usage

Description

We are looking for a contributor to create a video demonstration showing how to use WhisperClip. The video should cover the installation process, basic usage, and highlight the key features of the application.

Requirements

  • The video should be clear and easy to understand.
  • It should cover the following:
    • Cloning the repository and setting up the environment.
    • Running the application.
    • Starting and stopping audio recording.
    • Enabling the "Save to Clipboard" option and demonstrating its use.
    • Changing the default shortcut and Whisper model in the config.json file (optional).
  • The video can be of any length, as long as it effectively demonstrates how to use WhisperClip.

Reward

We value your contribution! If your video is selected, we will feature it in the README of the WhisperClip GitHub repository. This could be a great way to promote your channel or video, as it will be visible to everyone who visits the project.

Submission

Please submit your video by adding a comment to this issue with a link to the video (e.g., YouTube, Vimeo). We look forward to seeing your submissions!
