whisper_real_time's People

Contributors

corvous, davabase, devnoname120, duaneking, johnciubuc

whisper_real_time's Issues

ValueError: Malformed soundfile

Following your code, I get the error below.

Code:


from transformers import pipeline
import sys
import time
from tempfile import NamedTemporaryFile

transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-large", chunk_length_s=30, device=0)
starttime = time.time()
audiopath = sys.argv[1]
wf = open(audiopath, "rb")
# wf.read(44)  # skip header
temp_file = NamedTemporaryFile().name
while True:
    data = wf.read(16000)
    if len(data) == 0:
        break
    with open(temp_file + ".wav", 'w+b') as f:
        f.write(data)
    text = transcriber(temp_file + ".wav")['text']
    print(text)

endtime = time.time()
print("it takes {}".format(endtime - starttime))

Error:

Traceback (most recent call last):
  File "test2_stream.py", line 18, in <module>
    text = transcriber(temp_file+".wav")['text']
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 378, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1076, in __call__
    return next(
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 183, in __next__
    processed = next(self.subiterator)
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 437, in preprocess
    inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
  File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/audio_utils.py", line 41, in ffmpeg_read
    raise ValueError("Malformed soundfile")
ValueError: Malformed soundfile
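
For what it's worth, the traceback points at the root cause rather than a bug in the pipeline: each 16000-byte slice is raw data with no RIFF/WAV header, so ffmpeg_read rejects it as a malformed soundfile. A minimal sketch of a fix, assuming the input really is a PCM WAV file, is to let the wave module write a complete header for every chunk:

import sys
import wave
from tempfile import NamedTemporaryFile

from transformers import pipeline

transcriber = pipeline(task="automatic-speech-recognition",
                       model="openai/whisper-large", chunk_length_s=30, device=0)
temp_file = NamedTemporaryFile(suffix=".wav", delete=False).name

with wave.open(sys.argv[1], "rb") as wf:
    chunk_frames = wf.getframerate()  # one second of audio per chunk
    while True:
        frames = wf.readframes(chunk_frames)
        if not frames:
            break
        # Write each chunk as a complete WAV file, header included.
        with wave.open(temp_file, "wb") as out:
            out.setnchannels(wf.getnchannels())
            out.setsampwidth(wf.getsampwidth())
            out.setframerate(wf.getframerate())
            out.writeframes(frames)
        print(transcriber(temp_file)["text"])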

I'm getting this error for memory allocation. What should I do? Is there a parameter I have to add?

Traceback (most recent call last):
  File "c:\Users\laugh\OneDrive\Documents\GitHub\Dobby\base2.py", line 152, in <module>
    main()
  File "c:\Users\laugh\OneDrive\Documents\GitHub\Dobby\base2.py", line 69, in main
    audio_model = whisper.load_model(model)
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\whisper\__init__.py", line 154, in load_model
    return model.to(device)
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1149, in to
    return self._apply(convert)
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
    module._apply(fn)
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
    module._apply(fn)
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 824, in _apply
    param_applied = fn(param)
  File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1147, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.44 GiB is allocated by PyTorch, and 15.11 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
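
Not an official fix, but the numbers in the message tell the story: the model being loaded needs more VRAM than this 4 GiB GPU has. A hedged sketch of the usual workarounds: load a smaller checkpoint, or pass a device explicitly so the model falls back to CPU (whisper.load_model accepts a device argument):

import torch
import whisper

# Assumption: "small" stands in for whatever model size was used before;
# the smaller checkpoints fit comfortably in 4 GiB of VRAM.
device = "cuda" if torch.cuda.is_available() else "cpu"
audio_model = whisper.load_model("small", device=device)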

Some text is replaced by later parts of the sentence

Some text is replaced by later parts of the sentence,
generally where there should be a comma.

I said: please let me know where I can find green apples
and after different text appeared, the only text left was:
green apples

great work btw!

Silently failing when running on MacOS (with M1)

I'm trying to run this app as you described

python transcriber.py

but I came across a few issues:

  1. The readme suggests using Python 3.7, but that doesn't work because one of the dependencies, tiktoken, isn't supported below Python 3.8.
  2. When I ran the above command on Python 3.8, it silently failed.
  3. I also had issues with the build command python cx_freeze_setup.py build, but I haven't dug into that too much yet.

Any support with debugging would be appreciated

Thanks for all that you've done!

auto translation?

I am fairly sure that this model is capable of translating non-English speech into English text. I think maybe we are missing a parameter? How can we make this translate non-English speech into English text?

It does not take audio properly

I don't have a GPU, so I am running this on CPU. For testing I said these words:
"Hello, hello, hello.
This. Is just testing. Please give me everything that is said.
Thank you."
It either only prints "hello, hello, hello" or "this is testing".

Here's how I run it on Mac M2

First I installed pyenv by running brew install pyenv

then pyenv install 3.8, since Python 3.7 didn't work for me :(

Created a venv and installed ffmpeg and portaudio (required for pyaudio)
brew install ffmpeg and
brew install portaudio

and finally
pip install -r requirements.txt

The code then worked! :)

Infinite loop management

Hello,

First of all : nice work! Your code has been very useful to me 💯

There is just one little problem I think: the sleep instruction only executes if the data queue is not empty.

while True:
    ...
    if not data_queue.empty(): 
        ...
        sleep(0.25)

I may be wrong, but it seems to me that one indentation level should be removed from the sleep() call to prevent the infinite-loop spam when the data queue is empty? See the sketch below.
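
If that reading is right, the fix is just to dedent the sleep() so it runs on every pass, along these lines (a sketch, with the transcription work elided):

from queue import Queue
from time import sleep

data_queue = Queue()  # filled by the background recording callback

while True:
    try:
        if not data_queue.empty():
            ...  # pull audio from the queue and transcribe, as in the demo
        # Dedented one level: sleeping on every iteration keeps an empty
        # queue from busy-spinning the loop.
        sleep(0.25)
    except KeyboardInterrupt:
        break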

It kinda works with m2 mps device

Heyo, so I ran this on my 2023 M2 MacBook and got some results. It uses the GPU but doesn't quite get it right.

What I said into the microphone was:
"hi hows it going"
"whats up"
"what it do"

Anyway, here is my report:

(whisper_real_time) cameron@M2 whisper_real_time % pip freeze
certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
PyAudio==0.2.13
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230121
torchaudio==2.0.0.dev20230223
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.5.0
urllib3==1.26.14

(whisper_real_time) cameron@M2 whisper_real_time % python transcribe_demo.py --model large --non_english
Model loaded.

/Users/cameron/.local/share/virtualenvs/whisper_real_time-Iw30K9az/lib/python3.9/site-packages/whisper/decoding.py:633: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)

Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi

Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
What<|en|><|en|><|en|>

Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
What<|en|><|en|><|en|>
What<|en|><|en|><|en|> What What
^C

Transcription:
Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
What<|en|><|en|><|en|>
What<|en|><|en|><|en|> What What

The app doesn't work?

When I run the app, it seems it doesn't work at all. Did I miss some arguments or something else?
The command I ran was "python transcribe_demo.py", but the app seemed to be waiting for arguments or something else when I pressed the Enter key.

Sound System Issue

ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
[the same block of ALSA messages repeats twice more]
Model loaded.

ALSA lib pcm_dsnoop.c:641:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card

I'm sure there's a simple way to get the API to do this, but I figured I would ask here.

What would I need to add to the code to get the transcription to auto-translate to English? I have another command-line tool that uses Whisper which accepts audio files and a bunch of different arguments for what Whisper should do with them, including translation in the form of the "--task translate" argument. In the main transcribe_demo.py file I see where a couple of arguments are being set for the program, so I simply tried adding a similar line containing that argument, but I couldn't get it to work.

Thoughts?
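
A hedged guess at the missing piece: --task translate is a flag of the standalone whisper command-line tool, not of this demo's argument parser. The model API equivalent is the task keyword on transcribe(), so the call the demo already makes (it passes fp16=torch.cuda.is_available(), as the tracebacks elsewhere on this page show) would become something like:

import torch

# Sketch: ask Whisper to translate into English instead of transcribing.
# audio_model and temp_file are the names transcribe_demo.py already uses.
result = audio_model.transcribe(temp_file,
                                fp16=torch.cuda.is_available(),
                                task="translate")
print(result["text"].strip())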

multi party

Hi, excellent work on this repo.

Any way to do multi-party, e.g. two or more people talking?
Is there a way to differentiate the speakers?

Is there documentation for how to use the demo?

Is there documentation for how to use the demo?
For example, how would you adjust the size of the model, how do you know if it's working, what is the default sound device it is picking up audio from, and can it work on a basic Intel GPU when using the smaller options?

How to use a specific language model?

I can't find any clues about how to use a language-specific model.
In the VOSK API or Google API, all model files are language-specific, but with this Whisper API there are only tiny, small, or big models without any language specified.
Is there any way to control which model is used based on the specified language?
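
For context (hedged, based on the upstream Whisper docs rather than this repo): the checkpoints are multilingual, except for the *.en variants, and the language is chosen at decode time rather than by downloading a per-language model:

import whisper

# One multilingual checkpoint covers all supported languages; pick the
# language at transcription time, or omit it to let Whisper auto-detect.
model = whisper.load_model("medium")
result = model.transcribe("speech.wav", language="hi")  # e.g. Hindi
print(result["text"])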

Please state which version of Python should be used

Python 2.7 did not work for installing the requirements on Windows 10.
Python 3.12 errors with:

import setuptools.version
        File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-zv_92dg2\overlay\Lib\site-packages\setuptools\version.py", line 1, in <module>
          import pkg_resources
        File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-zv_92dg2\overlay\Lib\site-packages\pkg_resources\__init__.py", line 2191, in <module>
          register_finder(pkgutil.ImpImporter, find_on_path)
                          ^^^^^^^^^^^^^^^^^^^
      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

Got this error when running it

➜ whisper_real_time git:(master) python3.8 transcribe_demo.py
Could not import the PyAudio C module '_portaudio'.
Traceback (most recent call last):
  File "/home/samuel/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 120, in get_pyaudio
    import pyaudio
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 116, in <module>
    import _portaudio as pa
ModuleNotFoundError: No module named '_portaudio'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transcribe_demo.py", line 152, in <module>
    main()
  File "transcribe_demo.py", line 58, in main
    for index, name in enumerate(sr.Microphone.list_microphone_names()):
  File "/home/samuel/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 135, in list_microphone_names
    audio = Microphone.get_pyaudio().PyAudio()
  File "/home/samuel/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 122, in get_pyaudio
    raise AttributeError("Could not find PyAudio; check installation")
AttributeError: Could not find PyAudio; check installation

Question about pytorch

Expected parameter logits (Tensor of shape (1, 51864)) of distribution Categorical(logits: torch.Size([1, 51864])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

I have tried [this](openai/whisper#1068) but it did not work.
The output I got was a long run of "!" characters followed by "1.5mm" repeated over and over.

Getting this error when running

Getting this error when running the script:

Model loaded.

Traceback (most recent call last):
  File "C:\Users\ibrah\Desktop\demo.py", line 152, in <module>
    main()
  File "C:\Users\ibrah\Desktop\demo.py", line 124, in main
    result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 121, in transcribe
    mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 130, in log_mel_spectrogram
    audio = load_audio(audio)
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 46, in load_audio
    ffmpeg.input(file, threads=0)
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

WRT on Windows

Did somebody manage to get it to work on Windows? I couldn't in any way.
The source is loaded correctly, but from debugging, basically no data is put on the queue:

    data = audio.get_raw_data()
    data_queue.put(data)

So there is no result at all during listening.

How did you solve it?

Any chance to change source by command prompt?

I would like to use the computer's line-in to transcribe sound coming into the computer. Is there any way to change the sound source from the mic to line-in through a command-prompt argument? Good project btw. Thanks
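
Not with the existing flags, as far as I can tell, but speech_recognition can open any input device by index, so a small patch could wire that into the demo. A hedged sketch (--mic_index is a made-up argument name):

import argparse
import speech_recognition as sr

parser = argparse.ArgumentParser()
parser.add_argument("--mic_index", type=int, default=None,
                    help="Input device index from list_microphone_names()")
args = parser.parse_args()

# Print the devices once so the line-in's index can be found.
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(f"{index}: {name}")

# device_index=None falls back to the system default input.
source = sr.Microphone(sample_rate=16000, device_index=args.mic_index)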

File error

Error:

Traceback (most recent call last):
  File "C:\Users\MSI\PycharmProjects\Jarvis\test_2.py", line 131, in <module>
    main()
  File "C:\Users\MSI\PycharmProjects\Jarvis\test_2.py", line 103, in main
    result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
  File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\whisper\transcribe.py", line 121, in transcribe
    mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
  File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\whisper\audio.py", line 130, in log_mel_spectrogram
    audio = load_audio(audio)
  File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\whisper\audio.py", line 46, in load_audio
    ffmpeg.input(file, threads=0)
  File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "C:\Users\MSI\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\MSI\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1440, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the specified file

Process finished with exit code 1

I get this error and I do not know the reason. It appears after the message "Model loaded.", so I think the problem is in the second half of the code. I used the original code and just changed the model to a small one. Can you help me?

torch.cuda.OutOfMemoryError

Traceback (most recent call last):
  File "transcribe_demo.py", line 151, in <module>
    main()
  File "transcribe_demo.py", line 69, in main
    audio_model = whisper.load_model(model)
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\whisper\__init__.py", line 122, in load_model
    return model.to(device)
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 989, in to
    return self._apply(convert)
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
    param_applied = fn(param)
  File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; 6.50 GiB already allocated; 0 bytes free; 6.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How do I use the GPU instead of the CPU?

For some reason I just can't figure out how and where to specify that whisper should use my graphics card, or better yet combine it with the CPU. What's the point of using the CPU alone? If anyone knows, please share.
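
A hedged pointer: whisper runs on the GPU only if PyTorch can see one. Check torch.cuda.is_available() first (it returns False on a CPU-only PyTorch build), then pass the device explicitly:

import torch
import whisper

print(torch.cuda.is_available())  # False means a CPU-only PyTorch build
audio_model = whisper.load_model("medium", device="cuda")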

whisper real-time from Jetson Nano

Hello.
Thanks to your whisper real-time, I tried STT on my computer.
I want to use this package on my Jetson Nano, but when I run it there the CPU and memory usage is very high and the screen freezes.
Then someone told me to use the OpenAI API: just like running GPT from Python code, I can use the Whisper API.

So I'm wondering if I can use the STT function in this code by entering an API key, without downloading the model or running anything heavy locally.
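
That should work in principle, though it means sending each recorded chunk over the network instead of transcribing locally. A hedged sketch with the OpenAI Python client (v1.x), with the chunking left to this repo's recording loop:

from openai import OpenAI

client = OpenAI(api_key="sk-...")  # your API key; hypothetical placeholder

# "chunk.wav" stands in for the temp file the recording loop writes out.
with open("chunk.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
print(transcript.text)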

Dependencies

The script says Python 3.7, so I used 3.7 in my Conda env, but when I pip install -r requirements.txt, I get errors related to Python 3.7. I believe the issue is with PyAudio:

Collecting SpeechRecognition
  Using cached SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8 MB)
INFO: pip is looking at multiple versions of pyaudio to determine which version is compatible with other requirements. This could take a while.
Collecting pyaudio
  Using cached PyAudio-0.2.12.tar.gz (42 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
ERROR: Ignored the following versions that require a different python version: 0.1.1 Requires-Python >=3.9; 0.1.2 Requires-Python >=3.8; 0.2.0 Requires-Python >=3.8; 0.3.0 Requires-Python >=3.8; 0.3.1 Requires-Python >=3.8; 0.3.2 Requires-Python >=3.8; 0.3.3 Requires-Python >=3.8; 3.10.0 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement tiktoken==0.3.1 (from openai-whisper) (from versions: none)
ERROR: No matching distribution found for tiktoken==0.3.1

Kudos

Just wanted to say thank you for the cool repo!

How to release a mic?

I used this code in my project; when "1" is selected in the menu, this code runs. How do I release the microphone from speech recognition? I tried this:
except KeyboardInterrupt:
    source.stream.pyaudio_stream.stop_stream()
    source.stream.pyaudio_stream.close()
    break

but these lines close the whole app, not just the function where I used this code.
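
A hedged alternative to closing the PyAudio stream directly: speech_recognition's listen_in_background() returns a stopper function, and calling that releases the microphone without tearing down the rest of the app. Sketch, assuming recorder, source, record_callback and record_timeout are set up as in transcribe_demo.py (run_menu_option is a hypothetical stand-in for the menu code):

stop_listening = recorder.listen_in_background(
    source, record_callback, phrase_time_limit=record_timeout)

try:
    run_menu_option()  # the part of the app that consumes transcriptions
except KeyboardInterrupt:
    # Releases the mic; only this listener stops, not the whole app.
    stop_listening(wait_for_stop=False)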

Pyaudio install fails when installing requirements.txt

Users on Mac may receive an error where requirements.txt fails on the pyaudio install. To fix this, you need to install portaudio through Homebrew first. Per the documentation, run the commands in this order:

brew install portaudio
pip install pyaudio

Hat tip stackoverflow

Malayalam (ml) didn't work

Hi,

whisper_real_time works for English and Hindi, but I couldn't get it to work for Malayalam.
Even Whisper itself is not working for Malayalam.

Here's the code section:

model = whisper.load_model("medium")
result = model.transcribe("/home/ajay/pcs/whisper_real_time/stackoverflow.wav",language='ml')

self._execute_child(args, executable, preexec_fn, close_fds) cannot find the specified file [With Solution]

Traceback (most recent call last):
  File "C:\Users\UsernameHere\Desktop\PythonProjects\real-time-whisper\whisper_real_time-master\transcribe_demo3.py", line 158, in <module>
    main()
  File "C:\Users\UsernameHere\Desktop\PythonProjects\real-time-whisper\whisper_real_time-master\transcribe_demo3.py", line 130, in main
    result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
  File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\transcribe.py", line 121, in transcribe
    mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
  File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\audio.py", line 140, in log_mel_spectrogram
    audio = load_audio(audio)
  File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\audio.py", line 59, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1510, in _execute_child
    # no special security
FileNotFoundError: [WinError 2] The system cannot find the file specified

After troubleshooting, this indicates that ffmpeg is not found or not installed correctly. Note that requirements.txt installs the ffmpeg-python bindings, but not the ffmpeg binary itself.

One quick and dirty mitigation for Windows users is to download ffmpeg.exe and add its folder to the PATH environment variable.

Linux mic detection

When I try to run the demo program on Linux, I get these errors.


python3 transcribe_demo.py --model tiny
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:1001:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
[the same block of ALSA messages repeats twice more]
Model loaded.

ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:1001:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave

How to get transcription in a text file

Maybe I'm missing the obvious, but is there a way to export the transcription into a text file/log? Also, are there other arguments that can be added to date/time-stamp each entry?
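
Not via a built-in argument, as far as I can tell, but the demo keeps the running transcript in a transcription list (the one printed after Ctrl-C), so a few lines at the end of main() could dump it with timestamps. A hedged sketch:

from datetime import datetime

# Assumption: `transcription` is the list of lines transcribe_demo.py
# accumulates; each entry is written out with the time it was saved.
with open("transcript.log", "a", encoding="utf-8") as f:
    for line in transcription:
        f.write(f"{datetime.now().isoformat(timespec='seconds')}\t{line}\n")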

Irrelevant Output

I get irrelevant output before and after I use the microphone, such as "Okey" or "Thank you for watching!", which I am pretty sure I did not say. Does anyone know where these prints come from?

Another quick question: what does that "FF" with the red background mean? (see screenshot below)

[screenshot: screenshot-20231112-194548]

SSL: CERTIFICATE_VERIFY_FAILED

Hi there :)
Any idea why I'm getting this?

python3.11 transcribe_demo.py
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1303, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1349, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1298, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1058, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 996, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1475, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/magic/wholesomegarden/magicllight/transparwnt-web-app/whisper_real_time/transcribe_demo.py", line 143, in <module>
    main()
  File "/Users/magic/wholesomegarden/magicllight/transparwnt-web-app/whisper_real_time/transcribe_demo.py", line 66, in main
    audio_model = whisper.load_model(model)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/whisper/__init__.py", line 133, in load_model
    checkpoint_file = _download(_MODELS[name], download_root, in_memory)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/whisper/__init__.py", line 69, in _download
    with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)>

And maybe how to fix it...
I'm trying to run it on a Mac.
Thanks
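
A hedged note on the usual cause: the python.org framework builds on macOS ship without the system CA certificates wired into Python's SSL context, so the model download fails TLS verification. The bundled "Install Certificates.command" script (in the Python 3.11 application folder) normally fixes it; a workaround in code is to point OpenSSL at certifi's bundle before whisper downloads the model:

import os
import certifi

# Make urllib verify against certifi's CA bundle (set this before whisper
# downloads the model so the HTTPS request picks it up).
os.environ["SSL_CERT_FILE"] = certifi.where()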

Receive sound data from any application along with the mic in Linux. PR or use your code?

Hi, I wrote a hook that allows sound data from any application with active sound (or several applications simultaneously), together with the mic data, to be streamed to Whisper. I'm using it for an application-agnostic live-transcription/LLM "real time" assistance application.

  • It's not that pretty: I use third-party sound libs for Linux (PulseAudio) to create a virtual sound device via a bash script. The user then manually redirects the sound of the app/mic into this virtual device using the PulseAudio GUI.
  • I wrote a small implementation of the sr.AudioSource abstract class with an audio stream from PulseAudio, which allowed me to easily connect to Whisper and enjoy all the sr features like background listening, sound adjustment, etc.

Now I'm ready to push the code, and I wonder if (and how) I should contact you, or whether I should just create a PR to add my hook after I prettify and test it.

Thanks for sharing your code, it's the best I tried for real time Whisper usage.

Transcribe from sound card

Hi all,
Thank you for this implementation.
I would like to transcribe from the sound card, so I would need to specify a different source here.

This is the list of my mic devices:

Microphone with name "MacBook Pro Microphone" found for `Microphone(device_index=1)`
Microphone with name "MacBook Pro Speakers" found for `Microphone(device_index=2)`

So I am adding:

source = sr.Microphone(sample_rate=16000, device_index=2)

but I get the following error:

Traceback (most recent call last):
  File "transcribe_demo_soundcard.py", line 79, in main
    recorder.adjust_for_ambient_noise(source)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/speech_recognition/__init__.py", line 383, in adjust_for_ambient_noise
    assert source.stream is not None, "Audio source must be entered before adjusting, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?"
AssertionError: Audio source must be entered before adjusting, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transcribe_demo_soundcard.py", line 155, in <module>
    main()
  File "transcribe_demo_soundcard.py", line 79, in main
    recorder.adjust_for_ambient_noise(source)
  File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/speech_recognition/__init__.py", line 189, in __exit__
    self.stream.close()
AttributeError: 'NoneType' object has no attribute 'close'

Any clue why?
Thanks!

To .exe

I want to compile your wonderful code so that the output is an .exe file with all dependencies. Tell me, how can I specify the location where the whisper model will be downloaded?
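
A hedged pointer: whisper.load_model() accepts a download_root argument, so a frozen .exe can keep the model next to the executable instead of under the user's cache directory:

import os
import sys
import whisper

# Store checkpoints in a "models" folder beside the frozen executable
# (sys.executable points at the .exe in a cx_Freeze build).
model_dir = os.path.join(os.path.dirname(sys.executable), "models")
audio_model = whisper.load_model("small", download_root=model_dir)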

Error loading "\lib\site-packages\torch\lib\shm.dll" or one of its dependencies

For anyone facing this error when they try to run the demo after installing the requirements:
check whether torch is installed. You can check by going into the Python CLI and trying to import torch manually.

Not sure why the issue is occurring, but a quick Google search reveals the error is with torch 2.3, so installing an older version will help.
You can use the official instructions on pytorch.org to generate the install command, but that will default to the latest version, 2.3.

Simply specify version 2.2.2 or lower and update the index from cu116 to cu118.
An example you can use is below. (This will also install other PyTorch libraries, which can come in handy in the future.)

pip3 install torch==2.2.2 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

One of the Google solutions can be found here:
https://stackoverflow.com/questions/74594256/pytorch-error-loading-lib-site-packages-torch-lib-shm-dll-or-one-of-its-depen
