oliverguhr / wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
License: MIT License
Hi,
does this repo support the MMS model (which is like wav2vec 2.0, but has more transformer layers)?
Thanks for sharing this project!
I am on a Mac, and I seem to have problems opening the PyAudio stream here: https://github.com/oliverguhr/wav2vec2-live/blob/main/live_asr.py#L59-L64
Looking at the documentation (https://people.csail.mit.edu/hubert/pyaudio/docs/), I see that one way to read from a microphone is via a callback function.
I made sure that I am picking the correct microphone device here: https://github.com/oliverguhr/wav2vec2-live/blob/main/live_asr.py#L56-L57
The error I am getting is:
Process Process-2:
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/othrif/spectrum/voice/experiment/wav2vec/042121/wav2vec2-live/live_asr.py", line 62, in vad_process
stream = audio.open(input_device_index=selected_input_device_id,
File "/Users/othrif/spectrum/voice/experiment/wav2vec/042121/wav2vec2-live/.venv/lib/python3.9/site-packages/pyaudio.py", line 750, in open
stream = Stream(self, *args, **kwargs)
File "/Users/othrif/spectrum/voice/experiment/wav2vec/042121/wav2vec2-live/.venv/lib/python3.9/site-packages/pyaudio.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9986] Internal PortAudio error
Any thoughts on what might be causing this?
I am using the default code, but it finalizes commands very quickly: if I take a short pause, it breaks the line and prints the rest on the next line. For example, if I say "how are you", it prints "how", then "are" on the next line.
How can I reduce the sensitivity so that short pauses don't split a command?
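One common way to make the segmentation less trigger-happy is a "hangover": an utterance only ends after several consecutive silent VAD frames, so brief pauses stay inside one phrase. This is a hypothetical sketch (the function name and frame bookkeeping are my own, not code from the repo); in live_asr.py you would apply the same idea around the webrtcvad decisions:

```python
def segment_frames(vad_flags, hangover=15):
    """Group per-frame VAD decisions (True = voiced) into utterances.

    An utterance only ends after `hangover` consecutive silent frames,
    so short pauses inside a sentence no longer split the transcript.
    With 30 ms frames, hangover=15 tolerates roughly 450 ms of silence.
    """
    segments, current, silent = [], [], 0
    for i, voiced in enumerate(vad_flags):
        if voiced:
            current.append(i)
            silent = 0
        elif current:
            silent += 1
            if silent >= hangover:
                segments.append(current)
                current, silent = [], 0
    if current:
        segments.append(current)
    return segments

# a short pause (5 silent frames) stays one utterance, a long one splits
flags = [True] * 10 + [False] * 5 + [True] * 10 + [False] * 30 + [True] * 10
print([len(s) for s in segment_frames(flags)])  # → [20, 10]
```

Raising `hangover` makes the recognizer wait longer before committing a line, at the cost of slightly higher latency.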
Your current library is great. Can you please provide a whisper-live version of your codebase?
Hi, I was testing live_asr.py on macOS Monterey (Python 3.8.11) with the following environment:
halo==0.0.31
numpy==1.21.4
PyAudio==0.2.11
Rx==3.2.0
SoundFile==0.10.3.post1
torch==1.10.0
torchaudio==0.10.0
transformers==4.8.2
webrtcvad==2.0.10
When I run python live_asr.py, I get the errors below:
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
listening to your voice
/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/transformers/feature_extraction_utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:201.)
tensor = as_tensor(value)
/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py:986: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
return (input_length - kernel_size) // stride + 1
Exception in thread Thread-1:
Traceback (most recent call last):
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/Users/jkang/Desktop/test_asr/wav2vec2-live/live_asr.py", line 92, in asr_process
text = wave2vec_asr.buffer_to_text(float64_buffer).lower()
File "/Users/jkang/Desktop/test_asr/wav2vec2-live/wav2vec2_inference.py", line 22, in buffer_to_text
logits = self.model(inputs.input_values, attention_mask=torch.ones(len(inputs.input_values[0]))).logits
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
result = forward_call(*input, **kwargs)
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1528, in forward
return_dict=return_dict,
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
result = forward_call(*input, **kwargs)
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1171, in forward
return_dict=return_dict,
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 791, in forward
hidden_states[~attention_mask] = 0
IndexError: The shape of the mask [1920, 5] at index 0 does not match the shape of the indexed tensor [1, 5, 1024] at index 0
Exception in thread Thread-2:
Traceback (most recent call last):
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/Users/jkang/Desktop/test_asr/wav2vec2-live/live_asr.py", line 68, in vad_process
frame = stream.read(CHUNK)
File "/Users/jkang/anaconda3/envs/jk/lib/python3.7/site-packages/pyaudio.py", line 608, in read
return pa.read_stream(self._stream, num_frames, exception_on_overflow)
OSError: [Errno -9981] Input overflowed
I think the critical issue is this:
IndexError: The shape of the mask [1920, 5] at index 0 does not match the shape of the indexed tensor [1, 5, 1024] at index 0
I installed transformers==4.8.2, but the error still occurs, so I suspect it is not related to the transformers version.
Could you help me figure out what caused this error?
Thank you
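The mask shape in the error message points at how the attention mask is built in wav2vec2_inference.py: `torch.ones(len(inputs.input_values[0]))` creates a 1-D mask over the raw samples, while the model indexes with a 2-D (batch, samples) mask matching `input_values`. A minimal sketch of the shape fix, using a stand-in tensor rather than a real model call:

```python
import torch

input_values = torch.randn(1, 1920)  # stand-in for inputs.input_values

# broken: a 1-D mask of length 1920 -> shape mismatch inside the model
bad_mask = torch.ones(len(input_values[0]))

# fixed: a mask with the same (batch, samples) shape as the input
good_mask = torch.ones_like(input_values, dtype=torch.long)

print(bad_mask.shape, good_mask.shape)  # torch.Size([1920]) torch.Size([1, 1920])
```

Alternatively, the Wav2Vec2 processor can return a correctly shaped mask itself (`return_attention_mask=True`), which avoids building one by hand.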
Hello. I'm using your code with a model saved in a local directory (the model is downloaded from here). This model can be used with or without a 4-gram LM.
When I use the model without the LM, everything is fine.
But when I use the model with the 4-gram LM (the code for combining wav2vec2 with a 4-gram LM is here), I get an error when running kenlm.Model(n_gram_path):
OSError: [Errno -9981] Input overflowed
Could you please check your code for this error?
Sorry for my bad English.
Just an FYI: since this commit, wav2vec2-live is only compatible with transformers==4.8.2 and earlier.
The default model that you shared in this repo is for English. I checked the performance, and it's not giving good results. How can I improve the model and reduce the WER?
Please let me know whether this model will work in offline mode.
Hello, thanks for your great work.
Does this repo support ONNX-quantized models?
Hi,
I use the model "m3hrdadfi/wav2vec2-large-xlsr-turkish" with live_asr.py, but I get the exception below:
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
return (input_length - kernel_size) // stride + 1
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "C:\Program Files\Python38\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File ".\live_asr.py", line 89, in asr_process
text = wave2vec_asr.buffer_to_text(float64_buffer).lower()
File "D:\Dev\pertev\utils\wav2vec2_inference.py", line 22, in buffer_to_text
logits = self.model(inputs.input_values, attention_mask=torch.ones(len(inputs.input_values[0]))).logits
File "D:\Dev\pertev\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Dev\pertev\venv\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 1494, in forward
outputs = self.wav2vec2(
File "D:\Dev\pertev\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Dev\pertev\venv\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 1076, in forward
encoder_outputs = self.encoder(
File "D:\Dev\pertev\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Dev\pertev\venv\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 695, in forward
hidden_states[~attention_mask] = 0
IndexError: The shape of the mask [2400, 7] at index 0 does not match the shape of the indexed tensor [1, 7, 1024] at index 0
I have implemented (not from scratch) a LiveASREngine using Whisper, based on the following codebase written by you:
https://github.com/oliverguhr/wav2vec2-live
The only change I made was in wav2vec2_inference.py: I initialized the Whisper model with a Hugging Face pipeline.
My code: https://github.com/Dimlight/LiveASREngine
The problem I am facing now:
If I do not say anything and the entire room is silent, the engine continuously prints "you" or "thank you". I tested the system in a quiet room and still get the same issue.
Can anyone help me understand the possible reasons for this kind of result?
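Whisper is known to hallucinate filler phrases like "thank you" on silent or near-silent input. A common workaround is to gate each audio buffer on its RMS energy before handing it to the model, so silent chunks are never transcribed at all. A minimal sketch (the function name and the threshold value are assumptions you would tune for your microphone):

```python
import numpy as np

def is_speech(buffer: np.ndarray, rms_threshold: float = 0.01) -> bool:
    """Return True if the chunk's RMS energy exceeds the threshold.

    Expects float samples in [-1, 1]; chunks below the threshold should
    be skipped instead of being sent to the ASR model.
    """
    rms = float(np.sqrt(np.mean(np.square(buffer.astype(np.float64)))))
    return rms > rms_threshold

silence = np.zeros(16000)
tone = 0.1 * np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
print(is_speech(silence), is_speech(tone))  # False True
```

In a live loop you would call this on `float64_buffer` right before `buffer_to_text` and `continue` when it returns False.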
Hi, I'm using the code as given: no errors, only one warning, but the performance is terrible. Am I the only one experiencing this issue?
By terrible I mean it is mostly hard to see the connection between what was said and the transcription.
The warning I'm getting is:
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-large-960h and are newly initialized: ['wav2vec2.masked_spec_embed'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Does anyone have any ideas?
Hello @oliverguhr ! Thank you for your code, which is working fine on my CPU.
For practical reasons (another device with poor CPU but good GPU), I would like to be able to run it on GPU, using CUDA. The problem is that when I run the code using
device = "cuda:0" if torch.cuda.is_available() else "cpu"
self.model.to(device)
I get RuntimeError: CUDA error: out of memory, even though I have 2 GB of memory available, and even though your code already uses with torch.no_grad():.
Do you know if the code can be adapted to use less memory, or how much memory would be needed for inference?
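With only 2 GB free, the large checkpoint in float32 is borderline (the wav2vec2-large weights alone are roughly 1.2 GB, before activations). Two things that may help: run inference in half precision, and/or switch to a smaller checkpoint such as facebook/wav2vec2-base-960h. A hypothetical sketch with a stand-in module, showing that `.half()` halves parameter memory (whether fp16 then fits depends on your GPU):

```python
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the wav2vec2 model
fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

model = model.half()  # float16 halves weight (and activation) memory
fp16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

print(fp32_bytes, fp16_bytes)  # fp16 is exactly half of fp32
```

If you try this on the real model, remember to move the input tensor to the same device as the model and cast it to float16 before the forward pass; otherwise the forward call will fail with a device or dtype mismatch.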