Hi there, I've been desperate to try your after I saw it on r

Comments (13)

KoljaB commented on August 13, 2024

Hey there, thanks for trying out my stuff and helping to get rid of those early annoyances most fresh libraries have.

It looks like the main problem is that loading the Silero VAD model fails. The CTranslate2 warning (caused by CPU inference on the MacBook) and the other messages (the Silero exception is raised in the constructor, and I don't handle that well during shutdown) should not be real problems.

Could you please try to run some minimal code, only the silero model loading part, to see if the issue persists?

import torch
model, _ = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad", verbose=True)

from realtimestt.

Captain-Bacon commented on August 13, 2024

Thanks for the reply, and I'll happily do just about anything - this is just what I've been looking for - hit me up for any test you fancy!

Running just that code I got the following in the console:

Using cache found in /Users/.cache/torch/hub/snakers4_silero-vad_master

Captain-Bacon commented on August 13, 2024

Just in case it's useful, if I don't catch the error in the 1st 1/10th of a second, then I get hundreds of console logs as per the following.

`OSError: [Errno -9988] Stream closed

Error: [Errno -9988] Stream closed
RealTimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
Traceback: Traceback (most recent call last):
File ".../GitHub/RealtimeSTT/RealtimeSTT/audio_recorder.py", line 594, in _recording_worker
data = self.stream.read(self.buffer_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../anaconda3/envs/whisper/lib/python3.11/site-packages/pyaudio/__init__.py", line 570, in read
return pa.read_stream(self._stream, num_frames,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno -9988] Stream closed
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name=''> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x0000000100d3ab28)

Current thread 0x00000001ea8c2080 (most recent call first):

Extension modules: pyaudio._portaudio, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.enum, av.error, av.utils, av.option, av.descriptor, av.container.pyio, av.dictionary, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.audio.fifo, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.audio.resampler, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, yaml._yaml, charset_normalizer.md, _webrtcvad (total: 65)
zsh: abort `

KoljaB commented on August 13, 2024

Thanks for the description and sorry for the inconvenience. It looks to me like Silero VAD loads with the minimal code but fails to load when you use RealtimeSTT. I can't say that for sure, since it should throw an exception and print a log message in that case, and I don't see either in the output. I haven't worked with a MacBook setup before, but I'll do my best to help troubleshoot. Some ideas:

Are the minimal code and RealtimeSTT running in the same Anaconda environment? If not, please make sure they are.
RealtimeSTT is basically a single Python file (audio_recorder.py). You could try to download this file directly from the GitHub repository and run your code against this file. This helps rule out any issues torch.hub.load could have with the pip or Anaconda installation.

https://github.com/KoljaB/RealtimeSTT/blob/master/RealtimeSTT/audio_recorder.py

Try to run the AudioToTextRecorder class with the level=logging.DEBUG parameter; hopefully the logging then shows more details about what goes wrong.
Try creating a new Anaconda environment solely for RealtimeSTT (I see open-interpreter in your URL; I don't think that is the cause, but maybe some libraries collide).
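
One way to check the "same environment" question is to run a small report from both scripts and compare the output. The following is only a sketch: `environment_report` is a hypothetical helper, not part of RealtimeSTT, and the package names it checks by default are the ones mentioned in this thread.

```python
import importlib.util
import sys


def environment_report(packages=("torch", "torchaudio", "RealtimeSTT")):
    """Return the interpreter path and the on-disk location of each package.

    Hypothetical helper for illustration. A value of None means the package
    cannot be imported from this interpreter.
    """
    report = {"python": sys.executable, "prefix": sys.prefix}
    for name in packages:
        spec = importlib.util.find_spec(name)
        report[name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the `python` and `prefix` lines differ between the minimal script and the RealtimeSTT script, the two are not running in the same environment.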

Captain-Bacon commented on August 13, 2024

For whatever reason I always download the full GH repo - that's what I meant when I said in my original message that I'd tried it with and without pip install realtimestt. I tried it directly from the repo just in case. I'd also run it inside an entirely vanilla, brand-new environment with only RealtimeSTT and its dependencies. Of course that doesn't mean that I did it right, but I'm certainly trying to (not a coder!).

I put a wrapper around the script:

`
import subprocess
import sys

def run_script(script_path):
    # Only stderr is piped; the child's stdout goes straight to the console,
    # so there is no unread stdout pipe that could fill up and block the child.
    process = subprocess.Popen(['python', script_path], stderr=subprocess.PIPE)

    while True:
        raw = process.stderr.readline()
        if raw:
            output = raw.decode('utf-8').strip()  # readline() returns bytes
            print(output)
            if "Error during recording: [Errno -9988] Stream closed" in output:
                process.terminate()
                print("Script terminated due to error")
                sys.exit(1)
        else:
            # readline() returns b'' only at end of stream (the child closed stderr).
            break

    return process.wait()

run_script('/Users/GitHub/RealtimeSTT/test.py')
`

Here's the test.py script, with the requested debug log level:

`
from RealtimeSTT import AudioToTextRecorder
import logging

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
recorder = AudioToTextRecorder(level=logging.DEBUG)

print("Say something...")

while True:
    print(recorder.text(), end=" ", flush=True)
`

And here's the output from the console, this time it doesn't have all the forced 'break' info, so hopefully it will be more useful?

.../RealtimeSTT/realtimestt_wrapper.py
b'[2023-09-21 21:35:52.870] [ctranslate2] [thread 191083] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.'

There's no other info in the console

Here's the log I saved:

root - WARNING - Input overflowed. Frame dropped.
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed
root - ERROR - Error during recording: [Errno -9988] Stream closed

repeat this a few more times...
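
The saved log contains only WARNING and ERROR records. One possible reason, assuming the library does not reconfigure the root logger itself, is that the `logging.basicConfig` call in test.py passes no `level`, which leaves the root logger at its default of WARNING. The sketch below shows that standard-library behaviour in isolation; `captured_levels` is a hypothetical helper and the log messages are made up.

```python
import logging
import os
import tempfile


def captured_levels(level=None):
    """Configure the root logger roughly like test.py and return the level names written."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)

    root = logging.getLogger()
    root.setLevel(logging.WARNING)  # the default in a fresh interpreter

    kwargs = {"filename": path, "filemode": "w", "format": "%(levelname)s", "force": True}
    if level is not None:
        kwargs["level"] = level
    # force=True replaces any existing handlers; without it, basicConfig does
    # nothing when the root logger has already been configured.
    logging.basicConfig(**kwargs)

    logging.debug("debug record")
    logging.warning("warning record")

    # Flush and detach the file handler before reading the file back.
    for handler in list(root.handlers):
        handler.close()
        root.removeHandler(handler)
    with open(path) as log_file:
        names = log_file.read().split()
    os.remove(path)
    return names


if __name__ == "__main__":
    print(captured_levels())               # no level passed
    print(captured_levels(logging.DEBUG))  # level=logging.DEBUG passed
```

Whether this is what happens inside RealtimeSTT depends on how the library sets up its own logging, so treat it as something to rule out rather than a diagnosis.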

Captain-Bacon commented on August 13, 2024

I wasn't getting anything from the debug, so I had a bit of a poke around in your code (hope you don't mind!)

The realtimestt folder from the repo was read only, so I had to change that, and I uncommented the file name so it would save the file.

I was still not getting anything, so I moved the logging statement to the top of the page, straight after the imports, and before the init.

Here's what I got:

`RealtimeSTT: root - INFO - AudioToTextRecorder object created
RealtimeSTT: root - INFO - AudioToTextRecorder object created
RealtimeSTT: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
RealtimeSTT: urllib3.connectionpool - DEBUG - https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-tiny/revision/main HTTP/1.1" 200 1812
RealtimeSTT: root - INFO - Initializing WebRTC voice with Sensitivity 3
RealtimeSTT: torchaudio._extension - DEBUG - Failed to initialize ffmpeg bindings
Traceback (most recent call last):
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 85, in _init_ffmpeg
_load_lib("libtorchaudio_ffmpeg")
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib
torch.ops.load_library(path)
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torch/_ops.py", line 643, in load_library
ctypes.CDLL(path)
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so, 0x0006): Library not loaded: @rpath/libavdevice.58.dylib
Referenced from: <00D3B28A-9088-32CE-B641-F43D64502379> /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so
Reason: tried: '/Users/anaconda3/envs/realtimestt/lib/python3.11/lib-dynload/../../libavdevice.58.dylib' (no such file), '/Users/anaconda3/envs/realtimestt/bin/../lib/libavdevice.58.dylib' (no such file), '/usr/local/lib/libavdevice.58.dylib' (no such file), '/usr/lib/libavdevice.58.dylib' (no such file, not in dyld cache)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/__init__.py", line 67, in <module>
_init_ffmpeg()
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 87, in _init_ffmpeg
raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.
RealtimeSTT: root - INFO - _recording_worker method called
RealtimeSTT: root - DEBUG - Starting recording worker
RealtimeSTT: root - DEBUG - Starting realtime worker
RealtimeSTT: root - DEBUG - Constructor finished
RealtimeSTT: root - WARNING - Input overflowed. Frame dropped.
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
'

Except that, according to Homebrew, FFmpeg is already installed:

==> Downloading https://formulae.brew.sh/api/formula.jws.json
################################################################################################################# 100.0%
==> Downloading https://formulae.brew.sh/api/cask.jws.json
################################################################################################################# 100.0%
Warning: ffmpeg 6.0_1 is already installed and up-to-date.

KoljaB commented on August 13, 2024

I read somewhere that torchaudio is incompatible with some FFmpeg versions. Can you check your FFmpeg version in a terminal with ffmpeg -version? I think FFmpeg 4.4 is the highest version officially supported by torchaudio, so it may be worth trying to downgrade to that version.
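
The same check can be scripted, which makes it easy to run from inside the Python environment in question. This is a minimal sketch using only the standard library; `parse_ffmpeg_version` and `installed_ffmpeg_version` are hypothetical helper names. Note that it reports the `ffmpeg` executable found on PATH, which is not necessarily the set of shared libraries that torchaudio tries to load.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(banner):
    """Extract the version token from the first line of `ffmpeg -version` output."""
    match = re.search(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def installed_ffmpeg_version():
    """Return the version of the ffmpeg executable on PATH, or None if there is none."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    return parse_ffmpeg_version(result.stdout)


if __name__ == "__main__":
    print("ffmpeg on PATH:", installed_ffmpeg_version())
```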

Captain-Bacon commented on August 13, 2024

I've installed FFMPEG v 4.4.4 via brew install ffmpeg@4 >> https://formulae.brew.sh/formula/ffmpeg@4
"ffmpeg@4 4.4.4 is installed and up-to-date."

ffmpeg -version gives the following output:
ffmpeg -version
ffmpeg version 4.4.4 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
configuration: --prefix='/opt/homebrew/Cellar/ffmpeg@4/4.4.4' --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-avresample --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100

When I installed FFmpeg I got this message:

ffmpeg@4 is keg-only, which means it was not symlinked into /opt/homebrew,
because this is an alternate version of another formula.

If you need to have ffmpeg@4 first in your PATH, run:
echo 'export PATH="/opt/homebrew/opt/ffmpeg@4/bin:$PATH"' >> ~/.zshrc

For compilers to find ffmpeg@4 you may need to set:
export LDFLAGS="-L/opt/homebrew/opt/ffmpeg@4/lib"
export CPPFLAGS="-I/opt/homebrew/opt/ffmpeg@4/include"

For pkg-config to find ffmpeg@4 you may need to set:
export PKG_CONFIG_PATH="/opt/homebrew/opt/ffmpeg@4/lib/pkgconfig"

I've created symbolic links:
lrwxr-xr-x@ 1 user staff 59 Sep 24 00:17 libavcodec.58.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavcodec.58.dylib
lrwxr-xr-x@ 1 user staff 60 Sep 24 00:10 libavdevice.58.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavdevice.58.dylib
lrwxr-xr-x@ 1 user staff 59 Sep 24 00:16 libavfilter.7.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavfilter.7.dylib
lrwxr-xr-x@ 1 user staff 60 Sep 24 00:16 libavformat.58.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavformat.58.dylib
lrwxr-xr-x@ 1 user staff 58 Sep 24 00:17 libavutil.56.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavutil.56.dylib

I've exported paths and such:
export DYLD_LIBRARY_PATH=/Users/anaconda3/envs/realtimestt/lib/python3.11/:$DYLD_LIBRARY_PATH
echo 'export PATH="/opt/homebrew/opt/ffmpeg@4/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/opt/homebrew/opt/ffmpeg@4/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/opt/homebrew/opt/ffmpeg@4/include"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/opt/homebrew/opt/ffmpeg@4/lib/pkgconfig"' >> ~/.zshrc
reloaded the ZSH (source ~/.zshrc), then reloaded the window in vscode to make sure it was all updated.

(Chat GPT told me what to do and how to do it when I fed the error message in the log into it)

Here's the log...
RealtimeSTT: root - INFO - AudioToTextRecorder object created
RealtimeSTT: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
RealtimeSTT: urllib3.connectionpool - DEBUG - https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-tiny/revision/main HTTP/1.1" 200 1812
RealtimeSTT: root - INFO - Initializing WebRTC voice with Sensitivity 3
RealtimeSTT: torchaudio._extension - DEBUG - Failed to initialize ffmpeg bindings
Traceback (most recent call last):
File "...anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 85, in _init_ffmpeg
_load_lib("libtorchaudio_ffmpeg")
File ".../anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib
torch.ops.load_library(path)
File "...anaconda3/envs/realtimestt/lib/python3.11/site-packages/torch/_ops.py", line 643, in load_library
ctypes.CDLL(path)
File ".../anaconda3/envs/realtimestt/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so, 0x0006): Library not loaded: @rpath/libavdevice.58.dylib
Referenced from: <00D3B28A-9088-32CE-B641-F43D64502379> /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so
Reason: tried: '/Users/anaconda3/envs/realtimestt/lib/python3.11/lib-dynload/../../libavdevice.58.dylib' (no such file), '/Users/anaconda3/envs/realtimestt/bin/../lib/libavdevice.58.dylib' (no such file), '/usr/local/lib/libavdevice.58.dylib' (no such file), '/usr/lib/libavdevice.58.dylib' (no such file, not in dyld cache)

The above exception was the direct cause of the following exception:

I am currently at a bit of a loss.

I don't understand this documentation, but perhaps it will be of some use? >> https://pytorch.org/audio/main/_modules/torchaudio/utils/ffmpeg_utils.html
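
The dlopen error lists every location it tried for libavdevice.58.dylib. A quick way to see whether a symlink or copy has landed in one of those locations is to test each directory directly. The sketch below assumes nothing beyond the standard library; `find_library` is a hypothetical helper, and the directories in the usage section are the ones named in the error message plus the Homebrew keg path from the install message.

```python
from pathlib import Path


def find_library(filename, search_dirs):
    """Return the full paths under search_dirs where filename resolves to an existing file.

    Path.exists() follows symlinks, so a symlink whose target is missing
    counts as not found.
    """
    hits = []
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.exists():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    dirs = [
        "/Users/anaconda3/envs/realtimestt/lib",  # two of the tried paths resolve here
        "/usr/local/lib",
        "/usr/lib",
        "/opt/homebrew/opt/ffmpeg@4/lib",         # where Homebrew keeps the keg-only libraries
    ]
    print(find_library("libavdevice.58.dylib", dirs))
```

If only the Homebrew directory shows up in the result, the library is installed but not in any place the loader searched.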

Captain-Bacon commented on August 13, 2024

After doing a load of testing I have found that if I remove the torchaudio from the requirements.txt file, then install the ffmpeg and the torchaudio via the Conda Pytorch channel then I can overcome the bindings problem.

I install them with
conda install -c pytorch torchaudio ffmpeg

This then installs both packages with their dependencies. It installs
ffmpeg version 4.2.2
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100

and

Name        Version  Build      Channel
torchaudio  2.0.2    py311_cpu  pytorch

Now my log file is a BIT easier...

RealtimeSTT: root - INFO - AudioToTextRecorder object created
RealtimeSTT: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
RealtimeSTT: urllib3.connectionpool - DEBUG - https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-tiny/revision/main HTTP/1.1" 200 1812
RealtimeSTT: root - INFO - Initializing WebRTC voice with Sensitivity 3
RealtimeSTT: root - INFO - _recording_worker method called
RealtimeSTT: root - DEBUG - Starting recording worker
RealtimeSTT: root - DEBUG - Starting realtime worker
RealtimeSTT: root - DEBUG - Constructor finished
RealtimeSTT: root - WARNING - Input overflowed. Frame dropped.
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed

The console output is the part I'm currently having difficulty with:

objc[13119]: Class AVFFrameReceiver is implemented in both /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/av/.dylibs/libavdevice.59.7.100.dylib (0x10367c778) and /Users/anaconda3/envs/realtimestt/lib/libavdevice.58.8.100.dylib (0x13a4f0798). One of the two will be used. Which one is undefined.
objc[13119]: Class AVFAudioReceiver is implemented in both /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/av/.dylibs/libavdevice.59.7.100.dylib (0x10367c7c8) and /Users/anaconda3/envs/realtimestt/lib/libavdevice.58.8.100.dylib (0x13a4f07e8). One of the two will be used. Which one is undefined.

No matter which version I remove using rm, I end up with a situation that it crashes with a console output telling me that the file that it's looking for can't be found....

I'm sure there's some obvious way around this, but I don't know what it is.

Apologies if the updates are spamming you, but I grab the time as and when I can ( kids ;-) ), so I'm noting it both as an aide-mémoire, and also in case I don't finish what I'm working on so that you have some idea of what 'progress' or otherwise is being made in case I don't get back to it for a day or two.
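
The objc warnings name two copies of libavdevice with different major versions inside the same environment: one under site-packages/av/.dylibs and one under the environment's lib directory. Before removing anything, it can help to list every copy and see which package each belongs to. This is a rough sketch with a hypothetical helper, `find_copies`; it only scans the file system and makes no claim about which copy the loader will pick.

```python
from collections import defaultdict
from pathlib import Path


def find_copies(prefix, stem="libavdevice"):
    """Group every '<stem>*.dylib' file under prefix by its major version number."""
    copies = defaultdict(list)
    for path in Path(prefix).rglob(f"{stem}*.dylib"):
        parts = path.name.split(".")
        # 'libavdevice.59.7.100.dylib' -> '59'; 'libavdevice.dylib' has no version part.
        major = parts[1] if len(parts) > 2 else "unversioned"
        copies[major].append(str(path))
    return dict(copies)


if __name__ == "__main__":
    # Environment prefix taken from the paths in the warning above.
    for major, paths in sorted(find_copies("/Users/anaconda3/envs/realtimestt").items()):
        print(major, paths)
```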

KoljaB commented on August 13, 2024

Thanks for your detailed feedback and your patience. Unfortunately I am also quite lost when it comes to the conflicting versions of the shared libraries. This is very much environment and MacBook related, and I lack the experience with Apple products and deployment to give really helpful advice here. It seems quite strange that a clean conda install in a new environment causes such library conflicts.

What I do see in the log files is that something with the pyAudio stream is going wrong:

RealtimeSTT: root - WARNING - Input overflowed. Frame dropped.
RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed

The first message means a pyaudio.paInputOverflowed error occurred, indicating that audio samples were dropped from the input stream. The stream probably gets closed as a consequence of that.

I am currently unsure why this exception gets raised. I would suggest a very basic test of your PyAudio installation like this:

import pyaudio

class SimpleAudioRecorder:
    def __init__(self):
        self.rate = 16000
        self.format = pyaudio.paInt16
        self.channels = 1
        self.input = True
        self.buffer_size = 512

        self.pa = pyaudio.PyAudio()
        self.stream = self.pa.open(rate=self.rate,
                                   format=self.format,
                                   channels=self.channels,
                                   input=self.input,
                                   frames_per_buffer=self.buffer_size)

    def record(self):
        print("Recording for 5 seconds...")
        frames = []

        for _ in range(0, int(self.rate / self.buffer_size * 5)):
            try:
                data = self.stream.read(self.buffer_size)
                frames.append(data)
            except IOError as e:
                print(f"Error recording data: {e}")

        print("Recording complete!")
        return b''.join(frames)

    def close(self):
        self.stream.stop_stream()
        self.stream.close()
        self.pa.terminate()


if __name__ == '__main__':
    recorder = SimpleAudioRecorder()
    audio_data = recorder.record()
    recorder.close()

Since it uses the same PyAudio logic as RealtimeSTT, it should fail too. If it does, we have the PyAudio issue pinned down to simple demo code and can focus on getting rid of it.

Captain-Bacon commented on August 13, 2024

Sorry for the delay in getting back to you.

That code snippet worked just fine...

Recording for 5 seconds... Recording complete!

I wanted to make sure that it was actually recording, so added the following:

import wave  # at the top of the script

with wave.open(filename, 'wb') as wf:
    wf.setnchannels(self.channels)
    wf.setsampwidth(self.pa.get_sample_size(self.format))
    wf.setframerate(self.rate)
    wf.writeframes(b''.join(frames))

It output the file, and it was fine as a recording.

KoljaB commented on August 13, 2024

Ok, I finally start to get a grasp of what is happening. Sorry for all the issues.

Since you get a pyaudio.paInputOverflowed exception even though your stream works, I think the reason must be that the script does not call the read() method fast enough to consume the incoming audio data, causing the buffer to fill up and overflow. The only thing in that loop that really takes time to process is the WebRTC voice activity detection I do after reading from the stream. I expected this to be fast enough to run inside the stream loop, but it turns out it isn't.

So I need to update the library and perform the WebRTC voice activity detection in another thread, like I already do with Silero VAD. But this forces me to redesign some things: I also rely on WebRTC when detecting end of speech, and I need a clean recording worker.

So, as I need to redesign some things anyway, I think I should also switch the main transcription logic from multithreading to multiprocessing. The current implementation isn't perfect in cases where VAD and transcription run in parallel only as threads. I guess Python's global interpreter lock keeps them from running smoothly side by side.

I will think about all that for a while. Then I will do a new release with reworked and hopefully more solid recording, VAD and transcription. Will take me something between one and three weeks I guess.
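
To illustrate the idea of keeping slow processing out of the read loop, here is a minimal producer/consumer sketch using only the standard library. It is not the RealtimeSTT implementation: `run_pipeline` and `process_chunk` are hypothetical names, and a plain list of byte strings stands in for the calls to stream.read().

```python
import queue
import threading


def run_pipeline(chunks, process_chunk):
    """Hand each chunk to a worker thread so the reading loop never waits on processing."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: no more audio
                break
            results.append(process_chunk(chunk))  # the slow step (e.g. VAD) runs here

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    for chunk in chunks:    # stands in for repeated stream.read(buffer_size) calls
        pending.put(chunk)  # returns immediately, so the next read is not delayed

    pending.put(None)
    thread.join()
    return results


if __name__ == "__main__":
    fake_audio = [b"\x00" * 1024 for _ in range(5)]
    print(run_pipeline(fake_audio, len))
```

With a queue in between, a slow detection step makes the queue grow instead of letting the audio input buffer overflow. For CPU-bound work the global interpreter lock still limits how much the two threads overlap, which is the motivation given above for moving to separate processes.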

KoljaB commented on August 13, 2024

Just released a new version with a separated recording process. I really, really hope that this will solve your problem too (can't promise ofc). Maybe you can give it a try, I would love to hear some feedback.

Edit: you need to update your client code and include the if __name__ == '__main__': protection due to the multiprocessing update. The files in the test directory have all been updated already; please look at the new realtimestt_test.py file.
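
For readers unfamiliar with that guard, here is a generic standard-library sketch of why it matters. With the "spawn" start method (the default on macOS since Python 3.8), each child process re-imports the main script, so any code that starts processes has to sit behind the guard or it would run again in every child. The names `fake_transcribe`, `worker` and `main` are hypothetical and have nothing to do with RealtimeSTT's internals.

```python
import multiprocessing as mp


def fake_transcribe(text):
    """Hypothetical stand-in for work done in the child process."""
    return text.upper()


def worker(text, results):
    # Runs in the child process and sends its result back through the queue.
    results.put(fake_transcribe(text))


def main():
    ctx = mp.get_context("spawn")  # explicit, to match the macOS default
    results = ctx.Queue()
    process = ctx.Process(target=worker, args=("hello", results))
    process.start()
    print(results.get(timeout=30))
    process.join()


if __name__ == "__main__":
    # Without this guard, the re-import in the child would call main() again.
    main()
```

Client code for the new release follows the same shape: create the recorder and run the loop inside the guarded block, as realtimestt_test.py does according to the note above.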

unable to run script about realtimestt HOT 13 OPEN
