Comments (6)
WhisperHallu uses Whisper or FasterWhisper out of the box, without any modification to them. I don't understand why they didn't use your GPU.
from whisperhallu.
Hey, thanks for the prompt response! Whisper and FasterWhisper will use the GPU, but what about ffmpeg, Demucs, etc.? Are those going to take forever? I had figured that running your algorithm on a GPU would make all the "pre-processing" that prevents Whisper hallucinations go a lot faster. I'm using an NVIDIA A100 40GB.
Here's the log so far:
(base) root@instance-2:/home/jsteinberg/WhisperHallu# ls
README.md data demucsWrapper.py hallu.py markers transcribeHallu.py
(base) root@instance-2:/home/jsteinberg/WhisperHallu# python hallu.py
Python >= 3.10
/opt/conda/lib/python3.10/site-packages/torch/hub.py:286: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip
Using Demucs
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:00<00:00, 111MB/s]
/opt/conda/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def backtrace(trace: np.ndarray):
Using standard Whisper
LOADING: large-v2 GPU:0 BS: 2
100%|█████████████████████████████████████| 2.87G/2.87G [00:47<00:00, 65.2MiB/s]
LOADED
=====transcribePrompt
PATH=../230821_0020S12.wav
LNGINPUT=en
LNG=en
PROMPT=Whisper, Ok. A pertinent sentence for your purpose in your language. Ok, Whisper. Whisper, Ok. Ok, Whisper. Whisper, Ok. Please find here, an unlikely ordinary sentence. This is to avoid a repetition to be deleted. Ok, Whisper.
CMD: ffmpeg -y -i "../230821_0020S12.wav" -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav" > "../230821_0020S12.wav.WAV.wav.log" 2>&1
T= 10.130795001983643
PATH=../230821_0020S12.wav.WAV.wav
Demucs using device: cuda:0
Source: drums
Source: bass
Source: other
Source: vocals
T= 186.54959273338318
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav
CMD: ffmpeg -y -i "../230821_0020S12.wav.WAV.wav.vocals.wav" -af "silenceremove=start_periods=1:stop_periods=-1:start_threshold=-50dB:stop_threshold=-50dB:start_silence=0.2:stop_silence=0.2, loudnorm" -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav" > "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log" 2>&1
T= 58.83332967758179
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
DURATION=7452
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
return forward_call(*args, **kwargs)
T= 27.54055142402649
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
NOT USING MARKS FOR DURATION > 30s
[0] PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
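To see where the pre-processing time goes, the `T=` lines in a log like the one above can be summed per stage. The following is a minimal sketch, not part of WhisperHallu: the function name `stage_timings` and the stage labels are assumptions, with the labels inferred from the order of the `CMD:` and `PATH=` lines (the last one, VAD, is a guess based on the `.VAD.wav` output that follows it).

```python
import re

# A few lines copied from the log above; only the "T=" lines matter here.
LOG = """\
T= 10.130795001983643
PATH=../230821_0020S12.wav.WAV.wav
Demucs using device: cuda:0
Source: vocals
T= 186.54959273338318
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav
T= 58.83332967758179
DURATION=7452
T= 27.54055142402649
"""

# Assumed labels, in the order the stages appear in the log.
LABELS = ["ffmpeg 16 kHz conversion", "Demucs", "ffmpeg silenceremove", "VAD (assumed)"]


def stage_timings(log_text):
    """Return the elapsed seconds of every 'T= <seconds>' line, in order."""
    return [float(m) for m in re.findall(r"^T=\s*([0-9.]+)\s*$", log_text, flags=re.MULTILINE)]


if __name__ == "__main__":
    timings = stage_timings(LOG)
    total = sum(timings)
    for label, seconds in zip(LABELS, timings):
        print(f"{label:28s} {seconds:8.1f} s  {seconds / total:6.1%}")
    print(f"{'total':28s} {total:8.1f} s")
```

For this run, Demucs accounts for roughly two thirds of the logged pre-processing time (about 187 s out of 283 s), even though it reports using `cuda:0`.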
Wowza. I got it working.
@jsteinberg-rbi
Demucs should run on the GPU. I don't think that is possible with ffmpeg, though there may be an option I'm not aware of, especially for some features.
What did you do to get it working?
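For reference, the silence-removal step in the log is a plain ffmpeg audio filter chain (`silenceremove` followed by `loudnorm`). Below is a rough sketch of the same invocation from Python, assuming `ffmpeg` is on the PATH. The helper names `silcut_command` and `run_silcut` are hypothetical and are not taken from WhisperHallu; the arguments are copied from the logged `CMD:` line.

```python
import subprocess


def silcut_command(src, dst):
    """Build the logged ffmpeg silence-removal command as an argument list."""
    audio_filter = (
        "silenceremove=start_periods=1:stop_periods=-1:"
        "start_threshold=-50dB:stop_threshold=-50dB:"
        "start_silence=0.2:stop_silence=0.2, loudnorm"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-af", audio_filter,
        "-c:a", "pcm_s16le", "-ar", "16000",
        dst,
    ]


def run_silcut(src, dst):
    """Run the command, sending ffmpeg's output to dst + '.log' as the log's shell redirection does."""
    with open(dst + ".log", "w") as log_file:
        subprocess.run(
            silcut_command(src, dst),
            stdout=log_file,
            stderr=subprocess.STDOUT,
            check=True,  # raise CalledProcessError if ffmpeg exits non-zero
        )
```

Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces or quotes.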
@EtienneAb3d The file I was testing with initially was a 4GB file, and it would just spin forever. When I switched to a 2GB file, it ran in under 10 minutes :)
Question for you: so I ran your script over 30 files last night. Which one of these files has the silence removed?
230821_0020S12.wav
230821_0020S12.wav.WAV.wav
230821_0020S12.wav.WAV.wav.bass.wav
230821_0020S12.wav.WAV.wav.drums.wav
230821_0020S12.wav.WAV.wav.log
230821_0020S12.wav.WAV.wav.other.wav
230821_0020S12.wav.WAV.wav.vocals.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log
@jsteinberg-rbi
SILCUT = Silence Cut: the file ending in .SILCUT.wav is the one with the silence removed.
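When the script has run over many files, the intermediate outputs can be sorted by suffix. This is a minimal sketch, not part of WhisperHallu: the names `STAGES`, `stage_of` and `silence_removed` are hypothetical, and the suffix meanings are read off the log earlier in the thread (the VAD label is an assumption).

```python
# Suffix -> stage, checked in order so the last processing step wins.
STAGES = [
    (".log", "ffmpeg log output"),
    (".VAD.wav", "after VAD (assumed), derived from the SILCUT file"),
    (".SILCUT.wav", "silence removed (ffmpeg silenceremove + loudnorm)"),
    (".vocals.wav", "Demucs source: vocals"),
    (".bass.wav", "Demucs source: bass"),
    (".drums.wav", "Demucs source: drums"),
    (".other.wav", "Demucs source: other"),
    (".WAV.wav", "ffmpeg conversion to 16 kHz pcm_s16le"),
]


def stage_of(filename):
    """Return a description of the pipeline stage that produced filename."""
    for suffix, stage in STAGES:
        if filename.endswith(suffix):
            return stage
    return "original input"


def silence_removed(filenames):
    """Return only the files written directly by the silence-cut step."""
    return [name for name in filenames if name.endswith(".SILCUT.wav")]


if __name__ == "__main__":
    files = [
        "230821_0020S12.wav",
        "230821_0020S12.wav.WAV.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log",
    ]
    for name in files:
        print(f"{name}: {stage_of(name)}")
```

The suffix match is case-sensitive, so the original `.wav` input is not mistaken for the `.WAV.wav` conversion.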