Should a GPU help this algorithm go faster or no? (whisperhallu issue, open, 6 comments)

jsteinberg-rbi commented on May 29, 2024

Should a GPU help this algorithm go faster or no?

from whisperhallu.

Comments (6)

EtienneAb3d commented on May 29, 2024

WhisperHallu uses Whisper or FasterWhisper out of the box, without any modification to them. I don't understand why they weren't using your GPU.

jsteinberg-rbi commented on May 29, 2024

@EtienneAb3d

Hey thanks for the prompt response! Er -- Whisper and FasterWhisper will use the GPU, but what about ffmpeg, demucs, etc -- are those going to take forever? I had figured that running your algorithm on a GPU would make all that "pre-processing" that prevents the Whisper hallucination go a lot faster? I'm using an NVIDIA A100 40GB.

Here's the log so far:

(base) root@instance-2:/home/jsteinberg/WhisperHallu# ls
README.md  data  demucsWrapper.py  hallu.py  markers  transcribeHallu.py
(base) root@instance-2:/home/jsteinberg/WhisperHallu# python hallu.py
Python >= 3.10
/opt/conda/lib/python3.10/site-packages/torch/hub.py:286: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip
Using Demucs
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:00<00:00, 111MB/s]
/opt/conda/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
Using standard Whisper
LOADING: large-v2 GPU:0 BS: 2
100%|█████████████████████████████████████| 2.87G/2.87G [00:47<00:00, 65.2MiB/s]
LOADED
=====transcribePrompt
PATH=../230821_0020S12.wav
LNGINPUT=en
LNG=en
PROMPT=Whisper, Ok. A pertinent sentence for your purpose in your language. Ok, Whisper. Whisper, Ok. Ok, Whisper. Whisper, Ok. Please find here, an unlikely ordinary sentence. This is to avoid a repetition to be deleted. Ok, Whisper. 
CMD: ffmpeg -y -i "../230821_0020S12.wav"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav" > "../230821_0020S12.wav.WAV.wav.log" 2>&1
T= 10.130795001983643
PATH=../230821_0020S12.wav.WAV.wav
Demucs using device: cuda:0
Source: drums
Source: bass
Source: other
Source: vocals
T= 186.54959273338318
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav
CMD: ffmpeg -y -i "../230821_0020S12.wav.WAV.wav.vocals.wav" -af "silenceremove=start_periods=1:stop_periods=-1:start_threshold=-50dB:stop_threshold=-50dB:start_silence=0.2:stop_silence=0.2, loudnorm"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav" > "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log" 2>&1
T= 58.83332967758179
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
DURATION=7452
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
T= 27.54055142402649
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
NOT USING MARKS FOR DURATION > 30s
[0] PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
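
The `T=` lines in the log above give a rough picture of where the pre-processing time goes. The sketch below just adds them up. It assumes each `T=` value is the elapsed time in seconds of the stage printed right before it, and that `DURATION=7452` is the audio length in seconds; neither is confirmed in this thread, and the stage labels are my own reading of the log, not names used by WhisperHallu.

```python
# T= values copied from the log above.
# Assumption: each value is the wall-clock time, in seconds, of the stage
# logged just before it. The labels are descriptive guesses, not official names.
STAGES = [
    ("ffmpeg: convert input to 16 kHz PCM", 10.130795001983643),
    ("Demucs: source separation on cuda:0", 186.54959273338318),
    ("ffmpeg: silenceremove + loudnorm", 58.83332967758179),
    ("VAD pass", 27.54055142402649),
]

# Assumption: the DURATION= line reports seconds of audio.
AUDIO_SECONDS = 7452


def summarize(stages):
    """Return (total_seconds, rows) where each row is (label, seconds, share)."""
    total = sum(seconds for _, seconds in stages)
    rows = [(label, seconds, seconds / total) for label, seconds in stages]
    return total, rows


if __name__ == "__main__":
    total, rows = summarize(STAGES)
    for label, seconds, share in rows:
        print(f"{label:<40} {seconds:8.1f} s  {share:6.1%}")
    print(f"{'total pre-processing':<40} {total:8.1f} s")
    # Pre-processing time relative to the length of the audio itself.
    print(f"pre-processing / audio duration: {total / AUDIO_SECONDS:.3f}")
```

Under those assumptions, pre-processing takes roughly 283 seconds in total, and about two thirds of that is Demucs, even though it is already running on the GPU. The two ffmpeg steps together account for about 69 seconds.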

jsteinberg-rbi commented on May 29, 2024

Wowza. I got it working.

EtienneAb3d commented on May 29, 2024

@jsteinberg-rbi
Demucs should run on the GPU. I don't think this is possible with ffmpeg, but there may be an option I'm not aware of, especially for some features.
What did you do to get it working?
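
A quick way to check which device the PyTorch-based steps (Whisper, Demucs) are likely to pick up is sketched below. This is not code from WhisperHallu: `pick_device` is a hypothetical helper, and it only reports what PyTorch can see, not what any particular script actually selects.

```python
def pick_device(gpu_index: int = 0) -> str:
    """Return 'cuda:<index>' if PyTorch sees that CUDA GPU, otherwise 'cpu'.

    Hypothetical helper for illustration; WhisperHallu has its own device logic.
    """
    try:
        import torch  # optional: only needed for the GPU check
    except ImportError:
        # Without PyTorch installed there is nothing to run on a GPU anyway.
        return "cpu"
    if torch.cuda.is_available() and gpu_index < torch.cuda.device_count():
        return f"cuda:{gpu_index}"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch-based steps would run on:", pick_device())
```

On the machine from the log earlier in this thread, this would be expected to agree with the `Demucs using device: cuda:0` line, assuming the same Python environment is used.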

jsteinberg-rbi commented on May 29, 2024

@EtienneAb3d The file I was testing with initially was a 4GB file and it would just spin forever. When I switched to a 2GB file, it ran in under 10 minutes :)

Question for you: so I ran your script over 30 files last night. Which one of these files has the silence removed?

230821_0020S12.wav
230821_0020S12.wav.WAV.wav
230821_0020S12.wav.WAV.wav.bass.wav
230821_0020S12.wav.WAV.wav.drums.wav
230821_0020S12.wav.WAV.wav.log
230821_0020S12.wav.WAV.wav.other.wav
230821_0020S12.wav.WAV.wav.vocals.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log

EtienneAb3d commented on May 29, 2024

@jsteinberg-rbi
SILCUT = Silence Cut
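
Reading the `CMD:` and `PATH=` lines in the log earlier in the thread, each suffix appears to correspond to one pipeline stage, so the file with silence removed is `230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav`. Note that it is built from the Demucs vocals stem, not from the full mix. The mapping below is a sketch based on that log only; the descriptions are my interpretation rather than official WhisperHallu documentation, and `describe` is a hypothetical helper.

```python
# Suffix -> stage, inferred from the CMD:/PATH= lines in the log.
# Checked in order, so more specific (later-stage) suffixes come first.
STAGE_BY_SUFFIX = [
    (".SILCUT.wav.VAD.wav", "VAD applied after the silence cut"),
    (".SILCUT.wav.log", "ffmpeg log of the silence-cut step"),
    (".SILCUT.wav", "silence removed (ffmpeg silenceremove + loudnorm)"),
    (".vocals.wav", "Demucs vocals stem"),
    (".bass.wav", "Demucs bass stem"),
    (".drums.wav", "Demucs drums stem"),
    (".other.wav", "Demucs 'other' stem"),
    (".WAV.wav.log", "ffmpeg log of the 16 kHz conversion"),
    (".WAV.wav", "16 kHz PCM conversion of the input"),
]


def describe(name: str) -> str:
    """Return a short description of the stage that produced `name`."""
    for suffix, stage in STAGE_BY_SUFFIX:
        if name.endswith(suffix):
            return stage
    return "original input"


if __name__ == "__main__":
    files = [
        "230821_0020S12.wav",
        "230821_0020S12.wav.WAV.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav",
        "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav",
    ]
    for name in files:
        print(f"{name}: {describe(name)}")
```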
