Giter Club home page Giter Club logo

Comments (6)

EtienneAb3d avatar EtienneAb3d commented on May 29, 2024

WhisperHallu is using Whisper or FasterWisper out of the box, without any modification on them. I don't understand why you didn't get them using your GPU.

from whisperhallu.

jsteinberg-rbi avatar jsteinberg-rbi commented on May 29, 2024

@EtienneAb3d

Hey thanks for the prompt response! Er -- Whisper and FasterWhisper will use the GPU, but what about ffmpeg, demucs, etc -- are those going to take forever? I had figured that running your algorithm on a GPU would make all that "pre-processing" that prevents the Whisper hallucination go a lot faster? I'm using an NVIDIA A100 40GB.

Here's the log so far:

(base) root@instance-2:/home/jsteinberg/WhisperHallu# ls
README.md  data  demucsWrapper.py  hallu.py  markers  transcribeHallu.py
(base) root@instance-2:/home/jsteinberg/WhisperHallu# python hallu.py
Python >= 3.10
/opt/conda/lib/python3.10/site-packages/torch/hub.py:286: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip
Using Demucs
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:00<00:00, 111MB/s]
/opt/conda/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
Using standard Whisper
LOADING: large-v2 GPU:0 BS: 2
100%|█████████████████████████████████████| 2.87G/2.87G [00:47<00:00, 65.2MiB/s]
LOADED
=====transcribePrompt
PATH=../230821_0020S12.wav
LNGINPUT=en
LNG=en
PROMPT=Whisper, Ok. A pertinent sentence for your purpose in your language. Ok, Whisper. Whisper, Ok. Ok, Whisper. Whisper, Ok. Please find here, an unlikely ordinary sentence. This is to avoid a repetition to be deleted. Ok, Whisper. 
CMD: ffmpeg -y -i "../230821_0020S12.wav"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav" > "../230821_0020S12.wav.WAV.wav.log" 2>&1
T= 10.130795001983643
PATH=../230821_0020S12.wav.WAV.wav
Demucs using device: cuda:0
Source: drums
Source: bass
Source: other
Source: vocals
T= 186.54959273338318
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav
CMD: ffmpeg -y -i "../230821_0020S12.wav.WAV.wav.vocals.wav" -af "silenceremove=start_periods=1:stop_periods=-1:start_threshold=-50dB:stop_threshold=-50dB:start_silence=0.2:stop_silence=0.2, loudnorm"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav" > "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log" 2>&1
T= 58.83332967758179
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
DURATION=7452
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
T= 27.54055142402649
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
NOT USING MARKS FOR DURATION > 30s
[0] PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav

from whisperhallu.

jsteinberg-rbi avatar jsteinberg-rbi commented on May 29, 2024

Wowza. I got it working.

from whisperhallu.

EtienneAb3d avatar EtienneAb3d commented on May 29, 2024

@jsteinberg-rbi
Demucs should run GPU. I think this is not possible with ffmpeg, but perhaps there is a possibility I ignore, especially for some features.
What did you do to get it working?

from whisperhallu.

jsteinberg-rbi avatar jsteinberg-rbi commented on May 29, 2024

@EtienneAb3d The file I was testing with initially was a 4GB file and it would just spin forever. When I switched to a 2GB it ran in under 10 minutes :)

Question for you: so I ran your script over 30 files last night. Which one of these files has the silence removed?

230821_0020S12.wav
230821_0020S12.wav.WAV.wav
230821_0020S12.wav.WAV.wav.bass.wav
230821_0020S12.wav.WAV.wav.drums.wav
230821_0020S12.wav.WAV.wav.log
230821_0020S12.wav.WAV.wav.other.wav
230821_0020S12.wav.WAV.wav.vocals.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log

from whisperhallu.

EtienneAb3d avatar EtienneAb3d commented on May 29, 2024

@jsteinberg-rbi
SILCUT = Silence Cut

from whisperhallu.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.