Comments (7)

ggerganov commented on May 12, 2024

Hi @kevin01881 and thanks for the kind words.
Btw, testing on AMD CPUs I find that whisper.cpp performance is comparable to (maybe slightly faster than) the stock PyTorch implementation. Just make sure to run the PyTorch version with the Greedy decoder to make things even. I don't have an Intel CPU though, so I'm not sure how it compares there.

But yeah, on M1 I think we still have a big edge - probably 2 or 3 times faster (I haven't done a proper benchmark yet).
Probably this will be the case until PyTorch has proper support for Arm processors.

Btw, on this note, someone reported that on M1 Max it is efficient to split the job into multiple runs with fewer threads [0].
I guess we should have a built-in option in whisper.cpp to split the job into N tasks and run the inferences in parallel - similar to what @ArtyomZemlyak did earlier in this thread.

[0] openai/whisper#208 (reply in thread)
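As a rough sketch of the idea in [0] (the core count, job count, and file paths below are placeholders, not from the thread), the point is to keep JOBS * THREADS near the number of physical cores:

```shell
# Sketch: trade threads per run for number of parallel runs,
# keeping JOBS * THREADS near the core count (8 cores assumed).
CORES=8
JOBS=2
THREADS=$(( CORES / JOBS ))

# Print the commands this configuration would launch
# (model/audio paths are placeholders).
for f in audio/part1.wav audio/part2.wav; do
    echo "./main -t $THREADS -m models/ggml-base.bin -f $f &"
done
echo wait
```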

from whisper.cpp.

ArtyomZemlyak commented on May 12, 2024

Interesting performance drop for t > 8

ggerganov commented on May 12, 2024

Interesting performance drop for t > 8

Yes, I've noticed that. I have 2 guesses:

  • The computation is memory-bound, so at some point increasing the number of threads does not help because the memory bandwidth is saturated
  • I have a problem in my thread-synchronization implementation - currently, I use "busy-waiting" on atomic variables, which you probably noticed keeps the CPUs at 100% all the time. This is much faster compared to locking mutexes. However, I am not sure if it has negative side effects for a large number of threads. Needs some investigation

The last section V3 is surprising - I don't expect the encode time to be different for different files, given that they are the same length. Something is not right there.

The "parallel" idea is very interesting - I never realised that we can split the file in chunks and run multiple whisper.cpp processes in parallel. This might be a very efficient approach for multi-core systems.
Can you provide some more information about your parallel approach? How did you split the audio?

I think we have to provide an offset argument to main to be able to start the transcription at an arbitrary position in the audio file.

ArtyomZemlyak commented on May 12, 2024

In my previous example it's just parallel jobs in a bash script:

start=$SECONDS

export MODEL=tiny
# export MODEL=base
# export MODEL=small
# export MODEL=large

export THREADS=4

# one background transcription per file, $THREADS threads each
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/cuker1.wav &
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/cuker2.wav &
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/cuker_frag1.wav &
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/gokov1.wav &
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/gokov2.wav &
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/fragmen1t.wav &
./main  --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/very_bad_sample.wav &

wait

duration=$(( SECONDS - start ))

echo ""
echo "TOTAL_TIME:"
echo $duration

But if we need the same effect on real audio, we can try 2 approaches:

  1. VAD - voice activity detection. Find all chunks where voice exists.
  2. Split the found chunks into smaller chunks (if they are long, > 30 s) and hand them to different processes.

But we need to synchronize the timings for the output - remember the timing of each chunk and add it to the resulting output.
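The timing bookkeeping in point 2 can be sketched in bash - a small helper (hypothetical, not from the thread) that shifts a chunk-local HH:MM:SS timestamp back into the timeline of the original file:

```shell
# shift_ts OFFSET_SECONDS HH:MM:SS
# Adds the chunk's start offset to a chunk-local timestamp so the
# merged transcript uses global timings.
shift_ts() {
    off=$1
    h=${2%%:*}; rest=${2#*:}; m=${rest%%:*}; s=${rest#*:}
    # strip one leading zero so "08"/"09" are not parsed as octal
    total=$(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} + off ))
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

shift_ts 90 "00:00:05"   # chunk started at 90 s → 00:01:35
```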

ArtyomZemlyak commented on May 12, 2024

Or we can just run multiple instances of whisper.cpp - i.e. process multiple audio files at the same time. Useful if we don't need the fastest recognition of one file, but need a lot of AudioSeconds recognized per ProcessingHour.
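For that throughput-oriented mode, feeding a file list to xargs -P keeps a fixed number of workers busy without a hand-written job list. In this sketch echo stands in for the real ./main invocation and the file names are placeholders:

```shell
# Feed a list of files to 2 parallel workers with xargs -P.
# Replace the echo with the real whisper.cpp command, e.g.
#   ./main --language ru -t 4 -m ../models/ggml-model-tiny.bin -f {}
printf '%s\n' a.wav b.wav c.wav d.wav |
    xargs -P 2 -I{} sh -c 'echo "transcribing {}"'
```

xargs starts the next job as soon as a worker frees up, so long and short files pack together better than a fixed per-file job list.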

kevin01881 commented on May 12, 2024

@ggerganov Thanks very much sir for making whisper.cpp!! It is pure insanity that I can run a model that requires 12 GB of VRAM on my ultra-slow PC that is pushing 8 years old (i7-5500U). You are a wizard.

This shows how most of today's models are written very poorly as far as efficiency goes. Truly makes one wonder what else we could be running on CPUs that currently requires an RTX 3090 or even a T4/A100.

So far, I have successfully run on this ancient computer: Facebook Research Demucs (stock, no optimized port), Stable Diffusion (the OpenVINO port), and, thanks to your C++ port, now Whisper as well.

i-am-neo commented on May 12, 2024

@ArtyomZemlyak Be careful with the output you get when fragmenting audio for parallel inference jobs.
See openai/whisper#440

cc @ggerganov
