Comments (7)
Hi @kevin01881 and thanks for the kind words.
Btw, testing on AMD CPUs I find that whisper.cpp performance is comparable to (maybe slightly faster than) the stock PyTorch implementation. Just make sure to run the PyTorch version with the greedy decoder to make the comparison fair. I don't have an Intel CPU though, so I'm not sure how it compares there.
But yeah, on M1 I think we still have a big edge - probably 2 or 3 times faster (I haven't done a proper benchmark yet). This will probably be the case until PyTorch has proper support for Arm processors.
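For reference, a comparison along those lines can be scripted from the shell. The whisper.cpp invocation matches the examples later in this thread; the openai-whisper CLI flags are my assumption of how to force the greedy decoder (passing None for beam_size/best_of) and are worth double-checking against the installed version:

# whisper.cpp, 4 threads
./main -m models/ggml-base.bin -f samples/jfk.wav -t 4

# stock PyTorch implementation with greedy decoding (assumed flags)
whisper samples/jfk.wav --model base --temperature 0 --beam_size None --best_of None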
Btw, on this note, someone reported that on M1 Max it is efficient to split the job into multiple runs with fewer threads [0].
I guess we should have a built-in option in whisper.cpp to split the job into N tasks and run multiple inferences - similar to what @ArtyomZemlyak did earlier in this thread.
[0] openai/whisper#208 (reply in thread)
> Interesting drop performance for t > 8
Yes, I've noticed that. I have two guesses (a quick scaling test is sketched below the list):
- The computation is memory-bound, so at some point increasing the number of threads does not help because the memory bandwidth is saturated
- I have a problem in my thread synchronization implementation - currently, I use "busy-waiting" on atomic variables, which (as you probably noticed) keeps the CPUs at 100% all the time. This is much faster compared to locking mutexes. However, I am not sure if it has negative side effects for a large number of threads. Needs some investigation
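One way to separate the two guesses is to time the same file at increasing thread counts and watch where the curve flattens; a minimal sketch, reusing the model and audio paths from the script later in this thread:

# time the same transcription at different thread counts
for t in 1 2 4 8 12 16; do
  echo "threads: $t"
  time ./main --language ru -t $t -m ../models/ggml-model-tiny.bin -f ../audio/cuker1.wav > /dev/null
done

If the runtime stops improving well before the core count is reached, the memory-bandwidth explanation becomes the more likely one.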
The last section (V3) is surprising - I don't expect the encode time to be different for different files, given that they are the same length. Something is not right there.
The "parallel" idea is very interesting - I never realised that we can split the file in chunks and run multiple whisper.cpp
processes in parallel. This might be a very efficient approach for multi-core systems.
Can you provide some more information about your parallel approach? How did you split the audio?
I think we have to provide an offset
argument to main
to be able to start the transcription at different start offset of the audio file.
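To illustrate the idea, assuming a hypothetical --offset-t flag taking a start offset in milliseconds (and a matching --duration flag - neither exists yet, this is just the shape the interface could take), one file could be split across two processes like this:

# transcribe the first 60 s and the remainder in parallel (hypothetical flags)
./main -m ../models/ggml-model-tiny.bin -f ../audio/cuker1.wav --offset-t 0 --duration 60000 &
./main -m ../models/ggml-model-tiny.bin -f ../audio/cuker1.wav --offset-t 60000 &
wait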
In my previous example it's just parallel jobs in a bash script:
start=$SECONDS

# pick the model to benchmark
export MODEL=tiny
# export MODEL=base
# export MODEL=small
# export MODEL=large
export THREADS=4

# launch all transcriptions in the background, one process per file
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/cuker1.wav &
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/cuker2.wav &
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/cuker_frag1.wav &
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/gokov1.wav &
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/gokov2.wav &
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/fragmen1t.wav &
./main --language ru -t $THREADS -m ../models/ggml-model-$MODEL.bin -f ../audio/very_bad_sample.wav &

# wait for every background job to finish, then report the wall-clock time
wait
duration=$(( SECONDS - start ))
echo ""
echo "TOTAL_TIME:"
echo $duration
But if we want the same effect on real audio, we can try two approaches:
- VAD (voice activity detection): find all the chunks where voice is present.
- Split the found chunks into smaller ones (if they are long, > 30 s) and hand them to different processes.
But we need to synchronize the timings in the output: we have to remember the start offset of each chunk and add it back to the resulting timestamps. A rough sketch of the splitting step follows below.
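For the splitting step, here is a minimal sketch using ffmpeg with fixed 30 s cuts - the fixed cut is an assumption standing in for real VAD (ffmpeg's silencedetect filter could drive smarter cut points). Encoding each chunk's start offset into its filename is one way to remember the timings for the merge:

# cut input.wav into 30-second chunks named by their start offset in seconds
len=$(ffprobe -v error -show_entries format=duration -of csv=p=0 input.wav)
for ((off = 0; off < ${len%.*}; off += 30)); do
  ffmpeg -v error -ss $off -t 30 -i input.wav -c copy chunk_$off.wav
done

# transcribe all chunks in parallel; when merging, shift each chunk's
# timestamps forward by the offset encoded in its filename
for f in chunk_*.wav; do
  ./main --language ru -t 4 -m ../models/ggml-model-tiny.bin -f $f > $f.txt &
done
wait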
Or we can just run multiple whisper.cpp processes and transcribe multiple audio files at the same time - useful when we don't need the fastest recognition of a single file, but want as many audio-seconds as possible recognized per processing-hour.
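For that throughput-oriented mode, xargs can keep a fixed number of processes busy instead of launching everything at once (the -P 4 concurrency level and the paths are assumptions to adapt):

# run at most 4 transcriptions at a time over all wav files in ../audio
ls ../audio/*.wav | xargs -P 4 -I {} ./main --language ru -t 4 -m ../models/ggml-model-tiny.bin -f {}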
@ggerganov Thanks very much, sir, for making whisper.cpp!! It is pure insanity that I can run a model that requires 12 GB of VRAM on my ultra-slow PC that is pushing 8 years old (i7-5500U). You are a wizard.
This shows how poorly most of today's models are written as far as efficiency goes. It truly makes one wonder what else we could be running on CPUs that currently requires an RTX 3090 or even a T4/A100.
So far, on this ancient computer I have successfully run: Facebook Research's Demucs (stock, no optimized port), Stable Diffusion (OpenVINO port), and, thanks to your C++ port, now Whisper as well.
@ArtyomZemlyak Careful with the output you get when fragmenting audio for parallel inference jobs.
See openai/whisper#440
cc @ggerganov