Comments (6)
As I mentioned in #8, I think some extra performance could be gained by properly implementing the SIMD routines in ggml.c
for the Android Arm architecture:
I don't think the current implementation is optimal - it was just something I hacked together to make it run on a RPi4.
But in any case - this will probably lead to a few tens of percent improvement at best. I'm not sure what your expectations are for these devices. Large-model inference will always take at least an order of magnitude longer than the audio length on a phone.
I don't know what GPUs are available on modern mobile devices, but I don't plan on supporting them. Usually it involves using some complex framework (e.g. CUDA, OpenCL, Metal, etc.) and it takes a lot of expertise and experience to utilize these efficiently.
Regarding the algorithm improvement:
The inference algorithm consists of relatively simple operations with matrices and vectors. Not much can be done to improve the algorithm itself, other than implementing these basic operations efficiently. The common strategies are:
- SIMD implementation
- cache-aware data storage
- multi-threading
- packing data into smaller elements (for example, 32-bit floats into 16-bit or 8-bit floats / ints) at the cost of some precision
I have an idea for reducing the memory for the large model even further that I want to experiment with at some point, but most likely it will fail. So, I don't think the algorithm can be improved in any significant way.
from whisper.cpp.
An alternative solution could be to retrain a small model for a data domain, a specific task, or a specific language.
I think you want to use the large model mainly because of its quality. So it may be worth considering additional training options for smaller models - training them until their quality is satisfactory.
But I can say that with Whisper this will not be an easy task right now, since there are no official scripts for training or pre-training - only a couple written by enthusiasts.
I'm currently trying a solution combined from two such scripts, but so far the quality is worse than the original: CER decreases on most of the pre-training dataset, but recognition on real examples gets worse.
@ekorudi You might want to give the latest master branch a try - it is possible that the performance is a bit better now. Let me know if there are any build issues.
Successful compile for Android:
$ make
/ok/whisper/arm-compiler/bin/aarch64-linux-android28-clang -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -mavx -mavx2 -mfma -mf16c -c ggml.c
clang80: warning: argument unused during compilation: '-mavx' [-Wunused-command-line-argument]
clang80: warning: argument unused during compilation: '-mavx2' [-Wunused-command-line-argument]
clang80: warning: argument unused during compilation: '-mfma' [-Wunused-command-line-argument]
clang80: warning: argument unused during compilation: '-mf16c' [-Wunused-command-line-argument]
/ok/whisper/arm-compiler/bin/aarch64-linux-android28-clang++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c whisper.cpp
/ok/whisper/arm-compiler/bin/aarch64-linux-android28-clang++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread main.cpp whisper.o ggml.o -o main -static-libstdc++
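The "argument unused" warnings above appear because x86 SIMD flags (`-mavx` etc.) are passed to an aarch64 compiler, which simply ignores them. One way to avoid this is to make the flags conditional on the target architecture - a rough sketch (variable names illustrative; the actual whisper.cpp Makefile may differ):

```make
UNAME_M := $(shell uname -m)

# x86 SIMD flags are meaningless on AArch64; pick flags per architecture.
ifeq ($(UNAME_M),x86_64)
    CFLAGS += -mavx -mavx2 -mfma -mf16c
endif
ifeq ($(UNAME_M),aarch64)
    CFLAGS += -mcpu=native
endif
```

Note that `uname -m` reports the build machine, so when cross-compiling for Android the flags would still need to be overridden manually.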
Error on compile for Linux Intel
$ make
cc -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -mavx -mavx2 -mfma -mf16c -c ggml.c
ggml.c:189:36: error: initializer element is not constant
const size_t CACHE_LINE_SIZE_F32 = CACHE_LINE_SIZE/sizeof(float);
^~~~~~~~~~~~~~~
Makefile:60: recipe for target 'ggml.o' failed
make: *** [ggml.o] Error 1
I changed the Makefile to a simpler one to make it work:
gcc="/ok/whisper/arm-compiler/bin/aarch64-linux-android28-clang"
gpp="/ok/whisper/arm-compiler/bin/aarch64-linux-android28-clang++"

main: ggml.o whisper.o main.o
	$(gpp) -pthread -o main ggml.o main.o whisper.o -static-libstdc++
	adb push main /data/local/tmp/main

ggml.o: ggml.c ggml.h
	$(gcc) -pthread -O3 -c ggml.c -mcpu=cortex-a75

main.o: main.cpp ggml.h
	$(gpp) -pthread -O3 -std=c++11 -c main.cpp

whisper.o: whisper.cpp whisper.h
	$(gpp) -pthread -O3 -std=c++11 -c whisper.cpp

# clean up the directory
clean:
	rm -f *.o main
Error on compile for Linux Intel
Had the same issue when trying to compile on the server.
Solved by installing gcc-9 (previously gcc-7).