Comments (4)
Thanks for the note @coder543! The medium models were hitting an edge case with the Neural Engine that we triaged away for now. Technically, you can still use https://github.com/argmaxinc/whisperkittools to prepare the medium and medium.en model assets and use them with cpuAndGPU compute units without issues. We decided not to fork model availability across different compute units, in order to preserve a non-leaky abstraction and allow seamless switching.
from whisperkit.
Gotcha, that is unfortunate. In my extensive testing of other Whisper apps on iPhone, the Medium model is the best one that can realistically run in real time over long durations. But small is pretty good too, I guess!
@coder543 Have you noticed the large models being too slow? It would be great to get an example audio/video where it falls back in streaming mode on iPhone 12+. We are always looking to improve based on feedback, and we can follow up once we improve performance.
On the 15 Pro Max that I have, the large models run at a real-time factor (RTF) slightly greater than 1, and they're just slow in general. The medium models are half the size, so they are just about perfect. When I've tested things more extensively in Hello Transcribe over the months, the large models are tolerable and seem to barely keep up with real time, but I prefer the balance that medium provides if I'm running a model on my phone. (On a powerful desktop, the large models are great.)
I didn't spend much time trying the distil models, but I've had mixed feelings about the accuracy of those models in past testing.
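To make the RTF discussion above concrete, here is a minimal Python sketch. RTF is reported under two opposite conventions: processing time divided by audio duration (values below 1 are faster than real time), or the inverse, audio duration divided by processing time (values above 1 are faster than real time). The thread does not say which convention Hello Transcribe uses, so the sketch computes both. The `Measurement` class, the helper functions, and the numbers in the usage block are hypothetical illustrations, not part of WhisperKit or any app mentioned here.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical benchmark record, not a WhisperKit type.
    audio_seconds: float       # duration of the audio that was transcribed
    processing_seconds: float  # wall-clock time the model took to transcribe it


def rtf(m: Measurement) -> float:
    """RTF as processing time / audio duration (< 1.0 means faster than real time)."""
    return m.processing_seconds / m.audio_seconds


def speed_factor(m: Measurement) -> float:
    """Inverse convention: audio duration / processing time (> 1.0 means faster than real time)."""
    return m.audio_seconds / m.processing_seconds


def keeps_up(m: Measurement) -> bool:
    """True if transcription runs no slower than the audio plays."""
    return rtf(m) <= 1.0


def backlog_after(m: Measurement, stream_seconds: float) -> float:
    """Seconds of audio still waiting to be transcribed after streaming for
    stream_seconds, assuming throughput stays constant (a simplification)."""
    processed = stream_seconds / rtf(m)  # audio seconds the model gets through
    return max(0.0, stream_seconds - processed)


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from this thread.
    slow = Measurement(audio_seconds=30.0, processing_seconds=33.0)
    fast = Measurement(audio_seconds=30.0, processing_seconds=15.0)
    for name, m in [("slow", slow), ("fast", fast)]:
        print(
            name,
            f"rtf={rtf(m):.2f}",
            f"speed={speed_factor(m):.2f}",
            f"keeps_up={keeps_up(m)}",
            f"backlog_after_10min={backlog_after(m, 600.0):.1f}s",
        )
```

Under the processing/audio convention, a model that is only slightly slower than real time accumulates a steadily growing backlog during long streaming sessions, which is why a smaller model with headroom can be preferable on a phone. Real on-device throughput is not constant (for example, under thermal throttling), so the backlog figure is only a rough estimate.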
Related Issues (20)
- Standard output while processing.
- Can a local model be used without requesting the Hugging Face API?
- How do I use a parameter like initial_prompt in Python's Whisper?
- When my Mac connects to AirPods, starting recording fails.
- Problems with "base" model
- Audio input captures only the first channel
- Instructions for running the cli version?
- how to fix "Ambiguous use of 'transcribe(audioPath:decodeOptions:callback:)'"
- Experiencing crash on iPad8,8.
- Add version support
- Is it possible to run turbo model on M1?
- VAD: Finishes too early (almost empty transcript) with VAD enabled, completes successfully without.
- VAD: Progress reporting doesn't report evenly when VAD is active
- VAD: First time loading a file it works, second and third time loading the same files it just blanks out
- VAD issue with English-only models
- Publish in CocoaPods
- Is it possible to add a TranscriptionSegment callback?
- Incorrect word timestamp when using VAD
- Prompt string being returned as transcription result