Comments (3)
Note that the Core ML decoder currently cannot process multiple tokens in a single forward pass, so we need to "decode the prompt" one token at a time. This will likely slow down long prompts in the short term. Once we bring up the MLX backend, it shouldn't be a problem at all.
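To make the cost concrete, here is a minimal Python sketch of prompt decoding with a decoder that accepts exactly one token per forward pass. `ToyDecoder`, `prefill_one_token_at_a_time`, and the token ids are hypothetical stand-ins for illustration, not WhisperKit or Core ML APIs; the point is only that the number of forward passes grows linearly with prompt length.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDecoder:
    """Hypothetical stand-in for a decoder that takes one token per forward pass."""
    kv_cache: list = field(default_factory=list)  # one entry per token seen so far
    forward_passes: int = 0

    def forward(self, token: int) -> int:
        # A real decoder would attend over kv_cache here; the toy only records
        # the token and returns a dummy "next token" prediction.
        self.kv_cache.append(token)
        self.forward_passes += 1
        return (token + 1) % 50_000


def prefill_one_token_at_a_time(decoder: ToyDecoder, prompt_tokens: list) -> int:
    """Feed the prompt token by token; return the prediction after the last one."""
    if not prompt_tokens:
        raise ValueError("prompt_tokens must not be empty")
    prediction = -1
    for token in prompt_tokens:
        prediction = decoder.forward(token)
    return prediction


decoder = ToyDecoder()
prompt = [101, 202, 303, 404]  # hypothetical token ids
prefill_one_token_at_a_time(decoder, prompt)
print(decoder.forward_passes)  # one forward pass per prompt token
```

A backend that can consume the whole prompt in one call would replace the loop with a single forward pass, which is why long prompts are the case that suffers most here.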
from whisperkit.
This is the only remaining feature before we are feature-complete with respect to the OpenAI API. We will implement this before 1.0. Thank you for bringing this up!
Yep, good callout @ldenoue, this is definitely needed for parity, and we have been tracking todos for when it is available.
We built a look-up table to address this for the common task and language combinations, which is the `usePrefillCache` option, but arbitrary text prompts will require either generating the cache 1 token at a time like @atiorh mentioned, or a new model that can generate prompt caches in a single forward pass, which will likely come from integrating MLX #33. See this thread for a similar discussion of the issue: huggingface/transformers#23845 (comment)
In the meantime, the simplest way to go about this would be opening up the prefill prompt tokens to be set via `DecodingOptions` directly, which would enable arbitrary prompts including startofprev and custom vocabulary words as you requested, but would require a forward pass for each token. What do you think?
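The trade-off can be sketched roughly as follows, in Python for brevity. Everything here is hypothetical: `PREFILL_TABLE`, `prefill`, and the string "caches" are illustrative stand-ins, not WhisperKit's actual implementation. The sketch also assumes that a custom prompt sits in front of the task and language tokens, so a precomputed cache cannot be reused once a prompt is present.

```python
from typing import Dict, List, Tuple

Cache = List[str]  # stand-in for per-token key/value tensors

# Hypothetical table of caches precomputed offline for common combinations.
PREFILL_TABLE: Dict[Tuple[str, str], Cache] = {
    ("transcribe", "en"): ["kv:<sot>", "kv:<en>", "kv:<transcribe>"],
    ("translate", "fr"): ["kv:<sot>", "kv:<fr>", "kv:<translate>"],
}


def prefill(task: str, language: str, prompt: List[str]) -> Tuple[Cache, int]:
    """Return the decoder cache and the number of forward passes spent on it."""
    if not prompt and (task, language) in PREFILL_TABLE:
        # Table hit: the cache is copied, so no forward passes are needed.
        return list(PREFILL_TABLE[(task, language)]), 0

    # Table miss or custom prompt: every token costs one forward pass.
    tokens = ["<sot>", f"<{language}>", f"<{task}>"]
    if prompt:
        tokens = ["<startofprev>", *prompt, *tokens]
    return [f"kv:{t}" for t in tokens], len(tokens)


cache, passes = prefill("transcribe", "en", [])
custom_cache, custom_passes = prefill("transcribe", "en", ["Whisper", "Kit"])
print(passes, custom_passes)
```

With no prompt, the common combination costs zero forward passes; with a two-token prompt, the same combination costs one pass per token in the full prefix.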