Comments (5)
Supplementary:
It works with `DynamicCache`, so it must be something wrong with `SinkCache` and the relevant control code.
from transformers.
It's been a bit since I worked on this, but I think that `-self.window_length + self.num_sink_tokens + key_states.shape[-2] >= 0` is not really possible:
- `window_length` is the max size of the cache, e.g. 1024.
- `num_sink_tokens` is some (usually small) positive integer, e.g. 4.
- `key_states.shape[-2]` is the size of the new additions to the cache.
In the code here: transformers/src/transformers/cache_utils.py, lines 703 to 706 in b72752f.
We're in the "Shifting cache" phase, i.e. the cache already exists, and now we're adding enough tokens to make it overflow. However, if the cache already exists, then I think (I'm not 100% on this) we always add 1 new generated token, i.e. `key_states.shape[-2]` is 1. So I think a non-negative value can only happen if `num_sink_tokens >= window_length - 1`, which is not normal behaviour.
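A minimal sketch of that arithmetic in plain Python (the numbers are illustrative, not taken from the actual SinkCache implementation):

```python
window_length = 1024   # max size of the cache
num_sink_tokens = 4    # sink tokens kept at the front
new_len = 1            # key_states.shape[-2]: one token per decoding step

# start index of the slice that selects the keys to keep
start = -window_length + num_sink_tokens + new_len
assert start < 0       # -1019: negative whenever new_len < window_length - num_sink_tokens

# on a full cache of window_length entries, [start:] keeps the last
# window_length - num_sink_tokens - new_len entries, so the rebuilt cache
# (sinks + kept + new) is exactly window_length again
kept = list(range(window_length))[start:]
print(num_sink_tokens + len(kept) + new_len)  # 1024
```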
However, if it's somehow possible to add a bunch of tokens in one go while the cache already exists, then I think this could break. In that case `keys_to_keep` should really be empty (we're skipping way ahead and keeping no old tokens), but the slice start `-self.window_length + self.num_sink_tokens + key_states.shape[-2]` crossing into the positives allows some keys to stay. The new tokens then get appended and we accidentally get a cache that's too large here:
transformers/src/transformers/cache_utils.py, line 724 in b72752f.
But I think that should probably cause a pretty easy-to-spot crash as the cache is now bigger than the window size, which should not be possible.
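That scenario can be reproduced with plain slicing (a hypothetical repro with tiny sizes, modelling only the sequence dimension as a list, not the actual SinkCache code):

```python
window_length, num_sink_tokens = 8, 2
cache = list(range(window_length))        # full cache along the sequence dim
new_keys = list(range(100, 110))          # adding 10 tokens in one go

start = -window_length + num_sink_tokens + len(new_keys)  # = 4: non-negative!
sink = cache[:num_sink_tokens]
keys_to_keep = cache[start:]              # should be empty, but keeps 4 entries
new_cache = sink + keys_to_keep + new_keys
print(len(new_cache))                     # 16 > window_length: the cache overflows
```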
- Tom Aarsen
Have not worked on the sink cache so will let @gante answer here!
In cache_utils.py, I noticed that

```python
keys_to_keep = self.key_cache[layer_idx][
    :, :, -self.window_length + self.num_sink_tokens + key_states.shape[-2] :
]
```

might go wrong when `-self.window_length + self.num_sink_tokens + key_states.shape[-2] >= 0`. Not sure if it's relevant.