Comments (17)
@pfeatherstone that's from the author of alibi. of course they would say it is great
from x-transformers.
@pfeatherstone yea i could make it work
however, i think alibi is really bad and probably should be removed. i would never use it for any serious model
@pfeatherstone maybe i'll just make it all work as a personal challenge. i can also fix dynamic pos bias in the presence of memory as well
Basically i found rotary positional embedding to not length-extrapolate at all. Like, it's really bad. I'm doing some tests with XPOS and though I haven't got a complete model yet (still training), it looks a bit better. However I'm nervous about limiting the context length.
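For context, rotary embedding encodes position by rotating each pair of query/key channels, so attention scores depend only on the relative offset between positions; a minimal numpy sketch of the idea (not the x-transformers implementation):

```python
import numpy as np

def rotary_frequencies(head_dim, theta=10000.0):
    # one frequency per pair of channels, following the standard RoPE formulation
    return 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))

def apply_rotary(x, positions, theta=10000.0):
    # x: (seq_len, head_dim) with head_dim even; positions: (seq_len,)
    freqs = rotary_frequencies(x.shape[-1], theta)   # (head_dim // 2,)
    angles = np.outer(positions, freqs)              # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # split channels into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # rotate each pair by its angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The relative-position property is easy to check: the dot product of a rotated query at position 3 with a rotated key at position 5 equals the same dot product at positions 10 and 12, since only the offset (2) matters. The extrapolation complaint above is about angles at unseen positions, not this property.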
> @pfeatherstone yea i could make it work
> however, i think alibi is really bad and probably should be removed. i would never use it for any serious model

is this based on personal tests? According to the paper it's the best thing since sliced bread
> Basically i found rotary positional embedding to not length-extrapolate at all. Like, it's really bad. I'm doing some tests with XPOS and though I haven't got a complete model yet (still training), it looks a bit better. However I'm nervous about limiting the context length.

that's well known, i think i even mention it in the readme. however, there's a lot of research going into fine tuning trained rotary models to longer context, so it is not a big deal
i wouldn't use xpos either.. it suffers from the same issues as alibi. i really should start removing features i no longer believe in
> @pfeatherstone yea i could make it work
> however, i think alibi is really bad and probably should be removed. i would never use it for any serious model
>
> is this based on personal tests? According to the paper it's the best thing since sliced bread

which paper?
https://ofir.io/train_short_test_long.pdf, the one you reference in your readme. I have to admit I haven't read it in great detail, but they suggest ALiBi is great.
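For anyone following along, the ALiBi scheme from that paper is just a static linear penalty on attention logits, growing with query-key distance, with a fixed per-head slope; a minimal sketch (slopes as the geometric sequence the paper uses for a power-of-two head count):

```python
import numpy as np

def alibi_slopes(num_heads):
    # per the ALiBi paper: for n heads, slopes start at 2^(-8/n) and form a
    # geometric sequence with that same ratio (e.g. 1/2, 1/4, ... 1/256 for n=8)
    ratio = 2.0 ** (-8.0 / num_heads)
    return np.array([ratio ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    # bias[h, i, j] = -slope_h * (i - j): zero on the diagonal, increasingly
    # negative as the key position j falls further behind the query i.
    # added to attention logits before softmax (causal mask handles j > i)
    slopes = alibi_slopes(num_heads)
    distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    return slopes[:, None, None] * distance[None, :, :]
```

Because the bias is the same at train and test time, it extrapolates by construction; the disagreement in this thread is about whether that linear decay hurts quality on long-range dependencies, which the paper does not really address.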
Basically I need a positional embedding that length-extrapolates well, works with memories, and flash attention. Do you have any suggestions?
> Basically I need a positional embedding that length-extrapolates well, works with memories, and flash attention. Do you have any suggestions?
these days, i would stick with rotary, given the amount of research now going into it
curriculum learn to longer sequence lengths while tuning the rotary theta value (and whatever new tricks recent papers have discovered)
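One published form of that theta tuning (an assumption on my part — the comment doesn't name a specific method) is the "NTK-aware" base adjustment: raise the rotary base so the lowest frequencies stretch over the longer context, then fine-tune at the new base:

```python
import numpy as np

def scaled_rotary_frequencies(head_dim, scale, base_theta=10000.0):
    # NTK-aware scaling: theta' = theta * scale^(d / (d - 2)), so the longest
    # wavelength grows by roughly `scale` while the highest frequency
    # (exponent 0) is left untouched
    theta = base_theta * scale ** (head_dim / (head_dim - 2))
    return 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
```

For example, going from a 2k to an 8k context would use `scale=4.0`; the low-frequency channels slow down to cover the new range, which is why a short fine-tune is still needed afterwards.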
What do you mean by curriculum learn to longer sequence lengths? Sorry if my questions are dumb.
@pfeatherstone ah, curriculum learning is just a fancy way of saying making training increasingly harder over time, like how you design a curriculum for a student. so start with a small sequence length and slowly increase to your desired length
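Concretely, such a schedule can be as simple as a linear ramp on the training sequence length; a toy sketch (the lengths and the linear shape here are illustrative, not from this thread):

```python
def seq_len_schedule(step, total_steps, start_len=512, final_len=8192):
    # linearly ramp the training sequence length from start_len up to
    # final_len over the course of training, rounding down to a multiple
    # of start_len so batch shapes stay clean
    frac = min(step / total_steps, 1.0)
    target = start_len + frac * (final_len - start_len)
    return max(start_len, int(target) // start_len * start_len)
```

Each training step would then sample (or crop) sequences of `seq_len_schedule(step, total_steps)` tokens, so early steps are cheap and the model only sees the full context near the end of training.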