Comments (17)

lucidrains commented on June 2, 2024

@pfeatherstone that's from the author of alibi. of course they would say it is great

from x-transformers.

lucidrains commented on June 2, 2024

@pfeatherstone yea i could make it work

however, i think alibi is really bad and probably should be removed. i would never use it for any serious model

lucidrains commented on June 2, 2024

@pfeatherstone maybe i'll just make it all work as a personal challenge. i can also fix dynamic pos bias in the presence of memory as well

pfeatherstone commented on June 2, 2024

Basically i found rotary positional embedding to not length-extrapolate at all. Like, it's really bad. I'm doing some tests with XPOS and though I haven't got a complete model yet (still training), it looks a bit better. However I'm nervous about limiting the context length.

pfeatherstone commented on June 2, 2024

> @pfeatherstone yea i could make it work
>
> however, i think alibi is really bad and probably should be removed. i would never use it for any serious model

is this based on personal tests? According to the paper it's the best thing since sliced bread

lucidrains commented on June 2, 2024

> Basically i found rotary positional embedding to not length-extrapolate at all. Like, it's really bad. I'm doing some tests with XPOS and though I haven't got a complete model yet (still training), it looks a bit better. However I'm nervous about limiting the context length.

that's well known, i think i even mention it in the readme. however, there's a lot of research going into fine-tuning trained rotary models to longer contexts, so it is not a big deal

lucidrains commented on June 2, 2024

i wouldn't use xpos either.. it suffers from the same issues as alibi. i really should start removing features i no longer believe in

lucidrains commented on June 2, 2024

> > @pfeatherstone yea i could make it work
> > however, i think alibi is really bad and probably should be removed. i would never use it for any serious model
>
> is this based on personal tests? According to the paper it's the best thing since sliced bread

which paper?

pfeatherstone commented on June 2, 2024

https://ofir.io/train_short_test_long.pdf, the one you reference in your readme. I have to admit I haven't read it in great detail, but they suggest ALiBi is great.

pfeatherstone commented on June 2, 2024

Basically I need a positional embedding that length-extrapolates well and works with both memories and flash attention. Do you have any suggestions?

lucidrains commented on June 2, 2024

> Basically I need a positional embedding that length-extrapolates well, works with memories, and flash attention. Do you have any suggestions?

these days, i would stick with rotary, given the amount of research now going into it

curriculum learn to longer sequence lengths while tuning the rotary theta value (and whatever new tricks recent papers have discovered)
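
The "rotary theta value" mentioned here is the base of the geometric frequency schedule used by rotary embeddings. A minimal sketch of why tuning it matters for longer contexts, assuming the standard RoPE formulation in which feature pair i rotates at theta^(-2i/dim). The function names are hypothetical and are not part of the x-transformers API, and the theta values are arbitrary illustrations.

```python
import math

def rotary_inv_freq(dim, theta=10000.0):
    """Rotation frequency for each feature pair: theta^(-2i/dim), i in [0, dim/2)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even, rotary embeddings act on feature pairs")
    return [theta ** (-(2 * i) / dim) for i in range(dim // 2)]

def longest_wavelength(dim, theta=10000.0):
    """Positions needed for the slowest-rotating pair to complete one full turn."""
    slowest = rotary_inv_freq(dim, theta)[-1]
    return 2 * math.pi / slowest

# Assumed values for illustration only: a 64-dim head and two choices of theta.
base = longest_wavelength(64, theta=10_000.0)
scaled = longest_wavelength(64, theta=500_000.0)

# A larger theta slows every rotation, so positions that are far apart
# stay within angle ranges closer to those seen at shorter lengths.
print(f"longest wavelength grows by a factor of {scaled / base:.1f}")
```

Raising theta stretches all wavelengths at once, which is one reason it tends to be adjusted together with the sequence length rather than on its own.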

pfeatherstone commented on June 2, 2024

What do you mean by curriculum learn to longer sequence lengths? Sorry if my questions are dumb.

lucidrains commented on June 2, 2024

@pfeatherstone ah, curriculum learning is just a fancy way of saying you make training progressively harder over time, like how you design a curriculum for a student. so start with a small sequence length and slowly increase it to your desired length
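
The idea of starting short and growing the sequence length could be sketched as a simple schedule. This is only one possible shape (a linear ramp rounded down to a block size), and every name and number here is a hypothetical choice rather than anything prescribed in the thread or by x-transformers.

```python
def curriculum_seq_len(step, ramp_steps, start_len, final_len, multiple_of=64):
    """Sequence length to train on at a given step.

    Ramps linearly from start_len to final_len over ramp_steps, rounding
    down to a multiple of `multiple_of`, then holds at final_len.
    """
    if step >= ramp_steps:
        return final_len
    frac = step / ramp_steps
    length = start_len + frac * (final_len - start_len)
    rounded = int(length) // multiple_of * multiple_of
    # Clamp so rounding never drops below the starting length.
    return max(start_len, min(final_len, rounded))

# Usage: crop each training sample to the current curriculum length.
tokens = list(range(4096))  # stand-in for one tokenized training sample
for step in (0, 250, 500, 750, 1000):
    seq_len = curriculum_seq_len(step, ramp_steps=1000, start_len=512, final_len=4096)
    batch = tokens[:seq_len]
    print(step, len(batch))
```

In a real run the crop would be applied inside the data loader, and the rotary theta could be adjusted at the same points where the length increases.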
