
Comments (10)

lucidrains commented on May 22, 2024

@bratao if you are curious what my recommendation is, it would be https://github.com/lucidrains/routing-transformer or https://github.com/lucidrains/compressive-transformer-pytorch for now


bratao commented on May 22, 2024

Wow, I loved it. Thanks.

Sorry for the boldness, but I have to take the opportunity to ask an expert like you, if you don't mind. @lucidrains, if you are busy, feel free to ignore my question.

I'm looking into using transformers for sequence labeling on very large, almost unbounded documents. Each document is composed of many smaller documents that carry some kind of structured information, and I model the extraction as sequence labeling. You can see an example below.
[example image omitted]

A simple first-order CRF gets very good performance, around 0.98 F1, but misses some examples that require longer context.
I tried LSTM, GRU, and SRU with a CRF on top and got good but lower performance; SRU reached 0.96 on my dataset.
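For concreteness, the setup described here is roughly a BiLSTM-CRF tagger. A minimal sketch, assuming the `pytorch-crf` package; the tag count and dimensions are illustrative:

```python
# Minimal BiLSTM-CRF sketch for sequence labeling, roughly the setup
# described above. Dimensions and the pytorch-crf dependency
# (pip install pytorch-crf) are assumptions for illustration.
import torch
import torch.nn as nn
from torchcrf import CRF  # first-order linear-chain CRF layer

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, bidirectional=True,
                            batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def predict(self, tokens, mask):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions, mask=mask)  # Viterbi-decoded tag paths
```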

Then I moved to transformers. I set up a search that trains more than 12 transformer variants, including all of your transformers, TENER, Transformer-XL, R-Transformer, the adaptive-attention-span transformer, and linear transformers (from https://github.com/idiap/fast-transformers). But the best result so far is the linear transformer, with an F1 of only 0.33. I train with a moving window of 8192 tokens, as sketched below.
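A sketch of that moving-window inference: only the 8192 figure is from the comment; the stride, the `model` callable, and the merging rule (keep each token's label from the window where it is most central, so it sees context on both sides) are illustrative assumptions.

```python
def label_long_sequence(model, tokens, window=8192, stride=4096):
    """Run `model` (hypothetical: returns one label per input token) over
    overlapping windows; for overlapped tokens, keep the prediction from
    the window in which the token sits closest to the center."""
    n = len(tokens)
    preds = [None] * n
    best_dist = [window] * n  # distance of token from its window's center
    start = 0
    while start < n:
        chunk = tokens[start:start + window]
        chunk_preds = model(chunk)
        center = start + len(chunk) / 2
        for i, p in enumerate(chunk_preds):
            d = abs(start + i - center)
            if d < best_dist[start + i]:
                best_dist[start + i] = d
                preds[start + i] = p
        if start + window >= n:
            break
        start += stride
    return preds
```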

Do you have any recommendation or a pointer to where to look for this problem?

Thank you so much, and sorry to bother you.


bratao commented on May 22, 2024

Thank you so much @lucidrains ❤️
I will look closer at compressive and routing. I will also let this hyperparam search run for a little while to see if there is any hope with transformers.

But following your advice, I will focus on RNN alternatives such as IndLSTM and QRNN.


lucidrains commented on May 22, 2024

@bratao they are not mutually exclusive! Try a bidirectional LSTM in the earlier layers, followed by attention near the top. I guarantee you'll see gains :)
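A minimal sketch of that hybrid, with BiLSTM layers at the bottom for local order and self-attention on top for long-range links; the layer counts, dimensions, and the use of `nn.TransformerEncoder` as the attention stack are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LSTMThenAttention(nn.Module):
    def __init__(self, vocab_size, num_tags, dim=256, lstm_layers=2, attn_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # bidirectional LSTM in the lower layers (hidden size halved so the
        # concatenated directions come back out at `dim`)
        self.lstm = nn.LSTM(dim, dim // 2, num_layers=lstm_layers,
                            bidirectional=True, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=attn_layers)
        self.to_tags = nn.Linear(dim, num_tags)

    def forward(self, tokens):
        x, _ = self.lstm(self.embed(tokens))  # (batch, seq, dim)
        x = self.attn(x)                      # global self-attention near the top
        return self.to_tags(x)                # per-token tag logits

# usage sketch
model = LSTMThenAttention(vocab_size=30000, num_tags=12)
logits = model(torch.randint(0, 30000, (2, 512)))  # -> (2, 512, 12)
```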


lucidrains commented on May 22, 2024

@bratao you will like this! https://openreview.net/forum?id=qVyeW-grC2k

Performer isn't quite ready yet. Linear attention variants come with drawbacks that are often swept under the rug (as most papers do). There is a memory cost in the autoregressive case; the authors use Jax to solve this issue, but for PyTorch I may need to reach for CUDA code. Also, the quadratic cost still exists, it is just shifted to the head embedding dimension. I would avoid using this repository until that is solved!
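To make the autoregressive memory cost concrete, here is a naive causal linear attention sketch (in the style of Katharopoulos et al., not this repository's implementation); all names and shapes are illustrative:

```python
# With a positive feature map phi, causal linear attention reduces to running
# prefix sums S_i = sum_{j<=i} phi(k_j) v_j^T. Materializing every prefix, as
# below, costs O(n * d * d) memory per head -- exactly what custom Jax/CUDA
# kernels avoid by computing the scan in-place.
import torch
import torch.nn.functional as F

def causal_linear_attention_naive(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq, dim); phi = elu + 1 is a common choice
    q, k = map(lambda t: F.elu(t) + 1, (q, k))
    context = torch.einsum('bhnd,bhne->bhnde', k, v).cumsum(dim=2)  # O(n*d*d) memory
    norm = k.cumsum(dim=2)                                          # running sum of phi(k)
    out = torch.einsum('bhnd,bhnde->bhne', q, context)
    denom = torch.einsum('bhnd,bhnd->bhn', q, norm).unsqueeze(-1) + eps
    return out / denom
```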


lucidrains commented on May 22, 2024

@bratao how big is your dataset? how many tokens is your average document?


bratao commented on May 22, 2024

@lucidrains My dataset has approximately 150 documents, with 12k tokens on average. I'm training with a batch size of 3 (that's what fits on the T4 GPU), with the optimizer as a search hyperparameter that picks between Ranger, SGD, and Adam.


lucidrains commented on May 22, 2024

@bratao ahhh, I see, you are in the low-data regime. I think it is best to pull a pre-trained model off the shelf from Huggingface; training a sparse attention network from scratch on such a small amount of data won't be enough. Attention is all we need, given enough data and compute :)
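A minimal sketch of that off-the-shelf route, fine-tuning a pretrained model for token classification; the Longformer checkpoint (picked here for its long context) and the label count are assumptions, not a recommendation from the thread:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "allenai/longformer-base-4096"  # 4096-token context out of the box
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=12)

text = "..."  # one (chunk of a) long document
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
logits = model(**inputs).logits    # (1, seq_len, num_labels)
pred_tags = logits.argmax(dim=-1)  # per-token label ids, ready for fine-tuning/eval
```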


lucidrains commented on May 22, 2024

@bratao if you are interested in pre-training a sparse attention network yourself, you can use https://github.com/lucidrains/electra-pytorch, which is the most effective way to do so as of today.
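For context, a toy sketch of the replaced-token-detection objective that ELECTRA-style pre-training uses; this illustrates the idea only and is not electra-pytorch's API. The `generator`/`discriminator` callables, mask id, and loss weighting follow the ELECTRA paper's recipe:

```python
import torch
import torch.nn.functional as F

def electra_step(generator, discriminator, tokens, mask_prob=0.15, mask_id=103):
    # 1) mask a random subset of positions (special/pad tokens ignored here
    #    for brevity; a real setup would exclude them)
    masked = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    gen_in = tokens.masked_fill(masked, mask_id)

    # 2) a small generator (an MLM) fills in the masked positions
    gen_logits = generator(gen_in)  # hypothetical: (batch, seq, vocab) logits
    mlm_loss = F.cross_entropy(gen_logits[masked], tokens[masked])
    filled = gen_logits.argmax(-1)  # the paper samples; argmax keeps this simple
    corrupted = torch.where(masked, filled, tokens)

    # 3) the discriminator labels every token as original vs. replaced
    is_replaced = (corrupted != tokens).float()
    disc_logits = discriminator(corrupted).squeeze(-1)  # hypothetical: (batch, seq)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    return mlm_loss + 50.0 * disc_loss  # ELECTRA's loss weighting
```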


bratao commented on May 22, 2024

Nice, I will try it and report back as soon as I get the results!

