
Comments (3)

fpgaminer commented on May 14, 2024

GPT models use masked attention, which ensures that each prediction is based only on prior tokens:

att = att.masked_fill(self.mask[:,:,:T,:T] == 0, float('-inf'))
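To illustrate what that line does, here is a minimal pure-Python sketch (no PyTorch). The helper names `causal_mask` and `masked_softmax` and the toy `scores` matrix are my own for illustration; they are not from minGPT. Scores at future positions are set to `-inf`, so after the softmax they get a weight of exactly zero.

```python
import math

def causal_mask(T):
    # 1 where position i may attend to position j (j <= i), else 0.
    return [[1 if j <= i else 0 for j in range(T)] for i in range(T)]

def masked_softmax(scores):
    # Replace masked-out scores with -inf, then apply softmax row by row.
    T = len(scores)
    mask = causal_mask(T)
    out = []
    for i in range(T):
        row = [scores[i][j] if mask[i][j] else float("-inf") for j in range(T)]
        m = max(row)  # the diagonal is never masked, so m is finite
        exps = [math.exp(v - m) for v in row]  # exp(-inf) is 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Hypothetical raw attention scores for a sequence of 3 tokens.
scores = [[0.5, 2.0, -1.0],
          [1.0, 0.0, 3.0],
          [0.2, 0.4, 0.6]]
weights = masked_softmax(scores)
```

Each row of `weights` still sums to 1, but everything above the diagonal is zero, so position i only mixes in information from positions 0..i.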

Conceptually a GPT model would be evaluated like so:

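loss = 0.0
for i in range(len(x)):
    context = x[:i + 1]  # tokens 0..i: everything position i is allowed to see
    target = y[i]        # the token that follows x[i]
    output = model(context)
    loss += loss_func(output, target)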

For example, its prediction for the 4th output token is based on the first four input tokens. So it's trying to predict 0 based on just 4, 7, 1, 7.

In practice, though, the architecture of GPT-like models allows one to compute that entire for loop in a single pass, using masking to ensure that each "column" of the model can only "see" the current and previous tokens.
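A rough way to convince yourself of that equivalence is the toy sketch below. It assumes a single attention head with identity projections (q = k = v = x), which is a simplification of mine and not how minGPT is configured; `attend` and the sample vectors are hypothetical. The output at position i from one masked pass over the full sequence matches the last output of an unmasked pass over just the prefix `x[:i + 1]`.

```python
import math

def softmax(row):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(x, causal):
    # Toy single-head self-attention with identity projections (q = k = v = x).
    T, D = len(x), len(x[0])
    out = []
    for i in range(T):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) for j in range(T)]
        if causal:
            # Hide future positions, as masked_fill(..., -inf) does.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][d] for j in range(T)) for d in range(D)])
    return out

# Hypothetical 2-dimensional embeddings for a 4-token sequence.
x = [[0.1, 0.3], [0.7, -0.2], [-0.5, 0.4], [0.2, 0.9]]

# One pass over the whole sequence, with the causal mask.
single_pass = attend(x, causal=True)

# The conceptual for loop: re-run on each prefix and keep the last position.
loop = [attend(x[:i + 1], causal=False)[-1] for i in range(len(x))]

max_diff = max(abs(a - b)
               for row_a, row_b in zip(single_pass, loop)
               for a, b in zip(row_a, row_b))
```

Here `max_diff` should be zero up to floating-point rounding. A real GPT stacks several such layers with learned projections, but the same argument applies per layer, since no position ever reads from a later one.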

from mingpt.

ravi-annaswamy commented on May 14, 2024

@fpgaminer Good catch! Thank you.

It was a miss on my part. I knew that GPT models use masked attention (since they are decoder-only models), but since I was processing this addition problem as a seq2seq formulation, the blindness was on me! So this is a nonissue.

Thanks for clarifying so quickly. I would like to keep the issue open for a couple of hours, as an illustration for others who might not have recognized this fact, and then I will close it.

Thanks
Ravi

ravi-annaswamy commented on May 14, 2024

Non-issue, since GPT uses masked attention that always hides future tokens.
