I might have found a slight mistake in the visualization of the transformer architectu

GPT-2 architecture about llms-from-scratch HOT 3 CLOSED

d-kleine commented on August 16, 2024

GPT-2 architecture

from llms-from-scratch.

Comments (3)

rasbt commented on August 16, 2024

Thanks for the note, but I think my figure is correct. The best source for this would be the original GPT-2 model implementation, which you can find here:

https://github.com/openai/gpt-2/blob/9b63575ef42771a015060c964af2c3da4cf7c8ab/src/model.py#L123-L130

def block(x, scope, *, past, hparams):
    with tf.variable_scope(scope):
        nx = x.shape[-1].value
        a, present = attn(norm(x, 'ln_1'), 'attn', nx, past=past, hparams=hparams)
        x = x + a
        m = mlp(norm(x, 'ln_2'), 'mlp', nx*4, hparams=hparams)
        x = x + m
        return x, present

Unless I am reading this incorrectly, the shortcut path starts before, not after, LayerNorm.
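
To make the placement of the shortcut concrete, here is a minimal pure-Python sketch that contrasts the two orderings. It is not taken from the GPT-2 repository: `layer_norm`, `pre_ln_step`, `post_ln_step`, and `zero_sublayer` are hypothetical names used only for illustration, and the normalization omits the learned scale and shift. The stand-in sublayer returns zeros so that only the effect of the residual path is visible.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance.

    Simplified on purpose: no learned gain or bias.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln_step(x, sublayer):
    # Pre-LN, the ordering used in the block() function quoted above:
    # the shortcut carries the raw x, and only the branch input is normalized.
    branch = sublayer(layer_norm(x))
    return [xi + bi for xi, bi in zip(x, branch)]


def post_ln_step(x, sublayer):
    # Post-LN, shown for contrast: the sum of shortcut and branch
    # is normalized, so the shortcut itself passes through LayerNorm.
    summed = [xi + bi for xi, bi in zip(x, sublayer(x))]
    return layer_norm(summed)


def zero_sublayer(h):
    # Hypothetical stand-in for attention or the MLP: contributes nothing.
    return [0.0 for _ in h]


x = [1.0, 2.0, 3.0, 4.0]
pre_out = pre_ln_step(x, zero_sublayer)
post_out = post_ln_step(x, zero_sublayer)
print("pre-LN :", pre_out)
print("post-LN:", post_out)
```

With a sublayer that contributes nothing, the Pre-LN step returns `x` unchanged because the shortcut bypasses the normalization, whereas the Post-LN step returns a normalized copy of `x`.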


d-kleine commented on August 16, 2024

Yes, I think you're right, including about the official code. In Pre-LN, the shortcut of the residual connection branches off before the layer normalization step, which is nicely illustrated in your figure here and in the official illustration of Pre-LN on the far right side:

Apologies - I have only found a few graphics illustrating the GPT-2 architecture, and it seems like all of them were incorrect about the Pre-LN step (even on Wikipedia, surprisingly).

Thanks again, and the issue can be closed then 🙂


rasbt commented on August 16, 2024

Nice, I am glad that it all makes sense now (and also looks correct haha). Thanks for raising this issue though, it's always good to have multiple sets of eyes on these things!
