Document understanding of LLMs
- Implement the transformer paper (ref: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=2215s)
Tasks
- Implement the bigram model (first sketch after this list)
- Implement the self-attention logic in an ad hoc fashion (second sketch below)
- Implement the self-attention logic as part of the bigram model
- Implement a single multi-head attention block as part of the bigram model (third sketch below)
- Implement multiple layers of the multi-head attention block as part of the bigram model
- Add a feed-forward NN (fourth sketch below)
- Add residual connections
- Add layer norm (residuals and layer norm are shown together in the block sketch below)
- Compare this implementation with Karpathy's implementation
- List the differences and next steps to think about here (these fixes are folded into the sketches below):
  - dropout layers need to be included
  - a linear projection layer is missing in the MultiHeadAttention class
  - the output width in the MultiHeadAttention class scales up to num_heads x head_size after concatenation (4 x head_size with four heads)
  - a projection layer needs to be added to bring 4 x head_size back down to the embedding size
  - add dropout at the end of the MultiHeadAttention layer
  - feed-forward in the block with two linear layers, emb -> 4*emb -> emb (scale up and bring back)
  - dropout inside this feed-forward network
  - use MultiHeadAttention in the Block instead of an nn.ModuleList (easy refactor)
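
A minimal sketch of the bigram starting point, following the video; `vocab_size` is assumed to come from whatever dataset is used:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Predicts the next token from the current token alone via a lookup table."""
    def __init__(self, vocab_size):
        super().__init__()
        # each token id maps directly to a row of next-token logits
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss
```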
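
The ad hoc self-attention logic, sketched outside any model: a lower-triangular mask plus softmax gives each position a causal weighted average over the past. Tensor sizes here are toy assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 4, 8, 32  # batch, time, channels (toy sizes)
x = torch.randn(B, T, C)

head_size = 16
key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q = key(x), query(x)                           # (B, T, head_size) each
wei = q @ k.transpose(-2, -1) * head_size**-0.5   # scaled affinities, (B, T, T)
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))   # causal mask: no peeking ahead
wei = F.softmax(wei, dim=-1)                      # each row sums to 1
out = wei @ value(x)                              # (B, T, head_size)
```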
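
A sketch of a single attention head plus the multi-head wrapper, with the fixes from the differences list already applied (projection layer, dropout on the attention weights, dropout at the end). Hyperparameter names follow the video's conventions and are assumptions here:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal self-attention."""
    def __init__(self, n_embd, head_size, block_size, dropout=0.1):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k, q = self.key(x), self.query(x)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = self.dropout(F.softmax(wei, dim=-1))
        return wei @ self.value(x)

class MultiHeadAttention(nn.Module):
    """Heads in parallel; concatenation widens to num_heads * head_size,
    so a projection brings the result back to n_embd."""
    def __init__(self, n_embd, num_heads, block_size, dropout=0.1):
        super().__init__()
        head_size = n_embd // num_heads
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size, dropout) for _ in range(num_heads)])
        self.proj = nn.Linear(n_embd, n_embd)  # the missing projection layer
        self.dropout = nn.Dropout(dropout)     # dropout at the end of the layer

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)  # (B, T, n_embd)
        return self.dropout(self.proj(out))
```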
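
The feed-forward network with the emb -> 4*emb -> emb shape and the dropout inside it, as noted above; a sketch assuming ReLU as in the video:

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise MLP: scale up to 4*n_embd, nonlinearity, bring back, dropout."""
    def __init__(self, n_embd, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # scale up
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),  # bring back
            nn.Dropout(dropout),            # dropout inside the feed-forward
        )

    def forward(self, x):
        return self.net(x)
```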
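
Finally, a block that combines the pieces, reusing the MultiHeadAttention and FeedForward sketches above: residual connections around both sub-layers, with layer norm applied before each (the pre-norm variant used in the video). Stacking several of these covers the multiple-layers step:

```python
import torch.nn as nn

class Block(nn.Module):
    """Communication (attention) then computation (MLP), each wrapped in a
    residual connection with a pre-layer-norm."""
    def __init__(self, n_embd, num_heads, block_size, dropout=0.1):
        super().__init__()
        self.sa = MultiHeadAttention(n_embd, num_heads, block_size, dropout)
        self.ffwd = FeedForward(n_embd, dropout)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))    # residual around attention
        x = x + self.ffwd(self.ln2(x))  # residual around feed-forward
        return x

# multiple layers of the block, e.g. inside the language model:
# self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
```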
Read through the finished materials of the Coursera GenAI course.
Get an overview of the current status.