transformer architecture explained in detail with code
there have been many neural net architectures over the years that have come and gone, but the one that stuck, and is the backbone of today's ai, is the transformer. every major ai product, including chatgpt, is powered by transformers.
in this project we're going to look under the hood and see exactly how things work: we want to understand the attention mechanism and the transformer architecture built around it.
we start by writing attention from scratch. attention lets a neural network focus on the relevant parts of the input sequence, and the same mechanism works across modalities like text, images, and audio.
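before building the full architecture, the core idea can be sketched in a few lines. this is a minimal scaled dot-product attention in plain numpy, assuming the queries, keys, and values are already projected into `(seq_len, d_k)` matrices; a real transformer would add learned projections, multiple heads, and masking on top of this.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k, v: (seq_len, d_k) arrays (assumed already projected)
    d_k = q.shape[-1]
    # similarity score between every query and every key,
    # scaled by sqrt(d_k) so the softmax doesn't saturate
    scores = q @ k.T / np.sqrt(d_k)
    # each row of weights sums to 1: how much each position attends to the others
    weights = softmax(scores, axis=-1)
    # output = per-position weighted average of the value vectors
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out = attention(q, k, v)
print(out.shape)  # (4, 8)
```

the output has the same shape as the values: each position's vector has been replaced by a mixture of all the value vectors, weighted by how relevant each position is to it.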