GPT-like architecture with ~10M parameters, trained on ~300k tokens
This repo implements a GPT-style architecture (i.e. a decoder-only transformer) and pretrains it on an input.txt file. The resulting model has roughly 10M parameters and is trained on about 300k tokens. The goal of this repo is a character-level text generator.
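A character-level generator typically starts by mapping each unique character in the corpus to an integer id. As a minimal sketch (the names `stoi`, `itos`, `encode`, and `decode` are illustrative, not this repo's actual API):

```python
# Stand-in for the contents of input.txt; the real corpus is read from disk.
text = "hello world"

# Build the vocabulary from the unique characters in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s: str) -> list[int]:
    """Convert a string into a list of integer ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Convert a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)

print(decode(encode("hello")))  # round-trips back to "hello"
```

The encoded id sequence is what the transformer consumes during pretraining; decoding turns sampled ids back into text at generation time.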
To train, run: `python train.py`