Deep learning for dummies, by Quentin Anthony et al.
All the practical details that go into working with real models!
- Utilities for training/inference calculations (e.g. FLOPs, memory overhead, and parameter count); the core formulas are sketched just below.
- Benchmarks (e.g. communication).
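These calculations boil down to a few compact formulas. As a hedged illustration (this is not the repository's actual code; the function names and model sizes are made up for the example), the sketch below applies two standard approximations: parameter count ≈ 12 · n_layers · d_model² for a decoder-only transformer, and training FLOPs ≈ 6ND for N parameters and D tokens.

```python
# Illustrative back-of-the-envelope calculations (assumptions, not the
# repository's actual scripts).

def approx_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Approximate decoder-only transformer parameter count.

    Each block contributes ~4*d^2 (attention) + ~8*d^2 (MLP) parameters;
    biases and layer norms are ignored.
    """
    embeddings = vocab_size * d_model
    blocks = 12 * n_layers * d_model**2
    return embeddings + blocks

def approx_train_flops(n_params: int, n_tokens: int) -> float:
    """Standard 6ND estimate: ~2ND for the forward pass, ~4ND for backward."""
    return 6.0 * n_params * n_tokens

n = approx_param_count(n_layers=32, d_model=4096, vocab_size=50304)
print(f"~{n / 1e9:.1f}B parameters")                                # ~6.6B
print(f"~{approx_train_flops(n, 300e9):.2e} training FLOPs on 300B tokens")
```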
- Transformers Math 101. A blog post from EleutherAI on training/inference memory estimation, parallelism, FLOP calculations, and deep learning datatypes (a memory sketch following its accounting appears after this list).
- LLM Visualizations. Clear visualizations and animations covering the basics of transformers.
- Transformer Inference Arithmetic. A breakdown of the memory overhead, FLOPs, and latency of transformer inference.
- ML-Engineering Repository. Community notes and practical details on every aspect of deep learning training, led by Stas Bekman.
- LLM Finetuning Memory Requirements by Alex Birch. A practical guide to the memory overhead of finetuning models.
- Annotated PyTorch Paper Implementations. Side-by-side annotated PyTorch implementations of deep learning papers.
- Everything about Distributed Training and Efficient Finetuning by Sumanth R Hegde. High-level descriptions and curated links on parallelism and efficient finetuning.
- Transformer Training and Inference VRAM Estimator by Alexander Smirnov. A user-friendly tool to estimate VRAM overhead.
- Cerebras Model Lab. A user-friendly tool to apply Chinchilla scaling laws.
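As a worked example of the accounting covered in Transformers Math 101, here is a minimal sketch of training-state memory for mixed-precision Adam: fp16 weights and gradients plus an fp32 weight copy, momentum, and variance, roughly 16 bytes per parameter. The byte counts and helper name are illustrative assumptions, and activation memory is deliberately omitted since it depends on batch size, sequence length, and checkpointing strategy.

```python
# Illustrative per-parameter training memory for mixed-precision Adam,
# in the style of Transformers Math 101 (assumed fp16/fp32 setup).
# Activation memory is omitted: it depends on batch size, sequence
# length, and activation checkpointing.

BYTES_FP16 = 2
BYTES_FP32 = 4

def train_memory_gib(n_params: float) -> float:
    """Approximate training-state memory (GiB) for mixed-precision Adam."""
    weights = BYTES_FP16 * n_params        # fp16 model weights
    grads = BYTES_FP16 * n_params          # fp16 gradients
    optimizer = 3 * BYTES_FP32 * n_params  # fp32 copy + momentum + variance
    return (weights + grads + optimizer) / 2**30

print(f"7B model: ~{train_memory_gib(7e9):.0f} GiB before activations")  # ~104 GiB
```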
Minimal repositories for educational purposes:
- GPT Inference
- GPT Training
- Architecture-Specific Examples (a minimal attention sketch in the same educational spirit follows below):
  - https://github.com/zphang/minimal-gpt-neox-20b
  - https://github.com/zphang/minimal-llama
  - https://github.com/zphang/minimal-opt
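In the same spirit as these minimal repositories, the toy below is a self-contained causal self-attention block in PyTorch. It is an illustrative sketch, not taken from any of the repos above, and the dimensions are chosen arbitrarily.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """A minimal causal self-attention block (toy example, not from the repos above)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)     # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, tokens, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        # Scaled dot-product attention with a causal mask
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(b, t, d)      # merge heads back
        return self.proj(y)

x = torch.randn(1, 8, 64)                           # (batch, tokens, d_model)
print(CausalSelfAttention(d_model=64, n_heads=4)(x).shape)  # torch.Size([1, 8, 64])
```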
If you find a bug or typo, or would like to propose an improvement, please don't hesitate to open an Issue or contribute a PR.