Fast.ai Deep Learning from the Foundations (Spring 2019)
Part II of Fast.ai's two-part deep learning course, offered through The Data Institute at USF from March through the end of April 2019. Part I is here.
A bottom-up approach (through code, not math equations) to becoming an expert deep learning practitioner and experimenter.
We implemented core fastai and PyTorch classes and modules from scratch, matching or exceeding the performance of the originals. We also practiced coding up techniques introduced in various papers, then spent significant time on strategies for reducing model training time (parallelization, JIT).
The final two weeks were spent diving deep into Swift for TensorFlow with Chris Lattner, where we saw first-hand how differentiable programming could work, and experienced the joy of coding deep learning models in a language that actually gets sent directly to the compiler.
All in all, I came away with both the know-how to engineer cutting-edge deep learning ideas from scratch with optimized code and the expertise necessary to research and explore new ideas of my own.
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- Understanding the difficulty of training deep feedforward neural networks
- Fixup Initialization: Residual Learning Without Normalization
- All you need is a good init
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
- Self-Normalizing Neural Networks
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Layer Normalization
- Instance Normalization: The Missing Ingredient for Fast Stylization
- Group Normalization
- Revisiting Small Batch Training for Deep Neural Networks
- Layer-Sequential Unit-Variance (LSUV) Weight Initialization
- Building fastai's DataBlock API from Scratch
- Improving PyTorch's Optimizers
- Image Augmentation and PyTorch JIT
- NVIDIA's DALI Batch Image Augmentation Library
- All you need is a good init
- Decoupled Weight Decay Regularization
- L2 Regularization versus Batch and Weight Normalization
- Norm matters: efficient and accurate normalization schemes in deep networks
- Three Mechanisms of Weight Decay Regularization
- Adam: A Method for Stochastic Optimization
- Reducing BERT Pre-Training Time from 3 Days to 76 Minutes (LAMB optimizer paper)
- Going Deeper with Convolutions
- Mixup and Label Smoothing
- FP16 Training
- A Flexible & Concise XResNet Implementation
- Transfer Learning from Scratch
- A Survey of Language Model Techniques
- mixup: Beyond Empirical Risk Minimization
- Rethinking the Inception Architecture for Computer Vision (label smoothing is in Section 7)
- Bag of Tricks for Image Classification with Convolutional Neural Networks (XResNets)
- Regularizing and Optimizing LSTM Language Models (AWD-LSTM)
- Universal Language Model Fine-tuning for Text Classification (ULMFiT)
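
To give a flavor of what implementing a paper technique from scratch can look like, here is a minimal sketch of label smoothing as described in the Inception paper listed above. It is not the course's code: the function names and the `eps` default are assumptions chosen for illustration, and it uses plain Python rather than PyTorch so it runs without any dependencies.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def label_smoothing_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution.

    The smoothed target mixes the one-hot label with a uniform
    distribution over the K classes:
        q(k) = (1 - eps) * one_hot(k) + eps / K
    `eps` is a hypothetical default, not a value taken from the course.
    """
    log_probs = log_softmax(logits)
    k = len(logits)
    nll = -log_probs[target]          # standard negative log-likelihood
    uniform = -sum(log_probs) / k     # cross-entropy against the uniform part
    return (1 - eps) * nll + eps * uniform

# An overconfident prediction is penalized more once smoothing is on.
confident = [10.0, 0.0, 0.0]
plain = label_smoothing_cross_entropy(confident, target=0, eps=0.0)
smoothed = label_smoothing_cross_entropy(confident, target=0, eps=0.1)
```

With `eps=0.0` the function reduces to ordinary cross-entropy, which makes it easy to check against a reference implementation before turning smoothing on.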