This repository contains our submission, titled "Distilled MixUp Squeeze Residual Network", to the NYU ECE-GY 7123 Deep Learning S24 Kaggle competition, authored by Shubham Singh, Inder Khatri, and Xu Zhou.
We propose a ResNet-based model for CIFAR-10 that combines Squeeze-and-Excitation (SE) blocks in the architecture with MixUp data augmentation and knowledge distillation during training.
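An SE block recalibrates channel responses via a squeeze (global pooling) and excitation (gating) step. Below is a minimal PyTorch sketch of such a block; the reduction ratio of 16 is the common default from the SE paper, not a value confirmed by this repository:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block (generic sketch; reduction=16 is an
    assumption, not necessarily the value used in this repo)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.fc = nn.Sequential(             # excitation: channel-wise gating
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)          # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)      # per-channel weights in [0, 1]
        return x * w                         # rescale the feature maps
```

In a residual network, the block is typically applied to the output of each residual branch before the skip connection is added back.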
- GPU: NVIDIA P100 (~2 hours of training)
- Batch Size: 128
- Epochs: 200
- Dropout: 0.1
- Validation Accuracy: 96.3%
- Testing Accuracy: 86.9%
- Optimizer: Stochastic Gradient Descent
- Learning Rate: 0.1
- Teacher model: ResNet50
- Student model: Custom ResNet with 4.6 million parameters, trained with MixUp and a dropout rate of 0.1 (a MixUp sketch follows this list).
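MixUp trains on convex combinations of pairs of examples and of their labels. A minimal sketch, assuming a Beta(α, α) mixing coefficient as in Zhang et al.; the α value used in our notebooks is not stated here, so 1.0 is an assumption:

```python
import numpy as np
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0):
    """Mix a batch with a shuffled copy of itself.

    alpha is the Beta-distribution parameter (1.0 is an illustrative
    default, not a value confirmed by this repository).
    """
    lam = np.random.beta(alpha, alpha)                 # mixing coefficient
    idx = torch.randperm(x.size(0), device=x.device)   # random pairing
    mixed_x = lam * x + (1.0 - lam) * x[idx]           # blend the inputs
    return mixed_x, y, y[idx], lam

# The loss uses the same convex combination of the two targets:
# loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
```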
The `weights` folder contains two weight files (a hypothetical loading snippet follows the list):
- Weights of the trained ResNet50 model.
- Weights of our final model.
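A hypothetical loading snippet; the actual file names in the `weights` folder and the student model class are defined in the notebooks, so the paths and constructor below are assumptions:

```python
import torch
from torchvision.models import resnet50

# Hypothetical path; check the weights folder for the actual file name.
# NOTE: if the trained teacher modifies the stem for 32x32 CIFAR-10
# inputs, instantiate that variant instead of the stock torchvision model.
teacher = resnet50(num_classes=10)
teacher.load_state_dict(
    torch.load("weights/resnet50.pth", map_location="cpu")
)
teacher.eval()  # the teacher is frozen during distillation
```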
- Run the `ResNet50.ipynb` notebook to train the ResNet50 teacher model.
- Use the ResNet50 weights in the distillation notebook to train the smaller student model (a sketch of the distillation loss follows).
- Shubham Singh
- Inder Khatri
- Xu Zhou