================================
- Model Compression, Cristian Buciluă, Rich Caruana, Alexandru Niculescu-Mizil, 2006
- Dark knowledge, Geoffrey Hinton, Oriol Vinyals & Jeff Dean, 2014
- Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015
- Cross Modal Distillation for Supervision Transfer, Saurabh Gupta, Judy Hoffman, Jitendra Malik, 2015
- Do deep convolutional nets really need to be deep and convolutional?, Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, 2016
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, Sergey Zagoruyko, Nikos Komodakis, 2016
- FitNets: Hints for Thin Deep Nets, Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2015
- Deep Model Compression: Distilling Knowledge from Noisy Teachers, Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016
- Sequence-Level Knowledge Distillation, Yoon Kim, Alexander M. Rush, 2016 (notes: deeplearning-papernotes)
- Data-Free Knowledge Distillation For Deep Neural Networks, Raphael Gontijo Lopes, Stefano Fenu, 2017
- Dark knowledge (talk), Geoffrey Hinton, 2014
- Model Compression (talk), Rich Caruana, 2016
- Distilling knowledge to specialist ConvNets for clustered classification
- Sequence-Level Knowledge Distillation, Neural Machine Translation on Android
- cifar.torch distillation
- FitNets: Hints for Thin Deep Nets (implementation)
- Transfer knowledge from a large DNN or an ensemble of DNNs into a small DNN (a minimal loss sketch follows this list)
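
A minimal sketch of the soft-target distillation loss described in "Distilling the Knowledge in a Neural Network" (Hinton, Vinyals & Dean, 2015), which underlies most of the entries above. It assumes PyTorch; the function name `distillation_loss` and the default values of the temperature `T` and mixing weight `alpha` are illustrative choices, not taken from any of the listed papers or repositories.

```python
# Illustrative sketch of soft-target knowledge distillation (Hinton et al., 2015).
# Names and hyperparameter defaults are assumptions, not from the listed repos.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend hard-label cross-entropy with a KL term on temperature-softened logits."""
    # Soften both distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between softened distributions; the T**2 factor keeps its
    # gradient magnitude comparable to the hard-label term, as in the paper.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Example usage with random tensors standing in for real teacher/student outputs.
if __name__ == "__main__":
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())
```

In practice the teacher's logits are computed with `torch.no_grad()` and only the student's parameters are updated; a higher `alpha` leans more on the teacher's soft targets, while `alpha = 0` recovers ordinary supervised training.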