how good or bad are the predicted probabilities?
low predicted probability for the true class --> high penalty
-log(1.0) = 0
-log(0.8) = 0.22314
-log(0.6) = 0.51082
y = -log(x)
![image](https://user-images.githubusercontent.com/67103130/162584555-43d7ea8a-7dc9-4c88-9d46-6a8862616c0e.png)
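to see the penalty curve numerically, here is a minimal Python sketch reproducing the values above with the natural log:

```python
import math

# Penalty -log(p) for the probability assigned to the true class:
# confident correct predictions cost ~0, low probabilities cost a lot.
for p in [1.0, 0.8, 0.6, 0.1]:
    penalty = 0.0 - math.log(p)   # equals -log(p); avoids printing -0.0
    print(f"-log({p}) = {penalty:.5f}")
```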
binary cross-entropy (only 2 classes)
![image](https://user-images.githubusercontent.com/67103130/162584753-ea275c32-97a6-4400-a89c-f2a9305be996.png)
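a minimal sketch of how binary cross-entropy could be computed, assuming `y` holds the true labels (0 or 1) and `p` the predicted probabilities of class 1 (the example values here are made up):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Average BCE: -mean(y*log(p) + (1-y)*log(1-p)), natural log."""
    p = np.clip(p, eps, 1 - eps)          # keep log() away from 0
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])                # hypothetical true labels
p = np.array([0.9, 0.2, 0.6, 0.8])        # hypothetical predicted P(class=1)
print(binary_cross_entropy(y, p))         # ~0.2656
```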
entropy (expected negative log-likelihood)
is a measure of the uncertainty associated with a given distribution
if every ball in a box is green,
you have zero uncertainty about which color you will draw: it is always green (0 entropy)
what if half of the balls are red and the other half blue?
![image](https://user-images.githubusercontent.com/67103130/162584914-5e0a63b8-dbe7-4a65-af93-f03c9113adb2.png)
![image](https://user-images.githubusercontent.com/67103130/162585053-c49d2f8d-348d-48c0-b7d8-0fca35b962ca.png)
if the red:blue ratio is 20:80
H(q) = -(0.2 log(0.2) + 0.8 log(0.8)) ≈ 0.5 (natural log)
the higher the entropy, the harder the outcome is to predict
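a small sketch to check all three box examples (natural log, so the 50:50 case gives ln(2) ≈ 0.693, the maximum for two classes):

```python
import numpy as np

def entropy(q):
    """H(q) = -sum(q * log(q)), natural log; 0 * log(0) is treated as 0."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log(q))

print(entropy([1.0]))       # every ball green: 0.0
print(entropy([0.5, 0.5]))  # 50:50 box: ln(2) ~0.693, the 2-class maximum
print(entropy([0.2, 0.8]))  # 20:80 box: ~0.500
```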
cross-entropy
cross-entropy between two distributions, the true q(y) and the predicted p(y):
![image](https://user-images.githubusercontent.com/67103130/162585282-bb5f50bf-f9eb-4fa7-bb4c-d101ec0ebd50.png)
If we, somewhat miraculously, match p(y) to q(y) perfectly, the computed values for cross-entropy and entropy will match as well.
Since that is unlikely to ever happen, cross-entropy will have a BIGGER value than the entropy computed on the true distribution.
e.g.
true probabilities (red, green, blue) = 0.8, 0.1, 0.1
predicted probabilities = 0.2, 0.2, 0.6
![image](https://user-images.githubusercontent.com/67103130/162585465-a536371a-0218-4125-a0b4-4e1c30e2b72d.png)
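plugging the example numbers into a small sketch (natural log; `q` is the true distribution, `p` the predicted one):

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    """H(q, p) = -sum(q * log(p)), natural log."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(q) * np.log(p))

q = [0.8, 0.1, 0.1]  # true probabilities: red, green, blue
p = [0.2, 0.2, 0.6]  # predicted probabilities
print(cross_entropy(q, p))  # ~1.50, well above H(q) ~0.64
print(cross_entropy(q, q))  # perfect match: equals the entropy H(q) ~0.64
```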
Kullback-Leibler Divergence (KL Divergence)
a measure of the dissimilarity between two distributions
it is the difference between cross-entropy and entropy: KL(q||p) = H(q, p) - H(q)
![image](https://user-images.githubusercontent.com/67103130/162585540-bdb7e930-3b3f-4180-8b90-ea328af3c856.png)
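a sketch confirming that the KL divergence is exactly this gap, reusing the same example:

```python
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) = sum(q * log(q / p)) = H(q, p) - H(q), natural log."""
    q = np.asarray(q, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    mask = q > 0                          # skip 0 * log(0 / p) terms
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

q = [0.8, 0.1, 0.1]  # true distribution
p = [0.2, 0.2, 0.6]  # predicted distribution
print(kl_divergence(q, p))  # ~0.86 = 1.50 - 0.64
print(kl_divergence(q, q))  # identical distributions: 0.0
```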