how good or bad are the predicted probabilities?
low probability --> high penalty
-log(1.0) = 0
-log(0.8) = 0.22314
-log(0.6) = 0.51082
y = -log(x)
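a quick sketch of the penalty values above, using Python's stdlib `math.log` (natural log):

```python
import math

# penalty for the probability assigned to the true class:
# high probability -> low penalty, low probability -> high penalty
for p in (1.0, 0.8, 0.6):
    print(f"-log({p}) = {-math.log(p):.5f}")
```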
binary cross entropy (only 2 classes)
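a minimal sketch of binary cross-entropy (the function name and example inputs are illustrative, not from the notes):

```python
import math

def binary_cross_entropy(y_true, p_pred):
    # mean negative log-likelihood for binary labels (0 or 1)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# confident, correct predictions -> low loss
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```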
entropy (log-likelihood)
is a measure of the uncertainty associated with a given distribution
if every ball in a box is green
you are certain you won't draw a red ball (0 entropy)
what if half of the balls are red and the other half blue?
uncertainty is at its maximum: H(q)=-(0.5log(0.5)+0.5log(0.5))=log(2)≈0.693
if red:blue ratio is 20:80
H(q)=-(0.2log(0.2)+0.8log(0.8))≈0.500 (natural log)
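the box examples as a short Python sketch (natural log, so values are in nats; the helper name is my own):

```python
import math

def entropy(probs):
    # H(q) = -sum q * log(q); terms with q = 0 contribute nothing
    return -sum(p * math.log(p) for p in probs if p > 0)

print(entropy([0.2, 0.8]))  # 20:80 red:blue box
print(entropy([0.5, 0.5]))  # half/half: maximum uncertainty for 2 colors
```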
the higher the entropy, the harder it is to predict
cross-entropy
cross-entropy between two distributions q (true) and p (predicted): H_p(q) = -sum q(y)log(p(y))
If we, somewhat miraculously, match p(y) to q(y) perfectly, the computed values for both cross-entropy and entropy will match as well.
Since a perfect match almost never happens, cross-entropy will have a BIGGER value than the entropy computed on the true distribution.
e.g.
true probability (red, green, blue) = 0.8, 0.1, 0.1
predicted probability = 0.2, 0.2, 0.6
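working through this example in Python (natural log; the helper name is my own):

```python
import math

def cross_entropy(q, p):
    # H_p(q) = -sum q(y) * log(p(y)): true distribution q, predicted p
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)

q_true = [0.8, 0.1, 0.1]  # red, green, blue
p_pred = [0.2, 0.2, 0.6]

# when p matches q perfectly, cross-entropy equals the entropy of q
print(cross_entropy(q_true, q_true))  # entropy of q
print(cross_entropy(q_true, p_pred))  # cross-entropy, always >= entropy
```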
Kullback-Leibler Divergence (KL Divergence)
measure of dissimilarity between two distributions
KL(q||p) = H_p(q) - H(q), i.e. the difference between cross-entropy and entropy
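a sketch of KL divergence on the same red/green/blue example (helper name is my own; the value equals cross-entropy minus entropy):

```python
import math

def kl_divergence(q, p):
    # KL(q || p) = sum q * log(q / p), skipping terms where q = 0
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q_true = [0.8, 0.1, 0.1]
p_pred = [0.2, 0.2, 0.6]

print(kl_divergence(q_true, p_pred))   # dissimilarity between q and p, in nats
print(kl_divergence(q_true, q_true))   # identical distributions -> 0
```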