how good or bad are the predicted probabilities?
low probability --> high penalty
-log(1.0) = 0
-log(0.8) = 0.22314
-log(0.6) = 0.51082
y = -log(x)
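a quick sketch of the penalty values above, using Python's stdlib `math.log` (natural log):

```python
import math

# penalty for the probability assigned to the true class:
# high probability -> low penalty, low probability -> high penalty
for p in (1.0, 0.8, 0.6):
    print(f"-log({p}) = {-math.log(p):.5f}")
```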
binary cross entropy (only 2 classes)
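a minimal sketch of binary cross-entropy (the function name and example inputs are illustrative, not from the notes):

```python
import math

def binary_cross_entropy(y_true, p_pred):
    # mean negative log-likelihood for binary labels (0 or 1)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# confident, correct predictions -> low loss
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```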
entropy (log-likelihood)
is a measure of the uncertainty associated with a given distribution
if every ball in a box is green
you are certain you won't draw a red ball (0 entropy)
what if half of the balls are red and the other half blue?
uncertainty is at its maximum: H(q)=-(0.5log(0.5)+0.5log(0.5))=log(2)≈0.693
if red:blue ratio is 20:80
H(q)=-(0.2log(0.2)+0.8log(0.8))≈0.500 (natural log)
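the box examples as a short Python sketch (natural log, so values are in nats; the helper name is my own):

```python
import math

def entropy(probs):
    # H(q) = -sum q * log(q); terms with q = 0 contribute nothing
    return -sum(p * math.log(p) for p in probs if p > 0)

print(entropy([0.2, 0.8]))  # 20:80 red:blue box
print(entropy([0.5, 0.5]))  # half/half: maximum uncertainty for 2 colors
```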
the higher the entropy, the harder it is to predict
cross-entropy
cross-entropy between two distributions q (true) and p (predicted): H_p(q) = -sum q(y)log(p(y))
If we, somewhat miraculously, match p(y) to q(y) perfectly, the computed values for both cross-entropy and entropy will match as well.
Since a perfect match almost never happens, cross-entropy will have a BIGGER value than the entropy computed on the true distribution.
e.g.
true probability (red, green, blue) = 0.8, 0.1, 0.1
predicted probability = 0.2, 0.2, 0.6
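working through this example in Python (natural log; the helper name is my own):

```python
import math

def cross_entropy(q, p):
    # H_p(q) = -sum q(y) * log(p(y)): true distribution q, predicted p
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)

q_true = [0.8, 0.1, 0.1]  # red, green, blue
p_pred = [0.2, 0.2, 0.6]

# when p matches q perfectly, cross-entropy equals the entropy of q
print(cross_entropy(q_true, q_true))  # entropy of q
print(cross_entropy(q_true, p_pred))  # cross-entropy, always >= entropy
```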
Kullback-Leibler Divergence (KL Divergence)
measure of dissimilarity between two distributions
KL(q||p) = H_p(q) - H(q), i.e. the difference between cross-entropy and entropy
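a sketch of KL divergence on the same red/green/blue example (helper name is my own; the value equals cross-entropy minus entropy):

```python
import math

def kl_divergence(q, p):
    # KL(q || p) = sum q * log(q / p), skipping terms where q = 0
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q_true = [0.8, 0.1, 0.1]
p_pred = [0.2, 0.2, 0.6]

print(kl_divergence(q_true, p_pred))   # dissimilarity between q and p, in nats
print(kl_divergence(q_true, q_true))   # identical distributions -> 0
```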