how good or bad are the predicted probabilities?
low predicted probability for the true class --> high penalty
-log(1.0) = 0
-log(0.8) = 0.22314
-log(0.6) = 0.51082
y = -log(x)
![image](https://user-images.githubusercontent.com/67103130/162584555-43d7ea8a-7dc9-4c88-9d46-6a8862616c0e.png)
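to see the penalty curve numerically, here is a minimal Python sketch reproducing the values above with the natural log:

```python
import math

# Penalty -log(p) for the probability assigned to the true class:
# confident correct predictions cost ~0, low probabilities cost a lot.
for p in [1.0, 0.8, 0.6, 0.1]:
    penalty = 0.0 - math.log(p)   # equals -log(p); avoids printing -0.0
    print(f"-log({p}) = {penalty:.5f}")
```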
binary cross-entropy (only 2 classes)
![image](https://user-images.githubusercontent.com/67103130/162584753-ea275c32-97a6-4400-a89c-f2a9305be996.png)
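a minimal sketch of how binary cross-entropy could be computed, assuming `y` holds the true labels (0 or 1) and `p` the predicted probabilities of class 1 (the example values here are made up):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Average BCE: -mean(y*log(p) + (1-y)*log(1-p)), natural log."""
    p = np.clip(p, eps, 1 - eps)          # keep log() away from 0
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])                # hypothetical true labels
p = np.array([0.9, 0.2, 0.6, 0.8])        # hypothetical predicted P(class=1)
print(binary_cross_entropy(y, p))         # ~0.2656
```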
entropy (expected negative log-likelihood)
is a measure of the uncertainty associated with a given distribution
if every ball in a box is green,
you have zero uncertainty about which color you will draw: it is always green (0 entropy)
what if half of the balls are red and the other half blue?
![image](https://user-images.githubusercontent.com/67103130/162584914-5e0a63b8-dbe7-4a65-af93-f03c9113adb2.png)
![image](https://user-images.githubusercontent.com/67103130/162585053-c49d2f8d-348d-48c0-b7d8-0fca35b962ca.png)
if the red:blue ratio is 20:80
H(q) = -(0.2 log(0.2) + 0.8 log(0.8)) ≈ 0.5 (natural log)
the higher the entropy, the harder the outcome is to predict
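a small sketch to check all three box examples (natural log, so the 50:50 case gives ln(2) ≈ 0.693, the maximum for two classes):

```python
import numpy as np

def entropy(q):
    """H(q) = -sum(q * log(q)), natural log; 0 * log(0) is treated as 0."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log(q))

print(entropy([1.0]))       # every ball green: 0.0
print(entropy([0.5, 0.5]))  # 50:50 box: ln(2) ~0.693, the 2-class maximum
print(entropy([0.2, 0.8]))  # 20:80 box: ~0.500
```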
cross-entropy
cross-entropy between two distributions, the true q(y) and the predicted p(y):
![image](https://user-images.githubusercontent.com/67103130/162585282-bb5f50bf-f9eb-4fa7-bb4c-d101ec0ebd50.png)
If we, somewhat miraculously, match p(y) to q(y) perfectly, the computed values for cross-entropy and entropy will match as well.
Since that is unlikely to ever happen, cross-entropy will have a BIGGER value than the entropy computed on the true distribution.
e.g.
true probabilities (red, green, blue) = 0.8, 0.1, 0.1
predicted probabilities = 0.2, 0.2, 0.6
![image](https://user-images.githubusercontent.com/67103130/162585465-a536371a-0218-4125-a0b4-4e1c30e2b72d.png)
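plugging the example numbers into a small sketch (natural log; `q` is the true distribution, `p` the predicted one):

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    """H(q, p) = -sum(q * log(p)), natural log."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(q) * np.log(p))

q = [0.8, 0.1, 0.1]  # true probabilities: red, green, blue
p = [0.2, 0.2, 0.6]  # predicted probabilities
print(cross_entropy(q, p))  # ~1.50, well above H(q) ~0.64
print(cross_entropy(q, q))  # perfect match: equals the entropy H(q) ~0.64
```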
Kullback-Leibler Divergence (KL Divergence)
a measure of the dissimilarity between two distributions
it is the difference between cross-entropy and entropy: KL(q||p) = H(q, p) - H(q)
![image](https://user-images.githubusercontent.com/67103130/162585540-bdb7e930-3b3f-4180-8b90-ea328af3c856.png)
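a sketch confirming that the KL divergence is exactly this gap, reusing the same example:

```python
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) = sum(q * log(q / p)) = H(q, p) - H(q), natural log."""
    q = np.asarray(q, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    mask = q > 0                          # skip 0 * log(0 / p) terms
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

q = [0.8, 0.1, 0.1]  # true distribution
p = [0.2, 0.2, 0.6]  # predicted distribution
print(kl_divergence(q, p))  # ~0.86 = 1.50 - 0.64
print(kl_divergence(q, q))  # identical distributions: 0.0
```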