Information Entropy - Shannon Entropy - Tech It Yourself

## Thursday, 8 August 2019

1. Shannon Entropy
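- Shannon entropy measures the uncertainty of a piece of information, that is, how hard the information is to guess. It also expresses the minimum average number of yes/no questions needed to identify the information.
S = -sum_i p_i*log(p_i), where p_i is the probability (relative frequency) of symbol i
Note: log is base 2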
- We have 3 strings: "AAAAAAAA", "AAAABBCD", "AABBCCDD"
"AAAAAAAA" has S = -1*log(8/8) = 0
"AAAABBCD" has S = -(4/8)*log(4/8) - (2/8)*log(2/8) - (1/8)*log(1/8) - (1/8)*log(1/8) = 1.75
"AABBCCDD" has S = -(2/8)*log(2/8) - (2/8)*log(2/8) - (2/8)*log(2/8) - (2/8)*log(2/8) = 2
- The uncertainty of "AABBCCDD" is the largest because its 4 letters are all equally likely.
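The three hand calculations above can be checked with a short script. This is only a minimal sketch: the function name `shannon_entropy` is made up for illustration, and it assumes the probability of each letter is simply its relative frequency in the string.

```python
from collections import Counter
from math import log2


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per symbol."""
    total = len(text)
    counts = Counter(text)
    # Sum -p * log2(p), where p is each distinct symbol's relative frequency.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


for s in ("AAAAAAAA", "AAAABBCD", "AABBCCDD"):
    print(s, shannon_entropy(s))
```

The printed values should match the results worked out by hand: 0, 1.75 and 2 bits.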
2. Application in Deep Learning - Classification
Classification uses a modified version of Shannon Entropy => Cross-Entropy.
2.1 Binary Cross-Entropy Loss
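The output takes only 2 classes.
BCE = -(1/N) * sum_i [ yi*log(p(yi)) + (1 - yi)*log(1 - p(yi)) ]
yi:      True label (0 or 1)
p(yi): Predicted probability that sample i belongs to class 1
N:       Number of samples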
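A small sketch may make the binary case concrete. It assumes the usual form of the loss, the mean over samples of -[yi*log(p(yi)) + (1 - yi)*log(1 - p(yi))], with the natural log. The name `binary_cross_entropy`, the `eps` clamp and the sample numbers are illustrative choices, not taken from any library.

```python
from math import log


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (natural log) over labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clamp p away from 0 and 1 so log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * log(p) + (1 - y) * log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 1]                # true labels yi
confident = [0.9, 0.1, 0.8, 0.95]    # hypothetical predictions close to the labels
unsure = [0.6, 0.4, 0.5, 0.55]       # hypothetical predictions close to 0.5

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))
```

The confident, correct predictions should give the smaller loss, which is why minimizing this value pushes p(yi) toward the true label.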
2.2 Cross-Entropy Loss
The output can take n (> 2) classes.
CE = -sum_c q(yc)*log(p(yc)), summed over the n classes c
q(yc): True label, one-hot encoded
p(yc): Predicted probability, obtained by passing the model output through softmax
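The multi-class case can be sketched the same way, assuming the loss is -sum_c q(yc)*log(p(yc)) with the natural log. The helper names `softmax` and `cross_entropy` and the example scores are hypothetical and only serve the illustration.

```python
from math import exp, log


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(q, p, eps=1e-12):
    """Cross-entropy -sum_c q_c * log(p_c), natural log."""
    # max(pc, eps) guards against log(0).
    return -sum(qc * log(max(pc, eps)) for qc, pc in zip(q, p))


logits = [2.0, 1.0, 0.1]   # hypothetical raw model outputs for 3 classes
q = [1.0, 0.0, 0.0]        # one-hot true label: class 0
p = softmax(logits)

print(p)
print(cross_entropy(q, p))
```

Because q(yc) is one-hot, only the true class contributes, so the loss reduces to -log of the probability the model assigns to the correct class.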
Comparing with Shannon Entropy
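As p(yc) moves closer to q(yc) (which is what minimizing the cross-entropy does), the Cross-Entropy approaches the Shannon Entropy of q(yc). The Cross-Entropy is never smaller than the Shannon Entropy, and the gap between them is the Kullback-Leibler Divergence: Cross-Entropy = Shannon Entropy + Kullback-Leibler Divergence. The Kullback-Leibler Divergence measures how far p(yc) diverges from q(yc) and is 0 only when the two are identical.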
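The relation between the three quantities can be checked numerically. The sketch below uses base-2 logs and takes q to be the letter frequencies of "AAAABBCD" from section 1. The uniform p is a hypothetical prediction, and the function names are illustrative.

```python
from math import log2


def entropy(q):
    """Shannon entropy of q in bits."""
    return -sum(qi * log2(qi) for qi in q if qi > 0)


def cross_entropy(q, p):
    """Cross-entropy of p relative to q in bits."""
    return -sum(qi * log2(pi) for qi, pi in zip(q, p) if qi > 0)


def kl_divergence(q, p):
    """Kullback-Leibler divergence from q to p in bits."""
    return sum(qi * log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


q = [0.5, 0.25, 0.125, 0.125]   # frequencies of A, B, C, D in "AAAABBCD"
p = [0.25, 0.25, 0.25, 0.25]    # hypothetical prediction: all letters equally likely

print(entropy(q), cross_entropy(q, p), kl_divergence(q, p))
print(cross_entropy(q, q), kl_divergence(q, q))
```

With these inputs the cross-entropy should equal the entropy plus the divergence, and setting p equal to q should make the divergence vanish so that the cross-entropy equals the entropy.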