Entropy (for data science) Clearly Explained!!! - StatQuest with Josh Starmer

What is entropy used for in data science?

Entropy has many uses in data science, including building classification trees, quantifying the relationship between two things with mutual information, and measuring how much two probability distributions differ with relative entropy and cross entropy.

What is the relationship between surprise and probability?

Surprise is inversely related to probability, meaning that when the probability of an event is low, the surprise is high, and when the probability of an event is high, the surprise is low.

How is surprise calculated?

Surprise is calculated as the log of the inverse of the probability of an event, log(1/p). The resulting curve rises as the probability of the event decreases and reaches zero when the probability of the event is one.
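
As a rough illustration of the calculation above, here is a minimal Python sketch. The function name `surprise` and the choice of log base 2 are assumptions made for this example; the answer above only says "the log". The sketch also rejects a probability of zero, since log(1/0) is undefined.

```python
import math


def surprise(p: float) -> float:
    """Return the surprise of an event with probability p, as log2(1 / p).

    Base 2 is an assumption for this sketch; any log base gives the
    same shape of curve, just scaled.
    """
    if not 0.0 < p <= 1.0:
        # log(1 / 0) is undefined, and probabilities cannot exceed 1.
        raise ValueError("p must be greater than 0 and at most 1")
    return math.log2(1.0 / p)


if __name__ == "__main__":
    # Lower probability gives higher surprise; p = 1 gives zero surprise.
    for p in (1.0, 0.5, 0.25, 0.1):
        print(f"p = {p:<4}  surprise = {surprise(p):.3f}")
```

Running the script prints the surprise for a few probabilities, which should show the value growing as the probability shrinks and dropping to zero at a probability of one.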