What is entropy?

23 Sep 2017, 07:43 • theory, entropy

Intuitively, entropy is a measure of how many states a system can be in. But how can we get from this intuition to \( H(X)= -\sum_i p_i \log p_i \)?

Consider an alphabet \(\{\alpha_1,\dots,\alpha_m\}\) and imagine we want to create a sequence with \(k_i\) occurrences of \(\alpha_i\). The length of the sequence is \(n=\sum_{i=1}^m k_i\). The number of all possible sequences formed this way is given by the multinomial coefficient:

\[ {{n} \choose {k_1,\dots,k_m}} = \frac{n!}{k_1! \dots k_m!} \]
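To make this concrete, here is a minimal sketch of the count in Python (the function name `multinomial` is my own, not standard library):

```python
from math import factorial
from functools import reduce

def multinomial(counts):
    """Number of distinct sequences with the given symbol counts."""
    n = sum(counts)
    return factorial(n) // reduce(lambda acc, k: acc * factorial(k), counts, 1)

# Sequences over {a, b, c} with two a's, one b, one c: 4!/(2!·1!·1!) = 12
print(multinomial([2, 1, 1]))  # 12
```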

However, we want our measure not to depend on \(n\), so we'll take the \(n\)th root of the multinomial coefficient. You can think of it as something like a geometric mean:

\[ \sqrt[\leftroot{4}n]{\frac{n!}{k_1! \dots k_m!}} \]

Note that this expression is multiplicative, but entropy is defined as an additive quantity. To convert a multiplicative quantity into an additive one, we can take its logarithm:

\[ \log \left(\frac{n!}{k_1! \dots k_m!}\right)^{\frac{1}{n}} = \frac{\log n! - \log k_1! - \dots - \log k_m!}{n} = \frac{\log n! - \sum_i \log k_i!}{n} \]

At this point we can use Stirling’s formula (\(\log n! \approx n \log n - n\)) to simplify the fraction:

\[ \frac{\log n! - \sum_i \log k_i!}{n} \approx \frac{n \log n - n - \sum_i k_i \log k_i + \sum_i k_i}{n} \]

Since \(\sum_i k_i = n\), the \(-n\) and \(+\sum_i k_i\) terms cancel, and we obtain the entropy:
\[ \begin{array}{ll} H(X) & = \log n - \sum_i \frac{k_i \log k_i}{n} = -\left( \sum_i \frac{k_i}{n} \log k_i - \sum_i \frac{k_i}{n} \log n \right) = -\sum_i \frac{k_i}{n} \log \frac{k_i}{n} \\ & = -\sum_i p_i \log p_i \\ \end{array} \]
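The Stirling step above is the only approximation in the derivation, so it's worth a quick numerical sanity check: the relative error of \(n \log n - n\) against the exact \(\log n!\) (computed here via Python's `lgamma`) shrinks as \(n\) grows. A small sketch:

```python
from math import log, lgamma

def stirling(n):
    """Crude Stirling approximation: log n! ≈ n log n - n."""
    return n * log(n) - n

for n in (10, 100, 10_000):
    exact = lgamma(n + 1)  # exact log n! via the log-gamma function
    rel_err = (exact - stirling(n)) / exact
    print(f"n={n:>6}  relative error={rel_err:.2e}")
```

The dropped \(\frac{1}{2}\log(2\pi n)\) term is what causes the residual error, and it vanishes relative to \(n \log n\).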

Note that \(H(X)\) depends only on the relative frequencies \(p_i=\frac{k_i}{n}\), not on \(n\) itself.
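The whole derivation can also be checked numerically: for large \(n\), the per-symbol log of the multinomial coefficient, \(\frac{1}{n}\log{n \choose k_1,\dots,k_m}\), approaches \(-\sum_i p_i \log p_i\). A minimal sketch (natural logs, exact factorials via `lgamma`; function names are my own):

```python
from math import lgamma, log

def entropy(p):
    """H = -sum_i p_i log p_i (natural log)."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

def log_multinomial_per_symbol(counts):
    """(1/n) * log( n! / (k_1! ... k_m!) ), computed exactly via log-gamma."""
    n = sum(counts)
    return (lgamma(n + 1) - sum(lgamma(k + 1) for k in counts)) / n

p = [0.5, 0.3, 0.2]
for n in (10, 1_000, 100_000):
    counts = [round(pi * n) for pi in p]
    print(n, log_multinomial_per_symbol(counts), entropy(p))
```

As \(n\) grows, the two columns converge, which is exactly the statement that entropy measures the (per-symbol, logarithmic) number of sequences with the given frequencies.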