### Lecture 4

```Bioinformatics lectures at Rice
University
Lecture 4: Shannon entropy and
mutual information
-- from Science 16 December 2011
The definition of Shannon entropy
In information theory, entropy is a measure of the
uncertainty associated with a random variable. In this
context, the term usually refers to the Shannon entropy,
which quantifies the expected value of the information
contained in a message, usually in units such as bits. In this
context, a 'message' means a specific realization of the
random variable.
Equivalently, the Shannon entropy is a measure of the
average information content one is missing when one does
not know the value of the random variable. The concept
was introduced by Claude E. Shannon in his 1948 paper "A
Mathematical Theory of Communication".
How do we measure information in a message?
Definition a message: a string of symbols.
The following was Shannon’s argument:
From Shannon’s ‘A mathematical theory of communication’
Shannon entropy was established in the
context of telegraph communication
Shannon’s argument to name H as the entropy:
Some Properties of H:
•The amount of entropy is not always an integer number of bits.
•Many data bits may not convey information. For example, data structures
often store information redundantly, or have identical sections regardless of
the information in the data structure.
•For a message of n characters, H is larger when a larger character set is used.
Thus ‘010111000111’ has less information than ‘qwertasdfg12’.
•However, if a character is rarely used, its contribution to H is small, because
since p * log(p)  0 as p0. Also, if a character constitute the vast majority,
e.g., ‘10111111111011’, the contribution of 1s to H is small, since p * log(p) 
0 as p1.
•For a random variable with n outcomes, H reaches maximum when
probabilities of the outcomes are all the same, i.e., 1/n, and H = log(n).
•When x is a continuous variable with a fixed variance, Shannon proved that H
reaches maximum when x follows a Gaussian distribution.
Shannon’s proof:
Summary
Or simply:
Note that:
H>=0
Spi =1
H is larger when there are more probable states.
H can be generally computed whenever there is a p distribution.
Mutual information
Mutual information is measure of dependence.
The concept was introduced by Shannon in 1948
and has become widely used in many different
fields
Formulation of mutual information
Meaning of MI
MI and H
Properties of MI
Connection between correlation and MI.
Example of MI application
Estimated MI between
goals a team scored in a
game and whether
the team was playing at
home or away. The
heights of the grey bars
provide
the approximate 95% of
the null points. The
the line
because it would have
been hidden by the grey