Lecture

```Statistical Theory;
Why is the Gaussian Distribution so popular?
Rob Nicholls
MRC LMB Statistics Course 2014
Contents
• Continuous Random Variables
• Expectation and Variance
• Moments
• The Law of Large Numbers (LLN)
• The Central Limit Theorem (CLT)
Continuous Random Variables
A Random Variable is an object whose value is determined by chance,
i.e. random events
Probability that the random variable X adopts a particular value x:
P(X = x) = \begin{cases} p \in [0,1] & X \text{ discrete} \\ 0 & X \text{ continuous} \end{cases}
Continuous Random Variables
Continuous Uniform Distribution: X \sim U(a, b)
Probability Density Function:
f_X(x) = \begin{cases} (b - a)^{-1} & x \in [a, b] \\ 0 & x \notin [a, b] \end{cases}
Continuous Random Variables
Example: X \sim U(0,1)
P(X \in [0, 1]) = 1
P(X \in [0, 1/2]) = 1/2
P(X \in [0, 1/3]) = 1/3
P(X \in [0, 1/10]) = 1/10
P(X \in [0, 1/100]) = 1/100
P(X \in [0, 1/n]) = 1/n
\lim_{n \to \infty} P(X \in [0, 1/n]) = 0
\lim_{n \to \infty} P(0 \le X \le 1/n) = 0
In general, for any continuous random variable X:
\lim_{\epsilon \to 0} P(0 \le X \le \epsilon) = 0
\lim_{\epsilon \to 0} P(a \le X \le a + \epsilon) = 0
Continuous Random Variables
P(X = x) = \begin{cases} p_X(x) & X \text{ discrete} \\ 0 & X \text{ continuous} \end{cases}
“Why do I observe a value if there’s no probability of observing it?!”
• Data are discrete
• You don’t actually observe the value – precision error
• Some value must occur… even though the probability of observing any
particular value is infinitely small
Continuous Random Variables
For a random variable (discrete or continuous):
X : \Omega \to A
The Cumulative Distribution Function (CDF) is defined as:
F_X(x) = P(X \le x)
Properties:
• Non-decreasing
• \lim_{x \to -\infty} F_X(x) = 0
• \lim_{x \to +\infty} F_X(x) = 1
Continuous Random Variables
Probability Density function: f_X(x)
Cumulative Distribution function: F_X(x) = P(X \le x)
F_X(x) = \int_{-\infty}^{x} f_X(y) \, dy
\frac{d}{dx} F_X(x) = f_X(x)
[Figure panels: Probability Density function, Cumulative Distribution function, Boxplot]
Continuous Random Variables
Cumulative Distribution Function (CDF):
[Figure panels: Discrete, Continuous]
Probability Density Function (PDF):
Discrete: p_X(x) = P(X = x), with \sum_{x=-\infty}^{\infty} p_X(x) = 1
Continuous: f_X(x) = \frac{d}{dx} F_X(x), with \int_{-\infty}^{\infty} f_X(x) \, dx = 1
Expectation and Variance
Motivational Example:
Experiment on Plant Growth (inbuilt R dataset)
- compares yields obtained under different conditions
• Compare means to test for differences
• Consider variance (and shape) of the distributions: helps choose an appropriate prior/protocol
• Assess uncertainty of parameter estimates: allows hypothesis testing
In order to do any of this, we need to know how to describe distributions,
i.e. we need to know how to work with descriptive statistics
Expectation and Variance
Discrete RV: E(X) = \sum_{x=-\infty}^{\infty} x \, p_X(x)
Continuous RV: E(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx
Sample (empirical): E(X) = \frac{1}{n} \sum_{i=1}^{n} x_i (explicit weighting not required)
Expectation and Variance
Normal Distribution: X \sim N(\mu, \sigma^2)
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
E(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx = \mu
Expectation and Variance
Standard Cauchy Distribution (also called Lorentz):
f_X(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}
E(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx = undefined
Expectation and Variance
Expectation of a function of a random variable:
E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx
Linearity:
E(aX + b) = \int_{-\infty}^{\infty} (ax + b) f_X(x) \, dx
= a \int_{-\infty}^{\infty} x f_X(x) \, dx + b \int_{-\infty}^{\infty} f_X(x) \, dx
= a E(X) + b
Expectation and Variance
Variance:
Var(X) = E((X - \mu)^2)
= E((X - E(X))^2)
= E(X^2) - E(X)^2
Population Variance:
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
Unbiased Sample Variance:
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
[Figure legend: X \sim N(0,1), X \sim N(0,2)]
Expectation and Variance
Variance:
Var(X) = E(X^2) - E(X)^2
Standard deviation (s.d.):
\sqrt{Var(X)}
Non-linearity:
Var(aX + b) = E((aX + b)^2) - E(aX + b)^2
= E(a^2 X^2 + 2abX + b^2) - (a E(X) + b)^2
= (a^2 E(X^2) + 2ab E(X) + b^2) - (a^2 E(X)^2 + 2ab E(X) + b^2)
= a^2 (E(X^2) - E(X)^2)
= a^2 Var(X)
Expectation and Variance
Often data are standardised/normalised
Z-score/value: Z = \frac{X - \mu}{\sigma}
Example: X \sim N(\mu, \sigma^2), Z \sim N(0,1)
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
f_Z(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}
Moments
Shape descriptors
Li and Hartley (2006) Computer Vision
Saupe and Vranic (2001) Springer
Moments
Moments provide a description of the shape of a distribution
Raw moments            | Central moments                     | Standardised moments
m_1 = E(X) = \mu : mean | 0                                   | 0
...                    | E((X - \mu)^2) = \sigma^2 : variance | 1
...                    | ...                                 | E(((X - \mu)/\sigma)^3) : skewness
...                    | ...                                 | E(((X - \mu)/\sigma)^4) : kurtosis
m_n = E(X^n)           | E((X - \mu)^n)                      | E(((X - \mu)/\sigma)^n)
Moments
[Figure panels: Standard Normal, Standard Log-Normal]
Moments
Moment generating function (MGF):
M_X(t) = E(e^{Xt}) = 1 + t E(X) + \frac{t^2}{2!} E(X^2) + ... + \frac{t^n}{n!} E(X^n) + ...
Alternative representation of a probability distribution.
m_n = E(X^n) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}
Example:
X \sim N(\mu, \sigma^2) \Rightarrow M_X(t) = e^{t\mu + \frac{1}{2}\sigma^2 t^2}
X \sim N(0,1) \Rightarrow M_X(t) = e^{\frac{1}{2} t^2}
Moments
However, the MGF only exists if E(X^n) exists for every n
M_X(t) = E(e^{Xt})
Characteristic function always exists:
\varphi_X(t) = M_{iX}(t) = M_X(it) = E(e^{itX}) = \int_{-\infty}^{\infty} e^{itx} f_X(x) \, dx
Related to the probability density function via Fourier transform
Example:
X \sim N(0,1) \Rightarrow \varphi_X(t) = e^{-t^2/2}
The Law of Large Numbers (LLN)
Motivational Example:
Experiment on Plant Growth (inbuilt R dataset)
- compare yields obtained under different conditions
• Want to estimate the population mean using the sample mean.
• How can we be sure that the sample mean reliably estimates the population mean?
The Law of Large Numbers (LLN)
Does the sample mean reliably estimate the population mean?
The Law of Large Numbers:
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{n \to \infty} \mu
Providing X_i : i.i.d.
[Figure legend: X \sim U(0,1), \mu = 0.5]
The Central Limit Theorem (CLT)
Question - given a particular sample, thus known sample mean, how
reliable is the sample mean as an estimator of the population mean?
Furthermore, how much will getting more data improve the estimate of
the population mean?
Related question - given that we want the estimate of the mean to have
a certain degree of reliability (i.e. sufficiently low S.E.), how many
observations do we need to collect?
The Central Limit Theorem helps answer these questions by looking at
the distribution of stochastic fluctuations about the mean as n \to \infty
The Central Limit Theorem (CLT)
The Central Limit Theorem states:
For large n: \sqrt{n} (\bar{X}_n - \mu) \sim N(0, \sigma^2)
Or equivalently: \bar{X}_n \sim N(\mu, \sigma^2 / n)
More formally: \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right) \xrightarrow{d} N(0, \sigma^2)
Conditions:
E(X_i) = \mu
Var(X_i) = \sigma^2 < \infty
X_i : i.i.d. RVs (any distribution)
The Central Limit Theorem (CLT)
The Central Limit Theorem states:
For large n: \sqrt{n} (\bar{X}_n - \mu) \sim N(0, \sigma^2)
\bar{X}_n \sim N(\mu, \sigma^2 / n)
X \sim U(0,1), \mu = 0.5, \sigma^2 = 1/12
The Central Limit Theorem (CLT)
Proof of the Central Limit Theorem:
\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right) \xrightarrow{d} N(0, \sigma^2)
\frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} X_i - n\mu \right) \xrightarrow{d} N(0, \sigma^2)
\frac{1}{\sigma \sqrt{n}} \left( \sum_{i=1}^{n} X_i - n\mu \right) \xrightarrow{d} N(0,1)
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right) \xrightarrow{d} N(0,1)
\sum_{i=1}^{n} \frac{Z_i}{\sqrt{n}} \xrightarrow{d} N(0,1), where Z_i = \frac{X_i - \mu}{\sigma}
The Central Limit Theorem (CLT)
Proof of the Central Limit Theorem:
To show: \sum_{i=1}^{n} \frac{Z_i}{\sqrt{n}} \xrightarrow{d} N(0,1)
\varphi_{\sum_{i=1}^{n} Z_i / \sqrt{n}}(t) = \varphi_{\sum_{i=1}^{n} Z_i}\left( \frac{t}{\sqrt{n}} \right)
    since E(e^{i (Z/\sqrt{n}) t}) = E(e^{i Z (t/\sqrt{n})})
= \prod_{i=1}^{n} \varphi_{Z_i}\left( \frac{t}{\sqrt{n}} \right)
    since E(e^{it(Z_1 + Z_2)}) = E(e^{itZ_1}) E(e^{itZ_2}) for independent Z_1, Z_2
= \left( \varphi_Z\left( \frac{t}{\sqrt{n}} \right) \right)^n
    since the Z_i are identically distributed
= \left( 1 - \frac{t^2}{2n} + o\left( \frac{t^2}{n} \right) \right)^n
    by Taylor expansion, using E(Z) = 0 and E(Z^2) = 1
\to e^{-t^2/2} as n \to \infty
    using e^x = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n
= characteristic function of a N(0,1)
Summary
Considered how:
• Probability Density Functions (PDFs) and Cumulative Distribution
Functions (CDFs) are related, and how they differ in the discrete and
continuous cases
• Expectation is at the core of Statistical theory, and Moments can be
used to describe distributions
• The Central Limit Theorem identifies how/why the Normal distribution
is fundamental
The Normal distribution is also popular for other reasons:
• Maximum entropy distribution (given mean and variance)
• Intrinsically related to other distributions (t, F, χ², Cauchy, …)
• Also, it is easy to work with
References
Countless books + online resources!
Probability and Statistical theory:
• Grimmett and Stirzaker (2001) Probability and Random Processes.
Oxford University Press.
General comprehensive introduction to (almost) everything
mathematics:
• Garrity (2002) All the mathematics you missed: but need to know
for graduate school. Cambridge University Press.
```
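The LLN and CLT slides use X ~ U(0,1) (mean 0.5, variance 1/12) as their running example. The following is a minimal simulation sketch of those two statements, not the code behind the lecture's figures. It uses only the Python standard library; the function names, the seed, and the choices n = 50 and 20,000 repeats are illustrative assumptions.

```python
import random
import statistics


def sample_mean(n, rng):
    """Mean of n independent draws from U(0, 1)."""
    return sum(rng.random() for _ in range(n)) / n


def simulate_sample_means(n, repeats, rng):
    """Draw `repeats` independent sample means, each from n U(0, 1) draws."""
    return [sample_mean(n, rng) for _ in range(repeats)]


# Fixed seed (an arbitrary choice) so that the run is repeatable.
rng = random.Random(2014)

# Law of Large Numbers: the sample mean approaches mu = 0.5 as n grows.
for size in (10, 1_000, 100_000):
    print(f"n = {size:>6}: sample mean = {sample_mean(size, rng):.4f}")

# Central Limit Theorem: for fixed n the sample mean is approximately
# N(mu, sigma^2 / n), with mu = 1/2 and sigma^2 = 1/12 for U(0, 1).
n, repeats = 50, 20_000  # illustrative values, not taken from the slides
means = simulate_sample_means(n, repeats, rng)

theory_sd = (1 / 12 / n) ** 0.5          # sigma / sqrt(n), the standard error
observed_sd = statistics.stdev(means)    # empirical spread of the sample means

# Fraction of sample means within one theoretical s.d. of mu;
# a Normal distribution puts roughly 68% of its mass in that range.
within_one_sd = sum(abs(m - 0.5) <= theory_sd for m in means) / repeats

print(f"theoretical s.d. of the sample mean: {theory_sd:.4f}")
print(f"observed s.d. of the sample mean:    {observed_sd:.4f}")
print(f"fraction within one s.d. of 0.5:     {within_one_sd:.3f}")
```

Because the standard error is sigma / sqrt(n), halving it requires four times as many observations, which is one way to approach the slides' question of how many observations to collect for a given reliability.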