### The Central Limit Theorem

Paul Cornwell
March 31, 2011
Let X1, …, Xn be independent, identically distributed random variables with mean μ and positive, finite variance σ². Averages of these variables will be approximately normally distributed with mean μ and standard deviation σ/√n when n is large.
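A quick simulation makes the statement concrete. The sketch below (plain Python; the function name and the choice of Exponential(1) draws, so that μ = σ = 1, are my own) checks that sample averages concentrate at μ with spread close to σ/√n:

```python
import math
import random

def simulate_averages(n, reps=20000, seed=42):
    """Draw `reps` averages of n Exponential(1) variables (mu = sigma = 1)
    and return the mean and standard deviation of those averages."""
    rng = random.Random(seed)
    avgs = [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]
    m = sum(avgs) / reps
    sd = math.sqrt(sum((a - m) ** 2 for a in avgs) / (reps - 1))
    return m, sd

m, sd = simulate_averages(n=30)
# CLT prediction: m close to mu = 1, sd close to sigma/sqrt(30) = 0.1826
```

With n = 30 the prediction is already accurate to about two decimal places.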


- How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good?
- What is a 'good' approximation?

The CLT permits analysis of random variables even when the underlying distribution is unknown:

- Estimating parameters
- Hypothesis testing
- Polling


Normality testing: performing a hypothesis test to determine whether a set of data came from a normal distribution.

Considerations:
- Power: the probability that a test rejects the null hypothesis when it is false
- Ease of use

Problems:
- No test is desirable in every situation (there is no universally most powerful test)
- Some tests cannot verify a composite hypothesis of normality (i.e., a nonstandard normal)
- The reliability of tests is sensitive to sample size: with enough data, the null hypothesis will always be rejected

Characteristics of the normal distribution:

- Symmetric
- Unimodal
- Bell-shaped
- Continuous

Skewness measures the asymmetry of a distribution.
- Defined as the third standardized moment
- The skewness of a normal distribution is 0

$$\gamma_1 = E\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right],
\qquad
\hat{\gamma}_1 = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{3}}{(n-1)\,s^{3}}$$
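The sample statistic can be computed directly from this definition. A minimal sketch in plain Python (function name my own):

```python
import math

def sample_skewness(xs):
    """Estimate skewness as sum((x_i - xbar)^3) / ((n - 1) * s^3),
    where s is the sample standard deviation."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return sum((x - xbar) ** 3 for x in xs) / ((n - 1) * s ** 3)
```

Symmetric data (e.g. [1, 2, 3, 4, 5]) gives 0; right-skewed data gives a positive value.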

Kurtosis measures peakedness and the heaviness of the tails.
- Defined as the fourth standardized moment
- The kurtosis of a normal distribution is 3

$$\gamma_2 = E\!\left[\left(\frac{X-\mu}{\sigma}\right)^{4}\right],
\qquad
\hat{\gamma}_2 = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{4}}{(n-1)\,s^{4}}$$
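The matching sample statistic for kurtosis, in the same style (function name my own):

```python
import math

def sample_kurtosis(xs):
    """Estimate kurtosis as sum((x_i - xbar)^4) / ((n - 1) * s^4);
    values near 3 suggest normal-like tails."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return sum((x - xbar) ** 4 for x in xs) / ((n - 1) * s ** 4)
```

A two-point sample such as [-1, 1] repeated scores well below 3, reflecting its complete lack of tails.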

#### Binomial distribution

Cumulative distribution function:

$$F(x;\,n,p) = \sum_{i=0}^{\lfloor x\rfloor} \binom{n}{i}\,p^{i}\,(1-p)^{n-i}$$

$$E[X] = np, \qquad \operatorname{Var}[X] = np(1-p)$$
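The CDF above can be evaluated directly with math.comb. A small sketch (function names my own):

```python
from math import comb

def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * (p ** i) * ((1 - p) ** (n - i))

def binom_cdf(x, n, p):
    """P(X <= x): sum the pmf up to floor(x)."""
    return sum(binom_pmf(i, n, p) for i in range(int(x) + 1))

# E[X] = np can be recovered by summing i * pmf(i):
mean = sum(i * binom_pmf(i, 20, 0.2) for i in range(21))  # np = 4
```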
Simulated statistics for Binomial(n, p) versus the normal approximation. Kurtosis is reported in excess form (normal = 0); parenthesized entries are exact reference values; simulated values are from R.

| Parameters | Kurtosis (excess) | Skewness | % outside ±1.96·sd | K–S distance | Mean | Std dev |
|---|---|---|---|---|---|---|
| n = 20, p = .2 | -.0014 (.25) | .3325 (1.5) | .0434 | .128 | 3.9999 | 1.786 |
| n = 25, p = .2 | .002 | .3013 | .0743 | .116 | 5.0007 | 2.002 |
| n = 30, p = .2 | .0235 | .2786 | .0363 | .106 | 5.997 | 2.188 |
| n = 50, p = .2 | .0106 | .209 | .0496 | .083 | 10.001 | 2.832 |
| n = 100, p = .2 | .005 | .149 | .05988 | .0574 | 19.997 | 4.0055 |

#### Uniform distribution

Cumulative distribution function:

$$F(x;\,a,b) = \frac{x-a}{b-a}$$

$$E[X] = \frac{a+b}{2}, \qquad \operatorname{Var}[X] = \frac{(b-a)^{2}}{12}$$
Simulated statistics for averages of n Uniform(a, b) draws (values from R; parenthesized entries are exact reference values):

| Parameters | Kurtosis (excess) | Skewness | % outside ±1.96·sd | K–S distance | Mean | Std dev |
|---|---|---|---|---|---|---|
| n = 5, (a, b) = (0, 1) | -.236 (-1.2) | .004 (0) | .0477 | .0061 | .4998 | .1289 (.129) |
| n = 5, (a, b) = (0, 50) | -.234 | 0 | .04785 | .0058 | 24.99 | 6.468 (6.455) |
| n = 5, (a, b) = (0, .1) | -.238 | -.0008 | .048 | .0060 | .0500 | .0129 (.0129) |
| n = 3, (a, b) = (0, 50) | -.397 | -.001 | .0468 | .01 | 24.99 | 8.326 (8.333) |

#### Exponential distribution

Cumulative distribution function:

$$F(x;\,\lambda) = 1 - e^{-\lambda x}$$

$$E[X] = \frac{1}{\lambda}, \qquad \operatorname{Var}[X] = \frac{1}{\lambda^{2}}$$
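The slow convergence visible in the table below can be predicted exactly: the average of n Exponential(λ) draws has skewness 2/√n and excess kurtosis 6/n (standard facts, since the sum of n exponentials is Gamma(n, λ)). A tiny sketch:

```python
import math

def avg_skewness(n):
    """Skewness of the mean of n i.i.d. exponential draws: 2 / sqrt(n)."""
    return 2 / math.sqrt(n)

def avg_excess_kurtosis(n):
    """Excess kurtosis of the mean of n i.i.d. exponential draws: 6 / n."""
    return 6 / n

for n in (5, 10, 15):
    print(n, round(avg_skewness(n), 3), round(avg_excess_kurtosis(n), 3))
```

For n = 5 these give .894 and 1.2, matching the simulated .904 and 1.239 in the table.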
Simulated statistics for averages of n Exponential(λ = 1) draws (values from R; parenthesized entries are exact reference values):

| Parameters | Kurtosis (excess) | Skewness | % outside ±1.96·sd | K–S distance | Mean | Std dev |
|---|---|---|---|---|---|---|
| n = 5, λ = 1 | 1.239 (6) | .904 (2) | .0434 | .0598 | .9995 | .4473 (.4472) |
| n = 10, λ = 1 | .597 | .630 | .045 | .042 | 1.0005 | .316 (.316) |
| n = 15, λ = 1 | .396 | .515 | .0464 | .034 | .9997 | .258 (.2581) |

Future work:

- Find n values for more distributions
- Refine the criteria for quality of approximation
- Explore distributions with no mean
- Classify distributions in order to have more general guidelines for minimum sample size
---

Paul Cornwell
May 2, 2011



- Central Limit Theorem: averages of i.i.d. variables become normally distributed as sample size increases
- The rate of convergence depends on the underlying distribution
- What sample size is needed to produce a good approximation from the CLT?


- What are real-life applications of the Central Limit Theorem?
- What does kurtosis tell us about a distribution?
- What is the rationale for requiring np ≥ 5?
- What about distributions with no mean?

- The probability distribution of the total distance covered in a random walk tends toward the normal
- Hypothesis testing
- Confidence intervals (polling)
- Signal processing, noise cancellation

- Kurtosis measures the "peakedness" of a distribution
- Higher peaks mean fatter tails

In excess form (subtracting the normal's value of 3, so that the normal scores 0):

$$\gamma_2 = E\!\left[\left(\frac{X-\mu}{\sigma}\right)^{4}\right] - 3$$



- The traditional assumption for normality with the binomial is np > 5 or 10
- The skewness of the binomial distribution increases as p moves away from .5
- Larger n is required for convergence of skewed distributions
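This can be made quantitative with the binomial's skewness, (1 − 2p)/√(np(1 − p)). The sketch below (function names my own) searches for the smallest n whose skewness falls below a tolerance; with the ∣skewness∣ < .25 threshold used in the summary tables, p = .1 requires n = 114:

```python
import math

def binom_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

def min_n_for_skewness(p, tol=0.25):
    """Smallest n with |skewness| < tol (illustrative linear search)."""
    n = 1
    while abs(binom_skewness(n, p)) >= tol:
        n += 1
    return n
```

min_n_for_skewness(0.5) is 1 (the symmetric case), while min_n_for_skewness(0.1) is 114.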




#### Cauchy distribution

- Has no moments (including mean and variance)
- The distribution of averages looks like the original distribution
- The CLT does not apply

$$f(x) = \frac{1}{\pi\,(1+x^{2})}$$
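The claim that averages of Cauchy draws look like a single Cauchy draw can be checked by simulation. Since the Cauchy has no variance, spread is measured here by the interquartile range, which is exactly 2 for the standard Cauchy (quartiles at ±1). A sketch using inverse-CDF sampling (function names my own):

```python
import math
import random

def cauchy_draw(rng):
    """Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_averages(n, reps=20000, seed=7):
    """Interquartile range of `reps` averages of n Cauchy draws."""
    rng = random.Random(seed)
    avgs = sorted(sum(cauchy_draw(rng) for _ in range(n)) / n
                  for _ in range(reps))
    return avgs[3 * reps // 4] - avgs[reps // 4]

# Unlike the finite-variance case, the spread does not shrink with n:
# iqr_of_averages(1) and iqr_of_averages(50) are both close to 2.
```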



#### Beta distribution

- α = β = 1/3
- The distribution is symmetric and bimodal
- Convergence of averages to the normal is fast


#### Student's t distribution

- A heavier-tailed, bell-shaped curve
- Approaches the normal distribution as the degrees of freedom increase



Criteria for convergence:

- Four statistics: K–S distance, tail probabilities, skewness, and kurtosis
- Different thresholds for "adequate" and "superior" approximations
- Both sets of thresholds are fairly conservative
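Of the four statistics, the K–S distance is the only one that needs a little code: it is the largest gap between the empirical CDF of the simulated averages and the fitted normal CDF. A stdlib-only sketch (math.erf gives the normal CDF; both sides of each empirical-CDF jump are checked):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_distance(xs, mu, sigma):
    """Max |empirical CDF - fitted normal CDF| over the sample points."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d
```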
Minimum sample size n meeting each criterion for an "adequate" approximation:

| Distribution | ∣Kurtosis∣ < .5 | ∣Skewness∣ < .25 | Tail prob. .04 < x < .06 | K–S distance < .05 | Max |
|---|---|---|---|---|---|
| Uniform | 3 | 1 | 2 | 2 | 3 |
| Beta (α = β = 1/3) | 4 | 1 | 3 | 3 | 4 |
| Exponential | 12 | 64 | 5 | 8 | 64 |
| Binomial (p = .1) | 11 | 114 | 14 | 332 | 332 |
| Binomial (p = .5) | 4 | 1 | 12 | 68 | 68 |
| Student's t, 2.5 df | NA | NA | 13 | 20 | 20 |
| Student's t, 4.1 df | 120 | 1 | 1 | 2 | 120 |
Minimum sample size n meeting each criterion for a "superior" approximation:

| Distribution | ∣Kurtosis∣ < .3 | ∣Skewness∣ < .15 | Tail prob. .04 < x < .06 | K–S distance < .02 | Max |
|---|---|---|---|---|---|
| Uniform | 4 | 1 | 2 | 2 | 4 |
| Beta (α = β = 1/3) | 6 | 1 | 3 | 4 | 6 |
| Exponential | 20 | 178 | 5 | 45 | 178 |
| Binomial (p = .1) | 18 | 317 | 14 | 1850 | 1850 |
| Binomial (p = .5) | 7 | 1 | 12 | 390 | 390 |
| Student's t, 2.5 df | NA | NA | 13 | 320 | 320 |
| Student's t, 4.1 df | 200 | 1 | 1 | 5 | 200 |



Conclusions:

- Skewness is difficult to shake
- Tail probabilities are fairly accurate for small sample sizes
- Traditional sample-size recommendations are too small for many common distributions