Statistical Inference and Regression Analysis:
Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
IOMS Department
Department of Economics
Part 4 – Statistical Inference
4.1 – The Normal Family of Distributions
Independence of Sample Mean and Variance in Normal Sampling
$X = (X_1, X_2, \ldots, X_N)$: $N$ independent draws from Normal$[\mu, \sigma^2]$

Sample mean: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$

Sample variance: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2$

Main result: $\bar{X}$ and $s^2$ are independent.
Long elemental proof in text pp 195-197.
Brief proof:
(1) $\bar{X}$ = sum of normals, so $\bar{X} \sim \text{Normal}[\mu, \sigma^2/N]$
(2) $X_i - \bar{X}$ = linear function of normals; each is $\sim N\left[0, \frac{N-1}{N}\sigma^2\right]$
(3) $\text{Cov}[\bar{X}, X_i - \bar{X}] = 0$
(4) In multivariate normals, zero covariance ==> independence
(5) $\bar{X}$ and $s^2$ are functions of independent variables
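A quick Monte Carlo check of the main result, offered as a sketch: draw many normal samples, compute $\bar{X}$ and $s^2$ for each, and look at their sample correlation. For contrast the same is done for a skewed (exponential) population, where $\bar{X}$ and $s^2$ are not independent. The sample size, replication count, seed, and the exponential comparison are illustrative assumptions, not from the text. A near-zero correlation is only consistent with independence; it does not prove it.

```python
import random
import statistics

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def mean_var_correlation(draw, n=10, reps=10000, seed=1):
    """Correlation between sample mean and sample variance over `reps` samples of size n."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(xs))
        variances.append(statistics.variance(xs))  # divisor n - 1, like s^2
    return corr(means, variances)

normal_r = mean_var_correlation(lambda rng: rng.gauss(5.0, 2.0))
expo_r = mean_var_correlation(lambda rng: rng.expovariate(1.0))
print(f"normal: {normal_r:+.3f}   exponential: {expo_r:+.3f}")
```

The normal correlation should sit near zero, while the exponential one is clearly positive, because a large sample mean there usually comes with a large spread.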
[Figure: Minitab displays of Listing (Mean 369687, StDev 156865, N = 51): Histogram of Listing, Boxplot of Listing, Empirical CDF of Listing, Probability Plot of Listing (Normal - 95% CI), Scatterplot and Marginal Plot of Listing vs IncomePC; pie chart by Category (Plain, Pepperoni, Mushroom, Sausage, Pepper and Onion, Mushroom and Onion, Garlic, Meatball).]
Useful Result
$$\frac{(N-1)s^2}{\sigma^2} = \sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2[N-1]$$
Note, N-1 degrees of freedom, not N.
(Terms are not independent.)
Proof in text.
Limiting form: $\text{Cov}[X_i - \bar{X}, X_j - \bar{X}] = -\sigma^2/N \rightarrow 0$ for $i \neq j$
Implication: $E[s^2/\sigma^2] = 1$, $\text{Var}[s^2/\sigma^2] = 2/(N-1)$
$s^2/\sigma^2 \xrightarrow{p} 1$ (converges in mean square)
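The three implications above can be checked by simulation. This sketch assumes $N = 5$, $\sigma = 2$, a fixed seed and a replication count chosen only for illustration; it estimates $E[s^2/\sigma^2]$, $\text{Var}[s^2/\sigma^2]$ and $\text{Cov}[X_1 - \bar{X}, X_2 - \bar{X}]$ and prints them next to the theoretical values $1$, $2/(N-1)$ and $-\sigma^2/N$.

```python
import random
import statistics

def simulate(n=5, sigma=2.0, reps=20000, seed=7):
    rng = random.Random(seed)
    ratios, d1, d2 = [], [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        ratios.append(statistics.variance(xs) / sigma**2)  # s^2 / sigma^2
        d1.append(xs[0] - xbar)  # X_1 - Xbar
        d2.append(xs[1] - xbar)  # X_2 - Xbar
    # Both deviations have mean zero, so the average product estimates the covariance.
    cov12 = statistics.fmean([a * b for a, b in zip(d1, d2)])
    return statistics.fmean(ratios), statistics.pvariance(ratios), cov12

n, sigma = 5, 2.0
mean_ratio, var_ratio, cov12 = simulate(n, sigma)
print(f"E[s2/sigma2]   ~ {mean_ratio:.3f}  (theory 1)")
print(f"Var[s2/sigma2] ~ {var_ratio:.3f}  (theory {2 / (n - 1):.3f})")
print(f"Cov[X1-Xbar, X2-Xbar] ~ {cov12:.3f}  (theory {-sigma**2 / n:.3f})")
```

With $N = 5$ the covariance $-\sigma^2/N = -0.8$ is far from zero; rerunning with a larger `n` shows it shrinking toward 0, as the limiting form says.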
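Distribution of the t statistic
$$t = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}}{\sqrt{\dfrac{(N-1)s^2/\sigma^2}{N-1}}} = \frac{\sqrt{N}(\bar{X} - \mu)}{s} \sim t[N-1]$$
The numerator is N[0,1], the denominator is the square root of a $\chi^2[N-1]$ divided by its degrees of freedom, and the two are independent by the main result, so the ratio is distributed as t with N-1 degrees of freedom.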
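A small sketch of the t statistic, assuming simulated normal data with made-up $\mu = 10$, $\sigma = 3$, $N = 5$. It computes the statistic both ways shown above (the $\sigma$ terms cancel, so the two agree), then estimates how often $|t| > 1.96$ when $N = 5$, to show the heavier-than-normal tails of $t[N-1]$.

```python
import math
import random
import statistics

def t_statistic(xs, mu):
    """sqrt(N) * (xbar - mu) / s, with s the (N - 1)-divisor sample standard deviation."""
    n = len(xs)
    return math.sqrt(n) * (statistics.fmean(xs) - mu) / statistics.stdev(xs)

def t_statistic_ratio_form(xs, mu, sigma):
    """The same quantity written as N[0,1] / sqrt(chi-squared[N-1] / (N-1))."""
    n = len(xs)
    z = (statistics.fmean(xs) - mu) / (sigma / math.sqrt(n))
    chi2 = (n - 1) * statistics.variance(xs) / sigma**2
    return z / math.sqrt(chi2 / (n - 1))

rng = random.Random(3)
mu, sigma, n = 10.0, 3.0, 5
sample = [rng.gauss(mu, sigma) for _ in range(n)]
print(t_statistic(sample, mu), t_statistic_ratio_form(sample, mu, sigma))  # sigma cancels

# Heavier tails than N[0,1]: with N = 5, |t| > 1.96 occurs well over 5% of the time.
reps = 20000
exceed = sum(
    abs(t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)) > 1.96
    for _ in range(reps)
) / reps
print(f"P(|t| > 1.96), N = {n}: {exceed:.3f}")
```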
4.2 – Interval Estimation
Estimation
Obtaining a Confidence Interval
Pivotal quantity: f(estimator, parameters) that has a known distribution free of parameters and data
• Probability statement can be made
• Manipulate the interval to describe the parameter.
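Example – Normal Mean
In random sampling from the normal distribution with mean $\mu$ and variance $\sigma^2$,
$$\frac{\sqrt{N}(\bar{x} - \mu)}{s} \sim t[N-1]$$
This distribution is free of the data and of the parameters.
$$\text{Prob}\left[\left|\frac{\sqrt{N}(\bar{x} - \mu)}{s}\right| \le t^*\right] = 1 - \alpha$$
where $t^*$ is the value from $t[N-1]$ that leaves probability $\alpha/2$ in each tail. Therefore,
$$\text{Prob}\left[|\bar{x} - \mu| \le t^*\frac{s}{\sqrt{N}}\right] = 1 - \alpha$$
$$\text{Prob}\left[\bar{x} - t^*\frac{s}{\sqrt{N}} \le \mu \le \bar{x} + t^*\frac{s}{\sqrt{N}}\right] = 1 - \alpha$$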
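The last probability statement translates directly into a function. This is a sketch under two assumptions: $t^*$ is passed in from a t table (the Python standard library has no t quantile function), and the sample is simulated from a normal with made-up parameters. The value 2.069 is the 23-degree-of-freedom, 95% $t^*$ used in the GSOEP example.

```python
import math
import random
import statistics

def mean_confidence_interval(xs, t_star):
    """xbar -/+ t* s / sqrt(N); t_star must come from a t[N-1] table."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    half_width = t_star * statistics.stdev(xs) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Simulated sample of N = 24 from Normal[0.35, 0.16^2]; parameters are illustrative only.
rng = random.Random(11)
sample = [rng.gauss(0.35, 0.16) for _ in range(24)]
lo, hi = mean_confidence_interval(sample, t_star=2.069)
print(f"95% CI for mu: {lo:.4f} to {hi:.4f}")
```

Passing `t_star` explicitly keeps the code faithful to the slide's table-lookup workflow; with SciPy available one could compute it instead, but that is outside this sketch.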
t distribution – values of t*
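Normal Variance
In random sampling from the normal distribution,
$$\frac{(N-1)s^2}{\sigma^2} \sim \chi^2[N-1]$$
Therefore, with $\chi^2_q$ denoting the $q$ quantile of $\chi^2[N-1]$,
$$\text{Prob}\left[\chi^2_{\alpha/2} \le \frac{(N-1)s^2}{\sigma^2} \le \chi^2_{1-\alpha/2}\right] = 1 - \alpha$$
$$\text{Prob}\left[\frac{1}{\chi^2_{1-\alpha/2}} \le \frac{\sigma^2}{(N-1)s^2} \le \frac{1}{\chi^2_{\alpha/2}}\right] = 1 - \alpha$$
$$\text{Prob}\left[\frac{(N-1)s^2}{\chi^2_{1-\alpha/2}} \le \sigma^2 \le \frac{(N-1)s^2}{\chi^2_{\alpha/2}}\right] = 1 - \alpha$$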
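The final interval for $\sigma^2$ as a sketch, assuming the two $\chi^2[N-1]$ critical values are supplied from a table (11.69 and 38.08 are the 23-degree-of-freedom values used in the GSOEP example) and using a simulated sample with true $\sigma^2 = 4$. Taking square roots of the endpoints gives the interval for $\sigma$.

```python
import math
import random
import statistics

def variance_confidence_interval(xs, chi2_lower, chi2_upper):
    """[(N-1)s^2 / chi2_upper, (N-1)s^2 / chi2_lower]; chi2_lower and chi2_upper are
    the alpha/2 and 1 - alpha/2 quantiles of chi-squared[N-1], taken from a table."""
    n = len(xs)
    ss = (n - 1) * statistics.variance(xs)  # (N - 1) s^2 = sum of squared deviations
    return ss / chi2_upper, ss / chi2_lower

rng = random.Random(5)
sample = [rng.gauss(0.0, 2.0) for _ in range(24)]  # simulated; true sigma^2 = 4
var_lo, var_hi = variance_confidence_interval(sample, chi2_lower=11.69, chi2_upper=38.08)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)  # interval for sigma
print(f"sigma^2: {var_lo:.3f} to {var_hi:.3f};  sigma: {sd_lo:.3f} to {sd_hi:.3f}")
```

Because the $\chi^2$ distribution is skewed, the upper arm of the interval (above $s^2$) is longer than the lower arm.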
GSOEP Income Data
Descriptive Statistics for 1 variables
--------+-------------------------------------------------------------
Variable|      Mean    Std.Dev.    Minimum    Maximum   Cases  Missing
--------+-------------------------------------------------------------
  HHNINC|   .353343     .157058    .035000   1.500000      24        0
--------+-------------------------------------------------------------
For the mean, t* for 24-1 = 23 degrees of freedom = 2.069.
Confidence interval for mean is .353343 +/- 2.069 * (.157058/sqrt(24))
= .353343 +/- 2.069 * .032059
= .353343 +/- .066331, i.e., .287012 to .419674
Confidence interval for variance: Critical values from chi squared [23] are 11.69 and 38.08. Confidence interval for σ² is
(24-1)(.157058²)/38.08 to (24-1)(.157058²)/11.69 = .014899 to .048533
Confidence interval for σ is
.122061 to .220301
Notice, not symmetric around s² or s.
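The arithmetic above can be reproduced from the summary statistics alone. This sketch takes the mean, standard deviation and N from the HHNINC table and the table values $t^* = 2.069$, $\chi^2_{.025} = 11.69$, $\chi^2_{.975} = 38.08$; nothing else is assumed.

```python
import math

# Summary statistics for HHNINC from the descriptive statistics table.
xbar, s, n = 0.353343, 0.157058, 24
t_star = 2.069                   # t[23], 95%
chi2_lo, chi2_hi = 11.69, 38.08  # chi-squared[23], 2.5% and 97.5% points

se = s / math.sqrt(n)            # standard error of the mean
half = t_star * se               # half-width of the t interval
print(f"mean: {xbar:.6f} +/- {half:.6f} -> {xbar - half:.6f} to {xbar + half:.6f}")

var_lo = (n - 1) * s**2 / chi2_hi
var_hi = (n - 1) * s**2 / chi2_lo
print(f"sigma^2: {var_lo:.6f} to {var_hi:.6f}")
print(f"sigma:   {math.sqrt(var_lo):.6f} to {math.sqrt(var_hi):.6f}")
```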
Large Sample Results
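Confidence Intervals
Relying on the Central Limit Theorem
$$\frac{\hat{\theta} - \theta}{\sqrt{\text{EstimatedVar}[\hat{\theta}]}} \xrightarrow{d} N[0,1]$$
$$\text{Prob}\left[\left|\frac{\hat{\theta} - \theta}{\sqrt{\text{EstimatedVar}[\hat{\theta}]}}\right| \le z^*\right] \approx 1 - \alpha$$
Therefore, we use
$$\text{Prob}\left[\hat{\theta} - z^*\sqrt{\text{EstimatedVar}[\hat{\theta}]} \le \theta \le \hat{\theta} + z^*\sqrt{\text{EstimatedVar}[\hat{\theta}]}\right] \approx 1 - \alpha$$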
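A sketch of the large-sample interval for one common case, assuming the estimator is a sample proportion with $\text{EstimatedVar}[\hat{\theta}] = \hat{\theta}(1-\hat{\theta})/N$. The 0/1 data are simulated with a made-up success probability; $z^*$ comes from `statistics.NormalDist`, which is in the standard library from Python 3.8.

```python
import math
import random
from statistics import NormalDist

def large_sample_interval(theta_hat, est_var, alpha=0.05):
    """theta_hat -/+ z* sqrt(EstimatedVar[theta_hat]), z* the 1 - alpha/2 normal quantile."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    half = z_star * math.sqrt(est_var)
    return theta_hat - half, theta_hat + half

# Example estimator: a sample proportion from simulated 0/1 data.
rng = random.Random(2)
n = 144
draws = [1 if rng.random() < 0.78 else 0 for _ in range(n)]
p_hat = sum(draws) / n
lo, hi = large_sample_interval(p_hat, p_hat * (1 - p_hat) / n)
print(f"p_hat = {p_hat:.4f}, 95% CI: {lo:.4f} to {hi:.4f}")
```

The function only needs $\hat{\theta}$ and its estimated variance, so the same code serves any estimator covered by the Central Limit Theorem argument.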
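Interpretation of The Interval
• θ is a fixed parameter; the interval is random. The probability statement is not a statement that θ will lie in specific intervals.
• In repeated sampling, 100(1-α) percent of the time, the interval will contain the true parameter.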
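The repeated-sampling interpretation can be illustrated directly: build the t interval in many independent samples and count how often it covers the true mean. This sketch assumes normal data with made-up $\mu = 0.35$, $\sigma = 0.16$, $N = 24$, and uses the table value $t^* = 2.069$; the replication count and seed are arbitrary.

```python
import math
import random
import statistics

def coverage_of_t_interval(mu=0.35, sigma=0.16, n=24, t_star=2.069, reps=5000, seed=42):
    """Fraction of repeated samples whose interval xbar -/+ t* s / sqrt(n) contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        half = t_star * statistics.stdev(xs) / math.sqrt(n)
        hits += (xbar - half <= mu <= xbar + half)  # bool adds as 0 or 1
    return hits / reps

print(f"coverage ~ {coverage_of_t_interval():.3f} (nominal 0.95)")
```

Any single interval either contains $\mu$ or does not; the 95% describes the long-run fraction printed here.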
Application: Credit Modeling
1992 American Express analysis of
• Application process: Acceptance or rejection; X = 0 (reject) or 1 (accept).
• Cardholder behavior
  • Loan default (D = 0 or 1).
  • Average monthly expenditure (E = $/month)
  • General credit usage/behavior (Y = number of charges)
X in 100 samples with N = 144 in each sample
0.7809 is the true proportion in the population of 13,444 we are sampling from.
Estimates plus and minus 1 and 2 standard errors
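The experiment described here can be mimicked with simulated data, since the American Express records themselves are not available in these notes. The sketch assumes each of the 100 samples of N = 144 consists of independent Bernoulli draws with p = 0.7809, which approximates sampling without replacement from the 13,444-person population. It counts how many of the "estimate plus and minus 1 and 2 standard errors" intervals contain the true proportion.

```python
import math
import random

TRUE_P = 0.7809       # population proportion quoted above
N, SAMPLES = 144, 100

rng = random.Random(1992)
cover_1se = cover_2se = 0
for _ in range(SAMPLES):
    p_hat = sum(rng.random() < TRUE_P for _ in range(N)) / N
    se = math.sqrt(p_hat * (1 - p_hat) / N)   # estimated standard error
    cover_1se += abs(p_hat - TRUE_P) <= 1 * se
    cover_2se += abs(p_hat - TRUE_P) <= 2 * se
print(f"+/-1 SE: {cover_1se}/{SAMPLES} contain p;  +/-2 SE: {cover_2se}/{SAMPLES}")
```

Roughly two-thirds of the one-standard-error intervals and about 95 of the two-standard-error intervals should contain 0.7809, though any particular run will vary.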