### Inference for the mean vector

#### Univariate inference

Let x1, x2, …, xn denote a sample of size n from the normal
distribution with mean μ and variance σ².

Suppose we want to test

  H0: μ = μ0  vs  HA: μ ≠ μ0

The appropriate test is the t test, with test statistic

  t = √n (x̄ − μ0) / s

Reject H0 if |t| > tα/2.
#### The multivariate test

Let x1, x2, …, xn denote a sample of size n from the p-variate
normal distribution with mean vector μ and covariance matrix Σ.

Suppose we want to test

  H0: μ = μ0  vs  HA: μ ≠ μ0
#### Roy's union-intersection principle

This is a general procedure for developing a multivariate test from
the corresponding univariate test.

1. Convert the multivariate problem to a univariate problem by
   considering an arbitrary linear combination of the observation
   vector X = (X1, …, Xp)′:

     U = a′X = a1 X1 + … + ap Xp

2. Perform the test for this arbitrary linear combination of the
   observation vector.
3. Repeat this for all possible choices of a = (a1, …, ap)′.
4. Reject the multivariate hypothesis if H0 is rejected for any one
   of the choices for a.
5. Accept the multivariate hypothesis if H0 is accepted for all of
   the choices for a.
6. Set the type I error rate for the individual tests so that the
   type I error rate for the multivariate test is α.
#### Application of Roy's principle

Let x1, x2, …, xn denote a sample of size n from the p-variate
normal distribution with mean vector μ and covariance matrix Σ, and
suppose we want to test

  H0: μ = μ0  vs  HA: μ ≠ μ0
Let ui = a′xi = a1 x1i + … + ap xpi. Then u1, …, un is a sample of
size n from the normal distribution with mean a′μ and variance a′Σa.

To test

  H0(a): a′μ = a′μ0  vs  HA(a): a′μ ≠ a′μ0

we would use the test statistic

  t = √n (ū − a′μ0) / su
Now

  ū = (1/n) ∑ᵢ ui = (1/n) ∑ᵢ a′xi = a′[(1/n) ∑ᵢ xi] = a′x̄

and

  su² = [1/(n − 1)] ∑ᵢ (ui − ū)²
      = [1/(n − 1)] ∑ᵢ (a′xi − a′x̄)²
      = [1/(n − 1)] ∑ᵢ [a′(xi − x̄)]²
      = [1/(n − 1)] ∑ᵢ a′(xi − x̄)(xi − x̄)′a
      = a′{ [1/(n − 1)] ∑ᵢ (xi − x̄)(xi − x̄)′ } a = a′Sa
Thus

  t = √n (a′x̄ − a′μ0) / √(a′Sa) = √n a′(x̄ − μ0) / √(a′Sa)

We will reject H0(a): a′μ = a′μ0 if

  |t| = √n |a′(x̄ − μ0)| / √(a′Sa) > tα/2

or, equivalently, if

  t² = n [a′(x̄ − μ0)]² / (a′Sa) > t²α/2
Using Roy's union-intersection principle:

We reject H0: μ = μ0 in favour of HA: μ ≠ μ0 if

  t²(a) = n [a′(x̄ − μ0)]² / (a′Sa) > t²α/2  for at least one a

and accept H0: μ = μ0 if

  t²(a) ≤ t²α/2  for all a.

That is, we reject H0: μ = μ0 if

  maxₐ n [a′(x̄ − μ0)]² / (a′Sa) > t²α/2

and accept H0: μ = μ0 if

  maxₐ n [a′(x̄ − μ0)]² / (a′Sa) ≤ t²α/2
Consider the problem of finding

  maxₐ n [a′(x̄ − μ0)]² / (a′Sa) = maxₐ h(a)

where

  h(a) = n [a′(x̄ − μ0)]² / (a′Sa) = n a′(x̄ − μ0)(x̄ − μ0)′a / (a′Sa)

Setting the vector of partial derivatives equal to zero,

  ∂h/∂a = n { (a′Sa) 2(x̄ − μ0)(x̄ − μ0)′a − [a′(x̄ − μ0)(x̄ − μ0)′a] 2Sa } / (a′Sa)² = 0

or, since (x̄ − μ0)′a = a′(x̄ − μ0) is a scalar,

  (a′Sa)(x̄ − μ0) − [a′(x̄ − μ0)] Sa = 0

so

  a = [(a′Sa) / (a′(x̄ − μ0))] S⁻¹(x̄ − μ0) = k S⁻¹(x̄ − μ0) = a_opt

Thus

  maxₐ h(a) = n [a_opt′(x̄ − μ0)]² / (a_opt′ S a_opt)
            = n [k (x̄ − μ0)′S⁻¹(x̄ − μ0)]² / [k² (x̄ − μ0)′S⁻¹ S S⁻¹(x̄ − μ0)]
            = n (x̄ − μ0)′S⁻¹(x̄ − μ0)
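The optimizing direction a_opt = S⁻¹(x̄ − μ0) and the value of the maximum can be spot-checked numerically. A minimal NumPy sketch, with made-up values standing in for x̄ − μ0 and S:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative made-up quantities: a 3-variate mean deviation d = xbar - mu0
# and a symmetric positive definite sample covariance matrix S.
d = np.array([0.8, -0.3, 1.1])
A = rng.normal(size=(3, 3))
S = A @ A.T + 3 * np.eye(3)
n = 20

def h(a):
    """h(a) = n (a'd)^2 / (a'Sa), the squared t statistic for direction a."""
    return n * (a @ d) ** 2 / (a @ S @ a)

a_opt = np.linalg.solve(S, d)             # a_opt = S^{-1} d
h_max = n * d @ np.linalg.solve(S, d)     # claimed maximum: n d' S^{-1} d

# h attains the claimed maximum at a_opt, and no random direction beats it.
assert np.isclose(h(a_opt), h_max)
assert all(h(rng.normal(size=3)) <= h_max + 1e-9 for _ in range(1000))
```

The random directions never exceed h(a_opt), which is the content of the union-intersection maximization above.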
Thus Roy's union-intersection principle states:

We reject H0: μ = μ0 if

  n (x̄ − μ0)′S⁻¹(x̄ − μ0) > t²α/2

and accept H0: μ = μ0 if

  n (x̄ − μ0)′S⁻¹(x̄ − μ0) ≤ t²α/2

The statistic T² = n (x̄ − μ0)′S⁻¹(x̄ − μ0) is called Hotelling's T²
statistic.
#### Choosing the critical value for Hotelling's T² statistic

We reject H0: μ = μ0 if

  T² = n (x̄ − μ0)′S⁻¹(x̄ − μ0) > t²α/2

To determine t²α/2 we need to find the sampling distribution of T²
when H0 is true. It turns out that if H0 is true then

  F = [(n − p) / (p(n − 1))] T² = [(n − p) / (p(n − 1))] n (x̄ − μ0)′S⁻¹(x̄ − μ0)

has an F distribution with ν1 = p and ν2 = n − p degrees of freedom.

#### Hotelling's T² test

We reject H0: μ = μ0 if

  F = [(n − p) / (p(n − 1))] T² > Fα(p, n − p)

or, equivalently, if

  T² = n (x̄ − μ0)′S⁻¹(x̄ − μ0) > [p(n − 1) / (n − p)] Fα(p, n − p) = T²α
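The whole test fits in a few lines of NumPy/SciPy. A sketch; the function name `hotelling_t2_one_sample` is my own, not a library routine:

```python
import numpy as np
from scipy import stats

def hotelling_t2_one_sample(X, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0.

    X is an (n, p) data matrix; returns the T^2 statistic, the
    equivalent F statistic, and the reject/accept decision at level alpha.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # sample covariance (divisor n-1)
    d = xbar - np.asarray(mu0, dtype=float)
    T2 = n * d @ np.linalg.solve(S, d)
    F = (n - p) / (p * (n - 1)) * T2       # F ~ F(p, n-p) under H0
    crit = p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    return T2, F, T2 > crit

# Quick sanity run on simulated data where H0 is true.
rng = np.random.default_rng(1)
X = rng.normal(loc=0.0, scale=1.0, size=(30, 3))
T2, F, reject = hotelling_t2_one_sample(X, mu0=[0, 0, 0])
```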
#### Another derivation of Hotelling's T² statistic

Another method of developing statistical tests is the likelihood
ratio method. Suppose that the data vector x has joint density
f(x; θ), and that the parameter vector θ belongs to the set Ω. Let ω
denote a subset of Ω. Finally, we want to test

  H0: θ ∈ ω  vs  HA: θ ∉ ω

The likelihood ratio test rejects H0 if

  λ = maxθ∈ω L(θ) / maxθ∈Ω L(θ) = L(θ̃) / L(θ̂) ≤ λα

where θ̂ is the MLE of θ, and θ̃ is the MLE of θ when H0 is true.
#### The situation

Let x1, x2, …, xn denote a sample of size n from the p-variate
normal distribution with mean vector μ and covariance matrix Σ, and
suppose we want to test

  H0: μ = μ0  vs  HA: μ ≠ μ0
The likelihood function is

  L(μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) exp[ −(1/2) ∑ᵢ (xi − μ)′Σ⁻¹(xi − μ) ]

and the log-likelihood function is

  l(μ, Σ) = ln L(μ, Σ)
          = −(np/2) ln(2π) − (n/2) ln|Σ| − (1/2) ∑ᵢ (xi − μ)′Σ⁻¹(xi − μ)

The maximum likelihood estimators of μ and Σ are

  μ̂ = x̄ = (1/n) ∑ᵢ xi

and

  Σ̂ = (1/n) ∑ᵢ (xi − x̄)(xi − x̄)′ = [(n − 1)/n] S

The maximum likelihood estimators of μ and Σ when H0 is true are

  μ̃ = μ0

and

  Σ̃ = (1/n) ∑ᵢ (xi − μ0)(xi − μ0)′
Now

  ∑ᵢ (xi − μ̂)′Σ̂⁻¹(xi − μ̂) = ∑ᵢ (xi − x̄)′[ (n − 1)/n S ]⁻¹(xi − x̄)
    = ∑ᵢ tr{ (xi − x̄)′[ (n − 1)/n S ]⁻¹(xi − x̄) }
    = [n/(n − 1)] tr{ S⁻¹ ∑ᵢ (xi − x̄)(xi − x̄)′ }
    = [n/(n − 1)] tr{ S⁻¹ (n − 1) S }
    = [n/(n − 1)] tr{ (n − 1) I } = [n/(n − 1)] (n − 1) p = np

Thus

  L(μ̂, Σ̂) = (2π)^(−np/2) |[ (n − 1)/n ] S|^(−n/2) e^(−np/2)

and similarly

  L(μ̃, Σ̃) = (2π)^(−np/2) |Σ̃|^(−n/2) e^(−np/2)
and

  λ = L(μ̃, Σ̃) / L(μ̂, Σ̂)
    = ( |[ (n − 1)/n ] S| / |Σ̃| )^(n/2)
    = ( |[ (n − 1)/n ] S| / |(1/n) ∑ᵢ (xi − μ0)(xi − μ0)′| )^(n/2)
    = ( |(n − 1) S| / |∑ᵢ (xi − μ0)(xi − μ0)′| )^(n/2)
Note: let

  A = [ A11  A12 ] = [ u   w′ ]
      [ A21  A22 ]   [ w   V  ]

Then

  |A| = |A11| |A22 − A21 A11⁻¹ A12| = |A22| |A11 − A12 A22⁻¹ A21|

so

  u |V − (1/u) ww′| = |V| (u − w′V⁻¹w)

and hence

  |V − (1/u) ww′| / |V| = 1 − (1/u) w′V⁻¹w

Replacing u by −u gives the form we will use below:

  |V + (1/u) ww′| / |V| = 1 + (1/u) w′V⁻¹w
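This determinant identity (a form of the matrix determinant lemma) is easy to spot-check numerically. A small sketch with made-up u, w and V:

```python
import numpy as np

rng = np.random.default_rng(2)

p = 4
u = 3.0                                   # any nonzero scalar
w = rng.normal(size=p)                    # any p-vector
B = rng.normal(size=(p, p))
V = B @ B.T + np.eye(p)                   # any positive definite matrix

lhs = np.linalg.det(V + np.outer(w, w) / u) / np.linalg.det(V)
rhs = 1 + w @ np.linalg.solve(V, w) / u

# |V + (1/u) w w'| / |V| = 1 + (1/u) w' V^{-1} w
assert np.isclose(lhs, rhs)
```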
Now

  λ^(2/n) = |(n − 1) S| / |∑ᵢ (xi − μ0)(xi − μ0)′|

Also

  ∑ᵢ (xi − μ0)(xi − μ0)′
    = ∑ᵢ [ (xi − x̄) + (x̄ − μ0) ] [ (xi − x̄) + (x̄ − μ0) ]′
    = ∑ᵢ (xi − x̄)(xi − x̄)′ + (x̄ − μ0) ∑ᵢ (xi − x̄)′
      + [ ∑ᵢ (xi − x̄) ] (x̄ − μ0)′ + n (x̄ − μ0)(x̄ − μ0)′
    = ∑ᵢ (xi − x̄)(xi − x̄)′ + n (x̄ − μ0)(x̄ − μ0)′   (since ∑ᵢ (xi − x̄) = 0)
    = (n − 1) S + n (x̄ − μ0)(x̄ − μ0)′

Thus

  λ^(2/n) = |(n − 1) S| / |(n − 1) S + n (x̄ − μ0)(x̄ − μ0)′|
          = |S| / |S + [n/(n − 1)] (x̄ − μ0)(x̄ − μ0)′|

Using

  |V + (1/u) ww′| / |V| = 1 + (1/u) w′V⁻¹w

with u = n − 1, V = S and w = √n (x̄ − μ0), we get

  λ^(2/n) = 1 / [ 1 + n (x̄ − μ0)′S⁻¹(x̄ − μ0) / (n − 1) ]
Thus the likelihood ratio test rejects H0 if λ < λα, i.e. if
λ^(2/n) < λα^(2/n), or equivalently if

  1 + n (x̄ − μ0)′S⁻¹(x̄ − μ0) / (n − 1) > λα^(−2/n)

i.e. if

  n (x̄ − μ0)′S⁻¹(x̄ − μ0) > (n − 1) [ λα^(−2/n) − 1 ]

This is the same as Hotelling's T² test if

  (n − 1) [ λα^(−2/n) − 1 ] = [p(n − 1) / (n − p)] Fα(p, n − p) = T²α
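The key determinant step, λ^(−2/n) = 1 + T²/(n − 1), can be verified directly on simulated data. A sketch, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 25, 3
X = rng.normal(size=(n, p))               # simulated sample, H0: mu = 0
mu0 = np.zeros(p)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)        # Hotelling's T^2

# lambda^(-2/n) computed from the determinant ratio
A = (n - 1) * S
num = np.linalg.det(A + n * np.outer(d, d))
den = np.linalg.det(A)

# |(n-1)S + n d d'| / |(n-1)S| = 1 + T^2 / (n-1)
assert np.isclose(num / den, 1 + T2 / (n - 1))
```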
#### Example

For n = 10 students we measure scores on

– Math proficiency test (x1),
– Science proficiency test (x2),
– English proficiency test (x3) and
– French proficiency test (x4)

The average score for each of the tests in previous years was 60.
Has this changed?
The data:

| Student | Math | Science | Eng | French |
|---------|------|---------|-----|--------|
| 1       | 81   | 89      | 73  | 74     |
| 2       | 73   | 79      | 73  | 74     |
| 3       | 61   | 86      | 81  | 81     |
| 4       | 55   | 70      | 76  | 73     |
| 5       | 61   | 71      | 61  | 66     |
| 6       | 52   | 70      | 56  | 58     |
| 7       | 56   | 74      | 56  | 56     |
| 8       | 65   | 87      | 73  | 69     |
| 9       | 54   | 76      | 69  | 72     |
| 10      | 48   | 71      | 62  | 63     |
Summary statistics:

  x̄ = (60.6, 77.3, 68.0, 68.6)′

  S = ⎡ 102.044  56.689  41.222  39.489 ⎤
      ⎢  56.689  56.456  42.000  35.356 ⎥
      ⎢  41.222  42.000  75.778  65.111 ⎥
      ⎣  39.489  35.356  65.111  61.378 ⎦

Note:

  S⁻¹ = ⎡  0.0245  −0.0255   0.0195  −0.0218 ⎤
        ⎢ −0.0255   0.0567  −0.0405   0.0267 ⎥
        ⎢  0.0195  −0.0405   0.1782  −0.1783 ⎥
        ⎣ −0.0218   0.0267  −0.1783   0.2040 ⎦

With μ0 = (60, 60, 60, 60)′,

  T² = n (x̄ − μ0)′S⁻¹(x̄ − μ0) = 151.135

With α = 0.05, the critical value is

  T²0.05 = [p(n − 1) / (n − p)] F0.05(p, n − p)
         = [(4)(9)/6] F0.05(4, 6) = 6 × 4.53 = 27.18

Since T² = 151.135 > 27.18, we reject H0: μ = μ0 and conclude that
the mean scores have changed.
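The example can be reproduced with NumPy/SciPy; small differences from the slide's value of 151.135 are expected because the slide's S⁻¹ is hand-rounded:

```python
import numpy as np
from scipy import stats

# Scores for the 10 students: columns are Math, Science, Eng, French.
X = np.array([
    [81, 89, 73, 74], [73, 79, 73, 74], [61, 86, 81, 81],
    [55, 70, 76, 73], [61, 71, 61, 66], [52, 70, 56, 58],
    [56, 74, 56, 56], [65, 87, 73, 69], [54, 76, 69, 72],
    [48, 71, 62, 63],
])
mu0 = np.full(4, 60.0)
n, p = X.shape

xbar = X.mean(axis=0)                     # (60.6, 77.3, 68.0, 68.6)
S = np.cov(X, rowvar=False)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)        # about 151

crit = p * (n - 1) / (n - p) * stats.f.ppf(0.95, p, n - p)  # about 27.2
reject = T2 > crit                        # True: the means have changed
```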
#### Simultaneous inference for means

Recall (using Roy's union-intersection principle)

  T² = n (x̄ − μ)′S⁻¹(x̄ − μ) = maxₐ t²(a) = maxₐ n (a′x̄ − a′μ)² / (a′Sa)

Now

  1 − α = P[ T² ≤ T²α ]
        = P[ n (x̄ − μ)′S⁻¹(x̄ − μ) ≤ T²α ]
        = P[ maxₐ n (a′x̄ − a′μ)² / (a′Sa) ≤ T²α ]
        = P[ n (a′x̄ − a′μ)² / (a′Sa) ≤ T²α for all a ]
        = P[ |a′x̄ − a′μ| ≤ Tα √(a′Sa / n) for all a ]

Thus

  P[ a′x̄ − Tα √(a′Sa / n) ≤ a′μ ≤ a′x̄ + Tα √(a′Sa / n) for all a ] = 1 − α

and the set of intervals

  a′x̄ − Tα √(a′Sa / n)  to  a′x̄ + Tα √(a′Sa / n)

forms a set of (1 − α)100% simultaneous confidence intervals for a′μ.

Recall

  Tα = √{ [(n − 1) p / (n − p)] Fα(p, n − p) }

Thus the set of (1 − α)100% simultaneous confidence intervals for a′μ is

  a′x̄ ± √(a′Sa / n) √{ [(n − 1) p / (n − p)] Fα(p, n − p) }
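For the proficiency-score example, taking a to be each unit vector in turn gives simultaneous intervals for the four individual means. A sketch:

```python
import numpy as np
from scipy import stats

# Proficiency scores (Math, Science, Eng, French) for the 10 students.
X = np.array([
    [81, 89, 73, 74], [73, 79, 73, 74], [61, 86, 81, 81],
    [55, 70, 76, 73], [61, 71, 61, 66], [52, 70, 56, 58],
    [56, 74, 56, 56], [65, 87, 73, 69], [54, 76, 69, 72],
    [48, 71, 62, 63],
])
n, p = X.shape
alpha = 0.05

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
T_alpha = np.sqrt((n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p))

# For a = e_j, the half-width is T_alpha * sqrt(S[j, j] / n).
half = T_alpha * np.sqrt(np.diag(S) / n)
lower, upper = xbar - half, xbar + half
for name, lo, hi in zip(["Math", "Science", "Eng", "French"], lower, upper):
    print(f"{name}: ({lo:.1f}, {hi:.1f})")
```

These intervals are wider than one-at-a-time t intervals; that is the price of covering all linear combinations a′μ simultaneously.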
#### The two-sample problem

Univariate inference

Let x1, x2, …, xn denote a sample of size n from the normal
distribution with mean μx and variance σ².
Let y1, y2, …, ym denote a sample of size m from the normal
distribution with mean μy and variance σ².

Suppose we want to test

  H0: μx = μy  vs  HA: μx ≠ μy

The appropriate test is the t test, with test statistic

  t = (x̄ − ȳ) / [ s_pooled √(1/n + 1/m) ]

where

  s²_pooled = [ (n − 1) s²x + (m − 1) s²y ] / (n + m − 2)

Reject H0 if |t| > tα/2 with d.f. = n + m − 2.
#### The multivariate test

Let x1, x2, …, xn denote a sample of size n from the p-variate normal
distribution with mean vector μx and covariance matrix Σ.
Let y1, y2, …, ym denote a sample of size m from the p-variate normal
distribution with mean vector μy and covariance matrix Σ.

Suppose we want to test

  H0: μx = μy  vs  HA: μx ≠ μy
#### Hotelling's T² statistic for the two-sample problem

  T² = (1/n + 1/m)⁻¹ (x̄ − ȳ)′ S_pooled⁻¹ (x̄ − ȳ)

where

  S_pooled = [(n − 1)/(n + m − 2)] Sx + [(m − 1)/(n + m − 2)] Sy

If H0 is true then

  F = [(n + m − p − 1) / (p(n + m − 2))] T²

has an F distribution with ν1 = p and ν2 = n + m − p − 1.

Thus Hotelling's T² test: we reject H0: μx = μy if

  F = [(n + m − p − 1) / (p(n + m − 2))] T² > Fα(p, n + m − p − 1)

with T² and S_pooled as above.
#### Simultaneous inference for the two-sample problem

• Hotelling's T² statistic can be shown to be derivable by Roy's
  union-intersection principle, namely

  T² = (1/n + 1/m)⁻¹ (x̄ − ȳ − δ)′ S_pooled⁻¹ (x̄ − ȳ − δ)
     = maxₐ t²(a) = maxₐ [a′(x̄ − ȳ) − a′δ]² / [ a′ S_pooled a (1/n + 1/m) ]

  where δ = μx − μy.
Thus

  1 − α = P[ F = [(n + m − p − 1) / (p(n + m − 2))] T² ≤ Fα(p, n + m − p − 1) ]
        = P[ T² ≤ [p(n + m − 2) / (n + m − p − 1)] Fα(p, n + m − p − 1) ]
        = P[ T² ≤ T²α ]

where

  T²α = [p(n + m − 2) / (n + m − p − 1)] Fα(p, n + m − p − 1)
Thus

  P[ maxₐ [a′(x̄ − ȳ) − a′δ]² / ( a′ S_pooled a (1/n + 1/m) ) ≤ T²α ] = 1 − α

or

  P[ [a′(x̄ − ȳ) − a′δ]² ≤ T²α a′ S_pooled a (1/n + 1/m) for all a ] = 1 − α

Hence

  P[ a′(x̄ − ȳ) − Tα √( a′ S_pooled a (1/n + 1/m) ) ≤ a′(μx − μy)
       ≤ a′(x̄ − ȳ) + Tα √( a′ S_pooled a (1/n + 1/m) ) for all a ] = 1 − α

Thus the intervals

  a′(x̄ − ȳ) ± Tα √( a′ S_pooled a (1/n + 1/m) )

form (1 − α)100% simultaneous confidence intervals for a′(μx − μy).
#### Hotelling's T² test: a graphical explanation

Hotelling's T² statistic for the two-sample problem is

  T² = (1/n + 1/m)⁻¹ (x̄ − ȳ)′ S_pooled⁻¹ (x̄ − ȳ)

where S_pooled = [(n − 1)/(n + m − 2)] Sx + [(m − 1)/(n + m − 2)] Sy, and

  T² = maxₐ t²(a) = maxₐ [a′(x̄ − ȳ)]² / [ a′ S_pooled a (1/n + 1/m) ]

Note:

  t(a) = (a′x̄ − a′ȳ) / √( a′ S_pooled a (1/n + 1/m) )

is the test statistic for testing

  H0(a): a′μx = a′μy  vs  HA(a): a′μx ≠ a′μy
[Figures: four plots of samples from Popn A and Popn B in the
(X1, X2) plane, illustrating Hotelling's T² test, the univariate test
for X1, the univariate test for X2, and the univariate test for a
linear combination a1X1 + a2X2.]
#### Mahalanobis distance: a graphical explanation

Euclidean distance:

  d²(a, b) = (a − b)′(a − b) = ∑ᵢ (ai − bi)²

[Figure: the points equidistant from a point a form a circle centred at a.]

Mahalanobis distance (with S a covariance matrix):

  d²M(a, b; S) = (a − b)′S⁻¹(a − b)

[Figure: the points equidistant from a point a form an ellipse centred at a.]
Hotelling's T² statistic for the two-sample problem:

  T² = (x̄ − ȳ)′ [ (1/n + 1/m) S_pooled ]⁻¹ (x̄ − ȳ)
     = d²M( x̄, ȳ; (1/n + 1/m) S_pooled )

Also

  T² = (1/n + 1/m)⁻¹ (x̄ − ȳ)′ S_pooled⁻¹ (x̄ − ȳ)
     = [nm/(n + m)] (x̄ − ȳ)′ S_pooled⁻¹ (x̄ − ȳ)
     = d²M( x̄, ȳ; [(n + m)/(nm)] S_pooled )
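The difference between the two distances can be illustrated with a made-up covariance matrix that is much more spread out in one coordinate than the other:

```python
import numpy as np

# Two candidate points at equal Euclidean distance from a.
a = np.array([0.0, 0.0])
b1 = np.array([2.0, 0.0])                 # along the high-variance axis
b2 = np.array([0.0, 2.0])                 # along the low-variance axis

# Made-up covariance: much more spread in the first coordinate.
S = np.array([[4.0, 0.0],
              [0.0, 0.25]])

def d2_euclid(a, b):
    return (a - b) @ (a - b)

def d2_mahal(a, b, S):
    return (a - b) @ np.linalg.solve(S, a - b)

# Equal Euclidean distances, very different Mahalanobis distances:
# b1 lies in a direction of high variance, so it is "close"; b2 lies
# in a direction of low variance, so it is "far".
assert np.isclose(d2_euclid(a, b1), d2_euclid(a, b2))
assert np.isclose(d2_mahal(a, b1, S), 1.0)    # 4 / 4
assert np.isclose(d2_mahal(a, b2, S), 16.0)   # 4 / 0.25
```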
[Figures: Case I and Case II, each showing samples from Popn A and
Popn B in the (X1, X2) plane.]

In Case I the Mahalanobis distance between the mean vectors is larger
than in Case II, even though the Euclidean distance is smaller. In
Case I there is more separation between the two bivariate normal
distributions.