(Better) Bootstrap Confidence Intervals

Shachar Kaufman
Report, TAU Bootstrap Seminar 2011 (Dr. Saharon Rosset)
Based on Efron and Tibshirani's "An Introduction to the Bootstrap", Chapter 14
Agenda
• What’s wrong with the simpler intervals?
• The (nonparametric) BCa method
• The (nonparametric) ABC method
– Not really (only a brief introduction)
Example: simpler intervals are bad

$\theta := \mathrm{var}(A)$

$\hat\theta := \frac{1}{n}\sum_{i=1}^{n} \left(A_i - \bar A\right)^2$
Example: simpler intervals are bad

Under the assumption that $(A_i, B_i) \sim N(\mu, \Sigma)$ i.i.d.:
– have an exact analytical interval
– can do parametric bootstrap

Under the weaker assumption that $(A_i, B_i) \sim F$ i.i.d. ($F$ unknown):
– can do nonparametric bootstrap
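For a concrete feel, here is a minimal Python sketch of the nonparametric-bootstrap route for this example. The simulated data stand in for the A-scores, and `plugin_var`, `B`, and the seed are illustrative choices, not from the slides:

```python
import random

random.seed(0)

# Hypothetical stand-in for the A-scores (the real example uses observed data).
a = [random.gauss(0.0, 1.0) for _ in range(50)]

def plugin_var(x):
    """Plug-in variance estimate: (1/n) * sum((x_i - x_bar)^2)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

theta_hat = plugin_var(a)

# Nonparametric bootstrap: resample the data with replacement B times.
B = 2000
boot = sorted(plugin_var(random.choices(a, k=len(a))) for _ in range(B))

# Simple percentile interval -- the kind criticized on the next slides.
alpha = 0.05
lo, hi = boot[int(alpha * B)], boot[int((1 - alpha) * B)]
print(theta_hat, (lo, hi))
```
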
Why are the simpler intervals bad?
• Standard (normal) confidence interval assumes symmetry around $\hat\theta$
• Bootstrap-t often erratic in practice
– "Cannot be recommended for general nonparametric problems"
• Percentile suffers from low coverage
– Assumes the nonparametric distribution of $\hat\theta^*$ is representative of that of $\hat\theta$ (e.g. has mean $\theta$ like $\hat\theta$ does)
• Standard & percentile methods assume homogeneous behavior of $\hat\theta$, whatever $\theta$ is
– (e.g. the standard deviation of $\hat\theta$ does not change with $\theta$)
A more flexible inference model
[Figure: the inference model should account for higher-order statistics of the distribution of $\hat\theta^*$: mean, standard deviation, skewness]
A more flexible inference model
• If $\hat\theta \sim N(\theta, \sigma^2)$ doesn't work for the data, maybe we could find a transform $\phi := m(\theta)$ and constants $z_0$ and $a$ for which we can accept that
$\hat\phi \sim N\!\left(\phi - z_0 \sigma_\phi,\ \sigma_\phi^2\right), \quad \sigma_\phi := 1 + a\phi$
• Additional unknowns
– $m(\cdot)$ allows a flexible parameter-description scale
– $z_0$ allows bias: $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$
– $a$ allows "$\sigma^2$" to change with $\theta$
• As we know, “more flexible” is not necessarily “better”
• Under broad conditions, in this case it is (TBD)
Where does this new model lead?
$\hat\phi \sim N\!\left(\phi - z_0 \sigma_\phi,\ \sigma_\phi^2\right), \quad \sigma_\phi := 1 + a\phi$

Assume known $m$ and $z_0 = 0$, and initially that $\phi = \phi_{\alpha,0} := 0$, hence $\sigma_{\alpha,0} := 1$.

Calculate a standard $\alpha$-confidence endpoint from this:
$z^{(\alpha)} := \Phi^{-1}(\alpha), \quad \phi_{\alpha,1} := z^{(\alpha)} \sigma_{\alpha,0} = z^{(\alpha)}$

Now reexamine the actual standard deviation, this time assuming that $\phi = \phi_{\alpha,1}$. According to the model, it will be
$\sigma_{\alpha,1} := 1 + a\phi_{\alpha,1} = 1 + a z^{(\alpha)}$
Where does this new model lead?
$\hat\phi \sim N\!\left(\phi - z_0 \sigma_\phi,\ \sigma_\phi^2\right), \quad \sigma_\phi := 1 + a\phi$

OK, but this leads to an updated endpoint
$\phi_{\alpha,2} := z^{(\alpha)} \sigma_{\alpha,1} = z^{(\alpha)}\left(1 + a z^{(\alpha)}\right)$

which leads to an updated
$\sigma_{\alpha,2} = 1 + a z^{(\alpha)}\left(1 + a z^{(\alpha)}\right) = 1 + a z^{(\alpha)} + a^2 \left(z^{(\alpha)}\right)^2$

If we continue iteratively to infinity this way, the geometric series $z^{(\alpha)}\left(1 + a z^{(\alpha)} + a^2 (z^{(\alpha)})^2 + \cdots\right)$ gives the confidence interval endpoint
$\phi_{\alpha,\infty} = \dfrac{z^{(\alpha)}}{1 - a z^{(\alpha)}}$
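The fixed point claimed above is easy to verify numerically; in this Python sketch $a$ and $\alpha$ are arbitrary illustrative values:

```python
from statistics import NormalDist

a, alpha = 0.1, 0.95
z = NormalDist().inv_cdf(alpha)    # z^(alpha) = Phi^{-1}(alpha)

# Iterate phi_{alpha,k+1} := z^(alpha) * sigma_{alpha,k}
#                         =  z^(alpha) * (1 + a * phi_{alpha,k}),
# starting from phi_{alpha,0} := 0 (so sigma_{alpha,0} = 1).
phi = 0.0
for _ in range(100):
    phi = z * (1 + a * phi)

closed_form = z / (1 - a * z)      # the claimed limit phi_{alpha,infinity}
print(phi, closed_form)
```

The iteration contracts whenever $|a\,z^{(\alpha)}| < 1$, since each step multiplies the distance to the fixed point by $a\,z^{(\alpha)}$.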
Where does this new model lead?
• Do this exercise considering $z_0 \ne 0$ and get
$\phi_{lo,\infty} = z_0 + \dfrac{z_0 + z^{(\alpha)}}{1 - a\left(z_0 + z^{(\alpha)}\right)}$
• Similarly for $\phi_{up,\infty}$, with $z^{(1-\alpha)}$ in place of $z^{(\alpha)}$
Enter BCa
• “Bias-corrected and accelerated”
• Like the percentile confidence interval
– Both ends are percentiles $\hat\theta^{*(\alpha_1)}, \hat\theta^{*(\alpha_2)}$ of the bootstrap instances of $\hat\theta^*$
– Just not the simple $\alpha_1 := \alpha$, $\alpha_2 := 1 - \alpha$
BCa
• Instead
$\alpha_1 := \Phi\!\left(\hat z_0 + \dfrac{\hat z_0 + z^{(\alpha)}}{1 - \hat a\left(\hat z_0 + z^{(\alpha)}\right)}\right)$
$\alpha_2 := \Phi\!\left(\hat z_0 + \dfrac{\hat z_0 + z^{(1-\alpha)}}{1 - \hat a\left(\hat z_0 + z^{(1-\alpha)}\right)}\right)$
• $z_0$ and $a$ are parameters we will estimate
– When both are zero, we get the good old percentile CI
• Notice we never had to explicitly find $\phi := m(\theta)$
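In Python, the adjusted levels can be computed directly from $\hat z_0$ and $\hat a$ (the numeric values passed below are hypothetical):

```python
from statistics import NormalDist

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

def bca_levels(alpha, z0, a):
    """Adjusted percentile levels (alpha_1, alpha_2) for the BCa interval."""
    def adjust(z_alpha):
        return Phi(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))
    return adjust(Phi_inv(alpha)), adjust(Phi_inv(1 - alpha))

# With z0 = a = 0 this reduces to the plain percentile choices.
print(bca_levels(0.05, 0.0, 0.0))    # ≈ (0.05, 0.95)
print(bca_levels(0.05, 0.1, 0.05))   # hypothetical z0-hat, a-hat
```
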
BCa
• $z_0$ tackles bias $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$:
$\hat z_0 := \Phi^{-1}\!\left(\dfrac{\#\left\{\hat\theta^*(b) < \hat\theta\right\}}{B}\right)$
(since $m$ is monotone)
• $a$ accounts for a standard deviation of $\hat\phi$ which varies with $\phi$ (linearly, on the "normal scale" $\phi$)
BCa
• One suggested estimator for $a$ is via the jackknife:
$\hat a := \dfrac{\sum_{i=1}^{n} \left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right)^3}{6\left[\sum_{i=1}^{n} \left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right)^2\right]^{3/2}}$
where $\hat\theta_{(i)} := \hat\theta$ computed without sample $i$, and $\hat\theta_{(\cdot)} := \frac{1}{n}\sum_{i=1}^{n} \hat\theta_{(i)}$
• You won’t find the rationale behind this formula in the
book (though it is clearly related to one of the standard
ways to define skewness)
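Putting the three pieces together (bias correction, jackknife acceleration, adjusted percentiles), a self-contained Python sketch of the nonparametric BCa recipe might look like this; the variance statistic, simulated data, `B`, and seed are illustrative choices, not fixed by the slides:

```python
import random
from statistics import NormalDist

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

def bca_interval(x, stat, alpha=0.05, B=2000, seed=1):
    """Nonparametric BCa interval for stat(x): a sketch of the recipe above."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = stat(x)

    # Bootstrap replications theta*(b), sorted for percentile lookup.
    boot = sorted(stat(rng.choices(x, k=n)) for _ in range(B))

    # Bias correction: z0-hat = Phi^{-1}( #{theta*(b) < theta-hat} / B ).
    z0 = Phi_inv(sum(t < theta_hat for t in boot) / B)

    # Acceleration a-hat from the jackknife values theta_(i).
    jack = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    jbar = sum(jack) / n
    a = (sum((jbar - t) ** 3 for t in jack)
         / (6 * sum((jbar - t) ** 2 for t in jack) ** 1.5))

    # Adjusted percentile levels alpha_1, alpha_2.
    def level(z_alpha):
        return Phi(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))

    a1, a2 = level(Phi_inv(alpha)), level(Phi_inv(1 - alpha))
    return boot[int(a1 * (B - 1))], boot[int(a2 * (B - 1))]

def plugin_var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

rng = random.Random(7)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]
lo, hi = bca_interval(data, plugin_var)
print(lo, hi)
```
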
Theoretical advantages of BCa
• Transformation respecting
– If the interval for $\theta$ is $\left(\hat\theta_{lo}, \hat\theta_{up}\right)$, then the interval for a monotone $m(\theta)$ is $\left(m(\hat\theta_{lo}), m(\hat\theta_{up})\right)$
– So no need to worry about finding transforms of $\theta$ where confidence intervals perform well
• Which is necessary in practice with the bootstrap-t CI
• And with the standard CI (e.g. the Fisher correlation-coefficient transform)
• Percentile CI is transformation respecting
Theoretical advantages of BCa
• Accuracy

– We want $\hat\theta_{lo}$ s.t. $\mathbb{P}\left(\theta < \hat\theta_{lo}\right) = \alpha$
– But a practical $\hat\theta_{lo}$ is an approximation where $\mathbb{P}\left(\theta < \hat\theta_{lo}\right) \approx \alpha$
– BCa (and bootstrap-t) endpoints are "second-order accurate":
$\mathbb{P}\left(\theta < \hat\theta_{lo}\right) = \alpha + O\!\left(\dfrac{1}{n}\right)$
– This is in contrast to the standard and percentile methods, which only converge at rate $\dfrac{1}{\sqrt{n}}$ ("first-order accurate"), i.e. errors one order of magnitude greater
But BCa is expensive
• The use of direct bootstrapping to calculate delicate statistics such as $\hat z_0$ and $\hat a$ requires a large $B$ to work satisfactorily
• Fortunately, BCa can be analytically approximated (with a Taylor expansion, for differentiable $T(\cdot)$) so that no Monte Carlo simulation is required
• This is the ABC method which retains the good
theoretical properties of BCa
The ABC method
• Only an introduction (Chapter 22)
• Discusses the “how”, not the “why”
• For additional details see DiCiccio and Efron 1992 or 1996
The ABC method
• Given the estimator in resampling form $\hat\theta = T(P)$
– Recall $P$, the "resampling vector", is an $n$-dimensional random variable with components $P_i := \mathbb{P}\left(x_1^* = x_i\right)$
– Recall $P^0 := \left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$
• Second-order Taylor analysis of the estimate as a function of the bootstrap resampling methodology:
$\dot T_i := \left.\dfrac{\partial T(P)}{\partial P_i}\right|_{P = P^0}, \qquad \ddot T_i := \left.\dfrac{\partial^2 T(P)}{\partial P_i^2}\right|_{P = P^0}$
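The two derivative vectors can be approximated by central finite differences along the directions $\delta_i - P^0$ (a common convention for these influence components). The statistic `T` below, a weighted plug-in variance, and the data are illustrative choices, not fixed by the slides:

```python
def T(P, x):
    """Weighted plug-in variance in resampling form: sum p_i*(x_i - m)^2."""
    m = sum(p * v for p, v in zip(P, x))
    return sum(p * (v - m) ** 2 for p, v in zip(P, x))

def taylor_components(x, eps=1e-5):
    """Central finite differences of T at P0 along the directions delta_i - P0."""
    n = len(x)
    P0 = [1.0 / n] * n
    t0 = T(P0, x)
    Tdot, Tddot = [], []
    for i in range(n):
        d = [(1.0 if j == i else 0.0) - 1.0 / n for j in range(n)]
        tp = T([p + eps * dj for p, dj in zip(P0, d)], x)
        tm = T([p - eps * dj for p, dj in zip(P0, d)], x)
        Tdot.append((tp - tm) / (2 * eps))           # first derivative
        Tddot.append((tp - 2 * t0 + tm) / eps ** 2)  # second derivative
    return Tdot, Tddot

x = [1.0, 2.0, 4.0, 8.0]
Tdot, Tddot = taylor_components(x)
print(Tdot)
```

Since this `T` is quadratic in `P`, the central differences are exact up to rounding; for the variance they recover the classical influence values $(x_i - \bar x)^2 - \hat\theta$.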
The ABC method
• Can approximate all the BCa parameter estimates (i.e. estimate the parameters in a different way)
– $\hat\sigma = \left[\dfrac{1}{n^2}\sum_{i=1}^{n} \dot T_i^2\right]^{1/2}$
– $\hat a = \dfrac{1}{6} \cdot \dfrac{\sum_{i=1}^{n} \dot T_i^3}{\left[\sum_{i=1}^{n} \dot T_i^2\right]^{3/2}}$
– $\hat z_0 = \hat a - \hat\gamma$, where
• $\hat\gamma := \dfrac{\hat b}{\hat\sigma} - \hat c$
• $\hat b := \dfrac{1}{2n^2}\sum_{i=1}^{n} \ddot T_i$
• $\hat c$ := something akin to a Hessian component, but along a specific direction not perpendicular to any natural axis (the "least favorable family" direction)
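Given the $\dot T_i$, the first two estimates translate directly to Python (formulas as on this slide; the $\hat z_0$ part is omitted since $\hat c$ is only described qualitatively):

```python
def abc_sigma_a(Tdot):
    """sigma-hat = [sum Tdot_i^2 / n^2]^(1/2) and
    a-hat = (1/6) * sum Tdot_i^3 / (sum Tdot_i^2)^(3/2)."""
    n = len(Tdot)
    s2 = sum(t * t for t in Tdot)
    sigma = (s2 / n ** 2) ** 0.5
    a = sum(t ** 3 for t in Tdot) / (6 * s2 ** 1.5)
    return sigma, a

# A symmetric influence vector gives zero acceleration.
print(abc_sigma_a([1.0, -1.0]))
```

Note that $\hat a$ is scale-invariant in the $\dot T_i$ (it is essentially their skewness), while $\hat\sigma$ carries the $1/n$ scaling of a standard error.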
The ABC method
• And the ABC interval endpoint
$\hat\theta[1-\alpha] := \hat\theta + \hat\sigma \hat\lambda$
• where
– $\hat\lambda := \dfrac{w}{\left(1 - \hat a w\right)^2}$
– $w := \hat z_0 + z^{(1-\alpha)}$
• Simple and to the point, ain't it?
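Finally, the endpoint itself. This Python sketch takes $\hat\theta$, $\hat\sigma$, $\hat z_0$, and $\hat a$ as already-computed inputs; the numeric arguments below are made up for illustration:

```python
from statistics import NormalDist

def abc_endpoint(theta_hat, sigma_hat, z0, a, alpha):
    """ABC-style endpoint theta-hat + sigma-hat * lambda, with
    lambda = w / (1 - a*w)^2 and w = z0 + z^(1-alpha).
    No Monte Carlo simulation is involved."""
    w = z0 + NormalDist().inv_cdf(1 - alpha)
    lam = w / (1 - a * w) ** 2
    return theta_hat + sigma_hat * lam

# With z0 = a = 0 this reduces to the standard normal-theory endpoint
# theta-hat + sigma-hat * z^(1-alpha).
print(abc_endpoint(7.19, 1.2, 0.0, 0.0, 0.05))
```
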
