TAU Bootstrap Seminar 2011
Dr. Saharon Rosset

(Better) Bootstrap Confidence Intervals
Shachar Kaufman

Based on Efron and Tibshirani's "An Introduction to the Bootstrap", Chapter 14

Agenda
• What's wrong with the simpler intervals?
• The (nonparametric) BCa method
• The (nonparametric) ABC method – not really

Example: simpler intervals are bad
• The parameter of interest is a variance, $\theta \coloneqq \operatorname{var}(X)$, with the plug-in estimate $\hat\theta \coloneqq \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Example: simpler intervals are bad
• Under the assumption that $x_1,\dots,x_n \sim N(\mu, \Sigma)$ i.i.d.:
  – Have an exact analytical interval
  – Can do a parametric bootstrap
• Under the assumption that $x_1,\dots,x_n \sim F$ i.i.d.:
  – Can do a nonparametric bootstrap

Why are the simpler intervals bad?
• The standard (normal) confidence interval assumes symmetry around $\hat\theta$
• Bootstrap-t is often erratic in practice
  – "Cannot be recommended for general nonparametric problems"
• The percentile interval suffers from low coverage
  – It assumes the nonparametric distribution of $\hat\theta^*$ is representative of that of $\hat\theta$ (e.g. has the same mean)
• The standard & percentile methods assume homogeneous behavior of $\sigma_{\hat\theta}$, whatever $\theta$ is
  – (e.g. the standard deviation of $\hat\theta$ does not change with $\theta$)

A more flexible inference model
• Account for higher-order statistics of $\hat\theta^*$: mean, standard deviation, skewness

A more flexible inference model
• If $\hat\theta \sim N(\theta, \sigma^2)$ doesn't work for the data, maybe we could find a monotone transform $\phi \coloneqq m(\theta)$ and constants $z_0$ and $a$ for which we can accept that
  $\hat\phi \sim N(\phi - z_0\sigma_\phi,\ \sigma_\phi^2)$, with $\sigma_\phi \coloneqq 1 + a\phi$
• Additional unknowns:
  – $m(\cdot)$ allows a flexible parameter-description scale
  – $z_0$ allows bias: $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$
  – $a$ allows "$\sigma^2$" to change with $\phi$
• As we know, "more flexible" is not necessarily "better"
• Under broad conditions, in this case it is (TBD)

Where does this new model lead?
$\hat\phi \sim N(\phi - z_0\sigma_\phi,\ \sigma_\phi^2)$, with $\sigma_\phi \coloneqq 1 + a\phi$
• Assume $\hat\phi$ is known and $z_0 = 0$, and initially that $\phi = \phi_{\alpha,0} \coloneqq 0$, hence $\sigma_{\alpha,0} \coloneqq 1$
• Calculate a standard $\alpha$-confidence endpoint from this: with $z^{(\alpha)} \coloneqq \Phi^{-1}(\alpha)$,
  $\phi_{\alpha,1} \coloneqq \hat\phi + z^{(\alpha)}\sigma_{\alpha,0} = \hat\phi + z^{(\alpha)}$
• Now reexamine the actual standard deviation, this time assuming that $\phi = \phi_{\alpha,1}$. According to the model, it will be
  $\sigma_{\alpha,1} \coloneqq 1 + a\phi_{\alpha,1} = 1 + a(\hat\phi + z^{(\alpha)})$

Where does this new model lead?
$\hat\phi \sim N(\phi - z_0\sigma_\phi,\ \sigma_\phi^2)$, with $\sigma_\phi \coloneqq 1 + a\phi$
• Ok, but this leads to an updated endpoint
  $\phi_{\alpha,2} \coloneqq \hat\phi + z^{(\alpha)}\sigma_{\alpha,1} = \hat\phi + z^{(\alpha)}\bigl(1 + a(\hat\phi + z^{(\alpha)})\bigr)$
• Which leads to an updated $\sigma$:
  $\sigma_{\alpha,2} \coloneqq 1 + a\phi_{\alpha,2} = 1 + a\hat\phi + a z^{(\alpha)}\sigma_{\alpha,1}$
• If we continue iteratively to infinity this way, we end up with the confidence interval endpoint
  $\phi_{\alpha,\infty} = \dfrac{\hat\phi + z^{(\alpha)}}{1 - a z^{(\alpha)}}$

Where does this new model lead?
• Do this exercise considering $z_0 \neq 0$ and get
  $\phi_{\mathrm{lo},\infty} = \hat\phi + \sigma_{\hat\phi}\,\dfrac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})}$
• Similarly, with $z^{(1-\alpha)}$, get
  $\phi_{\mathrm{up},\infty} = \hat\phi + \sigma_{\hat\phi}\,\dfrac{z_0 + z^{(1-\alpha)}}{1 - a(z_0 + z^{(1-\alpha)})}$

Enter BCa
• "Bias-corrected and accelerated"
• Like the percentile confidence interval
  – Both ends are percentiles $\hat\theta^{*(\alpha_1)}$, $\hat\theta^{*(\alpha_2)}$ of the $B$ bootstrap instances of $\hat\theta^*$
  – Just not the simple $\alpha_1 \coloneqq \alpha$, $\alpha_2 \coloneqq 1 - \alpha$ of the percentile interval
• Instead, BCa uses
  $\alpha_1 \coloneqq \Phi\Bigl(z_0 + \dfrac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})}\Bigr)$,
  $\alpha_2 \coloneqq \Phi\Bigl(z_0 + \dfrac{z_0 + z^{(1-\alpha)}}{1 - a(z_0 + z^{(1-\alpha)})}\Bigr)$
• $z_0$ and $a$ are parameters we will estimate
  – When both are zero, we get the good-old percentile CI
• Notice we never had to explicitly find the transform $\phi \coloneqq m(\theta)$

BCa
• $z_0$ tackles bias, $\mathbb{P}(\hat\phi < \phi) = \Phi(z_0)$, and is estimated by
  $\hat z_0 \coloneqq \Phi^{-1}\Bigl(\dfrac{\#\{\hat\theta^*(b) < \hat\theta\}}{B}\Bigr)$ (since $m$ is monotone)
• $a$ accounts for a standard deviation of $\hat\phi$ which varies with $\phi$ (linearly, on the "normal scale" $\phi$)

BCa
• One suggested estimator for $a$ is via the jackknife:
  $\hat a \coloneqq \dfrac{\sum_{i=1}^{n}\bigl(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\bigr)^3}{6\bigl[\sum_{i=1}^{n}\bigl(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\bigr)^2\bigr]^{3/2}}$
  where $\hat\theta_{(i)} \coloneqq \hat\theta$ computed without sample $i$, and $\hat\theta_{(\cdot)} \coloneqq \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)}$
• You won't find the rationale behind this formula in the book (though it is clearly related to one of the standard ways to define skewness)

Theoretical advantages of BCa
• Transformation respecting
  – If the interval for $\theta$ is $(\hat\theta_{\mathrm{lo}}, \hat\theta_{\mathrm{up}})$, then the interval for a monotone $m(\theta)$ is $(m(\hat\theta_{\mathrm{lo}}), m(\hat\theta_{\mathrm{up}}))$
  – So no need to worry about finding transforms of $\theta$ where confidence intervals perform well
    • Which is necessary in practice with the bootstrap-t CI
    • And with the standard CI (e.g. the Fisher correlation-coefficient transform)
  – The percentile CI is also transformation respecting

Theoretical advantages of BCa
• Accuracy
  – We want $\hat\theta_{\mathrm{lo}}$ s.t.
  $\mathbb{P}(\theta < \hat\theta_{\mathrm{lo}}) = \alpha$
  – But a practical $\hat\theta_{\mathrm{lo}}$ is an approximation, where $\mathbb{P}(\theta < \hat\theta_{\mathrm{lo}}) \cong \alpha$
  – BCa (and bootstrap-t) endpoints are "second-order accurate":
    $\mathbb{P}(\theta < \hat\theta_{\mathrm{lo}}) = \alpha + O\bigl(\tfrac{1}{n}\bigr)$
  – This is in contrast to the standard and percentile methods, which only converge at rate $O\bigl(\tfrac{1}{\sqrt{n}}\bigr)$ ("first-order accurate"), with errors one order of magnitude greater

But BCa is expensive
• The use of direct bootstrapping to calculate delicate statistics such as $z_0$ and $a$ requires a large $B$ to work satisfactorily
• Fortunately, BCa can be analytically approximated (with a Taylor expansion, for differentiable $\hat\theta$) so that no Monte Carlo simulation is required
• This is the ABC method, which retains the good theoretical properties of BCa

The ABC method
• Only an introduction (Chapter 22)
• Discusses the "how", not the "why"
• For additional details see DiCiccio and Efron 1992 or 1996

The ABC method
• Given the estimator in resampling form $\hat\theta = T(P^*)$
  – Recall $P^*$, the "resampling vector", is an $n$-dimensional random variable with components $P_i^* \coloneqq \#\{x_j^* = x_i\}/n$
  – Recall $P^0 \coloneqq (\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n})$
• Second-order Taylor analysis of the estimate, as a function of the bootstrap resampling methodology:
  $\dot T_i \coloneqq \dfrac{\partial T(P)}{\partial P_i}\Big|_{P = P^0}$, $\ddot T_{ij} \coloneqq \dfrac{\partial^2 T(P)}{\partial P_i\,\partial P_j}\Big|_{P = P^0}$

The ABC method
• Can approximate all the BCa parameter estimates (i.e. estimate the parameters in a different way)
  – $\hat\sigma = \dfrac{1}{n}\bigl[\sum_{i=1}^{n}\dot T_i^2\bigr]^{1/2}$
  – $\hat a = \dfrac{1}{6}\,\dfrac{\sum_{i=1}^{n}\dot T_i^3}{\bigl[\sum_{i=1}^{n}\dot T_i^2\bigr]^{3/2}}$
  – $\hat z_0 = \hat a - (\hat b - \hat c_q)$, where
    • $\hat b \coloneqq \dfrac{\sum_{i=1}^{n}\ddot T_{ii}}{2n^2\hat\sigma}$, a bias term
    • $\hat c_q \coloneqq$ something akin to a Hessian component, but along a specific direction not perpendicular to any natural axis (the "least favorable family" direction)

The ABC method
• And the ABC interval endpoint:
  $\hat\theta_{\mathrm{ABC}}[1-\alpha] \coloneqq T(P^0 + \lambda\hat\eta)$
• Where
  – $\hat\eta \coloneqq$ the least-favorable-direction vector, with components proportional to $\dot T_i$
  – $\lambda \coloneqq \dfrac{w}{(1 - \hat a w)^2}$, with $w \coloneqq \hat z_0 + z^{(1-\alpha)}$
• Simple and to the point, ain't it?
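The BCa recipe above ($\hat z_0$ from the fraction of bootstrap replications below $\hat\theta$, $\hat a$ from the jackknife, and the corrected percentile levels $\alpha_1$, $\alpha_2$) can be sketched in a few lines of Python; the function name, defaults, and NumPy usage here are illustrative, not from the book:

```python
import numpy as np
from statistics import NormalDist

def bca_interval(data, stat, alpha=0.05, B=2000, seed=0):
    """Nonparametric BCa interval for stat(data), per Efron & Tibshirani Ch. 14."""
    rng = np.random.default_rng(seed)
    N = NormalDist()
    n = len(data)
    theta_hat = stat(data)

    # B bootstrap replications theta*(b)
    boot = np.array([stat(rng.choice(data, size=n, replace=True))
                     for _ in range(B)])

    # Bias correction: z0 = Phi^-1( #{theta*(b) < theta_hat} / B )
    # (assumes the fraction is strictly between 0 and 1; guard in real use)
    z0 = N.inv_cdf(np.mean(boot < theta_hat))

    # Acceleration a via the jackknife values theta_(i)
    jack = np.array([stat(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack                      # theta_(.) - theta_(i)
    a = np.sum(d**3) / (6.0 * np.sum(d**2) ** 1.5)

    # Corrected percentile levels alpha_1 and alpha_2
    z_lo, z_hi = N.inv_cdf(alpha), N.inv_cdf(1 - alpha)
    a1 = N.cdf(z0 + (z0 + z_lo) / (1 - a * (z0 + z_lo)))
    a2 = N.cdf(z0 + (z0 + z_hi) / (1 - a * (z0 + z_hi)))

    # Both ends are percentiles of the bootstrap distribution
    return np.quantile(boot, a1), np.quantile(boot, a2)
```

Setting $\hat z_0 = \hat a = 0$ in the two `N.cdf` lines recovers the plain percentile interval, which is the sense in which BCa is "like the percentile CI, just with corrected levels".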
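The ABC ingredients can likewise be illustrated numerically. Assuming the statistic is supplied in resampling form $T(P)$, the derivatives $\dot T_i$ can be approximated by finite differences stepping from $P^0$ toward each point mass, which yields $\hat\sigma$ and $\hat a$ with no Monte Carlo at all; this sketch (function name and step size are illustrative) uses the weighted mean, $T(P) = \sum_i P_i x_i$, as the test statistic:

```python
import numpy as np

def abc_ingredients(x, T, eps=1e-5):
    """Approximate the ABC building blocks for a statistic given in
    resampling form theta-hat = T(P) (Efron & Tibshirani, Ch. 22).

    T is assumed to map a weight vector P (nonnegative, summing to 1)
    to the weighted statistic. Returns (Tdot, sigma_hat, a_hat).
    """
    n = len(x)
    P0 = np.full(n, 1.0 / n)    # the central point P0 = (1/n, ..., 1/n)
    T0 = T(P0)

    # Finite-difference first derivatives T-dot_i: step from P0 toward
    # the point mass delta_i on observation i
    Tdot = np.empty(n)
    for i in range(n):
        delta = np.zeros(n)
        delta[i] = 1.0
        Tdot[i] = (T((1.0 - eps) * P0 + eps * delta) - T0) / eps

    # ABC estimates of the standard error and the acceleration
    sigma_hat = np.sqrt(np.sum(Tdot**2)) / n
    a_hat = np.sum(Tdot**3) / (6.0 * np.sum(Tdot**2) ** 1.5)
    return Tdot, sigma_hat, a_hat

# Example: for the mean, the influence components are exactly x_i - xbar
x = np.array([1.0, 2.0, 4.0, 8.0])
Tdot, sigma_hat, a_hat = abc_ingredients(x, lambda P: float(P @ x))
```

For the mean this reproduces the plug-in standard error $\hat\sigma = [\sum_i (x_i - \bar x)^2]^{1/2}/n$ exactly, since $T$ is linear in $P$; for a nonlinear, differentiable statistic the same finite differences stand in for the analytical derivatives.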