Chapter 4: Properties of the Least Squares Estimators

Outline: least-squares estimators; the calculus; unconstrained optimization (one variable); unconstrained optimization (two variables); derivation; properties (we are here); inference.

Key concepts: parameters; estimates; bias; sampling variability; best linear unbiased estimation; simulation (MATLAB).

Assumptions of the Simple Linear Regression Model

SR1. $y_t = \beta_1 + \beta_2 x_t + e_t$
SR2. $E(e_t) = 0$, so that $E(y_t) = \beta_1 + \beta_2 x_t$
SR3. $\operatorname{var}(e_t) = \sigma^2 = \operatorname{var}(y_t)$
SR4. $\operatorname{cov}(e_i, e_j) = \operatorname{cov}(y_i, y_j) = 0$
SR5. $x_t$ is not random and takes at least two values
SR6. $e_t \sim N(0, \sigma^2)$, equivalently $y_t \sim N(\beta_1 + \beta_2 x_t,\; \sigma^2)$ (optional)

4.1 The Least Squares Estimators as Random Variables

The least squares estimator $b_2$ of the slope parameter $\beta_2$, based on a sample of T observations, is

$b_2 = \dfrac{T \sum x_t y_t - \sum x_t \sum y_t}{T \sum x_t^2 - \left(\sum x_t\right)^2}$   (3.3.8a)

The least squares estimator $b_1$ of the intercept parameter $\beta_1$ is

$b_1 = \bar{y} - b_2 \bar{x}$   (3.3.8b)

where $\bar{y} = \sum y_t / T$ and $\bar{x} = \sum x_t / T$ are the sample means of the observations on y and x, respectively.

When the formulas for $b_1$ and $b_2$ are taken to be rules that are used whatever the sample data turn out to be, then $b_1$ and $b_2$ are random variables; in this context we call them the least squares estimators. When actual sample values (numbers) are substituted into the formulas, we obtain numbers that are values of random variables; in this context we call $b_1$ and $b_2$ the least squares estimates.
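The distinction between estimators (rules applied to whatever sample occurs) and estimates (the numbers obtained from one sample) can be illustrated in code. The sketch below is in Python (the laboratory sessions use MATLAB); the "true" parameter values and the x values are invented purely for illustration and are not the food expenditure data.

```python
import random

random.seed(1)

# Assumed "true" parameters of an illustrative population (not the textbook's).
beta1, beta2, sigma = 100.0, 0.1, 30.0
T = 40
x = [300.0 + 10.0 * t for t in range(T)]   # fixed, non-random regressor (SR5)

def least_squares(x, y):
    """Least squares estimates from equations (3.3.8a) and (3.3.8b)."""
    T = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xt * yt for xt, yt in zip(x, y))
    sx2 = sum(xt * xt for xt in x)
    b2 = (T * sxy - sx * sy) / (T * sx2 - sx * sx)   # slope, eq. (3.3.8a)
    b1 = sy / T - b2 * sx / T                        # intercept, eq. (3.3.8b)
    return b1, b2

# The same rule applied to different samples gives different numbers:
# b1 and b2 are random variables (estimators); each realized pair is an estimate.
for n in range(10):
    y = [beta1 + beta2 * xt + random.gauss(0.0, sigma) for xt in x]
    b1, b2 = least_squares(x, y)
    print(n + 1, round(b1, 4), round(b2, 4))
```

Each pass through the loop mimics drawing a fresh sample of size T = 40 from the same population, in the spirit of the repeated-sampling table of the next section.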
4.2 The Sampling Properties of the Least Squares Estimators

4.2.1 The Expected Values of b1 and b2

We begin by rewriting the formula in equation (3.3.8a) in a form that is more convenient for theoretical purposes,

$b_2 = \beta_2 + \sum w_t e_t$   (4.2.1)

where $w_t$ is a constant (non-random) given by

$w_t = \dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}$   (4.2.2)

The expected value of a sum is the sum of the expected values (see Chapter 2.5.1):

$E(b_2) = E\left(\beta_2 + \sum w_t e_t\right) = E(\beta_2) + \sum E(w_t e_t) = \beta_2 + \sum w_t E(e_t) = \beta_2$   [since $E(e_t) = 0$]

4.2.1a The Repeated Sampling Context

Table 4.1 contains least squares estimates of the food expenditure model from 10 random samples of size T = 40 from the same population.

Table 4.1 Least Squares Estimates from 10 Random Samples of Size T = 40

n      b1        b2
1    51.1314   0.1442
2    61.2045   0.1286
3    40.7882   0.1417
4    80.1396   0.0886
5    31.0110   0.1669
6    54.3099   0.1086
7    69.6749   0.1003
8    71.1541   0.1009
9    18.8290   0.1758
10   36.1433   0.1626

4.2.1b Derivation of Equation 4.2.1

Note first that

$\sum (x_t - \bar{x})^2 = \sum (x_t^2 - 2\bar{x}x_t + \bar{x}^2) = \sum x_t^2 - 2\bar{x}\sum x_t + T\bar{x}^2 = \sum x_t^2 - 2T\bar{x}^2 + T\bar{x}^2 = \sum x_t^2 - T\bar{x}^2 = \sum x_t^2 - \dfrac{\left(\sum x_t\right)^2}{T}$   (4.2.4b)

To obtain this result we have used the fact that $\bar{x} = \sum x_t / T$, so $\sum x_t = T\bar{x}$. Similarly,

$\sum (x_t - \bar{x})(y_t - \bar{y}) = \sum x_t y_t - T\bar{x}\bar{y} = \sum x_t y_t - \dfrac{\sum x_t \sum y_t}{T}$   (4.2.5)

$b_2$ in deviation-from-the-mean form is:

$b_2 = \dfrac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2}$   (4.2.6)

Recall that

$\sum (x_t - \bar{x}) = 0$   (4.2.7)

Then the formula for $b_2$ becomes

$b_2 = \dfrac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2} = \dfrac{\sum (x_t - \bar{x}) y_t - \bar{y}\sum (x_t - \bar{x})}{\sum (x_t - \bar{x})^2} = \dfrac{\sum (x_t - \bar{x}) y_t}{\sum (x_t - \bar{x})^2} = \sum w_t y_t$   (4.2.8)

where $w_t$ is the constant given in equation (4.2.2). To obtain equation (4.2.1), replace $y_t$ by $y_t = \beta_1 + \beta_2 x_t + e_t$ and simplify:

$b_2 = \sum w_t y_t = \sum w_t (\beta_1 + \beta_2 x_t + e_t) = \beta_1 \sum w_t + \beta_2 \sum w_t x_t + \sum w_t e_t$   (4.2.9a)

Since $\sum w_t = 0$, this eliminates the term $\beta_1 \sum w_t$; since $\sum w_t x_t = 1$, we have $\beta_2 \sum w_t x_t = \beta_2$, and (4.2.9a) simplifies to equation (4.2.1):

$b_2 = \beta_2 + \sum w_t e_t$   (4.2.9b)

The term $\sum w_t = 0$ because

$\sum w_t = \sum \dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2} = \dfrac{1}{\sum (x_t - \bar{x})^2} \sum (x_t - \bar{x}) = 0$, using $\sum (x_t - \bar{x}) = 0$.

To show that $\sum w_t x_t = 1$ we again use $\sum (x_t - \bar{x}) = 0$.
Another expression for $\sum (x_t - \bar{x})^2$ is

$\sum (x_t - \bar{x})^2 = \sum (x_t - \bar{x})(x_t - \bar{x}) = \sum (x_t - \bar{x}) x_t - \bar{x} \sum (x_t - \bar{x}) = \sum (x_t - \bar{x}) x_t$

Consequently

$\sum w_t x_t = \dfrac{\sum (x_t - \bar{x}) x_t}{\sum (x_t - \bar{x})^2} = \dfrac{\sum (x_t - \bar{x}) x_t}{\sum (x_t - \bar{x}) x_t} = 1$

4.2.2 The Variances and Covariance of b1 and b2

$\operatorname{var}(b_2) = E[b_2 - E(b_2)]^2$

If the regression model assumptions SR1-SR5 are correct (SR6 is not required), then the variances and covariance of $b_1$ and $b_2$ are:

$\operatorname{var}(b_1) = \sigma^2 \dfrac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}$
$\operatorname{var}(b_2) = \dfrac{\sigma^2}{\sum (x_t - \bar{x})^2}$
$\operatorname{cov}(b_1, b_2) = \sigma^2 \dfrac{-\bar{x}}{\sum (x_t - \bar{x})^2}$   (4.2.10)

Let us consider the factors that affect the variances and covariance in equation (4.2.10).

1. The variance of the random error term, $\sigma^2$, appears in each of the expressions.
2. The sum of squares of the values of x about their sample mean, $\sum (x_t - \bar{x})^2$, appears in each of the variances and in the covariance.
3. The larger the sample size T, the smaller the variances and covariance of the least squares estimators; it is better to have more sample data than less.
4. The term $\sum x_t^2$ appears in var(b1).
5. The sample mean of the x-values appears in cov(b1, b2).

Deriving the variance of $b_2$: the starting point is equation (4.2.1).

$\operatorname{var}(b_2) = \operatorname{var}\left(\beta_2 + \sum w_t e_t\right) = \operatorname{var}\left(\sum w_t e_t\right)$   [since $\beta_2$ is a constant]
$= \sum w_t^2 \operatorname{var}(e_t)$   [using $\operatorname{cov}(e_i, e_j) = 0$]
$= \sigma^2 \sum w_t^2$   [using $\operatorname{var}(e_t) = \sigma^2$]
$= \dfrac{\sigma^2}{\sum (x_t - \bar{x})^2}$   (4.2.11)

The very last step uses the fact that

$\sum w_t^2 = \sum \left[\dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}\right]^2 = \dfrac{\sum (x_t - \bar{x})^2}{\left[\sum (x_t - \bar{x})^2\right]^2} = \dfrac{1}{\sum (x_t - \bar{x})^2}$   (4.2.12)

4.2.3 Linear Estimators

The least squares estimator $b_2$ is a weighted sum of the observations $y_t$, $b_2 = \sum w_t y_t$. Estimators like $b_2$, which are linear combinations of an observable random variable, are called linear estimators.

4.3 The Gauss-Markov Theorem

Gauss-Markov Theorem: Under the assumptions SR1-SR5 of the linear regression model, the estimators $b_1$ and $b_2$ have the smallest variance of all linear and unbiased estimators of $\beta_1$ and $\beta_2$. They are the Best Linear Unbiased Estimators (BLUE) of $\beta_1$ and $\beta_2$.

1. The estimators $b_1$ and $b_2$ are "best" when compared to similar estimators, those that are linear and unbiased. The Theorem does not say that $b_1$ and $b_2$ are the best of all possible estimators.
2. The estimators $b_1$ and $b_2$ are best within their class because they have the minimum variance.
3. In order for the Gauss-Markov Theorem to hold, the assumptions SR1-SR5 must be true. If any of these assumptions is not true, then $b_1$ and $b_2$ are not the best linear unbiased estimators of $\beta_1$ and $\beta_2$.
4. The Gauss-Markov Theorem does not depend on the assumption of normality (SR6).
5. In the simple linear regression model, if we want to use a linear and unbiased estimator, then we need search no further.
6. The Gauss-Markov Theorem applies to the least squares estimators. It does not apply to the least squares estimates from a single sample.

Proof of the Gauss-Markov Theorem:

Let $b_2^* = \sum k_t y_t$ (where the $k_t$ are constants) be any other linear estimator of $\beta_2$. Suppose that $k_t = w_t + c_t$, where $c_t$ is another constant and $w_t$ is given in equation (4.2.2). Into this new estimator substitute $y_t$ and simplify, using the properties of $w_t$ in equation (4.2.9):

$b_2^* = \sum k_t y_t = \sum (w_t + c_t) y_t = \sum (w_t + c_t)(\beta_1 + \beta_2 x_t + e_t)$
$= \beta_1 \sum (w_t + c_t) + \beta_2 \sum (w_t + c_t) x_t + \sum (w_t + c_t) e_t$
$= \beta_1 \sum w_t + \beta_1 \sum c_t + \beta_2 \sum w_t x_t + \beta_2 \sum c_t x_t + \sum (w_t + c_t) e_t$
$= \beta_1 \sum c_t + \beta_2 + \beta_2 \sum c_t x_t + \sum (w_t + c_t) e_t$   (4.3.1)

since $\sum w_t = 0$ and $\sum w_t x_t = 1$.
Hence:

$E(b_2^*) = \beta_1 \sum c_t + \beta_2 + \beta_2 \sum c_t x_t + \sum (w_t + c_t) E(e_t) = \beta_1 \sum c_t + \beta_2 + \beta_2 \sum c_t x_t$   (4.3.2)

In order for the linear estimator $b_2^* = \sum k_t y_t$ to be unbiased, it must be true that

$\sum c_t = 0$ and $\sum c_t x_t = 0$   (4.3.3)

These conditions must hold in order for $b_2^* = \sum k_t y_t$ to be in the class of linear and unbiased estimators. So we will assume the conditions (4.3.3) hold and use them to simplify expression (4.3.1):

$b_2^* = \sum k_t y_t = \beta_2 + \sum (w_t + c_t) e_t$   (4.3.4)

We can now find the variance of the linear unbiased estimator $b_2^*$, following the steps in equation (4.2.11) and using the additional fact that

$\sum c_t w_t = \sum \dfrac{c_t (x_t - \bar{x})}{\sum (x_t - \bar{x})^2} = \dfrac{1}{\sum (x_t - \bar{x})^2} \sum c_t x_t - \dfrac{\bar{x}}{\sum (x_t - \bar{x})^2} \sum c_t = 0$

Use the properties of variance to obtain:

$\operatorname{var}(b_2^*) = \operatorname{var}\left(\beta_2 + \sum (w_t + c_t) e_t\right) = \sum (w_t + c_t)^2 \operatorname{var}(e_t) = \sigma^2 \sum (w_t + c_t)^2 = \sigma^2 \sum w_t^2 + \sigma^2 \sum c_t^2$
$= \operatorname{var}(b_2) + \sigma^2 \sum c_t^2 \ge \operatorname{var}(b_2)$, since $\sum c_t^2 \ge 0$.

4.4 The Probability Distribution of the Least Squares Estimators

If we make the normality assumption (assumption SR6 about the error term), then the least squares estimators are normally distributed:

$b_1 \sim N\!\left(\beta_1,\; \dfrac{\sigma^2 \sum x_t^2}{T \sum (x_t - \bar{x})^2}\right)$
$b_2 \sim N\!\left(\beta_2,\; \dfrac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$   (4.4.1)

If assumptions SR1-SR5 hold, and if the sample size T is sufficiently large, then the least squares estimators have a distribution that approximates the normal distributions shown in equation (4.4.1).

4.5 Estimating the Variance of the Error Term

The variance of the random variable $e_t$ is

$\operatorname{var}(e_t) = \sigma^2 = E[e_t - E(e_t)]^2 = E(e_t^2)$   (4.5.1)

if the assumption $E(e_t) = 0$ is correct. Since the "expectation" is an average value, we might consider estimating $\sigma^2$ as the average of the squared errors,

$\hat{\sigma}^2 = \dfrac{\sum e_t^2}{T}$

But we don't observe the errors, only the residuals.
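Before taking up the residuals, the Gauss-Markov inequality just derived, var(b2*) = var(b2) + σ²Σc²t, can be checked numerically. The sketch below (Python; x values and error variance invented for illustration) builds one arbitrary set of constants c_t satisfying the unbiasedness conditions (4.3.3) by taking least squares residuals of an arbitrary sequence on x, which guarantees Σc_t = 0 and Σc_t x_t = 0, and then compares the exact variances σ²Σk²t and σ²Σw²t.

```python
import random

random.seed(7)

T = 20
x = [float(xt) for xt in range(1, T + 1)]    # fixed regressor values (illustrative)
xbar = sum(x) / T
sxx = sum((xt - xbar) ** 2 for xt in x)
w = [(xt - xbar) / sxx for xt in x]          # least squares weights, eq. (4.2.2)

# Build c_t satisfying the unbiasedness conditions (4.3.3): start from
# arbitrary numbers z_t and keep only their least squares residuals on x.
# Residuals of a regression with an intercept sum to zero and are
# orthogonal to the regressor, so sum(c) = 0 and sum(c*x) = 0 hold exactly.
z = [random.gauss(0.0, 1.0) for _ in range(T)]
zbar = sum(z) / T
b = sum((xt - xbar) * zt for xt, zt in zip(x, z)) / sxx
a = zbar - b * xbar
c = [zt - a - b * xt for xt, zt in zip(x, z)]
c = [ct * 1e-3 for ct in c]                  # any rescaling still satisfies (4.3.3)

k = [wt + ct for wt, ct in zip(w, c)]        # an alternative linear unbiased estimator

sigma2 = 4.0                                 # assumed error variance (illustrative)
var_b2 = sigma2 * sum(wt ** 2 for wt in w)   # = sigma^2 / sum (x_t - xbar)^2
var_b2_star = sigma2 * sum(kt ** 2 for kt in k)

# var(b2*) exceeds var(b2) by sigma^2 * sum(c_t^2) > 0, as the theorem says.
print(var_b2, var_b2_star)
```

Any other choice of z produces a different competing estimator, but the conclusion is the same: the least squares weights w_t minimize the variance within the linear unbiased class.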
The least squares residuals are obtained by replacing the unknown parameters by their least squares estimators,

$\hat{e}_t = y_t - b_1 - b_2 x_t$

This leads to:

$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_t^2}{T}$   (4.5.3)

There is a simple modification that produces an unbiased estimator, and that is

$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_t^2}{T - 2}, \qquad E(\hat{\sigma}^2) = \sigma^2$   (4.5.4)

4.5.1 Estimating the Variances and Covariances of the Least Squares Estimators

Replace the unknown error variance $\sigma^2$ in equation (4.2.10) by its estimator $\hat{\sigma}^2$ to obtain:

$\widehat{\operatorname{var}}(b_1) = \hat{\sigma}^2 \dfrac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}$,  $\widehat{\operatorname{var}}(b_2) = \dfrac{\hat{\sigma}^2}{\sum (x_t - \bar{x})^2}$,  $\widehat{\operatorname{cov}}(b_1, b_2) = \hat{\sigma}^2 \dfrac{-\bar{x}}{\sum (x_t - \bar{x})^2}$

$\operatorname{se}(b_1) = \sqrt{\widehat{\operatorname{var}}(b_1)}$,  $\operatorname{se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)}$   (4.6.6)

4.6.2 The Estimated Variances and Covariances for the Food Expenditure Example

Table 4.2 Least Squares Residuals for Food Expenditure Data (first 5 observations)

y        $\hat{y} = b_1 + b_2 x$    $\hat{e} = y - \hat{y}$
52.25     73.9045                  -21.6545
58.32     84.7834                  -26.4634
81.79     95.2902                  -13.5002
119.90   100.7424                   19.1576
125.80   102.7181                   23.0819

$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_t^2}{T - 2} = \dfrac{54311.3315}{38} = 1429.2456$

With $\sum x_t^2 = 21020623$, $\sum (x_t - \bar{x})^2 = 1532463$, and $\bar{x} = 698$:

$\widehat{\operatorname{var}}(b_1) = \hat{\sigma}^2 \dfrac{\sum x_t^2}{T \sum (x_t - \bar{x})^2} = 1429.2456 \times \dfrac{21020623}{40(1532463)} = 490.1200$

$\operatorname{se}(b_1) = \sqrt{490.1200} = 22.1387$

$\widehat{\operatorname{var}}(b_2) = \dfrac{\hat{\sigma}^2}{\sum (x_t - \bar{x})^2} = \dfrac{1429.2456}{1532463} = 0.0009326$

$\operatorname{se}(b_2) = \sqrt{0.0009326} = 0.0305$

$\widehat{\operatorname{cov}}(b_1, b_2) = \hat{\sigma}^2 \dfrac{-\bar{x}}{\sum (x_t - \bar{x})^2} = 1429.2456 \times \dfrac{-698}{1532463} = -0.6510$

On the issue of understanding sampling variability, there are two important exercises. Some experiments with MATLAB will follow in the laboratory. There we will use the least-squares estimator for the model $y_1, y_2, \ldots, y_N \sim$ iid $N(\mu, \sigma^2)$. Hence, prove, using the Gauss-Markov theorem, that the statistic $\hat{\mu} = \bar{y} = \sum y_i / N$ is BLUE.
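The arithmetic in the food expenditure example above can be reproduced directly from the reported summary quantities, without the raw data. A short Python check (the laboratory work itself uses MATLAB):

```python
import math

# Summary statistics reported for the food expenditure sample (T = 40).
T = 40
sse = 54311.3315          # sum of squared least squares residuals
sum_x2 = 21020623.0       # sum of x_t^2
sxx = 1532463.0           # sum of (x_t - xbar)^2
xbar = 698.0

sigma2_hat = sse / (T - 2)                      # unbiased estimator, eq. (4.5.4)
var_b1_hat = sigma2_hat * sum_x2 / (T * sxx)    # eq. (4.2.10), sigma^2 replaced
var_b2_hat = sigma2_hat / sxx
cov_hat = -sigma2_hat * xbar / sxx

se_b1 = math.sqrt(var_b1_hat)
se_b2 = math.sqrt(var_b2_hat)

print(round(sigma2_hat, 4))   # 1429.2456
print(round(se_b1, 4))        # 22.1387
print(round(se_b2, 4))        # 0.0305
print(round(cov_hat, 4))      # -0.651
```

The printed values match the ones worked out by hand in the example, including the negative covariance between b1 and b2.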