Report

FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality, Planning, and Standards Emissions Monitoring, and Analysis Division Project Objectives Estimation of annual average of PM2.5 concentration Estimation of standard errors associated with annual average estimates Estimation of the probability that a site’s annual average exceeds 15 mg/m3 At 2400 lattice points for 2000, 2001 Comparisons of 4 different methodologies: 1. Quarter-based analysis (Yan) 2. Annual-based analysis (Yan) Daily-based analyses: 3. “Doug’s method” (Bill) 4. Generalized least squares in SAS Proc Mixed (Amy) Why are Standard Errors Important? We may estimate that the annual average for lattice point 329 is 16 mg/m3, which exceeds the standard of 15. But since our estimate has some uncertainty or standard error, we’d like to take this uncertainty into account in order to determine the probability that lattice point 329 exceeds 15. In addition to maps like this ... …we also want maps like this. Note: This Map is WRONG--so don’t show it to anyone! We haven’t figured out the correct way to determine errors, so we cannot correctly draw a probability map yet. Map of 2400 Lattice Points Data Description Concentrations of PM2.5 measured during 2000, 2001 The domain analyzed: the portion of the U.S. east of –100o longitude Concentrations measured every third day Methods 3 & 4 - Daily-Based Used every third day data (122 days per year) Kriged each day to obtain predictions at 2400 lattice points At each lattice point fit a timeseries to the 122 days’ estimates to estimate annual average Calculated timeseries error for annual average (using proc arima) Method 4 - “Amy’s Method” Fit a quadratic surface using Generalized Least Squares in SAS Proc Mixed Restricted (or residual) Maximum Likelihood used to estimate all parameters Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure Specified an exponential covariance structure with a nugget Estimated each parameter each day Model for one day Yij = o + 1i + 2i2 + 3j + 4j2 + 5ij + ij Where i = lattitude j = longitude E(ij) = 0 Cov(ij, I’j’) = 2n + 2e-dist/ i=i’and j=j’ 2e-dist/ ii’ or j j’ Model for one site Yk = µ + (Yk-1- µ) + ek k = 1,…,122 Where E(ek) = 0 Var (ek) = 2 Note: this is an AR1 model. The errors are iid (0, 2) because the temporal correlation is accounted for using the (Yk-1- µ) term. What if we “propagate” errors? At a given lattice point we have 122 days’ worth of predictions, each with a kriging prediction error. What if we treat the 122 days as independent observations (they aren’t, they are AR1) and combine the errors accordingly? We do this for each of our 2400 lattice points. The Big Problem None of our standard error estimates are correct! We need to learn how to put spatial error components together with temporal error components. Model for all sites and days? Yijk = o,k + 1,ki + 2,ki2 + 3,kj + 4,kj2 + 5,kij + ijk + eijk Where E(ijk ) = 0, E(eijk) = 0 We’ve assumed isotropy and stationarity for simplicity. But how do we model Cov(ijk, i’j’k’), Cov(eijk, ei’j’k’), and Cov (ijk, ei’j’k’)? Separability We’ve been treating the covariance structure as separable--meaning that the 1-D temporal and 2-D spatial covariance structures can be estimated separately and then can be mathematically combined to obtain a 3-D space-time covariance structure. We need to test for separability, and if the covariance components are separable, we need to appropriately combine them. We are just now learning how to do this. Next steps…. Investigate the separability of the covariance structure and the correct method for combining space and time covariance components. Attempt a 3-dimensional kriging. No assumption of separability is required to do this. We must, however, write our own code for this project because there is no software package (to our knowledge) that performs such an analysis. This method would allow us to use even more data than we are using now, as we would not be restricted to every third day. Other next steps…. Try two methods Stefanski recommended. One method avoids the issue of separability by treating the kriging prediction errors as measurement errors on the timeseries “observations.” The other method…