### SPATIAL ANALYSIS OF PM2.5 DATA

```
FOUR METHODS OF ESTIMATING
PM2.5 ANNUAL AVERAGES
Yan Liu and Amy Nail
Department of Statistics
North Carolina State University
EPA
Office of Air Quality Planning and Standards
Emissions, Monitoring, and Analysis Division
Project Objectives
- Estimate the annual average PM2.5 concentration
- Estimate the standard error associated with each annual
  average estimate
- Estimate the probability that a site’s annual average
  exceeds 15 µg/m3
- Produce these estimates at 2400 lattice points for 2000 and 2001
- Compare 4 different methodologies:
  1. Quarter-based analysis (Yan)
  2. Annual-based analysis (Yan)
  Daily-based analyses:
  3. “Doug’s method” (Bill)
  4. Generalized least squares in SAS Proc Mixed (Amy)
Why are Standard Errors Important?

We may estimate that the annual average for lattice
point 329 is 16 µg/m3, which exceeds the standard of
15 µg/m3. But our estimate carries uncertainty, which
its standard error quantifies, so we’d like to take
that uncertainty into account when determining the
probability that lattice point 329 exceeds 15 µg/m3.
In addition to maps like this ... [map not reproduced]
… we also want maps like this. [probability map not reproduced]
Note: This map is WRONG, so don’t show it to anyone!
We haven’t figured out the correct way to determine the standard
errors, so we cannot correctly draw a probability map yet.
Map of 2400 Lattice Points
Data Description
- Concentrations of PM2.5 measured during 2000 and 2001
- Domain analyzed: the portion of the U.S. east of
  –100° longitude
- Concentrations measured every third day

Methods 3 & 4 - Daily-Based

- Used every-third-day data (122 days per year)
- Kriged each day to obtain predictions at 2400 lattice points
- At each lattice point, fit a time series to the 122 days’
  estimates to estimate the annual average
- Calculated the time-series error for the annual average
  (using proc arima)
Method 4 - “Amy’s Method”
- Fit a quadratic surface using Generalized Least Squares
  in SAS Proc Mixed
- Used Restricted (or residual) Maximum Likelihood to
  estimate all parameters
- Did not assume iid errors when fitting the quadratic
  surface, so its coefficients were estimated taking the
  covariance structure into account
- Specified an exponential covariance structure with a nugget
- Re-estimated every parameter for each day
Model for one day

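Y_{ij} = \beta_0 + \beta_1 i + \beta_2 i^2 + \beta_3 j + \beta_4 j^2 + \beta_5 ij + \varepsilon_{ij}
where i = latitude and j = longitude
- E(\varepsilon_{ij}) = 0
- Cov(\varepsilon_{ij}, \varepsilon_{i'j'}) =
      \sigma_n^2 + \sigma^2 e^{-dist/\theta}    if i = i' and j = j'
      \sigma^2 e^{-dist/\theta}                 if i \ne i' or j \ne j'
  where dist is the distance between the two sites, \sigma_n^2 is
  the nugget, and \theta is the range parameter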

Model for one site
Y_k = \mu + \phi (Y_{k-1} - \mu) + e_k,    k = 1, ..., 122
- where E(e_k) = 0
- Var(e_k) = \sigma^2
- Note: this is an AR(1) model. The errors are iid (0, \sigma^2)
  because the temporal correlation is accounted for by the
  \phi (Y_{k-1} - \mu) term.

What if we “propagate” errors?

At a given lattice point we have 122 days’ worth of
predictions, each with a kriging prediction error.
What if we treat the 122 days as independent
observations (they aren’t; they follow an AR(1)
process) and combine the errors accordingly? We do
this for each of our 2400 lattice points.
The Big Problem
- None of our standard error estimates are correct!
- We need to learn how to combine spatial error
  components with temporal error components.

Model for all sites and days?

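Y_{ijk} = \beta_{0,k} + \beta_{1,k} i + \beta_{2,k} i^2 + \beta_{3,k} j + \beta_{4,k} j^2
          + \beta_{5,k} ij + \varepsilon_{ijk} + e_{ijk}
- where E(\varepsilon_{ijk}) = 0 and E(e_{ijk}) = 0
- We’ve assumed isotropy and stationarity for simplicity.
- But how do we model Cov(\varepsilon_{ijk}, \varepsilon_{i'j'k'}),
  Cov(e_{ijk}, e_{i'j'k'}), and Cov(\varepsilon_{ijk}, e_{i'j'k'})?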

Separability

We’ve been treating the covariance structure
as separable, meaning that the 1-D temporal
and 2-D spatial covariance structures can be
estimated separately and then can be
mathematically combined to obtain a 3-D
space-time covariance structure. We need to
test for separability, and if the covariance
components are separable, we need to
appropriately combine them. We are just now
learning how to do this.
Next steps….


- Investigate the separability of the covariance
  structure and the correct method for combining
  space and time covariance components.
- Attempt a 3-dimensional kriging. No assumption
  of separability is required to do this. We must,
  however, write our own code for this project
  because, to our knowledge, no software package
  performs such an analysis. This method would also
  let us use more data than we use now, as we would
  not be restricted to every third day.
Other next steps….
- Try two methods Stefanski recommended.
- One method avoids the issue of separability by
  treating the kriging prediction errors as
  measurement errors on the time-series “observations.”
- The other method…

```
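
The "Why are Standard Errors Important?" slide asks for the probability that a lattice point's true annual average exceeds the standard, given an estimate and its standard error. The sketch below shows one way that calculation could look, assuming the estimate is approximately normally distributed around the true value. The deck does not state that assumption, and the function name and the standard error of 1.0 are hypothetical; the deck itself says correct standard errors are not yet available.

```python
from statistics import NormalDist


def exceedance_probability(estimate, std_error, threshold=15.0):
    """P(true annual average > threshold), assuming a normal approximation.

    estimate  -- estimated annual average (µg/m3)
    std_error -- standard error of that estimate (µg/m3), must be positive
    threshold -- the standard to compare against (15 µg/m3 in the deck)
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (threshold - estimate) / std_error
    return 1.0 - NormalDist().cdf(z)


# Lattice point 329 from the slide: estimate of 16 µg/m3.
# The standard error of 1.0 is a made-up value for illustration only.
p_329 = exceedance_probability(16.0, 1.0)
print(f"P(annual average > 15) = {p_329:.4f}")
```

Under this approximation a larger standard error pulls the probability toward 0.5, which is why an incorrect standard error makes the probability map incorrect as well.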
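
The "Model for one day" slide specifies an exponential covariance structure with a nugget. As a rough illustration, the sketch below builds that covariance matrix for a handful of sites. It is not the SAS Proc Mixed fit described in the deck: the site coordinates and parameter values are invented, the function name is hypothetical, and distance is taken as plain Euclidean distance in degrees, which is an assumption made only to keep the example short.

```python
import math


def covariance_matrix(points, sigma2, theta, nugget):
    """Exponential covariance with a nugget.

    Same site:       nugget + sigma2
    Different sites: sigma2 * exp(-dist / theta)
    """
    n = len(points)
    cov = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                # dist = 0, so exp(-dist/theta) = 1 and the nugget is added
                cov[a][b] = nugget + sigma2
            else:
                d = math.dist(points[a], points[b])  # assumed Euclidean, in degrees
                cov[a][b] = sigma2 * math.exp(-d / theta)
    return cov


# Hypothetical (latitude, longitude) sites and parameter values.
sites = [(35.0, -80.0), (35.0, -79.0), (36.0, -80.0)]
cov = covariance_matrix(sites, sigma2=2.0, theta=3.0, nugget=0.5)
for row in cov:
    print([round(v, 3) for v in row])
```

The nugget appears only on the diagonal, so the covariance drops discontinuously as soon as two sites are any distance apart, then decays at a rate set by the range parameter.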
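
The "What if we 'propagate' errors?" slide treats the 122 daily predictions as independent even though they follow an AR(1) process. The sketch below illustrates why that matters for the variance of an annual average. It compares the independence formula with the standard result for the mean of a stationary AR(1) series, Var(mean) = (γ0/n)[1 + 2 Σ (1 - h/n) φ^h], where γ0 = σ²/(1 - φ²) is the marginal variance. The values of φ and the marginal variance are hypothetical, and the sketch covers only the temporal component, not the combined space-time error the deck is still working out.

```python
def var_mean_independent(variances):
    """Variance of the average of independent estimates with the given variances."""
    n = len(variances)
    return sum(variances) / n ** 2


def var_mean_ar1(marginal_var, phi, n):
    """Variance of the mean of n consecutive values of a stationary AR(1) series.

    marginal_var -- variance of a single value (gamma_0)
    phi          -- AR(1) coefficient, must satisfy -1 < phi < 1
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must be strictly between -1 and 1")
    weighted = sum((1 - h / n) * phi ** h for h in range(1, n))
    return marginal_var / n * (1 + 2 * weighted)


n_days = 122        # every-third-day observations per year, from the deck
marginal_var = 4.0  # hypothetical value
phi = 0.5           # hypothetical value

naive = var_mean_independent([marginal_var] * n_days)
ar1 = var_mean_ar1(marginal_var, phi, n_days)
ratio = ar1 / naive
print(f"independent: {naive:.4f}  AR(1): {ar1:.4f}  ratio: {ratio:.2f}")
```

With positive autocorrelation the independence formula understates the variance of the annual average, so standard errors propagated that way would be too small.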