### lecture3

```
PCA: Lecture 3
Extensions of PCA and Related Tools
• Extended EOF (EEOF), Singular spectrum analysis (SSA), M-SSA
• Canonical Correlation Analysis (CCA)
• Others
• Complex EOFs
• Maximum Covariance Analysis
• Principal Oscillation Patterns (POP)
• Independent Component Analysis (ICA)
Singular Spectrum Analysis (SSA) or Extended EOF (EEOF)
• PCA makes use of correlation in SPACE
• Weather and climate data (and other geoscience data) usually have high correlation in
space.
• PCA is a useful tool to learn about large scale patterns that explain most of the
variability.
• Since PCs find the combination of variables which explain most of the variability it is
implied that PCs make use of the usually observed high correlation in space.
• But geoscience data are often correlated in TIME
• PCA does not take this into account
• Auto and cross-correlation in time can be very useful for prediction purposes and also
for building probabilistic time series models.
• SSA/EEOFs are used to handle temporal correlation
• EEOFs are an extension of the traditional EOF technique to deal not only with spatial but also with temporal correlations observed in (weather/climate) data
• it is based on the auto-covariance matrix (instead of the usual spatial covariance matrix
from PCA)
• normally used to find propagating or periodic signals in the data
Extended EOF (EEOF)
Implementation for the univariate case
• consider a single time series: x_t, t = 1, … , n
• like PCA, eigenvectors and eigenvalues are extracted from the covariance matrix
• The covariance matrix is calculated using a delay window or imposing an
embedding dimension of length M on the time series
x_1, x_2, x_3, x_4, \dots, x_{n-3}, x_{n-2}, x_{n-1}, x_n
Lagged vectors (illustrated for M = 3):
x^{(1)} = (x_1, x_2, x_3)^T
x^{(2)} = (x_2, x_3, x_4)^T
\dots
x^{(n-3)} = (x_{n-3}, x_{n-2}, x_{n-1})^T
x^{(n-2)} = (x_{n-2}, x_{n-1}, x_n)^T
Singular Spectrum Analysis (SSA)
• Terminology
• SSA is the application of PCA to time series
• also known as EEOFs and Time PCs (T-PCs or T-EOFs)
• when applied to multivariate data (many time series) it is known as multichannel singular spectrum analysis (M-SSA)
• Summary of what it does
• application of PCA to a time series that is structured into overlapping moving
windows of data
• the data vectors are fragments of time series rather than spatial distributions
of values at a single time
• the eigenvectors therefore represent characteristic time patterns, rather than
characteristic spatial patterns
• used mainly to identify oscillatory features in the time series
Singular Spectrum Analysis (SSA)
Example application: searching for the sub-seasonal oscillations in the Tropical Pacific
From Hannachi et al., Int. J. Clim., 2007
Singular Spectrum Analysis (SSA)
Applying PCA and then SSA gives:
First PC/EOF is the seasonal cycle
From Hannachi et al., Int. J. Clim., 2007
Singular Spectrum Analysis (SSA)
EPCs 4 and 5: semi-annual variation in OLR (outgoing longwave radiation)
EEOF/SSA can detect oscillatory or quasi-oscillatory features in the time series
- as a pair of (degenerate) T-PCs
- with the same shape but offset by ¼ cycle
- compare with Fourier analysis and pairs of sine, cosine functions
EPCs 8 and 9: Madden-Julian Oscillation (MJO), an eastward-propagating wave of tropical
convective anomalies (the dominant mode of intraseasonal tropical variability)
From Hannachi et al., Int. J. Clim., 2007
Canonical Correlation Analysis (CCA)
• Definition of CCA
• identifies a sequence of pairs of patterns in 2 multivariate data sets, and constructs
sets of transformed variables by projecting the original data onto these patterns
• Difference between PCA and CCA
• PCA looks for patterns within a single multivariate dataset that represent maximum
amounts of the variation in the data
• In CCA, the patterns are chosen such that the projected data onto these patterns
exhibit maximum correlation – while being uncorrelated with the projections onto any
other pattern
• In other words: CCA identifies new variables that maximize the inter-relationships
between two data sets, in contrast to the patterns describing the internal variability within
a single dataset from PCA.
• Can be thought of as an extension to multiple regression
• instead of predicting a scalar y, we are predicting a vector y
Canonical Correlation Analysis (CCA)
• Applications
• In the atmospheric sciences, CCA has been used in diagnostic climatological studies,
in forecasting El Niño, and in long-range forecasts of temperature and precipitation.
• Example for a geophysical field:
• vector x containing observations of one variable at a set of locations
• vector y containing observations of a different variable at a set of locations that may be
the same or different to those in x.
• typically the data are time series of the observations of the two fields
• x and y could be observed at the same time (coupled variability)
• x and y could be lagged in time (statistical prediction)
Canonical Correlation Analysis (CCA)
How to do it:
• CCA extracts relationships between pairs of data vectors x and y from
their joint covariance matrix
• Remember: PCA is applied to the covariance matrix of x only
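1) Concatenate x and y into a single vector, c^T = [x^T, y^T]
2) Partition the covariance matrix of c, S_c, into four blocks:
[S_c] = \frac{1}{n-1} [C]^T [C] = \begin{bmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{bmatrix}
where [C] is the data matrix whose n rows are the mean-removed observations of c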
3) Transform the data, x and y, into sets of new variables (canonical variates),
v and w:
v = a^T x
w = b^T y
where a and b are linear weights (like eigenvectors) called canonical
vectors
Canonical Correlation Analysis (CCA)
• Some things to note:
• the number of pairs of canonical variates is M = min(dim(x), dim(y))
• a and b are chosen such that
• corr[v_1, w_1] >= corr[v_2, w_2] >= … >= corr[v_M, w_M] >= 0 (each of the M pairs of
canonical variates exhibits no greater correlation than the previous pair)
• corr[v_k, w_m] = r_C(m) for k = m; corr[v_k, w_m] = 0 for k != m, where r_C(m) is the
m-th canonical correlation (each canonical variate is uncorrelated with all other
variates except its twin in the m-th pair)
• Calculation of canonical vectors and variates
• eigen decomposition to get two sets of eigenvectors, e_m and f_m
• and shared eigenvalues \lambda_m; r_C(m) = \sqrt{\lambda_m}
• also can be done using SVD
• Combining CCA and PCA
• sometimes it is worth performing PCA on the two fields x and y and then CCA on the
Canonical Correlation Analysis (CCA)
• A simple example
• consider two normally distributed 2-D variables x and y with unit variance
• let y_1 + y_2 = x_1 + x_2
• the correlation matrix between x and y is:
R_{xy} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}
• which is relatively weak despite the perfect linear relationship between x and y
• If we apply CCA:
• the largest and only canonical correlation is 1
• and this lies along the direction of the linear relationship
• if we project the data onto the canonical vectors, then the correlation matrix is
R_{xy} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
Canonical Correlation Analysis (CCA)
Example application: Prediction of Wildfire in the Western U.S.
• Seasonal wildfire forecasts based on spring PDSI (Palmer Drought Severity Index)
• Use CCA to form linear relationships between
PCs of seasonal acres burned (field 1) and PDSI
(field 2)
• Find optimally correlated patterns in the area
burned and preceding soil moisture.
• A linear forecast model was constructed using
the first three canonical correlation pairs (CCs)
calculated for the six area burned and six PDSI
PCs.
• BUT Longer lead time forecasts needed
• Previously forecasts were based on March/April
PDSI data but policy decisions must be made
many months before the fire season.
• So use CCA to form relationships between previous year’s Pacific SSTs and Jan PDSI
[Figure: Prediction of area burned for 2003 fire season]
From “Westerling et al., 2003, Statistical Forecasts of the 2003 Western Wildfire Season
Using Canonical Correlation Analysis”
Other Extensions and Some Relatives
• Complex-EOF
To extend the EOF analysis to the study of spatial structures that can propagate in time, one
can perform a complex principal component analysis in the frequency domain.
• Maximum Covariance Analysis (MCA)
Finds linear combinations of two sets of vector data, x and y, that maximize their covariance
(CCA maximizes their correlation).
• Independent Component Analysis (ICA)
ICA seeks directions along which the projected data are as statistically independent as
possible, i.e. directions that minimize the mutual information between the projections.
• Principal Oscillation Patterns (POP)
POPs are used to examine the oscillation properties and spatial structure of dynamical
processes in the atmosphere
```
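
The univariate SSA/EEOF procedure described in the lecture (embed the series with a delay window of length M, then extract eigenvectors from the resulting covariance matrix) can be sketched in NumPy as follows. This is a minimal illustration, not the lecturer's implementation: the function name `ssa`, the choice to remove only the overall series mean, and the synthetic sine-plus-noise test series are all assumptions made for this example.

```python
import numpy as np

def ssa(x, M):
    """Sketch of univariate SSA: PCA applied to overlapping windows of length M."""
    x = np.asarray(x, dtype=float)
    # Each row is one lagged vector (x_i, ..., x_{i+M-1}); there are n - M + 1 rows.
    X = np.lib.stride_tricks.sliding_window_view(x - x.mean(), M)
    K = X.shape[0]
    # M x M lag-covariance matrix estimated from the windows.
    C = X.T @ X / (K - 1)
    evals, evecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]       # reorder to descending variance
    evals, evecs = evals[order], evecs[:, order]
    pcs = X @ evecs                       # temporal PCs (T-PCs), one column per mode
    return evals, evecs, pcs

# Hypothetical test series: a period-20 oscillation plus weak white noise.
rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 20) + 0.1 * rng.standard_normal(t.size)

evals, teofs, tpcs = ssa(series, M=40)
print("fraction of variance in leading pair:", (evals[0] + evals[1]) / evals.sum())
```

For an oscillatory series like this one, the two leading eigenvalues should come out nearly equal, and the corresponding T-EOFs should look like the same sinusoid offset by a quarter cycle, which is the degenerate-pair behaviour the slides describe.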
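
The CCA steps from the lecture (partition the joint covariance matrix, then find canonical vectors a and b, optionally via SVD) can be sketched as below and applied to the simple two-variable example. This is one common formulation under stated assumptions: the canonical correlations are taken as the singular values of S_xx^{-1/2} S_xy S_yy^{-1/2}, the helper names `inv_sqrt` and `cca` are hypothetical, and the synthetic data (independent x_1, x_2 plus an independent noise term z) are just one way to satisfy y_1 + y_2 = x_1 + x_2 with unit variances.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y):
    """Sketch of CCA. X is (n, p), Y is (n, q); rows are observations."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Blocks of the joint covariance matrix S_c.
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, r, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    A = Wx @ U       # columns are canonical vectors a_m
    B = Wy @ Vt.T    # columns are canonical vectors b_m
    return r, A, B

# Assumed construction of the example: unit-variance x and y with y1 + y2 = x1 + x2.
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal((n, 2))
z = np.sqrt(0.5) * rng.standard_normal(n)
s = x.sum(axis=1) / 2
y = np.column_stack([s + z, s - z])

r, A, B = cca(x, y)
print("corr(x1, y1):", np.corrcoef(x[:, 0], y[:, 0])[0, 1])
print("canonical correlations:", r)
```

With this construction the individual cross-correlations are close to 0.5, while the leading canonical correlation is 1 because the canonical vectors pick out the sums x_1 + x_2 and y_1 + y_2. The second canonical correlation is near zero here only because of how z was chosen.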
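
To make the contrast between MCA and CCA concrete: a common way to carry out MCA is an SVD of the cross-covariance matrix S_xy itself, without the whitening by S_xx and S_yy that CCA uses. The sketch below assumes that formulation; the function name `mca` and the random test data are hypothetical and only serve to show that the leading singular value equals the covariance of the leading pair of projections.

```python
import numpy as np

def mca(X, Y):
    """Sketch of MCA: SVD of the cross-covariance matrix between X (n, p) and Y (n, q)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc / (n - 1)
    # Left/right singular vectors are the paired patterns for x and y;
    # singular values are the covariances between the projected data.
    U, s, Vt = np.linalg.svd(Sxy, full_matrices=False)
    return s, U, Vt.T

# Hypothetical data: y depends linearly on part of x, plus noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.2, 1.0]]) + rng.standard_normal((200, 2))

s, U, V = mca(X, Y)
a = (X - X.mean(axis=0)) @ U[:, 0]   # projection of x onto its leading pattern
b = (Y - Y.mean(axis=0)) @ V[:, 0]   # projection of y onto its leading pattern
print("leading singular value:", s[0])
print("covariance of leading projections:", np.cov(a, b)[0, 1])
```

Because the patterns here are orthonormal singular vectors rather than whitened canonical vectors, fields with large variance can dominate the MCA patterns, whereas CCA normalizes that variance away.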