Principal Component Analysis

Philosophy of PCA



- Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables
- We typically have a data matrix of n observations on p correlated variables x1, x2, …, xp
- PCA looks for a transformation of the xi into p new variables yi that are uncorrelated
The data matrix
case   ht (x1)   wt (x2)   age (x3)   sbp (x4)   heart rate (x5)
1      175       1225      25         117        56
2      156       1050      31         122        63
…      …         …         …          …          …
n      202       1350      58         154        67
Reduce dimension
- The simplest way is to keep one variable and discard all the others: not reasonable!
- Weight all variables equally: not reasonable (unless they have the same variance)
- Weighted average based on some criterion.
- Which criterion?

Let us write it first

Looking for a transformation of the data matrix X (n × p) such that

Y = aᵀX = a1 X1 + a2 X2 + … + ap Xp

where a = (a1, a2, …, ap)ᵀ is a column vector of weights with

a1² + a2² + … + ap² = 1
One good criterion
- Maximize the variance of the projection of the observations on the Y variable
- Find a so that
  Var(aᵀX) = aᵀ Var(X) a
  is maximal
- The matrix C = Var(X) is the covariance matrix of the Xi variables
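A small numpy check of this identity, comparing two candidate unit-norm directions; the data and the directions are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two correlated variables
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)

C = np.cov(X, rowvar=False)            # C = Var(X), the covariance matrix

def proj_var(a):
    a = a / np.linalg.norm(a)          # enforce the unit-norm constraint
    return a @ C @ a                   # Var(a^T X) = a^T C a

print(proj_var(np.array([1.0, 0.0])))  # a "good" direction
print(proj_var(np.array([1.0, 1.0])))  # a "better" one: closer to the main axis of variation
```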
Let us see it on a figure

[Figure: the same data projected onto two candidate directions, labelled "Good" and "Better"]
Covariance matrix

    | v(x1)     c(x1,x2)  ...  c(x1,xp) |
C = | c(x1,x2)  v(x2)     ...  c(x2,xp) |
    | ...       ...       ...  ...      |
    | c(x1,xp)  c(x2,xp)  ...  v(xp)    |
And so… we find that
- The direction of a is given by the eigenvector a1 corresponding to the largest eigenvalue of the matrix C
- The second vector, orthogonal (uncorrelated) to the first, that has the second highest variance turns out to be the eigenvector corresponding to the second largest eigenvalue
- And so on …
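A minimal numpy sketch of this result on a made-up data matrix: the eigenvectors of C, ordered by decreasing eigenvalue, give the successive maximum-variance directions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data with a known (made-up) covariance structure
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 2.0, 0.5],
                             [2.0, 3.0, 0.3],
                             [0.5, 0.3, 1.0]], size=1000)

C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)       # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a1 = eigvecs[:, 0]                         # direction of maximum variance
print(a1 @ C @ a1, eigvals[0])             # variance along a1 equals the largest eigenvalue

# No random unit vector should beat a1
for _ in range(1000):
    v = rng.normal(size=3); v /= np.linalg.norm(v)
    assert v @ C @ v <= eigvals[0] + 1e-9
```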

So PCA gives
- New variables Yi that are linear combinations of the original variables xi:
  Yi = ai1 x1 + ai2 x2 + … + aip xp ;  i = 1..p
- The new variables Yi are derived in decreasing order of importance;
- they are called 'principal components'
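A short sketch with scikit-learn (assumed available; the data are made up) showing the principal components coming out in decreasing order of variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 2.0, 0.5],
                             [2.0, 3.0, 0.3],
                             [0.5, 0.3, 1.0]], size=500)

pca = PCA()                      # keep all p components
Y = pca.fit_transform(X)         # columns of Y are the principal components Yi

print(pca.components_)           # rows are the weight vectors (ai1, ..., aip)
print(Y.var(axis=0, ddof=1))     # variances come out in decreasing order
```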

Calculating eigenvalues and eigenvectors
- The eigenvalues λi are found by solving the equation
  det(C − λI) = 0
- Eigenvectors are the columns of the matrix A such that
  C = A D Aᵀ
  where
      | λ1  0   ...  0  |
  D = | 0   λ2  ...  0  |
      | ...             |
      | 0   ...      λp |
An example

Let us take two variables with covariance c > 0

C = | 1  c |
    | c  1 |

C − λI = | 1−λ   c  |
         |  c   1−λ |

det(C − λI) = (1 − λ)² − c²

Solving this we find λ1 = 1 + c and λ2 = 1 − c < λ1,
and the eigenvectors:

Any eigenvector A satisfies the condition

C A = λ A

A = | a1 |     C A = | 1  c | | a1 |  =  | a1 + c·a2 |  =  λ | a1 |
    | a2 |           | c  1 | | a2 |     | c·a1 + a2 |       | a2 |

Solving (with a1² + a2² = 1) we find

A1 = (1/√2) (1, 1)ᵀ   and   A2 = (1/√2) (1, −1)ᵀ
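A quick numerical check of this example, and of the general relations det(C − λI) = 0 and C = A D Aᵀ from the previous slide, for an arbitrarily chosen c = 0.6:

```python
import numpy as np

c = 0.6
C = np.array([[1.0, c],
              [c, 1.0]])

eigvals, A = np.linalg.eigh(C)            # eigenvalues in increasing order: [1 - c, 1 + c]
print(eigvals)                            # [0.4, 1.6]
print(A)                                  # columns ~ (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to sign

# Each eigenvalue solves det(C - lambda*I) = 0
for lam in eigvals:
    print(np.linalg.det(C - lam * np.eye(2)))   # ~0 up to rounding error

# And the decomposition C = A D A^T holds
D = np.diag(eigvals)
print(np.allclose(C, A @ D @ A.T))              # True
```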
PCA is sensitive to scale
- If you multiply one variable by a scalar you get different results (can you show it? see the sketch below)
- This is because PCA uses the covariance matrix (and not the correlation matrix)
- PCA should be applied to data that have approximately the same scale in each variable
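A small demonstration of this sensitivity on a made-up two-variable dataset: rescaling one column changes the leading eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=500)

def first_pc(data):
    C = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    return eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

X_scaled = X.copy()
X_scaled[:, 0] *= 100.0                    # e.g. measure variable 1 in different units

print(first_pc(X))                         # roughly (0.71, 0.71)
print(first_pc(X_scaled))                  # now almost entirely along variable 1
```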

Interpretation of PCA

- The new variables (PCs) have a variance equal to their corresponding eigenvalue:
  Var(Yi) = λi for all i = 1…p
- Small λi means small variance: the data change little in the direction of component Yi
- The relative variance explained by each PC is given by λi / (λ1 + λ2 + … + λp)
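A short numerical check, on made-up data, that the variances of the PC scores equal the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(4)
# Made-up data with the same toy covariance used in the earlier sketches
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 2.0, 0.5],
                             [2.0, 3.0, 0.3],
                             [0.5, 0.3, 1.0]], size=2000)

C = np.cov(X, rowvar=False)
eigvals, A = np.linalg.eigh(C)
eigvals, A = eigvals[::-1], A[:, ::-1]        # decreasing order of eigenvalue

Y = (X - X.mean(axis=0)) @ A                  # PC scores Yi
print(Y.var(axis=0, ddof=1))                  # matches eigvals: Var(Yi) = lambda_i
print(eigvals / eigvals.sum())                # relative variance explained by each PC
```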

How many components to keep?
- Keep enough PCs to have a cumulative variance explained by the PCs that is > 50-70%
- Kaiser criterion: keep PCs with eigenvalues > 1
- Scree plot: represents the ability of the PCs to explain the variation in the data

Do it graphically
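A sketch of the three rules; matplotlib is assumed available and the eigenvalues below are invented for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical eigenvalues of a correlation matrix, sorted in decreasing order
eigvals = np.array([2.8, 1.3, 0.5, 0.3, 0.1])

explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

n_cumulative = np.argmax(cumulative > 0.70) + 1      # smallest k with > 70% explained
n_kaiser = np.sum(eigvals > 1)                       # Kaiser criterion
print(n_cumulative, n_kaiser)

# Scree plot: eigenvalue against component number
plt.plot(np.arange(1, len(eigvals) + 1), eigvals, "o-")
plt.xlabel("Component")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```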
Interpretation of components
- Look at the weights of the variables in each component
- If Y1 = 0.89 X1 + 0.15 X2 − 0.77 X3 + 0.51 X4
- then X1 and X3 have the highest weights and so are the most important variables in the first PC
- Also look at the correlation between the variables Xi and the PCs: circle of correlation

Circle of correlation
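A sketch of how the correlations plotted in a correlation circle could be computed, again on made-up data (scikit-learn assumed):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 2.0, 0.5],
                             [2.0, 3.0, 0.3],
                             [0.5, 0.3, 1.0]], size=500)

Y = PCA().fit_transform(X)          # PC scores Y1, ..., Yp

# Correlation of each original variable Xj with each PC Yi:
# these are the coordinates of variable j in the correlation circle (usually PC1 vs PC2)
p = X.shape[1]
corr = np.array([[np.corrcoef(X[:, j], Y[:, i])[0, 1] for i in range(p)]
                 for j in range(p)])
print(np.round(corr, 2))            # rows: variables, columns: PCs
```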
Normalized (standardized) PCA
- If the variables have very heterogeneous variances, we standardize them
- The standardized variables Xi*:
  Xi* = (Xi − mean) / standard deviation
- The new variables all have the same variance (1), so each variable has the same weight.
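A minimal sketch of standardized PCA, which amounts to PCA on the correlation matrix; the two variables with very different scales are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(6)
height_cm = rng.normal(170, 10, size=300)
weight_g = rng.normal(70000, 12000, size=300)      # deliberately on a much larger scale
X = np.column_stack([height_cm, weight_g])

# Standardize: subtract the mean, divide by the standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(X_std.var(axis=0, ddof=1))                   # [1, 1]

# Covariance of the standardized data is the correlation matrix of the original data
print(np.allclose(np.cov(X_std, rowvar=False), np.corrcoef(X, rowvar=False)))

eigvals, A = np.linalg.eigh(np.cov(X_std, rowvar=False))
print(eigvals[::-1])                               # eigenvalues of the normalized PCA
```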
Application of PCA in Genomics
- PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low variance features
- Analysis of expression data
- Analysis of metabolomics data (Ward et al., 2003)

However
- PCA is only powerful if the biological question is related to the highest variance in the dataset
- If not, other techniques are more useful: Independent Component Analysis (ICA)
- Introduced by Jutten in 1987

What is ICA?
[Figures: what ICA looks like, the idea behind ICA, and how it works]
Rationale of ICA
- Find the components Si that are as independent as possible, in the sense of maximizing some function F(s1, s2, …, sk) that measures independence
- All ICs (except at most one) should be non-Normal
- The variance of all ICs is 1
- There is no hierarchy between the ICs

How to find ICs ?
- Many choices of objective function F
- Mutual information:
  MI = ∫ f(s1, s2, …, sk) log [ f(s1, s2, …, sk) / (f1(s1) f2(s2) … fk(sk)) ] ds1 … dsk
- We use the kurtosis of the variables to approximate the distribution function
- The number of ICs is chosen by the user
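A hedged sketch of extracting ICs with scikit-learn's FastICA (one of several algorithms); the sources, the mixing matrix, and the fun='cube' choice, used here as a kurtosis-style contrast, are all assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA
from scipy.stats import kurtosis

rng = np.random.default_rng(7)
n = 2000
# Two hypothetical non-Normal sources: a uniform signal and a spiky (super-Gaussian) one
s1 = rng.uniform(-1, 1, n)
s2 = rng.laplace(0, 1, n)
S = np.column_stack([s1, s2])

# Observed data = unknown linear mixture of the sources
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T

ica = FastICA(n_components=2, fun="cube", random_state=0)   # 'cube' ~ kurtosis-based contrast
S_est = ica.fit_transform(X)          # estimated ICs (unit variance, no natural ordering)

print(S_est.std(axis=0))              # ~1 for each IC
print(kurtosis(S_est, axis=0))        # non-zero excess kurtosis -> non-Normal components
```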

Difference with PCA
- It is not a dimensionality reduction technique
- There is no single (exact) solution for the components; different algorithms are used (in R: FastICA, PearsonICA, MLICA)
- ICs are of course uncorrelated, but also as independent as possible
- Uninteresting for Normally distributed variables

Example: Lee and Batzoglou (2003)
- Microarray expression data on 7070 genes in 59 normal human tissue samples (19 types)
- We are not interested in reducing dimension but rather in looking for genes that show a tissue-specific expression profile (what makes tissue types different)

PCA vs ICA
- Hsiao et al. (2002) applied PCA and by visual inspection observed three gene clusters of 425 genes: liver-specific, brain-specific and muscle-specific
- ICA identified more tissue-specific genes than PCA

