18.338 Course Project
Numerical Methods for Empirical Covariance Matrix Analysis
Miriam Huntley
SEAS, Harvard University
May 15, 2013

“When it comes to RMT in the real world, we know close to nothing.”
- Prof. Alan Edelman, last week
[Diagram labels: Real World Data; RMT]

Who Cares about Covariance Matrices?
• Basic assumption in many areas of data analysis: multivariate data $X = Y \Sigma^{1/2}$
• You get $X$, want to find $\Sigma$
• $X^T X$ can be a very bad estimator if $p/n$ is finite
• Current standard using PCA (= SVD): distinguish from the null model $X = YI$
• In RMT language: any eigenvalues which lie very far away from the distribution expected for a white Wishart matrix should be considered signal

Who Cares about Covariance Matrices?
[Figure: Gene Expression Data; axes: Genes, Samples]
Data from: Alizadeh A, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511.

Why adventure beyond white Wishart?
• The null model $X = YI$ is not particularly sophisticated. Can we do better?
• Noise with structure. Example: financial data, $x_i^t = \sigma\,\xi_i$. What if there is no right edge?
• Known $\Sigma$: how many samples do we need before we recover it from empirical data?

Approach: General MP Law
• Data matrix $X_{n \times p}$, where $X = Y_{n \times p}\,\Sigma_{p \times p}^{1/2}$, and define $\gamma = p/n$
• $Y$ entries are iid (real or complex) with $E(Y_{i,j}) = 0$ and $E(|Y_{i,j}|^2) = 1$
• Let $H_p$ be the spectral distribution of $\Sigma_p$ and assume $H_p$ converges weakly to $H_\infty$
• Let $F_p$ be the spectral distribution of $XX^T$ (empirical) and $v_{F_p}$ its Stieltjes transform
• Then $v_{F_p} \to v_\infty$, where
  $$-\frac{1}{v_\infty(z)} = z - \gamma \int \frac{\lambda \, dH_\infty(\lambda)}{1 + \lambda v_\infty(z)}, \quad \forall z \in \mathbb{C}^+$$
See: Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. J. Multivariate Anal. 54 (2), 175–192. El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36, 2757–2790.

Numerical Solutions of General MP
[Diagram boxes:]
Single, True Covariance Matrix
True Covariance Matrix Spectral Distribution
$$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \quad \forall z \in \mathbb{C}^+$$
Discretize in $z$
Numerically Solve
Empirical Spectral Distribution
Live Demos…

Inverse Solutions of General MP?
[Diagram boxes:]
Single, True Covariance Matrix
True Covariance Matrix Spectral Distribution
$$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \quad \forall z \in \mathbb{C}^+$$
Discretize in $z$
Numerically Solve
Empirical Spectral Distribution

Toy Example: Block Covariance Matrix
?
Warning: Don’t try this at home

Toy Example: Block Covariance Matrix

Thanks! This was fun.
• Colwell, L. J., Qin, Y., Manta, A. and Brenner, M. P. (2013). Signal identification from sample covariance matrices with correlated noise. Under review.
• El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36, 2757–2790.
• Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72, 507–536.
• Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. J. Multivariate Anal. 54 (2), 175–192.
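
Appendix sketch: the "Discretize in z, Numerically Solve" step from the Numerical Solutions of General MP slide could look roughly like the Python below. It is a minimal sketch under stated assumptions, not the code behind the live demos. It assumes the true covariance spectrum H is a finite mixture of point masses with distinct positive eigenvalues, that the empirical matrices are normalized by 1/n, and that the equation has a single solution in the upper half plane for each z there (the setting of the Silverstein and Bai result). The function names `companion_stieltjes` and `sample_density`, the polynomial-root approach, and the smoothing offset `eta` are all hypothetical choices made for illustration.

```python
import numpy as np


def companion_stieltjes(z, lam, w, gamma):
    """Root v(z) in C+ of the general MP equation for H = sum_k w_k * delta(lam_k).

    Assumes distinct positive lam_k, weights w_k summing to 1, and Im z > 0.
    """
    lam = np.asarray(lam, dtype=float)
    w = np.asarray(w, dtype=float)
    factors = [np.array([l, 1.0]) for l in lam]  # each factor is (lam_k * v + 1)

    # Clearing denominators in  -1/v = z - gamma * sum_k w_k lam_k / (1 + lam_k v)
    # gives the polynomial equation
    #   (z v + 1) prod_k (1 + lam_k v)
    #     - gamma * v * sum_k w_k lam_k prod_{j != k} (1 + lam_j v) = 0.
    poly = np.array([z, 1.0], dtype=complex)
    for f in factors:
        poly = np.polymul(poly, f)
    for k in range(len(lam)):
        others = np.array([1.0, 0.0])  # the extra factor v
        for j, f in enumerate(factors):
            if j != k:
                others = np.polymul(others, f)
        poly = np.polysub(poly, gamma * w[k] * lam[k] * others)

    roots = np.roots(poly)
    return roots[np.argmax(roots.imag)]  # keep the root with the largest Im v


def sample_density(x, lam, w, gamma, eta=1e-3):
    """Approximate density of the p x p sample covariance spectrum on the grid x."""
    x = np.asarray(x, dtype=float)
    dens = np.empty_like(x)
    for i, xi in enumerate(x):
        z = xi + 1j * eta                      # discretize z just above the real axis
        v = companion_stieltjes(z, lam, w, gamma)
        m = (v + (1.0 - gamma) / z) / gamma    # from v = -(1 - gamma)/z + gamma * m
        dens[i] = m.imag / np.pi               # Stieltjes inversion, smoothed by eta
    return dens


# Usage: half the true eigenvalues at 1, half at 4, with gamma = p/n = 0.3.
grid = np.linspace(0.05, 9.0, 400)
density = sample_density(grid, lam=[1.0, 4.0], w=[0.5, 0.5], gamma=0.3)
```

With a single atom at 1 (identity covariance) the equation reduces to a quadratic, and the output should match the classical Marčenko-Pastur density away from the edges. Smaller values of `eta` sharpen the edges of the computed density but make the two conjugate roots harder to tell apart numerically.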