Report

Chapter 15: Classification of Time-Embedded EEG Using Short-Time Principal Component Analysis
by Nguyen Duc Thang, 5/2009

Outline
Part one
- Introduction
- Principal Component Analysis (PCA)
- Signal Fraction Analysis (SFA)
- EEG signal representation
- Short-time PCA
Part two
- Classifier
- Experimental setups, results, and analysis

Outline
- Overview of the previous presentations
- Common Spatial Patterns (CSP)
- Classifiers
- Experimental setups and results
- Analysis, discussion, and conclusions

Architecture of an EEG-based BCI system
- Feature extraction: PCA, SFA, short-time PCA
- Classification: LDA, SVM

The shortcomings of conventional PCA
- [Figure: data and its projection line]
- Not good for a large number of samples

Short-time PCA approach
- Apply PCA on short durations
- Extract short-time PCA features
- [Figure: time-embedded features (D × h window) → PCA → n basis vectors (D × n) → stack → short-time PCA features (Dn × 1)]

The role of Singular Value Decomposition (SVD) in PCA
- Using SVD, we can compute the eigenvectors w of the covariance matrix C_x, which are the directions of maximum variance:
  \[ \max_w \; w^T C_x w \]
- [Figure: principal directions w_1 and w_2]

Generalized SVD
- Using GSVD, we can find a generalized eigenvector w that maximizes the variance when data A is projected onto w and minimizes the variance when data B is projected onto w
- Maximize the variance of A:
  \[ \max_w \; w^T A_x w \]
- Minimize the variance of B:
  \[ \min_w \; w^T B_y w \]

Common Spatial Pattern for 2 classes
- Choose m eigenvectors that maximize the variance of class A and minimize the variance of class B
- Choose m eigenvectors that maximize the variance of class B and minimize the variance of class A
- The basis vectors W consist of these 2m eigenvectors in total
- Example: distinguishing left-hand movement from right-hand movement

Common Spatial Pattern for n classes (combined with a classifier)
- The n-class problem is converted into n(n-1)/2 two-class CSP problems
- Example with four classes A, B, C, D: (AB), (BC), (CD), (DA), (AC), (BD)
- New trials are assigned to the class that receives the most votes from the two-class classifiers

Linear Discriminant Analysis (LDA)
- LDA is a simple classification approach in which the samples of each class are modeled by a Gaussian distribution with a class mean and a shared covariance matrix:
  \[ \mu_k = \frac{1}{N_k} \sum_{x \in C_k} x \]
  \[ \Sigma = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^T \]
  \[ P(x \mid \text{Class } k) \propto e^{-\frac{1}{2} (x - \mu_k)^T \Sigma^{-1} (x - \mu_k)} \]

Linear discriminant boundary
- The boundary between classes i and j is where the two class likelihoods are equal:
  \[ P(x \mid \text{Class } i) = P(x \mid \text{Class } j) \]
  \[ e^{-\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i)} = e^{-\frac{1}{2} (x - \mu_j)^T \Sigma^{-1} (x - \mu_j)} \]
  \[ x^T \Sigma^{-1} \mu_i - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i = x^T \Sigma^{-1} \mu_j - \frac{1}{2} \mu_j^T \Sigma^{-1} \mu_j \]
- Boundary:
  \[ \delta_{ij}(x) = x^T \Sigma^{-1} (\mu_i - \mu_j) - \frac{1}{2} \left( \mu_i^T \Sigma^{-1} \mu_i - \mu_j^T \Sigma^{-1} \mu_j \right) = 0 \]

Linear discriminant boundary
- [Figure: pairwise boundaries \delta_{12}, \delta_{13}, \delta_{23} between three classes]

The parameters of EEG representations
- Time-embedded sample built from r EEG channels (rows) and l + 1 consecutive samples per channel (columns), giving r(l + 1) dimensions:
  \[ x(t) = \begin{bmatrix} x_1(t) & \cdots & x_1(t+l) \\ \vdots & & \vdots \\ x_r(t) & \cdots & x_r(t+l) \end{bmatrix} \]
- l: the number of lags
- W = [w_1, w_2, …, w_f, …, w_{f+m}, …]: choose m basis vectors, where f is the index of the first chosen basis vector
- s: the window size, measured in time-embedded feature samples

Cross-Validation Training Procedure
- The training process
  - The training trials are randomly partitioned into 80% for constructing the classifier and 20% for evaluating it
  - This partition and evaluation process is repeated five times
  - The set of parameters with the best validation performance is chosen
- The testing process
  - Apply the learned parameters to the test trials

Experiment 1: Five-task dataset
- The subjects perform five mental tasks: (1) resting, (2) mental letter writing, (3) mental multiplication of two multi-digit numbers, (4) visual counting, and (5) visual rotation
- Each task is repeated for five trials
- 6 electrodes are used: C3, C4, P3, P4, O1, O2
- Each trial is recorded for 10 s at 250 Hz

Learning parameters
- [Figure or table]

Confusion matrix for short-time PCA representation averaged over test trials
- [Figure or table]

Visualize the classification results
- Given a set of samples X = {x_1, x_2, …, x_n} that belong to K classes and have dimension D > 3, how can X be visualized?
- For each class, apply K-means clustering to find N cluster points (cluster centers).
- This gives a total of K × N points.
- Use Multidimensional Scaling (MDS) to map the points from D dimensions to d ≤ 3 dimensions, preserving the distances between points.

Visualize the classification results
- [Figure]

Three-task dataset
- Three subjects perform 3 tasks: imagined left-hand movement, imagined right-hand movement, and word generation
- The subjects perform a given task for 15 s, then switch to another task at the operator's request
- There are three training datasets and one test dataset
- EEG signals are recorded at 512 Hz using 32 electrodes

Short-time PCA procedure for three-task dataset
- Data are bandpass-filtered to 8-30 Hz
- Data are down-sampled to 128 Hz
- The best parameters from the learning process:

  Subject   Number of lags   First vector   Number of vectors
  1         2                1              5
  2         2                1              4
  3         3                1              5

The other methods
- S. Sun et al.: remove 7 electrodes; bandpass filter to 8-13 Hz (subjects 1-2) and 11-15 Hz (subject 3); multiclass CSP for feature extraction and SVM for classification
- Schlögl et al.: down-sample to 128 Hz; extract all bipolar channels (496) plus 32 monopolar channels; for each channel, extract AR features (order = 3) and band power in the α and β bands; LDA is used as the classifier

The other methods (cont.)
- Arbabi et al.: down-sample to 128 Hz; filter to 0.5-45 Hz; use some statistical features and a Bayesian classifier
- Salehi: uses all raw data; features are PSD and some statistical time-domain features (not specified); Bayesian classifier

Comparison results
- [Figure or table]

Visualize the classification results
- [Figure]

Improve the classifier performance by smoothing
- Many incorrect classifications appear as single, isolated samples
- If n consecutive samples are assumed to share the same class, they can all be assigned the majority class
- With smoothing, the accuracy improves from 78.7% to 82.7% (five-task dataset)

Analyze the parameters of EEG representation
- Number of lags: 2-3
- Window size: 125
- The first basis vector should be early in the order
- Number of basis vectors: 20
- Subtracting the mean has only minor effects

Analyze the importance of electrodes
- The weights of the discriminant functions are grouped by the electrode they correspond to
- The variances of the weights grouped in this way are plotted
- The parietal electrodes are the most important for mental task discrimination

Conclusion
- This chapter describes a new approach to extracting features from EEG signals using short-time PCA
- On the five-task dataset, combining short-time PCA with a simple LDA classifier achieves 80% accuracy
- On the three-task dataset, this approach places second among the five compared methods
- Some analysis of the system parameters and of the roles of the electrodes is also given
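
Sketch: short-time PCA features
The "Short time PCA approach" and "The parameters of EEG representations" slides can be illustrated with a minimal NumPy sketch. It is not the chapter's code. The function names `time_embed` and `short_time_pca` are hypothetical, and the forward lags x(t), …, x(t+l), the non-overlapping windows, the 0-based index of the first vector, and the sign normalization of the basis vectors are all assumptions made for this example. The input is synthetic noise, not EEG.

```python
import numpy as np

def time_embed(eeg, lags):
    """Stack each channel with its next `lags` samples (assumed forward lags).

    eeg: array of shape (r, T), r channels and T time samples.
    Returns an array of shape (r * (lags + 1), T - lags).
    """
    r, T = eeg.shape
    n = T - lags
    # Row order: x_1(t), ..., x_1(t+l), ..., x_r(t), ..., x_r(t+l)
    rows = [eeg[c, k:k + n] for c in range(r) for k in range(lags + 1)]
    return np.stack(rows)

def short_time_pca(embedded, window, first, m, subtract_mean=True):
    """Return one stacked feature vector of length D * m per window.

    embedded: array of shape (D, n) of time-embedded samples.
    window:   window size s, in samples (non-overlapping windows assumed).
    first:    0-based index f of the first basis vector to keep.
    m:        number of basis vectors to keep.
    """
    D, n = embedded.shape
    if first + m > min(D, window):
        raise ValueError("not enough basis vectors for the requested first/m")
    feats = []
    for start in range(0, n - window + 1, window):
        seg = embedded[:, start:start + window]
        if subtract_mean:
            seg = seg - seg.mean(axis=1, keepdims=True)
        # Left singular vectors are the PCA basis vectors of this window.
        U, _, _ = np.linalg.svd(seg, full_matrices=False)
        basis = U[:, first:first + m]
        # Assumption: remove the SVD sign ambiguity by making the
        # largest-magnitude entry of every basis vector positive.
        idx = np.abs(basis).argmax(axis=0)
        signs = np.sign(basis[idx, np.arange(m)])
        basis = basis * signs
        # Stack the m vectors of dimension D into one vector of length D * m.
        feats.append(basis.T.reshape(-1))
    return np.array(feats)

# Synthetic stand-in for one trial: 6 channels, 10 s at 250 Hz.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((6, 2500))
embedded = time_embed(eeg, lags=2)
features = short_time_pca(embedded, window=125, first=0, m=5)
print(embedded.shape, features.shape)  # (18, 2498) (19, 90)
```

Each row of `features` would then be one sample for a classifier such as LDA.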
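
Sketch: smoothing by majority vote
The "Improve the classifier performance by smoothing" slide can be illustrated with a small example. The slide does not say whether the n samples form fixed blocks or a sliding window, so this sketch assumes fixed, non-overlapping blocks. The function name `smooth_labels` is hypothetical and the label sequence is made up for illustration.

```python
from collections import Counter

def smooth_labels(labels, n):
    """Replace every label by the majority label of its block of n samples.

    Assumption: blocks are consecutive and non-overlapping; the last block
    may be shorter. Ties go to the label seen first in the block.
    """
    smoothed = []
    for start in range(0, len(labels), n):
        block = labels[start:start + n]
        majority = Counter(block).most_common(1)[0][0]
        smoothed.extend([majority] * len(block))
    return smoothed

# Isolated errors (the 3 and the 4) are removed by the vote.
predicted = [1, 1, 3, 1, 1, 2, 2, 2, 4, 2]
print(smooth_labels(predicted, n=5))  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

A larger n removes more isolated errors but delays the response when the subject switches tasks, so n is a trade-off to tune on validation data.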