### Experiment Results and Analysis

*Intelligent Control and Automation, 2008 (WCICA 2008)*
• NMF considers factorizations of the form:

$$X \approx ZH$$

where $X \in \mathbb{R}_{+}^{F \times N}$, $Z \in \mathbb{R}_{+}^{F \times M}$, $H \in \mathbb{R}_{+}^{M \times N}$, and $M \ll \min(F, N)$.
• To measure the cost of the decomposition, one popular approach is to use the Kullback-Leibler (KL) divergence metric; the cost of factorizing $X$ into $ZH$ is evaluated as:

$$D(X \,\|\, ZH) = \sum_{i=1}^{F} \sum_{j=1}^{N} \left( x_{i,j} \ln \frac{x_{i,j}}{\sum_{k=1}^{M} z_{i,k} h_{k,j}} + \sum_{k=1}^{M} z_{i,k} h_{k,j} - x_{i,j} \right)$$
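As a quick numerical check, the KL cost above can be computed directly. This is a minimal illustrative sketch (the function name, the `eps` smoothing term, and the random matrices are this example's choices, not the paper's):

```python
import numpy as np

def kl_divergence(X, Z, H, eps=1e-12):
    # D(X || ZH) = sum_ij [ x_ij * ln(x_ij / (ZH)_ij) + (ZH)_ij - x_ij ]
    V = Z @ H
    return float(np.sum(X * np.log((X + eps) / (V + eps)) + V - X))

rng = np.random.default_rng(0)
X = rng.random((6, 5)) + 0.1
Z = rng.random((6, 2)) + 0.1
H = rng.random((2, 5)) + 0.1

cost = kl_divergence(X, Z, H)   # non-negative scalar
```

Each elementwise term $x\ln(x/v) + v - x$ is non-negative, so the cost is zero exactly when $ZH$ reproduces $X$.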
• Using the Expectation Maximization (EM) algorithm and an appropriately designed auxiliary function, it has been shown in "Algorithms for non-negative matrix factorization" that the update rule at the $t$-th iteration for $h_{k,j}^{(t)}$ is given by:

$$h_{k,j}^{(t)} = h_{k,j}^{(t-1)} \, \frac{\displaystyle\sum_{i} z_{i,k}^{(t-1)} \frac{x_{i,j}}{\sum_{l} z_{i,l}^{(t-1)} h_{l,j}^{(t-1)}}}{\displaystyle\sum_{i} z_{i,k}^{(t-1)}}$$
• while for $z_{i,k}^{(t)}$ the update rule is given by:

$$z_{i,k}'^{(t)} = z_{i,k}^{(t-1)} \, \frac{\displaystyle\sum_{j} h_{k,j}^{(t)} \frac{x_{i,j}}{\sum_{l} z_{i,l}^{(t-1)} h_{l,j}^{(t)}}}{\displaystyle\sum_{j} h_{k,j}^{(t)}}$$
• Finally, the basis images matrix $Z$ is normalized so that the elements of each column vector sum up to one:

$$z_{i,k}^{(t)} = \frac{z_{i,k}'^{(t)}}{\sum_{l} z_{l,k}'^{(t)}}$$
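The two multiplicative updates plus the column normalization of $Z$ can be sketched as follows. This is an illustrative implementation, not the paper's code; in this sketch the normalization also rescales the rows of $H$ so that the product $ZH$ is preserved, which is one common convention:

```python
import numpy as np

def nmf_kl(X, M, iters=100, seed=0):
    """KL-divergence NMF with multiplicative updates and
    basis columns normalized to sum to one."""
    rng = np.random.default_rng(seed)
    F, N = X.shape
    Z = rng.random((F, M)) + 0.1
    H = rng.random((M, N)) + 0.1
    eps = 1e-12
    for _ in range(iters):
        # h-update: h_kj <- h_kj * [sum_i z_ik x_ij/(ZH)_ij] / sum_i z_ik
        H *= (Z.T @ (X / (Z @ H + eps))) / (Z.sum(axis=0)[:, None] + eps)
        # z-update: z_ik <- z_ik * [sum_j h_kj x_ij/(ZH)_ij] / sum_j h_kj
        Z *= ((X / (Z @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # normalize columns of Z; rescale rows of H so ZH is unchanged
        s = Z.sum(axis=0, keepdims=True) + eps
        Z /= s
        H *= s.T
    return Z, H

rng = np.random.default_rng(1)
X = rng.random((8, 6)) + 0.1
Z, H = nmf_kl(X, 3)
```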
[Figure: the image database is partitioned into classes, and each class into clusters; NMF then performs dimensionality reduction to yield a feature vector for each image in cluster $\rho$ of class $r$.]
Feature vector of each image (a column of $H$):

$$\boldsymbol{h}^{(r)(\rho)} = \left[ h_{1}^{(r)(\rho)} \; \cdots \; h_{M}^{(r)(\rho)} \right]^{T}$$

Mean of the feature vectors for the $\rho$-th cluster of the $r$-th class:

$$\boldsymbol{\mu}^{(r)(\rho)} = \frac{1}{N_{r,\rho}} \sum_{\lambda=1}^{N_{r,\rho}} \boldsymbol{h}_{\lambda}^{(r)(\rho)}, \qquad \boldsymbol{\mu}^{(r)(\rho)} = \left[ \mu_{1}^{(r)(\rho)} \; \cdots \; \mu_{M}^{(r)(\rho)} \right]^{T}$$
• Using the above notation we can define the within-cluster scatter matrix $S_w$ (with $K$ classes and $C_r$ clusters in class $r$) as:

$$S_w = \sum_{r=1}^{K} \sum_{\rho=1}^{C_r} \sum_{\lambda=1}^{N_{r,\rho}} \left( \boldsymbol{h}_{\lambda}^{(r)(\rho)} - \boldsymbol{\mu}^{(r)(\rho)} \right) \left( \boldsymbol{h}_{\lambda}^{(r)(\rho)} - \boldsymbol{\mu}^{(r)(\rho)} \right)^{T}$$
• and the between-cluster scatter matrix $S_b$ as:

$$S_b = \sum_{r=1}^{K} \sum_{\substack{\theta=1 \\ \theta \neq r}}^{K} \sum_{\rho=1}^{C_r} \sum_{\tau=1}^{C_\theta} \left( \boldsymbol{\mu}^{(r)(\rho)} - \boldsymbol{\mu}^{(\theta)(\tau)} \right) \left( \boldsymbol{\mu}^{(r)(\rho)} - \boldsymbol{\mu}^{(\theta)(\tau)} \right)^{T}$$
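A direct, brute-force computation of $S_w$ and $S_b$ from clustered feature vectors could look like this. This is an illustrative sketch; the nested-list layout `clusters[r][rho]` is an assumption of this example, not the paper's data structure:

```python
import numpy as np

def scatter_matrices(clusters):
    """clusters[r][rho]: (N_{r,rho}, M) array of feature vectors for
    cluster rho of class r.  Returns (S_w, S_b)."""
    means = [[c.mean(axis=0) for c in cls] for cls in clusters]
    M = clusters[0][0].shape[1]
    S_w = np.zeros((M, M))
    for r, cls in enumerate(clusters):
        for rho, c in enumerate(cls):
            D = c - means[r][rho]          # deviations from the cluster mean
            S_w += D.T @ D
    S_b = np.zeros((M, M))
    for r in range(len(clusters)):
        for t in range(len(clusters)):
            if t == r:
                continue                   # pair means of different classes only
            for mu_r in means[r]:
                for mu_t in means[t]:
                    d = (mu_r - mu_t)[:, None]
                    S_b += d @ d.T
    return S_w, S_b

rng = np.random.default_rng(2)
clusters = [[rng.random((5, 4)), rng.random((6, 4))],   # class 1: two clusters
            [rng.random((4, 4))]]                        # class 2: one cluster
S_w, S_b = scatter_matrices(clusters)
```

Both matrices are symmetric and positive semi-definite by construction, so their traces are non-negative, as the goal below requires.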
• Our goal: $\mathrm{tr}[S_w] \downarrow$ and $\mathrm{tr}[S_b] \uparrow$.
Since we desire the trace of matrix $S_w$ to be as small as possible and at the same time the trace of $S_b$ to be as large as possible, the new cost function is formulated as:

$$D_d(X \,\|\, ZH) = D(X \,\|\, ZH) + \frac{\gamma}{2} \,\mathrm{tr}[S_w] - \frac{\delta}{2} \,\mathrm{tr}[S_b]$$

where $\gamma$ and $\delta$ are positive constants, while the factor $\frac{1}{2}$ is used to simplify subsequent derivations. Consequently, the new minimization problem is formulated as:

$$\min_{Z,H} \; D_d(X \,\|\, ZH)$$

subject to:

$$z_{i,k} \geq 0, \quad h_{k,j} \geq 0, \quad \sum_{i} z_{i,k} = 1, \quad \forall \, i, j, k.$$
• The constrained optimization problem is solved by introducing Lagrangian multipliers $\phi_{i,k}$ and $\psi_{k,j}$:

$$\mathcal{L} = D(X \,\|\, ZH) + \frac{\gamma}{2} \,\mathrm{tr}[S_w] - \frac{\delta}{2} \,\mathrm{tr}[S_b] + \sum_{i,k} \phi_{i,k} z_{i,k} + \sum_{k,j} \psi_{k,j} h_{k,j}$$

or, in matrix form:

$$\mathcal{L} = D(X \,\|\, ZH) + \frac{\gamma}{2} \,\mathrm{tr}[S_w] - \frac{\delta}{2} \,\mathrm{tr}[S_b] + \mathrm{tr}[\Phi Z^{T}] + \mathrm{tr}[\Psi H^{T}]$$

• Consequently, the optimization problem is equivalent to the minimization of the Lagrangian: $\min_{Z,H} \mathcal{L}$.
• To minimize $\mathcal{L}$, we first obtain its partial derivatives with respect to $z_{i,k}$ and $h_{k,j}$ and set them equal to zero:

$$\frac{\partial \mathcal{L}}{\partial h_{k,j}} = -\sum_{i} z_{i,k} \frac{x_{i,j}}{\sum_{l} z_{i,l} h_{l,j}} + \sum_{i} z_{i,k} + \frac{\gamma}{2} \frac{\partial \,\mathrm{tr}[S_w]}{\partial h_{k,j}} - \frac{\delta}{2} \frac{\partial \,\mathrm{tr}[S_b]}{\partial h_{k,j}} + \psi_{k,j} = 0$$

$$\frac{\partial \mathcal{L}}{\partial z_{i,k}} = -\sum_{j} h_{k,j} \frac{x_{i,j}}{\sum_{l} z_{i,l} h_{l,j}} + \sum_{j} h_{k,j} + \phi_{i,k} = 0$$

Note that the scatter matrices $S_w$ and $S_b$ are built from the columns of $H$ only, so the discriminant terms contribute to the derivative with respect to $h_{k,j}$ but not to the one with respect to $z_{i,k}$.
• DNMF combines Fisher's criterion with the NMF decomposition and achieves a more efficient decomposition of the provided data into its discriminant parts, thus enhancing separability between classes compared with conventional NMF.
$$\boldsymbol{x} = [x_1, x_2, \ldots, x_F] \in \mathbb{R}^{1 \times F}$$

$\downarrow$ Dimensionality reduction

$$\boldsymbol{y} = [y_1, y_2, \ldots, y_M] \in \mathbb{R}^{1 \times M}, \quad \text{where } M \ll F$$

$$\boldsymbol{y}^{*} = \boldsymbol{x} W, \quad \text{where } W \in \mathbb{R}^{F \times M}$$

$$y_j^{*} = \boldsymbol{x} \boldsymbol{w}_j, \quad \text{where } \boldsymbol{x} \in \mathbb{R}^{1 \times F}, \; \boldsymbol{w}_j \in \mathbb{R}^{F \times 1}, \; y_j^{*} \in \mathbb{R}^{1 \times 1}$$
• Introduction
• Principal Component Analysis (PCA) Method
• Non-negative Matrix Factorization (NMF) Method
• PCA-NMF Method
• Experiment Results and Analysis
• Conclusion
• In this paper, we have detailed PCA and NMF and applied them to feature extraction from facial expression images.
• We also process the basis image matrix and weight matrix obtained by PCA and use them as the initialization of NMF.
• The experiments demonstrate that this method achieves a better recognition rate than PCA and NMF alone.
Suppose that $m$ expression images are selected for training; the training set $X$ is defined by

$$X = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{n \times m} \tag{1}$$

The covariance matrix corresponding to all training samples is obtained as

$$C = \frac{1}{m} \sum_{i=1}^{m} (x_i - u)(x_i - u)^{T} \tag{2}$$
where $u$, the average face, is defined by

$$u = \frac{1}{m} \sum_{i=1}^{m} x_i \tag{3}$$
[Figure: the average face $u$ is the mean of the images in the training data set $X \in \mathbb{R}^{n \times m}$.]
Let

$$A = [x_1 - u, \; x_2 - u, \; \ldots, \; x_m - u] \in \mathbb{R}^{n \times m} \tag{4}$$

Then (2) becomes

$$C = \frac{1}{m} A A^{T} \in \mathbb{R}^{n \times n} \tag{5}$$

Matrix $C$ has $n$ eigenvectors and eigenvalues. For a $50 \times 50$ image, $n = 50 \times 50 = 2500$, and it is difficult to compute 2500 eigenvectors and eigenvalues directly. Therefore, we obtain the eigenvectors and eigenvalues of $A A^{T}$ by solving those of the much smaller matrix $A^{T} A \in \mathbb{R}^{m \times m}$.
The vectors $v_i$ $(i = 1, 2, \ldots, m)$ and scalars $\lambda_i$ $(i = 1, 2, \ldots, m)$ are the eigenvectors and eigenvalues of $A^{T} A$. Then the eigenvectors $u_i$ of the covariance matrix $C$ are given by

$$u_i = \frac{1}{\sqrt{\lambda_i}} A v_i, \quad i = 1, 2, \ldots, m \tag{6}$$
Sorting the eigenvalues by size: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > \cdots > 0$.

Generally, the number $p$ of retained eigenvectors is chosen so that the leading $p$ eigenvalues capture a fraction $\alpha$ of the total energy:

$$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \geq \alpha, \quad \text{usually } \alpha = 0.9 \sim 0.99 \tag{7}$$
Let $W$ be the projection matrix, $W = [u_1, u_2, \ldots, u_p]$. Then every facial expression image feature can be denoted by the following equation:

$$y = W^{T}(x - u) \tag{8}$$

[Figure: PCA basis images]
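Equations (1)–(8), including the small-eigenproblem trick of (6), can be sketched in a few lines. This is illustrative code; the function name and the handling of numerically zero eigenvalues are this example's choices:

```python
import numpy as np

def pca_train(X, alpha=0.95):
    """X: (n, m), one vectorized image per column.  Returns the
    projection matrix W, features Y, and the average face u."""
    n, m = X.shape
    u = X.mean(axis=1, keepdims=True)        # average face, eq. (3)
    A = X - u                                # centered data, eq. (4)
    lam, V = np.linalg.eigh(A.T @ A)         # m x m problem instead of n x n
    order = np.argsort(lam)[::-1]            # sort eigenvalues descending
    lam, V = lam[order], V[:, order]
    keep = lam > 1e-10 * lam[0]              # drop numerically zero eigenvalues
    lam, V = lam[keep], V[:, keep]
    ratio = np.cumsum(lam) / lam.sum()
    p = int(np.searchsorted(ratio, alpha)) + 1   # smallest p with energy >= alpha, eq. (7)
    W = (A @ V[:, :p]) / np.sqrt(lam[:p])    # eigenvectors of C, eq. (6)
    Y = W.T @ A                              # projected features, eq. (8)
    return W, Y, u

rng = np.random.default_rng(3)
X = rng.random((100, 20))                    # 20 "images" of dimension 100
W, Y, u = pca_train(X)
```

Because $u_i = A v_i / \sqrt{\lambda_i}$ with orthonormal $v_i$, the columns of $W$ come out orthonormal, which is easy to verify numerically.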
Given a non-negative matrix $X$, the NMF algorithms seek to find non-negative factors $W$ and $H$ of $X$ such that:

$$X_{n \times m} \approx W_{n \times r} H_{r \times m} \tag{9}$$

where $r$, the number of feature vectors, satisfies

$$(n + m)\, r < n m \tag{10}$$
The iterative update formulas are given as follows:

$$H \leftarrow H \odot \frac{W^{T} X}{W^{T} W H} \tag{11}$$

$$W \leftarrow W \odot \frac{X H^{T}}{W H H^{T}} \tag{12}$$

where $\odot$ and the fraction bar denote element-wise multiplication and division. These updates minimize the objective function

$$\min_{W,H} \| X - WH \|^{2} \tag{13}$$

Then every facial expression image feature can be denoted by the following equation:

$$y = W^{+} x \tag{14}$$

where $W^{+}$ denotes the (pseudo-)inverse of $W$.

[Figure: NMF basis images]
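The updates (11)–(12) are the multiplicative rules for the Euclidean objective (13); a compact sketch follows (illustrative code, with a small `eps` added to the denominators for numerical safety):

```python
import numpy as np

def nmf_euclidean(X, r, iters=200, seed=0):
    """Multiplicative updates for min ||X - WH||^2, eqs. (11)-(12)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    eps = 1e-12
    errors = []
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # eq. (11)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # eq. (12)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

rng = np.random.default_rng(4)
X = rng.random((10, 8))
W, H, errors = nmf_euclidean(X, 4)
```

The multiplicative form keeps $W$ and $H$ non-negative automatically, and the reconstruction error is non-increasing over the iterations.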
First, obtain the projection matrix $W_{\mathrm{PCA}}$ and weight matrix $H_{\mathrm{PCA}}$ by the PCA method. Initialization is then performed for the NMF matrices $W$ and $H$ as follows:

$$W = \min(1, \; \mathrm{abs}(W_{\mathrm{PCA}})) \tag{15}$$

$$H = \min(1, \; \mathrm{abs}(H_{\mathrm{PCA}})) \tag{16}$$

[Figure: NMF basis images vs. PCA-NMF basis images]
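The PCA-based initialization (15)–(16) can be sketched as follows. This is illustrative code; taking the PCA weight matrix as $H_{\mathrm{PCA}} = W_{\mathrm{PCA}}^{T} A$ is an assumption of this example:

```python
import numpy as np

def pcanmf_init(X, r):
    """Build NMF initializers from PCA, eqs. (15)-(16)."""
    n, m = X.shape
    u = X.mean(axis=1, keepdims=True)
    A = X - u
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1][:r]            # top-r eigenpairs
    lam, V = np.clip(lam[order], 1e-12, None), V[:, order]
    W_pca = (A @ V) / np.sqrt(lam)               # projection matrix, n x r
    H_pca = W_pca.T @ A                          # PCA weights, r x m
    W0 = np.minimum(1.0, np.abs(W_pca))          # eq. (15)
    H0 = np.minimum(1.0, np.abs(H_pca))          # eq. (16)
    return W0, H0

rng = np.random.default_rng(5)
X = rng.random((30, 12))
W0, H0 = pcanmf_init(X, 5)
```

The `abs` and `min(1, .)` clamps make both initializers non-negative and bounded, as NMF requires.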
[Figure: sample basis images for the six expressions: anger, disgust, fear, happy, neutral, surprise.]
[Figure: comparison of recognition rates for every expression (training set: 70 images; test set: 70 images)]

[Figure: comparison of recognition rates for every expression (training set: 70 images; test set: 143 images)]

[Figure: comparison of recognition rates for every expression (training set: 140 images; test set: 73 images)]
The experimental results demonstrate that NMF and PCA-NMF can outperform PCA. The best recognition rate for facial expression images is 93.72%. On the whole, our approach provides good recognition rates.