### Factorization Machine - Department of Computer Science

```
FACTORIZATION MACHINE:
MODEL, OPTIMIZATION AND APPLICATIONS

Yang LIU
Email: [email protected]
Supervisors: Prof. Andrew Yao, Prof. Shengyu Zhang
OUTLINE

- Factorization machine (FM)
  - A generic predictor
  - Automatic feature interaction
- Learning algorithm
  - Stochastic gradient descent (SGD)
  - ...
- Applications
  - Recommendation systems
  - Regression and classification
  - ...
DOUBAN MOVIE

[Figure: user-movie rating matrix; "?" marks the ratings to predict]

e.g. Alice rates Titanic 5 at time 13

[Figure: the rating event encoded as a sparse feature vector]

- Format: $y : \mathbb{R}^n \to T$, where $T = \mathbb{R}$ for regression and $T = \{+1, -1\}$ for classification
- Training set: $\mathrm{Tr} = \{(x_1, y_1), (x_2, y_2), \dots\}$
- Testing set: $\mathrm{Te} = \{x_1, x_2, \dots\}$
- Objective: to predict $\{y(x_1), y(x_2), \dots\}$
LINEAR MODEL - FEATURE ENGINEERING

- Linear SVM: $\hat{y}(x) = w_0 + w^T x$
- Logistic regression: $\hat{y}(x) = \dfrac{1}{1 + \exp(-w^T x)}$
FACTORIZATION MODEL

- Linear: $\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i$
- FM: $\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$ (interaction between variables)
- Model parameters: $\Theta = (w_0, w_1, \dots, w_n, v_1, \dots, v_n)$, with $v_i \in \mathbb{R}^k$ for $i = 1, \dots, n$, where $k$ is the inner dimension
INTERACTION MATRIX

- Collect the pairwise interaction weights into a matrix $W$ with entries $w_{i,j} = \langle v_i, v_j \rangle$
- Factorization: $W = V^T V$, where $V \in \mathbb{R}^{k \times n}$ stacks the vectors $v_i$ as columns
- Factorization Machine: $\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{i,j} x_i x_j$, with $w_{i,j} = \langle v_i, v_j \rangle$
FM: PROPERTIES

- Expressiveness: $\forall W \in \mathbb{R}^{n \times n}, W \succeq 0, \exists V \in \mathbb{R}^{k \times n}$ s.t. $W = V^T V$ (for sufficiently large $k$)
- Feature dependency: $w_{i,j} = \langle v_i, v_j \rangle$ and $w_{i,l} = \langle v_i, v_l \rangle$ are dependent
- Linear computation complexity: $\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$, so $\hat{y}(x)$ can be computed in $O(kn)$
OPTIMIZATION TARGET

- Min ERROR → Min ERROR + Regularization
- $\mathrm{OPT} = \operatorname{argmin}_{\Theta} \sum_{(x, y) \in \mathrm{Tr}} \ell(\hat{y}(x \mid \Theta), y) + \sum_{\theta \in \Theta} \lambda_\theta \theta^2$
- Loss functions:
  - Regression: $\ell(y_1, y_2) = (y_1 - y_2)^2$
  - Classification: $\ell(y_1, y_2) = \ln(1 + \exp(-y_1 y_2))$
STOCHASTIC GRADIENT DESCENT (SGD)

- For item $(x, y)$, update $\theta$ by: $\theta \leftarrow \theta - \eta \left( \frac{\partial}{\partial \theta} \ell(\hat{y}(x), y) + 2 \lambda_\theta \theta \right)$
  - $\theta_0$: initial value of $\theta$
  - $\eta$: learning rate
  - $\lambda_\theta$: regularization
- Pros
  - Easy to implement
  - Fast convergence on big training data
- Cons
  - Parameter tuning
  - Sequential method
APPLICATIONS

- EMI Music Hackathon 2012
  - Song recommendation
  - Given: historical ratings, user demographics
  - # features: 51K
  - # items in training: 188K
RESULTS FOR EMI MUSIC

- FM: Root Mean Square Error (RMSE) 13.27626, target value in [0, 100]
  - The best (SVD++) is 13.24598
- Details
  - Regression
  - Converges in 100 iterations
  - Time for each iteration: < 1 s (Win 7, Intel Core 2 Duo CPU 2.53GHz, 6G RAM)
OTHER APPLICATIONS

- Ads CTR prediction (KDD Cup 2012)
  - Features: User_info, Ad_info, Query_info, Position, etc.
  - # features: 7.2M
  - # items in training: 160M
  - Classification
  - Performance: AUC 0.80178; the best (SVM) is 0.80893
OTHER APPLICATIONS

- HiCloud App Recommendation
  - Features: App_info, smartphone model, installed apps, etc.
  - # features: 9.5M
  - # items in training: 16M
  - Classification
  - Performance: Top 5: 8%, Top 10: 18%, Top 20: 32%; AUC: 0.78
SUMMARY

- FM: a general predictor
  - Works under sparsity
  - Linear computation complexity
  - Estimates interactions automatically
  - Works with any real-valued feature vector

THANKS!
```
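The expressiveness claim in the slides (any positive semidefinite interaction matrix $W$ factors as $V^T V$) can be illustrated numerically via an eigendecomposition. A minimal sketch in NumPy; the variable names are my own:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n))
W = A @ A.T                      # symmetric PSD by construction

# Eigendecomposition W = Q diag(lam) Q^T with lam >= 0, so
# V = diag(sqrt(lam)) Q^T satisfies W = V^T V, with V in R^{k x n} (here k = n)
lam, Q = np.linalg.eigh(W)
lam = np.clip(lam, 0.0, None)    # guard against tiny negative round-off
V = np.sqrt(lam)[:, None] * Q.T

assert np.allclose(V.T @ V, W)
```

FM restricts $k$ to a small constant rather than $k = n$, which is exactly what forces the parameter sharing (feature dependency) noted on the properties slide.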
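The linear-computation identity on the properties slide can be checked directly by comparing the naive $O(kn^2)$ pairwise sum with the $O(kn)$ reformulation. A sketch, with names of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3                    # number of features, inner dimension
x = rng.normal(size=n)
V = rng.normal(size=(n, k))    # row i is v_i

# Naive pairwise sum: sum_{i<j} <v_i, v_j> x_i x_j, O(k n^2)
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# Linear-time form: 1/2 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
s = V.T @ x                                         # s_f = sum_i v_{i,f} x_i
fast = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))

assert np.isclose(naive, fast)
```

On sparse inputs the sums only run over the nonzero $x_i$, which is why FM stays cheap on the high-dimensional feature vectors in the applications slides.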
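The model equation and the SGD update rule from the slides can be combined into a small trainer. This is a sketch for the squared-loss (regression) case only; the function names, learning rate, and synthetic data are my own choices, not from the slides:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM model: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j (O(kn) form)."""
    s = V.T @ x
    pair = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return w0 + w @ x + pair

def sgd_step(x, y, w0, w, V, eta=0.01, lam=1e-4):
    """One update theta <- theta - eta * (d l/d theta + 2 * lam * theta),
    for squared loss l(yhat, y) = (yhat - y)^2."""
    g = 2.0 * (fm_predict(x, w0, w, V) - y)   # d l / d yhat
    s = V.T @ x
    w0 -= eta * (g + 2 * lam * w0)
    w -= eta * (g * x + 2 * lam * w)
    # d yhat / d v_{i,f} = x_i * s_f - v_{i,f} * x_i^2
    gV = np.outer(x, s) - V * (x ** 2)[:, None]
    V -= eta * (g * gV + 2 * lam * V)
    return w0, w, V

# Quick demo on synthetic regression data with one true interaction
rng = np.random.default_rng(1)
n, k = 10, 4
X = rng.normal(size=(200, n))
true_w = rng.normal(size=n)
y = X @ true_w + 0.5 * (X[:, 0] * X[:, 1])

w0, w, V = 0.0, np.zeros(n), 0.01 * rng.normal(size=(n, k))

def mse():
    return np.mean([(fm_predict(xi, w0, w, V) - ti) ** 2 for xi, ti in zip(X, y)])

before = mse()
for _ in range(10):
    for xi, ti in zip(X, y):
        w0, w, V = sgd_step(xi, ti, w0, w, V)
after = mse()
assert after < before
```

The per-item update is why the slides list "sequential method" as a con: each step depends on the parameters left by the previous one, which makes naive parallelization hard.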