
Report
FACTORIZATION MACHINE: MODEL, OPTIMIZATION AND APPLICATIONS

Yang LIU
Email: [email protected]
Supervisors: Prof. Andrew Yao, Prof. Shengyu Zhang
OUTLINE

- Factorization machine (FM)
  - A generic predictor
  - Automatic feature interaction
- Learning algorithm
  - Stochastic gradient descent (SGD)
  - …
- Applications
  - Recommendation systems
  - Regression and classification
  - …
DOUBAN MOVIE

[Screenshot: Douban Movie rating interface, motivating the rating-prediction task]
PREDICTION TASK

[Figure: user-movie rating matrix; "?" marks the ratings to be predicted]

e.g., Alice rates Titanic 5 at time 13
PREDICTION TASK

- Format: a function $y : \mathbb{R}^n \to T$
  - $T = \mathbb{R}$ for regression
  - $T = \{+1, -1\}$ for classification
- Training set: $Tr = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots\}$
- Testing set: $Te = \{x^{(1)}, x^{(2)}, \dots\}$
- Objective: predict $\{y(x^{(1)}), y(x^{(2)}), \dots\}$ (a feature-encoding sketch follows below)
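To make the task concrete, here is a minimal sketch of how a rating event such as "Alice rates Titanic 5 at time 13" can be encoded as a real-valued feature vector. The vocabularies and the `encode` helper are hypothetical, for illustration only; they are not part of the original report.

```python
import numpy as np

# Hypothetical vocabularies, for illustration only
users = ["Alice", "Bob", "Charlie"]
movies = ["Titanic", "Notting Hill", "Star Wars"]

def encode(user, movie, time):
    """One-hot user block | one-hot movie block | raw time feature."""
    x = np.zeros(len(users) + len(movies) + 1)
    x[users.index(user)] = 1.0
    x[len(users) + movies.index(movie)] = 1.0
    x[-1] = float(time)
    return x

# "Alice rates Titanic 5 at time 13" becomes the training pair (x, y)
x, y = encode("Alice", "Titanic", time=13), 5.0
```

Any real-valued features (demographics, context, time) can be appended to the same vector, which is what makes the predictor generic.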
LINEAR MODEL – FEATURE ENGINEERING

- Linear SVM:
  $$\hat{y}(x) = w_0 + w^\top x$$
- Logistic regression:
  $$\hat{y}(x) = \frac{1}{1 + \exp(-(w_0 + w^\top x))}$$
- Feature interactions must be engineered by hand in these models (simple sketches below)
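For reference, a minimal NumPy sketch of the two linear predictors above; the function names are assumptions made here for illustration.

```python
import numpy as np

def linear_svm_predict(x, w0, w):
    # Linear model: y_hat(x) = w0 + w^T x (decision value)
    return w0 + w @ x

def logistic_predict(x, w0, w):
    # Logistic regression: P(y = +1 | x) = 1 / (1 + exp(-(w0 + w^T x)))
    return 1.0 / (1.0 + np.exp(-(w0 + w @ x)))
```

Neither model captures interactions between features unless cross-terms are added manually, which motivates the factorization model on the next slide.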
FACTORIZATION MODEL

- Linear:
  $$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i$$
- FM:
  $$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
  - The last term models interactions between variables
- Model parameters: $\Theta = \{w_0, w_1, \dots, w_n, v_1, \dots, v_n\}$
- $v_i \in \mathbb{R}^k$, $i = 1, \dots, n$, where $k$ is the inner dimension (a prediction sketch follows below)
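A minimal sketch of FM prediction that follows the formula above literally; the naive double loop costs O(kn²) (the linear-time rewriting appears later under the model's properties). Names and array shapes are assumptions made for illustration.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """FM prediction, O(k * n^2):
    w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j

    x : feature vector, shape (n,)
    w0: global bias; w: linear weights, shape (n,)
    V : factor matrix, shape (n, k); row i is v_i
    """
    n = len(x)
    y = w0 + w @ x                      # linear part
    for i in range(n):                  # pairwise interactions
        for j in range(i + 1, n):
            y += np.dot(V[i], V[j]) * x[i] * x[j]
    return y
```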
INTERACTION MATRIX

- Collect the pairwise weights into a matrix $W \in \mathbb{R}^{n \times n}$ with $W_{i,j} = \langle v_i, v_j \rangle$.
- Factorization: $W = V^\top V$, where $V \in \mathbb{R}^{k \times n}$ stacks the factor vectors $v_i$ as columns.
- Plugging the factorized $W_{i,j} = \langle v_i, v_j \rangle$ into the pairwise term is exactly what turns the model into a Factorization Machine:
  $$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
  (a small numerical demonstration follows below)
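A small NumPy demonstration of the factorization above: $W = V^\top V$ is symmetric positive semidefinite by construction, and conversely any PSD matrix factors this way (here recovered via eigendecomposition). All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3

V = rng.normal(size=(k, n))   # one column v_i per feature
W = V.T @ V                   # interaction matrix: W[i, j] = <v_i, v_j>

# W is symmetric PSD by construction
assert np.allclose(W, W.T)
assert np.all(np.linalg.eigvalsh(W) >= -1e-10)

# Conversely, any PSD W factors as V^T V via its eigendecomposition:
# W = Q diag(vals) Q^T  =>  V_rec = diag(sqrt(vals)) Q^T
vals, vecs = np.linalg.eigh(W)
V_rec = (vecs * np.sqrt(np.clip(vals, 0.0, None))).T
assert np.allclose(V_rec.T @ V_rec, W)
```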
FM: PROPERTIES

- Expressiveness:
  $$\forall W \in \mathbb{R}^{n \times n}, W \succeq 0, \ \exists V \in \mathbb{R}^{k \times n} \ \text{s.t.} \ W = V^\top V$$
  For a sufficiently large inner dimension $k$, FM can express any positive semidefinite interaction matrix.
- Feature dependency:
  $W_{i,j} = \langle v_i, v_j \rangle$ and $W_{i,l} = \langle v_i, v_l \rangle$ are dependent, since both share the factor $v_i$; this parameter sharing lets FM estimate interactions even for feature pairs that never co-occur in training.
- Linear computation complexity: the pairwise term can be rewritten as
  $$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$
  so evaluating $\hat{y}(x)$ costs $O(kn)$ instead of $O(kn^2)$ (a code sketch follows below).
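A minimal sketch of the $O(kn)$ evaluation using the sum-of-squares identity above, checked against a direct $O(n^2)$ evaluation; the function name and shapes are assumptions for illustration.

```python
import numpy as np

def fm_predict_linear(x, w0, w, V):
    """O(k * n) FM prediction via the identity
    sum_{i<j} <v_i, v_j> x_i x_j
      = 1/2 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]."""
    s = V.T @ x                      # s[f]  = sum_i V[i, f] * x_i
    s2 = (V ** 2).T @ (x ** 2)       # s2[f] = sum_i V[i, f]^2 * x_i^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

# Sanity check against a direct O(n^2) evaluation of the pairwise sum
rng = np.random.default_rng(1)
n, k = 8, 4
x, w, V = rng.normal(size=n), rng.normal(size=n), rng.normal(size=(n, k))
naive = 0.1 + w @ x + sum(np.dot(V[i], V[j]) * x[i] * x[j]
                          for i in range(n) for j in range(i + 1, n))
assert np.isclose(fm_predict_linear(x, 0.1, w, V), naive)
```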
OPTIMIZATION TARGET

- Min error → min error + regularization:
  $$\mathrm{OPT} = \operatorname*{argmin}_{\Theta} \sum_{(x, y) \in S} \ell\big(\hat{y}(x \mid \Theta), y\big) + \sum_{\theta \in \Theta} \lambda_\theta \theta^2$$
- Loss functions (sketched below):
  - Least squares (regression): $\ell(y_1, y_2) = (y_1 - y_2)^2$
  - Logit loss (classification): $\ell(y_1, y_2) = \ln(1 + \exp(-y_1 y_2))$
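The two loss functions above, transcribed directly into Python (NumPy assumed):

```python
import numpy as np

def squared_loss(y1, y2):
    # Regression: l(y1, y2) = (y1 - y2)^2
    return (y1 - y2) ** 2

def logit_loss(y1, y2):
    # Classification, y2 in {+1, -1}: l(y1, y2) = ln(1 + exp(-y1 * y2))
    return np.log1p(np.exp(-y1 * y2))
```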
STOCHASTIC GRADIENT DESCENT (SGD)

- For each training item $(x, y)$, update each parameter $\theta$ by:
  $$\theta \leftarrow \theta - \eta \left( \frac{\partial}{\partial \theta} \ell\big(\hat{y}(x \mid \Theta), y\big) + 2 \lambda_\theta \theta \right)$$
  - $\theta^{(0)}$: initial value of $\theta$
  - $\eta$: learning rate
  - $\lambda_\theta$: regularization
- Pros
  - Easy to implement
  - Fast convergence on big training data
- Cons
  - Parameter tuning
  - Sequential method
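A minimal sketch of one SGD epoch for FM regression with squared loss, combining the update rule above with the linear-time prediction. A single constant `lam` stands in for the per-parameter $\lambda_\theta$ on the slide, and all names are assumptions made here.

```python
import numpy as np

def fm_sgd_epoch(data, w0, w, V, eta=0.01, lam=0.01):
    """One SGD pass. data: iterable of (x, y) pairs; V: shape (n, k)."""
    for x, y in data:
        s = V.T @ x                                   # s[f] = sum_i V[i,f] x_i
        y_hat = w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
        g = 2.0 * (y_hat - y)                         # d(squared loss)/d(y_hat)
        # theta <- theta - eta * (g * d(y_hat)/d(theta) + 2 * lam * theta)
        w0 -= eta * g                                 # bias left unregularized
        w -= eta * (g * x + 2 * lam * w)
        # d(y_hat)/dV[i, f] = x_i * s[f] - V[i, f] * x_i^2
        grad_V = np.outer(x, s) - V * (x ** 2)[:, None]
        V -= eta * (g * grad_V + 2 * lam * V)
    return w0, w, V
```

Because each update touches only one training item, the method scales to large data but is inherently sequential, matching the pros and cons listed above.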
APPLICATIONS

- EMI Music Hackathon 2012: song recommendation
  - Given: historical ratings, user demographics
  - # features: 51K
  - # items in training: 188K
RESULTS FOR EMI MUSIC

- FM: Root Mean Square Error (RMSE) 13.27626
  - Target values lie in [0, 100]
  - The best entry (SVD++) achieves 13.24598
- Details
  - Regression
  - Converges in 100 iterations
  - Time per iteration: < 1 s
  - Hardware: Windows 7, Intel Core 2 Duo CPU 2.53 GHz, 6 GB RAM
OTHER APPLICATIONS

- Ads click-through-rate (CTR) prediction (KDD Cup 2012)
  - Features: User_info, Ad_info, Query_info, Position, etc.
  - # features: 7.2M
  - # items in training: 160M
  - Task: classification
  - Performance: AUC 0.80178; the best entry (SVM) achieves 0.80893
OTHER APPLICATIONS

- HiCloud App Recommendation
  - Features: App_info, smartphone model, installed apps, etc.
  - # features: 9.5M
  - # items in training: 16M
  - Task: classification
  - Performance: Top 5: 8%, Top 10: 18%, Top 20: 32%; AUC: 0.78
SUMMARY

- FM: a general predictor
  - Works under sparsity
  - Linear computation complexity
  - Estimates interactions automatically
  - Works with any real-valued feature vector

THANKS!
