poster - Yisong Yue

Report
Linear Submodular Bandits
and their Application to Diversified Retrieval
Yisong Yue (CMU) & Carlos Guestrin (CMU)
Optimizing Recommender Systems
• Every day, users come to news portal
• For each user,
• News portal recommends L articles to cover
the user’s interests
• Users provide feedback (clicks, ratings, “likes”).
• System integrates feedback for future use.
We address two challenges:
• Diversified recommendations
• Exploration for personalization
Challenge 1: Making Diversified
Recommendations
• Should recommend optimally diversified sets of articles.


•“Israel implements unilateral Gaza cease-fire :: WRAL.com”
•“Israel unilaterally halts fire, rockets persist”
•“Gaza truce, Israeli pullout begin | Latest News”
•“Hamas announces ceasefire after Israel declares truce - …”
•“Hamas fighters seek to restore order in Gaza Strip - World - Wire …”
Challenge 2: Personalization
OR
•
•
•
•
?
Different users have different interests
Can only learn interests by recommending and receiving feedback
Exploration versus exploitation dilemma
We model this as a bandit problem!
Linear Submodular Bandits Problem
At each iteration t:
•A set of available articles, At
•Each article represented using D submodular basis functions
•Algorithm selects a set of L articles At
•Algorithm recommends At to user, receives feedback
•Assumptions:
•Pr(like | a,A) = wTΔ(a|A) (conditional submodular independence)
•Regret: (1-1/e)OPT – sum of rewards
•“Israel implements unilateral Gaza cease-fire :: WRAL.com”
•“Obama vows to fight for middle class”
•“Citigroup plans to cut 4500 jobs”
•“Google Android market tops 10 billion downloads”
•“UC astronomers discover two largest black holes ever found”
LSBGreedy
Goal: recommend a set of articles that optimally covers topics
that interest the user.
Modeling Diversity via
Submodular Utility Functions
• We assume a set of D concepts or topics
• Users are modeled by how interested they are in each topic
• Let Fi(A) denote the how well set of articles A covers topic i.
(“topic coverage function”)
• We model user utility as F(A|w) = wT[F1(A), …, FD(A)]
Each topic coverage function Fi(A) is monotone submodular!
A function F is submodular if
"A Í B : F(AÈa) - F(A) ³ F(BÈa) - F(B)
i.e., the benefit of recommending a second (redundant) article is
smaller than adding the first.
Example: Probabilistic Coverage
• Maintain mean and confidence interval of user’s interests
• Greedily recommend articles with highest upper confidence utility
• In example below, chooses article about economy
Mean Estimate by Topic
Uncertainty of Estimate
+
Theorem: with probability 1- δ average regret shrinks as

L
T 

O D
log  
T
  

• Each article a has probability P(i|a) of covering topic I
• Define topic coverage function for set A as
Fi ( A)  1   1  P(i | a)
aA
• Straightforward to show that F is monotone submodular
[El-Arini et al., ‘09]
Properties of Submodular Functions
• Sums of submodular functions are submodular
• So F(A|w) is submodular
• Exact inference is NP-hard!
• Greedy algorithm yields (1-1/e) approximation bound
• Incremental gains are locally linear!
F ( A  a | w)  F ( A | w)  w (a | A)
T
 F1 ( A  a )  F1 ( A) 
 F ( A  a )  F ( A) 
2
2


 (a | A) 





 FD ( A  a )  FD ( A)
• Both properties will be exploited by our online learning algorithm
News Recommender User Study
• 10 days, 10 articles per day
• Compared against
• Multi. Weighting (no exploration) [El-Arini et al, ‘09]
• Ranked Bandits + LinUCB (reduction approach, does not
directly model diversity) [Radlinski et al, ’08; Li et al., ‘10]
Comparison
Win / Tie / Loss
Gain per Day
% Likes
LSBGreedy vs Static
24 / 0 / 0
1.07
67%
LSBGreedy vs MW
24 / 1 / 1
0.54
63%
LSBGreedy vs RankLinUCB 21 / 2 / 4
0.58
61%
• Comparing learned weights for two sessions (LSBGreedy vs MW)
• 1st session, MW overfits to “world “ topic
• 2nd session, user liked few articles, and MW did not learn anything

similar documents