GEOMETRY APPROACH
FOR K-REGRET QUERY
ICDE 2014
PENG PENG, RAYMOND CHI-WING WONG
CSE, HKUST
OUTLINE
1. Introduction
2. Contributions
3. Preliminary
4. Related Work
5. Geometry Property
6. Algorithm
7. Experiment
8. Conclusion
1. INTRODUCTION
Multi-criteria Decision Making:
• Design a query that returns a number of “interesting” objects to the user
Traditional queries:
• Top-k queries
• Skyline queries
1. INTRODUCTION
Top-k queries
• Utility function $f(\cdot): [0,1]^d \to [0,1]$.
• Given a particular utility function $f$, the utility of every point in $D$ can be computed.
• The output is a set of k points with the highest utilities.
Skyline queries
• No utility function is required.
• A point is a skyline point if it is not dominated by any other point in the dataset.
• Assume that a greater value in an attribute is more preferable.
• We say that $q$ is dominated by $p$ if and only if $q[i] \le p[i]$ for each $i \in [1, d]$ and there exists an $i \in [1, d]$ such that $q[i] < p[i]$.
• The output is a set of skyline points.
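Since the dominance test above is purely coordinate-wise, it translates directly into code. Below is a minimal sketch (the function names are illustrative, not from the paper): points are tuples in $[0,1]^d$, larger values are preferable, and the quadratic scan is for illustration only.

```python
def dominates(p, q):
    """True iff p dominates q: q[i] <= p[i] in every dimension
    and q[i] < p[i] in at least one dimension."""
    return all(qi <= pi for pi, qi in zip(p, q)) and \
           any(qi < pi for pi, qi in zip(p, q))

def skyline(points):
    """A skyline query: return the points not dominated by any other point."""
    return [q for q in points
            if not any(dominates(p, q) for p in points if p != q)]
```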
LIMITATIONS OF TRADITIONAL QUERIES
Traditional Queries
• Top-k queries
• Advantage: the output size is given by the user and it is
controllable.
• Disadvantage: the utility function is assumed to be known.
• Skyline queries
• Advantage: there is no assumption that the utility function is
known.
• Disadvantage: the output size cannot be controlled.
Recently proposed query (VLDB 2010)
• k-regret queries
• Advantage: There is no assumption that the utility function is
known and the output size is given by the user and is
controllable.
2. CONTRIBUTIONS
• We give some theoretical properties of k-regret queries.
• We give a geometry explanation of a k-regret query.
• We define happy points, a set of candidate points for the k-regret query.
• Significance: all existing algorithms, and new algorithms to be developed for the k-regret query, can use our happy points to find the solution of the k-regret query more efficiently and more effectively.
• We propose two algorithms for answering a k-regret query:
• GeoGreedy algorithm
• StoredList algorithm
• We conduct comprehensive experimental studies.
3. PRELIMINARY
Notations in k-regret queries
We have $D = \{p_1, p_2, p_3, p_4\}$. Let $S = \{p_1, p_2\}$.
• Utility function $f(x): [0,1]^d \to [0,1]$.
• $f_{(0.5,0.5)}$ is an example, where $f_{(0.5,0.5)}(p) = 0.5 \cdot p[1] + 0.5 \cdot p[2]$.
• Consider 3 utility functions, namely, $f_{(0.3,0.7)}$, $f_{(0.5,0.5)}$, $f_{(0.7,0.3)}$.
• $F = \{f_{(0.3,0.7)}, f_{(0.5,0.5)}, f_{(0.7,0.3)}\}$.
• Maximum utility $gain(S, f) = \max_{p \in S} f(p)$.
• $gain(S, f_{(0.5,0.5)}) = f_{(0.5,0.5)}(p_2) = 0.845$,
• $gain(D, f_{(0.5,0.5)}) = f_{(0.5,0.5)}(p_1) = 0.870$.
3. PRELIMINARY
Notations in k-regret queries
• Regret ratio $rr(S, f) = 1 - \frac{gain(S, f)}{gain(D, f)}$.
• It measures how bad a user with utility function $f$ feels after receiving the output $S$: if it is 1, the user feels bad; if it is 0, the user feels happy.
• $rr(S, f_{(0.3,0.7)}) = 1 - \frac{0.901}{0.901} = 0$;
• $rr(S, f_{(0.5,0.5)}) = 1 - \frac{0.845}{0.870} = 0.029$;
• $rr(S, f_{(0.7,0.3)}) = 1 - \frac{0.811}{0.916} = 0.115$.
• Maximum regret ratio $mrr(S) = \max_{f \in F} rr(S, f)$.
• It measures how bad a user feels after receiving the output $S$; a user feels better when $mrr(S)$ is smaller.
• $mrr(S) = \max\{0, 0.029, 0.115\} = 0.115$.
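The notations above can be made concrete with a short sketch. It assumes the utility functions are linear, $f_w(p) = \sum_i w[i] \cdot p[i]$, and models $F$ as a finite list of weight vectors, exactly as in the three-function example on this slide; the function names are ours.

```python
def utility(w, p):
    """f_w(p) = sum_i w[i] * p[i] (a linear utility function)."""
    return sum(wi * pi for wi, pi in zip(w, p))

def gain(S, w):
    """Maximum utility gain(S, f_w) = max_{p in S} f_w(p)."""
    return max(utility(w, p) for p in S)

def regret_ratio(D, S, w):
    """rr(S, f_w) = 1 - gain(S, f_w) / gain(D, f_w)."""
    return 1.0 - gain(S, w) / gain(D, w)

def max_regret_ratio(D, S, F):
    """mrr(S) = max_{f in F} rr(S, f)."""
    return max(regret_ratio(D, S, w) for w in F)
```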
3. PRELIMINARY
Problem Definition
• Given a $d$-dimensional database $D$ of size $n$ and an integer $k$, a k-regret query is to find a set $S$ containing at most $k$ points such that $mrr(S)$ is minimized.
• Let $r^*$ be the maximum regret ratio of the optimal solution.
Example
• Given a set of points $p_1, p_2, p_3, p_4$, each of which is represented as a 2-dimensional vector,
• a 2-regret query on these 4 points is to select 2 of $p_1, p_2, p_3, p_4$ as the output such that the maximum regret ratio of the selected set is minimized over all such selections.
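To make the objective concrete, the definition can be phrased as an exhaustive search, reusing max_regret_ratio from the sketch above. This is exponential in $k$ and only illustrates what the query asks for; the algorithms in Section 6 avoid this enumeration.

```python
from itertools import combinations

def optimal_k_regret(D, k, F):
    """Brute-force k-regret query: try every size-k subset and keep the one
    minimizing mrr(S).  Searching size exactly k suffices, since adding a
    point never increases the maximum regret ratio."""
    best_S, best_mrr = None, float("inf")
    for S in combinations(D, k):
        m = max_regret_ratio(D, list(S), F)
        if m < best_mrr:
            best_S, best_mrr = list(S), m
    return best_S, best_mrr
```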
4. RELATED WORK
Variations of top-k queries
• Personalized Top-k queries (Information System 2009)
- Partial information about the utility function is assumed to be known.
• Diversified Top-k queries (SIGMOD 2012)
- The utility function is assumed to be known.
- No assumption on the utility function is made for a k-regret query.
Variations of skyline queries
• Representative skyline queries (ICDE 2009)
- The importance of a skyline point changes when the data is contaminated.
• K-dominating skyline queries (ICDE 2007)
- The importance of a skyline point changes when the data is contaminated.
- We do not need to consider the importance of a skyline point in a k-regret query.
Hybrid queries
• Top-k skyline queries (OTM 2005)
- The importance of a skyline point changes when the data is contaminated.
• ε-skyline queries (ICDE 2008)
- No bound is guaranteed and it is unknown how to choose ε.
- The maximum regret ratio used in a k-regret query is bounded.
4. RELATED WORK
K-regret queries
• Regret-Minimizing Representative Databases (VLDB 2010)
• First proposed the k-regret query;
• Proved a worst-case upper bound and a lower bound for the maximum regret ratio of k-regret queries;
• Proposed the fastest known algorithm for answering a k-regret query.
• Interactive Regret Minimization (SIGMOD 2012)
• Proposed an interactive version of the k-regret query and an algorithm to answer it.
• Computing k-regret Minimizing set (VLDB 2014)
• Proved the NP-completeness of the k-regret query;
• Defined a new k-regret minimizing set query and proposed two algorithms to answer this new query.
5. GEOMETRY PROPERTY
• Geometry explanation of the maximum regret ratio $mrr(S)$ given an output set $S$
• Happy points and their properties
GEOMETRY EXPLANATION OF $mrr(S)$
• Maximum regret ratio $mrr(S) = \max_{f \in F} rr(S, f)$.
How to compute $mrr(S)$ given an output set $S$?
• The function space $F$ can be infinite.
• The method used in “Regret-Minimizing Representative Databases” (VLDB 2010): Linear Programming.
• It is time-consuming when we have to call Linear Programming independently for different points.
GEOMETRY EXPLANATION OF $mrr(S)$
• Maximum regret ratio $mrr(S) = \max_{f \in F} rr(S, f)$.
We compute $mrr(S)$ with a geometry method:
• it is straightforward and easily understood;
• it saves time when computing $mrr(S)$.
AN EXAMPLE IN 2-D
$Conv(D)$, where $D = \{p_1, p_2, p_3, p_4, p_5, p_6\}$.
[Figure: the convex hull of the six points in 2-d.]
AN EXAMPLE IN 2-D
$Conv(S)$, where $S = \{p_1, p_2\}$.
[Figure: the convex hull of $S$ drawn over the six points.]
GEOMETRY EXPLANATION OF $mrr(S)$
Critical ratio
• The $p$-critical point given $S$, denoted by $p'$, is defined as the intersection between the vector $Op$ and the surface of $Conv(S)$.
• Example: the facet crossed by the ray lies on the line $0.8x + 0.7y = 1$; for $p = (0.67, 0.82)$, the critical point is $p' = (0.6, 0.74)$.
• Critical ratio $crit(p, S) = \frac{|Op'|}{|Op|}$.
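As a sketch of how the critical ratio can be computed, the largest $t$ with $t \cdot p$ still inside $Conv(S \cup \{O\})$ is found by a small linear program; including the origin $O$ guarantees the ray $Op$ crosses the hull surface. This is an illustration using scipy, not the paper's implementation, and it assumes $p \ne O$.

```python
import numpy as np
from scipy.optimize import linprog

def critical_ratio(p, S):
    """crit(p, S) = |Op'| / |Op|: the largest t such that t * p lies in
    conv(S ∪ {O}).  Variables are [t, lam_1, ..., lam_m] with
    t * p = sum_i lam_i * S_i, lam >= 0 and sum_i lam_i <= 1
    (the slack of the last constraint absorbs the origin)."""
    S = np.asarray(S, dtype=float)             # shape (m, d)
    p = np.asarray(p, dtype=float)             # shape (d,)
    m, d = S.shape
    c = np.zeros(m + 1)
    c[0] = -1.0                                # maximize t == minimize -t
    A_eq = np.hstack([-p.reshape(d, 1), S.T])  # sum_i lam_i S_i - t p = 0
    b_eq = np.zeros(d)
    A_ub = np.concatenate([[0.0], np.ones(m)]).reshape(1, -1)
    b_ub = [1.0]                               # sum_i lam_i <= 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (m + 1))
    return res.x[0]
```

A returned value of at least 1 means $p$ already lies inside the hull.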
GEOMETRY EXPLANATION OF $mrr(S)$
Lemma 0:
• $mrr(S) = \max_{p \in D} (1 - crit(p, S))$
• According to the lemma shown above, we first compute $crit(p, S)$ for each $p$ outside $Conv(S)$, and then find the greatest value of $1 - crit(p, S)$, which is the maximum regret ratio of $S$.
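Lemma 0 reduces the computation of $mrr(S)$ to one critical-ratio evaluation per point. A direct transcription, reusing critical_ratio from the sketch above:

```python
def mrr_geometric(D, S):
    """Lemma 0: mrr(S) = max_{p in D} (1 - crit(p, S)).  Points inside
    Conv(S) have crit >= 1, so they are clipped to zero regret."""
    return max(max(0.0, 1.0 - critical_ratio(p, S)) for p in D)
```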
AN EXAMPLE IN 2-D
Suppose that $k = 2$ and the output set is $S = \{p_1, p_3\}$.
$crit(p_2, S) = \frac{|Op_2'|}{|Op_2|}$, $crit(p_5, S) = \frac{|Op_5'|}{|Op_5|}$, $crit(p_6, S) = \frac{|Op_6'|}{|Op_6|}$.
So, $mrr(S) = \max_{p \in D}(1 - crit(p, S)) = \max\{1 - crit(p_5, S),\ 1 - crit(p_2, S),\ 1 - crit(p_6, S)\}$.
[Figure: the critical points $p_2'$, $p_5'$, $p_6'$ on the surface of $Conv(S)$.]
HAPPY POINT
The set $\Lambda$ is defined as a set of $d$-dimensional points of size $d$, where for each point $b_i$ and $j \in [1, d]$, we have $b_i[j] = 1$ when $i = j$, and $b_i[j] = 0$ when $i \ne j$.
In a 2-dimensional space, $\Lambda = \{b_1, b_2\}$, where $b_1 = (1, 0)$ and $b_2 = (0, 1)$.
[Figure: $b_1$ and $b_2$ plotted with the six points.]
HAPPY POINT
In the following, we give an example of $Conv(\{p\} \cup \Lambda)$ in a 2-dimensional case.
Example:
[Figure: $Conv(\{p\} \cup \Lambda)$ drawn for two different points $p$ of the dataset.]
HAPPY POINT
Definition of domination:
• We say that $q$ is dominated by $p$ if and only if $q[i] \le p[i]$ for each $i \in [1, d]$ and there exists an $i \in [1, d]$ such that $q[i] < p[i]$.
Definition of subjugation:
• We say that $q$ is subjugated by $p$ if and only if $q$ is on or below all the hyperplanes containing the faces of $Conv(\{p\} \cup \Lambda)$ and $q$ is below at least one hyperplane containing a face of $Conv(\{p\} \cup \Lambda)$.
• Equivalently, $q$ is subjugated by $p$ if and only if $f(q) \le gain(\{p\} \cup \Lambda, f)$ for each $f \in F$ and there exists an $f \in F$ such that $f(q) < gain(\{p\} \cup \Lambda, f)$.
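A sketch of the subjugation test under the hyperplane definition above: build $Conv(\{p\} \cup \Lambda \cup \{O\})$, keep only the facet hyperplanes whose outward normals have no negative component (the hyperplanes through the upper faces of $Conv(\{p\} \cup \Lambda)$), and check that $q$ is on or below all of them and strictly below at least one. Adding the origin is only an implementation convenience to make the hull full-dimensional; the function name is ours.

```python
import numpy as np
from scipy.spatial import ConvexHull

def subjugates(p, q, eps=1e-9):
    """True iff q is subjugated by p: q lies on or below every hyperplane
    through an upper face of Conv({p} ∪ Λ), strictly below at least one."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = len(p)
    pts = np.vstack([np.eye(d),    # Λ: the d boundary points b_i
                     np.zeros(d),  # origin, to make the hull full-dimensional
                     p])
    hull = ConvexHull(pts)
    on_or_below_all, strictly_below_one = True, False
    for eq in hull.equations:      # normal·x + offset <= 0 inside the hull
        normal, offset = eq[:-1], eq[-1]
        if (normal < -eps).any():  # skip facets that do not face upward
            continue
        v = normal @ q + offset
        if v > eps:
            on_or_below_all = False
        elif v < -eps:
            strictly_below_one = True
    return on_or_below_all and strictly_below_one
```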
AN EXAMPLE IN 2-D
$p_2$ subjugates $p_4$ because $p_4$ is below both the line $p_2b_1$ and the line $p_2b_2$.
$p_2$ does not subjugate $p_3$ because $p_3$ is above the line $p_2b_2$.
[Figure: the lines $p_2b_1$ and $p_2b_2$, with $p_4$ below both and $p_3$ above $p_2b_2$.]
HAPPY POINT
Lemma 1:
• There may exist a point in the optimal solution of a k-regret query which cannot be found in $D_{conv}$, the set of vertices of $Conv(D)$.
Example:
• In the example shown below, the optimal solution of a 3-regret query is $\{p_5, p_6, p_2\}$, where $p_2$ is not a point in $D_{conv}$.
[Figure: a dataset whose optimal 3-regret solution contains the non-vertex point $p_2$.]
AN EXAMPLE IN 2-D
Lemma 2:
• $D_{conv} \subset D_{happy} \subset D_{sky}$
Example:
• $D_{sky} = \{p_1, p_2, p_3, p_4, p_5, p_6\}$
• $D_{conv} = \{p_1, p_2, p_5, p_6\}$
• $D_{happy} = \{p_1, p_2, p_3, p_5, p_6\}$
[Figure: the six points with the three candidate sets marked.]
HAPPY POINT
All existing studies use $D_{sky}$ as the set of candidate points for the k-regret query.
Lemma 3:
• Let $r^*$ be the maximum regret ratio of the optimal solution. Then, there exists an optimal solution of a k-regret query which is a subset of $D_{happy}$ when $r^* < 1/2$.
• Based on Lemma 3, we can compute the optimal solution based on $D_{happy}$ instead of $D_{sky}$; a sketch of this candidate filter follows below.
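With the subjugation test sketched on the previous slide, the candidate filter suggested by Lemma 3 is a one-line scan (quadratic in $|D|$; for illustration only):

```python
def happy_points(D):
    """D_happy: the points of D not subjugated by any other point of D."""
    return [q for q in D
            if not any(subjugates(p, q) for p in D if p is not q)]
```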
6. ALGORITHM
Geometry Greedy algorithm (GeoGreedy)
• Pick the $d$ boundary points of the dataset $D$ of size $n$ and insert them into an output set;
• Repeatedly compute the regret ratio of each point outside the convex hull constructed from the current output set, and add the point that currently achieves the maximum regret ratio to the output set;
• The algorithm stops when the output size is $k$ or all the candidate points are selected (a sketch follows below).
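The sketch below is one illustrative reading of these steps, not the paper's exact pseudocode. It reuses critical_ratio from Section 5, takes the boundary point of dimension $i$ to be a point maximizing the $i$-th coordinate, and treats a critical ratio below 1 as "still outside the current hull".

```python
def geo_greedy(D, k):
    """GeoGreedy sketch: seed with the d per-dimension maxima, then repeatedly
    add the point with the largest regret 1 - crit(p, S)."""
    d = len(D[0])
    S = []
    for i in range(d):                     # the d boundary points
        b = max(D, key=lambda p: p[i])
        if b not in S:
            S.append(b)
    while len(S) < k:
        rest = [(critical_ratio(p, S), p) for p in D if p not in S]
        rest = [(c, p) for c, p in rest if c < 1.0]  # points still uncovered
        if not rest:                       # everything lies inside Conv(S)
            break
        _, worst = min(rest, key=lambda cp: cp[0])   # smallest crit = largest regret
        S.append(worst)
    return S
```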
Stored List Algorithm (StoredList)
• Preprocessing step:
• Call the GeoGreedy algorithm to compute the output of an $n$-regret query;
• Store the points of the output set in a list, in the order in which they were selected.
• Query step:
• Return the first $k$ points of the list as the output of a k-regret query (a sketch follows below).
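Under the same assumptions, StoredList is one full GeoGreedy run plus list slicing:

```python
def stored_list_preprocess(D):
    """Preprocessing: run GeoGreedy with k = |D| (it stops early once every
    point is covered) and keep the selection order."""
    return geo_greedy(D, len(D))

def stored_list_query(L, k):
    """Query step: the first k points of the stored list answer a k-regret query."""
    return L[:k]
```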
7. EXPERIMENT
Datasets
• Synthetic datasets
• Real datasets:
• Household dataset: $n = 903077$, $d = 6$
• NBA dataset: $n = 21962$, $d = 5$
• Color dataset: $n = 68040$, $d = 9$
• Stocks dataset: $n = 122574$, $d = 5$
Algorithms:
• Greedy algorithm (VLDB 2010)
• GeoGreedy algorithm
• StoredList algorithm
Measurements:
• The maximum regret ratio
• The query time
7. EXPERIMENT
Experiments
• Relationship among $D_{conv}$, $D_{happy}$, and $D_{sky}$
• Effect of happy points
• Performance of our method
RELATIONSHIP AMONG $D_{conv}$, $D_{happy}$, $D_{sky}$

Dataset   | $|D_{conv}|$ | $|D_{happy}|$ | $|D_{sky}|$
Household | 926          | 1332          | 9832
NBA       | 65           | 75            | 447
Color     | 124          | 151           | 1023
Stocks    | 396          | 449           | 3042
EFFECT OF HAPPY POINTS
Household: maximum regret ratio
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]

EFFECT OF HAPPY POINTS
Household: query time
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]
PERFORMANCE OF OUR METHOD
Experiments on Synthetic datasets
• Maximum regret ratio
[Figures: effect of $n$; effect of $d$.]

PERFORMANCE OF OUR METHOD
Experiments on Synthetic datasets
• Query time
[Figures: effect of $n$; effect of $d$.]

PERFORMANCE OF OUR METHOD
Experiments on Synthetic datasets
• Maximum regret ratio
[Figures: effect of large $k$; effect of $k$.]

PERFORMANCE OF OUR METHOD
Experiments on Synthetic datasets
• Query time
[Figures: effect of large $k$; effect of $k$.]
8. CONCLUSION
• We studied a k-regret query in this paper.
• We proposed happy points, a set of candidate points for the k-regret query that is much smaller than the set of skyline points, for finding the solution of the k-regret query more efficiently and effectively.
• We conducted experiments based on both synthetic and
real datasets.
• Future directions:
• Average regret ratio minimization
• Interactive version of a k-regret query
THANK YOU!
GEOGREEDY ALGORITHM
[Figure: pseudocode of the GeoGreedy algorithm.]
GEOGREEDY ALGORITHM
An example in 2-d:
In the following, we compute a 4-regret query using the GeoGreedy algorithm.
[Figure: the six points $p_1, \dots, p_6$.]
GEOGREEDY ALGORITHM
Lines 2–4:
• $S = \{p_5, p_6\}$
[Figure: the two boundary points $p_5$ and $p_6$.]
GEOGREEDY ALGORITHM
Lines 2–4:
• $S = \{p_5, p_6\}$.
Lines 5–10 (Iteration 1):
• Since $crit(p_2, S) > crit(p_1, S)$ and $crit(p_1, S) < 1$, we add $p_1$ to $S$.
[Figure: the critical points $p_1'$ and $p_2'$ with respect to $Conv(S)$.]
GEOGREEDY ALGORITHM
Lines 5–10 (Iteration 2):
• After Iteration 1, $S = \{p_1, p_5, p_6\}$.
• We only need to compute $crit(p_2, S)$, which is less than 1, so we add $p_2$ to $S$.
[Figure: the critical point $p_2'$ with respect to $Conv(S)$.]
STOREDLIST ALGORITHM
Stored List Algorithm
• Pre-compute the outputs of the GeoGreedy algorithm for $k \in [1, n]$.
• The output of a smaller size is a subset of the output of a larger size.
• Store the output of size $n$ in a list based on the order of selection.
STOREDLIST ALGORITHM
After two iterations of the GeoGreedy algorithm, the output set is $S = \{p_1, p_2, p_5, p_6\}$.
Since the critical ratio of each unselected point is at least 1, we stop the GeoGreedy algorithm, and $S$ is the output set of the greatest size.
We store the output in a list $L$ which ranks the selected points in the order they were added to $S$.
That is, $L = [p_5, p_6, p_1, p_2]$.
When a 3-regret query is issued, we return the set $\{p_5, p_6, p_1\}$.
EFFECT OF HAPPY POINTS
NBA: maximum regret ratio
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]

EFFECT OF HAPPY POINTS
NBA: query time
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]

EFFECT OF HAPPY POINTS
Color: maximum regret ratio
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]

EFFECT OF HAPPY POINTS
Color: query time
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]

EFFECT OF HAPPY POINTS
Stocks: maximum regret ratio
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]

EFFECT OF HAPPY POINTS
Stocks: query time
[Figures: the result based on $D_{sky}$ and the result based on $D_{happy}$.]
PRELIMINARY
Example:
$F = \{f_{(0.3,0.7)}, f_{(0.5,0.5)}, f_{(0.7,0.3)}\}$, where $f_{(a,b)}(p) = a \cdot p[1] + b \cdot p[2]$.
We have $D = \{p_1, p_2, p_3, p_4\}$. Let $S = \{p_1, p_2\}$.
Since $gain(S, f_{(0.3,0.7)}) = 0.901$ and $gain(D, f_{(0.3,0.7)}) = 0.901$, we have $rr(S, f_{(0.3,0.7)}) = 1 - \frac{0.901}{0.901} = 0$.
Similarly, $rr(S, f_{(0.5,0.5)}) = 1 - \frac{0.845}{0.870} = 0.029$, and $rr(S, f_{(0.7,0.3)}) = 1 - \frac{0.811}{0.916} = 0.115$.
So, we have $mrr(S) = \max\{0, 0.029, 0.115\} = 0.115$.
AN EXAMPLE IN 2-D
Points (normalized):
• $p_1, p_2, p_3, p_4, p_5, p_6$
[Figures: the six points plotted in the normalized space.]