### Slides

```Finding Skyline Nodes in Large Networks
Arijit Khan
Vishwakarma Singh
Jian Wu
Motivation
If John is interested in Big Data, Cloud
Computing, and Map Reduce, who will be the top-5 people John should ask
Evaluation Metrics:
 Distance from the query node. (John)
 Coverage of the Query Topics. (Big Data, Cloud Computing, Map Reduce)
Finding Skyline Nodes in Large Networks
2
Homogeneous Approach ?
If John is interested in Big Data, Cloud
Computing, and Map Reduce, who will be the top-5 people John should ask
Score = λ . Distance + (1- λ ). Coverage
How to get λ ?
Finding Skyline Nodes in Large Networks
3
Weighted Set Cover ?
 Find nodes with smallest aggregate distance from the query node, such
that they cover all query topics.
u0 = q
Q = { a, b, c }
 Ignore some interesting nodes.
a
 Cannot rank the results.
b
u1
abc
c
u2
u3
a
cd
u5
u4
abc
u6
de
u7
Finding Skyline Nodes in Large Networks
u8
4
Graph Skyline
 Dominance on Coverage: u >c v
Query topics covered by node u is a
superset of the query topics covered
by node v.
 Dominance on Distance: u >d v
Distance of u from q is less than that
of v from q.
 Dominance: u > v
(1) u >c v and u ≥d v ;
or (2) u ≥c v and u >d v.
u0 = q
Q = { a, b, c }
a
b
u1
c
u2
abc
u3
a
cd
u5
u4
abc
u6
de
u7
u8
A node is a skyline node if it is not dominated by any other
node in the network.
Finding Skyline Nodes in Large Networks
5
Ranking of Skyline Nodes
 Too many skyline nodes.
u0 = q
Q = { a, b, c }
 Rank them.
 Dominance Count: # nodes dominated
by a skyline node. [Lin et. al., ICDE ‘07]
 Higher Dominance Count => more
pruning from candidate set.
 1. DC(u4) = {u5, u6, u7},
2. DC(u1) = {u5}
3. DC(u2) = Φ; 4. DC(u3) = Φ
a
b
u1
c
u2
abc
u3
a
cd
u5
u4
abc
u6
de
u7
u8
Given a query node and a set of query topics in a
network, find the top-k skyline nodes with maximum dominance count.
Finding Skyline Nodes in Large Networks
6
Algorithm
 Construct a Query DAG.
 Three variables associated with each DAG node: Count (C), Dominance
(D), Traversal (T).
u0 = q
a
Q = { a, b, c }
b
u1
c
u2
abc
u3
a
cd
u5
u4
abc
C=2
D=T=-
u6
de
u7
Input Network
u8
ab
C=0
D=T=C=2
D=T=-
a
 Naïve Complexity: O(n2r)
abc
 Complexity with
Preprocessing: O(nr2)
C=0
D = - ac
T=-
C=1
D=T=-
b
bc
c
Query DAG
Finding Skyline Nodes in Large Networks
C=0
D=T=-
C=2
D=T=-
7
Query DAG Construction

the label.
For each label, find a sorted list of nodes that contain

Incremental DAG construction.
u0 = q
Q = { a, b, c }
u4
a
b
u1
abc
c
u6
u7
c
cd
u5
u6
de
u7
u4
u3
a
abc
u3
ab
u2
u4
u7
u8
a
u1
u5
b
u2
Finding Skyline Nodes in Large Networks
8
Query DAG Construction (cont.)

the label.
For each label, find a sorted list of nodes that contains

lists in order.
Consider the labels and their sorted
u0 = q
a
Q = { a, b, c }
b
u1
abc
c
cd
u5
u6
de
u7
u7
u3
a
abc
u4
ab
u2
u4
abc
u8
a
u1
u5
b
u2
Finding Skyline Nodes in Large Networks
c
u3
u6
9
Query DAG Construction (cont.)

the label.
For each label, find a sorted list of nodes that contains

lists in order.
Consider the labels and their sorted
u0 = q
a
Q = { a, b, c }
b
u1
abc
c
ab
u2
a
u6
de
u7
ac
u7
bc
cd
u5
abc
u4
u3
a
u4
abc
u8
u1
b
u5
u2
Finding Skyline Nodes in Large Networks
c
u3
u6
10
Find Dominance Variable
 Perform a topological ordering of the DAG nodes to evaluate the
Dominance variable (D) of each DAG node.
 # Nodes dominated (or equal) by coverage.
u0 = q
a
Q = { a, b, c }
b
u1
c
u2
abc
u3
a
cd
u5
u4
abc
C=2
D=7
T=-
u6
de
u7
Input Network
u8
ab
C=0
D=3
T=C=2
D=2
T=-
a
 Naïve Complexity: O(n2r)
abc
C=0
D = 4 ac
T=-
C=1
D=1 b
T=Query DAG
 Complexity by
Topological Ordering: O(3r)
bc
c
Finding Skyline Nodes in Large Networks
C=0
D=3
T=-
C=2
D=2
T=-
11
Find Traversal Variable
 Perform a Breadth First Search (BFS) starting from the query node.
 # Nodes not dominated by distance.
u0 = q
a
b
u1
abc
c
u2
u4
C=2
D=7
T=1
Q = { a, b, c }
a
cd
u6
u5
abc
ab
u3
de
u7
Input Network
C=0
D=3
T=0
h =2 C = 2
u8
D=2
T=2
a
 Complexity by BFS: O(n+e)
abc
C=0
D = 4 ac
T=0
C=1
D=1 b
T=1
Query DAG
bc
c
Finding Skyline Nodes in Large Networks
C=0
D=3
T=0
C=2
D=2
T=2
12
Find Skyline Nodes
 Store DAG nodes into a Lookup Table. Skyline Bit for each DAG node.
 Helps to prune non-skyline nodes directly.
u0 = q
Q = { a, b, c }
abc
a
b
u1
abc
c
u2
u4
a
Input Network
bc
u6
de
u7
ac
h =1
cd
u5
abc
ab
u3
a
b
c
u8
Query DAG
Finding Skyline Nodes in Large Networks
abc
0
ab
0
ac
0
bc
0
a
1
b
1
c
1
Lookup Table
13
Find Skyline Nodes (cont.)
 Store DAG nodes into a Lookup Table. Skyline Bit for each DAG node.
 Helps to prune non-skyline nodes directly.
u0 = q
Q = { a, b, c }
abc
a
b
u1
abc
c
u2
u4
a
ac
bc
cd
u5
abc
ab
u3
u6
h =2
de
u7
Input Network
a
b
c
u8
Query DAG
Finding Skyline Nodes in Large Networks
abc
1
ab
1
ac
1
bc
1
a
1
b
1
c
1
Lookup Table
14
Dominance Count of Skyline Nodes
 DC(u4) = D(abc)-T(abc)-T(ab)-T(ac)-T(bc)-T(a)-T(b)-T(c)-1 = 3
 Top-k Buffer to store top-k skyline nodes.
u0 = q
a
b
u1
abc
ab
C=0
D=3
T=0
u3
a
cd
u5
u6
h =2
abc
abc
c
u2
u4
C=2
D=7
T=0
Q = { a, b, c }
de
u7
Input Network
u8
C=2 a
D=2
T=1
C = 0 ac
D=4
T=0
C=1
D=1 b
T=1
Query DAG
bc C = 0
D=3
T=0
C=2
c D=2
T=1
Finding Skyline Nodes in Large Networks
abc
1
ab
1
ac
1
bc
1
a
1
b
1
c
1
Lookup Table
15
Pruning and Early Termination
 DC(u4) = D(abc)-T(abc)-T(ab)-T(ac)-T(bc)-T(a)-T(b)-T(c)-1 = 3
 Top-k Buffer to store
top-k skyline
nodes.
Dominance
Variable
of a DAG node has smaller value
than the smallest Dominance Count in the top-k buffer.

Skyline Bits of all entries in the Lookup Table are 1’s.
Finding Skyline Nodes in Large Networks
16
Experimental Results
 DC(u4) = D(abc)-T(abc)-T(ab)-T(ac)-T(bc)-T(a)-T(b)-T(c)-1 = 3

0.7M Nodes,
Edges,
10 Node
Labels (distinct).
 Top-k Buffer
to store3M
top-k
skyline
nodes.
 5 Query Topics.
Finding Skyline Nodes in Large Networks
17
Efficiency
 DC(u4) = D(abc)-T(abc)-T(ab)-T(ac)-T(bc)-T(a)-T(b)-T(c)-1 = 3

185M to
Nodes,
Node Labels (distinct).
 Top-k Buffer
store 90M
top-kEdges,
skyline1000
nodes.
 5 Query Topics, Top-5 Result Nodes.
Finding Skyline Nodes in Large Networks
18
Conclusion and Future Works
 DC(u4) = D(abc)-T(abc)-T(ab)-T(ac)-T(bc)-T(a)-T(b)-T(c)-1 = 3
 Efficient Algorithm to find top-k skyline nodes in large attributed
 network.
Top-k Buffer to store top-k skyline nodes.
 Required experimental evaluation in real and synthetic datasets.
 Time Complexity is linear in the number of nodes and edges in the
network. Distance based indexing might improve the efficiency.
 Top-k Skyline set instead of Top-k Skyline nodes might be more
effective.
Finding Skyline Nodes in Large Networks
19
Questions
 DC(u4) = D(abc)-T(abc)-T(ab)-T(ac)-T(bc)-T(a)-T(b)-T(c)-1 = 3
 Top-k Buffer to store top-k skyline nodes.
Thank You ! ! !
Finding Skyline Nodes in Large Networks
20
```