PPT

```
Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks
Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1
1 Peking University, China
2 Nanyang Technological University, Singapore
Problem and Background

Problem: Given a mobile social network, we aim to mine a
set of top-K influential nodes S such that R(S) is maximized
using the extended Independent Cascade information
diffusion model.

• A mobile social network plays an essential role in the spread of information and influence in the form of "word-of-mouth".
• The problem is NP-hard.
• It is computationally expensive to run the greedy algorithm on a large network.
• The previous greedy algorithms take days to finish on 723k nodes.
Basic Idea of the Algorithm
• Construct the network from CDR (call detail record) data
• Community detection, based on the diffusion model on the MSN
• Dynamic programming algorithm & greedy algorithm on selected communities
Step1: Extracting Mobile Social Network

Extract a Mobile Social Network from CDR data and model it as a directed weighted graph
• A phone user -- a node
• A directed edge u → v is established if there exists communication from u to v
• Communication time -- the weight of the edge
[Figure: example directed weighted graph]
Extended Independent Cascade Model

Two states of nodes
• Active & inactive
Diffusion speed λ
• When an active node vi contacts an inactive node vj, the inactive node becomes active with probability (rate) λij.
Extended Independent Cascade Model
[Figure: successive snapshots of the example graph, with nodes switching from inactive to active as influence spreads]
Step2: Influential Model Based Community Detection Algorithm

Community Partition
• Each node is assigned a unique community label from 1 to N
• For each node, compute the set of its influenced neighbors using the Independent Cascade diffusion model
• Iteratively propagate the labels through the network for a finite number of iterations
  • For each node v, the label of the community that the majority of its influenced neighbors belong to → the label of v
Community Combination
• The difference between the node's influence degree in its community and its influence degree in the network is smaller than a threshold.
Step3: Community-Based Greedy Algorithm

Choose communities to find the Top-1 influential node
[Figure: communities C1, C2, C3 with marginal gains ΔR1=0.2, ΔR2=0.3, ΔR3=0.1]
R[1,1]=max{R[0,1], R[3,0]+ΔR1}=0.2
s[1,1]=C1;
R[2,1]=max{R[1,1], R[3,0]+ ΔR2}=0.3
s[2,1]=C2;
R[3,1]=max{R[2,1], R[3,0]+ ΔR3}=0.3
s[3,1]=C2;
So we mine top-1 node in C2
Community-Based Greedy Algorithm

Choose communities to find the Top-2 influential node
[Figure: communities C1, C2, C3 with marginal gains ΔR1=0.2, ΔR2=0.06, ΔR3=0.1]
Note: ΔR2 is now 0.06, not 0.3, since the top-1 node was already mined in C2.
R[1,2]= max{R[0,2], R[3,1]+ΔR1}=0.5
s[1,2]=C1;
R[2,2]= max{R[1,2], R[3,1]+ΔR2}=0.5
s[2,2]=C1;
R[3,2]= max{R[2,2], R[3,1]+ΔR3}=0.5
s[3,2]=C1;
We mine the second node in C1
Experiments

Data Sets
• Extract a Mobile Social Network from three months of CDR (call detail record) data for one city, from China Mobile
• Node number: 723,201
• Average degree: 13.4
Community distribution
• Largest community size: 95,690
Experiments

Top-k Nodes Mining Methods
• MixedGreedy Algorithm
• NewGreedy Algorithm
• DegreeDiscount
• Random Method
• CGA
• SPCGA
Parameter study:
• k, diffusion speed λ, data size
Results

Influence degree and time vs K
Results

Influence degree and time vs diffusion speed λ
Results

Influence degree and time vs network size
Summary



• Handles large-scale networks (power-law degree distribution)
• Improves the efficiency of existing algorithms by an order of magnitude, while the loss in approximation precision is small
• Can be combined with any existing algorithm to find influential nodes w.r.t. communities
Related work on Top-K Algorithm





• Typical Greedy Algorithm (Kempe et al., KDD 2003)
• CELF Greedy Algorithm (Leskovec et al., KDD 2007)
• An improved greedy algorithm (Kimura et al., AAAI 2007)
• NewGreedy Algorithm, MixedGreedy, DegreeDiscount Algorithm (Chen et al., KDD 2009)
• MIA algorithm (Chen et al., KDD 2010)
None of them considers the community property.
```
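
The Step1 slide describes turning CDR data into a directed weighted graph. A minimal sketch of that construction might look like the following. The record layout `(caller, callee, seconds)`, the function name `build_msn`, and the choice to sum the durations of repeated calls into one edge weight are all assumptions made for illustration; the slides only say that communication time is the edge weight.

```python
from collections import defaultdict

def build_msn(cdr_records):
    """Build a directed weighted graph from (caller, callee, seconds) tuples.

    Returns {caller: {callee: total_communication_time}}.
    The record layout and the summing of repeated calls are assumptions.
    """
    graph = defaultdict(dict)
    for caller, callee, seconds in cdr_records:
        # One directed edge per ordered pair; repeated calls accumulate weight.
        graph[caller][callee] = graph[caller].get(callee, 0) + seconds
        # Users who only receive calls are still nodes of the network.
        graph.setdefault(callee, {})
    return dict(graph)

# Hypothetical toy records, not taken from the data set in the slides.
records = [("a", "b", 120), ("a", "b", 60), ("b", "c", 30), ("c", "a", 45)]
msn = build_msn(records)
```

A nested dictionary keeps the direction of each edge explicit: `msn["a"]["b"]` is the weight of a → b, and the absence of `msn["b"]["a"]` means no communication was recorded in the opposite direction.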
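
The diffusion slides state that an inactive node vj becomes active with probability λij when an active node vi contacts it. The sketch below is a plain independent-cascade simulation with per-edge probabilities, used here as a stand-in for the extended model; how λij is derived from the edge weights is not given on the slides, so the `prob` dictionary is simply assumed as input. The names `simulate_cascade` and `estimate_influence` are hypothetical.

```python
import random

def simulate_cascade(prob, seeds, rng):
    """One run of an independent-cascade process.

    prob[u][v] is the activation probability on edge u -> v (assumed given).
    Returns the set of nodes that are active when the process stops.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in prob.get(u, {}).items():
                # Each newly active node gets one chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_influence(prob, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of R(S): mean number of active nodes over runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(prob, seeds, rng)) for _ in range(runs))
    return total / runs

# Hypothetical chain a -> b -> c with certain and impossible activation.
certain = {"a": {"b": 1.0}, "b": {"c": 1.0}}
never = {"a": {"b": 0.0}, "b": {"c": 0.0}}
```

With probability 1.0 on every edge the whole chain is activated from seed `a`, and with probability 0.0 only the seed stays active, which gives two easy sanity checks before trying intermediate values.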
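
The community partition on the Step2 slide is a label-propagation procedure: every node starts with its own label and repeatedly adopts the label held by the majority of its influenced neighbors. A rough sketch follows, assuming the influenced-neighbor sets have already been computed with the diffusion model. The function name `propagate_labels`, the iteration cap, the in-place update order, and the smallest-label tie-break are assumptions; the slides do not specify them.

```python
from collections import Counter

def propagate_labels(influenced, max_iters=20):
    """influenced[v] is the set of v's influenced neighbors (assumed precomputed).

    Returns {node: community_label} after label propagation.
    """
    nodes = sorted(influenced)
    # Each node is assigned a unique community label from 1 to N.
    labels = {v: i for i, v in enumerate(nodes, start=1)}
    for _ in range(max_iters):
        changed = False
        for v in nodes:
            counts = Counter(labels[u] for u in influenced[v])
            if not counts:
                continue  # no influenced neighbors: keep the current label
            top = max(counts.values())
            # Tie-break by smallest label (an assumption, not from the slides).
            new_label = min(lab for lab, c in counts.items() if c == top)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels

# Hypothetical input: two triangles with no influence between them.
influenced = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = propagate_labels(influenced)
```

On this toy input the two triangles end up with two distinct labels, one per triangle. The community combination step from the same slide is not covered by this sketch.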
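
The two Step3 slides walk through the recurrence R[m,k] = max{R[m-1,k], R[M,k-1] + ΔRm} for M = 3 communities. The sketch below replays one row of that table at a time using the marginal gains printed on the slides. The function name `select_community`, the boundary value R[0,k] = 0, and keeping the earlier community on ties are assumptions, chosen so that the output matches the slide values.

```python
def select_community(delta, prev_total):
    """One row of the table: choose the community for the k-th seed.

    delta[m-1] is the marginal gain ΔR_m of mining the k-th node in
    community m; prev_total is R[M, k-1].
    Returns (R[M, k], 1-based index of the chosen community s[M, k]).
    """
    best, choice = 0.0, None  # R[0, k] = 0 is an assumption
    for m, gain in enumerate(delta, start=1):
        candidate = prev_total + gain
        # R[m, k] = max{R[m-1, k], R[M, k-1] + ΔR_m}; ties keep the earlier one.
        if candidate > best:
            best, choice = candidate, m
    return best, choice

# k = 1: gains from the first slide, with R[3, 0] = 0.
r1, c1 = select_community([0.2, 0.3, 0.1], 0.0)
# k = 2: ΔR2 has dropped to 0.06 on the second slide, and R[3, 1] = r1.
r2, c2 = select_community([0.2, 0.06, 0.1], r1)

print(c1, round(r1, 2))  # 2 0.3
print(c2, round(r2, 2))  # 1 0.5
```

The result agrees with the slides: the top-1 node is mined in C2 with R[3,1] = 0.3, and the second node is mined in C1 with R[3,2] = 0.5. Computing the ΔRm values themselves requires running the greedy algorithm inside each community, which this sketch leaves out.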