Separation of Clusters

Energy Models for Graph Clustering
Bo-Young Kim
Applied Algorithm Lab, KAIST
 Graph clusters : groups of densely connected nodes
 A common structure shared by many real-world system
 Decomposed into subsystems : strong intra-subsystem interaction,
weak inter-subsystem interaction.
e.g. groups of friends, cohesive modules in software systems.
 Graph layout : nodes → positions in low-dimensional Euclidean space
 Goal: Introduce and evaluate measures(=energy models) that
quantify how well a given graph layout reflects the graph clusters
 Node-repulsion LinLog
 Edge-repulsion LinLog
 Interpretability vs. Readability
 The most energy based graph layout technique : produce easily
readable box-and-line visualization of graphs
e.g. Goal of Fruchterman and Reingold model(1991)
 Interpretable layout : positions of Euclidean distances of nodes
reflect certain properties of the graph
• The density of subgraphs (This case)
• Graph-theoretic distances of nodes
• Direction of edges in directed graphs
Basic Definitions
 A graph G=(V,E)
 Assume that connected, undirected, no edge weight
 A d-dimensional layout of the graph G
a vector
of node positions
 The distance of u and v in p :
Clustering Criteria
 A subgraph with many internal edges and few edges to the remaining
graph  a graph cluster
 Coupling measures: a smaller coupling  a better clustering
 The cut
 The node-normalized cut
 The edge-normalized cut
 Shi and Malik’s normalized cut
 Expansion
 Conductance
 Newman’s modularity
The Cut
 The cut
for two disjoint sets of nodes V1 and V2
 Expected cut between V1 and V2 :
 Biased towards bipartitions :
smaller with | V1 | ≪ | V2 | than | V1 | = | V2 |
The Node-Normalized Cut
 The node-normalized cut (=density of the cut, the ratio of the cut)
 Expected value:
 Still biased towards bipartitions (measure of size : # edge)
smaller with deg(V1) ≪ deg(V2) than deg(V1) = deg(V2)
( ∵ Expected value:
The Edge-Normalized Cut
 The edge-normalized cut
 Expected value:
Other Clustering Criteria
 Shi and Malik’s normalized cut
 biased toward small clusters when
is not fixed.
Other Clustering Criteria
 The expansion
 The conductance
 Bias
• The expansion : biased towards similarly-sized clusters
• The conductance : similar bias (cluster size measure : total deg)
Other Clustering Criteria
 Newman’s Modularity
 Two sets case
 If V1∪V2=V,
 maximizing Q’(V1,V2)
= minimizing
The LinLog Energy Models
 The node-repulsion LinLog energy of a layout p
 The edge-repulsion LinLog energy of a layout p
 Symmetry : Edges cause both attraction and repulsion
Separation of Clusters
 The arithmetic mean and the geometric mean
 G=(V,E), F⊆V(2), a layout p
 Degree-weighted geometric mean :
Separation of Clusters
(Proof) Let the layout p0 be a solution of the minimization problem:
(1) ⇔
Separation of Clusters
Separation of Clusters
(Proof) Similar to the Theorem 1.
Interpretable Distance between Clusters
(Proof) p0 : a layout in P with minimum node-repulsion LinLog energy
d0 : the distance of V1 and V2 in p0.
The distance between all node in V1(resp. V2) = 0
 d0 : a minimum of
Interpretable Distance between Clusters
(Proof) Similar to the Theorem 3.
Example 1. Airline Routing
Example 2. World Trade
Example 3. Dictionary
Thank you

similar documents