Report

Energy Models for Graph Clustering Bo-Young Kim Applied Algorithm Lab, KAIST Introduction Graph clusters : groups of densely connected nodes A common structure shared by many real-world system Decomposed into subsystems : strong intra-subsystem interaction, weak inter-subsystem interaction. e.g. groups of friends, cohesive modules in software systems. Graph layout : nodes → positions in low-dimensional Euclidean space Goal: Introduce and evaluate measures(=energy models) that quantify how well a given graph layout reflects the graph clusters Node-repulsion LinLog Edge-repulsion LinLog Introduction Interpretability vs. Readability The most energy based graph layout technique : produce easily readable box-and-line visualization of graphs e.g. Goal of Fruchterman and Reingold model(1991) Interpretable layout : positions of Euclidean distances of nodes reflect certain properties of the graph • The density of subgraphs (This case) • Graph-theoretic distances of nodes • Direction of edges in directed graphs Basic Definitions A graph G=(V,E) Assume that connected, undirected, no edge weight A d-dimensional layout of the graph G a vector of node positions The distance of u and v in p : Clustering Criteria A subgraph with many internal edges and few edges to the remaining graph a graph cluster Coupling measures: a smaller coupling a better clustering The cut The node-normalized cut The edge-normalized cut Shi and Malik’s normalized cut Expansion Conductance Newman’s modularity The Cut The cut for two disjoint sets of nodes V1 and V2 Expected cut between V1 and V2 : Biased towards bipartitions : smaller with | V1 | ≪ | V2 | than | V1 | = | V2 | The Node-Normalized Cut The node-normalized cut (=density of the cut, the ratio of the cut) Expected value: Still biased towards bipartitions (measure of size : # edge) smaller with deg(V1) ≪ deg(V2) than deg(V1) = deg(V2) ( ∵ Expected value: ) The Edge-Normalized Cut The edge-normalized cut Expected value: Other Clustering Criteria Shi and Malik’s normalized cut biased toward small clusters when is not fixed. Other Clustering Criteria The expansion The conductance Bias • The expansion : biased towards similarly-sized clusters • The conductance : similar bias (cluster size measure : total deg) Other Clustering Criteria Newman’s Modularity Two sets case If V1∪V2=V, maximizing Q’(V1,V2) = minimizing The LinLog Energy Models The node-repulsion LinLog energy of a layout p The edge-repulsion LinLog energy of a layout p Symmetry : Edges cause both attraction and repulsion Separation of Clusters The arithmetic mean and the geometric mean G=(V,E), F⊆V(2), a layout p Degree-weighted geometric mean : Separation of Clusters (Proof) Let the layout p0 be a solution of the minimization problem: ∙∙∙(1) Let (1) ⇔ ⇔ ⇔ Separation of Clusters ⇔ ⇔ ∙∙∙(2) Separation of Clusters (Proof) Similar to the Theorem 1. Interpretable Distance between Clusters (Proof) p0 : a layout in P with minimum node-repulsion LinLog energy d0 : the distance of V1 and V2 in p0. The distance between all node in V1(resp. V2) = 0 d0 : a minimum of Interpretable Distance between Clusters (Proof) Similar to the Theorem 3. Example 1. Airline Routing Example 2. World Trade Example 3. Dictionary Thank you