Report

Approximating the Community Structure of the Long Tail Akshay Java, Anupam Joshi, Tim Finin Problem Statement Community Structure of the Long Tail Original Adjacency Matrix “Approximate the membership of the blogs in the long tail using only a small portion of the entire graph” Motivation It is expensive to compute the community structure for large networks. Blog graphs are large, but extremely sparse and exhibit the long tail/ core-periphery structure. Head Long Tail CORE PERIPHERY Intuition Communities emerge around the core (A). The membership of the blogs in the long tail (B) can be approximated by the structure of the core and how the long tail links to the core. Detecting Communities Approximation Methods A community is a set of nodes that has more connections within the set than outside it. Normalized Cut 1 1 NCut(A,B) Cut(A,B) Vol(A) Vol(B) L DW I D 2 *W * D Singular Value Decomposition W U r r VrT (reduced rank) Sampling-based Technique By Nyström Approximation (Fowlkes et al.) A B A L T T B C B The graph can be partitioned using the eigenspectrum of the Laplacian. (Shi and Malik) 1 Sorted (degree) Adjacency Matrix 1 2 A UU T B U T 1 T U A U B B T A 1B B TU1 Thus eigenvectors of L can be easily approximated by sampling. Heuristic Method The second smallest eigenvector of the graph Laplacian is the Fiedler vector. Use the head of the distribution to find the initial communities. The graph can be recursively partitioned using the sign of the values in its Fielder vector. Approximate membership of the tail by using number of links to each community in the head. QuickTime™ and a decompressor are needed to see this picture. Evaluation Conclusions and Ongoing Research Conclusion Community structure of the entire network can be well approximated by sampling and heuristic methods. Advantage Achieve fast, accurate approximation using much smaller sample of the entire graph. Modularity Score Ongoing Research A measure of the quality of community detection Q eii a i 2 i Q = (fraction of intra-community edges)(expected fraction, disregarding communities) Applying the approximation on temporal graphs Clustering graph, tag information simultaneously