Report

Social Network Analysis KSE 652 Social Computing Systems: Design and Analysis Uichin Lee Oct. 16, 2013 Network • Networks are a representation of interaction structure among units. – In the case of social and economic networks, these units (nodes) are individuals or firms. • At some broad level, the study of networks can encompass the study of all kinds of interactions. – – – – – – – – – Information transmission. Web links. Information exchange. Trade. Credit and financial flows. Friendship. Trust. Spread of epidemics. Diffusion of ideas and innovation. Themes • Graph theory (e.g., strong/weak ties, 6 degrees of separation, structural balance) • Game theory (e.g., group interaction modeling, auction) • Markets and strategic interaction on networks • Information network (e.g., web, blogging, search engine) • Network dynamics – Population effects (social practice/network effects) – Structural effects (social contagion/cascading effects) Visual Example – 1 The social network of friendships within a 34-person karate club provides clues to the fault lines that eventually split the club apart (Zachary, 1977) Zachary, Wayne W. "An Information Flow Model for Conflict and Fission in Small Groups." Journal of Anthropological Research 33, no. 4 (1977): 452-473. Visual Example – 2 The network of loans among financial institutions can be used to analyze the roles that different participants play in the financial system, and how the interactions among these roles affect the health of individual participants and the system as a whole (Bech and Atalay 2008) The Topology of the Federal Funds Market." European Central Bank Working Paper Series No. 986, December 2008 Visual Example – 3 The network structure of political blogs prior to the 2004 U.S. Presidential election reveals two natural and well-separated clusters (Adamic and Glance, 2005) "The Political Blogosphere and the 2004 U.S. Election: Divided They Blog.“ KDD 2005 Visual Example – 4 In a network representing international trade, one can look for countries that occupy powerful positions and derive economic benefits from these positions In some settings, such as this map of Medieval trade routes, physical networks constrain the patterns of interaction, giving certain participants an intrinsic economic advantage based on their network position. Visual Example – 5 The spread of an epidemic disease (such as the tuberculosis outbreak shown here) is a form of cascading behavior in a network. The similarities and contrasts between biological and social contagion lead to interesting research questions. (Andre et al. 2007) Transmission Network Analysis to Complement Routine Tuberculosis Contact Investigations." American J. of Public Health 97 Visual Example – 6 Social networks based on communication and interaction can be constructed from the traces left by on-line data. In this case, the pattern of email communication among 436 employees of HP Research Lab is superimposed on the official organizational hierarchy Visual Example – 7 • When people are influenced by the behaviors of their neighbors in the network, the adoption of a new product or innovation can cascade through the network structure. • Here, e-mail recommendations for a Japanese graphic novel spread in a kind of informational or social contagion. (Leskovec 2007) "The Dynamics of Viral Marketing." ACM Transactions on the Web 2007 Graphs • Graph: a way of specifying relationships among a collection of items • A graph consists of a set of objects (called nodes); and these objects are connected by links (called edges) Two graph types: (a) an undirected graph, and (b) a directed graph. Graphs • Path: a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge • Cycle: a path with at least three edges in which the first and last nodes are the same, but otherwise all the nodes are distinct 13-node Internet graph from December 1970 Graphs • Connectivity: a graph is connected if for every pair of nodes, there is a path between them • A directed graph is called weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph Graphs • Connected component of a graph is a subset of the nodes such that (1) every node in the subset has a path to every other; and (2) the subset is not part of some larger set with the property that every node can reach every other • Giant component contains a large fraction of nodes Collaboration graph of biological research center SGPP Connectivity of directed graphs Giant weakly connected component (GWCC) Disconnected components (DC) Giant in-component (GIN), contains all vertices from which the GSCC is approachable GSCC: Giant strongly connected component: the set of vertices reachable from its every vertex by a directed path Giant strongly connected component of directed networks, http://arxiv.org/pdf/cond-mat/0103629.pdf Giant out-component (GOUT), the set of vertices approachable from the GSCC by a directed path Tendrils: the rest of the GWCC, i.e., the vertices which have no access to the GSCC and are not reachable from it Graphs • The Small-world phenomenon (aka. six degrees of separation); world looks “small” when you think of how short a path of friends it takes to get from you to almost anyone else • Erdos number (distance to Paul Erdos); Bacon number (distance to Kevin Bacon) A histogram from Travers and Milgram’s paper on their small-world experiment. For each possible length (labeled “number of intermediaries” on the x-axis), the plot shows the number of successfully completed chains of that length. In total, 64 chains reached the target person, with a median length of six. Graphs • Small-world network: high clustering and low diameter? ring network “Collective Dynamics of ‘Small-World’ Networks,” Watts and Strogatz, Nature (1998) Strong and Weak Ties • Granovetter’s work on “strength of weak ties” • Issues: – How does information flow through a social network? – How can different nodes play structurally distinct roles in this process? – How these structural considerations shape the evolution of the network itself over time? Triadic Closure • If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future • Why? Opportunity of meetings, trust of relationship, incentive (of removing latent stress of managing different circles) The formation of the edge between B and C illustrates the effects of triadic closure, since they have a common neighbor A. Clustering Coefficient • The clustering coefficient of a node A is defined as the probability that two randomly selected friends of A are friends with each other. A: 1/6 A: 1/2 Bridges and Local Bridges • A bridging edge is an edge whose removal causes its endpoints to lie in two different components • Local bridge: if endpoints have no friends in common – i.e., if removal will increase the distance between two endpoints – this distance (after removal of a local bridge) is called the span of a local bridge bridging edge local bridge Strong Triadic Closure • Each edge of the social network is labeled here as either a strong tie (S) or a weak tie (W), to indicate the strength of the relationship. • The labeling in the figure satisfies the Strong Triadic Closure Property at each node: if the node has strong ties to two neighbors, then these neighbors must have at least a weak tie between them Tie Strength and Network Structure • How to measure tie-strength? • Who-talks-whom network (cellphone; Onnela 2007): total minutes spent on phone calls between the two ends of the edge • Neighborhood overlap of an edge connecting A & B – i.e., (# nodes who are neighbors of both A and B) / (# nodes who are neighbors of at least one of A or B) – The strength of weak ties predicts that neighborhood overlap should grow as tie strength grows Structure and tie strengths in mobile communication networks. J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabasi. Proc. Natl. Acad. Sci. USA, 104:7332–7336, 2007. Tie Strength and Network Structure • Neighborhood overlap ratio of A-F Tie Strength on Facebook • Marlow et al.’s analysis on friendship links • Link categories: mutual, one-way communications (ever communicated?), maintained relationships (viewing profile or newsfeeds?) Four different views of a Facebook user’s network neighborhood Cameron Marlow, Lee Byron, Tom Lento, and Itamar Rosenn. Maintained relationships on Facebook, 2009. Tie Strength on Facebook • The number of links corresponding to maintained relationships, one-way communication, and reciprocal communication as a function of the total neighborhood size for users on Facebook Cameron Marlow, Lee Byron, Tom Lento, and Itamar Rosenn. Maintained relationships on Facebook, 2009. Tie Strength on Facebook Self-report tie-strength (of each friend) Usage data Eric Gilbert and Karrie Karahalios , Predicting Tie Strength With Social Media CHI 2009 number of friends Tie Strength on Twitter number of followees The total number of a user’s strong ties (# friends) as a function of the number of followees he or she has on Twitter Bernardo A. Huberman, Daniel M. Romero, and Fang Wu. Social networks that matter: Twitter under the microscope. First Monday, 14(1), January 2009. Embeddedness • Embeddedness of an edge in a network: the number of common neighbors shared by the two endpoints • Sociology research: if two individuals are connected by an embedded edge (i.e., embeddedness > 0), it is easier for them to trust one another Structural Holes • Contrast between densely-knit groups and boundaryspanning links is reflected in the different positions of nodes A and B in the underlying social network (B: local bridge) B’s position offers advantages: • Info access • Amplifier for creativity • Social gatekeeping Social Capital • Somewhat difficult to define, but “Consensus is growing in the literature that social capital stands for the ability of actors to secure benefits by virtue of membership in social networks/social structure” (Alejandro Porte 1998) • Just as a screwdriver (physical capital) or a university education (cultural capital or human capital) can increase productivity (both individual and collective), so do social contacts affect the productivity of individuals and groups (Putnam 2000) • Social capital as a tension between closure (groups) and brokerage (interactions) [Burt 2000], relating to “bonding” and “bridging” social capital, respectively [Putnam 2000] Alejandro Portes. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24:1–24, 1998. Stephen P. Borgatti, Candace Jones, and Martin G. Everett. Network measures of social capital. Connections, 1998. Ronald S. Burt. The network structure of social capital. Research in Organizational Studies, 22:345–423, 2000. Robert D. Putnam. Bowling Alone: The Collapse and Revival of American Community. Simon & Schuster, 2000. Betweenness Measure and Graph Partitioning • Betweenness of an edge: total amount of flow it carries, counting flow between all pairs of nodes using this edge • Flow between A and B divides itself evenly along all the possible shortest paths from A to B (if there are k shortest paths from A and B, then 1/k units of flow pass along each one Betweenness Measure and Graph Partitioning • Breaking a network down into a set of tightly-knit regions with sparser interconnections between the regions • General approaches to graph partitioning – Divisive: identifying/removing “spanning links” between densely-connected regions – Agglomerative: gluing nodes together into regions (bottom up) • Girvan and Newman’s divisive method (greedily removing an edge with highest edge betweeness) Betweenness Measure and Graph Partitioning Homophily • Homophily – Tendency of being similar to our friends; birds of a feather flock together! • Understanding possible link formation: – Intrinsic (structural) factors: a friendship formed as two people are introduced through a common friend • Related to “triadic closure” properties; increased opportunities, trust of relationship, incentives of facilitating relationship – Contextual factors (or social context): a friendship formed as two people attended the same school or work for the same company • How a network’s surrounding contexts can drive the formation of its links? Homophily Homophily can produce a division of a social network into densely-connected, homogeneous parts that are weakly connected to each other. In this social network from a town’s middle school and high school, two such divisions in the network are apparent: one based on race (with students of different races drawn as differently colored circles), and the other based on friendships in the middle and high schools respectively Race, school integration, and friendship segregation in America. American Journal of Sociology, 2001 Measuring Homophily • Testing homophily (i.e., testing the change of random mixing) – In a network with no homophily, friendships are being formed as if there were random mixing across the given characteristics • Example: a network with p% of males, q% of females (p+q=1) – If the fraction of cross-gender edges is significantly less than 2pq, then there is evidence for homophily Using a numerical measure, one can determine whether small networks such as this one (with nodes divided into two types) exhibit homophily: p = 2/3, q = 1/3 2pq = 4/9 = 8/18 => 8 cross gender edges vs. 5 cross gender edges (reality) => This shows some evidence of homophily Mechanisms Underlying Homophily • Selection: individual characteristics drive the formation of links • Social influence: existing links in the network serve to shape people’s (mutable) characteristics • Interplay of selection and social influence – Ex) teenage friends: teenagers seek out social circles composed of people like them + peer pressure causes them to conform to behavioral patterns within their social circles Nicholas A. Christakis and James H. Fowler. The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4):3700–379, July 2007. Homophily and Health • Longitudinal study (32 years) of ~12K people, – Obesity status: BMI (body mass index) = weight / height^2 – Social network structure: list of friends, siblings, spouse, neighbors • Results: non-obese people clustered in the network in a fashion consistent with homophily • Why? Selection (friending w/ similar obesity status), social influence (i.e., behavioral influence among friends), confounding effects (homophily vs. other factors than obesity) Homophily and Spatial Segregation • Global patterns of spatial segregation can arise from the effect of homophily operating at a local level (Schelling model) The tendency of people to live in racially homogeneous neighborhoods produces spatial patterns of segregation that are apparent both in everyday life and when superimposed on a map — as here, in these maps of Chicago from 1940 and 1960. In blocks colored yellow and orange the percentage of AfricanAmericans is below 25, while in blocks colored brown and black the percentage is above 75. Chicago, 1940 Chicago, 1960 Markus M. M¨obius and Tanya S. Rosenblat. The process of ghetto formation: Evidence from Chicago, 2001. Affiliation • Being part of a group (e.g., company, organization); frequenting a particular place; pursuing a particular hobby or interest • These are all activities that (when shared between two people) tend to increase the likelihood that they will interact and hence form a link in the social network • Such activities are called “foci”, i.e., focal points of social interaction, constituting “social, psychological, legal, or physical entities around which joint activities are organized (e.g., workplaces, voluntary organizations, hangouts, etc.)” Affiliation Network • An affiliation network is a bipartite graph that shows which individuals are affiliated with which groups or activities. • Here, Anna participates in both of the social foci on the right, while Daniel participates in only one. Social Affiliation Network • A social-affiliation network shows both the friendships between people and their affiliation with different social foci. • (co-evolution) structural pattern of memberships can reveal subtleties in the interactions among both members and groups Social Affiliation Network • Closure process: closing the third edge of a triangle in the network Triadic closure (selection; forming links to others who Focus Focal closure share characteristics with you) (social influence) Membership closure Tracking Closure Process Online • We can address these questions empirically using network data as follows. – (i) We take two snapshots of the network at different times. – (ii) For each k, we identify all pairs of nodes who have exactly k friends in common in the first snapshot, but who are not directly connected by an edge. – (iii) We define T(k) to be the fraction of these pairs that have formed an edge by the time of the second snapshot. This is our empirical estimate for the probability that a link will form between two people with k friends in common. – (iv) We plot T(k) as a function of k to illustrate the effect of common friends on the formation of links Tracking Closure Process Online • T(0) is the rate at which link formation happens when it does not close a triangle, while the values of T(k) for larger k determine the rate at which link formation happens when it does close a triangle. • Comparison between T(0) and these other values addresses the most basic question about the power of triadic closure. Tracking Closure Process Online: Triadic closure Quantifying the effects of triadic closure in an e-mail dataset T(k) = 1-(1-p)^k T’(k) = 1-(1-p)^k-1 Number of common friends Suppose that for some small probability p, each common friend that two people have gives them an independent probability p of forming a link. So if two people have k friends in common, the probability they fail to form a link on any given day is (1−p)^k Empirical analysis of an evolving social network. Science, 311:88–90, 2006. Tracking Closure Process Online: Focal closure Quantifying the effects of focal closure in an e-mail dataset # common foci (=# of common classes) Empirical analysis of an evolving social network. Science, 311:88–90, 2006. Prob. of joining a community Tracking Closure Process: membership closure # friends who are already member Quantifying the effects of membership closure in a large online dataset: The plot shows the probability of joining a LiveJournal community as a function of the number of friends who are already members. Friendships are designated by users in their profiles, and foci correspond to membership in user-defined LiveJournal communities Group formation in large social networks: Membership, growth, and evolution, KDD 2006 Tracking Closure Process: Interplay • Similarity in Wikipedia: if editor A has edited the Wikipedia articles on Ithaca NY and Cornell University, and editor B has edited the articles on Cornell University and Stanford University, then their similarity under this measure is 1/3, since they have jointly edited one article (Cornell) out of three that they have edited in total (Cornell, Ithaca, and Stanford). Similarity The average similarity of two editors on Wikipedia, relative to the time (0) at which they first communicated. Each unit corresponds to a single Wikipedia action taken by either of the two editors. The curve increases both before and after the first contact at time 0, indicating that both selection and social influence play a role; the increase in similarity is steepest just before time 0. # of edits after first communication Feedback effects between similarity and social influence in online communities, KDD 2008 Structural Balance • Balanced if a complete graph is divided into two sets of mutual friends • Applications: international relationships (politics) mutual antagonism between all sets If a complete graph can be divided into two sets of mutual friends, with complete mutual antagonism between the two sets, then it is balanced. Further, this is the only way for a complete graph to be balanced. A complete graph is weakly balanced precisely when it can be divided into multiple sets of mutual friends, with complete mutual antagonism between each pair of sets. Summary • Graphs: connectivity, component, giant component, small-world phenomenon • Strong/weak ties: triadic closure, clustering coefficients, bridges/local bridges, strong triadic closure, social capital, graph partitioning (betweenness) • Homophily: testing, underlying mechanisms, implications (health, spatial segregation) • Affiliation: social affiliation network; closure process • Structural balance