Report

Active Learning for Networked Data Based on Non-progressive Diffusion Model Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing Dept. of Computer Science and Technology Tsinghua University, China An Example An Example Instances Correlation An Example ? ? ? Instances ? Correlation ? ? Classify each instance into {+1, -1} An Example ? +1 ? Instances +1 Correlation -1 ? An Example ? +1 ? Instances +1 Correlation -1 ? Query for label An Example ? +1 ? Instances +1 Correlation -1 -1 Problem: Active Learning for Networked Data Challenge+1 ? ? It is expensive to query for labels! Questions Instances +1 Which instances should we select to query? Correalation -1 instances do we need to query, for an accurate classifier? How many ? Challenges Active Learning for Networked Data How to leverage network correlation among instances? How to query in a batch mode? Batch Mode Active Learning for Networked Data Given a graph G (VU ,VL , y L , E , X) Features Matrix Labels of labeled instances Labeled instances Unlabeled instances Edges Our objective is maxVs VU Q(VS ) Subject to The utility function A subset of unlabeled instances | VS | k Labeling budget Factor Graph Model ? ? ? Variable Node ? Factor Node ? ? Factor Graph Model The joint probability Local factor function Edge factor function 1 T T P(y | y L ; θ) exp λ f ( yi , xi ) β g ( yi , y j ) Z ( vi , v j )E vi V Log likelihood of labeled instances O (θ) log y|y L exp θ S - log y L exp θ S T T Factor Graph Model Learning Gradient descent O E P ( y | y L ;θ ) S E P ( y ; θ ) S θ Calculate the expectation: Loopy Belief Propagation (LBP) Message from variable to factor y f (xi ) f Message from factor to variable f y (xi ) ~ y f (x f ) y N ( f )\{ y } y f (x j ) i * N ( yi )\ f 1 f ( xi ) y * i 1 i i j i j Question: How to select instances from Factor graph for active learning? Basic principle: Maximize the Ripple Effects ? ? ? ? ? ? Maximize the Ripple Effects ? ? ? ? ? Labeling information is propagated +1 Maximize the Ripple Effects ? ? ? ? ? Labeling information is propagated +1 Maximize the Ripple Effects ? ? ? ? Labeling information is propagated ? Statistical bias is propagated +1 How to model the propagation process in a unlabeled network? Diffusion Model Linear Threshold Model Each instance has a threshold () Each instance at time has two statuses = 0 (inactive) or 1 (active) Each instance has a set of neighbors () Progressive Diffusion Model = 1 iff ∈() −1 ≥ () or −1 = 1 Non-Progressive Diffusion Model = 1 iff ∈() −1 ≥ () Linear Threshold Maximize the Ripple Effects ? Will it be dominated by labeling Based ?on non-progressive diffusion model ? information (active) or statistical bias (inactive)? Maximize the number of activated instances in the end ? Labeling information is An instance has an uncertainty measure μ() propagated ? We aim to activate the most uncertain instances! Statistical bias is propagated +1 Instantiate the Problem Active Learning Based on Non-Progressive Diffusion Model maxVS VU {maxVT VU | VT |} , With constraints f 0 (v) 1 v VS | VS | k The number of activated instances Initially activate all queried instances M s.t. v VT M , f (v) 1 All instances in should be active after convergence v VU \ VT , u VT , (v) (u ) We activate the most uncertain instances f (v) 1 uN ( v ) f 1 (u ) t (v) Based on the non-progressive diffusion Reduce the Problem The original problem Fix | |, maximize | | The reduced problem Fix | |, minimize | | Constraints are inherited. Reduction procedure Enumerate | | by bisection. Solve the reduced problem. Algorithm The reduced problem Fix | |, minimize | | The key idea Find a superset ( ⊆ ) Such that there exists a subset ( ⊆ ) If we initially activate , we can activate finally Algorithm Input: = , for each instance Output: Initialize to be top uncertain instances; For each iteration: greedily select a set with minimum thresholds from \V , while satisfying the constraint that each instance ∈ has at least () neighbors in ∪ ; ← ∪ ; if = ∅ then converges; Greedily select a set with minimum degrees from , while satisfying the constraint that each instance ∈ has at least () neighbors in ; Return ; Theoretical Analysis Convergence Lemma 1 The algorithm will converge within O (| VU | | VT |) time. Correctness Theorem 1 If the algorithm converges, is a feasible solution, i.e., if we initially label , we will activate finally. Approximation Ratio Theorem 2 Let , be the solution given by the algorithm, , represent the optimal solution. Let Δ be the max degree of instances and suppose ≤ (). Then we have | Vs , g | | Vs ,opt ( )2 | (1 ) Avg[2t (v) d (v)] Experiments Datasets Datasets #Variable node #Factor node Coauthor 6,096 24,468 Slashdot 370 1,686 Mobile 314 513 Enron 100 236 Comparison Methods Batch Mode Active Learning (BMAL), proposed by Shi et al. Influence Maximization Selection (IMS), proposed by Zhuang et al. Maximum Uncertainty (MU) Random (RAN) Max Coverage (MaxCo), our method Experiments Performance Related Work Active Learning for Networked Data Actively learning to infer social ties H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin and X. Wang Batch mode active learning for networked data L. Shi, Y. Zhao and J. Tang Towards active learning on graphs: an error bound minimization approach Q. Gu and J. Han Integreation of active learing in a collaborative crf O. Martinez and G. Tsechpenakis Diffusion Model On the non-progressive spread of influence through social networks M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni and S. S. Sadeghabad Maximizing the spread of influence through a social network D. Kempe, J. Kleinberg and E. Tardos Conclusion Connect active learning for networked data to non-progressive diffusion model, and precisely formulate the problem Propose an algorithm to solve the problem Theoretically guarantee the convergence, correctness and approximation ratio of the algorithm Empirically evaluate the performance of the algorithm on four datasets of different genres Future work Consider active learning for networked data in a streaming setting, where data distribution and network structure are changing over time About Me Zhilin Yang [email protected] 3rd year undergraduate at Tsinghua Univ. Applying for PhD programs this year Data Mining & Machine Learning Thanks! [email protected]