Report

1 Incremental Update on Probabilistic Frequent Itemsets in Uncertain Databases Ming-Yen Lin , Cheng-Tai Fu , Sue-Chen Hsueh ICUIMC 2010 2 Outline • • • • • Motivation Problem Definition Method Experimental Result Conclusion 3 Motivation • Most algorithms focus on improving the mining efficiency with the assumption that the database is static. • Some patterns may become obsolete and new ones may emerge due to updates. Re-mining the whole uncertain database from scratch is very time consuming owing to the frequentness probabilities computations. 4 Problem Definition • X:itemsets • P(wj):possibe world • Sup(X,wj):support of itemset X in a possible world wj • The probability that the support of X in the uncertain database equals i, denoted as Pi(X) is 5 (Cont.) • P(w8) = { t1= {a, b}, t2 = {a, c, d}} is = 1.0 × 0.5 × 0.4 × 0.2 × 1.0 = 0.04 • P(w6) = { t1 = {a, b}, t2 = {c, d}} = 1.0 × 0.5 × 1 − 0.4 × 0.2 × 1.0 = 0.06 6 (Cont.) • P2({a}) = P(w3) + P(w4) + P(w7) + P(w8) = 0.16 + 0.04 + 0.16 + 0.04 = 0.4 • P1({c,d}) = P(w2) + P(w4) + P(w6) + P(w8) = 0.06 + 0.04 + 0.06 + 0.04 = 0.2 7 (Cont.) • the probability of itemset X is frequent in the uncertain database, named frequentness probability of X and denoted by Pfreq(X), is 8 (Cont.) • minsup=1.0,minprob=0.5 • Pfreq({a}) = P1({a}) + P2({a}) =0.6 + 0.4 = 1.0 • Pfreq({c, d}) = P1({c ,d}) + P2({c, d}) = 0.2 +0 = 0.2 < minprob 9 Approximation Method • F :cumulative distribution function of the Poisson distribution • :the expected support of itemset X • The frequentness probability of itemset X in an uncertain database D, denoted as Pfreq(X), can be well-approximated by a Poisson distribution as: 10 (Cont.) • THEOREM 1. Pfreq(X), if approximated by Equation ,increases monotonically with 11 P-FUP(Probabilistic-Fast Update pattern) Algorithm • DB: the original database • db: the incremental database(having newly appended transactions is produced after the latest mining, using ms and minprob, in DB) • UD: the union of DB and db 12 (Cont.) • Pass One: Pruning Away the Infrequent 1itemset, Generating C1, and Discovering L’1 • LEMMA 1. A 1-itemset X in L1 is infrequent in UD if and only if . < • LEMMA 2. A 1-itemset X not in L1 can become a probabilistic frequent itemset in UD only if . ≥ 13 (Cont.) 14 (Cont.) • Pass k: Pruning Other Infrequent k-itemset, Pruning Candidates, and Discovering Remaining L’k ≥ 2 • LEMMA 3. If a (k-1)-itemset X is infrequent at the (k-1)-th pass,that is, X is in Lk-1 but not in L’k-1, a probabilistic frequent k-itemset in Lk containing the itemset cannot become a probabilistic frequent itemset in the k-th pass. • LEMMA 4. A k-itemset X in Lk is infrequent in UD if and only if . < • LEMMA 5. A k-itemset X not in Lk can become a probabilistic frequent itemset in UD only if . ≥ 15 (Cont.) 16 Experimental Result 17 Experimental Result 18 Conclusion • In this paper, we have proposed the p-FUP algorithm to efficiently solve the problem.