AMCS/CS 340: Data Mining
Association Rules
Xiangliang Zhang
King Abdullah University of Science and Technology

Outline: Mining Association Rules
• Motivation and Definition
• High computational complexity
• Frequent itemsets mining
  – Apriori algorithm: reduce the number of candidates
  – Frequent-Pattern tree (FP-tree)
• Rule Generation
• Rule Evaluation
• Mining rules with multiple minimum supports

The Market-Basket Model
• A large set of items, e.g., things sold in a supermarket
• A large set of baskets, each of which is a small set of the items, e.g., each basket lists the things one customer buys in a single visit
• Goal: find “interesting” connections between items
• Can be used to model any many-to-many relationship, not just in the retail setting

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

The Frequent Itemsets
• Simplest question: find sets of items that appear together “frequently” in baskets

Definition: Frequent Itemset
• Itemset: a collection of one or more items, e.g., {Milk, Bread, Diaper}
• k-itemset: an itemset that contains k items
• Support count (σ): frequency of occurrence of an itemset, e.g., σ({Milk, Bread, Diaper}) = 2
• Support (s): fraction of transactions that contain an itemset, e.g.,
s({Milk, Bread, Diaper}) = 2/5
• Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold

Example: set minsup = 0.5. The frequent 2-itemsets are {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper}, and {Diaper, Beer}.

Definition: Association Rule
• Association rule: an if-then rule, i.e., an implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}

Rule Evaluation Metrics
• Support (s): fraction of transactions that contain both X and Y
• Confidence (c): measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

$$s = \frac{\sigma(\{\text{Milk, Diaper, Beer}\})}{|T|} = \frac{2}{5} = 0.4$$

$$c = \frac{\sigma(\{\text{Milk, Diaper, Beer}\})}{\sigma(\{\text{Milk, Diaper}\})} = \frac{2}{3} \approx 0.67$$

Application of Association Rule
1. Items = products; Baskets = sets of products a customer bought. Many people buy beer and diapers together, so a store might run a sale on diapers and raise the price of beer.
2. Items = documents; Baskets = sets of documents containing a similar sentence. Items (documents) that appear together too often could represent plagiarism.
3.
Items = words; Baskets = Web pages. Co-occurrence of relatively rare words may indicate an interesting relationship.

Association Rule Mining Task
• Given a set of transactions T, the goal of association rule mining is to find all rules having
  - support ≥ minsup threshold
  - confidence ≥ minconf threshold
• Brute-force approach:
  - List all possible association rules
  - Compute the support and confidence for each rule
  - Prune rules that fail the minsup and minconf thresholds
  This is computationally prohibitive!

Mining Association Rules
Example of rules (from the five market-basket transactions):
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements

Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation – generate all itemsets whose support ≥ minsup
2.
Rule Generation – generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.

Computational Complexity
Given d unique items in all transactions:
• Total number of itemsets = $2^d - 1$
• Total number of possible association rules:

$$R = \sum_{k=1}^{d-1} \left[ \binom{d}{k} \times \sum_{j=1}^{d-k} \binom{d-k}{j} \right] = 3^d - 2^{d+1} + 1$$

If d = 6, R = 602 rules. In practice d (the number of items) can be 100K (Wal-Mart).

Reduce the number of candidates
A complete search for frequent itemsets would examine all $2^d - 1$ itemsets.
Apriori principle, for all itemsets X, Y with X ⊆ Y (every transaction that contains Y also contains X, so s(X) ≥ s(Y)):
- If an itemset is frequent, then all of its subsets must also be frequent: if Y is frequent, all X are frequent
- If an itemset is infrequent, then all of its supersets must also be infrequent: if X is infrequent, all Y are infrequent

Illustrating Apriori Principle
Database D (minimum support = 3/5):

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Scan D: items (1-itemsets)

Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Eliminate: frequent items

Item    Count
Bread   4
Milk    4
Beer    3
Diaper  4

Generate, then scan D: pairs (2-itemsets)

Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Eliminate: frequent pairs

Itemset          Count
{Bread, Milk}    3
{Bread, Diaper}  3
{Milk, Diaper}   3
{Beer, Diaper}   3

Generate: triplets (3-itemsets)
{Bread, Milk, Diaper}, {Bread, Diaper, Beer}, {Milk, Diaper, Beer}

Prune: only {Bread, Milk, Diaper} remains, because {Bread, Beer} and {Milk, Beer} are infrequent

Scan D:

Itemset                Count
{Bread, Milk, Diaper}  2

Not a frequent 3-itemset (2 < 3).

Apriori Algorithm
• Let k = 1
• Generate frequent itemsets of length 1
• Repeat until no new frequent itemsets are identified:
  1. Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  2. Prune candidate itemsets containing subsets of length k that are infrequent
  3. Count the support of each candidate by scanning the DB
  4. Eliminate candidates that are infrequent, leaving only those that are frequent

Factors Affecting Complexity
Choice of minimum support threshold
• Lowering the support threshold results in more frequent itemsets
• This may increase the number of candidates and the maximum length of frequent itemsets
Dimensionality (number of items) of the data set
• More space is needed to store the support count of each item
• If the number of frequent items also increases, both computation and I/O costs may also increase
Size of database
• Since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions
Average transaction width
• Transaction width increases with denser data sets
• This may increase the maximum length of frequent itemsets

Mining Frequent Patterns Without Candidate Generation
• Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
  - Highly condensed, but complete for frequent pattern mining
  - Avoids costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method
  - A divide-and-conquer methodology:
decompose mining tasks into smaller ones
  - Avoid candidate generation: sub-database test only!

FP-tree construction
Header table:

Item  Frequency
a     8
b     7
c     6
d     5
e     3

Steps:
1. Scan the DB once to find the frequent 1-itemsets (single-item patterns)
2. Scan the DB again to construct the FP-tree (one transaction, one path in the FP-tree)

FP-tree construction: pointers
• The header table holds a pointer for each item (a, b, c, d, e)
• Pointers are used to assist frequent itemset generation

Benefits of the FP-tree Structure
Completeness
• Never breaks a long pattern of any transaction
• Preserves complete information for frequent pattern mining
Compactness
• One transaction, one path in the FP-tree
• Paths may overlap: the more the paths overlap, the more compression is achieved
• Never larger than the original database (not counting node-links and counts)
• Best case: only a single branch of nodes
• The size of an FP-tree depends on how the items are ordered

Different FP-tree by ordering items
• Header table in decreasing frequency order: a (8), b (7), c (6), d (5), e (3)
• Header table in increasing frequency order: e (3), d (5), c (6), b (7), a (8)

Generating frequent itemsets
FP-Growth algorithm:
• Works bottom-up to derive the frequent itemsets ending with a particular item
• Paths can be accessed rapidly using the pointers associated with that item (e.g., e)
• Decomposes the frequent itemset generation problem into multiple sub-problems

Example: find frequent itemsets including e
How to solve the sub-problem?
• Start from the paths containing node e
• Construct the conditional tree for e (if e is frequent)
• Prefix paths ending in {de} → conditional tree for {de}
• Prefix paths ending in {ce} → conditional tree for {ce}
• Prefix paths ending in {ae} → conditional tree for {ae}

How to solve the sub-problem, step by step:
1. Check whether e is frequent
2.
Convert the prefix paths into a conditional FP-tree
   1) Update support counts
   2) Remove node e
   3) Ignore infrequent node b
Count(e) = 3 > minsup = 2
Frequent itemsets: {e}

Sub-problem: prefix paths and conditional tree with {de}
1. Check whether d is frequent in the prefix paths ending in {de}
   Count(d,e) = 2 (minsup = 2)
   Frequent itemsets: {e}, {d,e}
2. Convert the prefix paths (ending in {de}) into the conditional FP-tree for {de}
   1) Update support counts
   2) Remove node d
   3) Ignore infrequent node c
   Frequent itemsets: {e}, {d,e}
3. Check whether a is frequent in the prefix paths ending in {ade}
   Count(a) = 2 (minsup = 2)
   Frequent itemsets: {e}, {d,e}, {a,d,e}

Sub-problem: prefix paths and conditional tree with {ce}
1. Check whether c is frequent in the prefix paths ending in {ce}
   Count(c,e) = 2 (minsup = 2)
   Frequent itemsets: {e}, {d,e}, {a,d,e}, {c,e}
2. Convert the prefix paths (ending in {ce}) into the conditional FP-tree for {ce}
   1) Update support counts
   2) Remove node c
   Frequent itemsets: {e}, {d,e}, {a,d,e}, {c,e}
3. Check whether a is frequent in the prefix paths ending in {ace}
   Count(a) = 1 < minsup = 2, so {a,c,e} is not frequent
   Frequent itemsets: {e}, {d,e}, {a,d,e}, {c,e}

Sub-problem: prefix paths and conditional tree with {ae}
1.
Check whether a is frequent in the prefix paths ending in {ae}
   Count(a,e) = 2 (minsup = 2)
   Frequent itemsets: {e}, {d,e}, {a,d,e}, {c,e}, {a,e}

Mining Frequent Itemset using FP-tree
General idea (divide-and-conquer)
• Recursively grow frequent pattern paths using the FP-tree
• At each step, a conditional FP-tree is constructed by updating the frequency counts along the prefix paths and removing all infrequent items
Properties
• Sub-problems are disjoint, so no duplicate itemsets are generated
• FP-growth is an order of magnitude faster than the Apriori algorithm (depending on the compaction factor of the data)
  - No candidate generation, no candidate test
  - Uses a compact data structure
  - Eliminates repeated database scans
  - Basic operations are counting and FP-tree building

Compacting the Output of Frequent Itemsets: Maximal vs Closed Itemsets
• Maximal frequent itemsets: no immediate superset is frequent
• Closed itemsets: no immediate superset has the same count (> 0). They store not only which itemsets are frequent, but also the exact counts.
• Containment: Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets

Example: Maximal vs Closed Itemsets (figure not reproduced)

Rule Generation
• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement
• If {A,B,C,D} is a frequent itemset, the candidate rules are:
  ABC → D, ABD → C, ACD → B, BCD → A,
  A → BCD, B → ACD, C → ABD, D → ABC,
  AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
• If |L| = k, then there are $2^k - 2$ candidate association rules (ignoring L → ∅ and ∅ → L)

Rule Generation
How can rules be generated efficiently from frequent itemsets?
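As a baseline before any pruning, the $2^k - 2$ candidate rules of a frequent itemset can simply be enumerated and filtered by confidence. The following is a minimal Python sketch under stated assumptions: the function names (`support_count`, `candidate_rules`, `confident_rules`) are illustrative and not part of the course material, and the data is the five-transaction market-basket example from these notes.

```python
from itertools import combinations

# The five market-basket transactions from the running example.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def candidate_rules(itemset):
    """Yield each (antecedent, consequent) binary partition of `itemset`."""
    items = sorted(itemset)
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            lhs = frozenset(antecedent)
            yield lhs, frozenset(items) - lhs


def confident_rules(itemset, transactions, minconf):
    """Keep the candidate rules whose confidence is at least `minconf`.

    Assumes `itemset` occurs in at least one transaction, so every
    antecedent has a non-zero support count.
    """
    whole = support_count(itemset, transactions)
    kept = []
    for lhs, rhs in candidate_rules(itemset):
        confidence = whole / support_count(lhs, transactions)
        if confidence >= minconf:
            kept.append((lhs, rhs, confidence))
    return kept
```

For a 4-itemset such as {A,B,C,D}, `candidate_rules` yields 14 rules, matching $2^4 - 2$. For {Milk, Diaper, Beer} with minconf = 0.6, `confident_rules` keeps the four rules with c = 0.67 or c = 1.0 and drops the two with c = 0.5. Note that the confidences are computed from support counts alone once those counts are known.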
In general, confidence does not have an anti-monotone property: c(ABC → D) can be larger or smaller than c(AB → D).

$$c(ABC \rightarrow D) = \frac{s(ABCD)}{s(ABC)}, \qquad c(AB \rightarrow D) = \frac{s(ABD)}{s(AB)}$$

We only know that s(ABC) ≤ s(AB) and s(ABCD) ≤ s(ABD) (support is anti-monotone), which does not fix the order of the two ratios.

But the confidence of rules generated from the same itemset has an anti-monotone property, e.g., for L = {A,B,C,D}:

$$c(ABC \rightarrow D) \geq c(AB \rightarrow CD) \geq c(A \rightarrow BCD)$$

Computing the confidence of a rule does not require additional scans of the data.

Rule Generation for Apriori Algorithm
Apriori algorithm:
• Level-wise approach for generating association rules
• Each level: same number of items in the rule consequent
• Rules with t consequent items are used to generate rules with t+1 consequent items

Lattice of rules for {A,B,C,D}, by number of items in the consequent:
0 items: ABCD → { }
1 item:  BCD → A, ACD → B, ABD → C, ABC → D
2 items: CD → AB, BD → AC, BC → AD, AD → BC, AC → BD, AB → CD
3 items: D → ABC, C → ABD, B → ACD, A → BCD

• A candidate rule is generated by merging two rules that share the same prefix in the rule consequent
• Join(CD → AB, BD → AC) would produce the candidate rule D → ABC
• Prune rule D → ABC if its subset AD → BC does not have high confidence

Pruning example: if BCD → A is a low-confidence rule, then all rules containing item A in the consequent are pruned: CD → AB, BD → AC, BC → AD, D → ABC, C → ABD, B → ACD.

Interesting rules?
• Rules with high support and confidence may be useful, but not “interesting”: the “then” part of an “if-then” rule may simply be a frequent action on its own
  Example: {Milk, Beer} → {Diaper} (s=0.4, c=1.0), but Prob(Diaper) = 4/5 = 0.8
• High interest suggests a cause that might be worth investigating

Computing Interestingness Measure
Given a rule X → Y, the information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X → Y:

       Y     ¬Y
X      f11   f10   f1+
¬X     f01   f00   f0+
       f+1   f+0   |T|

f11: the # of transactions containing both X and Y
f10: the # of transactions containing X but not Y
f01: the # of transactions containing Y but not X
f00: the # of transactions containing neither X nor Y

Used to define various measures: support (f11/|T|), confidence (f11/f1+), lift, Gini, J-measure, etc.

Drawback of Confidence

        Coffee  ¬Coffee  Total
Tea     15      5        20
¬Tea    75      5        80
Total   90      10       100

Association rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 15/20 = 0.75, but P(Coffee) = 90/100 = 0.9
Although the confidence is high, the rule is misleading: P(Coffee|¬Tea) = 75/80 = 0.9375

Example: Lift/Interest
For the same table and the rule Tea → Coffee:
Lift = P(Coffee|Tea) / P(Coffee) = 0.75/0.9 = 0.8333 (< 1, therefore Tea and Coffee are negatively associated)

There are lots of measures proposed in the literature (see Tan, Steinbach, and Kumar, Introduction to Data Mining).

Threshold for rules?
• How to set the appropriate minsup threshold?
  - If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products, jewelry)
  - If minsup is set too low:
    - it is computationally expensive
    - the number of itemsets is very large
    - spurious patterns are extracted (cross-support patterns having weak correlations, e.g., milk (s=0.7) and caviar (s=0.0004))
• Using a single minimum support threshold may not be effective

Problems with the single minsup
• Single minsup: it assumes that all items in the data are of the same nature and/or have similar frequencies.
• Not true: in many applications, some items appear very frequently in the data, while others rarely appear. E.g., in a supermarket, people buy food processors and cooking pans much less frequently than they buy bread and milk.

Multiple minsups model
• Each item i can have a minimum item support MIS(i)
• The minimum support of a rule R is expressed as the lowest MIS value of the items that appear in the rule, i.e., a rule R: a1, a2, …, ak → ak+1, …, ar satisfies its minimum support if its actual support is ≥ min(MIS(a1), MIS(a2), …, MIS(ar)).
• By providing different MIS values for different items, the user effectively expresses different support requirements for different rules.

Multiple Minimum Support
How to apply multiple minimum supports?
• MIS(i): minimum support for item i, e.g., MIS(Milk) = 5%, MIS(Coke) = 3%, MIS(Broccoli) = 0.1%, MIS(Salmon) = 0.5%
• MIS({Milk, Broccoli}) = min(MIS(Milk), MIS(Broccoli)) = 0.1%
• Challenge: support is no longer anti-monotone
  Suppose Support(Milk, Coke) = 1.5% and Support(Milk, Coke, Broccoli) = 0.5%.
  {Milk, Coke} is infrequent (1.5% < min(5%, 3%) = 3%), but its superset {Milk, Coke, Broccoli} is frequent (0.5% ≥ 0.1%).

Summary
• Association rule mining has been extensively studied in the data mining community.
• There are many efficient algorithms and model variations
• Other related work includes
  - Multi-level or generalized rule mining
  - Constrained rule mining
  - Incremental rule mining
  - Maximal frequent itemset mining
  - Numeric association rule mining
  - Rule interestingness and visualization
  - Parallel algorithms
  - …

Related Resources
• Tools and software
  - A set of software for frequent pattern mining (Apriori, Eclat, FPgrowth, RElim, SaM, etc.): http://www.borgelt.net/fpm.html
  - Frequent Itemset Mining Implementations Repository: http://fimi.cs.helsinki.fi/src/
  - Arules: Mining association rules and frequent itemsets: http://cran.r-project.org/web/packages/arules/index.html
• Annotated Bibliography on Association Rule Mining: http://michael.hahsler.net/research/bib/association_rules/
• Apriori Demo in Silverlight: http://codeding.com/?article=13

What you should know
• What is the motivation of association rule mining?
• What are the basic steps for mining association rules?
• How does the Apriori algorithm work?
• What is the main issue with the Apriori algorithm? How can it be solved?
• How does the Frequent-Pattern tree work?
• How to generate rules from frequent itemsets?
• How to mine rules with multiple minimum supports?
• How to evaluate the rules?

Demo
1. Apriori on “census” data: http://www.borgelt.net/apriori.html
2. Apriori Demo in Silverlight: http://codeding.com/?article=13
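Alongside the demos listed above, the level-wise Apriori procedure from these notes (generate candidates, prune by the Apriori principle, scan to count, eliminate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind either demo: the names `apriori`, `support_count`, and `min_count` are hypothetical, and the threshold is given as an absolute count (3 corresponds to the minimum support of 3/5 used in the worked example).

```python
from itertools import combinations

# The five market-basket transactions used throughout these notes.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})

    # Level 1: frequent single items.
    current = {}
    for item in items:
        single = frozenset([item])
        n = support_count(single, transactions)
        if n >= min_count:
            current[single] = n

    frequent = dict(current)
    k = 1
    while current:
        # Generate: join frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune: drop candidates with an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        # Scan and eliminate: count support, keep the frequent ones.
        current = {}
        for c in candidates:
            n = support_count(c, transactions)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    for itemset, n in sorted(apriori(TRANSACTIONS, 3).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

With `min_count = 3` the sketch finds the four frequent items and four frequent pairs from the worked example, and rejects {Bread, Milk, Diaper} after counting it (support count 2). The pairwise join used here is simple rather than efficient; practical implementations typically join only itemsets that share a common prefix.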