2007 Power Point

By Dan Stalloch
 Association – which items or events are linked to, or
tend to occur together with, one another
 Patterns – sequential and time-series patterns show us
how often, and in what order, certain things occur
 Classification – shows us how data is grouped
 Prediction – the detection of a stable occurrence
within the data that may continue into the future
 Identification – using patterns in the data (for example,
system usage) to identify the existence of an item,
event, or activity
 Classification – how the data could be grouped
 Optimization – finding ways to utilize resources
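 Apriori – finds frequent (large) itemsets level by level
 Sampling – finds frequent itemsets from a small sample
of the database
 Frequent-Pattern (FP) Tree and FP-Growth – improves on
Apriori by avoiding candidate generation
Partition – splits the database into partitions so that
Apriori can be applied efficiently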
Decision Tree Induction – constructing a decision tree
from a training data set
k-Means – groups the data into k clusters
And others
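
As an illustration of the first algorithm in this list, here is a minimal Python sketch of the level-by-level idea behind Apriori. It is not taken from the slides or the textbook: the function name `apriori`, the shopping baskets, and the use of an absolute count for `min_support` are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute transaction count (an assumption of this
    sketch; it is often given as a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets, made up for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
```

With these baskets every single item and every pair is frequent, while the three-item set is generated as a candidate but rejected because only one basket contains it.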
 Marketing – analyzing customer behavior
 Finance – keeping track of credit and fraud
 Manufacturing – optimizing use of resources
 Health Care – checking patterns for useful information
 http://archive.ics.uci.edu/ml/machine-learning-
 This is a Car database from a repository of databases
made available to everyone through UCI
 When mining a database it is essential to ask what
you would like to be able to predict from it. In this
instance we would like to know which cars have decent
 We might also be able to predict which companies are
likely to stay in business
 We must create or use programs that show us either a
2-D contingency table or a 3-D contingency table
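
A contingency table is simply a count of how often each combination of attribute values occurs. The sketch below shows one possible way to build 2-D and 3-D tables in Python. The helper `contingency_table`, the attribute names, and the records are all hypothetical and are not taken from the UCI data.

```python
from collections import Counter


def contingency_table(records, *attributes):
    """Count how often each combination of the given attribute values occurs."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


# Made-up car records for illustration only.
records = [
    {"cylinders": 4, "maker": "asia", "rating": "good"},
    {"cylinders": 4, "maker": "asia", "rating": "good"},
    {"cylinders": 8, "maker": "america", "rating": "bad"},
    {"cylinders": 4, "maker": "america", "rating": "good"},
    {"cylinders": 8, "maker": "america", "rating": "bad"},
    {"cylinders": 4, "maker": "europe", "rating": "bad"},
]

# 2-D table: two attributes per cell.
table_2d = contingency_table(records, "cylinders", "rating")
# 3-D table: three attributes per cell.
table_3d = contingency_table(records, "cylinders", "maker", "rating")

for cell, count in sorted(table_2d.items()):
    print(cell, count)
```

Each key is a tuple of attribute values and each value is the number of records in that cell, so adding a third attribute name is all it takes to go from a 2-D to a 3-D table.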
 We use a formula to decide which attributes have the
highest information gain, depending on what we would
like to know. That formula is:
 IG(Y|X) = H(Y) - H(Y | X)
 Where H(Y) = the entropy of Y, and H(Y | X) = the
entropy of Y once X is known
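
To make the formula concrete, here is a minimal Python sketch that computes it from two columns of values. It assumes the standard definitions H(Y) = -sum over y of P(y) log2 P(y) and H(Y | X) = sum over x of P(x) H(Y | X = x). The function names and the tiny data columns are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(Y): entropy, in bits, of the observed value distribution."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(Y | X): entropy of Y within each X group, weighted by group size."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    total = len(xs)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """IG(Y | X) = H(Y) - H(Y | X)."""
    return entropy(ys) - conditional_entropy(xs, ys)


# Made-up columns: the first X determines Y completely, the second not at all.
ys = ["yes", "yes", "no", "no"]
informative_x = ["a", "a", "b", "b"]
useless_x = ["a", "b", "a", "b"]

gain_informative = information_gain(informative_x, ys)  # 1 bit
gain_useless = information_gain(useless_x, ys)          # 0 bits
```

An attribute that splits the data into pure groups removes all uncertainty about Y and earns the full 1 bit, while an attribute that leaves each group evenly mixed earns nothing. A decision tree builder would split on the attribute with the highest gain.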
 http://www.autonlab.org/tutorials/dtree18.pdf
 http://archive.ics.uci.edu/ml/machine-learning-
 http://www.autonlab.org/tutorials/infogain11.pdf
 Chapter 28 from Fundamentals of Database Systems
6th Edition By Elmasri and Navathe
 Pictures from Andrew W. Moore Slides
