Machine Learning

Spring 2013
Rong Jin
CSE847 Machine Learning


Instructor: Rong Jin
Office Hour:
• Tuesday 4:00pm-5:00pm
• TA, Qiaozi Gao, Thursday 4:00pm-5:00pm
Textbook
• Machine Learning
• The Elements of Statistical Learning
• Pattern Recognition and Machine Learning
• Many subjects are from papers
Web site: http://www.cse.msu.edu/~cse847

Requirements
• ~10 homework assignments
• Course project
  • Topic: visual object recognition
  • Data: over one million images with extracted visual features
  • Objective: build a classifier that automatically identifies the class of objects in images
• Midterm exam & final exam

Goal
• Familiarize you with the state of the art in Machine Learning
  • Breadth: many different techniques
  • Depth: Project
  • Hands-on experience
• Develop the machine learning way of thinking
  • Learn how to model real-world problems with machine learning techniques
  • Learn how to deal with practical issues

Course Outline
Theoretical Aspects
• Information Theory
• Optimization Theory
• Probability Theory
• Learning Theory
Practical Aspects
• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• Important Practical Issues
• Applications

Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning

Why Machine Learning?
• Past: most computer programs were written by hand
• Future: computers should be able to program themselves through interaction with their environment

Recent Trends
• Recent progress in algorithms and theory
• Growing flood of online data
• Computational power is available
• Growing industry

Big Data Challenge
• 2.7 zettabytes (1 zettabyte = $10^{21}$ bytes) of data exist in the digital universe today.
• A huge amount of data is generated on the Internet every minute:
  • YouTube users upload 48 hours of video
  • Facebook users share 684,478 pieces of content
  • Instagram users share 3,600 new photos
http://www.visualnews.com/2012/06/19/how-much-data-created-every-minute/

Big Data Challenge
• High dimensional data appears in many applications of machine learning
• Fine grained visual classification [1]
  • 250,000 features

Why Data Size Matters?
• Matrix completion
  • Classification, clustering, recommender systems
• The matrix can be perfectly recovered provided the number of observed entries is at least $O(rn\log^2(n))$
• The recovery error can be arbitrarily large if the number of observed entries is less than $O(rn\log(n))$
[Figure: recovery error versus number of observed entries, with thresholds at $O(rn\log(n))$ and $O(rn\log^2(n))$ and a region marked "Unknown"]
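To get a feel for these two thresholds, the sketch below evaluates them for one matrix size. It is only an illustration under stated assumptions: the slides do not define the symbols, so `n` is taken to be the side length of a square matrix and `r` its rank; constant factors hidden by the O-notation are ignored; the logarithm is taken to be natural; and the function name `entry_thresholds` and the example sizes are hypothetical.

```python
import math


def entry_thresholds(n: int, r: int) -> tuple:
    """Return (r*n*log(n), r*n*log(n)**2), ignoring O-notation constants.

    Assumes n is the side length of an n x n matrix and r is its rank.
    """
    log_n = math.log(n)  # natural logarithm (an assumption)
    lower = r * n * log_n        # below this, recovery error can be large
    upper = r * n * log_n ** 2   # at or above this, recovery is exact
    return lower, upper


# Hypothetical example: a rank-10 matrix with one million rows and columns.
n, r = 1_000_000, 10
lower, upper = entry_thresholds(n, r)
total = n * n
print(f"r n log n    = {lower:.3e} entries ({lower / total:.4%} of the matrix)")
print(f"r n log^2 n  = {upper:.3e} entries ({upper / total:.4%} of the matrix)")
```

Both thresholds are a small fraction of the $n^2$ entries, which is why a low-rank matrix can be completed from a sparse sample at all.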
Alibaba Small and Micro Financial Services
• Difficult for small & medium businesses to access finance
• Minimum loan
• Tedious loan approval procedure
• Low approval rate
• Long cycle
• Completely big data driven
• Leverage e-commerce data for financial services
Shipping Insurance for Returned Products
• Insurance contracts have a year-on-year growth rate of 100%.
• Over 1 billion contracts in 2013
• Over 100 million contracts in one day on November 11, 2013
[Figure: overall rate of compensation; vertical axis from 40% to 140%]
Shipping Insurance for Returned Products
• Fixed rate
  • Uniform 5% fixed rate
  • Simple
• Dynamic pricing
  • Millions of features, real-time pricing
  • Machine-learned model
  • Highly accurate
• Actuarial approach
  • Solely based on historical data and demographics
  • Easy to explain
• Data based pricing
  • Pricing model based on a few parameters
  • Relatively accurate
Three Niches for Machine Learning
• Data mining: using historical data to improve decisions
  • Medical records → medical knowledge
• Software applications that are difficult to program by hand
  • Autonomous driving
  • Image Classification
• User modeling
  • Automatic recommender systems

Typical Data Mining Task
Given:
• 9147 patient records, each describing pregnancy and birth
• Each patient record contains 215 features
Task:
• Predict classes of future patients at high risk for Emergency Cesarean Section

Data Mining Results
One of 18 learned rules:
If   no previous vaginal delivery, and
     abnormal 2nd Trimester Ultrasound, and
     Malpresentation at admission
Then probability of Emergency C-Section is 0.6

Credit Risk Analysis
Learned Rules:
If   Other-Delinquent-Account > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no
If   Other-Delinquent-Account = 0, and
     (Income > $30K or Years-of-Credit > 3)
Then Profitable-Customer? = yes
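Learned rules of this kind translate directly into executable conditions. The sketch below is one possible reading of the two rules on this slide, not code from the course: the function name, the argument names, and the choice to return `None` when neither rule fires are all assumptions made for illustration.

```python
from typing import Optional


def profitable_customer(
    other_delinquent_accounts: int,
    delinquent_billing_cycles: int,
    income: float,
    years_of_credit: float,
) -> Optional[bool]:
    """Apply the two learned rules; return None if neither rule applies."""
    # Rule 1: several delinquent accounts and billing cycles -> not profitable.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return False
    # Rule 2: no delinquent accounts plus enough income or credit history.
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return True
    # The slide shows only these two rules; other cases are left undecided.
    return None


print(profitable_customer(3, 2, 50_000, 5))   # False (rule 1 fires)
print(profitable_customer(0, 0, 20_000, 5))   # True  (rule 2 fires)
print(profitable_customer(1, 0, 50_000, 10))  # None  (no rule fires)
```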
Programs too Difficult to Program By Hand

ALVINN drives 70mph on highways
Programs too Difficult to Program By Hand
Visual object recognition: Classify Bird Images
[Figure: positive and negative examples are used to train a statistical model, which is then applied to test images]

Image Retrieval using Texts
Software that Models Users
History
• Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings.
  Rating: [rating image]
• Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring.
  Rating: [rating image]
• Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son.
  Rating: [rating image]
What to Recommend?
• Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour.
  Recommend: ? No
• Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis.
  Recommend: ? Yes

Netflix Contest
Relevant Disciplines
• Artificial Intelligence
• Statistics (particularly Bayesian Stat.)
• Computational complexity theory
• Information theory
• Optimization theory
• Philosophy
• Psychology
• …

Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning

What is the Learning Problem
• Learning = Improving with experience at some task
  • Improve over task T
  • With respect to performance measure P
  • Based on experience E
• Example: Learning to Play Backgammon
  • T: Play backgammon
  • P: % of games won in world tournament
  • E: opportunity to play against itself

Backgammon
• More than $10^{20}$ states (boards)
• The best human players see only a small fraction of all boards during their lifetime
• Searching is hard because of dice (branching factor > 100)

TD-Gammon by Tesauro (1995)
• Trained by playing against itself
• Now approximately equal to the best human player

Learn to Play Chess
• Task T: Play chess
• Performance P: Percent of games won in the world tournament
• Experience E:
  • What experience?
  • How shall it be represented?
  • What exactly should be learned?
  • What specific algorithm should be used to learn it?

Choose a Target Function
• Goal:
  • Policy: $\pi: b \to m$
• Choice of value function
  • $V: b, m \to \mathbb{R}$
  • $V: b \to \mathbb{R}$
$b$ = board, $\mathbb{R}$ = real values

Value Function V(b): Example Definition
• If $b$ is a final board that is won: $V(b) = 1$
• If $b$ is a final board that is lost: $V(b) = -1$
• If $b$ is not a final board: $V(b) = E[V(b^*)]$, where $b^*$ is the final board reached after playing optimally
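The definition above can be evaluated exactly only for very small games. The sketch below shows one way to read it, on a hand-made toy game tree that is not from the slides: final boards carry the values 1 and -1, "optimal play" is modeled as maximizing on our moves and minimizing on the opponent's moves, and the expectation is taken over equally likely dice outcomes. The tree encoding and the function name `value` are assumptions for illustration only.

```python
from typing import Union

# A node is either a final board ("won" / "lost") or a tuple (kind, children),
# where kind is "max" (our move), "min" (opponent's move) or "chance" (dice).
Node = Union[str, tuple]


def value(b: Node) -> float:
    """Value of board b under the example definition, for a toy game tree."""
    if b == "won":
        return 1.0
    if b == "lost":
        return -1.0
    kind, children = b
    child_values = [value(child) for child in children]
    if kind == "max":      # we pick the best continuation
        return max(child_values)
    if kind == "min":      # the opponent picks the worst one for us
        return min(child_values)
    # Chance node: expectation over equally likely dice outcomes.
    return sum(child_values) / len(child_values)


# Hypothetical two-outcome dice roll: one outcome lets us choose,
# the other lets the opponent choose.
tree = ("chance", [
    ("max", ["won", "lost"]),
    ("min", ["won", "lost"]),
])
print(value(tree))  # 0.0
```

For backgammon, with more than $10^{20}$ boards, this exhaustive recursion is infeasible, which motivates approximating V(b) instead.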
Representation of Target Function V(b)
• Same value for each board: No Learning
• Lookup table (one entry for each board): No Generalization
• Summarize experience into
  • Polynomials
  • Neural Networks

Example: Linear Feature Representation
• Features:
  • $p_b(b)$, $p_w(b)$ = number of black (white) pieces on board $b$
  • $u_b(b)$, $u_w(b)$ = number of unprotected black (white) pieces
  • $t_b(b)$, $t_w(b)$ = number of black (white) pieces threatened by the opponent
• Linear function:
  • $V(b) = w_0 p_b(b) + w_1 p_w(b) + w_2 u_b(b) + w_3 u_w(b) + w_4 t_b(b) + w_5 t_w(b)$
• Learning:
  • Estimation of parameters $w_0, \ldots, w_5$
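As a minimal sketch of the linear representation, the function below computes V(b) as a weighted sum of the six feature values. The feature values and weights in the usage example are made up for illustration; extracting the features from an actual board is outside the scope of this sketch.

```python
from typing import Sequence


def linear_value(features: Sequence[float], weights: Sequence[float]) -> float:
    """V(b) = w0*pb + w1*pw + w2*ub + w3*uw + w4*tb + w5*tw.

    `features` holds (pb, pw, ub, uw, tb, tw) for one board b and
    `weights` holds (w0, ..., w5), in the same order.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical board: 12 pieces each, 2 vs 3 unprotected, 1 vs 0 threatened.
features = (12, 12, 2, 3, 1, 0)
# Hypothetical weights, chosen by hand rather than learned.
weights = (1.0, -1.0, -0.5, 0.5, -0.25, 0.25)
print(linear_value(features, weights))  # 0.25
```

With this representation, learning reduces to choosing the six numbers in `weights`.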
Tuning Weights
• Given:
  • Board $b$
  • Predicted value $V(b)$
  • Desired value $V^*(b)$
• Calculate $\mathrm{error}(b) = V^*(b) - V(b)$
• For each board feature $f_i$: $w_i \leftarrow w_i + c \cdot \mathrm{error}(b) \cdot f_i$
• Stochastically minimizes $\sum_b (V^*(b) - V(b))^2$
Gradient Descent Optimization
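The weight-tuning step can be sketched as a stochastic gradient step on the squared error for a single board. The code below assumes a linear V(b) and uses the signed residual V*(b) - V(b) in the update, which is what the gradient of the squared error yields (the constant factor 2 is absorbed into the step size c). The function names, feature values, and step size are hypothetical.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear value estimate V(b) = sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, features))


def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    target: float,
    c: float = 0.1,
) -> List[float]:
    """One step w_i <- w_i + c * (V*(b) - V(b)) * f_i for a single board."""
    residual = target - predict(weights, features)  # signed error
    return [w + c * residual * f for w, f in zip(weights, features)]


weights = [0.0, 0.0]
features = [1.0, 2.0]  # hypothetical feature values f_i for one board
target = 1.0           # desired value V*(b)
for _ in range(20):
    weights = lms_update(weights, features, target)
print(abs(target - predict(weights, features)))  # residual shrinks toward 0
```

In this toy setting each step halves the residual, because c times the squared feature norm equals 0.5; too large a step size would make the updates diverge instead.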
Obtain Boards
• Random boards
• Boards from beginners' games
• Boards from professionals' games

Obtain Target Values
• Person provides value $V(b)$
• Play until termination. If outcome is
  • Win: $V(b) \leftarrow 1$ for all boards
  • Loss: $V(b) \leftarrow -1$ for all boards
  • Draw: $V(b) \leftarrow 0$ for all boards
• Play one move: $b \to b'$
  • $V(b) \leftarrow V(b')$
• Play n moves: $b \to b' \to \dots \to b^{(n)}$
  • $V(b) \leftarrow V(b^{(n)})$
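Two of these labeling schemes are easy to sketch for a recorded game: assigning the final outcome to every board, and taking the current estimate of the board n moves later as the target. The code below is an illustration under assumptions: boards are represented by placeholder strings, the current value estimate is a made-up lookup, and boards with no successor n moves ahead get no target (`None`).

```python
from typing import Callable, List, Optional, Sequence

OUTCOME_VALUE = {"win": 1.0, "loss": -1.0, "draw": 0.0}


def outcome_targets(boards: Sequence[str], outcome: str) -> List[float]:
    """Play until termination: every board of the game gets the final outcome."""
    return [OUTCOME_VALUE[outcome]] * len(boards)


def n_step_targets(
    boards: Sequence[str],
    V: Callable[[str], float],
    n: int = 1,
) -> List[Optional[float]]:
    """Play n moves: the target for boards[i] is V(boards[i + n])."""
    return [
        V(boards[i + n]) if i + n < len(boards) else None
        for i in range(len(boards))
    ]


game = ["b0", "b1", "b2", "b3"]  # hypothetical sequence of boards
current_V = {"b0": 0.1, "b1": 0.3, "b2": -0.2, "b3": 1.0}.get  # made-up estimates
print(outcome_targets(game, "win"))          # [1.0, 1.0, 1.0, 1.0]
print(n_step_targets(game, current_V, n=1))  # [0.3, -0.2, 1.0, None]
```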
A General Framework
Mathematical Modeling (Statistics) + Finding Optimal Parameters (Optimization) = Machine Learning

Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning

Important Issues in Machine Learning
• Obtaining experience
  • How to obtain experience?
  • How many examples are enough?
    • PAC learning theory
• Learning algorithms
  • Supervised learning vs. unsupervised learning
  • Which algorithms can approximate the target function well, and when?
  • How does the complexity of learning algorithms impact the learning accuracy?
  • Is the target function learnable?
• Representing inputs
  • How to represent the inputs?
  • How to remove irrelevant information from the input representation?
  • How to reduce the redundancy of the input representation?
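As one concrete answer to "how many examples are enough", PAC learning theory gives a standard bound for a finite hypothesis class H and a learner that outputs a hypothesis consistent with the training data: $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$ examples suffice for error at most $\epsilon$ with probability at least $1 - \delta$. The sketch below just evaluates that formula; the function name and the example hypothesis class are chosen for illustration and are not from the slides.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite class H.

    Evaluates m >= (ln|H| + ln(1/delta)) / epsilon and rounds up.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# Hypothetical class: conjunctions over 10 boolean variables, where each
# variable appears positively, negatively, or not at all (3**10 hypotheses).
print(pac_sample_bound(3 ** 10, epsilon=0.1, delta=0.05))  # 140
```

The bound grows only logarithmically with the size of the hypothesis class but linearly with 1/ε.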