### Document

```Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011
1
Objectives:

Understand what is machine learning

Motivate why it has become so important

Identify Types of learning and salient
frameworks, algorithms and their utility

Take a sneak peak at the next set of problems
2

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
3

Example: Learning to ride a bicycle
 T: Task of learning to ride a bicycle
 P: Performance of balancing while moving
 E: Experience of riding in many situations

Is it wise to memorize all situations and
appropriate responses by observing an
expert?
4
Improve on task, T, with respect to
performance metric, P, based on experience, E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
Source: Introduction to Machine Learning by Raymond J. Mooney
5


Determine f such that yn=f(xn) and g(y, x) is
minimized for unseen values of y and x pairs.
Form of f is fixed, but some parameters can
be tuned:
 So, y=fθ(x), where, x is observed, and y needs to
be inferred
 e.g. y=1, if mx > c, 0 otherwise, so θ = (m,c)

Machine Learning is concerned with designing
algorithms that learn “better” values of θ
given “more” x (and y) for a given problem
6



What is the scope of the task?
How will performance be measured?
How should learning be approached?
 Scalability:
 How can we learn fast?
 How much resources are needed to learn?
 Generalization:
 How will it perform in unseen situations?
 Online learning:
 Can it learn and improve while performing the task?
7









Artificial Intelligence
Data Mining
Probability and Statistics
Information theory
Numerical optimization
Adaptive Control Theory
Neurobiology
Psychology (cognitive, perceptual, dev.)
Linguistics
8

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
9


Develop systems that are too difficult/expensive to construct
manually because they require specific detailed skills or
knowledge tuned to a specific task (knowledge engineering
bottleneck).
Develop systems that can automatically adapt and
customize themselves to individual users.
 Personalized news or mail filter
 Personalized tutoring

Discover new knowledge from large databases (data
mining).
 Market basket analysis (e.g. diapers and beer)
 Medical text mining (e.g. migraines to calcium channel blockers to
magnesium)
Source: Introduction to Machine Learning by Raymond J. Mooney
10

Computational studies of learning may help us
understand learning in humans and other
biological organisms.
 Hebbian neural learning
▪ “Neurons that fire together, wire together.”
log(perf. time)
 Power law of practice
log(# training trials)
Source: Introduction to Machine Learning by Raymond J. Mooney
11

Many basic effective and efficient algorithms
available

Large amounts of data available

Large amounts of computational resources
available
Source: Introduction to Machine Learning by Raymond J. Mooney
12
Automatic vehicle navigation
• Road recognition
• Automatic navigation
Speech recognition
• Speech to text
• Automated services over the phone
Face detection
• Facebook face tagging suggestions
• Camera autofocus for portraits
13

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
14

Remember, y=fθ(x)?
 y can be continuous or categorical
 y may be known for some x or none at all
 f can be simple (e.g. linear) or complex
 f can incorporate some knowledge of how x was
generated or be blind to the generation
 etc…
15

Supervised learning:
 For, y=fθ(x), a set of xi, yi (usually classes) are known
 Now predict yj for new xj

Examples:
 Two classes of protein with given amino acid sequences
 Labeled male and female face images
16

In a nutshell:
 Input is non-linearly transformed by
hidden layers usually a “fuzzy” linearly
classified combination
 Output is a linear combination of the
hidden layer

Use when:
 Want to model a non-linear function
 Labeled data is available
 Don’t want to write new s/w

Variations:
 Competitive learning for classification
 Many more…
17

In a nutshell:
 Learns optimal boundary
between two classes (red line)

Use when:
 Labeled class data is available
 Want to minimize chance of
error in the test case

Variations:
 Non-linear mapping of the input
vectors using “Kernels”
18

Unsupervised learning:
 For, y=fθ(x), only a set of xi
are known
 Predict y, such that y is
simpler than x but retains its
essence

Examples:
 Clustering (when y is a class
label)
 Dimensionality reduction
(when y is continuous)
19

In a nutshell:
 Grouping a similar objects based

on a definition of similarity
 That is, intra vs. inter cluster
similarity, e.g. distance from
center of the cluster
Use when:
 Class labels are not available, but

you have a desired number of
clusters in mind
Variations:
 Different similarity measures
 Automatic detection of number of
clusters
 Online clustering
20

In a nutshell:
 High dimensional data, where not
all dimensions are independent,
e.g. (x1, x2, x3), where x3=ax1+bx2+c

Use when:
 You want to perform linear
dimensionality reduction

Variations:
 ICA
 Online PCA
21

In a nutshell:
 Learning a lower-dimensional
manifold (e.g. surface) close to
which the data lies

Use when:
 You want to perform non-
linear dimensionality reduction

Variations:
 SOM
22

Generative models:
 For, y=fθ(x), we have some idea of how x was generated given
x and θ

Examples:
 HMMs: Given phonemes and {age, gender}, we know how the
speech can be generated
 Bayesian Networks: Given {gender, age, race} we have some
idea of what a face will look like for different emotions
23

Discriminative Models:
 Do not care about how
the data was generated
 Finding the right
features is of prime
importance
 Followed by finding the
right classifier

Examples:
 SVM
 MLP
Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al in Journal of Multimedia, Sep 2006
24

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
25

1980s:










Advanced decision tree and rule learning
Explanation-based Learning (EBL)
Learning and planning and problem solving
Utility problem
Analogy
Cognitive architectures
Resurgence of neural networks (connectionism, backpropagation)
Valiant’s PAC Learning Theory
Focus on experimental methodology
1990s







Data mining
Adaptive software agents and web applications
Text learning
Reinforcement learning (RL)
Inductive Logic Programming (ILP)
Ensembles: Bagging, Boosting, and Stacking
Bayes Net learning
Source: Introduction to Machine Learning by Raymond J. Mooney
26

2000s








Support vector machines
Kernel methods
Graphical models
Statistical relational learning
Transfer learning
Sequence labeling
Collective classification and structured outputs
Computer Systems Applications
▪
▪
▪
▪
Compilers
Debugging
Graphics
Security (intrusion, virus, and worm detection)
 E mail management
 Personalized assistants that learn
 Learning in robotics and vision
Source: Introduction to Machine Learning by Raymond J. Mooney
27
Bioinformatics
• Gene expression prediction (just scratched the surface)
• Automated drug discovery
Speech recognition
• Context recog., e.g. for digital personal assistants (SiRi?)
• Better than Google translate; imagine visiting Brazil
Image and video processing
• Automatic event detection in video
• “Seeing” software for the blind
28
Robotics
• Where is my iRobot?
• Would you raise a “robot” child and make it learn?
Advanced scientific calculations
• Weather modeling through prediction
• Vector field or FEM calculation through prediction
Who knows…
• Always in search of new problems
29

Learning the structure of classifiers

Automatic feature discovery and active
learning

Discovering the limits of learning
 Information theoretic bounds?

Learning that never ends

Explaining human learning

Computer languages with ML primitives
Adapted from: “The Discipline of Machine Learning” by Tom Mitchell, 2006
30
Thank you!
31






Inference: Using a system to get the output variable for a
given input variable
Learning: Changing parameters according to an algorithm
to improve performance
Training: Using machine learning algorithm to learn
function parameters based on input and (optionally)
output dataset known as “training set”
Validation and Testing: Using inference (without training)
to test the performance of the learned system on data
Offline learning: When all training happens prior to
testing, and no learning takes place during testing
Online learning: When learning and testing happen for
the same data
32
```