### Multiple Instance Learning

```Multiple Instance Learning
Outline

Motivation
Multiple Instance Learning (MIL)
Diverse Density
  Single Point Concept
  Disjunctive Point Concept
SVM Algorithms for MIL
  Single Instance Learner (SIL)
  Sparse MIL
  mi-SVM
  MI-SVM
Results
Some Thoughts

Part I: Multiple Instance Learning (MIL)
Motivation

It is not always possible to provide labeled data for training.
Reasons:
  Requires substantial human effort
  Requires expensive tests
  Disagreement among experts
  Labeling is not possible at instance level
Objective: present a learning algorithm that can learn from ambiguously labeled training data.

Multiple Instance Learning (MIL)

In MIL, instead of giving the learner labels for the individual examples, the trainer only labels collections of examples, which are called bags.
A bag is labeled positive if there is at least one positive example in it.
It is labeled negative if all the examples in it are negative.
[Figure: negative bags (Bi-) and positive bags (Bi+)]

Multiple Instance Learning (MIL)

The key challenge with MIL is coping with the ambiguity of not knowing which examples in a positive bag are actually positive and which are not.
The MIL model was first formalized by Dietterich et al. to deal with the drug activity prediction problem.
Following that, an algorithm called Diverse Density was developed to provide a solution to MIL.
Later, the method was extended to deal with real-valued labels.

Diverse Density

Diverse Density solves the MIL problem by examining the distribution of the instances.
It looks for a point that is close to instances in different positive bags and far from the instances in the negative bags.
Such a point represents the concept that we would like to learn.
Diverse Density is a measure of the intersection of the positive bags minus the union of the negative bags.

Diverse Density - Molecular Example

Suppose the shape of a candidate molecule can be described by a feature vector.
If a molecule is labeled positive, then at at least one place along the manifold of shapes it can take, it took the right shape to fit into the target protein.

Noisy-Or for Estimating the Density

It is assumed that the event can only happen if at least one of its causes occurred.
It is also assumed that the probability of any cause failing to trigger the event is independent of any other cause.

Diverse Density - Formally

By maximizing the Diverse Density we can find the point of intersection (the desired concept).
Alternatively, one can use the most-likely-cause estimator.

Single Point Concept

A concept that corresponds to a single point in feature space.
Every Bi+ has at least one instance that is equal to the true concept corrupted by some Gaussian noise.
Every Bi- has no instances that are equal to the true concept corrupted by some Gaussian noise.
Where
  k = number of dimensions in feature space
  sk = scaling vector

Disjunctive Point Concept

More complicated concepts are disjunctions of d single-point concepts.
A bag is positive if at least one of its instances is in the concept xt1, xt2, ..., or xtd.
[Figure: density surfaces]

Part II: SVM Algorithms for MIL
Single Instance Learning MIL

SIL-MIL: Single Instance Learning approach
  Applies the bag’s label to all instances in the bag
  A normal SVM is trained on the resulting dataset

Sparse MIL

All instances from negative bags are real negative instances.
A bag is represented as the sum of all its instances, normalized by its 1-norm or 2-norm.

Results

Datasets used:
  AIMed: sparse dataset created from a corpus of protein-protein interactions. Contains 670 positive and 1,040 negative bags.
  CBIR: Content Based Image Retrieval domain. The task is to categorize images according to whether they contain an object of interest.
  MUSK: drug activity dataset. Bags correspond to molecules, while bag instances correspond to three-dimensional conformations of the same molecule.
  TST: text categorization dataset in which MEDLINE articles are represented as bags of overlapping text passages.

mi-SVM

Instance-level classification
Treats the instance labels yi as unobserved hidden variables
Goal is to maximize the margin over the unknown instance labels
Suitable for instance classification

MI-SVM

Bag-level classification
Goal is to maximize the bag margin, which is defined by:
  The “most positive” instance in the case of positive bags
  The “least negative” instance in the case of negative bags
Suitable for bag classification

Results: mi-SVM vs. MI-SVM
Corel image data sets
TREC9 document categorization sets
Some Thoughts

Can we find multiple positive concepts in a single bag and learn these concepts?
Does varying the size of the negative bags have an influence on the learning algorithm?
Can we re-formulate MIL using Fuzzy Logic?

References

O. Maron and T. Lozano-Pérez, "A framework for multiple-instance learning," 1998, pp. 570-576.
R. C. Bunescu and R. J. Mooney, "Multiple instance learning for sparse positive bags," 2007, pp. 105-112.
J. Yang, "Review of Multi-Instance Learning and Its applications," 2008.
S. Andrews, et al., "Support vector machines for multiple-instance learning," Advances in neural information processing systems, pp. 577-584, 2003.
```
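The Diverse Density slides describe the objective only in words, so a small sketch may help make it concrete. It assumes the commonly used form in which an instance matches a candidate concept `x` with probability `exp(-sum_k (s_k * (B_k - x_k))^2)`, combines instances within a bag by noisy-or (or by the most-likely-cause maximum), and multiplies over bags. The function names, the toy bags, and the shortcut of scoring only the positive instances as candidates (rather than running an optimizer) are illustrative assumptions, not taken from the slides.

```python
import math

def instance_prob(x, instance, scale=None):
    """Closeness of one instance to candidate concept x (assumed Gaussian-like form)."""
    s = scale if scale is not None else [1.0] * len(x)
    sq_dist = sum((sk * (bk - xk)) ** 2 for sk, bk, xk in zip(s, instance, x))
    return math.exp(-sq_dist)

def bag_prob(x, bag, positive, estimator="noisy_or", scale=None):
    """Probability that a bag with the given label agrees with concept x."""
    probs = [instance_prob(x, inst, scale) for inst in bag]
    if estimator == "noisy_or":
        # Independent causes: the bag misses x only if every instance misses it.
        miss_all = math.prod(1.0 - p for p in probs)
    else:
        # Most-likely-cause: only the closest instance counts.
        miss_all = 1.0 - max(probs)
    return 1.0 - miss_all if positive else miss_all

def neg_log_dd(x, positive_bags, negative_bags, estimator="noisy_or", scale=None):
    """Negative log of the Diverse Density at x (lower is better)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for bag in positive_bags:
        total -= math.log(bag_prob(x, bag, True, estimator, scale) + eps)
    for bag in negative_bags:
        total -= math.log(bag_prob(x, bag, False, estimator, scale) + eps)
    return total

def best_candidate(positive_bags, negative_bags, estimator="noisy_or"):
    """Simplification: score every positive instance and keep the best one."""
    candidates = [inst for bag in positive_bags for inst in bag]
    return min(candidates,
               key=lambda x: neg_log_dd(x, positive_bags, negative_bags, estimator))

# Toy data: every positive bag has one instance near (1, 1); negatives do not.
positive_bags = [[[1.0, 1.0], [5.0, 5.0]],
                 [[1.1, 0.9], [-3.0, 4.0]],
                 [[0.9, 1.0], [7.0, -2.0]]]
negative_bags = [[[5.0, 5.1], [-3.0, 4.2]],
                 [[7.0, -2.1], [0.0, -6.0]]]
concept = best_candidate(positive_bags, negative_bags)
print(concept)
```

On this toy data the selected point is the instance where the positive bags overlap and no negative instance is nearby, which is the "intersection minus union" intuition from the Diverse Density slide.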
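The Single Instance Learning slide amounts to a data transformation followed by ordinary supervised training. The sketch below shows only the transformation and a bag-level prediction rule that follows the MIL definition (a bag is positive if any instance is). The helper names are hypothetical, and `instance_classifier` stands in for whatever trained SVM would be used in practice.

```python
def to_single_instance_dataset(bags, bag_labels):
    """Copy each bag's label onto every instance in that bag."""
    X, y = [], []
    for bag, label in zip(bags, bag_labels):
        for instance in bag:
            X.append(instance)
            y.append(label)
    return X, y

def predict_bag(bag, instance_classifier):
    """A bag is predicted positive if any of its instances is predicted positive."""
    return 1 if any(instance_classifier(inst) == 1 for inst in bag) else -1

# Two toy bags with one feature per instance; labels are +1 / -1.
bags = [[[0.1], [0.9]], [[0.2], [0.3]]]
bag_labels = [1, -1]
X, y = to_single_instance_dataset(bags, bag_labels)

# Stand-in for a trained classifier: a fixed threshold on the single feature.
threshold_classifier = lambda inst: 1 if inst[0] > 0.5 else -1
predictions = [predict_bag(bag, threshold_classifier) for bag in bags]
print(X, y, predictions)
```

Note that the instance `[0.1]` receives the label +1 only because it sits in a positive bag; this label noise is the ambiguity that the other algorithms in the deck try to handle more carefully.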
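The Sparse MIL slide represents a bag by a single vector. A minimal sketch of that representation follows, assuming the norm is taken over the summed instance vector; the slide does not spell this out, so treat that reading, and the function name `bag_vector`, as assumptions.

```python
import math

def bag_vector(bag, norm="l2"):
    """Sum the instance vectors of a bag, then divide by the 1-norm or 2-norm of the sum."""
    summed = [sum(values) for values in zip(*bag)]
    if norm == "l1":
        size = sum(abs(v) for v in summed)
    else:
        size = math.sqrt(sum(v * v for v in summed))
    if size == 0.0:
        return summed  # avoid dividing by zero for an all-zero bag
    return [v / size for v in summed]

bag = [[1.0, 2.0], [3.0, 4.0]]
print(bag_vector(bag, norm="l1"))
print(bag_vector(bag, norm="l2"))
```

The normalization keeps large bags from dominating small ones simply because they contain more instances.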
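The difference between mi-SVM and MI-SVM is easier to see with a fixed linear scorer `f(x) = w·x + b`. The sketch below assumes such a scorer is already available and shows only the two bag-specific pieces: the MI-SVM bag margin, set by the highest-scoring instance, and one mi-SVM label-imputation step, in which each positive bag must keep at least one positive instance. The weights and function names are illustrative assumptions; the SVM retraining that alternates with these steps is omitted.

```python
def score(w, b, x):
    """Linear decision value f(x) = w.x + b."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b

def bag_margin(w, b, bag, bag_label):
    """MI-SVM: the bag's margin is decided by its highest-scoring instance.

    For a positive bag this is the "most positive" instance; for a negative
    bag it is the "least negative" one.
    """
    return bag_label * max(score(w, b, x) for x in bag)

def impute_instance_labels(w, b, bag, bag_label):
    """mi-SVM: guess hidden instance labels from the current scorer."""
    if bag_label == -1:
        return [-1] * len(bag)  # every instance in a negative bag is negative
    scores = [score(w, b, x) for x in bag]
    labels = [1 if s > 0 else -1 for s in scores]
    if 1 not in labels:
        # A positive bag needs at least one positive instance: promote the best one.
        labels[scores.index(max(scores))] = 1
    return labels

w, b = [1.0, 0.0], -1.0
positive_bag = [[0.0, 0.0], [3.0, 0.0]]
negative_bag = [[-2.0, 0.0], [0.5, 0.0]]
weak_positive_bag = [[0.0, 0.0], [0.5, 0.0]]

print(bag_margin(w, b, positive_bag, 1))
print(bag_margin(w, b, negative_bag, -1))
print(impute_instance_labels(w, b, weak_positive_bag, 1))
```

In the weak positive bag neither instance scores above zero, so the imputation step promotes the higher-scoring one to keep the bag consistent with its label.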