### Probability in AI - Temple University

```Made by: Maor Levy, Temple University 2012
1


Probability expresses uncertainty.
Pervasive in all of Artificial Intelligence
 Machine learning
 Information Retrieval (e.g., Web)
 Computer Vision
 Robotics

Based on mathematical calculus.

Probability of a fair coin:
1
P(COIN  tail ) 
2
1
P( tail ) 
2
2

Example: Probability of cancer
 P(has cancer) = 0.02
 P(¬has cancer) = 0.98

Multiple events: cancer, test result
 P(has cancer, test positive)

The problem with joint distributions: it takes
2 − 1 numbers to specify them!
3

Conditional Probability describes the cancer
test:
 P(test positive | has cancer) = 0.9
 P(test positive | ¬has cancer) = 0.2

Put this together with: Prior probability
 P(has cancer) = 0.02
 P(test negative | has cancer) = 0.1

Total probability is a fundamental rule relating
marginal probabilities to conditional
probabilities.
  =

=  ( = )
4

In summary:
 P(has cancer) = 0.02
 P(¬has cancer) = 0.98
 P(test positive | has cancer) = 0.9
 P(test positive | ¬has cancer) = 0.2
 P(test negative | has cancer) = 0.1
 P(test negative | ¬has cancer) = 0.8



P(cancer) and P(Test positive | cancer) is called
the model.
Calculating P(Test positive) is called
prediction.
Calculating P(Cancer | test positive) is called
diagnostic reasoning.
5

A belief network consists of:
◦ A
directed acyclic graph with nodes labeled with random variables
◦ a domain for each random variable
◦ a set of conditional probability tables for each variable
◦ given its parents (including prior probabilities for nodes with

no parents).
A belief network is a graph: the nodes are random
variables; there is an arc from the parents of each
node into that node.
◦ A belief network is automatically acyclic by construction.
◦ A belief network is a directed acyclic graph (DAG) where nodes
are random variables.
◦ The parents of a node n are those variables on which n directly
depends.
◦ A belief network is a graphical representation of dependence
and independence:
 A variable is independent of its non-descendants given its parents.
6



Whether l1 is lit (L1_lit) depends only on the
status of the light (L1_st) and whether there is
power in wire w0. Thus, L1_lit is independent
of the other variables given L1_st and W0.
In a belief network, W0 and L1_st are parents
of L1_lit.
Similarly, W0 depends
only on whether there
is power in w1, whether
there is power in w2,
the position of switch
s2 (S2_pos), and the
status of switch s2 (S2_st).
7

To represent a domain in a belief network, you
need to consider:
 What are the relevant variables?
 What will you observe?
 What would you like to find out (query)?
 What other features make the model simpler?
 What values should these variables take?
 What is the relationship between them? This should
be expressed in terms of local influence.
 How does the value of each variable depend on its
parents? This is expressed in terms of the
conditional probabilities.
8

The power network can be used in a number
of ways:
◦ Conditioning on the status of the switches and
circuit
◦ breakers, whether there is outside power and the
position of the switches, you can simulate the
lighting.
◦ Given values for the switches, the outside power,
and whether the lights are lit, you can determine the
posterior probability that each switch or circuit
breaker is ok or not.
◦ Given some switch positions and some outputs and
some intermediate values, you can determine the
probability of any other variable in the network.
9

A Bayes network is a form of probabilistic
graphical model. Specifically, a Bayes network
is a directed acyclic graph of nodes
representing variables and arcs representing
dependence relations among the variables.
 A representation of the joint distribution over all the
variables represented by nodes in the graph. Let the
variables be X(1), ..., X(n).

Let parents(A) be the parents of the node A.
 Then the joint distribution for X(1) through X(n) is
represented as the product of the probability
distributions P(Xi | Parents(Xi)) for i = 1 to n:
n
P( X 1  x1 ,..., X n  xn )   P(X i  xi | Parents( X i))
i 1
 If X has no parents, its probability distribution is
said to be unconditional, otherwise it is conditional. 10

Examples of Bayes network:
11



True Bayesians actually consider conditional
probabilities as more basic than joint
probabilities.
It is easy to define P(A|B) without reference to
the joint probability P(A,B).
Bayes’ Rule:
◦ Posterior =

ℎ ∗
ℎ
→  =
Back to the cancer example:
◦ P has cancer test positive) =
◦ =
0.9∗0.02
0.02∗0.9+0.98∗0.2
=
0.018
0.214

ℎ )∗(ℎ )
( )
≈ 0.0841
12


Two variables are independent if:
∀,  ∶     = (, )
It means that the occurrence of one event makes it
neither more nor less probable that the other occurs.
 This says that their joint distribution factors into a product two
simpler distributions

This implies: ∀,  ∶  | = ()
 We write

Independence is a simplifying modeling assumption
 Empirical joint distributions: at best “close” to independent

For example:
 The event of getting a 6 the first time a die is rolled and the
event of getting a 6 the second time are independent.
 By contrast, the event of getting a 6 the first time a die is rolled and
the event that the sum of the numbers seen on the first and second
trials is 8 are not independent.
13


Two events are dependent if the outcome or
occurrence of the first affects the outcome or
occurrence of the second so that the probability is
changed.
Example: A card is chosen at random from a standard
deck of 52 playing cards. Without replacing it, a
second card is chosen. What is the probability that the
first card chosen is a queen and the second card
chosen is a jack?
 Probabilities:
 P(queen on first pick) =
4
52
 P(jack on 2nd pick given queen on 1st pick) =
 P(queen and jack) =
4
52
4
4
4
51
∗ 51 = 663
14

X and Y are conditionally independent given a third
event Z precisely if the occurrence or non-occurrence
of X and the occurrence or non-occurrence of Y are
independent events in their conditional probability
distribution given Z. We write:
15

Why Bayes Networks?





A
P(A)
P(B)
P(C|A,B)
P(D|E)
P(E|C)
B
C
D
E

Joint Distribution of any five variables is: 25 − 1 = 31

In Bayes network:
◦ P(A,B,C,D,E)=P(A)*P(B)*P(C|A,B)*P(D|E)*P(E|C)
◦ Parameters:
1
1
4
2
2
Total of 10
16


The Naïve network has 216 − 1 = 65535
Bayes Network needs only 47 numerical
probabilities to specify the joint.
17

Are X and Y conditionally
independent given evidence
vars {Z}?
Active Triples
Inactive Triples
 Yes, if X and Y “separated” by Z
 Look for active paths from X to Y
 No active paths = independence!

A path is active if each triple
is active:
 Causal chain A B C where B is
unobserved (either direction)
 Common cause A B C where B is
unobserved
 Common effect (aka v-structure)
A B C where B or one of its
descendants is observed


All it takes to block a path is
a single inactive segment
18

Examples:
L
R
B
R
B
T
D
T
T’
Yes
Yes
T’
Yes
Yes
19

Overview:
 Bayes network:
 Graphical representation of joint distributions
 Efficiently encode conditional independencies
 Reduce number of parameters from exponential to linear (in
many cases)
20
```