### Inference via message passing - Indiana University Computer

```CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Message Passing / Belief Propagation

MESSAGE PASSING / BELIEF PROPAGATION
- Interpretation of VE steps as passing "messages" along a graph
- Exact for polytrees
- Arbitrary graphs -> clique trees
- Loopy belief propagation: approximate inference
CLUSTER GRAPHS
- An undirected graph G
- Each node i contains a scope Ci ⊆ X
- Each factor φ in the BN has Scope[φ] ⊆ Ci for some Ci
- Two adjacent nodes have non-empty intersection
- Running intersection property: for each variable X, the set of all nodes Ci that contain X forms a connected path in G
CLUSTER TREES
[Figure: BN  B -> A  with factors P(B), P(A|B), and the corresponding chain cluster tree  B -- A,B -- A]
MESSAGE PASSING INTERPRETATION OF VE
[Figure: chain cluster tree  B -- A,B -- A; query variable A]
Factors: φB = P(B),  φAB = P(A|B),  φA = 1
CLUSTER TREES & BELIEF PROPAGATION
(chain cluster tree  B -- A,B -- A;  φB = P(B), φAB = P(A|B), φA = 1)
1. B sends the "message" δB→AB = φB
2. A,B computes δAB→A = Σ_B φAB · δB→AB and sends it to A
3. A computes the belief βA = φA · δAB→A. Done.
PASSING UP-STREAM
(same chain, query variable B)
1. A sends the message δA→AB = φA
2. A,B computes δAB→B = Σ_A φAB · δA→AB  (= Σ_A P(A|B) = 1)
3. B computes the belief βB = φB · δAB→B
MESSAGE PASSING RULES IN CLUSTER TREES
Init:
- Each node Ci contains a factor ψi = Π φ, where the product is taken over the factors φ assigned to Ci
- Each directed edge maintains a message δi→j (initially nil)
Repeat while some message into the query variable is nil:
- Pick a node Ci that is "ready" to send to Cj: it has received messages from all neighbors except Cj
- Compute and send the message δi→j

ComputeMessage(i, j):
- Let Si,j = Ci ∩ Cj
- Return Σ_{Ci − Si,j} ψi · Π_{k ∈ Adj(i) − {j}} δk→i
[Figure: node i receives δk1→i, δk2→i, δk3→i from its other neighbors and sends δi→j to node j]
BRANCHING CLUSTER TREE
[Figure: cluster tree with nodes B, E, ABE, A, AJ, J, AM, M and edges B -- ABE, E -- ABE, ABE -- A, A -- AJ, AJ -- J, A -- AM, AM -- M]
Factors: φB = P(B), φE = P(E), φABE = P(A|B,E), φAJ = P(J|A), φAM = P(M|A), φA = φJ = φM = 1
Query variable: J
1. δB→ABE = φB  (= P(B))
2. δE→ABE = φE  (= P(E))
3. δM→AM = φM  (= 1)
4. δABE→A = Σ_{B,E} φABE · δB→ABE · δE→ABE  (= P(A))
5. δAM→A = Σ_M φAM · δM→AM  (= Σ_M P(M|A) = 1)
6. δA→AJ = φA · δABE→A · δAM→A  (= P(A))
7. δAJ→J = Σ_A φAJ · δA→AJ  (= Σ_A P(J|A) · P(A) = P(J))
CLUSTER TREE CALIBRATION
- Run message passing until there are no more messages to send
- For every node, βi = ψi · Π_{k ∈ Adj(i)} δk→i is then the unconditional distribution over Ci
- Much faster than running a separate query for each node!
CALIBRATION
[Figure: the branching cluster tree, with edges marked Calibrated as their messages complete]
Messages already sent for the query on J:
δB→ABE = P(B),  δE→ABE = P(E),  δABE→A = P(A),  δM→AM = 1,  δAM→A = 1,  δA→AJ = P(A),  δAJ→J = P(J),  βJ = P(J)
Remaining messages and beliefs:
1. δJ→AJ = φJ  (= 1)
2. βAJ = φAJ · δA→AJ · δJ→AJ  (= P(J|A) · P(A) · 1 = P(A,J))
3. δAJ→A = Σ_J φAJ · δJ→AJ  (= Σ_J P(J|A) = 1)
4. βA = φA · δABE→A · δAJ→A · δAM→A  (= P(A) · 1 · 1 = P(A))
5. δA→ABE = φA · δAJ→A · δAM→A  (= 1)
6. βABE = φABE · δB→ABE · δE→ABE · δA→ABE  (= P(A|B,E) · P(E) · P(B) · 1 = P(A,B,E))
7. δA→AM = φA · δABE→A · δAJ→A  (= P(A))
8. βAM = φAM · δA→AM · δM→AM  (= P(M|A) · P(A) · 1 = P(M,A))
9. δAM→M = Σ_A φAM · δA→AM  (= Σ_A P(M|A) · P(A) = P(M))
INCORPORATING EVIDENCE?
[Figure: the branching cluster tree; evidence observed at node J]
CLUSTER GRAPHS & BELIEF PROPAGATION
- Variables send "influence" to other variables through messages
- Information about Ci ∩ Cj divides the tree into conditionally independent pieces
- Exact inference when the cluster graph is a tree
- All graphs can be converted into a cluster tree through a VE ordering (clique tree)
- Factors may be large: what about non-trees?
CLUSTER GRAPHS WITH LOOPS
[Figure: loopy cluster graph with cluster nodes AB, AC, BCD over variables A, B, C, D; sepsets A (AB--AC), B (AB--BCD), C (AC--BCD)]
- Pass messages as if the graph were a tree
- Do it again...
- Now send revised messages from A given the current messages
- And repeat...
- Does the product of messages into X approach P(X) as more iterations are performed?
LOOPY BELIEF PROPAGATION
- In many problems, yes!
- Can construct problems where it doesn't converge to the correct marginals
- Generalizes to other probabilistic graphical models (used in physics & materials science, computer vision, sensor nets, ...)
APPLICATION TO ERROR-CORRECTING CODES
- Send a 3-bit message A, B, C through a noisy channel (say each bit arrives correctly with probability p = 0.9)
- 3 checksums: D = A xor B,  E = B xor C,  F = D xor E
- Observe 6 noisy bits X1...X6
[Figure: A, B, C feed XOR nodes producing D, E, F; X1...X6 are noisy observations of A, B, C, D, E, F]
APPLICATION TO ERROR-CORRECTING CODES
- Probability of A, B, C given the observations?
- Clever checksums + clever circuits that perform loopy BP = turbo codes
- Used widely in communications (3G, NASA, some WiFi standards)
- Closer to the Shannon limit than all prior codes!
[Figure: the same XOR factor graph]
HAVE A GOOD SPRING BREAK!
```
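The downstream pass on the two-node chain (B -- A,B -- A, query A) can be sketched directly in code. A minimal sketch, assuming binary variables and made-up numbers for P(B) and P(A|B) (the slides leave the actual tables unspecified):

```python
# Downstream pass on the chain cluster tree  B -- A,B -- A.
# Factors: phi_B = P(B), phi_AB = P(A|B), phi_A = 1.  All numbers illustrative.
P_B = {0: 0.7, 1: 0.3}                      # phi_B = P(B)
P_A_given_B = {(0, 0): 0.9, (1, 0): 0.1,    # phi_AB[(a, b)] = P(A=a | B=b)
               (0, 1): 0.2, (1, 1): 0.8}

# Step 1: B sends  delta_{B -> AB} = phi_B
delta_B_AB = dict(P_B)

# Step 2: A,B computes  delta_{AB -> A} = sum_B phi_AB * delta_{B -> AB}
delta_AB_A = {a: sum(P_A_given_B[(a, b)] * delta_B_AB[b] for b in (0, 1))
              for a in (0, 1)}

# Step 3: A computes the belief  beta_A = phi_A * delta_{AB -> A}  (phi_A = 1),
# which is exactly the marginal P(A).
beta_A = {a: 1.0 * delta_AB_A[a] for a in (0, 1)}
print(beta_A)
```

With these numbers, βA is the marginal obtained by summing P(A|B)·P(B) over B, just as the VE interpretation on the slides promises.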
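The loopy example (clusters AB, AC, BCD) can be simulated as well. A sketch under assumptions the slides don't pin down: a small BN A -> B, A -> C, (B,C) -> D with made-up CPTs, factors assigned as ψAB = P(A)P(B|A), ψAC = P(C|A), ψBCD = P(D|B,C), uniform initial messages, and per-round message normalization to keep numbers stable:

```python
from itertools import product

# Illustrative CPTs (all numbers made up)
pA = [0.4, 0.6]                      # P(A)
pB = {0: [0.7, 0.3], 1: [0.2, 0.8]}  # pB[a][b] = P(B=b | A=a)
pC = {0: [0.5, 0.5], 1: [0.8, 0.2]}  # pC[a][c] = P(C=c | A=a)
pD = {(0, 0): [0.9, 0.1], (0, 1): [0.3, 0.7],
      (1, 0): [0.4, 0.6], (1, 1): [0.05, 0.95]}  # P(D=d | B=b, C=c)

psi_AB  = lambda a, b: pA[a] * pB[a][b]
psi_AC  = lambda a, c: pC[a][c]
psi_BCD = lambda b, c, d: pD[(b, c)][d]

def norm(m):
    s = sum(m)
    return [x / s for x in m]

# One message per directed edge, over that edge's sepset variable; start uniform.
msgs = {e: [1.0, 1.0] for e in
        ['AB->AC', 'AC->AB', 'AB->BCD', 'BCD->AB', 'AC->BCD', 'BCD->AC']}

for it in range(100):
    new = {}
    # delta_{i->j} = sum over (Ci - sepset) of psi_i * incoming messages except from j
    new['AB->AC']  = norm([sum(psi_AB(a, b) * msgs['BCD->AB'][b] for b in (0, 1)) for a in (0, 1)])
    new['AB->BCD'] = norm([sum(psi_AB(a, b) * msgs['AC->AB'][a] for a in (0, 1)) for b in (0, 1)])
    new['AC->AB']  = norm([sum(psi_AC(a, c) * msgs['BCD->AC'][c] for c in (0, 1)) for a in (0, 1)])
    new['AC->BCD'] = norm([sum(psi_AC(a, c) * msgs['AB->AC'][a] for a in (0, 1)) for c in (0, 1)])
    new['BCD->AB'] = norm([sum(psi_BCD(b, c, d) * msgs['AC->BCD'][c] for c in (0, 1) for d in (0, 1)) for b in (0, 1)])
    new['BCD->AC'] = norm([sum(psi_BCD(b, c, d) * msgs['AB->BCD'][b] for b in (0, 1) for d in (0, 1)) for c in (0, 1)])
    diff = max(abs(new[e][x] - msgs[e][x]) for e in msgs for x in (0, 1))
    msgs = new
    if diff < 1e-9:   # messages have stopped changing
        break

# Belief at BCD, marginalized down to D (the loopy approximation of P(D))
bel = {(b, c, d): psi_BCD(b, c, d) * msgs['AB->BCD'][b] * msgs['AC->BCD'][c]
       for b, c, d in product((0, 1), repeat=3)}
Z = sum(bel.values())
pD_loopy = [sum(v for (b, c, d), v in bel.items() if d == x) / Z for x in (0, 1)]

# Exact P(D) by enumerating the joint, for comparison
joint = {(a, b, c, d): pA[a] * pB[a][b] * pC[a][c] * pD[(b, c)][d]
         for a, b, c, d in product((0, 1), repeat=4)}
pD_exact = [sum(v for (a, b, c, d), v in joint.items() if d == x) for x in (0, 1)]
print(pD_loopy, pD_exact)
```

On a single-loop graph like this the iteration typically settles to a fixed point; comparing pD_loopy against pD_exact shows directly how close (or not) the loopy approximation lands, which is the question the slide poses.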
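For the 6-bit code on the final slides, "probability of A, B, C?" can be answered exactly by brute-force enumeration of the 8 codewords; practical turbo decoding uses loopy BP precisely because enumeration does not scale to realistic code lengths. A sketch assuming a uniform prior over the 3-bit messages and an independent binary symmetric channel with per-bit correctness p = 0.9:

```python
from itertools import product

p = 0.9  # assumed channel model: each transmitted bit is received correctly with prob p

def channel(sent, received):
    return p if sent == received else 1 - p

def posterior(x):
    """Posterior over (A, B, C) given x = (X1..X6), the noisy observations
    of the codeword (A, B, C, D, E, F) with D = A^B, E = B^C, F = D^E."""
    post = {}
    for a, b, c in product((0, 1), repeat=3):
        d, e = a ^ b, b ^ c
        f = d ^ e
        sent = (a, b, c, d, e, f)
        # uniform prior over messages; likelihood is a product of channel terms
        post[(a, b, c)] = 1.0
        for s, r in zip(sent, x):
            post[(a, b, c)] *= channel(s, r)
    Z = sum(post.values())
    return {k: v / Z for k, v in post.items()}

# Example: the received bits happen to form a valid codeword for (1, 0, 1)
post = posterior((1, 0, 1, 1, 1, 0))
best = max(post, key=post.get)
print(best, post[best])
```

Because the received vector here matches the codeword for (1, 0, 1) exactly, the posterior concentrates almost entirely on that message, illustrating how the redundant checksum bits pull the decoder toward the transmitted word.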