### An Information-Maximization Approach to Blind Separation and Blind Deconvolution

```An Information-Maximization
Approach to Blind Separation
and Blind Deconvolution
A.J. Bell and T.J. Sejnowski
Computational Modeling of
Intelligence
11.03.11.(Fri)
Summarized by Joon Shik Kim
Abstract
• Self-organizing learning algorithm that
maximizes the information transferred in a
network of nonlinear units.
• The nonlinearities are able to pick up higher-order moments of the input distribution and
perform true redundancy reduction between
units in the output representation.
• We apply the network to the source separation
(or cocktail party) problem, successfully
separating unknown mixtures of up to 10
speakers.
Cocktail Party Problem
Introduction (1/2)
• The development of information-theoretic unsupervised learning rules for
neural networks
• The use, in signal processing, of higher-order statistics for separating out
mixtures of independent sources (blind
separation) or reversing the effect of an
unknown filter (blind deconvolution)
Introduction (2/2)
• The approach we take to these problems is a
generalization of Linsker’s infomax principle to
nonlinear units with arbitrarily distributed inputs.
• When inputs are to be passed through a
sigmoid function, maximum information
transmission can be achieved when the sloping
part of the sigmoid is optimally lined up with
the high density parts of the inputs.
• Generalization of this rule to multiple units leads to
a system that, in maximizing information transfer,
also reduces the redundancy between the units
in the output layer.
Information Maximization
• The basic problem is how to maximize
the mutual information that the output Y
of a neural network processor contains
about its input X.
• I(Y,X)=H(Y)-H(Y|X)
Information Maximization
• Information between inputs and outputs
can be maximized by maximizing the
entropy of the outputs alone.


\frac{\partial}{\partial w} I(Y, X) = \frac{\partial}{\partial w} H(Y)
• H(Y|X) tends to minus infinity as the
noise variance goes to zero.
For One Input and One Output
• When we pass a single input x through a
transforming function g(x) to give an
output variable y, both I(y,x) and H(y) are
maximized when we align high density
parts of the probability density function
(pdf) of x with highly sloping parts of the
function g(x).
For an N→N Network
Inverse of a Matrix
i j

(1) Det ( Aij ) 
1
A  transposeof 



Det
(
A
)


1
a b
1  d b 

 


ad  bc  c a 
c d
Blind Separation and Blind
Deconvolution
Blind Separation Results
Differences from Previous
Work
• There is no noise, or rather, there is no
noise model in this system.
• There is no assumption that inputs or
outputs have Gaussian statistics.
• The transfer function is in general
nonlinear.
Conclusion
• The learning rule is decidedly nonlocal.
Each “neuron” must know the cofactors
either of all the weights entering it, or all
those leaving it. The network rule remains
unbiological.
• We believe that the information
maximization approach presented here
could serve as a unifying framework that
brings together several lines of research,
and as a guiding principle for further