An Information-Maximization Approach to Blind Separation and Blind Deconvolution
A.J. Bell and T.J. Sejnowski
Computational Modeling of Intelligence
11.03.11.(Fri)
Summarized by Joon Shik Kim

Abstract
• Self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units.
• The nonlinearities are able to pick up higher-order moments of the input distribution and perform true redundancy reduction between units in the output representation.
• We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to 10 speakers.

Cocktail Party Problem

Introduction (1/2)
• The development of information-theoretic unsupervised learning rules for neural networks.
• The use, in signal processing, of higher-order statistics for separating out mixtures of independent sources (blind separation) or reversing the effect of an unknown filter (blind deconvolution).

Introduction (2/2)
• The approach we take to these problems is a generalization of Linsker’s infomax principle to nonlinear units with arbitrarily distributed inputs.
• When inputs are to be passed through a sigmoid function, maximum information transmission can be achieved when the sloping part of the sigmoid is optimally lined up with the high-density parts of the inputs.
• Generalization of this rule to multiple units leads to a system that, in maximizing information transfer, also reduces the redundancy between the units in the output layer.

Information Maximization
• The basic problem is how to maximize the mutual information that the output Y of a neural network processor contains about its input X.
• $I(Y,X) = H(Y) - H(Y|X)$
• Information between inputs and outputs can be maximized by maximizing the entropy of the outputs alone, since H(Y|X) does not depend on the weights w: $\frac{\partial}{\partial w} I(Y,X) = \frac{\partial}{\partial w} H(Y)$
• H(Y|X) tends to minus infinity as the noise variance goes to zero.
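The decomposition I(Y,X) = H(Y) - H(Y|X) can be checked numerically on a small discrete example. The sketch below is only a discrete analogue of the continuous setting in the slides: for a noiseless discrete mapping H(Y|X) is zero rather than diverging to minus infinity, but in both cases it does not change with the mapping, so the mapping with the higher output entropy transmits more information. The helper names (`entropy`, `mutual_information`) and the two toy mappings are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_x, channel):
    """Return (I(Y,X), H(Y), H(Y|X)) where channel[x][y] = P(y | x)."""
    n_x, n_y = len(p_x), len(channel[0])
    # Output marginal: P(y) = sum_x P(x) P(y | x)
    p_y = [sum(p_x[x] * channel[x][y] for x in range(n_x)) for y in range(n_y)]
    h_y = entropy(p_y)
    # Conditional entropy: H(Y|X) = sum_x P(x) H(Y | X = x)
    h_y_given_x = sum(p_x[x] * entropy(channel[x]) for x in range(n_x))
    return h_y - h_y_given_x, h_y, h_y_given_x

# Four equally likely inputs (hypothetical toy data).
p_x = [0.25, 0.25, 0.25, 0.25]

# Noiseless mapping that spreads the inputs evenly over two outputs.
balanced = [[1, 0], [1, 0], [0, 1], [0, 1]]
# Noiseless mapping that sends three of the four inputs to the same output.
skewed = [[1, 0], [1, 0], [1, 0], [0, 1]]

mi_balanced, h_balanced, noise_balanced = mutual_information(p_x, balanced)
mi_skewed, h_skewed, noise_skewed = mutual_information(p_x, skewed)

print(f"balanced: I = {mi_balanced:.4f} bits, H(Y) = {h_balanced:.4f} bits")
print(f"skewed:   I = {mi_skewed:.4f} bits, H(Y) = {h_skewed:.4f} bits")
```

In both cases the mutual information equals the output entropy, and the balanced mapping, whose outputs are used equally often, carries more information than the skewed one.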
For One Input and One Output
• When we pass a single input x through a transforming function g(x) to give an output variable y, both I(y,x) and H(y) are maximized when we align the high-density parts of the probability density function (pdf) of x with the highly sloping parts of the function g(x).

For an N→N Network

Inverse of a Matrix
$$A^{-1} = \frac{1}{\operatorname{Det}(A)} \left[ (-1)^{i+j} \operatorname{Det}(A_{ij}) \right]^{T}$$
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
Here $A_{ij}$ denotes A with row i and column j removed.

Blind Separation and Blind Deconvolution

Blind Separation Results

Differences from Previous Work
• There is no noise, or rather, there is no noise model in this system.
• There is no assumption that inputs or outputs have Gaussian statistics.
• The transfer function is in general nonlinear.

Conclusion
• The learning rule is decidedly nonlocal. Each “neuron” must know the cofactors either of all the weights entering it, or of all those leaving it. The network rule remains unbiological.
• We believe that the information maximization approach presented here could serve as a unifying framework that brings together several lines of research, and as a guiding principle for further advances.
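The cofactor formula from the "Inverse of a Matrix" slide, and the nonlocality noted in the conclusion, can be sketched in a few lines of plain Python. The function names below are hypothetical. The update in `infomax_step` follows the form usually quoted from the Bell and Sejnowski paper for logistic units, ΔW ∝ [W^T]^{-1} + (1 - 2y)x^T. The slides' own equation did not survive extraction, so treat that form, the omitted bias term, and the learning rate as assumptions, not as the presenter's exact rule.

```python
import math

def minor(matrix, i, j):
    """Copy of matrix with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(matrix) if k != i]

def det(matrix):
    """Determinant by cofactor expansion along the first row (small N only)."""
    if not matrix:
        return 1  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
               for j in range(len(matrix)))

def inverse(matrix):
    """A^{-1} = transpose of the cofactor matrix, divided by Det(A)."""
    n = len(matrix)
    d = det(matrix)
    if d == 0:
        raise ValueError("matrix is singular")
    cof = [[(-1) ** (i + j) * det(minor(matrix, i, j)) for j in range(n)]
           for i in range(n)]
    # Transpose: entry (i, j) of the inverse uses cofactor (j, i).
    return [[cof[j][i] / d for j in range(n)] for i in range(n)]

def infomax_step(weights, x, learning_rate=0.01):
    """One assumed update W += lr * ([W^T]^{-1} + (1 - 2y) x^T), without bias."""
    n = len(weights)
    u = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
    y = [1.0 / (1.0 + math.exp(-u_i)) for u_i in u]  # logistic outputs
    w_transposed = [[weights[j][i] for j in range(n)] for i in range(n)]
    # Entry (i, j) is cofactor(w_ij) / Det(W): every weight's update depends
    # on all the other weights, which is why the rule is nonlocal.
    anti_decay = inverse(w_transposed)
    return [[weights[i][j]
             + learning_rate * (anti_decay[i][j] + (1 - 2 * y[i]) * x[j])
             for j in range(n)] for i in range(n)]

# 2x2 check against the closed form (1 / (ad - bc)) * [[d, -b], [-c, a]].
A = [[1.0, 2.0], [3.0, 4.0]]
A_inv = inverse(A)
print(A_inv)

# One learning step from identity weights on a hypothetical input vector.
W_next = infomax_step([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.2])
print(W_next)
```

Cofactor expansion is used here only because it mirrors the slide; its cost grows factorially with N, so a practical implementation would use a standard linear-algebra routine instead.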