Report

Learning Multiplicative Interactions
(many slides from Hinton)

Two different meanings of "multiplicative"
• If we take two density models and multiply together their probability distributions at each point in data-space, we get a "product of experts".
  – The product of two Gaussian experts is a Gaussian.
• If we take two variables and multiply them together to provide input to a third variable, we get a "multiplicative interaction".
  – The distribution of the product of two Gaussian-distributed variables is NOT Gaussian. It is a heavy-tailed distribution: one Gaussian determines the standard deviation of the other Gaussian.
  – Heavy-tailed distributions are the signatures of multiplicative interactions between latent variables.

Learning multiplicative interactions
• It is fairly easy to learn multiplicative interactions if all of the variables are observed.
  – This is possible if we control the variables used to create a training set (e.g. pose, lighting, identity, …).
• It is also easy to learn energy-based models in which all but one of the terms in each multiplicative interaction are observed.
  – Inference is still easy.
• If more than one of the terms in each multiplicative interaction are unobserved, the interactions between hidden variables make inference difficult.
  – Alternating Gibbs sampling can be used if the latent variables form a bipartite graph.

Higher-order Boltzmann machines (Sejnowski, ~1986)
• The usual energy function is quadratic in the states:
  $-E = \sum_i b_i s_i + \sum_{i<j} s_i s_j w_{ij}$
• But we could use higher-order interactions:
  $-E = \sum_i b_i s_i + \sum_{i,j,h} s_i s_j s_h w_{ijh}$
• Hidden unit $s_h$ acts as a switch. When $s_h$ is on, it switches in the pairwise interaction between unit $i$ and unit $j$.
  – Units $i$ and $j$ can also be viewed as switches that control the pairwise interactions between $j$ and $h$ or between $i$ and $h$.

Using higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007)
• A global transformation specifies which pixel goes to which other pixel.
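The switching role of the hidden unit in the third-order energy $-E = \sum_i b_i s_i + \sum_{i,j,h} s_i s_j s_h w_{ijh}$ can be checked numerically. The following is a minimal illustrative sketch (the sizes, weights, and states are made up, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_j, n_h = 3, 3, 2
w = rng.normal(size=(n_i, n_j, n_h))   # three-way weights w_ijh
b = rng.normal(size=n_i)               # biases on the s_i units

def energy(si, sj, sh):
    """E(s) = -sum_i b_i s_i - sum_{i,j,h} s_i s_j s_h w_ijh."""
    return -b @ si - np.einsum('i,j,h,ijh->', si, sj, sh, w)

si = np.array([1., 0., 1.])
sj = np.array([0., 1., 1.])

# With every hidden unit off, all three-way terms vanish: the i and j
# units do not interact at all and only the bias term survives.
h_off = np.zeros(n_h)
assert np.isclose(energy(si, sj, h_off), -b @ si)

# Turning hidden unit 0 on "switches in" a pairwise i-j interaction
# whose effective weights are the slice w[:, :, 0].
h_on = np.array([1., 0.])
effective_pairwise = np.einsum('i,j,ij->', si, sj, w[:, :, 0])
assert np.isclose(energy(si, sj, h_on), -b @ si - effective_pairwise)
```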
• Conversely, each pair of similar-intensity pixels, one in each image, votes for a particular global transformation.

(Figure: transformation units linking image(t) to image(t+1).)

Using higher-order Boltzmann machines to model image transformations
• For binary images, a simple energy function that captures all possible correlations between the components of $x$, $y$, $h$ is
  $E(y, h; x) = -\sum_{ijk} w_{ijk} x_i y_j h_k$   (1)
• Using this energy function, we can now define the joint distribution $p(y, h | x)$ over outputs and hidden variables by exponentiating and normalizing:
  $p(y, h | x) = \frac{1}{Z(x)} \exp(-E(y, h; x))$   (2)
  where $Z(x) = \sum_{h,y} \exp(-E(y, h; x))$.
• From Eqs. 1 and 2, we get
  $p(h_k = 1 | x, y) = \sigma\big(\sum_{ij} w_{ijk} x_i y_j\big)$
  $p(y_j = 1 | x, h) = \sigma\big(\sum_{ik} w_{ijk} x_i h_k\big)$

Making the reconstruction easier
• Condition on the first image so that only one visible group needs to be reconstructed.
  – Given the hidden states and the previous image, the pixels in the second image are conditionally independent.

The main problem with 3-way interactions
• Energy function: $-E = \sum_i b_i s_i + \sum_{i,j,h} s_i s_j s_h w_{ijh}$
• There are far too many of them.
• We can reduce the number in several straightforward ways:
  – Do dimensionality reduction on each group before the three-way interactions.
  – Use spatial locality to limit the range of the three-way interactions.
• A much more interesting approach (which can be combined with the other two) is to factor the interactions so that they can be specified with fewer parameters.
  – This leads to a novel type of learning module.

Factoring three-way interactions
• We use factors that correspond to 3-way outer products:
  $w_{ijh} = \sum_f w_{if} w_{jf} w_{hf}$
• Unfactored: $E = -\sum_{i,j,h} s_i s_j s_h w_{ijh}$
• Factored: $E = -\sum_f \sum_{i,j,h} s_i s_j s_h w_{if} w_{jf} w_{hf}$

Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images (Ranzato, Krizhevsky and Hinton, 2010)
• A joint 3-way model.
• Models the covariance structure of natural images.
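The parameter saving from factoring $w_{ijh} = \sum_f w_{if} w_{jf} w_{hf}$ can be verified with a small numeric check. This sketch is illustrative only (the sizes and random weights are assumptions, not from the paper); it confirms that the factored energy equals the unfactored one while using $F(I+J+H)$ parameters instead of $IJH$:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, H, F = 8, 8, 4, 3   # unit-group sizes and number of factors (made up)

w_if = rng.normal(size=(I, F))
w_jf = rng.normal(size=(J, F))
w_hf = rng.normal(size=(H, F))

# Rebuild the full third-order tensor from the three-way outer products.
w_full = np.einsum('if,jf,hf->ijh', w_if, w_jf, w_hf)

si, sj, sh = rng.normal(size=I), rng.normal(size=J), rng.normal(size=H)

# Unfactored: -E = sum_{i,j,h} s_i s_j s_h w_ijh
e_unfactored = np.einsum('i,j,h,ijh->', si, sj, sh, w_full)

# Factored: -E = sum_f (sum_i s_i w_if)(sum_j s_j w_jf)(sum_h s_h w_hf)
e_factored = np.sum((si @ w_if) * (sj @ w_jf) * (sh @ w_hf))

assert np.isclose(e_unfactored, e_factored)

n_full = I * J * H            # 8*8*4 = 256 parameters
n_factored = F * (I + J + H)  # 3*20  = 60 parameters
assert n_factored < n_full
```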
• The visible units are two identical copies of the image.

A powerful module for deep learning
• Define the energy function in terms of 3-way multiplicative interactions between two visible binary units, $v_i$, $v_j$, and one hidden binary unit $h_k$:
  $E(v, h) = -\sum_{i,j,k} v_i v_j h_k w_{ijk}$
• Model the three-way weights as a sum of "factors" $f$, each of which is a three-way outer product:
  $w_{ijk} = \sum_f B_{if} C_{jf} P_{kf}$
• Since the factors are connected twice to the same image through the matrices $B$ and $C$, it is natural to tie their weights, further reducing the number of parameters:
  $w_{ijk} = \sum_f C_{if} C_{jf} P_{kf}$

A powerful module for deep learning (continued)
• So the energy function becomes:
  $E(v, h) = -\sum_f \big(\sum_i C_{if} v_i\big)^2 \big(\sum_k P_{kf} h_k\big)$
• The parameters of the model can be learned by maximizing the log likelihood, whose gradient is given by:
  $\frac{\partial L}{\partial \theta} = \big\langle \frac{\partial E}{\partial \theta} \big\rangle_{\text{model}} - \big\langle \frac{\partial E}{\partial \theta} \big\rangle_{\text{data}}$
• The hidden units are conditionally independent given the states of the visible units, and their binary states can be sampled using:
  $p(h_k = 1 | v) = \sigma\big(\sum_f P_{kf} \big(\sum_i C_{if} v_i\big)^2 + b_k\big)$
• However, given the hidden states, the visible units are no longer independent.

Producing reconstructions using hybrid Monte Carlo
• Integrate out the hidden units and run the hybrid Monte Carlo algorithm (HMC) on the free energy:
  $F(v) = -\sum_k \log\big(1 + \exp\big(\sum_f P_{kf} \big(\sum_i C_{if} v_i\big)^2 + b_k\big)\big)$

Modeling the joint density of two images under a variety of transformations (Hinton et al., 2011)
• Describes a generative model of the relationship between two images.
• The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs.

Model
• Given two real-valued images $x$ and $y$, define the matching score of the triplet $(x, y, h)$:
  $S(x, y, h) = \sum_{f=1}^{F} \big(\sum_{i=1}^{I} B_{if} x_i\big)\big(\sum_{j=1}^{J} C_{jf} y_j\big)\big(\sum_{k=1}^{K} P_{kf} h_k\big)$
• Add bias terms to the matching score to get the energy function:
  $E(x, y, h) = -S(x, y, h) - \sum_{k=1}^{K} b_k h_k + \frac{1}{2}\sum_{i=1}^{I} (x_i - a_i)^2 + \frac{1}{2}\sum_{j=1}^{J} (y_j - c_j)^2$   (1)
• Exponentiate and normalize the energy function:
  $p(x, y, h) = \frac{1}{Z} \exp\big(-E(x, y, h)\big)$   (2)

Model (continued)
• Marginalize over $h$ to get the distribution over an image pair $(x, y)$:
  $p(x, y) = \sum_{h \in \{0,1\}^K} p(x, y, h)$
• And then we get the conditionals:
  $h_k | x, y \sim \mathrm{bernoulli}\big(\sigma\big(\sum_f P_{kf} \big(\sum_i B_{if} x_i\big)\big(\sum_j C_{jf} y_j\big) + b_k\big)\big)$   (3)
  $x_i | y, h \sim \mathcal{N}\big(\sum_f B_{if} \big(\sum_j C_{jf} y_j\big)\big(\sum_k P_{kf} h_k\big) + a_i \,;\; 1.0\big)$   (4)
  $y_j | x, h \sim \mathcal{N}\big(\sum_f C_{jf} \big(\sum_i B_{if} x_i\big)\big(\sum_k P_{kf} h_k\big) + c_j \,;\; 1.0\big)$   (5)
• This shows that, among the three sets of variables, computing the conditional distribution of any one set given the other two is easy.
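The tied-weights covariance module can be sketched in a few lines of NumPy. This is a minimal illustrative sketch under assumed shapes and random weights (not the authors' code): the hidden units see the squared factor outputs $(C^\top v)^2$, and the free energy sums out the binary hidden units analytically.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vis, n_fac, n_hid = 16, 8, 4   # sizes are made up for illustration

C = rng.normal(size=(n_vis, n_fac)) * 0.1  # visible-to-factor weights (tied)
P = rng.normal(size=(n_hid, n_fac)) * 0.1  # factor-to-hidden weights
b = np.zeros(n_hid)                        # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v):
    """p(h_k = 1 | v) = sigmoid(sum_f P_kf (C^T v)_f^2 + b_k)."""
    return sigmoid(P @ (C.T @ v) ** 2 + b)

def free_energy(v):
    """F(v) = -sum_k log(1 + exp(sum_f P_kf (C^T v)_f^2 + b_k))."""
    act = P @ (C.T @ v) ** 2 + b
    return -np.sum(np.log1p(np.exp(act)))

v = rng.normal(size=n_vis)
p = hidden_probs(v)
assert p.shape == (n_hid,) and np.all((p > 0) & (p < 1))
assert free_energy(v) < 0  # each summand log(1 + exp(.)) is positive
```

Because `free_energy` no longer mentions the hidden units, its gradient with respect to `v` is what an HMC sampler would follow when producing reconstructions.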
Three-way contrastive divergence
Prerequisites: a data set $\{(x^{(n)}, y^{(n)})\}_{n=1}^{N}$ and a learning rate $\eta$
repeat
  for $n$ from 1 to $N$ do
    compute the factor activities $B^\top x^{(n)}$ and $C^\top y^{(n)}$
    set $\hat{h} = p(h | x^{(n)}, y^{(n)})$
    perform the positive-phase updates of $B$, $C$, $P$ and the biases using $(x^{(n)}, y^{(n)}, \hat{h})$
    sample $h$ from $p(h | x^{(n)}, y^{(n)})$
    sample $u$ from bernoulli(0.5)
    if $u > 0.5$ then
      sample $\tilde{x}$ from $p(x | y^{(n)}, h)$ and set $\tilde{y} = y^{(n)}$
    else
      sample $\tilde{y}$ from $p(y | x^{(n)}, h)$ and set $\tilde{x} = x^{(n)}$
    end if
    set $\hat{h} = p(h | \tilde{x}, \tilde{y})$
    perform the negative-phase updates of $B$, $C$, $P$ and the biases using $(\tilde{x}, \tilde{y}, \hat{h})$
    re-normalize $B$, $C$, $P$
  end for
until convergence

Thank you