Information Complexity and Exact Communication Bounds
Mark Braverman, Princeton University
April 26, 2013
Based on joint work with Ankit Garg, Denis Pankratov, and Omri Weinstein

Overview: information complexity
• Information complexity is to communication complexity as Shannon's entropy is to transmission cost.

Background – information theory
• Shannon (1948) introduced information theory as a tool for studying the communication cost of transmission tasks.
• [Figure: Alice sends X to Bob over a communication channel.]

Shannon's entropy
• Assume a lossless binary channel.
• A message X is distributed according to some prior μ.
• The inherent number of bits it takes to transmit X is given by its entropy
  H(X) = Σ_x Pr[X = x] · log₂(1/Pr[X = x]).

Shannon's noiseless coding
• The cost of communicating many copies of X scales as H(X).
• Shannon's source coding theorem:
  – Let C(Xⁿ) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} C(Xⁿ)/n = H(X).

Shannon's entropy – cont'd
• Therefore, understanding the cost of transmitting a sequence of X's is equivalent to understanding Shannon's entropy of X.
• What about more complicated scenarios, e.g. when Bob already holds a value Y correlated with X?
• Amortized transmission cost = conditional entropy H(X|Y) := H(XY) − H(Y).

A simple example (easy and complete!)
• Alice has uniform X₁, X₂, …, Xₙ ∈ {1, 2, 3, 4, 5}.
• The cost of transmitting them to Bob is ≈ log₂ 5 · n ≈ 2.32 n.
• Suppose for each Xᵢ Bob is given a uniformly random Yᵢ ∈ {1, 2, 3, 4, 5} such that Yᵢ ≠ Xᵢ. Then the cost of transmitting the Xᵢ's to Bob is ≈ log₂ 4 · n = 2 n.

Meanwhile, in a galaxy far far away… communication complexity [Yao]
• Focus on the two-party randomized setting.
• Using shared randomness R, Alice (holding X) and Bob (holding Y) implement a functionality F(X, Y), e.g. F(X, Y) = "X = Y?".

Communication complexity
• Goal: implement a functionality F(X, Y).
• A protocol π(X, Y) computing F(X, Y) exchanges messages m₁(X, R), m₂(Y, m₁, R), m₃(X, m₁, m₂, R), … using the shared randomness R.
• Communication cost = number of bits exchanged.
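The transmission examples above can be checked numerically. A minimal sketch (not from the talk), using only the standard library; the function name `entropy` is my own.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = sum_x p(x) * log2(1/p(x)), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Each X_i is uniform on {1,...,5}: transmitting it costs H(X_i) = log2(5) bits.
h_x = entropy([1/5] * 5)
print(round(h_x, 2))  # ≈ 2.32 bits per symbol

# If Bob already holds a uniformly random Y_i != X_i, then given Y_i the value
# X_i is uniform over the remaining 4 symbols, so H(X_i | Y_i) = log2(4) = 2.
h_x_given_y = entropy([1/4] * 4)
print(round(h_x_given_y, 2))  # 2.0 bits per symbol
```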
Communication complexity
• Numerous applications/potential applications (streaming, data structures, circuit lower bounds, …).
• It is considerably more difficult to obtain lower bounds here than in the transmission setting (though still much easier than in other models of computation).
• Many lower-bound techniques exist.
• Exact bounds??

Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: D^μ(F, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: R(F, ε). Error ≤ ε on all inputs.
• Yao's minimax: R(F, ε) = max_μ D^μ(F, ε).

Set disjointness and intersection
• Alice and Bob are each given a set X ⊆ {1, …, n}, Y ⊆ {1, …, n} (which can be viewed as vectors in {0,1}ⁿ).
• Intersection: Intersectₙ(X, Y) = X ∩ Y.
• Disjointness: Disjₙ(X, Y) = 1 if X ∩ Y = ∅, and 0 otherwise.
• Intersectₙ is just n 1-bit-ANDs in parallel.
• ¬Disjₙ is an OR of n 1-bit-ANDs.
• Need to understand the amortized communication complexity (of the 1-bit AND).

Information complexity
• The smallest amount of information Alice and Bob need to exchange to solve F.
• How is information measured?
• Communication cost of a protocol — the number of bits exchanged.
• Information cost of a protocol — the amount of information revealed.

Basic definition 1: the information cost of a protocol
• Prior distribution: (X, Y) ∼ μ. The protocol π produces a transcript Π.
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y)
  = what Alice learns about Y + what Bob learns about X.

Mutual information
• The mutual information of two random variables is the amount of information knowing one reveals about the other:
  I(A; B) = H(A) − H(A | B).
• If A, B are independent, I(A; B) = 0.
• I(A; A) = H(A).

Example
• F is "X = Y?".
• μ is a distribution where w.p. ½, X = Y, and w.p. ½, (X, Y) are random.
• Protocol: Alice sends MD5(X) [128 bits]; Bob compares it against MD5(Y) and replies whether X = Y.
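The identities on the mutual-information slide can be checked directly on small joint distributions. A sketch (not from the talk); the helper names and the two example distributions are my own.

```python
import math

def H(probs):
    """Entropy in bits of a distribution given as an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution as a 2-D list."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    pxy = [p for row in joint for p in row]     # flattened joint
    return H(px) + H(py) - H(pxy)

# Independent bits reveal nothing about each other: I(X;Y) = 0.
indep = [[0.25, 0.25], [0.25, 0.25]]
print(round(mutual_information(indep), 6))  # 0.0

# Perfectly correlated bits: I(X;Y) = H(X) = 1 bit.
corr = [[0.5, 0.0], [0.0, 0.5]]
print(round(mutual_information(corr), 6))  # 1.0
```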
• The reply costs [1 bit].
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 1 + 64.5 = 65.5 bits
  (what Alice learns about Y — the one-bit answer — plus what Bob learns about X: ≈ 128 bits in the half of the cases where (X, Y) are random).

Information complexity
• Communication complexity: D^μ(F, ε) := min over protocols π computing F with error ≤ ε of the communication cost of π.
• Analogously: IC(F, μ, ε) := inf over protocols π computing F with error ≤ ε of IC(π, μ).

Prior-free information complexity
• Using minimax we can get rid of the prior.
• For communication, we had: R(F, ε) = max_μ D^μ(F, ε).
• For information: IC(F, ε) := inf over protocols π with error ≤ ε of max_μ IC(π, μ).

Connection to privacy
• There is a strong connection between information complexity and (information-theoretic) privacy.
• Alice and Bob want to perform a computation without revealing unnecessary information to each other (or to an eavesdropper).
• Negative results follow through information-complexity arguments.

Information equals amortized communication
• Recall [Shannon]: lim_{n→∞} C(Xⁿ)/n = H(X).
• [BR'11]: lim_{n→∞} D(Fⁿ, μⁿ, ε)/n = IC(F, μ, ε), for ε > 0.
• For ε = 0: lim_{n→∞} D(Fⁿ, μⁿ, 0⁺)/n = IC(F, μ, 0).
• [lim_{n→∞} D(Fⁿ, μⁿ, 0)/n is an interesting open question.]

Without priors
• [BR'11] For ε = 0: lim_{n→∞} D(Fⁿ, μⁿ, 0⁺)/n = IC(F, μ, 0).
• [B'12]: lim_{n→∞} R(Fⁿ, 0⁺)/n = IC(F, 0).

Intersection
• Therefore R(Intersectₙ, 0⁺) = n · IC(AND, 0) ± o(n).
• Need to find the information complexity of the two-bit AND!

The two-bit AND
• [BGPW'12]: IC(AND, 0) ≈ 1.4922 bits.
• Find the value of IC(AND, μ, 0) for all priors μ.
• Find the information-theoretically optimal protocol for computing the AND of two bits.

"Raise your hand when your number is reached": the optimal protocol for AND
• Alice holds X ∈ {0,1}, Bob holds Y ∈ {0,1}.
• If X = 1, Alice sets A = 1; if X = 0, she sets A ∼ U[0,1]. Likewise, if Y = 1, Bob sets B = 1; if Y = 0, he sets B ∼ U[0,1].
• A shared counter rises continuously from 0 to 1; each player raises a hand when the counter reaches their number. A hand raised before the counter reaches 1 reveals that that player's bit is 0, so AND = 0; if no hand is raised before 1, both bits are 1.

Analysis
• There is an additional small step if the prior is not symmetric (Pr[XY = 10] ≠ Pr[XY = 01]).
• The protocol is clearly always correct.
• How do we prove the optimality of a protocol?
• Consider IC(AND, μ, 0) as a function of the prior μ.

The analytical view
• A message is just a mapping from the current prior to a distribution of posteriors (new priors).
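The "message = mapping from prior to posteriors" view can be made concrete for the simplest message, Alice announcing her bit. A sketch (not from the talk); the function name and the example prior are mine, chosen to match the worked tables nearby.

```python
def posteriors_after_alice_bit(prior):
    """Alice announces her bit X. For each message m in {0, 1}, return
    (Pr[X = m], posterior distribution on (X, Y) given the message).

    prior[x][y] = Pr[X = x, Y = y]. Conditioning on X = m renormalizes
    row m and zeroes out the other row.
    """
    result = []
    for m in (0, 1):
        p_m = sum(prior[m])                        # probability the message is m
        posterior = [[0.0, 0.0], [0.0, 0.0]]
        posterior[m] = [p / p_m for p in prior[m]]  # renormalized row m
        result.append((p_m, posterior))
    return result

# A prior with Pr[X=0,Y=0]=0.4, Pr[X=0,Y=1]=0.2, Pr[X=1,Y=0]=0.3, Pr[X=1,Y=1]=0.1:
prior = [[0.4, 0.2], [0.3, 0.1]]
(p0, post0), (p1, post1) = posteriors_after_alice_bit(prior)
# "0" is sent w.p. 0.6, posterior rows (2/3, 1/3), (0, 0);
# "1" is sent w.p. 0.4, posterior rows (0, 0), (0.75, 0.25).
print(round(p0, 3), round(p1, 3))  # 0.6 0.4
```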
The analytical view: example
• Prior μ (entries Pr[X = x, Y = y]):
          Y=0   Y=1
  X=0     0.4   0.2
  X=1     0.3   0.1
• Alice sends her bit. The message "0" is sent w.p. 0.6 and induces the posterior
          Y=0   Y=1
  X=0     2/3   1/3
  X=1     0     0
  and the message "1" is sent w.p. 0.4 and induces the posterior
          Y=0   Y=1
  X=0     0     0
  X=1     0.75  0.25

The analytical view: example cont'd
• Same prior, but Alice sends her bit w.p. ½ and a uniformly random bit w.p. ½. The message "0" is sent w.p. 0.55 and induces the posterior
          Y=0    Y=1
  X=0     0.545  0.273
  X=1     0.136  0.045
  and the message "1" is sent w.p. 0.45 and induces the posterior
          Y=0   Y=1
  X=0     2/9   1/9
  X=1     1/2   1/6

Analytical view – cont'd
• Denote Ψ(μ) := IC(AND, μ, 0).
• Each potential (one-bit) message M by either party imposes a constraint of the form
  Ψ(μ) ≤ I(M) + Pr[M = 0] · Ψ(μ₀) + Pr[M = 1] · Ψ(μ₁),
  where I(M) is the information revealed by the message itself and μ₀, μ₁ are the posteriors it induces.
• In fact, Ψ is the point-wise largest function satisfying all such constraints (cf. the construction of harmonic functions).

IC of AND
• We show that for the protocol π described above, Ψ̃(μ) := IC(π, μ) satisfies all the constraints, and therefore represents the information complexity of AND at all priors.
• Theorem: π is the information-theoretically optimal protocol* for computing the AND of two bits.

*Not a real protocol
• The "protocol" π is not a real protocol (this is why IC has an inf in its definition).
• The protocol above can be made into a real protocol by discretizing the counter (e.g. into r equal intervals).
• We show that the r-round IC satisfies IC_r(AND, 0) = IC(AND, 0) + Θ(1/r²).

Previous numerical evidence
• [Ma, Ishwar '09] – numerical calculation results.

Applications: communication complexity of intersection
• Corollary: R(Intersectₙ, 0⁺) ≈ 1.4922 · n ± o(n).
• Moreover, for r-round protocols: R_r(Intersectₙ, 0⁺) ≈ 1.4922 · n + Θ(n/r²).

Applications 2: set disjointness
• Recall: Disjₙ(X, Y) = 1_{X ∩ Y = ∅}.
• Extremely well studied. [Kalyanasundaram and Schnitger '87, Razborov '92, Bar-Yossef et al. '02]: R(Disjₙ, ε) = Θ(n).
• What does a hard distribution for Disjₙ look like?

A hard distribution?
• [Figure: two uniformly random 0/1 vectors.] The uniform coordinate-wise prior is very easy:
          Y=0   Y=1
  X=0     1/4   1/4
  X=1     1/4   1/4

A hard distribution
• [Figure: two 0/1 vectors with at most one (1,1) location.] Coordinate-wise prior:
          Y=0   Y=1
  X=0     1/3   1/3
  X=1     1/3   0⁺

Communication complexity of Disjointness
• Continuing the line of reasoning of Bar-Yossef et al.
• We now know exactly the communication complexity of Disj under any of the "hard" prior distributions. By maximizing, we get:
• R(Disjₙ, 0⁺) = C_DISJ · n ± o(n), where
  C_DISJ := max over priors μ with μ(1,1) = 0 of IC(AND, μ, 0) ≈ 0.4827…
• With a bit of work this bound is tight.

Small-set Disjointness
• A variant of set disjointness where we are given X, Y ⊂ {1, …, n} of size k ≪ n.
• A lower bound of Ω(k) is obvious (modulo mild conditions relating n and k).
• A very elegant matching upper bound was known [Håstad–Wigderson '07]: Θ(k).

Using information complexity
• This setting corresponds to the prior distribution
          Y=0       Y=1
  X=0     1 − 2k/n  k/n
  X=1     k/n       0⁺
• This prior gives information complexity (2/ln 2) · (k/n) per coordinate, and hence communication complexity (2/ln 2) · k ± o(k).

Overview: information complexity
• Information complexity is to communication complexity as Shannon's entropy is to transmission cost.
• Today: focused on exact bounds using IC.

Selected open problems 1
• The interactive compression problem.
• For Shannon's entropy we have C(Xⁿ) → n · H(X).
• E.g. by Huffman coding we also know that H(X) ≤ C(X) < H(X) + 1.
• In the interactive setting, D(Fⁿ, μⁿ, 0⁺)/n → IC(F, μ, 0).
• But is it true that D(F, μ, 0⁺) ≲ IC(F, μ, 0)??

Interactive compression?
• D(F, μ, 0⁺) ≲ IC(F, μ, 0) is equivalent to D(Fⁿ, μⁿ, 0⁺) ≳ n · D(F, μ, 0⁺), the "direct sum" problem for communication complexity.
• Currently the best general compression scheme [BBCR'10]: a protocol of information cost I and communication cost C can be compressed to Õ(√(I · C)) bits of communication.
• A counterexample would need to separate IC from CC, which would require new lower-bound techniques [Kerenidis, Laplante, Lerays, Roland, Xiao '12].

Selected open problems 2
• Given a truth table for F, a prior μ, and an ε ≥ 0, can we compute IC(F, μ, ε)?
• There is an uncountable number of constraints; we need to understand the structure better.
• Specific F's with inputs in {1,2,3} × {1,2,3}.
• Going beyond two players.

External information cost
• Prior distribution: (X, Y) ∼ μ.
• An external observer, Charlie, sees the transcript Π of the protocol run by Alice (holding X) and Bob (holding Y).
• IC^ext(π, μ) = I(Π; XY) ≥ I(Π; X | Y) + I(Π; Y | X)
  = what Charlie learns about (X, Y).

External information complexity
• IC^ext(F, μ, 0) := inf over protocols π computing F of IC^ext(π, μ).
• Conjecture: zero-error communication scales like external information:
  lim_{n→∞} D(Fⁿ, μⁿ, 0)/n = IC^ext(F, μ, 0)?
• Example: for the two-bit AND this value is log₂ 3 ≈ 1.585 > 1.4922.

Thank You!
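The external-vs-internal gap from the final slides can already be seen in the one-message protocol where Alice simply sends her bit: an outside observer learns I(M; XY) = H(X), while Bob, who holds a correlated Y, learns only I(M; X | Y) = H(X | Y). A sketch (not from the talk); the correlated prior below is an illustrative choice of mine.

```python
import math

def H(probs):
    """Entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Correlated prior on a pair of bits: Pr[X=Y=0] = Pr[X=Y=1] = 0.4,
# Pr[X=0,Y=1] = Pr[X=1,Y=0] = 0.1.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Protocol: Alice sends X. Externally, Charlie learns I(M; XY) = H(X).
px = [sum(p for (x, _), p in joint.items() if x == b) for b in (0, 1)]
external = H(px)

# Internally, Bob learns I(M; X | Y) = H(X | Y) = H(XY) - H(Y).
py = [sum(p for (_, y), p in joint.items() if y == b) for b in (0, 1)]
internal = H(list(joint.values())) - H(py)

print(round(external, 4), round(internal, 4))  # 1.0 0.7219
assert internal <= external  # external information is never smaller
```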