Report

Cool with a Gaussian: An $O^*(n^3)$ volume algorithm
Ben Cousins and Santosh Vempala

The Volume Problem
Given a measurable, compact set K in n-dimensional space and an error parameter $\epsilon$, find a number A such that
  $(1-\epsilon)\,\mathrm{vol}(K) \le A \le (1+\epsilon)\,\mathrm{vol}(K)$.
K is given by
• a point $x_0 \in K$ such that $x_0 + B_n \subseteq K$, and
• a membership oracle: answers YES/NO to "is $x \in K$?"

Volume: first attempt
Divide and conquer.
Difficulty: the number of parts grows exponentially in n.

More generally: Integration
Input: an integrable function $f : \mathbb{R}^n \to \mathbb{R}_+$ specified by an oracle, a point x, and an error parameter $\epsilon$.
Output: a number A such that
  $(1-\epsilon)\int f \le A \le (1+\epsilon)\int f$.
Volume is the special case when f is a 0-1 function.

High-dimensional problems
Integration (volume), optimization, learning, rounding, sampling. All hopelessly intractable in general, even to approximate.

High-dimensional problems
Input:
• a set of points S in n-dimensional space, or a distribution in $\mathbb{R}^n$;
• a function f that maps points to real values (could be the indicator of a set).
What is the complexity of computational problems as the dimension grows? Dimension = number of variables. Typically, the size of the input is a function of the dimension.

Structure
Q. What structure makes high-dimensional problems computationally tractable (i.e., solvable with polynomial complexity)?
Convexity and its extensions appear to be the frontier of polynomial-time solvability.

Volume: second attempt: Sandwiching
Thm (John). Any convex body K has an ellipsoid E such that $E \subseteq K \subseteq nE$, where E is the maximum-volume ellipsoid contained in K.
Thm (KLS95). For a convex body K in isotropic position,
  $\sqrt{\tfrac{n+1}{n}}\, B_n \subseteq K \subseteq \sqrt{n(n+1)}\, B_n$.
Also a factor-n sandwiching, but with a different ellipsoid.

Isotropic position and sandwiching
For any convex body K (in fact, any set/distribution with bounded second moments), we can apply an affine transformation so that a random point x from K satisfies
  $E(x) = 0, \qquad E(x x^{\mathsf T}) = I_n$.
Thus K "looks like a ball" up to second moments. How close is it really to a ball? K lies between two balls whose radii are within a factor of n of each other.
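The isotropic transformation above is easy to carry out on a sample: subtract the sample mean and whiten by the sample covariance. A minimal numpy sketch (the function name `to_isotropic` and the synthetic data are my own illustration, not from the talk):

```python
import numpy as np

def to_isotropic(X):
    """Affinely transform points X (m x n) so the sample mean is 0 and the
    sample covariance is the identity (isotropic position)."""
    z = X.mean(axis=0)                       # sample mean
    Y = X - z
    C = Y.T @ Y / len(X)                     # sample covariance
    w, V = np.linalg.eigh(C)                 # C is symmetric positive definite
    B = V @ np.diag(w ** -0.5) @ V.T         # B = C^{-1/2}
    return Y @ B, z, B                       # transformed points, shift, linear map

rng = np.random.default_rng(0)
# a skewed, anisotropic cloud of 20000 points in R^3
X = rng.normal(size=(20000, 3)) @ np.array([[3.0, 0, 0], [1, 1, 0], [0, 0, 0.2]])
Y, z, B = to_isotropic(X)
cov = Y.T @ Y / len(Y)
print(np.allclose(cov, np.eye(3)))           # prints True: covariance of Y is ≈ I
```

The same recipe, with sample points produced by a random walk inside K, is exactly the "rounding via sampling" step that appears later in the talk.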
Volume via Sandwiching
The John ellipsoid can be approximated using the ellipsoid algorithm, giving $E \subseteq K \subseteq n^{1.5} E$. The inertial ellipsoid can be approximated to within any constant factor (we'll see how). Using either one,
  $\mathrm{vol}(E) \le \mathrm{vol}(K) \le n^{O(n)}\,\mathrm{vol}(E)$,
a polynomial-time algorithm with an $n^{O(n)}$ approximation. Can we do better?

Complexity of Volume Estimation
Thm [E86, BF87]. For any deterministic algorithm that uses at most $n^a$ membership calls to the oracle for a convex body K and computes two numbers A and B such that $A \le \mathrm{vol}(K) \le B$, there is some convex body for which the ratio B/A is at least
  $\left( \frac{cn}{a \log n} \right)^{n/2}$,
where c is an absolute constant.
Thm [DF88]. Computing the volume of an explicit polytope $\{x : Ax \le b\}$ is #P-hard, even for a totally unimodular matrix A and rational b.

Complexity of Volume Estimation
Thm [BF87]. For deterministic algorithms, $n^a$ oracle calls cannot beat an approximation factor of $(cn/(a\log n))^{n/2}$.
Thm [Dadush-V.13]. Matching upper bound of $(1+\epsilon)^n$ in time $(1/\epsilon)^{O(n)}\,\mathrm{poly}(n)$.

Randomized Volume/Integration
[DFK89]. Polynomial-time randomized algorithm that estimates the volume to within relative error $1+\epsilon$ with probability at least $1-\delta$ in time $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$.
[Applegate-K91]. Polynomial-time randomized algorithm to estimate the integral of any (Lipschitz) logconcave function.

Volume Computation: an ongoing adventure
  Algorithm              | Power  | New aspects
  Dyer-Frieze-Kannan 89  | n^23   | everything
  Lovász-Simonovits 90   | n^16   | localization
  Applegate-K 90         | n^10   | logconcave integration
  L 90                   | n^10   | ball walk
  DF 91                  | n^8    | error analysis
  LS 93                  | n^7    | multiple improvements
  KLS 97                 | n^5    | speedy walk, isotropy
  LV 03,04               | n^4    | annealing, wt. isoper.
  LV 06                  | n^4    | integration, local analysis
  Cousins-V. 13,14       | n^3    | Gaussian cooling

Does it work?
[Lovász-Deák 2012] implemented the [LV] $O^*(n^4)$ algorithm; it worked for cubes up to dimension 9 but was too slow after that.
[CV13] Matlab implementation of a new algorithm, tested on rotated cubes.

Volume: third attempt: Sampling
Pick random samples from a ball/cube containing K. Compute the fraction c of the sample that lands in K. Output $c \cdot \mathrm{vol}(\text{outer ball})$. Needs too many samples!

Volume via Sampling [DFK89]
Assume $B_n \subseteq K \subseteq R\,B_n$. Let $K_i = K \cap 2^{i/n} B_n$, $i = 0, 1, \ldots, m = n \log_2 R$.
Then
  $\mathrm{vol}(K) = \mathrm{vol}(B_n)\cdot \frac{\mathrm{vol}(K_1)}{\mathrm{vol}(K_0)} \cdot \frac{\mathrm{vol}(K_2)}{\mathrm{vol}(K_1)} \cdots \frac{\mathrm{vol}(K_m)}{\mathrm{vol}(K_{m-1})}$.
Estimate each ratio with random samples.

Volume via Sampling
$K_i = K \cap 2^{i/n} B_n$, $i = 0, 1, \ldots, m = n \log_2 R$, so that
  $\mathrm{vol}(K) = \mathrm{vol}(B_n)\cdot \prod_{i=1}^{m} \frac{\mathrm{vol}(K_i)}{\mathrm{vol}(K_{i-1})}$.
Claim. $\mathrm{vol}(K_{i+1}) \le 2\,\mathrm{vol}(K_i)$.
Total #samples $= m \cdot \frac{m}{\epsilon^2} = O^*(n^2)$. But how to sample?

Sampling
Generate a uniform random point from a compact set S, or a point with density proportional to a function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research, etc.

Sampling
Input: a function $f: \mathbb{R}^n \to \mathbb{R}_+$ specified by an oracle, a point x, an error parameter $\epsilon$.
Output: a point y from a distribution within distance $\epsilon$ of the distribution with density proportional to f.

Logconcave functions
$f : \mathbb{R}^n \to \mathbb{R}_+$ is logconcave if for any $x, y \in \mathbb{R}^n$ and $\lambda \in [0,1]$,
  $f(\lambda x + (1-\lambda) y) \ge f(x)^{\lambda} f(y)^{1-\lambda}$.
Examples: indicator functions of convex sets, the Gaussian density, the exponential function.
Level sets $L_t = \{x : f(x) \ge t\}$ are convex; products, minima, and convolutions preserve logconcavity.

Algorithmic Applications
Given a blackbox for sampling logconcave densities, we get efficient algorithms for: rounding, convex optimization, volume computation/integration, and some learning problems.

Rounding via Sampling
1. Sample m random points $y_1, \ldots, y_m$ from K.
2. Compute the sample mean $z = \frac{1}{m}\sum_i y_i$ and sample covariance $\hat{C} = \frac{1}{m}\sum_i (y_i - z)(y_i - z)^{\mathsf T}$.
3. Output $B = \hat{C}^{-1/2}$.
Then $B(K - z)$ is nearly isotropic.
Thm. $C(\epsilon)\cdot n$ random points suffice to get $\|\hat{C} - I\|_2 \le \epsilon$ [Adamczak et al., improving on Bourgain, Rudelson], i.e., for any unit vector v,
  $1 - \epsilon \le E\big((v^{\mathsf T} x)^2\big) \le 1 + \epsilon$.

How to Sample?
Ball walk: at x,
• pick a random y from $x + \delta B_n$;
• if y is in K, go to y.
Hit-and-Run: at x,
• pick a random chord L through x;
• go to a random point y on L.

Complexity of Sampling
Thm [KLS97]. For a convex body, the ball walk with an M-warm start reaches an (independent) nearly random point in poly(n, R, M) steps, where the warmth of the start $Q_0$ relative to the stationary distribution Q is
  $M = \sup_S \frac{Q_0(S)}{Q(S)}$.
Thm [LV03]. The same holds for arbitrary logconcave density functions. From a warm start, the complexity is $O^*(n^2 R^2)$.
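Both walks only ever consult the membership oracle. This minimal sketch runs them inside a cube (the cube oracle, the binary-search budget for finding chord endpoints, and all function names are my own choices, not part of the algorithms' analysis):

```python
import numpy as np

rng = np.random.default_rng(1)

def in_cube(x):
    """Membership oracle for K = [-1, 1]^n."""
    return bool(np.all(np.abs(x) <= 1.0))

def ball_walk_step(x, delta, oracle):
    """One ball-walk step: propose y uniform in x + delta*B_n; move only if y is in K."""
    d = rng.normal(size=x.shape)
    d *= rng.random() ** (1 / len(x)) / np.linalg.norm(d)  # uniform point in unit ball
    y = x + delta * d
    return y if oracle(y) else x

def chord_end(x, u, oracle, hi=64.0, iters=50):
    """Distance to the boundary along direction u from x, by binary search on the
    membership oracle (hi must exceed the body's diameter)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if oracle(x + mid * u):
            lo = mid
        else:
            hi = mid
    return lo

def hit_and_run_step(x, oracle):
    """One hit-and-run step: uniform chord direction, then a uniform point on the chord."""
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)
    t = rng.uniform(-chord_end(x, -u, oracle), chord_end(x, u, oracle))
    return x + t * u

x = np.zeros(5)
x2 = np.zeros(5)
for _ in range(2000):
    x = hit_and_run_step(x, in_cube)
    x2 = ball_walk_step(x2, 0.5, in_cube)
print(in_cube(x) and in_cube(x2))  # prints True: both walks stay inside K
```

Note the contrast the slides draw: a ball-walk proposal near a corner is almost always rejected (small local conductance), while a hit-and-run step always moves.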
The isotropic transformation makes $R = O(\sqrt{n})$.
KLS volume algorithm: (#phases) × (#samples per phase) × (#steps per sample) $= n \times n \times n^3 = n^5$.

Markov chains
• A state space K, with a family of measurable subsets forming a $\sigma$-algebra (closed under complements and countable unions).
• A next-step distribution $P_u$ associated with each point u of the state space.
• A starting point.
The walk $w_0, w_1, \ldots, w_i, \ldots$ satisfies $\Pr(w_i \in A \mid w_0, w_1, \ldots, w_{i-1}) = P_{w_{i-1}}(A)$.

Convergence
For a stationary distribution Q, the ergodic "flow" of a subset A is
  $\Phi(A) = \int_A P_u(K \setminus A)\, dQ(u)$.
For any subset A, $\Phi(A) = \Phi(K \setminus A)$.
Conductance:
  $\phi(A) = \frac{\Phi(A)}{\min\{Q(A), Q(K \setminus A)\}}, \qquad \phi = \inf_{0 < Q(A) \le \frac12} \phi(A)$.
The rate of convergence is bounded by $1/\phi^2$ [LS93, JS86].

Conductance
Take an arbitrary measurable subset S: how large is the conditional escape probability from S? The local conductance
  $\ell(u) = \frac{\mathrm{vol}\big((u + \delta B_n) \cap K\big)}{\mathrm{vol}(\delta B_n)}$
can be arbitrarily small for the ball walk.

Conductance
Need:
• Nearby points have overlapping one-step distributions.
• Large subsets have large boundaries [isoperimetry]: for a partition $K = S_1 \cup S_2 \cup S_3$,
  $\mathrm{vol}(S_3) \ge \frac{d(S_1, S_2)}{D}\, \min\{\mathrm{vol}(S_1), \mathrm{vol}(S_2)\}$,
where D is the diameter of K.

Isoperimetry and the KLS conjecture
Let $A = E\big((x - \bar{x})(x - \bar{x})^{\mathsf T}\big)$ be the covariance matrix of the measure $\pi$.
Thm [KLS95]. $\pi(S_3) \ge \frac{c}{\sqrt{\mathrm{tr}(A)}}\, d(S_1, S_2)\, \min\{\pi(S_1), \pi(S_2)\}$.
Conj [KLS95]. $\pi(S_3) \ge \frac{c}{\sqrt{\|A\|_2}}\, d(S_1, S_2)\, \min\{\pi(S_1), \pi(S_2)\}$.
For an isotropic distribution, $\mathrm{tr}(A) = n$ while $\|A\|_2 = 1$.

KLS hyperplane conjecture
• Could improve sampling complexity by a factor of n.
• Implies well-known conjectures in convex geometry: the slicing conjecture and the thin-shell conjecture.
• But wide open!

KLS, Slicing, Thin-shell
  Conjecture | Current bound
  thin shell | $n^{1/3}$ [Guedon-Milman]
  slicing    | $n^{1/4}$ [Bourgain, Klartag]
  KLS        | $\sim n^{1/3}$ [Bobkov; Eldan-Klartag]
All are conjectured to be O(1), and the conjectures are equivalent! [Ball, Eldan-Klartag]

Is rapid mixing possible?
The ball walk can have bad starts, but hit-and-run escapes from corners. Minimum-distance isoperimetry is too coarse.

Average distance isoperimetry
How to average distance? Weight the separation of $S_1$ and $S_2$ by the chords through them: let h be the average cross-ratio distance over pairs $u \in S_1$, $v \in S_2$ along the chord $\ell(u, v)$.
Thm [LV04]. $\pi(S_3) \ge c\, h\, \pi(S_1)\, \pi(S_2)$.

Hit-and-run mixes rapidly
Thm [LV04]. Hit-and-run mixes in polynomial time from any starting point inside a convex body; its conductance is
  $\Omega\!\left(\frac{1}{nR}\right)$.
Along with the isotropic transformation ($R = O(\sqrt n)$), this gives an $O^*(n^3)$ sampling algorithm.
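With a sampler in hand, the [DFK89] telescoping product from earlier can be run end to end. In low dimension we can cheat and replace the random walk by rejection sampling from the outer ball; this toy sketch (the sample budget and the choice $K = [-1,1]^3$ are mine) recovers the volume of the cube:

```python
import numpy as np
from math import pi, gamma, log2, ceil

rng = np.random.default_rng(2)
n = 3                                        # K = [-1,1]^3, so B_3 ⊆ K ⊆ sqrt(3)·B_3

def sample_ball(r, size):
    """Uniform samples from the ball r·B_n (viable only in low dimension)."""
    X = rng.normal(size=(size, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X * r * rng.random(size=(size, 1)) ** (1 / n)

m = ceil(n * log2(n ** 0.5))                 # phases, so that 2^{m/n} >= R = sqrt(n)
vol = pi ** (n / 2) / gamma(n / 2 + 1)       # vol(K_0) = vol(B_n)
for i in range(1, m + 1):
    # sample from K_i = K ∩ 2^{i/n}·B_n by rejection from the ball (low-dim shortcut)
    X = sample_ball(2 ** (i / n), 200000)
    X = X[np.all(np.abs(X) <= 1.0, axis=1)]  # vectorized membership test for K
    frac = np.mean(np.linalg.norm(X, axis=1) <= 2 ** ((i - 1) / n))
    vol /= frac                              # frac estimates vol(K_{i-1}) / vol(K_i)
print(round(vol, 1))                         # ≈ 8.0, the true volume of the cube
```

The claim $\mathrm{vol}(K_i) \le 2\,\mathrm{vol}(K_{i-1})$ is what keeps each `frac` bounded away from 0, so a fixed sample budget per phase suffices.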
Simulated Annealing [LV03, Kalai-V.04]
To estimate $\int f$, consider a sequence $f_0, f_1, f_2, \ldots, f_m = f$ with $\int f_0$ easy to compute, e.g., a constant function over a ball. Then
  $\int f = \int f_0 \cdot \frac{\int f_1}{\int f_0} \cdot \frac{\int f_2}{\int f_1} \cdots \frac{\int f_m}{\int f_{m-1}}$.
Each ratio can be estimated by sampling:
1. Sample X with density proportional to $f_i$.
2. Compute $Y = \frac{f_{i+1}(X)}{f_i(X)}$.
Then
  $E(Y) = \int \frac{f_{i+1}(x)}{f_i(x)} \cdot \frac{f_i(x)}{\int f_i}\, dx = \frac{\int f_{i+1}}{\int f_i}$.

Annealing [LV06]
Define $f_i(x) = e^{-a_i x_1}$ (restricted to K), with
  $a_0 = 2n, \qquad a_{i+1} = \frac{a_i}{1 + \frac{1}{\sqrt n}}, \qquad m \sim \sqrt{n}\,\log(2n/\epsilon) \text{ phases},$
and
  $\int f_m = \int f_0 \cdot \frac{\int f_1}{\int f_0} \cdot \frac{\int f_2}{\int f_1} \cdots \frac{\int f_m}{\int f_{m-1}}$.
The final estimate could be $2^{\Omega(n)}$, so an individual ratio can be as large as $e^{\sqrt n}$ or higher. How can we estimate it with a few samples?!

Annealing [LV03, 06]
$f_i(x) = e^{-a_i x_1}$, $a_0 = 2n$, $a_{i+1} = a_i/(1 + \frac{1}{\sqrt n})$.
Lemma. For $Y = \frac{f_{i+1}(X)}{f_i(X)}$ with X drawn with density proportional to $f_i$,
  $\frac{E(Y^2)}{E(Y)^2} < 4$.
Although the expectation of Y can be large (exponential even), we need only a few samples to estimate it!
LoVe algorithm: (#phases) × (#samples per phase) × (#steps per sample) $= \sqrt n \times \sqrt n \times n^3 = n^4$.

Variance of ratio estimator
Let $Z(a) = \int_K e^{-a x_1}\, dx$, so $E(Y) = \frac{Z(a_{i+1})}{Z(a_i)}$ and
  $\frac{E(Y^2)}{E(Y)^2} = \frac{Z(2a_{i+1} - a_i)\, Z(a_i)}{Z(a_{i+1})^2}$.
(This would be at most 1 if Z were logconcave — it is not, but something close is true.)

Variance of ratio estimator
Lemma. For any logconcave f and $a > 0$, the function
  $g(a) = a^n \int f(a x)\, dx$
is logconcave in a. So $a^n Z(a)$ is logconcave, and since $a_{i+1}$ is the midpoint of $a_i$ and $2a_{i+1} - a_i$,
  $(2a_{i+1} - a_i)^n\, Z(2a_{i+1} - a_i) \cdot a_i^n\, Z(a_i) \le a_{i+1}^{2n}\, Z(a_{i+1})^2$.
Therefore, with $a_{i+1} = a_i/(1+\sigma)$ and $\sigma = 1/\sqrt n$:
  $\frac{E(Y^2)}{E(Y)^2} \le \left( \frac{a_{i+1}^2}{(2a_{i+1}-a_i)\, a_i} \right)^{\!n} = \left( \frac{1}{(1-\sigma)(1+\sigma)} \right)^{\!n} = \left( \frac{1}{1 - \frac{1}{n}} \right)^{\!n} \le 4$.

Volume Computation: an ongoing adventure
(Recap of the table above: $n^{23}$ [DFK89] down to $n^3$ via Gaussian cooling [Cousins-V. 13, 14].)

Gaussian sampling/volume
• Sample from a Gaussian restricted to K; compute the Gaussian measure of K.
• Anneal with a Gaussian: define $f_i(x) = e^{-\|x\|^2/(2\sigma_i^2)}$, start with $\sigma_0 \sim \frac{1}{\sqrt n}$ small, and increase $\sigma_i$.
• Compute the ratios of integrals of consecutive phases, $\int f_{i+1} / \int f_i$, in phases.

Gaussian sampling
The KLS conjecture holds for a Gaussian restricted to any convex body (via the Brascamp-Lieb inequality):
Thm. $\pi(S_3) \ge \frac{c}{\sigma}\, d(S_1, S_2)\, \min\{\pi(S_1), \pi(S_2)\}$.
This is not enough on its own, but it can be used to show:
Thm [Cousins-V. 13].
For $\sigma^2 = O(1)$, the ball walk applied to the Gaussian $N(0, \sigma^2 I_n)$ restricted to any convex body containing the unit ball mixes in $O^*(n^2)$ steps from a warm start.

Speedy walk: a thought experiment
Take the sequence of points visited by the ball walk: $w_0, w_1, w_2, w_3, \ldots$. The subsequence of "proper" attempts that stay inside K is itself a Markov chain, and it is rapidly mixing from any point. For a warm start, the total number of steps is only a constant factor higher.

Gaussian volume
Theorem [Cousins-V.13]. The Gaussian volume of a convex body K containing the unit ball can be estimated in time $O^*(n^3)$.
• No need to adjust for isotropy!
• Each step samples a one-dimensional Gaussian restricted to an interval.
Can we use this to compute the volume?

Gaussian Cooling
$f_i(x) = e^{-\|x\|^2/(2\sigma_i^2)}$, $\sigma_0^2 \sim \frac{1}{n}$. Estimate $\frac{\int f_{i+1}}{\int f_i}$ using samples drawn according to $f_i$.
• For $\sigma^2 \le 1$, set $\sigma_i^2 = \sigma_{i-1}^2 \left(1 + \frac{1}{\sqrt n}\right)$.
• For $\sigma^2 > 1$, set $\sigma_i^2 = \sigma_{i-1}^2 \left(1 + \frac{\sigma_{i-1}}{\sqrt n}\right)$.

Gaussian Cooling
For $\sigma^2 \le 1$: sampling time $n^2$; #phases $\sqrt n$; #samples per phase $\sqrt n$.
So the total time is $n^2 \times \sqrt n \times \sqrt n = n^3$.

Gaussian Cooling
For $\sigma^2 > 1$: sampling time is $n^2 \sigma^2$ (too much??); #phases to double $\sigma^2$ is $\frac{\sqrt n}{\sigma}$; #samples per phase is also $\frac{\sqrt n}{\sigma}$.
So the total time to double $\sigma^2$ is $\frac{\sqrt n}{\sigma} \times \frac{\sqrt n}{\sigma} \times n^2\sigma^2 = n^3$!!!

Variance of ratio estimator
Why can we set $\sigma_i^2$ as high as $\sigma_{i-1}^2\left(1 + \frac{\sigma_{i-1}}{\sqrt n}\right)$?
Let $f_\sigma(x) = e^{-\|x\|^2/(2\sigma^2)}$ for $x \in K$ and $Z(\sigma) = \int_K f_\sigma$. For $Y = \frac{f_{\sigma'}(X)}{f_\sigma(X)}$ with X drawn with density proportional to $f_\sigma$ and $\sigma'^2 = \sigma^2(1+\alpha)$,
  $\frac{E(Y^2)}{E(Y)^2} = \frac{Z(\tau)\, Z(\sigma)}{Z(\sigma')^2}, \qquad \frac{1}{\tau^2} = \frac{2}{\sigma'^2} - \frac{1}{\sigma^2}, \quad \text{i.e.,} \quad \tau^2 = \sigma^2\,\frac{1+\alpha}{1-\alpha}$.
Lemma [CV14]. For $\alpha \le \min\{1, \sigma\}/\sqrt n$, this ratio is bounded by a constant.

Variance of ratio estimator
Proof idea: first use localization to reduce the statement to a one-dimensional inequality for a restricted family of logconcave functions of the form
  $h(\sigma) = \int e^{\gamma t}\,(\nu + t^2)^{\frac{n-1}{2}}\, e^{-\frac{t^2}{2\sigma^2}}\, dt$,
and then verify the inequality directly for these functions.

Warm start
With $\sigma_i^2 = \sigma_{i-1}^2\left(1 + \frac{\sigma_{i-1}}{\sqrt n}\right)$, a random point from one distribution gives a warm start for the next: the warmth
  $\sup_x \frac{f_i(x)/\int f_i}{f_{i+1}(x)/\int f_{i+1}}$
is bounded by a constant — by the same lemma!

Gaussian Cooling [CV14]
Accelerated annealing at rate $1 + \frac{\sigma}{\sqrt n}$.
Thm.
The volume of any well-rounded convex body K can be estimated using $O^*(n^3)$ membership queries. The cooling rate $1 + \frac{\sigma}{\sqrt n}$ is best possible.
CV algorithm: (#phases) × (#samples per phase) × (#steps per sample) × (#doublings) $= \frac{\sqrt n}{\sigma} \times \frac{\sqrt n}{\sigma} \times n^2\sigma^2 \times \log n = O^*(n^3)$.

Practical volume/integration
• Start with a concentrated Gaussian.
• Run the algorithm until the Gaussian is nearly flat.
• In each phase, flatten the Gaussian as much as possible while keeping the variance of the ratio of integrals bounded.
• The variance can be estimated with a small constant number of samples.
• If the covariance is skewed (as seen by an SVD of O(n) points), scale down the high-variance subspace.
This "adaptive" annealing was also used in [Stefankovic-Vigoda-V.] for discrete problems.

Open questions
How true is the KLS conjecture?

Open questions
• When to stop a random walk? (How do we decide whether the current point is "random"?)
• Faster isotropy/rounding? How can we get information before reaching stationarity? To make a body isotropic: run for N steps; transform using the covariance; repeat.

Open questions
How efficiently can we learn a polytope given only random points? With O(mn) points one cannot "see" the structure, but there is enough information to estimate the polytope! Algorithms?
For general convex bodies: [KOS][GR] show that $2^{\Omega(\sqrt n)}$ points are needed; [Eldan] shows that superpolynomially many points are needed even to estimate the volume!

Open questions
Can we estimate the volume of an explicit polytope $\{x : Ax \le b\}$ in deterministic polynomial time?
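The practical annealing recipe above can be sketched end to end for a cube, where the restricted Gaussian can be sampled exactly by rejection instead of the ball walk. This is a toy sketch: the starting variance, the stopping value $\sigma^2 = 8$, the fixed cooling rate $1 + 1/\sqrt n$ (the accelerated schedule would use $1 + \sigma/\sqrt n$ above $\sigma^2 = 1$), and the final correction ratio are all illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3                                             # demo: K = [-1,1]^3, vol(K) = 8

def sample_restricted_gaussian(sigma2, size):
    """Exact N(0, sigma2*I) restricted to the cube, by rejection (demo only;
    the real algorithm uses the ball walk inside a general convex body)."""
    batches = []
    while sum(len(b) for b in batches) < size:
        X = rng.normal(scale=sigma2 ** 0.5, size=(20000, n))
        batches.append(X[np.all(np.abs(X) <= 1.0, axis=1)])
    return np.vstack(batches)[:size]

sigma2 = 1.0 / 16                                 # start: Gaussian almost entirely inside K
est = (2 * np.pi * sigma2) ** (n / 2)             # so Z(sigma_0) = ∫_K f_0 ≈ full Gaussian mass
while sigma2 < 8.0:
    new_sigma2 = sigma2 * (1 + 1 / n ** 0.5)      # cooling rate 1 + 1/sqrt(n)
    X = sample_restricted_gaussian(sigma2, 3000)
    r2 = np.sum(X * X, axis=1)
    # E[ f_{i+1}(X)/f_i(X) ] = Z(sigma_{i+1}) / Z(sigma_i)
    est *= np.mean(np.exp(r2 / 2 * (1 / sigma2 - 1 / new_sigma2)))
    sigma2 = new_sigma2
# final ratio: vol(K) / Z(sigma_m) = E[ exp(||X||^2 / (2 sigma_m^2)) ]
X = sample_restricted_gaussian(sigma2, 3000)
est *= np.mean(np.exp(np.sum(X * X, axis=1) / (2 * sigma2)))
print(round(est, 1))                              # close to 8 = vol(K)
```

Each phase multiplies the running estimate by an empirical ratio whose relative variance stays bounded, which is exactly the property the variance lemmas above guarantee for the cooling schedule.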