### slides - Ilya Razenshteyn

```Sketching and
Embedding are
Equivalent for Norms
Alexandr Andoni (Simons Institute)
Robert Krauthgamer (Weizmann Institute)
Ilya Razenshteyn (CSAIL MIT)
1
Sketching
• Compress a massive object to a small sketch
• Rich theories: high-dimensional vectors, matrices, graphs
• Similarity search, compressed sensing, numerical linear algebra
d
• Dimension reduction (Johnson, Lindenstrauss 1984): random
projection on a low-dimensional subspace preserves distances
n
When is sketching possible?
2
Similarity search
• Motivation: similarity search
• Model similarity as a metric
• Sketching may speed-up computation
and allow indexing
• Interesting metrics:
•
•
•
•
Euclidean ℓ2: d(x, y) = (∑i|xi – yi|2)1/2
Manhattan, Hamming ℓ1: d(x, y) = ∑i|xi – yi|
ℓp distances d(x, y) = (∑i|xi – yi|p)1/p for p ≥ 1
Edit distance, Earth Mover’s Distance etc.
3
Sketching metrics
0 1 1 0 … 1
• Alice and Bob each hold a point from a
Alice x
metric space (say x and y)
• Both send s-bit sketches to Charlie
• For r > 0 and D > 1 distinguish
sketch(x)
• d(x, y) ≤ r
• d(x, y) ≥ Dr
• Shared randomness, allow 1%
probability of error
• Trade-off between s and D
y Bob
sketch(y)
Charlie
d(x, y) ≤ r or d(x, y) ≥ Dr?
4
Near Neighbor Search via sketches
• Near Neighbor Search (NNS):
• Given n-point dataset P
• A query q within r from some data point
• Return any data point within Dr from q
• Sketches of size s imply NNS with
space nO(s) and a 1-probe query
• Proof idea: amplify probability of
error to 1/n by increasing the size to
O(s log n); sketch of q determines the
5
Sketching real line
• Distinguish |x – y| ≤ 1 vs.
|x – y| ≥ 1 + ε
• Randomly shifted pieces of
size w = 1 + ε/2
• Repeat O(1 / ε2) times
• Overall:
0
x
1
0
y
• D=1+ε
• s = O(1 / ε2)
6
Sketching ℓp for 0 < p ≤ 2
• (Indyk 2000): can reduce sketching of ℓp with 0 < p ≤ 2 to sketching
reals via random projections
• If (G1, G2, …, Gd) are i.i.d. N(0, 1)’s, then ∑i xiGi – ∑i yiGi is distributed as
‖x - y‖2 • N(0, 1)
• For 0 < p < 2 use p-stable distributions instead
• Again, get D = 1 + ε with s = O(1 / ε2)
• For p > 2 sketching ℓp is hard: to achieve D = O(1) one needs sketch
size to be s = Θ~(d1-2/p) (Bar-Yossef, Jayram, Kumar, Sivakumar 2002),
(Indyk, Woodruff 2005)
7
Anything else?
• A map f: X → Y is an embedding with distortion D’, if for a, b from X:
dX(a, b) / D’ ≤ dY(f(a), f(b)) ≤ dX(a, b)
• Reductions for geometric problems
• If Y has s-bit sketches for approximation D, then for X one gets s bits
and approximation DD’
8
Metrics with good sketches
• A metric X admits sketches with s, D = O(1), if:
• X = ℓp for p ≤ 2
• X embeds into ℓp for p ≤ 2 with distortion O(1)
• Are there any other metrics with efficient sketches (D and s are O(1))?
• We don’t know!
• Some new techniques waiting to be discovered?
• No new techniques?!
9
The main result
If a normed space X admits sketches of size s and approximation D,
then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with
distortion O(sD / ε)
into
≤ 2 space, if
• A vector spaceEmbedding
X with ‖.‖: X →
R≥0ℓisp,apnormed
• (Kushilevitz,
‖x‖ = 0Ostrovsky,
iff x = 0
For norms
1998)
• Rabani
αx‖
=
|α|‖x‖
(Indyk 2000)
• ‖x + y‖ ≤ ‖x‖ + ‖y‖
Efficient
sketches
• Every norm gives rise
to a metric:
define d(x, y) = ‖x - y‖
10
Sanity check
• ℓp spaces: p > 2 is hard, 1 ≤ p ≤ 2 is easy, p < 1 is not a norm
• Can classify mixed norms ℓp(ℓq): in particular, ℓ1(ℓ2) is easy, while
ℓ2(ℓ1) is hard! (Jayram, Woodruff 2009), (Kalton 1985)
• A non-example: edit distance is not a norm, sketchability is largely
open (Ostrovsky, Rabani 2005), (Andoni, Jayram, Pătraşcu 2010)
ℓq
ℓp
11
No embeddings → no sketches
• In the contrapositive: if a normed space is non-embeddable into ℓ1 – ε,
then it does not have good sketches
• Can convert sophisticated non-embeddability results into lower
bounds for sketches
12
Example 1: the Earth Mover’s
Distance
• For x: R[Δ]×[Δ] → R with ∑i,j xi,j = 0, define the Earth Mover’s Distance
x‖EMD as the cost of the best transportation of the positive part of x
to the negative part (Monge-Kantorovich norm)
• Best upper bounds:
• D = O(1 / ε) and s = Δε (Andoni, Do Ba, Indyk, Woodruff 2009)
• D = O(log Δ) and s = O(1) (Charikar 2002), (Indyk, Thaper 2003), (Naor,
Schechtman 2005)
No embedding into ℓ1 – ε
with distortion O(1)
(Naor, Schechtman 2005)
No sketches with D = O(1)
and s = O(1)
13
Example 2: the Trace Norm
• For an n × n matrix A define the Trace Norm (the Nuclear Norm) ‖A‖
to be the sum of the singular values
• Previously: lower bounds only for certain restricted classes of
sketches (Li, Nguyen, Woodruff 2014)
Any embedding into ℓ1 requires
distortion Ω(n1/2) (Pisier 1978)
Any sketch must satisfy
sD = Ω(n1/2 / log n)
14
The plan of the proof
If a normed space X admits sketches of size s and approximation D,
then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with
distortion O(sD / ε)
Sketches
Weak embedding
into ℓ2
Information theory
Linear embedding
into ℓ1 – ε
Nonlinear functional analysis
A map f: X → Y is (s1, s2, τ1, τ2)-threshold, if
(1, O(sD), 1, 10)-threshold
• dX(x1, x2) ≤ s1 implies dY(f(x1), f(x2)) ≤ τ1
map from X to ℓ2
• dX(x1, x2) ≥ s2 implies dY(f(x1), f(x2)) ≥ τ2
15
Sketch → Threshold map
X has a sketch of size s
and approximation D
X has no sketches of size
s and approximation D
ℓk
∞(X)
has no sketches
of size Ω(k) and
approximation Θ(sD)
?
There is a (1, O(sD), 1, 10)threshold map from X to ℓ2
No (1, O(sD), 1, 10)threshold map from X to ℓ2
(Andoni, Jayram, Pătraşcu 2010)
(direct sum theorem for
information complexity)
Convex duality
Poincaré-type
inequalities on X
(x1, …, xk) = maxi ‖xi
16
Sketching direct sums
ℓk∞(X) has sketches of size O(s)
and approximation Dk
X has sketches of size s
and approximation D
(σ1, σ2, …, σk) — random ±1’s
Alice
Bob
(a1, a2, …, ak)
∑i σi ai
(b1, b2, …, bk)
∑i σi bi
sketch(∑i σi ai)
maxi ai - bi
≤ ‖∑i σi(ai – bi)‖
≤ ∑i ai - bi
≤ k maxi ai - bi
with probability 1/2
sketch(∑i σi bi) Crucially use the linear structure of X
(not enough to be just a metric!)
17
Threshold map → linear embedding
(1, O(sD), 1, 10)-threshold
map from X to ℓ2
?
Linear embedding into ℓ1 – ε
with distortion O(sD / ε)
Uniform embedding
into ℓ2
(Aharoni, Maurey, Mityagin 1985)
(Nikishin 1973)
g: X → ℓ2 s.t. L(‖x1 – x2‖) ≤ ‖g(x1) – g(x2)‖ ≤ U(‖x1 – x2‖)
where
• L and R are non-decreasing,
• L(t) > 0 for t > 0
• U(t) → 0 as t → 0
18
Threshold map → uniform
embedding
• A map f: X → ℓ2 such that
• ‖x1 - x2 ≤ 1 implies ‖f(x1) - f(x2)‖ ≤ 1
• ‖x1 - x2 ≥ Θ(sD) implies ‖f(x1) - f(x2)‖ ≥ 10
• Building on (Johnson, Randrianarivony 2006)
• 1-net N of X; f Lipschitz on N
• Extend f from N to a Lipschitz function on the whole X
19
Open problems
• Extend to as general class of metrics as possible
• Connection to linear sketches?
• Sketches of the form x → Ax
• Conjecture: sketches of size s and approximation D can be converted to linear
sketches with f(s) measurements and approximation g(D)
• Spaces that admit no non-trivial sketches (s = Ω(d) for D = O(1)): is there
anything besides ℓ∞?
• Can one strengthen our theorem to “sketchability implies embeddability
into ℓ1”?
• Equivalent to an old open problem from Functional Analysis.
• Sketches imply NNS, is there a reverse implication?
20
```