### short

```
Approximating k-Median via Pseudo-Approximation
Shi Li
Princeton
Ola Svensson
EPFL
04/20/2013
Wal-mart Stores in New Jersey
Question : Suppose you have a budget for 50 stores; how will you select 50 locations?
k-median
[figure: facilities and clients]
F : potential facility locations
C : set of clients
k : number of facilities to open
d : metric over F ∪ C
find S ⊆ F, |S| = k
minimize Σ_{j ∈ C} d(j, S)   (connection cost)
k-median clustering
Known Results: k-median
LP rounding : 6.667 [CGTS99], 3.25 [CL12]
Primal-Dual : 6 [JV99], 4 [JMS03], 4 [CG99]
Local Search : 3+ε [AGK+01]
1+√3+ε ≈ 2.732 [LS13]
(1+2/e)-hardness of approximation [JMS03]
2 ≤ LP-GAP ≤ 3 (∃ exp. time algorithm)
Uncapacitated Facility Location (UFL)
[figure: clients and facilities labeled with opening costs (\$100, \$100, \$30, \$20)]
F : potential facility locations
C : set of clients
f_i, i ∈ F : cost for opening i   (replaces k : number of facilities to open)
d : metric over F ∪ C
find S ⊆ F   (no constraint |S| = k)
minimize Σ_{i ∈ S} f_i + Σ_{j ∈ C} d(j, S)   (facility cost + connection cost)
Known Results: UFL
Studied in the 1960s in Operations Research
3.16 [STA98]
2.41 [GK99]
3 [JV99]
1.853 [CG99]
1.728 [CG99]
5+ε [Kor00]
1.861 [MMSV01]
1.736 [CS03]
1.61 [JMS02]
1.582 [Svi02]
1.52 [MYZ02]
1.50 [Byr07]
1.488 [Li11]
1.463-hardness of approximation [GK98]
(1+√3+ε)-approximation on k-median
k-median and UFL
f = cost of a facility
f ↑ ⇒ #open facilities ↓
Given a black-box α-approximation A for UFL
Naïve try : find an f such that A opens k facilities
α-approximation for k-median?
Proof that this fails : α ≈ 1.488 for UFL, but α > 1.736 for k-median (by the (1+2/e)-hardness)
k-median and UFL
Naïve try : find an f such that A opens k facilities
2 issues with naïve try :
1. need strong α-approximation for UFL
Normal α-approximation : F + C ≤ α·OPT
strong α-approximation : α·F + C ≤ α·OPT
(here F = facility cost and C = connection cost of the returned solution)
2. cannot find f s.t. A opens exactly k facilities
S1 : set of k1 < k facilities
S2 : set of k2 > k facilities
bi-point solution
|S1| < k ≤ |S2|
a, b : a|S1| + b|S2| = k, a + b = 1
bi-point solution : aS1+bS2
cost(aS1+bS2) = a cost(S1) + b cost(S2)
k-median and UFL
2 issues with naïve try :
1. need strong α-approximation for UFL
2. can not find f s.t. A opens exactly k facilities
                 strong approx. factor   bi-point → integral   final ratio for k-median
[JV]             3                       ×2                    6
[JMS]            2                       ×2                    4
our result       2
strong approx. factor 2 : do not know how to improve
bi-point → integral : this factor of 2 is tight !!
k-median and UFL
                 strong approx. factor   bi-point → integral   bi-point → pseudo-integral   final ratio for k-median
[JV]             3                       ×2                                                 6
[JMS]            2                       ×2                                                 4
our result       2                                             ×(1+√3+ε)/2                  1+√3+ε
this factor of 2 is tight !!
Main Lemma 1 : suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
Proof of Lemma 1
Main Lemma 1 : suffices to give an α-approximate solution with k+O(1) facilities
clustering case : simpler proof due to anonymous reviewer
k-median clustering is easy in practice
reason : there is a “meaningful” clustering
[Awasthi-Blum-Sheffet] : ε, δ > 0 constants, OPT_{k-1} ≥ (1+δ) OPT_k ⇒ can find (1+ε)-approx.
Proof of Lemma 1
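A : α-approx. with k + c facilities
Apply A to (k-c, F, C, d) : k centers, cost ≤ α OPT_{k-c}
Case 1 : OPT_{k-c} ≤ (1+ε) OPT_k , DONE!
Case 2 : OPT_{k-c} > (1+ε) OPT_k : apply [Awasthi-Blum-Sheffet]
(some i < c then has OPT_{k-i-1} > (1+ε)^{1/c} OPT_{k-i})
[figure: OPT_{k-c}, ..., OPT_{k-i-1}, OPT_{k-i}, ..., OPT_k plotted against k-c, ..., k-i-1, k-i, ..., k]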
Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
[JV] bi-point solution of cost C ⇒ solution of cost 2C
based on improving [JV] algorithm
JV algorithm
[figure: S1, S2, facility i]
given : bi-point solution aS1+bS2
τi = nearest facility of i
select S’2 ⊆ S2 , |S’2| = |S1| = k1
with prob. a, open S1
with prob. b, open S’2
randomly open k-k1 facilities in S2 \ S’2
Prob. of opening a facility in S1 : a
Prob. of opening a facility in S2 : b
guarantee : either i is open, or τi is open
Analysis of JV algorithm
[figure: client j at distance d1 from i1 and d2 from i2; i1 ∈ S1 , i3 ∈ S’2 , d(i1, i3) ≤ d1+d2]
either i1 or i3 is open
if i2 open : j → i2
else if i1 open : j → i1
else (i3 open) : j → i3
E[cost of j] ≤ b × d2 + a² × d1 + ab × (2d1+d2)
             ≤ 2 × [cost of j in aS1+bS2]
Our Algorithm
[figure: client j at distance d1 from i1 and d2 from i2; d(i2, i3) ≤ d1+d2]
on average, d1 >> d2
if i2 open : j → i2
else if i1 open : j → i1
else (i3 open) : j → i3
E[cost of j] ≤ b × d2 + a² × d1 + ab × (d1+2d2)
aggregate bound : (1+√3)/2 × [cost of j in aS1+bS2], replacing the factor 2 of [JV]
Our Algorithm
need to guarantee : either i is open, or τi is open
for a star (center τi, leaf i) :
either center open (with prob. a),
or all leaves open (with prob. b)
idea : open each star independently?
may happen : #open facilities > k
big stars : always open the center, open each leaf with prob. ≈ b
group of small stars of the same size : dependent rounding
each group, open 3 more facilities than expected
Summary
                 strong approx. factor   bi-point → integral   bi-point → pseudo-integral   final ratio for k-median
[JV]             3                       ×2                                                 6
[JMS]            2                       ×2                                                 4
our result       2                                             ×(1+√3+ε)/2                  1+√3+ε
Main Lemma 1 : suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
Open Problems
gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
tight analysis?
does the algorithm work for k-means?
Questions?
```
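
The bi-point definition and the JV rounding step from the slides can be tried on a toy instance. The following is a minimal sketch, not the authors' code: the function names (`bipoint_weights`, `connection_cost`, `jv_round`), the points-on-a-line metric, and the instance itself are all assumptions made for illustration, and S1 and S2 are assumed to be disjoint with k1 < k ≤ k2.

```python
import random


def bipoint_weights(k1, k2, k):
    """Return (a, b) with a + b = 1 and a*k1 + b*k2 = k (assumes k1 < k <= k2)."""
    b = (k - k1) / (k2 - k1)
    return 1.0 - b, b


def connection_cost(clients, open_facilities, d):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(d(j, i) for i in open_facilities) for j in clients)


def jv_round(S1, S2, k, d, rng):
    """JV-style rounding of the bi-point solution a*S1 + b*S2 to exactly k facilities.

    Hypothetical helper: S1 and S2 are disjoint lists of facilities.
    """
    k1 = len(S1)
    a, _ = bipoint_weights(k1, len(S2), k)
    # S2p plays the role of S'2: the nearest S2-facility of each i in S1,
    # padded with arbitrary other S2-facilities up to size k1.
    S2p = []
    for i in S1:
        nearest = min(S2, key=lambda t: d(i, t))
        if nearest not in S2p:
            S2p.append(nearest)
    rest = [t for t in S2 if t not in S2p]
    while len(S2p) < k1:
        S2p.append(rest.pop())
    # With prob. a open S1, otherwise (prob. b) open S'2.
    opened = list(S1) if rng.random() < a else list(S2p)
    # Randomly open k - k1 facilities in S2 \ S'2.
    opened += rng.sample(rest, k - k1)
    return opened


def dist(x, y):
    """Toy metric: points on a line."""
    return abs(x - y)


# Toy instance (made up for illustration).
clients = [0, 1, 2, 10, 11, 12, 20, 21, 22]
S1 = [1, 16]          # k1 = 2 < k
S2 = [0, 2, 11, 21]   # k2 = 4 > k
k = 3

a, b = bipoint_weights(len(S1), len(S2), k)
bipoint_cost = (a * connection_cost(clients, S1, dist)
                + b * connection_cost(clients, S2, dist))
opened = jv_round(S1, S2, k, dist, random.Random(0))
print("a, b =", a, b)
print("bi-point cost =", bipoint_cost)
print("opened =", opened, "cost =", connection_cost(clients, opened, dist))
```

The factor-2 guarantee of [JV] holds in expectation over the random choices. On this small instance every possible outcome happens to stay below twice the bi-point cost, but that is a property of the example, not of the algorithm.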