### Private Empirical Risk Minimization

```
Private Empirical Risk Minimization:
Efficient Algorithms and Tight Error Bounds

Raef Bassily (Penn State)
Adam Smith (Penn State; work done at BU/Harvard)
Abhradeep Thakurta (Yahoo! Labs; work done at MSR/Stanford)
Privacy in Statistical Databases

[Figure: individuals' records x1, ..., xn are held by a trusted curator;
users (government, researchers, or malicious parties) send queries to an
algorithm A.]

• Two conflicting goals: utility vs. privacy.
• Balancing these goals is tricky:
  - No control over external sources of information.
  - Anonymization is unreliable: [Narayanan-Shmatikov'08] (social networks),
    [Korolova'11], [Calandrino et al.'12], ...
Differential privacy [Dwork-McSherry-Nissim-Smith'06]

[Figure: an algorithm A with local random coins is run on neighboring
datasets x and x', giving output distributions A(x) and A(x').]

Def.: Datasets x and x' are called neighbors if they differ in one record.
A randomized algorithm A is (ε, δ)-differentially private if, for all
neighboring datasets x and x' and for all events S,

  Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x') ∈ S] + δ.

Two regimes: ε-differential privacy (δ = 0) and (ε, δ)-differential
privacy (δ > 0); think of δ = negl(n).

Requirement: neighboring datasets induce close distributions on outputs.
"Almost the same" conclusions will be reached independent of whether any
individual opts into or opts out of the data set.
This work: construct efficient and differentially private algorithms for
convex empirical risk minimization with optimal excess risk.
Convex empirical risk minimization

• Dataset x = (x1, ..., xn) ∈ U^n.
• Convex set C ⊆ ℝ^p.
• Loss function ℓ: C × U → ℝ, where ℓ(· ; x) is convex for all x.
• Goal: find a "parameter" θ ∈ C that minimizes the empirical risk:

  L(θ; x) = (1/n) · Σ_{i=1}^n ℓ(θ; xi)

[Figure: convex set C with the actual minimizer θ*.]
• Output θ̂ ∈ C such that the excess risk

  L(θ̂; x) − min_{θ ∈ C} L(θ; x)

  is small.

[Figure: C with the actual minimizer θ* and the output θ̂; the gap between
their risks is the excess risk.]
Examples

• Median.
• Linear regression:

  L(θ; X) = (1/n) · Σ_{i=1}^n (yi − ⟨xi, θ⟩)^2

• Support vector machine: hinge loss h(yi · ⟨xi, θ⟩) averaged over the
  data, where h(z) = (1 − z)_+.
Private convex ERM [Chaudhuri-Monteleoni'08, Chaudhuri-Monteleoni-Sarwate'11]

[Figure: loss ℓ, convex set C, dataset x = (x1, ..., xn), and random coins
are fed to a diff. private algorithm A, which outputs θ̂ ∈ C.]

• Privacy: A is differentially private in the input x = (x1, ..., xn).
• Utility measured by (worst-case) expected excess risk:

  E[ L(θ̂; x) ] − min_{θ ∈ C} L(θ; x)

  (Recall that L(θ; x) = (1/n) · Σ_{i=1}^n ℓ(θ; xi).)

• Studied by [Chaudhuri et al.'11, Rubinstein et al.'11,
  Kifer-Smith-Thakurta'12, Smith-Thakurta'13, ...]
Why care about privacy in ERM?

• Median: the minimizer is always a data point.
• Dual form of SVM: the solution typically contains a subset of the exact
  data points in the clear.

[Figure: losses ℓ(θ; x1), ℓ(θ; x2), ℓ(θ; x3) over data points x1, x2, x3;
the minimizer of their sum is θ* = x2, itself a data point.]
Contributions

1. New algorithms with optimal excess risk assuming:
   • Loss function is Lipschitz.
   • Parameter set C is bounded.
   (Separate set of algorithms for strongly convex loss.)
2. Matching lower bounds.

• Non-smooth loss is common: SVM, median, ...
• Best previous work [Chaudhuri et al.'11, Kifer et al.'12] additionally
  assumes ℓ is smooth (bounded second derivative). Applying their technique
  to a general loss requires smoothing the loss, introducing extra error.
• This work improves bounds by a factor of n.
Results – upper bounds (dataset size = n, C ⊂ ℝ^p)

                    | Privacy   | Excess risk          | Technique
  Lipschitz         | ε-DP      | Õ(p / (εn))          | Exponential sampling
                    |           |                      | (inspired by [McSherry-Talwar'07])
  Lipschitz         | (ε,δ)-DP  | Õ(√p / (εn))         | Noisy SGD
                    |           |                      | (rigorous analysis of & improvements
                    |           |                      | to [McSherry-Williams'10],
                    |           |                      | [Jain-Kothari-Thakurta'12] and
                    |           |                      | [Chaudhuri-Sarwate-Song'13])
  Λ-strongly convex | ε-DP      | Õ(p² / (Λ ε² n²))    | Localization (new technique)
  Λ-strongly convex | (ε,δ)-DP  | Õ(p / (Λ ε² n²))     | Noisy SGD (or localization)

ℓ is 1-Lipschitz on parameter set C of diameter ≤ 1.
Results – lower bounds

                  | Privacy   | Excess risk        | Form of ℓ used
  Lipschitz       | ε-DP      | Ω(p / (εn))        | Linear: −⟨θ, x⟩, C = unit ball;
                  |           |                    | reduction from ε-DP release of
                  |           |                    | 1-way marginals [HT'10]
  Lipschitz       | (ε,δ)-DP  | Ω(√p / (εn))       | Linear: −⟨θ, x⟩, C = unit ball;
                  |           |                    | reduction from (ε,δ)-DP release of
                  |           |                    | 1-way marginals [BUV'13]
  Strongly convex | ε-DP      | Ω(p² / (ε² n²))    | (same reductions)
  Strongly convex | (ε,δ)-DP  | Ω(p / (ε² n²))     | (same reductions)
Noisy stochastic gradient descent (optimal (ε, δ)-DP algorithm)

• Inputs: data x = (x1, ..., xn), 1-Lipschitz loss ℓ, convex set C,
  privacy parameters ε, δ.
• Choose an arbitrary starting point θ1 ∈ C.
• For t = 1 to n², at iteration t:
  1) Sample x uniformly at random from {x1, ..., xn} (a fresh data sample
     each iteration).
  2) Step in the direction of the noisy gradient:

       θ_{t+1} = Π_C ( θ_t − η_t [ ∇ℓ(θ_t; x) + b_t ] )

     where η_t is the learning rate, b_t is random noise, and Π_C denotes
     projection back onto C.
• Repeat for n² iterations, then output θ_{n²}.

[Figure: iterates θ_t, θ_{t+1}, θ_{t+2} moving inside C along the noisy
gradient directions −η_t [∇ℓ(θ_t; x) + b_t].]
Privacy of the noisy SGD

The noisy SGD algorithm is (ε, δ)-differentially private.

• Key point: sampling a uniformly random data point amplifies privacy
  [KLNRS'08]. After T = n² iterations, by strong composition [DRV'10],
  the total privacy cost adds up to (ε, δ).
Exponential sampling (optimal ε-DP algorithm)

• Define a probability distribution over C:

    f(θ; x) ∝ exp( −ε n L(θ; x) / 2 )

• Output a sample θ̂ from f(· ; x).
  (An instance of the exponential mechanism [McSherry-Talwar'07].)

• Tight utility analysis via a "peeling" argument:
  - Exploits the structure of convex functions: the nested level sets
    A1 ⊂ A2 ⊂ A3 ⊂ A4 (excess risk at most γ, 2γ, 3γ, 4γ) are decreasing
    in volume toward θ*.
  - Shows that Pr[θ̂ ∈ A1] ≈ 1 for a suitable choice of γ.
  - Does not follow purely from existing results.

• Efficient construction based on rapidly mixing MCMC:
  - Uses [Applegate-Kannan'91] as a subroutine.
  - Provides a multiplicative convergence guarantee.

[Figure: f(θ; x) and L(θ; x) over C, with nested level sets
A1 ⊂ A2 ⊂ A3 ⊂ A4 around the minimizer θ*.]
Summary

1. New algorithms with optimal excess risk assuming:
   • Loss function is Lipschitz.
   • Parameter set C is bounded.
   (Separate set of algorithms for strongly convex loss.)
2. Matching lower bounds.

Not in this talk:
• New localization technique: optimal algorithm for strongly convex loss.
• Generalization error guarantees: not known to be tight in general.
```
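The differential-privacy definition in the slides can be exercised on the simplest possible private algorithm: randomized response on a single bit. This mechanism is not from the talk; it is a standard textbook example, used here only to make the inequality Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x') ∈ S] + δ concrete (with δ = 0):

```python
import math
import random

def randomized_response(bit, eps, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

eps = 1.0
p = math.exp(eps) / (math.exp(eps) + 1.0)  # Pr[output = input]

# Neighboring "datasets" here are the single bits 0 and 1.
# Check the epsilon-DP inequality for both possible outputs s.
for s in (0, 1):
    pr_x = p if s == 0 else 1 - p    # Pr[A(0) = s]
    pr_xp = p if s == 1 else 1 - p   # Pr[A(1) = s]
    assert pr_x <= math.exp(eps) * pr_xp + 1e-12
    assert pr_xp <= math.exp(eps) * pr_x + 1e-12

out = randomized_response(1, eps, random.Random(0))
assert out in (0, 1)
```

For this mechanism the inequality is actually tight: Pr[A(0) = 0] equals exactly e^ε · Pr[A(1) = 0], which is why randomized response is the canonical extremal example of ε-DP.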
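The "why care about privacy in ERM" slide observes that for the median, the exact empirical minimizer is always one of the data points, so releasing it publishes a record verbatim. A minimal demonstration (the salary values are made up for illustration):

```python
# Empirical risk for the median: L(theta; x) = (1/n) * sum_i |theta - x_i|.
# Any minimizer of this risk is an actual data point.
def erm_median(data):
    """Return an exact empirical minimizer of the absolute loss (a lower median)."""
    s = sorted(data)
    return s[(len(s) - 1) // 2]  # always an element of `data`

salaries = [31_000, 44_000, 52_000, 67_000, 120_000]  # hypothetical records
theta_star = erm_median(salaries)
assert theta_star in salaries  # the released minimizer IS someone's record
```

This is exactly the leak differential privacy rules out: a DP algorithm must output a *noisy* parameter whose distribution barely changes when any single salary is replaced.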
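The noisy SGD loop in the slides (pick one random record, take a gradient step perturbed by noise, stay inside C, repeat n² times) can be sketched in one dimension for the median loss. The learning-rate schedule and the noise scale `sigma` below are illustrative placeholders, not the calibrated values from the privacy analysis:

```python
import math
import random

def project(theta, lo=0.0, hi=1.0):
    """Euclidean projection onto the convex set C = [lo, hi]."""
    return max(lo, min(hi, theta))

def grad_abs_loss(theta, x):
    """Subgradient of the 1-Lipschitz loss l(theta; x) = |theta - x|."""
    return 1.0 if theta > x else -1.0

def noisy_sgd(data, sigma=0.5, seed=0):
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0                           # arbitrary starting point in C
    for t in range(1, n * n + 1):         # T = n^2 iterations
        x = rng.choice(data)              # fresh uniform sample from the dataset
        b = rng.gauss(0.0, sigma)         # noise b_t (scale is a placeholder)
        eta = 1.0 / math.sqrt(t)          # decaying learning rate eta_t
        theta = project(theta - eta * (grad_abs_loss(theta, x) + b))
    return theta

data = [0.2, 0.4, 0.5, 0.7, 0.9]          # made-up records in C = [0, 1]
theta_hat = noisy_sgd(data)
assert 0.0 <= theta_hat <= 1.0
```

The privacy argument sketched on the slides lives entirely in the choice of `sigma` and the iteration count: sampling one record per step amplifies privacy, and strong composition over the n² steps yields the overall (ε, δ) guarantee.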
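The exponential-sampling algorithm draws θ̂ from a density of exponential-mechanism form over C (taken here as ∝ exp(−ε·n·L(θ; x)/2); the exact normalization is an assumption of this sketch). The talk samples from a continuous C via MCMC; over a finite grid of candidates the same mechanism reduces to a simple weighted draw:

```python
import math
import random

def avg_abs_loss(theta, data):
    """Empirical risk L(theta; x) for the 1-Lipschitz loss |theta - x_i|."""
    return sum(abs(theta - x) for x in data) / len(data)

def exponential_mechanism(data, candidates, eps, seed=0):
    """Sample theta_hat with Pr ~ exp(-eps * n * L(theta; x) / 2).

    Assumes one record changes n*L(theta; x) by at most 1, which holds here
    because records and candidates live in an interval of diameter 1."""
    rng = random.Random(seed)
    n = len(data)
    scores = [-eps * n * avg_abs_loss(c, data) / 2.0 for c in candidates]
    m = max(scores)                        # shift for numerical stability
    weights = [math.exp(s - m) for s in scores]
    r = rng.random() * sum(weights)
    for c, w in zip(candidates, weights):  # inverse-CDF draw over candidates
        r -= w
        if r <= 0:
            return c
    return candidates[-1]

data = [0.2, 0.4, 0.5, 0.7, 0.9]                  # made-up records
grid = [i / 100 for i in range(101)]              # discretized C = [0, 1]
theta_hat = exponential_mechanism(data, grid, eps=2.0)
assert theta_hat in grid
```

The "peeling" analysis on the slides explains why this works: candidates with near-minimal risk carry almost all of the probability mass, because the low-risk level sets around θ* are small but exponentially favored.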