Learning Measurement Matrices for Redundant Dictionaries
Richard Baraniuk
Rice University
Chinmay Hegde
Sparse Recovery
• Sparsity rocks, etc.
• Previous talk focused mainly on signal inference
(ex: classification, NN search)
• This talk focuses on signal recovery
Compressive Sensing
• Sensing via randomized dimensionality reduction
• Recovery: solve an ill-posed inverse problem by
exploiting the geometrical structure
of sparse/compressible signals
General Sparsifying Bases
• Gaussian measurements incoherent with any fixed
orthonormal basis (with high probability)
• Ex: frequency domain:
Sparse Modeling: Approach 1
• Step 1: Choose a signal model with structure
– e.g. bandlimited, smooth with r vanishing moments, etc.
• Step 2: Analytically design a sparsifying
basis/frame that exploits this structure
– e.g. DCT, wavelets, Gabor, etc.
Sparse Modeling: Approach 2
• Learn the sparsifying basis/frame from
training data
• Problem formulation: given a large number of
training signals, design a dictionary D that
simultaneously sparsifies the training data
• Called sparse coding / dictionary learning
• Dictionary: an NxQ matrix whose columns are used
as basis functions for the data
• Convention: assume columns are unit-norm
• More columns than rows, so dictionary is
redundant / overcomplete
Dictionary Learning
• Rich vein of theoretical and algorithmic work
Olshausen and Field ['97], Lewicki and Sejnowski ['00], Elad ['06],
Sapiro ['08]
• Typical formulation: Given training data
• Several efficient algorithms, ex: K-SVD
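A common way to write this formulation is sketched below. The notation is assumed for illustration and is not taken from the slide: X holds the training signals as columns, A holds their sparse codes, and K is the target sparsity level.

```latex
% X = [x_1, ..., x_T] \in \mathbb{R}^{N \times T}: training signals (assumed notation)
% D \in \mathbb{R}^{N \times Q}: dictionary with unit-norm columns
% A = [a_1, ..., a_T] \in \mathbb{R}^{Q \times T}: sparse coefficient vectors
\min_{D, A} \; \| X - D A \|_F^2
\quad \text{subject to} \quad \| a_t \|_0 \le K, \quad t = 1, \dots, T
```

K-SVD alternates between a sparse coding step with D held fixed and an update of the columns of D one at a time.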
Dictionary Learning
• Successfully applied to denoising, deblurring,
inpainting, demosaicking, super-resolution, …
– State-of-the-art results in many of these problems
Aharon and Elad ‘06
Dictionary Coherence
• Suppose that the learned dictionary is normalized
to have unit $\ell_2$-norm columns: $\|d_i\|_2 = 1$ for all $i$
• The mutual coherence of D is defined as
$\mu(D) = \max_{i \neq j} |\langle d_i, d_j \rangle|$
• Geometrically, $\mu(D)$ represents the cosine of the
minimum angle between the columns of D; smaller
is better
• Crucial parameter in analysis as well as practice
(line of work starting with Tropp [04])
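As a minimal sketch, the mutual coherence (the largest absolute inner product between distinct unit-norm columns) can be computed from the Gram matrix. The function name `mutual_coherence` is a hypothetical helper chosen for illustration.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized columns of D."""
    D = np.asarray(D, dtype=float)
    # Normalize columns so that inner products are cosines of angles.
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)          # absolute Gram matrix
    np.fill_diagonal(G, 0.0)       # ignore each column's product with itself
    return float(G.max())
```

An orthonormal basis has coherence 0; any dictionary that contains two parallel columns has coherence 1.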
Dictionaries and CS
• Can extend CS to work with non-orthonormal,
redundant dictionaries
• Coherence of the holographic basis $\Phi D$
(measurement matrix $\Phi$ applied to D)
determines recovery success
Rauhut et al. [08], Candes et al. [10]
• Fortunately, a random $\Phi$
guarantees low coherence
Geometric Intuition
• Columns of D: points on the unit sphere
• Coherence: minimum angle between the vectors
• J-L Lemma: Random projections approximately
preserve angles between vectors
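The geometric picture can be checked numerically with a small sketch that compares the coherence of a dictionary before and after a random Gaussian projection. The sizes, the 1/sqrt(M) scaling, and the helper names are assumptions made for illustration.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    A = np.asarray(A, dtype=float)
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

def coherence_before_after(N=64, Q=128, M=32, seed=0):
    """Return (coherence of D, coherence of Phi @ D) for a random D and random Phi."""
    rng = np.random.default_rng(seed)
    # Random dictionary with unit-norm columns (points on the unit sphere).
    D = rng.standard_normal((N, Q))
    D /= np.linalg.norm(D, axis=0)
    # Random Gaussian measurement matrix with M rows (assumed scaling).
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    return mutual_coherence(D), mutual_coherence(Phi @ D)
```

Calling `coherence_before_after` for several values of M lets one compare the two coherences as the number of measurements changes.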
Q: Can we do better than
random projections for
dictionary-based CS?
Q restated: For a given
dictionary D, find the best
CS measurement matrix
Optimization Approach
• Assume that a good dictionary D has been provided
• Goal: Learn the best measurement matrix $\Phi$
for this particular D
• As before, want the “shortest” matrix $\Phi$ (fewest rows)
such that the coherence of $\Phi D$
is at most some parameter $\delta$
• To avoid degeneracies caused by a simple scaling,
also want that $\Phi$
does not shrink columns much:
A NuMax-like Framework
• Convert quadratic constraints in $\Phi$
into linear
constraints in $P = \Phi^T \Phi$
(via the “lifting trick”)
• Use a nuclear-norm relaxation of the rank
• Simplified problem:
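Written out, the lifting step and the relaxation might look like the sketch below. This is not the slide's own formula: the symbols $\Phi$ (measurement matrix with M rows), $P$ (lifted variable), $d_i$ (columns of D) and $\delta$ (coherence bound) are assumed notation, and the constraint set is modeled on the NuMax formulation.

```latex
% Lifting: the inner product of two sensed atoms is quadratic in \Phi but linear in P
\langle \Phi d_i, \Phi d_j \rangle = d_i^T \Phi^T \Phi \, d_j = d_i^T P \, d_j,
\qquad P = \Phi^T \Phi \succeq 0, \quad \operatorname{rank}(P) \le M

% Relaxation (sketch): replace the rank of P with its nuclear norm
\min_{P \succeq 0} \; \| P \|_*
\quad \text{subject to} \quad
| d_i^T P \, d_j | \le \delta \;\; (i \neq j), \qquad
| d_i^T P \, d_i - 1 | \le \delta \;\; \text{for all } i
```

The second family of constraints keeps the sensed columns close to unit norm, so bounding $| d_i^T P d_j |$ approximately bounds the coherence of $\Phi D$.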
Algorithm: “NuMax-Dict”
• Alternating Direction Method of Multipliers (ADMM)
- solve for P using spectral thresholding
- solve for L using least-squares
- solve for q using “squishing”
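One common form of spectral thresholding, given here as a sketch rather than the talk's exact update, soft-thresholds the eigenvalues of a symmetric matrix and discards the negative ones. Symmetrizing the input first is an assumption made so that the eigenvalues are real, and `spectral_threshold` and `tau` are hypothetical names.

```python
import numpy as np

def spectral_threshold(Z, tau):
    """Soft-threshold the eigenvalues of the symmetric part of Z at level tau."""
    Z = np.asarray(Z, dtype=float)
    S = 0.5 * (Z + Z.T)                # symmetric part (assumption: enforce real spectrum)
    w, V = np.linalg.eigh(S)           # real eigenvalues, orthonormal eigenvectors
    w = np.maximum(w - tau, 0.0)       # shrink by tau and clip at zero
    return (V * w) @ V.T               # rebuild a symmetric positive semidefinite matrix
```

The result is positive semidefinite and typically low rank, since every eigenvalue below `tau` is set to zero.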
Convergence rate depends on the size of the
dictionary (since #constraints =
NuMax vs. NuMax-Dict
• Same intuition, trick, algorithm, etc;
• Key enabler is that coherence is intrinsically a
quadratic function of the data
• Key difference: the (linearized) constraints are
no longer symmetric
– We have constraints of the form
– This might result in intermediate P estimates having
complex eigenvalues, so the notion of spectral
thresholding needs to be slightly modified
Experimental Results
Expt 1: Synthetic Dictionary
• Generic dictionary: random with unit-norm columns
• Dictionary size: 64x128
• We construct different measurement matrices:
• Random
• NuMax-Dict
• Algorithm by Elad [06]
• Algorithm by Duarte-Carvajalino & Sapiro [08]
• We generate K-sparse signals (K = 3) with Gaussian
amplitudes, add 30 dB measurement noise
• Recovery using OMP
• Measure recovery SNR, plot as a function of M
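The setup above can be sketched for the random-matrix baseline as follows. Several details are assumptions: "30 dB" is read as the measurement signal-to-noise ratio, the random matrix is Gaussian with 1/sqrt(M) scaling, and `omp` is a plain textbook Orthogonal Matching Pursuit, not necessarily the implementation used in the talk.

```python
import numpy as np

def omp(A, y, K):
    """Orthogonal Matching Pursuit: greedily select K columns of A to explain y."""
    norms = np.linalg.norm(A, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = 0.0        # do not reselect chosen columns
        support.append(int(np.argmax(corr)))
        # Least-squares fit of y on the columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

def recovery_snr_db(M, N=64, Q=128, K=3, noise_snr_db=30.0, seed=0):
    """One trial: random dictionary, random Phi, K-sparse signal, OMP recovery."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((N, Q))
    D /= np.linalg.norm(D, axis=0)                     # unit-norm columns
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)     # random baseline (assumed scaling)
    alpha = np.zeros(Q)
    alpha[rng.choice(Q, size=K, replace=False)] = rng.standard_normal(K)
    x = D @ alpha
    y_clean = Phi @ x
    noise = rng.standard_normal(M)
    # Scale noise so that the measurement SNR equals noise_snr_db (assumed meaning of "30 dB").
    noise *= np.linalg.norm(y_clean) / np.linalg.norm(noise) * 10 ** (-noise_snr_db / 20)
    alpha_hat = omp(Phi @ D, y_clean + noise, K)
    x_hat = D @ alpha_hat
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
```

Averaging `recovery_snr_db(M, seed=s)` over many seeds for each M gives one curve of the kind described above; swapping `Phi` for a designed matrix gives the others.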
Expt 1: Synthetic Dictionary
Expt 2: Practical Dictionaries
• 2x overcomplete DCT dictionary, same parameters
• 2x overcomplete dictionary learned on 8x8 patches of a
real-world image (Barbara) using K-SVD
• Recovery using OMP
• Exact problem seems to be hard to analyze
• But, as in NuMax, can provide analytical bounds in
the special case where the measurement matrix is
further constrained to be orthonormal
Orthogonal Sensing of
Dictionary-Sparse Signals
• Given a dictionary D, find the orthonormal
measurement matrix that provides the best
possible coherence
• From a geometric perspective, ortho-projections
cannot improve coherence, so necessarily
Semidefinite Relaxation
• The usual trick: Lifting and trace-norm relaxation
Theoretical Result
• Theorem: For any given redundant dictionary D,
denote its mutual coherence by
Denote the optimum of the (nonconvex) problem
Then, there exists a method to produce a rank-2M
ortho matrix
such that the coherence of
at most
i.e., we can obtain close-to-optimal performance,
but pay a price of a factor of 2 in the number of measurements
• NuMax-Dict performance comparable to the best
existing algorithms
• Principled convex optimization framework
• Efficient ADMM-type algorithm that exploits the
rank-1 structure of the problem
• Upshot: possible to incorporate other structure
into the measurement matrix, such as positivity,
sparsity, etc.
Open Question
• Above framework assumes a two-step approach:
first construct a redundant dictionary (analytically
or from data) and then construct a measurement matrix
• Given a large number of training data, how to
efficiently solve jointly for both the dictionary and
the sensing matrix?
(Approach introduced in Duarte-Carvajalino & Sapiro [08])
