### Compressive Sensing:

```Compressive Sensing:
An Introduction and Survey of
Applications
Objectives
Description of theory
 Discussion of important results
 Study of relevant applications

Introduction to the Problem





CS is a new paradigm that makes possible fast
acquisition of data using few number of samples
It tries to bring the number of samples acquired
as close to the information content as possible
The classical Shannon-Whittaker sampling
theorem has monopolized signal acquisition
arena
It applies only to band-limited signals
Says number of samples ~ desired resolution





But band-limitedness not a universal assumption
Most real-life signals have huge frequency extent
Also it is not always a true measure of
information content (e.g. : train of spikes in
frequency)
Also for increased resolution we need to
increase the rate of sampling( but speed of
most devices is limited)
In some other situations(e.g. : MRI) no of
samples that can be acquired is inherently
limited
Our objective
We want to investigate the possibility of
reconstructing a signal from fewer no. of
samples than dictated by ST
 Signal model assumed: Sparsity
l
 Much broader class than BL signals
 A nonlinear class
 Sparsity of a signal=no. of non-zero coefficients
= l 0 norm of the signal vector

Sparsity model holds for most known signals
 Most naturally-occurring signals are sparse in a
certain basis
 This property has been used in compression
using transform coding:

f
Bandlimit
and Sample
at Nyquist
Rate
H
~
f
Thresholding




Even though samples are acquired at max possible
rate we get rid of most of them
This strategy is wasteful in many ways
CS combines the first and second stages to acquire
signals in the compressed form

f

H
Some algorithm
for
reconstruction
Sparse
result
The cost incurred is increased computational
requirement
Questions:





What is the minimum no of measurements we
require?
How can reconstruction be done from the
reduced no of samples?
The problem can be shown to be reducible to a
problem of solving an underdetermined system
of linear equations
Reconstruction is possible because of sparsity
assumption
Some Notation
{i }sensing basis


{ i }
sparsifying basis
1  i  N
This is a set of orthonormal vectors where the signal is known
to have a sparse representation with S coefficients
• Measurements are linear functionals of the form
where i ∈ T s.t.
 i , f 
•
| T | M  N
It can be shown measurements are of the form
Ux  b
where x is a sparse vector





Effectively we are doing a projection of an ndimensional vector into an M dimensional space
Need to make sure unique recovery is possible
i.e.; A( x1  x2 )  0 if x1  x2
Stability: RIP condition= Sub-matrix is wellconditioned with a good condition
number(necessary and sufficient)
For the sensing matrix and number of
measurements these conditions should be
satisfied
Contd






RIP condition is related to the uncertainty principle
for the two orthobases
Another property of interest in incoherence
The less the coherence the better the results
It’s like saying the sensing matrix should be as
different from the sparsifying basis or the
measurements should be holographic (nonconcentrated).
Should convey the least amt of info abt the
signal(20 weights problem)
Possibly random(or complementary bases like time
frequency)
Method of reconstruction








Various possibilities exist
We need to pick the least sparse solution
The RIP condition ensures it is the unique solution
Could go for a combinatorial optimisation(directly
sift thru all sparse solutions to pick the actual one)
Then only require S+1 samples
But it is NP hard –highly intractable
Another choice-l2 norm.Very easy to analyse. But
does not give Sparse solutions
So a compromise is to go for l1 norm minimisation
Geometrical Argument

L1 norm

L2 Norm
Why RIP is necessary?
Result


Let’s consider sets of orthogonal bases  and
If M  S 2 (U ) log(n /  ) then with probability
exceeding 1   x supported on a fixed set can
be recovered using the following optimisation
problem:
Ux  y
min | x |l
1


Imp requirements : incoherence ,randomness ,
RIP
This ensures that the set of measurements for
which reconstruction fails occurs with very small
prob: to take adv take random samples
Non-Uniform Sampling Theorem






Signal composed of S discrete frequencies
Take M random measurements in time domain
From above theorem M>Slog(n) gives perfect
reconstruction
Time and frequency domains are maximally
incoherent
This is also fundamental i.t.s.t. fewer samples
and reconstruction is virtually impossible
Eg Dirac Comb
More on RIP

Under the assumptions for which the above
result holds it can be shown that for x
belonging to a fixed set T :
m
3
2
2
2
| x |2  | Ux |2  m | x |2
2
2
Ensures any such signal is recovered uniquely
recoverable
• To include all sets T we need to strengthen the
above condition (UUP). But then the number
of samples required increases; becomes 4-5th
power of log(n)
•
Definition

Restricted Isometry Constant : It’s smallest
constant such that for every x belonging to T
with size S
(1  S ) | x |2 | Ax |2  (1  S ) | x |2
2
S
2
<1 for condition to hold
 This is an approximate orthogonality
condition
• UUP=should hold for all T’s with the same
size

2
Result
Assuming the existence of a matrix that satisfies
the above property with  2 S  1 sufficiently
then

| x*  x |l2  C | x  xs |l1 / S
| x*  x |l1  C | x  xs |l1
Here the condition on M depends on the type
of matrix we choose
• This is not probabilistic
•
Two concerns






Signals are only approximately sparse
Measurements are noisy
Our reconstruction procedure should be
robust against these two cases
RIP to the rescue!
Has been shown if  2S  2 1 the solution to
min | x |l sub to | Ax  y |l   satisfies
1
2
| x*  x |l2  C | x  xs |l1 / S  C1.
Summary





We need to find sensing matrices that satisfy
the isometry property
Simple choice: random matrices
If m>CSlog(n/δ) random matrices satisfy RIP
For orthobases we need 4-5th power of log(n)
Random matrices present storage difficulty
Application: Spectrum Sensing





Spectrum sensing is the task of detecting the presence
or absence of a carrier in a wideband of freqs
Cognitive Radios should be equipped with such
a mechanism to enable efficient utilisation of channel
A major implementation challenge lies in the very high
sampling rates required by conventional spectral
estimation methods which have to operate at or above
the Nyquist rate.
Because of high rates no of samples is limited
May not provide sufficient statistic
The situation is appropriate for deployment of CS
 The spectrum is sparse because only a relatively small
no of users are transmitting
 Let  f o f N  be the frequency range in use
 Bandwidth=B






Our job is to detect the N frequency bands and classify
them as black, grey or white regions based on the PSD
level
In the analysis we use a vector of time samples sampled
at Nyquist rate of To. In the actual implementation only
sub-Nyquist sampling is done
So let x  S T rt where rt is a vector of M values in
the duration [0 MTo]. And x  R K
This is a generic model. S can be any matrix of basis
vectors
Since sparsity under consideration is in freq, time
domain is the best sampling domain. then S = I M
Steps involved







Reconstruct rt from M measurements
Find high resolution fourier transform rf  FM rt
Obtain the frequency edges from rf
Estimate the PSD in each band
Later we’ll see it is not necessary to reconstruct the
frequency vector from rt
Only course sensing is done so noise is not a key
factor
To reduce noise effects a wavelet smoothing
operation is done
Edge Detection
Problem similar to edge detection in images
 Let  ( f )be a wavelet smoothing function with a
compact support. The dilation of  ( f ) by a scale
factor s is given by:

For dyadic scales s is a power of two
 Continuous wavelet transform of R(f) is given by:

•
Then we do a simple differencing to this function

From above:

Replacing rt
by its estimate found below:
We get the estimated wavelet transform:
Then we take the derivative of the above vector and
find local minima

Below is the vector with derivative values:
Take local peaks and nous sommes done
 Also in each band we can estimate the average PSD
by just averaging the frequency vector in that band


One major simplification can be done by noting zs is
itself a sparse vector. Thus we can eliminate the need
to reconstruct rf

To do this we define:
which is the differentiation matrix
• Then we rewrite the above equation:
And finally:

Recovered frequency response
Channel Estimation






CS also has applications in sparse channel
estimation
Every delay-dominant channel can be represented
as a superposition of the pilot signal sent
It can be shown we can divide the total delay into
bins each 1/W in width
Not all these bins are always occupied
Each bin represents a dominant scatterer
No of bins that are non-zero<<p(total
bins)=floor(Tm*W)







Two ways to make use of CS
Use proper pilot signals
Can be either random bits 1 or -1
Or could be a sum of exponentials with random
frequencies.
The first corresponds to using a random sensing
basis
The second is random sampling in frequency using a
subset of FT which is maximally incoherent
In both cases we obtain a better MSE less than LS

Another app is channel coding
References

•





Channel Estimation:
.Bajwa, U.W., Haupt, J., Raz, G. and Nowak, R. (2008). “Compressed
Channel Sensing”, Proceedings of the Annual Conference on
Information Sciences and Systems, March 2008, Pages 1-6
Spectrum Sensing:
Z. Tian and G. B. Giannakis, “Compressed sensing for
wideband cognitive radios,” in Acoustics, Speech and Signal
Processing,2007. ICASSP 2007. IEEE International
Conference on, vol.4,Honolulu, HI, Apr. 2007, pp. 1357–1360.
CS
E. Candès and J. Romberg, “Sparsity and incoherence in compressive
sampling,” Inverse Prob., vol. 23, no. 3, pp. 969–985, 2007.
An Introduction To Compressive Sampling: Emmanuel Candes
Michael B Wakin
A Lecture on CS Richard Baraniuk
```