slides_2015 - Computer Science

```DCSP-5: Noise
Jianfeng Feng
Department of Computer Science Warwick Univ., UK
[email protected]
http://www.dcs.warwick.ac.uk/~feng/dcsp.html
Assignment:2015
• Q1: you should be able to do it after last
week seminar
• Q2: need a bit reading (my lecture notes)
• Q3: standard
• Q4: standard
Assignment:2015
• Q5: standard
• Q6: standard
• Q7: after today’s lecture
plot soundsc
plot
plot
Recap
• Fourier Transform for a periodic signal
{ sim(n w t), cos(n w t)}
• For general function case,
ì X(F) = ¥ x(t)exp(-2p jFt)dt
ò -¥
ï
í
¥
ï x(t) = ò X(F)exp(2p jFt)dF
î
-¥
Recap:
this is all you have to remember (know)?
• Fourier Transform for a periodic signal
{ sim(n w t), cos(n w t)}
• For general function case,
ì X(F) = ¥ x(t)exp(-2p jFt)dt
ò -¥
ï
í
¥
ï x(t) = ò X(F)exp(2p jFt)dF
î
-¥
Dirac delta function
Can you do FT for cos(2 p t)?
ì X(F) = ¥ x(t)exp(-2p jFt)dt
ò -¥
ï
í
¥
ï x(t) = ò X(F)exp(2p jFt)dF
î
-¥
Dirac delta function
X(F) =
ò cos(2p t)exp(-2p jtF)dt
1
[exp(-2p jt) + exp(2p jt)]exp(-2p jtF)dt
ò
2
1
= ò [exp(-2p jt(1+ F)) + exp(2p jt(1- F))]dt
2
ù¥
1é
1
1
= êexp(-2p jt(1+ F)) +
exp(2p jt(1- F))ú |-¥
2 ë 2p j(1+ F)
2p j(1+ F)
û
= ?????
=
For example, take F=0 in the equation above, we have
ù¥
1é
1
1
exp(-2
p
jt)
+
exp(2
p
jt)
ê
ú |-¥
2 ë 2p j
2p j
û
1
=
sin(2 p t) |¥
-¥
2p
X(0) =
It makes no sense !!!!
Dirac delta function:
Ehrenfest
Bragg
A photo with the highest IQ (15 NL)
Shordiger
Heisenberg
Dirac Compton
Pauli
Bone
Debije
De Boer
M Curie
Planck
Einstein
Lorentz
Langevin
Dirac delta function:
Ehrenfest
Bragg
A photo with the highest IQ (15 NL)
Shordiger
Heisenberg
Dirac Compton
Pauli
Bone
Debije
De Boer
M Curie
Planck
Einstein
Lorentz
Langevin
Dirac delta function
The (digital) delta function, for a given n0
¥
å
d[n - n0 ] f [n] = f (n0 )
n0=0 here
n=-¥
ì1, n = 0
d[n] = í
î0, n ¹ 0
d(t)
Dirac delta function
The (digital) delta function, for a given n0
¥
å
d[n - n0 ] f [n] = f (n0 )
n0=0 here
n=-¥
ì1, n = 0
d[n] = í
î0, n ¹ 0
Dirac delta function d(x) (you could find a
nice movie in Wiki);
ò
¥
d
(t)
f
(t)dt
=
f
(0)
-¥
ì¥, t = 0
d (t) = í
î0, t ¹ 0
d(t)
Dirac delta function
Dirac delta function d(x);
[d (F -1) + d (F +1)]
exp(2p Fjt) dF
ò -¥
2
[exp(2p jt) + exp(-2p jt)]
=
2
= cos(2p t)
FT -1 (X(F)) =
¥
The FT of cos(2pt) is
FT (cos(2p t)) =
-1
0
1
d (F -1) + d (F +1)
2
Frequency
A final note (in exam or future)
• Fourier Transform for a periodic signal
{ sim(n w t), cos(n w t)}
• For general function case （it is true, but
need a bit further work),
ì X(F) = ¥ x(t)exp(-2p jFt)dt
ò -¥
ï
í
¥
ï x(t) = ò X(F)exp(2p jFt)dF
î
-¥
Summary
Will come back to it soon (numerical)
This trick (FT) has changed our life
and
will continue to do so
This Week’s Summary
• Noise
• Information Theory
Noise
Noise in communication systems:
probability and random signals
imshow(I);
noise = 1*randn(size(I));
imshow(Noisy);
Noise
Noise in communication systems:
probability and random signals
imshow(I);
noise = 1*randn(size(I));
imshow(Noisy);
Noise
Noise is a random signal (in general).
By this we mean that we cannot predict its value.
We can only make statements about the
probability of it taking a particular value
pdf
The probability density function (pdf) p(x) of a random variable
x is the probability that x takes a value between x0 and x0 +dx.
We write this as follows:
p(x0 )dx =P(x0 <x< x0 +dx)
P(x)
[x0 x0+ dx]
pdf
Probability that x will take a value lying
between x1 and x2 is
P(x1 < x £ x2 ) =
ò
x2
x1
p(x)dx
The probability is unity. Thus
ò
¥
-¥
p(x)dx =1
IQ distribution
Mentally
23%
Low
Intelligence Average
34.1%
13.6%
Above
Average
34.1%
High
Superior
Intelligence Intelligence
13.6%
2.1%
Exceptionally
0.13%
pdf
A density satifying the equation is termed
normalized.
The cumulative distribution function (CDF) F(x) is
the probability x is less than x0
F(x0 ) = P(x £ x0 ) =
ò
x0
-¥
p(x)dx
My IQ is above 85% (F(my IQ)=85%).
pdf
From the rules of integration:
P(x1<x<x2) = P(x2) --P(x1)
pdf has two classes: continuous and discrete
Continuous distribution
An example of a continuous distribution is the
Normal, or Gaussian distribution:
p(x ) 
 (x  m )2 

exp 
2

s
2p s


1
where m, s is the mean and standard variation
value of p(x).
The constant term ensures that the distribution is normalized.
Continuous distribution.
This expression is important as many actually occurring noise source can be
described by it, i.e.
white noise or coloured noise.
Generating f(x) from matlab
X=randn(1,1000);
Plot(x);
• X, x, …. X,
• Each x[i] is independent
• Histogram
Discrete distribution.
If a random variable can only take discrete
value, its pdf takes the forms of lines.
An example of a discrete distribution is the
Poisson distribution
P ( n) 

n
n!
exp( )
Discrete distribution.
We
cannot
predicate
value a random variable
Mean
and
variance
We can introduce measures that summarise what
we expect to happen on average.
The two most important measures are the mean
(or expectation) and the standard deviation.
The mean of a random variable x is defined to be
Mean and variance
EX =
ò
xp(x) dx
or
EX = å np(n)
• In the examples above we have assumed
that the mean of the Gaussian distribution
to be 0, the mean of the Poisson
distribution is found to be .
Mean and variance
• The mean of a distribution is, in common
sense, the average value.
• Can be estimated from data
• Assume that {x1, x2, x3, …,xN} are sampled
from a distribution
• Law of Large Numbers:
EX ~ (x1+x2+…+xN)/N
Mean and variance
mean
• The more data we have, the more accurate we can estimate the mean
• (x1+x2+…+xN)/N against N for randn(1,N)
Mean and variance
• The variance is defined as The variance s is defined to be
s 2 = E(X - EX)2 = ò (x - Ex)2 p(x)dx
• The square root of the variance is called standard deviation.
• Again, it can be estimated from data
N
s 2 = E(X - EX)2 ~
2
(x
EX)
å i
i=1
N
Mean and variance
• The standard deviation is a measure of the
spread of the probability distribution around
the mean.
• A small standard deviation means the
distribution are close to the mean.
• A large value indicates a wide range of
possible outcomes.
• The Gaussian distribution contains the
standard deviation within its definition (m,s)
Mean and variance
• Communication signals can be modelled as a
zero-mean, Gaussian random variable.
• This means that its amplitude at a particular
time has a PDF given by Eq. above.
• The statement that noise is zero mean says
that, on average, the noise signal takes the
values zero.
Mean and variance
http://en.wikipedia.org/wiki/Nations_and_intelligence
Einstein’s IQ
Einstein’s IQ=160+
Mentally
23%
Low
Intelligence Average
34.1%
13.6%
Above
Average
34.1%
High
Superior
Intelligence Intelligence
13.6%
2.1%
Exceptionally
0.13%
SNR
• Signal to noise ratio is an important quantity in
determining the performance of a communication
channel.
• The noise power referred to in the definition is the
mean noise power.
• It can therefore be rewritten as
SNR= 10 log10 ( S / s2)
Correlation or covariance
• Cov(X,Y) = E(X-EX)(Y-EY)
• correlation coefficient is normalized covariance
Coef(X,Y) = E(X-EX)(Y-EY) / [s(X)s(Y)]
• Positive correlation, Negative correlation
• No correlation (independent)
Stochastic process = signal
• A stochastic process is a collection of random
variables x[n], for each fixed [n], it is a random
variable
• Signal is a typical stochastic process
• To understand how x[n] evolves with n, we will
look at auto-correlation function (ACF)
g (k) =
E(x[n + k]- Ex[n + k]) (x[n]- Ex[n])
s (x[n + k])s (x[n])
• ACF is the correlation between k steps
Stochastic process
>> clear all
close all
n=200;
for i=1:10
x(i)=randn(1,1);
y(i)=x(i);
end
for i=1:n-10
y(i+10)=randn(1,1);
x(i+10)=.8*x(i)+y(i+10);
end
plot(xcorr(x)/max(xcorr(x)));
hold on
plot(xcorr(y)/max(xcorr(y)),'r')
g (k) =
•
E(x[n + k]- Ex[n + k]) (x[n]- Ex[n])
s (x[n + k])s (x[n])
two signals are generated: y (red) is simply randn(1,200)
x (blue) is generated x[i+10]=.8*x[i] + y[i+10]
• For y, we have g(0)=1, g(n)=0, if n is not 0 : having no memory
• For x, we have g (0)=1, and g (n) is not zero, for some n: having memory
white noise w[n]
• White noise is a random process we can not
predict at all (independent of history)
g(k ) 
E(x[n  k ]  Ex[n  k ])(x[n ]  Ex[n ])
 0,if k  0
s(x[n  k ])s(x[n ])
• In other words, it is the most ‘violent’ noise
• White noise draws its name from white light which will become
clear in the next few lectures
white noise w[n]
• The most ‘noisy’ noise is a white noise
since its autocorrelation is zero, i.e.
corr(w[n], w[m])=0 when n  m
• Otherwise, we called it colour noise since
we can predict some outcome of w[n],
given w[m], m<n
Sweety
Gaussian
Why do we love Gaussian?
Sweety Gaussian
+
Herr Gauss
+
Yes, I am
junior
Gaussian
=
Frau Gauss
=
Juenge Gauss
• A linear combination of two Gaussian random variables is Gaussian again
• For example, given two independent Gaussian variable X and Y with mean
zero
• aX+bY is a Gaussian variable with mean zero and variance a2 s(X)+b2s(Y)
• This is very rare (the only one in continuous distribution) but extremely useful:
panda in the family of all distributions
DCSP-6: Information Theory
Jianfeng Feng
Department of Computer Science Warwick Univ., UK
[email protected]
http://www.dcs.warwick.ac.uk/~feng/dcsp.html
Data Transmission
Data Transmission
How to deal with noise?
How to transmit signals?
Data Transmission
Data Transmission
Transform I
•Fourier Transform
(skipped, but common knowledge)
•Noise
•Signal Transmission
Today
• Data transmission:
• Shannon Information and Coding:
Information theory,
• coding of information for efficiency and error
protection;
Information and coding theory
Information theory is concerned with
• description of information sources
• representation of the information from a source
(coding) ,
• transmission of this information over channel.
Information
and
coding
theory
Information and coding theory
The best example
how a deep mathematical theory
could be successfully applied to
solving engineering problems.
Information and coding theory
Information theory is a discipline in applied mathematics involving the
quantification of data
with the goal of enabling as much data as possible to be reliably
stored
on a medium and/or
communicated
over a channel.
Information and coding theory
The measure of data, known as
information entropy,
is usually expressed by the average number of bits needed
for storage or communication.
Information and coding theory
The field is at the crossroads of
•
•
•
•
•
•
mathematics,
statistics,
computer science,
physics,
neurobiology,
electrical engineering.
Information and coding theory
Impact has been crucial to success of
• voyager missions to deep space,
•
invention of the CD,
•
feasibility of mobile phones,
•
development of the Internet,
•
the study of linguistics and of human perception,
•
understanding of black holes,
and numerous other fields.
Information and coding theory
Founded in 1948 by Claude Shannon in his seminal work
A Mathematical Theory of Communication
Information and coding theory
• The ‘bible’ paper: cited more than 60,000
Information
and
coding
theory
The most fundamental results of this theory are
1. Shannon's source coding theorem
the number of bits needed to represent the result of
an uncertain event is given by its entropy;
2. Shannon's noisy-channel coding theorem
reliable communication is possible over noisy
channels if the rate of communication is below a
certain threshold called the channel capacity.
The channel capacity can be approached by using appropriate encoding and decoding systems.
Information
and
coding
theory
The most fundamental results of this theory are
1. Shannon's source coding theorem
the number of bits needed to represent the result of
an uncertain event is given by its entropy;
2. Shannon's noisy-channel coding theorem
reliable communication is possible over noisy
channels if the rate of communication is below a
certain threshold called the channel capacity.
The channel capacity can be approached by using appropriate encoding and decoding systems.
Information and coding theory
Consider to predict the activity of Prime minister tomorrow.
This prediction is an information source.
Information and coding theory
Consider to predict the activity of Prime Minister tomorrow.
This prediction is an information source X.
The information source X ={O, R} has two outcomes:
• He will be in his office (O),
• he will be naked and run 10 miles in London (R).
Information and coding theory
Clearly, the outcome of 'in office' contains
little information;
it is a highly probable outcome.
The outcome 'naked run', however contains
considerable information;
it is a highly improbable event.
Information and coding theory
An information source is a probability distribution, i.e. a
set of probabilities assigned to a set of outcomes
(events).
This reflects the fact that the information contained in an
outcome is determined not only by the outcome, but
by how uncertain it is.
An almost certain outcome contains little information.
A measure of the information contained in an
outcome was introduced by Hartley in 1927.
Information
Defined the information contained in an outcome
xi in x={x1, x2,…,xn}
I(xi) = - log2 p(xi)
Information
The definition above also satisfies the requirement
that the total information in in dependent events
Clearly, our prime minister prediction for two days
contain twice as much information as for one
day.
Information
The definition above also satisfies the requirement
that the total information in in dependent events
Clearly, our prime minister prediction for two days
contain twice as much information as for one
day X={OO, OR, RO, RR}.
For two independent outcomes xi and xj,
I(xi and xj) = - log2 P(xi and xj)
= - log2 P(xi) P(xj)
= - log2 P(xi) - log2P(xj)
Entropy
The measure entropy H(X) defines the information
content of the source X as a whole.
It is the mean information provided by the source.
We have
H(X)= Si P(xi) I(xi) = - Si P(xi) log2 P(xi)
A binary symmetric source (BSS) is a source with
two outputs whose probabilities are p and 1-p
respectively.
Entropy
The prime minister discussed is a BSS.
The entropy of the BBS source is
H(X) = -p log2 p - (1-p) log2 (1-p)
Entropy
.
When one outcome is certain, so is the other, and
the entropy is zero.
As p increases, so too does the entropy, until it
reaches a maximum when p = 1-p = 0.5.
When p is greater than 0.5, the curve declines
symmetrically to zero, reached when p=1.
Next Week
• Application of Entropy in coding
• Minimal length coding
Entropy
We conclude that the average information in
BSS is maximised when both outcomes
are equally likely.
Entropy is measuring the average
uncertainty of the source.
(The term entropy is borrowed from thermodynamics. There too it is a measure of the uncertainly of disorder of a system).
Shannon:
•
My greatest concern was what to call it.
•
I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’.
•
When I discussed it with John von Neumann, he had a better idea.
•
Von Neumann told me, ‘You should call it entropy, for two reasons.
•
In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name.
•
In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.
Entropy
In Physics: thermodynamics
The arrow of time (Wiki)
•Entropy is the only quantity in the
physical sciences that seems to imply a
particular direction of progress,
sometimes called an arrow of time.
•As time progresses, the second law of
thermodynamics states that the entropy
of an isolated systems never decreases
•Hence, from this perspective, entropy
measurement is thought of as a kind of
clock
Entropy
```