### Time Series Analysis

```
School of Electrical Engineering
and Computer Science
Time Series Analysis
Topics in Machine Learning
Fall 2011
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Why Time Series Analysis?
• Sometimes the concept we want to learn is
the relationship between points in time
What is a time series?
• A sequence of measurements over time
• A sequence of random variables x1, x2, x3, …
Time Series Examples
Definition: A sequence of measurements over time
 Finance
 Social science
 Epidemiology
 Medicine
 Meteorology
 Speech
 Geophysics
 Seismology
 Robotics
Three Approaches
• Time domain approach
– Analyze dependence of current value on past values
• Frequency domain approach
– Analyze periodic sinusoidal variation
• State space models
– Represent state as collection of variable values
– Model transition between states
Sample Time Series Data
Johnson & Johnson quarterly earnings/share, 1960-1980
Sample Time Series Data
Yearly average global temperature deviations
Sample Time Series Data
Speech recording of “aaa…hhh”, 10k pps
Sample Time Series Data
NYSE daily weighted market returns
Not all time series will exhibit
strong patterns…
LA annual rainfall
…and in others the patterns will be apparent
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Definitions
• Mean
  μ_X = E(X)
• Variance
  σ²_X = E[(X − μ_X)²]
Definitions
• Covariance
  Cov(X, Y) = (1/N) Σᵢ₌₁ᴺ (xᵢ − x̄)(yᵢ − ȳ)
• Correlation
  Cor(X, Y) = r = Cov(X, Y) / (σ_X σ_Y)
Correlation
[Scatter-plot panels of Y vs. X: r = +1, r = +0.3, r = 0 (two panels), r = −0.6, r = −1]
Redefined for Time
• Mean function
  μ_X(t) = E(X_t) for t = 0, ±1, ±2, …
  Ergodic?
• Autocovariance
  γ_X(h) = Cov(X_{t+h}, X_t), where h is the lag
• Autocorrelation
  ρ_X(h) = γ_X(h) / γ_X(0) = Cor(X_{t+h}, X_t)
Autocorrelation Examples
[ACF plots against lag: one series with positive autocorrelation, one with negative]
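The autocovariance and autocorrelation defined above can be estimated directly from data; a minimal sketch in Python (NumPy assumed available, series chosen for illustration):

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation rho(h) = gamma(h) / gamma(0)."""
    x = np.asarray(x, dtype=float)
    n, mu = len(x), np.mean(x)
    gamma0 = np.sum((x - mu) ** 2) / n          # gamma(0) is the variance
    return np.array([np.sum((x[h:] - mu) * (x[:n - h] - mu)) / n / gamma0
                     for h in range(max_lag + 1)])

# A slowly oscillating series is positively correlated at small lags
t = np.arange(100)
x = np.sin(2 * np.pi * t / 25)
print(autocorrelation(x, 2))
```

By construction rho(0) = 1, and for this smooth series rho(1) is close to 1.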
Stationarity – when properties have no relationship with time
• {X_t} is stationary if
  – μ_X(t) is independent of t
  – γ_X(t+h, t) is independent of t for each h
• In other words, properties of each section are the same
• Special case: white noise
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Linear Regression
• Fit a line to the data: y = βx + α
• Ordinary least squares
  – Minimize sum of squared distances between points and line
• Try this out at
  http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html
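Ordinary least squares has a closed form for the line y = βx + α; a quick sketch on made-up data (the points are illustrative, not from the slides):

```python
import numpy as np

# Ordinary least squares fit of y = beta * x + alpha (illustrative data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form solution that minimizes the sum of squared residuals
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
print(beta, alpha)  # beta ≈ 1.96, alpha ≈ 0.14
```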
R²: Evaluating Goodness of Fit
• Least squares minimizes the combined residual
  RSS = Σᵢ (Yᵢ − Ŷᵢ)²
• Explained sum of squares is the difference between line and mean
  ESS = Σᵢ (Ŷᵢ − Ȳ)²
• Total sum of squares is the total of these two
  TSS = ESS + RSS = Σᵢ (Ŷᵢ − Ȳ)² + Σᵢ (Yᵢ − Ŷᵢ)²
R²: Evaluating Goodness of Fit
• R², the coefficient of determination
  R² = ESS / TSS = 1 − RSS / TSS
• 0 ≤ R² ≤ 1
• Regression minimizes RSS and so maximizes R²
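A short sketch of computing R² for a least-squares line, again on made-up data, confirming TSS = ESS + RSS:

```python
import numpy as np

# R^2 = ESS / TSS = 1 - RSS / TSS for a least-squares line (illustrative data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
y_hat = alpha + beta * x

rss = np.sum((y - y_hat) ** 2)         # residual sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total: TSS = ESS + RSS
r2 = 1 - rss / tss
print(r2)  # close to 1: the line fits this data well
```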
Linear Regression
• Can report:
– Direction of trend (>0, <0, 0)
– Steepness of trend (slope)
– Goodness of fit to trend (R2)
Examples
What if a linear trend does not
fit my data well?
• Could be no relationship
• Could be too much local variation
  – Want to look at longer-term trend
  – Smooth the data
• Could have periodic or seasonal effects
  X_t = a + b₁t + b₂Q₁ + b₃Q₂ + b₄Q₃ + b₅Q₄
• Could be a nonlinear relationship
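The seasonal model above can be fit by least squares with quarterly dummy variables; a sketch on synthetic data. (One adjustment: with an intercept, dummies for all four quarters are collinear, so one quarter is left out as the baseline.)

```python
import numpy as np

# Trend plus quarterly seasonal dummies, fit by least squares (synthetic data)
n = 24                                  # six years of quarterly observations
t = np.arange(n, dtype=float)
quarter = np.arange(n) % 4
x = 0.5 * t + np.array([0.0, 1.0, 2.0, 5.0])[quarter]  # trend + a Q4 bump

# Columns: intercept, trend t, dummies for Q2-Q4 (Q1 is the baseline)
D = np.column_stack([np.ones(n), t] +
                    [(quarter == q).astype(float) for q in (1, 2, 3)])
coef, *_ = np.linalg.lstsq(D, x, rcond=None)
print(coef)  # recovers the slope 0.5 and the seasonal offsets 1, 2, 5
```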
Moving Average
• Compute an average of the last m consecutive data points
• 4-point moving average is
  x̂_MA(4) = (x_t + x_{t−1} + x_{t−2} + x_{t−3}) / 4
• Smooths white noise
• Can apply higher-order MA
• Exponential smoothing
• Kernel smoothing
  m_t = Σⱼ₌₋ₖᵏ aⱼ x_{t+j}
[Plots: 53-week and 5-week moving averages]
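The m-point moving average can be written as a convolution with a uniform kernel; a minimal sketch:

```python
import numpy as np

def moving_average(x, m):
    """Trailing m-point moving average; the first m-1 points are dropped."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(m) / m, mode="valid")

x = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
print(moving_average(x, 4))  # [2.5, 3.75, 5.25]
```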
Piecewise Aggregate Approximation
• Segment the data into linear pieces
Nonlinear Trend Examples
Nonlinear Regression
Fit Known Distributions
ARIMA: Putting the pieces together
• Autoregressive model of order p: AR(p)
  x_t = φ₁x_{t−1} + φ₂x_{t−2} + … + φ_p x_{t−p} + w_t
• Moving average model of order q: MA(q)
• ARMA(p, q)
[Simulated AR(1) series, 100 points: φ = 0.9 (smoothly wandering) and φ = −0.9 (rapidly alternating)]
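The two AR(1) behaviors can be reproduced with a short simulation, with w_t drawn as standard normal noise (the seed is arbitrary):

```python
import numpy as np

def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + w_t with w_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + w[t]
    return x

smooth = simulate_ar1(0.9, 100)    # phi = 0.9: slow, wandering path
choppy = simulate_ar1(-0.9, 100)   # phi = -0.9: sign flips point to point
```

Plotting `smooth` and `choppy` reproduces the two panels: positive φ gives positive lag-1 correlation, negative φ gives negative.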
ARIMA: Putting the pieces together
• Autoregressive model of order p: AR(p)
  x_t = φ₁x_{t−1} + φ₂x_{t−2} + … + φ_p x_{t−p} + w_t
• Moving average model of order q: MA(q)
  x_t = w_t + θ₁w_{t−1} + θ₂w_{t−2} + … + θ_q w_{t−q}
• ARMA(p, q)
  – A time series is ARMA(p, q) if it is stationary and
    x_t = φ₁x_{t−1} + … + φ_p x_{t−p} + w_t + θ₁w_{t−1} + … + θ_q w_{t−q}
ARIMA (AutoRegressive Integrated Moving Average)
• ARMA only applies to stationary processes
• Apply differencing to obtain stationarity
  – Replace each value by its incremental change from the last value

  Original:           x₁, x₂, x₃, x₄
  Differenced once:   x₂−x₁, x₃−x₂, x₄−x₃
  Differenced twice:  x₃−2x₂+x₁, x₄−2x₃+x₂

• A process x_t is ARIMA(p, d, q) if
  – AR(p)
  – MA(q)
  – Differenced d times
• Also known as Box–Jenkins
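Differencing is a one-liner in NumPy; note how two rounds of differencing flatten a quadratic trend:

```python
import numpy as np

# Differencing: replace each value by its change from the previous value
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])  # quadratic trend t^2
d1 = np.diff(x)        # once:  [3, 5, 7, 9]  (still trending)
d2 = np.diff(x, n=2)   # twice: [2, 2, 2]     (constant, stationary-looking)
print(d1, d2)
```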
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
Express Data as Fourier Frequencies
• Time domain
– Express present as function of the past
• Frequency domain
– Express present as function of oscillations, or
sinusoids
Time Series Definitions
• Frequency ω, measured in cycles per time point
• J&J data
  – 1 cycle each year
  – 4 data points (time points) each cycle
  – ω = 0.25 cycles per data point
• Period of a time series, T = 1/ω
  – J&J: T = 1/0.25 = 4
  – 4 data points per cycle
  – Note: need at least 2 data points per cycle
Fourier Series
• A time series is a mixture of oscillations
  – Can describe each by amplitude, frequency, and phase
  – Can also describe as a sum of amplitudes at all time points
    (or magnitudes at all frequencies)
    x_t = α cos(2πωt) + β sin(2πωt)
  – If we allow for mixtures of periodic series then
    x_t = Σᵢ₌₁q [αᵢ cos(2πωᵢt) + βᵢ sin(2πωᵢt)]
Example
x_{t1} = 2 cos(2πt · 6/100) + 3 sin(2πt · 6/100)
x_{t2} = 4 cos(2πt · 10/100) + 5 sin(2πt · 10/100)
x_{t3} = 6 cos(2πt · 40/100) + 7 sin(2πt · 40/100)
x_{t4} = x_{t1} + x_{t2} + x_{t3}
How Compute Parameters?
x_t = Σⱼ₌₁ⁿᐟ² [α(j/n) cos(2πtj/n) + β(j/n) sin(2πtj/n)]
• Regression
• Discrete Fourier Transform
  d(ωⱼ) = d(j/n) = n^(−1/2) Σₜ₌₁ⁿ x_t e^(−i2πtj/n)
• DFTs represent amplitude and phase of series components
• Can use redundancies to speed it up (FFT)
Breaking down a DFT
• Amplitude
  A(j/n) = |d(j/n)| = √( Re(d(j/n))² + Im(d(j/n))² )
• Phase
  φ(j/n) = tan⁻¹( Im(d(j/n)) / Re(d(j/n)) )
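Amplitude and phase can be read off NumPy's FFT; a sketch on a pure cosine. (Note NumPy's FFT is unnormalized and sums from t = 0 rather than t = 1, so the result is rescaled by n^(−1/2) to match the definition above; the amplitude is unaffected by the index shift.)

```python
import numpy as np

# Cosine with period 4 sampled at n = 100 points
n = 100
t = np.arange(n)
x = np.cos(2 * np.pi * t / 4)

d = np.fft.fft(x) / np.sqrt(n)   # rescale to n^(-1/2) * sum
j = 25                           # frequency j/n = 0.25 cycles per point
amplitude = np.abs(d[j])
phase = np.arctan2(d[j].imag, d[j].real)
print(amplitude, phase)          # amplitude 5.0, phase ~0 for a pure cosine
```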
Example
[Plots: reconstruction of the GBP series using 1, 2, 3, 5, 10, and 20 frequencies]
Periodogram
• Measure of squared correlation between
  – data and
  – sinusoids oscillating at frequency j/n
  P(j/n) = ( (2/n) Σₜ₌₁ⁿ x_t cos(2πtj/n) )² + ( (2/n) Σₜ₌₁ⁿ x_t sin(2πtj/n) )²
• Compute quickly using FFT
Example
P(6/100) = 13, P(10/100) = 41, P(40/100) = 85
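These values can be checked directly from the periodogram definition, reconstructing the three-component series from the earlier example (P picks out α² + β² at each component frequency):

```python
import numpy as np

def periodogram(x, j):
    """Scaled periodogram P(j/n) from the definition above."""
    n = len(x)
    t = np.arange(1, n + 1)
    c = (2 / n) * np.sum(x * np.cos(2 * np.pi * t * j / n))
    s = (2 / n) * np.sum(x * np.sin(2 * np.pi * t * j / n))
    return c ** 2 + s ** 2

n = 100
t = np.arange(1, n + 1)
x = (2 * np.cos(2 * np.pi * t * 6 / n) + 3 * np.sin(2 * np.pi * t * 6 / n)
     + 4 * np.cos(2 * np.pi * t * 10 / n) + 5 * np.sin(2 * np.pi * t * 10 / n)
     + 6 * np.cos(2 * np.pi * t * 40 / n) + 7 * np.sin(2 * np.pi * t * 40 / n))
print(periodogram(x, 6), periodogram(x, 10), periodogram(x, 40))
# 2²+3² = 13, 4²+5² = 41, 6²+7² = 85
```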
Wavelets
• Can break series up into segments
– Called wavelets
– Analyze a window of time separately
– Variable-sized windows
Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space
State Space Models
• Current situation represented as a state
  – Estimate state variables from noisy observations over time
  – Estimate transitions between states
• Kalman filters
  – Similar to HMMs
  – HMMs model discrete variables
  – Kalman filters model continuous variables
Conceptual Overview
• Lost on a 1-dimensional line
  – Position estimate possibly incorrect
• Position x(t), velocity x′(t)
Conceptual Overview
[Plot: Gaussian belief over position along the 0–100 line]
• Current location distribution is Gaussian
• Transition model is linear Gaussian
• Noisy information
• The sensor model is linear Gaussian
• Sextant measurement at tᵢ: mean = μᵢ and variance = σᵢ²
• Measured velocity at tᵢ: mean = μ′ᵢ and variance = σ′ᵢ²
Kalman Filter Algorithm
• Predict next location
– Use current location
– Use transition function (linear Gaussian)
– Result is Gaussian
• Get next sensor measurement (Gaussian)
• Correct prediction
– Weighted mean of previous prediction and
measurement
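The predict/correct loop above can be sketched for a one-dimensional state; the transition, noise variances q and r, and the measurements here are made up for illustration:

```python
# 1-D Kalman filter: scalar position, random-walk transition (illustrative)
def kalman_1d(measurements, q=1.0, r=4.0, mu=0.0, var=1000.0):
    estimates = []
    for z in measurements:
        # Predict: linear Gaussian transition adds noise variance q
        var = var + q
        # Correct: weighted mean of prediction and measurement
        k = var / (var + r)          # Kalman gain (weight on the measurement)
        mu = mu + k * (z - mu)
        var = (1 - k) * var          # corrected variance is smaller than both
        estimates.append(mu)
    return estimates

print(kalman_1d([5.1, 4.9, 5.2, 5.0]))  # estimates settle near 5
```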
Conceptual Overview
[Plot: prediction and measurement Gaussians along the 0–100 line]
• We generate the prediction for time i+1; the prediction is Gaussian
• GPS measurement: mean = μᵢ₊₁ and variance = σ²ᵢ₊₁
• They do not match
Conceptual Overview
[Plot: prediction, measurement at i+1, and corrected estimate along the 0–100 line]
• Corrected mean is the new optimal estimate of position
• New variance is smaller than either of the previous two variances
Updating Gaussian Distributions
• One-step predicted distribution is Gaussian
  P(X_{t+1} | e_{1:t}) = ∫ P(X_{t+1} | x_t) P(x_t | e_{1:t}) dx_t
                             (transition)    (prior)
• After new (linear Gaussian) evidence, updated distribution is Gaussian
  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
                               (new measurement)   (previous step)
Why Is Kalman Great?
• The method, that is…
• For general continuous variables, the representation of a state-based series grows without bound
• With linear Gaussian models the posterior stays Gaussian, so the update stays compact
Why Is Time Series Important?
• Time is an important component of many
processes
• Do not ignore time in learning problems
• ML can benefit from, and in turn benefit,
these techniques
– Dimensionality reduction of series
– Rule discovery
– Cluster series
– Classify series
– Forecast data points
– Anomaly detection
```