### slides

Geophysical Inverse Problems
with a focus on seismic tomography
CIDER 2012, KITP, Santa Barbara
Seismic travel time tomography
Principles of travel time tomography
1) In the background “reference” model, the travel time T along a ray γ is:
$$T = \int_\gamma \frac{1}{v_0(s)}\,ds = \int_\gamma u_0(s)\,ds$$
$v_0(s)$ is the velocity at point s on the ray
$u = 1/v$ is the “slowness”
The ray path γ is determined by the velocity structure using Snell’s law (ray theory).
2) Suppose the slowness u is perturbed by an amount δu small enough that the ray path γ is not changed.
The travel time is changed by:
$$\delta T = \int_\gamma \delta u\,ds = -\int_\gamma \frac{1}{v_0^2}\,\delta v\,ds = -\int_\gamma \frac{1}{v_0}\,\frac{\delta v}{v_0}\,ds$$
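The middle equality relies on a first-order expansion of the slowness about the reference model. A short sketch of that step, assuming $|\delta v| \ll v_0$:

```latex
u_0 + \delta u = \frac{1}{v_0 + \delta v}
               = \frac{1}{v_0}\left(1 + \frac{\delta v}{v_0}\right)^{-1}
               \approx \frac{1}{v_0} - \frac{\delta v}{v_0^{2}}
\quad\Longrightarrow\quad
\delta u \approx -\frac{\delta v}{v_0^{2}}
```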
$$\delta T_i = -\int_{\gamma_i} \frac{1}{v_0(s)}\,\frac{\delta v}{v_0}(s)\,ds = \sum_{j=1}^{M} G_{ij}\left(\frac{\delta v}{v_0}\right)_j$$
where:
$$G_{ij} = -\frac{l_{ij}}{v_{0j}}$$
$l_{ij}$ is the distance travelled by ray i in block j
$v_{0j}$ is the reference velocity (“starting model”) in block j
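As an illustration of how G is assembled from path lengths, here is a minimal NumPy sketch. The ray geometry, the velocities and the perturbation values are made up for the example and are not taken from any real dataset.

```python
import numpy as np

# Hypothetical toy geometry: 3 rays crossing 4 blocks.
# lengths[i, j] = distance (km) travelled by ray i in block j (l_ij).
lengths = np.array([
    [10.0, 10.0,  0.0,  0.0],
    [ 0.0, 10.0, 10.0,  0.0],
    [14.0,  0.0,  0.0, 14.0],
])
v0 = np.array([8.0, 8.0, 8.5, 8.5])   # reference velocity per block (km/s)

# G_ij = -l_ij / v0_j (v0 broadcasts across the rows)
G = -lengths / v0

# Assumed fractional velocity perturbations (dv/v0)_j
dv_over_v0 = np.array([0.01, 0.0, -0.02, 0.0])

# Linearised travel-time perturbations dT_i = sum_j G_ij (dv/v0)_j, in seconds
dT = G @ dv_over_v0

# Comparison: recompute the travel times in the perturbed model directly
v = v0 * (1.0 + dv_over_v0)
dT_exact = (lengths / v).sum(axis=1) - (lengths / v0).sum(axis=1)
```

For perturbations of a percent or two, `dT` and `dT_exact` should agree to well under a millisecond in this toy case, which is the sense in which the problem is linear.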
Solving the problem “Given a set of travel time perturbations $\delta T_i$ on an ensemble of rays {i = 1…N}, determine the perturbations $(\delta v/v_0)_j$ in a 3D model parametrized in blocks {j = 1…M}” is solving an inverse problem of the form:
$$\delta d = G\,\delta m$$
or
$$\delta d_i = \sum_{j=1}^{M} G_{ij}\,\delta m_j, \quad i = 1,\dots,N$$
δd = data vector = travel time perturbations δT
δm = model vector = perturbations in velocity
G has dimensions N × M (one row per ray, one column per block)
Usually N (number of rays) > M (number of blocks): “overdetermined system”
We write:
$$G^T\,\delta d = G^T G\,\delta m$$
$G^T G$ is a square matrix of dimensions M × M
If it is invertible, we can write the solution as:
$$\delta\hat{m} = (G^T G)^{-1} G^T\,\delta d$$
where $(G^T G)^{-1}$ is the inverse of $G^T G$, in the sense that $(G^T G)^{-1}(G^T G) = I$, with I the identity matrix
“Least squares solution”: equivalent to minimizing $\|\delta d - G\,\delta m\|^2$
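A minimal numerical sketch of the least squares solution, using a random stand-in for G (not a real ray geometry) and noise-free synthetic data. The normal-equation solve is compared with NumPy's SVD-based `np.linalg.lstsq`, which minimizes the same misfit.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 4                                # N rays, M blocks, N > M

# Stand-in for G_ij = -l_ij / v0_j (all entries negative, dense for simplicity)
G = -rng.uniform(0.5, 2.0, size=(N, M))

m_true = np.array([0.01, -0.02, 0.0, 0.015])   # assumed (dv/v0)_j
d = G @ m_true                                  # noise-free synthetic data

# Normal equations: (G^T G) m = G^T d
m_normal = np.linalg.solve(G.T @ G, G.T @ d)

# Same minimiser of ||d - G m||^2 without forming G^T G explicitly
m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)
```

Solving with `np.linalg.solve` rather than forming the explicit inverse is a design choice for numerical stability; both routes should recover `m_true` here because the toy G has full column rank.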
- G contains assumptions/choices:
- Theory of wave propagation (ray theory)
- Parametrization (i.e. blocks of some size)
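In practice, things are more complicated because $G^T G$ is, in general, singular:
$$\delta d_i = \sum_{j=1}^{M} G_{ij}\,\delta m_j, \quad i = 1,\dots,N$$
Some $G_{ij}$ are null ($l_{ij} = 0$: ray i does not cross block j). A block crossed by no ray gives a zero column in G, so $G^T G$ has no inverse (formally, infinite elements in the inverse matrix)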
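The singular case can be reproduced with a toy geometry in which one block is never sampled. The path lengths and the uniform reference velocity below are invented for the illustration.

```python
import numpy as np

# Hypothetical path lengths (km): 4 rays, 3 blocks; no ray crosses block 3
lengths = np.array([
    [10.0, 10.0, 0.0],
    [ 5.0, 15.0, 0.0],
    [12.0,  0.0, 0.0],
    [ 0.0,  8.0, 0.0],
])
v0 = 8.0                      # assumed uniform reference velocity (km/s)
G = -lengths / v0

GtG = G.T @ G
rank = np.linalg.matrix_rank(GtG)   # 2, although GtG is 3 x 3
det = np.linalg.det(GtG)            # zero: GtG is not invertible

# Two models that differ only in the unsampled block predict the same data
m_a = np.array([0.01, -0.02, 0.0])
m_b = np.array([0.01, -0.02, 0.5])
same_data = np.allclose(G @ m_a, G @ m_b)
```

The last check makes the non-uniqueness explicit: the data carry no information about the third block, so some extra criterion is needed to pick a solution.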
How to choose a solution?
• Special solution that maximizes or minimizes some desirable property through a norm
• For example:
– Model with the smallest size (norm): $\|m\|_2 = (m_1^2 + m_2^2 + m_3^2 + \dots + m_M^2)^{1/2}$, so that $m^T m = \|m\|_2^2$
– Closest possible solution to a preconceived model ⟨m⟩: minimize $\|m - \langle m\rangle\|^2$
→ regularization
• Minimize some combination of the misfit and the solution size:
$$\Phi(m) = e^T e + \varepsilon^2\, m^T m, \qquad e = d - Gm$$
• Then the solution is the “damped least squares solution”:
$$\hat{m} = \left(G^T G + \varepsilon^2 I\right)^{-1} G^T d$$
Tikhonov regularization
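A minimal sketch of the damped least squares solution, applied to a toy geometry with one unsampled block. The function name `damped_least_squares`, the argument name `eps` for the damping parameter, and all numerical values are assumptions made for the example.

```python
import numpy as np

def damped_least_squares(G, d, eps):
    """Solve (G^T G + eps^2 I) m = G^T d for m."""
    M = G.shape[1]
    return np.linalg.solve(G.T @ G + eps**2 * np.eye(M), G.T @ d)

# Hypothetical path lengths (km): no ray crosses the third block
lengths = np.array([
    [10.0, 10.0, 0.0],
    [ 5.0, 15.0, 0.0],
    [12.0,  0.0, 0.0],
    [ 0.0,  8.0, 0.0],
])
G = -lengths / 8.0                       # assumed v0 = 8 km/s everywhere

m_true = np.array([0.01, -0.02, 0.03])
d = G @ m_true

# G^T G alone is singular here; the damping term makes the system solvable
m_hat = damped_least_squares(G, d, eps=1e-3)
```

With light damping the sampled blocks are recovered almost exactly, while the unsampled block is set to zero: the damping picks the smallest model among those that fit the data.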
• We can choose to minimize the model size,
– e.g. $\|m\|^2 = m^T m$: “norm damping”
• Generalize to other norms.
– Example: minimize roughness, i.e. the difference between adjacent model parameters.
– Consider $\|Dm\|^2$ instead of $\|m\|^2$ and minimize:
$$(Dm)^T (Dm) = m^T D^T D\, m = m^T W_m m$$
with
$$Dm = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_M \end{bmatrix}$$
– More generally, minimize:
$$(m - \langle m\rangle)^T W_m (m - \langle m\rangle)$$
⟨m⟩: reference model
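The roughness operator can be sketched as follows for a 1D chain of model parameters. The helper `first_difference` is a hypothetical name, and the (M-1) × M shape is one common convention, assumed here.

```python
import numpy as np

def first_difference(M):
    """(M-1) x M matrix whose rows hold (-1, 1) on adjacent parameters."""
    D = np.zeros((M - 1, M))
    idx = np.arange(M - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

D = first_difference(5)
W_m = D.T @ D                     # roughness weighting matrix

m_smooth = np.ones(5)                               # constant model
m_rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0])     # alternating model

rough_smooth = m_smooth @ W_m @ m_smooth   # no differences between neighbours
rough_rough = m_rough @ W_m @ m_rough      # four jumps of size 2
```

A constant model has zero roughness, so this penalty leaves the mean of the model unconstrained; the alternating model is penalised heavily.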
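Weighted damped least squares
• More generally, the solution has the form:
$$m^{est} = \langle m\rangle + \left[G^T W_e G + \varepsilon^2 W_m\right]^{-1} G^T W_e \left[d - G\langle m\rangle\right]$$
or, equivalently:
$$m^{est} = \langle m\rangle + W_m^{-1} G^T \left[G W_m^{-1} G^T + \varepsilon^2 W_e^{-1}\right]^{-1} \left[d - G\langle m\rangle\right]$$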
For a more rigorous and complete treatment (incl. non-linear problems), see Tarantola (1985), Inverse Problem Theory, and Tarantola and Valette (1982)
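The equivalence of the two forms of the weighted damped least squares solution can be checked numerically. The sketch below uses random matrices and symmetric positive definite weights chosen only for the test; `eps` stands for the damping parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 4
G = rng.normal(size=(N, M))
d = rng.normal(size=N)
m_ref = rng.normal(size=M)          # reference model <m>
eps = 0.5                           # damping parameter

# Assumed weights: diagonal data weights, dense SPD model weights
W_e = np.diag(rng.uniform(0.5, 2.0, size=N))
A = rng.normal(size=(M, M))
W_m = A.T @ A + np.eye(M)

r = d - G @ m_ref                   # residual of the reference model

# Form 1: solve an M x M system
m1 = m_ref + np.linalg.solve(G.T @ W_e @ G + eps**2 * W_m, G.T @ W_e @ r)

# Form 2: solve an N x N system
W_m_inv = np.linalg.inv(W_m)
W_e_inv = np.linalg.inv(W_e)
m2 = m_ref + W_m_inv @ G.T @ np.linalg.solve(
    G @ W_m_inv @ G.T + eps**2 * W_e_inv, r
)
```

Which form is cheaper depends on whether the number of data N or the number of model parameters M is smaller, since that sets the size of the system to solve.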
Concept of ‘Generalized Inverse’
• The generalized inverse ($G^{-g}$) is the matrix in the linear inverse problem that multiplies the data to provide an estimate of the model parameters:
$$\hat{m} = G^{-g} d$$
– For least squares: $G^{-g} = (G^T G)^{-1} G^T$
– For damped least squares: $G^{-g} = (G^T G + \varepsilon^2 I)^{-1} G^T$
– Note: generally $G^{-g} \neq G^{-1}$
• How to choose the damping parameter so that the model is not overly biased? “L curve” analysis.
[Figure: “L curve” with axes $\|m - \langle m\rangle\|^2$ and $\|Gm - d\|^2$; label η on the plot]
• As you increase the damping parameter, more priority is given to the model-norm part of the functional.
– Increases prediction error
– Decreases model structure
– Model will be biased toward smooth solution
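The trade-off traced by the L curve can be sketched by sweeping the damping parameter and recording both terms of the functional. The problem below is synthetic (random G, Gaussian noise, zero reference model), and the range of `eps_values` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 30, 10
G = rng.normal(size=(N, M))
m_true = rng.normal(size=M)
d = G @ m_true + 0.1 * rng.normal(size=N)    # noisy synthetic data

eps_values = np.logspace(-2, 2, 9)           # damping parameters to try
misfit = []
model_norm = []
for eps in eps_values:
    m_hat = np.linalg.solve(G.T @ G + eps**2 * np.eye(M), G.T @ d)
    misfit.append(np.sum((G @ m_hat - d) ** 2))   # ||G m - d||^2
    model_norm.append(np.sum(m_hat ** 2))         # ||m||^2, with <m> = 0

misfit = np.array(misfit)
model_norm = np.array(model_norm)
```

Plotting `model_norm` against `misfit` gives the characteristic L shape; a damping value near the corner is a common compromise between fitting the data and keeping the model small.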
Model Resolution Matrix
• How accurately is the value of an inversion parameter recovered?
• How small an object can be imaged?
• Model resolution matrix R:
$$\hat{m} = G^{-g} d_{obs} = G^{-g} G\, m_{true} = R\, m_{true}$$
– R can be thought of as a spatial filter that is applied to the true model to produce the estimated values.
• Often just the main diagonal is analyzed to determine how spatial resolution changes with position in the image.
• Off-diagonal elements provide the ‘filter functions’ for every parameter.
Masters, CIDER 2010
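A minimal sketch of the resolution matrix for a damped least squares inversion, using an invented three-block geometry in which the last block is not sampled. The damping value is arbitrary.

```python
import numpy as np

# Hypothetical path lengths (km): no ray crosses the third block
lengths = np.array([
    [10.0, 10.0, 0.0],
    [ 5.0, 15.0, 0.0],
    [12.0,  0.0, 0.0],
    [ 0.0,  8.0, 0.0],
])
G = -lengths / 8.0                  # assumed v0 = 8 km/s everywhere
M = G.shape[1]
eps = 0.5                           # damping parameter (arbitrary)

# Generalized inverse for damped least squares: (G^T G + eps^2 I)^{-1} G^T
G_g = np.linalg.solve(G.T @ G + eps**2 * np.eye(M), G.T)

# Model resolution matrix
R = G_g @ G

# The estimate from noise-free data is the true model filtered by R
m_true = np.array([0.01, -0.02, 0.03])
m_hat = G_g @ (G @ m_true)
```

In this toy case the diagonal of R is close to 1 for the two sampled blocks and exactly 0 for the unsampled one, which is the kind of position-dependent resolution the diagonal is used to map.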
Checkerboard test
[Figure: checkerboard test, labelled 80%. After Masters, CIDER 2010]
$$\hat{m} = R\, m_{true}, \qquad R = G^{-g} G$$
R contains theoretical assumptions on wave propagation and parametrization, and assumes the problem is linear
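A checkerboard test can be sketched on a small grid as follows. The ray coverage is random and purely hypothetical (real rays cross contiguous blocks), and the recovery measure `recovery` is one simple choice among many.

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny = 4, 4
M = nx * ny          # number of blocks
N = 60               # number of rays

# Checkerboard of +/- 2 % velocity perturbations
ix, iy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
m_true = 0.02 * np.where((ix + iy) % 2 == 0, 1.0, -1.0).ravel()

# Hypothetical coverage: each ray crosses a random ~40 % of the blocks
crossed = rng.random((N, M)) < 0.4
lengths = crossed * rng.uniform(5.0, 15.0, size=(N, M))   # km
G = -lengths / 8.0                                        # assumed v0 = 8 km/s

# Synthetic data from the checkerboard, then damped least squares inversion
d = G @ m_true
eps = 0.1
m_hat = np.linalg.solve(G.T @ G + eps**2 * np.eye(M), G.T @ d)

# Fraction of the input pattern recovered (1.0 would be perfect)
recovery = (m_hat @ m_true) / (m_true @ m_true)
```

Because the same G is used to generate and to invert the synthetic data, the test only probes coverage and regularization; it says nothing about errors in the wave propagation theory, as the slide notes.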
Ingredients of an inversion
• Importance of sampling/coverage
– mixture of data types
• Parametrization
– Physical (Vs, Vp, ρ, anisotropy, attenuation)
– Geometry (local versus global functions,
size of blocks)
• Theory of wave propagation
– e.g. for travel times: banana-donut
kernels/ray theory
[Figure: seismogram (time scale: 50 min) with P, PP, S, SS and surface wave arrivals labelled]
Arrivals well separated on the seismogram, suitable for travel time measurements
Generally:
- Ray theory
- Iterative back projection techniques
- Parametrization in blocks
P velocity tomography
Slabs……
Van der Hilst et al., 1998
...and plumes
Montelli et al., 2004
P travel time tomography: ray density maps (Vasco and Johnson, 1998)
Checkerboard tests (Karason and van der Hilst, 2000)
[Figure: numbered cross-sections through Honshu, northern Bonin and Tonga; colour scale ±1.5 %; depth marks at 410, 660 and 1000 km. Fukao and Obayashi, 2011]
[Figure: Tonga section (Fukao and Obayashi, 2011; depth marks at 400, 660 and 1000 km) shown with S40RTS (Ritsema et al., 2011) and PRI-S05 (Montelli et al., 2005); labels: Tonga, South Pacific superswell, EPR]
Rayleigh wave
overtones
By including overtones, we can
see into the transition zone and
the top of the lower mantle.
after Ritsema et al, 2004
Models from different data subsets
[Figure: maps at 120 km, 600 km, 1600 km and 2800 km depth. After Ritsema et al., 2004]
The travel time dataset in this model includes:
Sdiff
ScS2
Multiple ScS: ScSn
Coverage of S and P
After Masters, CIDER 2010
[Figure labels: P, S, SS, surface waves]
Full Waveform Tomography
• Long period (30 s to 400 s) 3-component seismic waveforms
• Subdivided into wavepackets and compared in the time domain to synthetics.
• u(x,t) = G(m) → δu = A δm
A = ∂u/∂m contains the Fréchet derivatives of G
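To make the linearization δu = A δm concrete, here is a toy sketch in which A is approximated by central finite differences. The forward operator `forward` (a sinusoid with amplitude and frequency as the two model parameters) is a made-up stand-in, not the mode-based or SEM synthetics used in actual waveform tomography, where the Fréchet derivatives come from theory.

```python
import numpy as np

def forward(m, t):
    """Hypothetical toy forward operator u(t; m) = amp * sin(freq * t)."""
    amp, freq = m
    return amp * np.sin(freq * t)

def frechet_fd(m, t, h=1e-6):
    """Approximate A = du/dm column by column with central differences."""
    A = np.empty((t.size, m.size))
    for j in range(m.size):
        dm = np.zeros_like(m)
        dm[j] = h
        A[:, j] = (forward(m + dm, t) - forward(m - dm, t)) / (2.0 * h)
    return A

t = np.linspace(0.0, 10.0, 200)
m0 = np.array([1.0, 2.0])            # reference model (amp, freq)
A = frechet_fd(m0, t)

dm = np.array([0.01, -0.005])        # small model perturbation
du_linear = A @ dm                   # linearised waveform perturbation
du_exact = forward(m0 + dm, t) - forward(m0, t)
max_error = np.max(np.abs(du_linear - du_exact))
```

The linearised and exact perturbations agree closely only while the model perturbation is small; for larger `dm` the mismatch grows, which is why waveform inversions are iterated.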
UC Berkeley
[Figure: SS and Sdiff, PAVA compared with NACT. Li and Romanowicz, 1995]
[Figure: PAVA and NACT at 2800 km depth, from Kustowski, 2006]
Waveforms only, T > 32 s! 20,000 wavepackets (NACT)
[Figure: Indian Ocean paths, Sdiffracted; corner frequencies 2 s, 5 s and 18 s. To et al., EPSL, 2005]
Full Waveform Tomography using SEM:
[Figure: data compared with synthetics]
Replace mode synthetics by numerical synthetics computed using the Spectral Element Method (SEM)
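[Figure: SEMum (Lekic and Romanowicz, 2011) compared with S20RTS (Ritsema et al., 2004); colour-scale limits at each depth]

| Depth | SEMum | S20RTS |
|---|---|---|
| 70 km | -12% to +8% | -7% to +6% |
| 125 km | -7% to +9% | -6% to +8% |
| 180 km | -6% to +8% | -4% to +6% |
| 250 km | -5% to +5% | -3.5% to +3% |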
French et al., 2012, in prep. Courtesy of Scott French
[Figure: Tonga section (Fukao and Obayashi, 2011) shown with SEMum2 (French et al., 2012) and S40RTS (Ritsema et al., 2011); labels: Tonga, Samoa, South Pacific superswell, Macdonald, Easter Island, EPR]
Summary: what’s important in global mantle tomography
• Sampling: improved by inclusion of different types of data: surface waves, overtones, body waves, diffracted waves…
• Theory: to better constrain the amplitudes of lateral variations as well as smaller scale features (especially in low velocity regions)
• Physical parametrization: effects of anisotropy!!
• Geographical parametrization: local/global basis functions
• Error estimation