Jeremy Knopp (AFRL)

Report
Anomalous Events in Non-Destructive
Inspection Data
18 Dec 2012
Jeremy S. Knopp
AFRL/RXCA
Air Force Research Laboratory
Integrity - Service - Excellence
1
Disclaimer
• The views expressed in this presentation are those of
the author and do not reflect the official policy or
position of the United States Air Force, Department of
Defense, or the United States Government
2
Outline
• Historical Perspective of Aircraft Structural Integrity
Program (ASIP)
• Probability of Detection (POD)
• Nondestructive Evaluation System Reliability
Assessment Handbook (MIL-HDBK-1823A) Revision
• Research Objectives to Improve State-of-the-Art POD
Evaluation
3
Aircraft Management Strategies
Safe Life – No Periodic Inspection Required.
– Fly a certain number of hours and retire.
– Considers the effects of cyclic loading on the airframe with full-scale fatigue test.
– For example, testing to 40,000 hours ensures safe life of 10,000 hours.
• Used by US Navy.
Damage Tolerance Assessment (DTA) – Periodic Inspection to Detect Damage
– Fly and inspect, reassess time to next inspection based on fatigue crack growth
analysis, usage, and results of inspection.
• Assumes imperfections are present in the early stages of aircraft service.
• REQUIRES RELIABLE AND VALIDATED NDI
• Used by US Air Force.
Condition-based Maintenance (CBM) – Periodic Inspection and/ or onboard monitoring to
Characterize Damage.
– Perform repairs only when needed.
• Will minimize maintenance costs.
• Requires damage characterization, not just detection.
• Desired by US Air Force to maximize availability of assets while minimizing
sustainment costs.
Condition-based Maintenance Plus (CBM+) – Periodic Inspection to Characterize Damage
– CBM plus prognosis to estimate capability and remaining life for optimal maintenance scheduling.
4
The USAF Aircraft Structural Integrity
Program (ASIP)
• Provides the engineering discipline and management
framework …
– associated with establishing and maintaining structural safety …
– in the most cost-effective manner …
– through a set of defined inspections, repairs, modifications and
retirement actions
• Based on a preventative maintenance strategy that starts
in acquisition and continues until retirement
5
“Wright” approach to Structural Integrity
• Approach used by Wright brothers
began in 1903.
• Essentially the same approach used
by USAF for over 50 years.
• They performed stress analysis and
conducted static tests far in excess
of the loads expected in flight.
• Safety factor applied to forces that
maintained static equilibrium with
weight.
6
B-47 Experience, 1958
• Air Force Strategic Air Command lost
two B-47 Bombers on the same day!
• Metal fatigue caused the wings on two
aircraft to fail catastrophically in flight.
• Standard static test and abbreviated
flight load survey proved structure
would support at least 150% of its
design limit load.
• No assurance that structure would
survive smaller cyclic loads in actual
flight.
7
ASIP Initiated
• Aircraft Structural Integrity Program (ASIP) initiated on
12 Jun 1958 with 3 primary objectives:
– Control structural fatigue in aircraft fleet.
– Develop methods to accurately predict service life.
– Establish design and testing methods to avoid structural
problems in future aircraft systems.
• Led to the “safe-life” approach.
– Probabilistic approach to establishing the aircraft service life
capability.
– Safe-life established by conducting a full-scale airframe fatigue
test and dividing the number of successfully test simulated flight
hours by a scatter factor (usually 4).
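The safe-life arithmetic described above can be sketched in a few lines. This is only an illustration of the division by a scatter factor; the function name and the default factor of 4 are assumptions taken from the "usually 4" remark, not an official ASIP procedure.

```python
def safe_life_hours(test_hours: float, scatter_factor: float = 4.0) -> float:
    """Safe life = successfully tested simulated flight hours / scatter factor."""
    if scatter_factor <= 0:
        raise ValueError("scatter_factor must be positive")
    return test_hours / scatter_factor


# Example from the deck: a 40,000-hour fatigue test with a scatter factor of 4.
print(safe_life_hours(40_000))  # 10000.0
```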
8
F-111 Experience, 1969
• Wing separation at ~100 hours (safe-life qualified 4000
hours). Crack initiated from a manufacturing defect.
• Two-phase program initiated.
• Phase 1 (allow operations at 80% of
designed capability)
– Material crack growth data collected to
develop flaw growth model.
– Cold proof test to demonstrate that
critical size flaws not present in critical
forgings
– Improved NDI for use in reinspection
• Phase 2 (allow operations at 100% of
designed capability)
– Incorporated NDI during production.
– Used fracture mechanics to determine
inspection intervals.
9
Damage Tolerance Update, 1974
• In response to F-111 mishap, ASIP incorporated Damage
Tolerance requirements.
– Objective was to prevent airframe failures resulting from the safe-life approach.
• ASIP provides 3 options to satisfy damage tolerance
requirement
– Slow crack growth (most common option)
– Fail-safe multiple load path
– Fail-safe crack-arrest
• Primary basis for aircraft structure maintenance program
for last 30+ years.
– Inspection requirements based on initial flaw assumptions (slow
crack growth) and NDI capability.
• Today - Inspection burden is increasing due to age of fleet!
– NDE Research needed to reduce the future maintenance burden.
10
Evolution of Structural Integrity
Approaches
[Figure: Timeframe Associated with ASIP Approach, 1950 to 2020. ASIP approaches in sequence: Prevent Static Load Failures; Prevent Fatigue Failures; Protect for Potential Damage; Risk Assessment/Management (MIL-STD-1530C). Overall aim: Prevent Structural Failures Cost-Effectively.]
Each change was made to enhance our ability to protect structural
integrity (prevent structural failures)
Today, preventing structural failures requires anticipating events that
ensure continuing airworthiness, reliability, availability, and cost-effectiveness
11
USAF Structural Reliability
• USAF aircraft losses since 1971:
– 18 due to a structural failure
– 19 due to a structural failure that was caused by maintenance,
pilot error, flight control failures, etc.
• Next chart plots overall USAF aircraft loss rate from
1947 – 2002 and structures contribution since 1971
– Overall loss rate calculated for each year (total losses per year /
total fleet flight hours per year)
– Loss rate due to structures is cumulative, since many years had no
losses due to structural failure
12
USAF Structural Reliability
[Figure: USAF Aircraft Loss Rate (Destroyed Aircraft)¹. Vertical axis: Number of Aircraft Losses / Flight Hours (log scale, 1.E-08 to 1.E-03). Horizontal axis: year, 1940 to 2010. Curves: All Causes; Structures = 37; Structures = 18.]
1 C. Babish, “USAF ASIP: Protecting Safety for 50 Years”, Aircraft Structural Integrity Program Conference (2008)
13
Rare Events
• Nov 2, 2007 – Loss of F-15C
airplane, 0 casualties
• Aircraft operated within limits
• Mishap occurred due to a
fatigue failure in a forward
fuselage single-load-path.
• Hot spot missed during design
and testing and aggravated by
rogue flaw.
• NDI can be used to prevent
fracture at this hot spot.
14
Reliability of NDT
• Probability of Detection1
• Given a population of cracks of size ‘a’
– geometry, material, orientation, location, …
• Given a defined inspection system
• POD(a) = Probability that selected cracks of size ‘a’
from the population will be detected
– POD(a) = Proportion of all size ‘a’ cracks from the population
that would be detected
1 A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17 Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.
15
Reliability of NDT
• POD curve
• Two parameters
– (μ and σ)
• σ describes slope of the curve. Steep curve is ideal.
• μ is a50
[Figure: POD curve. Vertical axis: POD, 0 to 1. Horizontal axis: flaw size (mm), 0 to 5. Marked points: a50, a90, and a90/95 (aNDE).]
16
Inspection Intervals
ASIP Damage Tolerance Inspection Intervals
Inspections occur at 1/2 the time associated with the time it takes for a crack to grow from initial size to failure, e.g., T2 = 0.5*(T3 - T1)
[Figure: crack growth curve. Vertical axis: Crack size - a, with levels a0, aNDE, acr-miss, and aCR. Horizontal axis: Equivalent (standard spectrum) or Flight hours, with times T1, T2, T3, and Tf.]
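The half-interval rule quoted on this slide, T2 = 0.5*(T3 - T1), is simple enough to check by hand, but a small sketch makes the bookkeeping explicit. The function name and the flight-hour values below are hypothetical and chosen only for illustration; `t1` and `t3` stand for the times labeled T1 and T3 on the slide.

```python
def inspection_interval(t1: float, t3: float) -> float:
    """Half of the crack-growth time between T1 and T3, as on the slide."""
    if t3 <= t1:
        raise ValueError("t3 must be later than t1")
    return 0.5 * (t3 - t1)


# Hypothetical values in flight hours, for illustration only.
print(inspection_interval(6_000.0, 14_000.0))  # 4000.0
```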
17
Reliability in NDT
• What is aNDE?
• aNDE is the “reliably” detected crack size for the applied inspection system.
• Variations of this can be investigated.
• Traditionally, reliably detected size has been considered to be the a90 or a90/95 crack size from the estimate of the NDE system POD(a).
[Figure: POD curve. Vertical axis: POD, 0 to 1. Horizontal axis: flaw size (mm), 0 to 5. Marked points: a50, a90, and a90/95 (aNDE).]
18
Reliability of NDE
• Development of POD was a very important contribution
to quantifying performance of NDE
• Necessary for effective ASIP program. Damage
Tolerance approach requires validated NDE capability.
• Quantifying largest flaw that can be missed is
important.
• Capability of detecting small flaws less important.
• First serious investigation
– Packman et al. 1967¹
– Four NDI methods (X-ray, dye penetrant, magnetic particle, and ultrasonics)
1 P.F. Packman et al. The applicability of a fracture mechanics – nondestructive testing design criterion. Technical Report AFML-TR-68-32, Air Force Materials Laboratory, USA, May 1968.
19
Reliability of NDT
• Rummel et al. 1974¹
– NASA Space Shuttle Program
– Five NDI methods (X-ray, fluorescent penetrant, eddy current, acoustic emission, and ultrasonics)
• Lewis et al. 1978² (a.k.a. “Have Cracks Will Travel”)
– Major US Air Force program to determine reliability.
– Perhaps the largest program of this kind in history.
– Disappointing results concerning NDI capability.
• Both studies inspired more advanced statistical analysis
1 W.D. Rummel et al, The detection of fatigue cracks by nondestructive testing methods. Technical Report NASA CR 2369, NASA Martin Marietta Aerospace, USA, Feb 1974.
2 W.H. Lewis et al, Reliability of nondestructive inspection – final report. Technical Report SA-ALC/MME 76-6-38-1, San Antonio Air Logistics Center, USA, Dec 1978.
20
Statistical Analysis – POD
• Two types of data collected
– “Hit/Miss” – binary data in terms of whether or not a flaw is found
– “â vs a” – continuous response data has more information
(â = signal magnitude, a = size)
• Statistical rigor introduced in USAF study conducted by Berens and Hovey in 1981¹.
– Previous analysis methods grouped “hit/miss” data into bins and used binomial statistics to evaluate POD.
– Berens and Hovey introduced mathematical model based on log-logistic cumulative distribution function to evaluate POD. This is still standard practice.
1 A.P. Berens and P.W. Hovey, “Evaluation of NDE Reliability Characterization,” AFWAL-TR-81-4160, Vol 1, Air Force Wright Aeronautical Laboratories, Wright-Patterson Air Force Base, Dec 1981.
21
Statistical Analysis – POD
• Hit/Miss analysis
– Sometimes only detection information available (e.g., penetrant testing). Can also be used if constant variance assumption is violated.
– Model assumes POD is a function of flaw size.
$\mathrm{POD}(a) = \Phi\left(\dfrac{\log(a) - \mu}{\sigma}\right) = \Phi\left(\beta_0 + \beta_1 \log(a)\right)$
– For logit model (logistic): $\Phi(z) = \dfrac{\exp(z)}{1 + \exp(z)}$
– For probit model (lognormal): $\Phi(z)$ is the standard normal cumulative distribution function.
– Maximum likelihood estimates: $\hat{\beta}_0 = -\dfrac{\hat{\mu}}{\hat{\sigma}}$ and $\hat{\beta}_1 = \dfrac{1}{\hat{\sigma}}$
1 A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17 Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.
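To make the two-parameter hit/miss model concrete, here is a minimal sketch that evaluates POD(a) = Φ(β0 + β1 log(a)) for both link functions and inverts it for a50 and a90. It assumes natural logarithms and uses made-up coefficients (β0 = 4, β1 = 2); the function names are illustrative and are not part of the mh1823 software.

```python
import math
from statistics import NormalDist


def link_cdf(z: float, link: str = "logit") -> float:
    """Phi(z): logistic CDF for 'logit', standard normal CDF for 'probit'."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-z))
    if link == "probit":
        return NormalDist().cdf(z)
    raise ValueError("link must be 'logit' or 'probit'")


def link_quantile(p: float, link: str = "logit") -> float:
    """Inverse of link_cdf."""
    if link == "logit":
        return math.log(p / (1.0 - p))
    if link == "probit":
        return NormalDist().inv_cdf(p)
    raise ValueError("link must be 'logit' or 'probit'")


def pod(a: float, beta0: float, beta1: float, link: str = "logit") -> float:
    """POD(a) = Phi(beta0 + beta1 * log(a))."""
    return link_cdf(beta0 + beta1 * math.log(a), link)


def mu_sigma(beta0: float, beta1: float) -> tuple[float, float]:
    """Location and scale on the log(a) axis: mu = -beta0/beta1, sigma = 1/beta1."""
    return -beta0 / beta1, 1.0 / beta1


def flaw_size_at(p: float, beta0: float, beta1: float, link: str = "logit") -> float:
    """Flaw size a_p at which POD equals p (a50 for p = 0.5, a90 for p = 0.9)."""
    mu, sigma = mu_sigma(beta0, beta1)
    return math.exp(mu + sigma * link_quantile(p, link))


# Hypothetical coefficients, for illustration only.
b0, b1 = 4.0, 2.0
for link in ("logit", "probit"):
    a50 = flaw_size_at(0.5, b0, b1, link)
    a90 = flaw_size_at(0.9, b0, b1, link)
    print(link, round(a50, 4), round(a90, 4))
```

Note that a90 here is a point estimate only; the a90/95 value requires a confidence bound on the fitted curve, which this sketch does not attempt.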
22
Statistical Analysis – POD
• Hit/Miss analysis
– Unchanged since Berens and Hovey except for confidence
bound calculations.
– Confidence bound calculations are not available in any
commercial software package.
– Traditional Wald method for confidence bound calculation is
anti-conservative with hit/miss data.
– Likelihood ratio method for confidence bound calculation is used in the revised MIL-HDBK-1823A. This is a very complicated calculation. See Annis and Knopp for details¹.
1 C. Annis and J.S. Knopp, “Comparing the Effectiveness of a90/95 calculations”, Rev. Prog. Quant. Nondestruct. Eval. Vol 26B pp. 1767–1774, 2007
23
Statistical Analysis – POD
• Hit/Miss analysis
– example
[Figure: MIL-HDBK-1823A hit/miss example (EXAMPLE 3 hm.xls, mh1823 software)¹. Left panel: Probability of Detection, POD | a versus Size, a (inches), with a50, a90, and a90/95 marked. Right panel legend: loglikelihood ratio; Cheng & Iles approx.]
link function = logit; μ̂ = 0.1156; σ̂ = 0.025147; n hits = 92; n total = 134
a50 = 0.1156; a90 = 0.1709; a90/95 = 0.1974
1 MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).
24
Statistical Analysis – POD
• “â vs a” analysis (â = signal strength, a = flaw size)
– Magnitude of signal contains information.
– More information results in more statistical confidence, which
ultimately reduces sample size requirements.
– Again, regression model assumes POD is function of flaw size.
– Censored regression almost always involved, so commercial package such as SAS or S-Plus necessary.
$\mathrm{POD}(a) = \Phi\left(\dfrac{\log(a) - \mu}{\sigma}\right)$
where,
$\mu = \dfrac{\log(\hat{a}_{\mathrm{threshold}}) - \beta_0}{\beta_1}$ and $\sigma = \dfrac{\tau}{\beta_1}$
Note: $\tau^2$ = regression variance
1 MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).
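As a hedged illustration of how the regression parameters map to a POD curve, the sketch below assumes the model log(â) = β0 + β1 log(a) + ε with ε normal and variance τ², and a decision threshold on â. Under that assumption, μ = (log(â_threshold) − β0)/β1 and σ = τ/β1. All numerical values are invented, and censoring is ignored.

```python
import math
from statistics import NormalDist


def pod_parameters(beta0: float, beta1: float, tau: float,
                   ahat_threshold: float) -> tuple[float, float]:
    """Map regression parameters to the POD location mu and scale sigma."""
    mu = (math.log(ahat_threshold) - beta0) / beta1
    sigma = tau / beta1
    return mu, sigma


def pod_ahat_vs_a(a: float, beta0: float, beta1: float, tau: float,
                  ahat_threshold: float) -> float:
    """POD(a) = Phi((log(a) - mu) / sigma), with Phi the standard normal CDF."""
    mu, sigma = pod_parameters(beta0, beta1, tau, ahat_threshold)
    return NormalDist().cdf((math.log(a) - mu) / sigma)


# Hypothetical regression fit and threshold, for illustration only.
beta0, beta1, tau, threshold = 1.0, 1.5, 0.3, 20.0
mu, sigma = pod_parameters(beta0, beta1, tau, threshold)
a50 = math.exp(mu)
a90 = math.exp(mu + NormalDist().inv_cdf(0.90) * sigma)
print(round(a50, 3), round(a90, 3))
```

The same POD value can be obtained directly as the probability that log(â) exceeds the log threshold, which is a useful cross-check when implementing this by hand.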
25
Statistical Analysis – POD
• â vs a analysis
• Basically a linear model.
• Delta method used to generate confidence intervals on POD curve.
• Wald confidence intervals sufficient.
[Figure: MIL-HDBK-1823A â vs a example (EXAMPLE 1 â vs a.xls, mh1823 software). Panels: response, â versus Size, a (mils), and POD(a) versus Size, a (mils). Legend: a50 = 8.8, a90 = 12.69, a90/95 = 13.68, P false call.]
26
MIL-HDBK-1823A
27
MIL-HDBK-1823A
Summary
• Completed in 2007; released in 2009
• 132 pages
• All new figures (65)
• Approximately 70% new text
• Based on best-practices for NDE and statistical
analysis
• 100% new software available
– â vs. a
– hit/miss
28
MIL-HDBK-1823A
Support Website
• Download the Handbook
• Request the mh1823 POD software
http://mh1823.com/mh1823
29
Addressing Deficiencies (1)
• Concern exists on performing a POD calculation on poor data sets
– Poor data sets can be defined as:
• Limited in sample size
• Data does not follow typical POD model fits
– Problem when wrong model used for statistical inference
– Worst case scenario: a fictitious a90/95 may be obtained.
• One possible remedy is a ‘4 parameter model’:
– Proposed by Moore and Spencer in 1999,
$\mathrm{POD}(a) = \alpha + (\beta - \alpha)\left\{1 + \exp\left[-\dfrac{\pi}{\sqrt{3}}\,\dfrac{\ln a - \ln \mu}{\sigma}\right]\right\}^{-1}$
α : false call rate
β : 1 - random missed flaw rate
σ : curve steepness
μ : flaw size median (50% POD)
– However, parameter estimation problem difficult using classical statistical methods
– It is likely that such methods also require large data sets (Very little work performed to date)
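A short sketch may help show why a 4-parameter curve can leave a90 undefined. It assumes the logistic form POD(a) = α + (β − α){1 + exp[−(π/√3)(ln a − ln μ)/σ]}⁻¹, with α the lower asymptote and β the upper asymptote; the function names and parameter values are hypothetical.

```python
import math


def pod_4param(a: float, alpha: float, beta: float, mu: float, sigma: float) -> float:
    """4-parameter logistic POD with lower asymptote alpha and upper asymptote beta."""
    z = (math.pi / math.sqrt(3.0)) * (math.log(a) - math.log(mu)) / sigma
    return alpha + (beta - alpha) / (1.0 + math.exp(-z))


def size_at_pod(p: float, alpha: float, beta: float, mu: float, sigma: float):
    """Flaw size with POD = p, or None when p lies outside (alpha, beta)."""
    if not (alpha < p < beta):
        return None  # the curve never reaches this POD level
    q = (p - alpha) / (beta - alpha)
    z = math.log(q / (1.0 - q))
    return mu * math.exp(sigma * z * math.sqrt(3.0) / math.pi)


# Hypothetical parameters: 5% false calls, 15% random misses.
alpha, beta, mu, sigma = 0.05, 0.85, 2.0, 0.5
print(size_at_pod(0.80, alpha, beta, mu, sigma))  # a finite flaw size
print(size_at_pod(0.90, alpha, beta, mu, sigma))  # None: a90 does not exist
```

With an upper asymptote below 0.9, no flaw size reaches 90% POD, which is the situation a 2-parameter fit hides by forcing the curve toward 1.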
30
Addressing Deficiencies (2)
• Markov-Chain Monte Carlo (MCMC) offers a flexible method
to use sampling to calculate confidence bounds.
• Bayesian approach with non-informative priors can be used to
– Model function: Logit or Probit
– Model form: (Parameters): 2, 3, and 4 parameter models.
• Upper Bound = P(random missed call) = β
• Lower Bound = P(false call rate) = α
[Figure: POD versus crack length (a), rising from a lower asymptote set by P_FC to an upper asymptote set by P_RMC, with a50 marked.]
$\mathrm{POD}(a) = \alpha + (\beta - \alpha)\left\{1 + \exp\left[-\dfrac{\pi}{\sqrt{3}}\,\dfrac{\ln a - \ln \mu}{\sigma}\right]\right\}^{-1}$
α : false call rate
β : 1 - random missed flaw rate
σ : curve steepness
μ : flaw size median (50% POD)
31
Bayesian Approach
Posterior = Likelihood × Prior / Normalizing Constant:
$p(\lambda \mid y) = \dfrac{p(y \mid \lambda)\, p(\lambda)}{p(y)}$
• Prior (“belief”): physics-based model or expert opinion
• Normalizing Constant: useful in model selection
• Likelihood: forward model and measurement data
• Posterior: integration of information from model and experimental data
• y: data
• λ: parameter(s)
[Figure: prior, likelihood, and posterior densities plotted against the parameter.]
32
Bayes Factors for Model Selection
Compare two models M2 and M1 using the Bayes factor
$BF_{21} = \dfrac{\text{Marginal likelihood}(M_2)}{\text{Marginal likelihood}(M_1)} = \dfrac{P(y \mid M_2)}{P(y \mid M_1)}$
[Figure: workflow from candidate models (Model 1, Model 2, Model 3) through parameter estimation (θ̂1, θ̂2, θ̂3) to model comparison (P(y | M1), P(y | M2), P(y | M3)), giving BF21 and BF32.]
BF        2log(BF)   Strength of evidence
<1        <0         Negative (supports M1)
1~3       0~2        Barely worth mentioning
3~20      2~6        Positive
20~150    6~10       Strong
>150      >10        Very Strong
“Bayes Factors” by Kass and Raftery, 1995
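The evidence scale above is easy to apply programmatically once the log Bayes factor is known. The following sketch assumes natural logarithms and takes ln(BF21) as input; the function name and the handling of values exactly on a boundary are choices made here, not something the slide specifies.

```python
import math


def evidence_category(log_bf21: float) -> str:
    """Classify 2*ln(BF21) on the Kass and Raftery (1995) scale."""
    two_ln_bf = 2.0 * log_bf21
    if two_ln_bf < 0:
        return "negative (supports M1)"
    if two_ln_bf <= 2:
        return "barely worth mentioning"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


# Bayes factors of 2, 10, 100, and 200 in favor of M2.
for bf in (2.0, 10.0, 100.0, 200.0):
    print(bf, evidence_category(math.log(bf)))
```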
33
Difficult Data Set #1
• NTIAC A9002(3)L
[Figure: detection data for data set A9002(3)L. Vertical axis: 0 to 1. Horizontal axis: 0 to 18. Annotation: “What’s going on here?”]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
34
Difficult Data Set #1
• Example of using the wrong model.
[Figure: PROBABILITY OF DETECTION (%) versus ACTUAL CRACK LENGTH - (Inch). Data Set: A9002(3)L. Test Object: 2219 Aluminum, Stringer Stiffened Panels. Condition: After Etch. Method: Eddy Current, Raster Scan with Tooling Aid.]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
35
Difficult Data Set #1
• 2 parameter logit/probit
• Appears to show a90 and a90/95 values
[Figure: two panels (logit/probit). Vertical axis: â, 0 to 1. Horizontal axis: a (mm), 0 to 20.]
36
Difficult Data Set #1
• 3 parameter lower logit/probit
• Again, appears as if there are a90 and a90/95 values
[Figure: two panels (logit/probit). Vertical axis: â, 0 to 1. Horizontal axis: a (mm), 0 to 20.]
37
Difficult Data Set #1
• 3 parameter upper logit/probit
[Figure: two panels (logit/probit). Vertical axis: â, 0 to 1. Horizontal axis: a (mm), 0 to 20.]
38
Difficult Data Set #1
• Case study 4 parameter probit
[Figure: histograms for intercept, slope, lower asymptote, and upper asymptote, plus the fitted 4-parameter probit curve. Curve axes: â, 0 to 1, versus a (mm), 0 to 5.]
39
Difficult Data Set #1
• 4 parameter Logit is most likely
[Figure: fitted 4-parameter logit curve. Vertical axis: â, 0 to 1. Horizontal axis: a (mm), 0 to 5.]
40
Difficult Data Set #1
• Summary of Results
Model                            intercept   slope     lower    upper    ML         a90       a90/95
2 parameter Logit                -1.6645     1.7257    -        -        3.70E-97   9.4148    12.555
2 parameter Probit               -0.8476     0.9242    -        -        9.08E-98   10.0156   13.4993
3 parameter lower bound Logit    -1.8501     1.7485    0.0898   -        7.29E-98   -         -
3 parameter lower bound Probit   -1.0195     0.9616    0.1098   -        1.02E-98   -         -
3 parameter upper bound Logit    -5.4408     5.5486    -        0.8478   3.27E-93   -         -
3 parameter upper bound Probit   -2.788      2.9377    -        0.8443   1.64E-93   -         -
4 parameter Logit                -13.7647    12.2874   0.175    0.8307   7.24E-92   -         -
4 parameter Probit               -9.8542     8.674     0.1864   0.8282   2.49E-92   -         -
(Blank cells on the slide are shown as -.)
41
Difficult Data Set #2
• Example of using the wrong model.
[Figure: PROBABILITY OF DETECTION (%) versus ACTUAL CRACK LENGTH - (Inch). Data Set: D8001(3)L. Test Object: 2219 Aluminum, Stringer Stiffened Panels. Condition: As Machined. Annotation: “What’s going on here?”]
• Note: MH1823 software produces numerous warnings.
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
42
Difficult Data Set #2
• 2 parameter logit/probit
• Appears that a90 and a90/95 values exist.
[Figure: two panels (logit/probit). Vertical axis: â, 0 to 1. Horizontal axis: a (mm), 0 to 20.]
43
Difficult Data Set #2
• 4 parameter probit
• a90 and a90/95 values do not exist
[Figure: fitted 4-parameter probit curve. Vertical axis: â, 0 to 1. Horizontal axis: a (mm), 0 to 10.]
44
Difficult Data Set #2
• Which model is correct?
• Log Marginal Likelihoods and Bayes factors
                           Log marginal likelihood   Bayes factor
Model type                 logit      probit         logit/probit   /2-parameter (logit)   /2-parameter (probit)
2-parameter                -200.16    -201.63        1.47           n/a                    n/a
3-parameter lower bound    -203.86    -203.49        -0.37          -3.7                   -1.86
3-parameter upper bound    -189.30    -189.00        -0.30          10.86                  12.63
4-parameter                -188.89    -185.12        -3.76          11.27                  16.51
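The Bayes factor columns in this table are consistent with differences of log marginal likelihoods, that is, log Bayes factors rather than raw ratios. The sketch below recomputes the "/2-parameter" columns from the logit and probit log marginal likelihoods listed on the slide; the dictionary layout and function name are illustrative.

```python
# Log marginal likelihoods from the slide: (logit, probit) per model type.
LOG_ML = {
    "2-parameter": (-200.16, -201.63),
    "3-parameter lower bound": (-203.86, -203.49),
    "3-parameter upper bound": (-189.30, -189.00),
    "4-parameter": (-188.89, -185.12),
}


def log_bf_vs_two_parameter(model: str) -> tuple[float, float]:
    """Log Bayes factor of `model` against the 2-parameter model, per link."""
    base_logit, base_probit = LOG_ML["2-parameter"]
    logit, probit = LOG_ML[model]
    return logit - base_logit, probit - base_probit


for name in LOG_ML:
    lb_logit, lb_probit = log_bf_vs_two_parameter(name)
    print(f"{name:26s} {lb_logit:7.2f} {lb_probit:7.2f}")
```

Doubling the 4-parameter entries gives values well above 10, which falls in the "very strong" band of the Kass and Raftery scale if these are natural-log Bayes factors.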
45
Small Data Set
• A great example where the last procedure fails
• Small data sets do not cause any warnings with
standard software.
46
Small Data Set
• 4 parameter model
[Figure: fitted 4-parameter curve. Vertical axis: â, 0 to 1. Horizontal axis: a (inches), 0 to 1.]
47
Small Data Set
• Summary for Small data set
Maus
Model                            intercept   slope     lower    upper    ML         a90      a90/95
2 parameter Logit                27.1517     12.6708   -        -        8.87E-04   0.1405   0.1829
2 parameter Probit               22.6165     10.577    -        -        2.60E-03   0.1329   0.159
3 parameter lower bound Logit    24.0452     11.6731   0.0719   -        9.67E-06   -        -
3 parameter lower bound Probit   24.5728     12.1283   0.0705   -        9.16E-06   -        -
3 parameter upper bound Logit    27.6386     12.5028   -        0.7967   2.29E-04   -        -
3 parameter upper bound Probit   22.0074     9.8899    -        0.7917   4.87E-05   -        -
4 parameter Logit                24.8792     11.3063   0.0706   0.7926   1.97E-05   -        -
4 parameter Probit               23.7871     10.6664   0.0711   0.7781   3.18E-06   -        -

IMTT
Model                            intercept   slope     lower    upper    ML         a90      a90/95
2 parameter Logit                25.5467     9.8023    -        -        3.30E-03   0.0941   0.1248
2 parameter Probit               19.667      7.4983    -        -        4.40E-03   0.0873   0.1139
3 parameter lower bound Logit    28.5743     11.3208   0.1273   -        2.07E-04   -        -
3 parameter lower bound Probit   23.2202     9.2585    0.1391   -        1.58E-04   -        -
3 parameter upper bound Logit    24.5688     9.2608    -        0.9055   6.34E-06   -        -
3 parameter upper bound Probit   20.405      7.6861    -        0.9041   3.10E-05   -        -
4 parameter Logit                25.3354     9.9529    0.1263   0.9067   2.17E-05   -        -
4 parameter Probit               26.4209     11.153    0.1679   0.8884   5.51E-07   -        -
(Blank cells on the slide are shown as -.)
Additional values on the slide: 4.51E+01, 8.17E+02, 1.52E+02, 7.99E+03. These match the ratios ML(2 parameter) / ML(4 parameter) for Maus Logit, Maus Probit, IMTT Logit, and IMTT Probit, respectively.
48
Conclusion
• It sometimes appears (and is desirable) that there is a
systematic procedure that will automatically determine the
best model, but this actually isn’t the case.
• Bayes Factors provide useful approach to evaluate the
best model
• However, an example with a small data set showed that
even the Bayes factor procedure can lead one to a wrong
conclusion
– It doesn’t tell you to stop and not perform an analysis
– Need to look at data and perform ‘diagnostics’
• Bottom line – Procedures don’t replace statisticians.
49
Model-Assisted POD
• C-5 Wing Splice Fatigue Crack Specimens:
– Two layer specimens are 14" long and 2" wide,
– 0.156" top layer, 0.100" bottom layer
– 90% fasteners were titanium, 10% fasteners were steel
– Fatigue cracks position at 6 and 12 o’clock positions
– Crack length ranged from 0.027" – 0.169" (2nd layer)
– vary: location of cracks – at both 1st and 2nd layer
[Figure: crack geometry sketches with dimensions a and b on x and z axes: 1st layer – corner crack; 2nd layer – corner crack.]
• AFRL/UDRI Acquired Data (Hughes, Dukate, Martin)
[Figure: eddy current scan data for specimen A1-16C0, with annotated crack lengths 0.110" and 0.107".]
50
MAPOD
• Perform simulated studies: Compare with experimental results
• Bayesian methods can assist in determining best model.
[Figure: measurement response (V) versus crack length (in), 0 to 0.2. Panel A) 1st layer – faying surface – corner cracks (curves: model, exp.). Panel B) 2nd layer – faying surface – corner / through cracks (curves: model-corner, model-through, exp.).]
51
MAPOD
Demonstration of model-assisted probability of detection (MAPOD)
Experimental Comparison with Full Model-Assisted
2nd layer – faying surface – corner & through cracks
[Figure: POD versus crack length (in), 0 to 0.25. Curves: experimental POD; full model-assisted POD.]
Successes:
• First demonstration of MAPOD in the literature for structural problem.
• Eddy current models were able to simulate eddy current inspection of 2nd layer fatigue cracks around fastener holes.
Knopp, Aldrin, Lindgren, and Annis, “Investigation of a model-assisted approach to probability of detection evaluation”, Review of Progress in Quantitative Nondestructive Evaluation, (2007).
52
Heteroscedasticity
• â vs a analysis
– Berens, A.P. and P.W. Hovey, “Flaw Detection Reliability Criteria, Volume I – Methods and Results,” AFWAL-TR-84-4022, Air Force Wright Aeronautical Laboratories, Wright-Patterson Air Force Base, April 1984 (â vs a analysis is always more advantageous than hit/miss because much more information is available, but hit/miss is used much more in practice)
– Berens, A.P., NDE Reliability Data Analysis, American Society for Metals Handbook Nondestructive Evaluation and Quality Control, Vol 17, pp. 689-701, ASM International, 1989. (classic reference on the subject, still standard today)
– MIL-HDBK-1823 (1999) – (Guidance for POD studies based on the methods described by Berens and Hovey)
• Box Cox transformations
– Kutner, Nachtsheim, Neter, and Li, “Applied Linear Statistical Models”, (2005)
53
Heteroscedasticity
• â vs a assumes homoscedasticity, and if that
assumption is violated, one must resort to hit/miss
analysis. This was the case for an early MAPOD
study (Knopp et al. 2007)
• Box-Cox transformation can remedy this problem.
[Figure: signal response â versus crack size (mm), 0 to 5.]
54
Heteroscedasticity
• Box Cox transformation according to Kutner et al.
• Note: Not to be used for nonlinear relations.
• Box-Cox identifies transformations from a family of
power transformations.
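• The form is:
$\hat{a}' = \hat{a}^{\lambda}$
• Some common transformations
$\lambda = 2$: $\hat{a}' = \hat{a}^2$
$\lambda = 0.5$: $\hat{a}' = \sqrt{\hat{a}}$
$\lambda = 0$: $\hat{a}' = \log_e \hat{a}$
$\lambda = -0.5$: $\hat{a}' = 1/\sqrt{\hat{a}}$
$\lambda = -1.0$: $\hat{a}' = 1/\hat{a}$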
55
Heteroscedasticity
• New regression model with power transform:
$\hat{a}_i^{\lambda} = \beta_0 + \beta_1 a_i + \varepsilon_i$
• λ needs to be estimated. Box-Cox uses maximum likelihood.
• I use Excel’s Solver to do a numerical search for potential λ values.
• Standardize observations so that the magnitude of the error sum of squares does not depend on the value of λ.
$g_i = \dfrac{1}{\lambda c^{\lambda - 1}}\left(\hat{a}_i^{\lambda} - 1\right), \quad \lambda \neq 0$
$g_i = c \ln(\hat{a}_i), \quad \lambda = 0$
$c = \left(\prod_i \hat{a}_i\right)^{1/n}$
• c is the geometric mean of the observations
• Next step is to regress g on a for a given λ and calculate SSE.
Heteroscedasticity
• The value of λ that minimizes SSE is the best
transformation.
• This procedure is only a guide, and a high level of
precision is not necessary.
• For this data set, λ = 0.45
[Figure: two panels versus crack size (mm), 0 to 5: â + 0.02, and â transformed (lambda = 0.45).]
57
Heteroscedasticity
• Box Cox – POD curve associated with λ = 0.45 transform.
[Figure: probability of detection, POD | a versus size, a (mm), with a50, a90, and a90/95 marked.]
58
Heteroscedasticity
• Box Cox transformation – square root transform
[Figure: two panels: â transformed (lambda = 0.5) versus crack size (mm), 0 to 5; response, â versus size, a (mm).]
59
Heteroscedasticity
• POD result for square-root transform
[Figure: probability of detection, POD | a versus size, a (mm), with a50, a90, and a90/95 marked.]
60
Summary
• Box-Cox enables â vs a analysis for data sets where
the variance is not constant but has some
relationship with the independent variable such as
crack size.
analysis method     λ      left censor   detection threshold   false calls   a90 (mm)   a90/95 (mm)   a90 - a90/95 % difference
1st order linear    0.45   0.13          0.23                  0             2.176      2.327         6.9%
1st order linear    0.5    0.14          0.195                 1             2.102      2.257         7.3%
1st order linear    0.5    0.195         0.195                 1             2.269      2.53          11.5%
2nd order linear    0.5    0.14          0.195                 1             2.277      2.472         8.5%
2nd order linear    0.5    0.195         0.195                 1             2.197      2.428         10.5%
hit/miss            1      -             0.187                 1             1.72       2.04          18.6%
hit/miss            1      -             0.162                 11            1.498      1.907         27.3%
61
Physics-Inspired Models
• MAPOD – idea is to use simulation to reduce time and
cost of POD studies.
• Properly integrating simulation and experiment is an
enormous task.
• Intermediate step is to use models to inspire the
functional form of the regression model.
62
Physics-Inspired Models - literature
• R.B. Thompson and W.Q. Meeker, “Assessing the POD of Hard-Alpha Inclusions from Field Data”, Review of Progress in QNDE, Vol. 26, AIP, pp 1759-1766, (2007). (Example where kink regression is used to distinguish between Rayleigh scattering at small flaw sizes and regular scattering at larger sizes)
Figure from http://www.tc.faa.gov/its/worldpac/techrpt/ar0763.pdf
63
Physics-Inspired Models
• Simulation and Experiment
• Visual inspection reveals that a 2nd order linear model may fit the
data better than the standard â vs a analysis.
• Evidence beyond visual: 1) p-value for a² is 0.001 and 2) adjusted R-square value increases slightly with inclusion of a².
[Figure: two panels of response versus crack size, 0 to 5: simulation and experiment data; quadratic model fit.]
64
Physics-Inspired Models – parallel work
• Recent unpublished work by Li, Nakagawa, Larson,
and Meeker.
http://www.stat.iastate.edu/preprint/articles/2011-05.pdf
65
Summary
• Physics model hopefully provides functional form of
the response, and this knowledge can be used in the
initial DOE for a POD study.
• Physics-Inspired model concept is a first step in
using physics models for making inference on
reliability.
• Confidence bounds calculation on models more
complicated than â vs a is an open problem,
especially transforming to probability of detection
curve.
66
Bootstrap Methods
• Confidence bound calculations are complicated and
only available for hit/miss and â vs a analysis.
• More complicated models require new method.
• Bootstrap methods are simple and flexible enough to
provide confidence bounds for a wide variety of
models.
67
Bootstrap Methods - literature
• Efron, B., and Tibshirani, R. J., An Introduction to the Bootstrap,
Chapman & Hall, New York, NY, 1993.
• C.C. McCulloch and J. Murphy, “Local Regression Modeling for
Accurate Analysis of Probability of Detection Data”, Mat. Eval.,
Vol. 60, no. 12, pp. 1438-1143, (2002) (A rare example of
bootstrapping used in NDE context)
• Amarchinta, Tarpey, and Grandhi, “Probabilistic Confidence
Bound Framework for Residual Stress Field Predictions”, 12th
AIAA Non-Deterministic Approaches Conference, AIAA-2010-2519, Orlando, FL, (2010).
68
Bootstrap Methods
• Bootstrap procedure is simply to sample with
replacement and generate a POD curve each time.
• Sort all of the a90 values in ascending order and look
at the value in the 95th percentile to determine a90/95
• Example for the previous transformed data set with λ
= 0.5
Method              a90        a90/95
Wald Method         2.102 mm   2.257 mm
Bootstrap 1,000     2.096 mm   2.281 mm
Bootstrap 10,000    2.099 mm   2.299 mm
Bootstrap 100,000   2.099 mm   2.297 mm
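The resampling procedure described above can be sketched as follows. This is a percentile bootstrap on synthetic data, assuming a simple linear model y = b0 + b1·a + ε with normal errors and a fixed decision threshold, with no censoring. The data, threshold, and function names are invented for illustration and do not reproduce the numbers in the table.

```python
import math
import random
from statistics import NormalDist


def fit_a90(a: list[float], y: list[float], threshold: float) -> float:
    """OLS fit of y on a, then the flaw size where P(y > threshold) = 0.90."""
    n = len(a)
    mean_a, mean_y = sum(a) / n, sum(y) / n
    saa = sum((ai - mean_a) ** 2 for ai in a)
    b1 = sum((ai - mean_a) * (yi - mean_y) for ai, yi in zip(a, y)) / saa
    b0 = mean_y - b1 * mean_a
    sse = sum((yi - b0 - b1 * ai) ** 2 for ai, yi in zip(a, y))
    tau = math.sqrt(sse / (n - 2))  # residual standard deviation
    return (threshold - b0 + NormalDist().inv_cdf(0.90) * tau) / b1


def bootstrap_a90(a, y, threshold, n_boot=1000, seed=0) -> list[float]:
    """Resample (a, y) pairs with replacement; return sorted a90 estimates."""
    rng = random.Random(seed)
    n = len(a)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ra = [a[i] for i in idx]
        if max(ra) == min(ra):
            continue  # degenerate resample: slope cannot be estimated
        estimates.append(fit_a90(ra, [y[i] for i in idx], threshold))
    return sorted(estimates)


def percentile(sorted_values: list[float], p: float) -> float:
    """Value at the p-th quantile position of an ascending list."""
    k = max(0, math.ceil(p * len(sorted_values)) - 1)
    return sorted_values[k]


# Synthetic data, for illustration only.
data_rng = random.Random(1)
a = [0.2 * i for i in range(1, 26)]
y = [0.05 + 0.06 * ai + data_rng.gauss(0.0, 0.01) for ai in a]
a90s = bootstrap_a90(a, y, threshold=0.195, n_boot=1000)
print("a90 point estimate:", fit_a90(a, y, 0.195))
print("a90/95 (bootstrap):", percentile(a90s, 0.95))
```

Because each resample only requires refitting the model, the same loop works for higher-order or transformed models by swapping out `fit_a90`.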
69
Summary
• Bootstrapping is beautiful.
• 1,000 samples probably sufficient, but 100,000 isn’t that difficult.
• Some interesting formal work could be done to look at the
influence of censoring, which is probably beyond the scope of
this work.
• Results seem to indicate the 2nd order model (which I think is the
best) is the most conservative.
• Further investigation of censoring planned.
Analysis method    λ      Left censor   Detection threshold   False calls   a90 (mm)   a90/95 (mm)   a90 to a90/95 % difference
1st order linear   0.45   0.13          0.23                  0             2.176      2.327         6.9%
1st order linear   0.5    0.14          0.195                 1             2.102      2.257         7.3%
1st order linear   0.5    0.195         0.195                 1             2.269      2.53          11.5%
2nd order linear   0.5    0.14          0.195                 1             2.277      2.472         8.5%
2nd order linear   0.5    0.195         0.195                 1             2.197      2.428         10.5%
hit/miss           1      -             0.187                 1             1.72       2.04          18.6%
hit/miss           1      -             0.162                 11            1.498      1.907         27.3%
70
Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• Methods presented for putting confidence bounds on these models are
not elegant by any stretch of the imagination, but they are
incredibly robust and useful.
• Much work needs to be done via simulation to move these
methods into practice.
• UQ – Progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.
71
Efficient Uncertainty Propagation
• Deterministic simulations are very time consuming.
• NDE problems require stochastic simulation if the
models are to truly impact analysis of inspections.
• Need modern uncertainty quantification methods to
address this problem.
[Figure: random inputs ξ1, ξ2, …, ξn → Eddy Current NDE Model [Stochastic] → random output Z̃]
72
Efficient Uncertainty Propagation
Uncertainty Propagation Methods:
• Monte Carlo
• Latin Hypercube (Sampling Methods)
• FORM/SORM
• Full Factorial Numerical Integration
• Univariate Dimension Reduction
• Karhunen–Loève Expansion / ANOVA (High Dimension Problems)
• Polynomial Chaos Expansion (Intrusive)
• Probabilistic Collocation Method (Non Intrusive)
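To make the first two entries of the list concrete, here is a rough sketch comparing plain Monte Carlo with Latin hypercube sampling. Everything in it is an assumption for illustration: `toy_model` is a made-up algebraic stand-in for an eddy current simulation, and the liftoff and depth distributions are invented.

```python
import numpy as np
from statistics import NormalDist

def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) Latin hypercube sample on [0, 1)."""
    strata = np.arange(n_samples)[:, None]
    # One random point inside each of n_samples equal-width strata per dimension
    u = (strata + rng.random((n_samples, n_dims))) / n_samples
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])   # decouple the strata across dimensions
    return u

def toy_model(liftoff_mm, depth_mm):
    """Hypothetical stand-in for an eddy current simulation (not a physical model)."""
    return depth_mm * np.exp(-liftoff_mm / 0.5)

rng = np.random.default_rng(0)
n = 200
u = latin_hypercube(n, 2, rng)

# Map the uniform samples to the assumed input distributions
liftoff_dist = NormalDist(mu=0.2, sigma=0.05)            # liftoff ~ Normal
u_safe = np.clip(u[:, 0], 1e-12, 1.0 - 1e-12)            # keep inv_cdf away from 0 and 1
liftoff = np.array([liftoff_dist.inv_cdf(p) for p in u_safe])
depth = 0.5 + 1.5 * u[:, 1]                              # depth ~ Uniform(0.5, 2.0)
z_lhs = toy_model(liftoff, depth)

# Plain Monte Carlo with the same number of model evaluations
z_mc = toy_model(rng.normal(0.2, 0.05, n), rng.uniform(0.5, 2.0, n))
print(f"LHS mean = {z_lhs.mean():.4f}, MC mean = {z_mc.mean():.4f}")
```

Each input range is covered evenly by construction, which is why Latin hypercube sampling usually needs fewer model runs than plain Monte Carlo for the same accuracy in the mean.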
73
Uncertainty Propagation
• Motivation: Model evaluations are computationally expensive.
There is a need for more efficient methods than Monte Carlo.
Input Parameters with Variation:
• Probe dimensions (liftoff / tilt)
• Flaw characteristics (depth, length, shape)
[Figure: fixed input parameters 1 … n → Eddy Current NDE Model [Deterministic] → Z; random inputs X1 ~ Normal, …, Xn ~ Uniform → Eddy Current NDE Model [Stochastic] → Z ~ ?]
• Objective: Efficiently propagate uncertain inputs through “black
box” models and predict output probability density functions.
(Non-intrusive approach)
• Approach: Surrogate models based on Polynomial Chaos
Expansions meet this need.
74
Uncertainty Propagation
Uncertainty propagation for parametric NDE characterization problems:
• Probabilistic Collocation Method (PCM) approximates model response with
a polynomial function of the uncertain parameters.
• This reduced form model, $\hat{Z} = f(x) = \sum_{i=1}^{N} c_i \psi_i(x)$, can then be used with
traditional uncertainty analysis approaches, such as Monte Carlo.
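A one-dimensional sketch of the collocation idea, under stated assumptions: the single uncertain input is a normally distributed liftoff, the basis is probabilists' Hermite polynomials, and `black_box_model` is a hypothetical toy quadratic standing in for an expensive simulation. The model is run only at a few Gauss-Hermite nodes, and the resulting polynomial is then sampled cheaply with Monte Carlo.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial import hermite_e as He

def black_box_model(liftoff_mm):
    """Hypothetical stand-in for an expensive NDE simulation (toy quadratic)."""
    return 1.0 + 2.0 * liftoff_mm + 0.5 * liftoff_mm**2

mu, sd = 0.2, 0.05        # assumed: liftoff ~ Normal(mu, sd), in mm
order = 2                 # polynomial order of the surrogate
n_nodes = order + 1       # collocation points, i.e. model runs

# Gauss-Hermite nodes and weights for the standard normal variable xi
nodes, weights = He.hermegauss(n_nodes)
weights = weights / sqrt(2.0 * pi)            # normalize so the weights sum to 1
model_at_nodes = black_box_model(mu + sd * nodes)

# Projection: c_i = E[f(xi) He_i(xi)] / i!
coeffs = np.array([
    np.sum(weights * model_at_nodes * He.hermeval(nodes, [0.0] * i + [1.0])) / factorial(i)
    for i in range(order + 1)
])

# Moments follow directly from the coefficients
pc_mean = coeffs[0]
pc_var = sum(coeffs[i] ** 2 * factorial(i) for i in range(1, order + 1))

# The surrogate is cheap, so traditional Monte Carlo on it is affordable
rng = np.random.default_rng(0)
xi = rng.standard_normal(100_000)
z_surrogate = He.hermeval(xi, coeffs)
print(f"mean = {pc_mean:.5f}, variance = {pc_var:.6f}, MC mean = {z_surrogate.mean():.5f}")
```

Here three model runs replace the 100,000 that direct Monte Carlo would need; for a real simulation the order would have to be chosen by checking convergence.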
Extensions of generalized polynomial chaos (gPC) to high-dimensional
(2D, 3D) damage characterization problems:
• Karhunen-Loeve expansion
• Analysis of variance (ANOVA)
• Smolyak Sparse Grids
[Figure: number of random variables N by problem type: critical flaw size (N = 1); key damage and measurement states, e.g. crack length, probe liftoff (N > 1); parameterized flaw localization and sizing; full 3D damage and material state characterization (N >> 1)]
75
Uncertainty Propagation and
High Dimensional Model Representation
Approach (1): Karhunen-Loeve Expansion
• Address stochastic input variable reduction when number of
random variables (N) is large.
• Apply Karhunen-Loeve Expansion to map random variables into
a lower-dimensional random space (N').
Eddy Current Example:
• Correlation function (covariance model)
defines random conductivity map,
• Set choice of grid length to
– achieve model convergence and
– eliminate insignificant eigenvalues
for reduced order conductivity map.
[Figure: coil over crystallites (grains), conductivity value 2.2*10^6 S/m; conductivity map σ(x) with N random variables and covariance model C(x, x') → Karhunen-Loève Expansion → reduced order conductivity map with N' random variables ξ1 … ξN', $\sigma(x) = \sum_{n=1}^{N'} \sqrt{\lambda_n}\, \xi_n \phi_n(x)$]
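A discrete sketch of this reduction, assuming a 1D grid, an exponential covariance model, and a 95% retained-variance cutoff. The correlation length, the 5% standard deviation, and the use of 2.2e6 S/m as a mean value added to the expansion are all assumptions made for illustration.

```python
import numpy as np

n_grid = 200                          # N: one random variable per grid point
x = np.linspace(0.0, 1.0, n_grid)     # assumed 1D scan line, arbitrary units
sigma_mean = 2.2e6                    # S/m, value shown on the slide, used here as the mean
sigma_sd = 0.05 * sigma_mean          # assumed 5 % standard deviation
corr_len = 0.2                        # assumed correlation length

# Covariance model C(x, x') sampled on the grid (exponential kernel, assumed)
C = sigma_sd**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)

# Discrete Karhunen-Loeve modes: eigenpairs of C, largest eigenvalue first
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Drop insignificant eigenvalues: keep the smallest N' capturing 95 % of the variance
explained = np.cumsum(eigvals) / np.sum(eigvals)
n_reduced = int(np.searchsorted(explained, 0.95) + 1)

# One realization of the reduced order map from N' standard normal variables
rng = np.random.default_rng(0)
xi = rng.standard_normal(n_reduced)
sigma_map = sigma_mean + eigvecs[:, :n_reduced] @ (np.sqrt(eigvals[:n_reduced]) * xi)

print(f"N = {n_grid}, N' = {n_reduced}, variance captured = {explained[n_reduced - 1]:.3f}")
```

A shorter correlation length makes the eigenvalues decay more slowly, so N' grows; that trade-off is what the grid length and eigenvalue cutoff on the slide control.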
76
Uncertainty Propagation and
High Dimensional Model Representation
Approach (2): Analysis of Variance (ANOVA) Expansion
• Provides surrogate to represent high dimensional set of parameters
• Analogous to ANOVA decomposition in statistics
• Locally represent model output through
expansion at anchor point in ξ-space
– Requires inverse problem
– Replace random surface with
equivalent 'homogeneous' surface
[Figure: two-step reduction. (1) Identify unique sources of variance: conductivity map σ(x) with N random variables, defined by covariance model C(x, x') → Karhunen-Loève Expansion (N >> N') → reduced order conductivity map with N' random variables ξ1 … ξN'. (2) Identify significant factors in model: ANOVA Expansion (N' >> M) → M random variables ξ1 … ξM → Z(ξ)]
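One way to read the anchored expansion is as a first-order cut through the anchor point: vary one ξ at a time, keep the others at the anchor, and rank the inputs by how much each one-dimensional component varies. The sketch below assumes a hypothetical three-input `model`, an anchor at the origin, and a first-order truncation; it is not the formulation used for the eddy current problem.

```python
import numpy as np

def model(xi):
    """Hypothetical response with three random inputs and one weak interaction."""
    return 1.0 + 3.0 * xi[0] + 0.5 * xi[1] ** 2 + 0.01 * np.sin(xi[2]) + 0.2 * xi[0] * xi[1]

anchor = np.zeros(3)          # assumed anchor point in xi-space
f_anchor = model(anchor)

def component(i, value):
    """First-order term: vary input i alone, all others held at the anchor."""
    point = anchor.copy()
    point[i] = value
    return model(point) - f_anchor

def surrogate(xi, active=None):
    """First-order anchored expansion, optionally restricted to the active inputs."""
    dims = range(len(anchor)) if active is None else active
    return f_anchor + sum(component(i, xi[i]) for i in dims)

# Rank inputs by the variance of their first-order components
rng = np.random.default_rng(0)
samples = rng.standard_normal((2000, 3))
component_var = np.array([
    np.var([component(i, v) for v in samples[:, i]]) for i in range(3)
])
ranking = np.argsort(component_var)[::-1]
significant = [int(i) for i in ranking[:2]]      # keep M = 2 of the 3 inputs

test_point = np.array([0.4, -0.7, 1.1])
full_error = model(test_point) - surrogate(test_point)
reduced_error = model(test_point) - surrogate(test_point, active=significant)
print("ranking:", ranking, "errors:", full_error, reduced_error)
```

The remaining error of the full first-order surrogate is exactly the interaction term it leaves out, which is what second-order terms would pick up.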
77
Uncertainty Propagation and
High Dimensional Model Representation
Approach (2): Analysis of Variance Expansion + Smolyak Sparse Grids
• Significant computational expense for high-dimensional integrals
• Can leverage sparse grids based on the Smolyak construction
[Smolyak, 1963; Xiu, 2010; Gao and Hesthaven, 2010]
– Provides weighted solutions at specific nodes and adds them to
reduce the amount of necessary solutions
– Sparse grid collocation provides subset of full
tensor grid for higher dimensional problems
– Approach can also be applied to gPC/PCM
[Figure: Sparse Grid and Full Tensor Product Grid, both axes from -1 to 1]
[Figure: two-step reduction. (1) Identify unique sources of variance: conductivity map σ(x) with N random variables, defined by covariance model C(x, x') → Karhunen-Loève Expansion (N >> N') → reduced order conductivity map with N' random variables ξ1 … ξN'. (2) Identify significant factors in model: ANOVA Expansion (N' >> M) → M random variables ξ1 … ξM → Z(ξ)]
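The saving from a sparse grid can be seen by simply counting nodes. The sketch below builds the node set of a Smolyak grid from nested Clenshaw-Curtis points on [-1, 1] and compares it with the full tensor grid of the same one-dimensional resolution. It constructs nodes only, with no quadrature weights, and the choice of Clenshaw-Curtis points is an assumption; the function names are hypothetical.

```python
import itertools
import math

def cc_nodes(level):
    """Nested Clenshaw-Curtis nodes on [-1, 1] for a 1D level >= 1."""
    if level == 1:
        return [0.0]
    m = 2 ** (level - 1) + 1
    # Rounding makes shared nodes compare equal; "+ 0.0" turns -0.0 into 0.0
    return [round(-math.cos(math.pi * j / (m - 1)), 12) + 0.0 for j in range(m)]

def smolyak_nodes(dim, k):
    """Union of tensor grids whose 1D levels i satisfy sum(i) <= dim + k."""
    points = set()
    for levels in itertools.product(range(1, k + 2), repeat=dim):
        if sum(levels) <= dim + k:
            points.update(itertools.product(*(cc_nodes(l) for l in levels)))
    return points

def full_tensor_nodes(dim, k):
    """Full tensor grid using the finest 1D rule (level k + 1) in every dimension."""
    return set(itertools.product(cc_nodes(k + 1), repeat=dim))

for dim in (2, 4):
    sparse = smolyak_nodes(dim, 4)
    full_count = len(cc_nodes(5)) ** dim
    print(f"dim = {dim}: sparse grid {len(sparse)} nodes, full tensor grid {full_count} nodes")
```

Every sparse grid node is also a full tensor grid node, so the sparse grid is a subset, and the gap between the two counts widens quickly as the dimension grows.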
78
All Models Are Wrong
• “All models are wrong, and to suppose that inputs should
always be set to their ‘true’ values when these are
‘known’ is to invest the model with too much credibility in
practice. Treating a model more pragmatically, as having
inputs that we can ‘tweak’ empirically, can increase its
value and predictive power” (Kennedy and O’Hagan 2002)
• Eddy current liftoff is a particularly great example of this.
79
Bayesian Analysis
• Bayesian Model Averaging (BMA) – Used when experts
provide competing models for the same system.
• Bayesian calibration is the most promising technical option for
integrating experimental data and simulation in a rigorous way
that accounts for all sources of uncertainty.
• Kennedy / O’Hagan paper 2001 inspired many efforts in this
direction. BTW, rejected by Journal of the American Statistical
Association. Now published in Journal of the Royal Statistical
Society: Series B, and has been referenced 620 times. Add
that to the number of references to the unpublished technical
report that was rejected by JASA, and you get a large number.
• Many efforts ongoing in UQ community
80
Bayesian Calibration
• What uncertainty needs to be quantified to go from the
simulator to reality?
– Input
– Propagation from input to output (Hopefully done in previous
section, but notice no uncertainty is actually quantified in this
part)
– Code
– Discrepancy
81
Bayesian Calibration
• Terminology
– Model: set of equations that describes some real world
phenomena.
– Simulator: Executes the model with computer code.
– Calibration parameters: θ
– Controlled input variables: x
82
Bayesian Calibration
• Simulator: y = f(x,θ)
• Observations: observations = reality(control variables) +
ε, where ε is observation error
• Reality doesn’t depend on calibration parameters.
• Typically you see: observations = f(x,θ) + ε
• This is wrong, mainly because it doesn’t account for
uncertainty in θ, and the ε’s are not independent.
• Bayesian methods are used to learn about uncertainty in
θ.
* Paraphrasing discussion with Tony O’Hagan
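A toy illustration of why the "typical" form can mislead, under heavy assumptions: the simulator is a hypothetical one-parameter line, reality has an offset the simulator cannot reproduce, and the discrepancy is modeled as a single constant. Kennedy and O'Hagan use a Gaussian process for the discrepancy; the constant here is only a crude stand-in so the posterior can be computed on a grid with flat priors.

```python
import numpy as np

def simulator(x, theta):
    """Hypothetical simulator y = f(x, theta)."""
    return theta * x

# Synthetic "reality" (assumed): an offset of 0.3 that no theta can reproduce
rng = np.random.default_rng(0)
x_obs = np.linspace(0.0, 1.0, 20)       # controlled input variables x
noise_sd = 0.05                         # assumed known observation error
y_obs = 2.0 * x_obs + 0.3 + rng.normal(0.0, noise_sd, x_obs.size)

# "Typical" fit, observations = f(x, theta) + eps, by least squares
theta_naive = np.sum(x_obs * y_obs) / np.sum(x_obs**2)

# Grid posterior for observations = f(x, theta) + delta + eps with flat priors
theta_grid = np.linspace(1.0, 3.0, 201)
delta_grid = np.linspace(-0.5, 1.0, 151)
pred = (simulator(x_obs[None, None, :], theta_grid[:, None, None])
        + delta_grid[None, :, None])
log_like = -0.5 * np.sum((y_obs - pred) ** 2, axis=2) / noise_sd**2
post = np.exp(log_like - log_like.max())
post /= post.sum()

# Marginal posterior of the calibration parameter theta
theta_post = post.sum(axis=1)
theta_mean = np.sum(theta_grid * theta_post)
theta_sd = np.sqrt(np.sum((theta_grid - theta_mean) ** 2 * theta_post))
print(f"naive theta = {theta_naive:.3f}, posterior theta = {theta_mean:.3f} +/- {theta_sd:.3f}")
```

The naive fit pushes θ away from 2.0 to absorb the offset, while the version with a discrepancy term recovers it and also reports a spread for θ, which is the uncertainty the Bayesian approach is meant to expose.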
83
Bayesian Calibration - literature
• Kennedy, M. C. and O’Hagan, A., “Bayesian calibration of computer models,” J. R. Statist. Soc. B, Vol. 63,
pp. 425–464, (2001).
• Park, I., Amarchinta, H. K., and Grandhi, R. V., “A Bayesian approach to quantification of model
uncertainty,” Reliability Engineering and System Safety, Vol. 95, pp. 777-785, (2010).
84
Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• Methods presented for putting confidence bounds on these models are
not elegant by any stretch of the imagination, but they are
incredibly robust and useful.
• Much work needs to be done via simulation to move these
methods into practice.
• UQ – Progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.
85
