Model Risk sources and examples

Model Risk –
sources and some examples
Tony Bellotti
Department of Mathematics
Imperial College London
Model development
A highly simplified model development framework:
In this framework, once the model is developed, we then think of it
as correct.
However, the model is only an approximation to reality.
Thinking about model risk
Do you factor in the uncertainty of your model when you use it?
• Firstly, we need to understand the sources of model risk and how to
measure those risks.
• Secondly, the consequences of using the model need to be assessed
in light of the model risks, prior to use.
Does model risk matter?
But… does model risk really matter?
Does it make a substantial difference in the real world?
“The reliance on models to handle risk carries its own risk” *
In securities markets, where complex pricing models are used, there is such
a thing as model arbitrage, where a trader will take advantage of known
errors in model structure or implementation to make money.
So there is a genuine effect. *
If this happens in retail credit, perhaps it could lead to adverse
selection (eg pricing a loan below the true risk level of the borrower).
* Emanuel Derman (1996), Model Risk, Goldman Sachs Quantitative
Strategies Research Notes
What about model risk in retail credit?
But retail credit employs relatively simple models, so perhaps there is no
model risk?
• But model complexity is not the only source of model risk (although it is
an important one for pricing models).
• In the following slides I will consider several possible sources of model
risk.
• Note: This is not an exhaustive list and also there is some
overlap between the various categories.
• Later, I give some examples from retail finance to illustrate when there
could be model risk issues.
Sources of model risk
Statistical:
» Model misspecification
» Model efficiency/inefficiency
» Data problems and selection bias
» Robustness over time
» Inappropriate use
Other/management:
» Model development resources (analysts/time)
» Publication, implementation and software error
We consider only the statistical sources of model risk.
Model misspecification (1)
• Model structure
» Do we have the correct general model structure to model the
» In the past, it was common to use OLS. Now it is standard to use
logistic regression. Perhaps now we can ask if logit is the correct
link function?
» Is the basic linear scorecard correct? Is a nonlinear structure
more appropriate?
• Model assumptions: what are they and are we breaking them?
» Distributions on error terms (eg normality for OLS).
» Independence for observations in standard logistic regression. Is
this really true in retail credit?
Model misspecification (2)
• Inclusion of variables.
» Too few variables may lead to biassed estimates.
» Too many will lead to less efficient estimates and, hence, less
robust models.
• Variable transformations (to log or not to log?).
» With some variables like income, it is “standard” to take log.
» What about others? Age, eg?
» Some modellers use all weights-of-evidence – is this appropriate?
• Multicollinearity.
» Where predictor variables are themselves highly correlated, this
can lead to inefficient or wrong estimates (in particular, it can lead
to the wrong sign).
Model efficiency/inefficiency
Every model is inaccurate and every estimate is just that: an estimate.
Fortunately, most statistical models provide a measure of the accuracy of
estimates (ie the standard errors).
» This is not true of all models (eg standard linear discriminant
analysis and machine learning algorithms) – although it’s always
possible to bootstrap.
» Remember though that the accuracy of the standard errors
themselves can be suspect and is dependent on following model
assumptions (or relying on model robustness).
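The bootstrap mentioned above can be sketched as follows. This is a minimal illustration and not taken from the slides: the function name `bootstrap_se`, the 2,000 resamples and the toy data are all assumptions, and the statistic is a simple mean rather than a scorecard coefficient.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw a resample of the same size as the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the standard error.
    return statistics.stdev(estimates)


# Toy data (assumed): 400 draws from a standard normal distribution.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]

se_boot = bootstrap_se(sample, statistics.mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE = {se_boot:.4f}, formula SE = {se_formula:.4f}")
```

For the mean, the bootstrap figure should sit close to the textbook standard error. The same resampling loop applies to any statistic for which no formula is available, by passing a different `statistic` function.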
Data problems and selection bias
• Is the data appropriate for the modelling task?
» Reliability in data collection; eg how reliable is a self-assessment
of income?
» Or, eg, using an existing portfolio of predominantly older
customers to build a model for a card targeting young customers.
» Using a data set of accepted loan applications to build a scorecard
for all new applications.
• Of course, the last example is the problem of selection bias.
» It is a fairly well understood model risk issue in retail credit.
» Several reject inference techniques exist to handle it: eg parcelling
and augmentation.
Robustness over time (1)
• There are some problem domains where risk factors and
distributions on variables are stable over time.
• In such domains, models remain stable.
» For example, mortality scoring models based on physiology of
hospital in-patients (eg Apache III) are stable since human
physiology does not change much over time.
• However, consumer credit does not remain stable over time.
» Credit risk changes over the business cycle.
» Credit usage behaviour changes over time.
» Banks’ risk appetite changes over time.
» Innovations in technology and product development change risk.
• All of these time-varying factors affect the applicability of credit risk
models over time.
Robustness over time (2)
• Changes in the effect size of risk factors will have an obvious effect on
the applicability of a model.
• Population drift: Changes in the distribution of predictor or outcome
variables can also affect the robustness of the model.
• Slow versus sudden change (eg economic crisis) can have different
effects on the applicability of a model.
• Possible approaches to dealing with this problem:
» Rebuild models regularly and use a Champion/challenger environment.
» Dynamic models (ie including time-varying factors in the risk
model).
» Adaptive models.
Model robustness, in general
• The problem of model robustness over time generalizes to other
dimensions, eg geography or product type.
• For example, if we have a credit card product operating in UK, does
the same scorecard model apply to Ireland?
» How different will it be?
Inappropriate use
“In terms of risk control, you’re worse off thinking you have a model and
relying on it than in simply realizing there isn’t one.” *
A model may be built correctly.
However, it may be used for the wrong task.
For example, using a default model as the basis of a strategy on
customer retention…. Better to build a new model focussed on
* Emanuel Derman (1996), Model Risk, Goldman Sachs Quantitative
Strategies Research Notes
Consequences of model risk (1)
What are the consequences of model risk?
We need to measure the effect of model risk on model use:
(1) Explanatory model
• If it is important that the model is used as an explanatory model,
then bias and inefficiency in model estimation will be important.
• Eg for discussion with management and regulators.
(2) Forecasting
• Individual / account level;
• Aggregate / loss forecasting;
• Does the flat maximum effect provide some robustness against
model bias and inefficiency?
Consequences of model risk (2)
(3) Stress testing
• Predictions of outcome for extreme values.
• Typically, value-at-risk, expected shortfall, or scenarios.
• Effects of model risk on stress testing are likely to be different to
the effect on standard forecasts.
I now give some quick examples of model risk, looking at usage,
measurement issues and consequences….
Example 1: Misspecification / Misapplication
Performance of models for extreme cases *
Models work well at estimating expected values for “typical” cases from
the population.
However, how do they fare when predicting default rates (DR) for extreme
cases?
• In this experiment, a logistic regression model is built for credit card
data.
• DR is then predicted for an independent test set of extreme cases (with
respect to variables such as age and job) and compared with observed
DR.
* Work conducted by Alice Wang as part of her third year undergraduate
Example 1: Results
[Figure panels: Full model; Years in current job > 24; Income (log) > …; Years in residence > 41]
• We see that these models tend to under- or over-estimate DR for extreme cases.
• Interestingly, the model gives better forecast
• Note: all extreme criteria represent 2% of the test data (N=600).
Example 2: Selection bias
Simulation study
• The problem of selection bias in application models is well known and
several reject inference methods have been proposed.
• Unfortunately, in a real world context it is not usually possible to
accurately evaluate the extent of the bias, or the effectiveness of a
reject inference method, since outcomes for rejects are unknown.
• However, simulation studies can be used to show the effect. These are
valuable to demonstrate the extent of the problem.
Here is the result of a simulation study using an augmentation method.
• In a nutshell, augmentation is a method that weights observations
from the accepts; usually according to how typical they are of being
accepts, based on an Accept-Reject model.
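One common form of augmentation weights each accepted case by the inverse of its estimated acceptance probability. The sketch below assumes that form; the slides do not state the exact weighting formula used. The function name `augmentation_weights` and the probabilities are hypothetical.

```python
def augmentation_weights(p_accept):
    """Inverse-probability weights for accepted cases, rescaled to average 1."""
    raw = [1.0 / p for p in p_accept]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Hypothetical accept probabilities from an Accept-Reject model,
# one per accepted application.
p_accept = [0.95, 0.80, 0.50, 0.20]
weights = augmentation_weights(p_accept)

for p, w in zip(p_accept, weights):
    print(f"P(accept) = {p:.2f} -> weight = {w:.2f}")
```

Accepted cases that look most like rejects (low acceptance probability) get the largest weights, so they stand in for the rejected applications whose outcomes are unknown.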
Example 2: Results
1. Suppose we simulate 25,000 applications with two variables: income
($x_1$) and number of previous delinquencies ($x_2$) and outcome:
2. Reject 40% of applications using a scorecard.
3. Build an unbiassed model S1 on all applications:
» Score = $-2.05 + 1.47 x_1 - 0.64 x_2$
» (remember, in the real world we could not build S1 since we do
not have outcomes for rejects)
4. Now build a biassed model S2 based on just the 60% accepted
cases:
» Score = $-2.08 + 1.43 x_1 - 0.32 x_2$
Notice the difference in coefficient estimate on $x_2$.
Why does this happen?
Example 2 continued
[Figure panels: All applications; Accepted applications]
This graph shows the distribution is not the same for the accepted
population, compared to all.
Those with high numbers of delinquencies are under-represented.
This affects the model estimation.
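The under-representation can be illustrated with a small simulation. This is a sketch only: the income and delinquency distributions and the acceptance scorecard `accept_score` are assumptions and are not those of the study.

```python
import random
import statistics

rng = random.Random(42)

# Assumed toy population: standardized log income and a delinquency count (0-5).
applications = []
for _ in range(25000):
    income = rng.gauss(0.0, 1.0)
    delinq = min(5, int(rng.expovariate(1.0)))
    applications.append((income, delinq))


def accept_score(income, delinq):
    # Hypothetical acceptance scorecard, not the one used in the study.
    return 1.0 * income - 1.0 * delinq


# Reject the lowest-scoring 40% of applications.
ranked = sorted(applications, key=lambda a: accept_score(*a))
accepted = ranked[int(0.4 * len(ranked)):]

mean_all = statistics.mean(d for _, d in applications)
mean_acc = statistics.mean(d for _, d in accepted)
share_all = sum(d >= 2 for _, d in applications) / len(applications)
share_acc = sum(d >= 2 for _, d in accepted) / len(accepted)

print(f"mean delinquencies: all = {mean_all:.2f}, accepted = {mean_acc:.2f}")
print(f"share with 2+ delinquencies: all = {share_all:.3f}, accepted = {share_acc:.3f}")
```

Because the scorecard penalizes delinquencies, applicants with many of them are mostly rejected, so the accepted sample carries little information about that part of the population.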
Example 2 continued
A model using augmentation S3 uses only the sample of accepts like
S2, but weights observations with high delinquency more heavily in
the accepted sample.
Hence model estimation is closer to the unbiassed model:
• Score = $-2.05 + 1.59 x_1 - 0.46 x_2$
The new model also gives better results on an independent test set:
[Table: test-set results for S1 (unbiassed), S2 (biassed) and S3 (augmentation)]
One lesson here is that simulation studies are of value to give insight
into aspects of model risk that are not immediately measurable in the
real-world setting.
Example 3: Model estimation error
Incorporating model estimation error in loss forecasts
Take the log-odds score $s$ from a scorecard to build a univariate logistic
regression model.
• Of course, the coefficient estimate on $s$ is $\beta = 1$ (approximately).
• However, there is a standard error $\sigma > 0$ which allows us to
construct a CI for $\beta$: $(1 - z_{\alpha/2}\,\sigma,\ 1 + z_{\alpha/2}\,\sigma)$.
What consequences does this have in a real example?
Experiments with 50,000 credit cards where default rate=0.2: $\sigma = 0.0116$.
• This has a small and modest effect on estimates of PD:
If PD estimate with $\beta = 1$ is 0.2, then 99% CI gives (0.193, 0.207).
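The interval for PD can be checked with a short calculation. This sketch assumes a normal approximation for the coefficient, a standard error of 0.0116 on that coefficient, and a model intercept of zero; the function name `pd_from_score` is illustrative.

```python
import math
from statistics import NormalDist


def pd_from_score(s, beta=1.0):
    """PD implied by log-odds score s and slope beta (intercept assumed zero)."""
    return 1.0 / (1.0 + math.exp(-beta * s))


pd_point = 0.2                              # PD estimate when the slope is 1
se = 0.0116                                 # standard error of the slope
s = math.log(pd_point / (1.0 - pd_point))   # log-odds of default, negative here

z = NormalDist().inv_cdf(0.995)             # two-sided 99% critical value
beta_lo, beta_hi = 1.0 - z * se, 1.0 + z * se

# s is negative, so the larger slope gives the smaller PD.
pd_lo = pd_from_score(s, beta_hi)
pd_hi = pd_from_score(s, beta_lo)
print(f"99% CI for PD: ({pd_lo:.3f}, {pd_hi:.3f})")
```

Under these assumptions the calculation reproduces the interval (0.193, 0.207) quoted above.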
Example 3 continued
Effect on expected loss EL=PD x LGD x EAD:
However, if we look at Value-at-risk (VaR) of EL, then the small variation
in the model has a bigger impact.
Using Monte Carlo simulation of EL, either (A) with fixed coefficient
$\beta = 1$, or (B) with generated values $\beta \sim N(1, \sigma^2)$:
• At the 99% level, VaR for simulation study (B) is 4% higher than for
study (A).
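The comparison of (A) and (B) can be sketched with a deliberately simplified simulation. It is not the setup of the cited paper. It assumes a homogeneous portfolio of 50,000 accounts, each with PD = 0.2 when the coefficient is 1; independent defaults; LGD = EAD = 1, so the loss is the number of defaults; and a normal approximation to the binomial default count. The function name `simulate_var` is hypothetical.

```python
import math
import random


def simulate_var(random_beta, n_accounts=50000, pd_base=0.2, se=0.0116,
                 n_sims=20000, level=0.99, seed=0):
    """Return the `level` quantile of simulated portfolio loss."""
    rng = random.Random(seed)
    s = math.log(pd_base / (1.0 - pd_base))  # common log-odds score
    losses = []
    for _ in range(n_sims):
        # (A) keeps the coefficient fixed; (B) draws it from N(1, se^2).
        beta = rng.gauss(1.0, se) if random_beta else 1.0
        pd = 1.0 / (1.0 + math.exp(-beta * s))
        # Normal approximation to Binomial(n_accounts, pd).
        mean = n_accounts * pd
        sd = math.sqrt(n_accounts * pd * (1.0 - pd))
        losses.append(rng.gauss(mean, sd))
    losses.sort()
    return losses[math.ceil(level * n_sims) - 1]


var_a = simulate_var(random_beta=False)
var_b = simulate_var(random_beta=True)
print(f"VaR (A) = {var_a:.0f}, VaR (B) = {var_b:.0f}, "
      f"ratio = {var_b / var_a:.3f}")
```

The coefficient is shared by every account, so its uncertainty does not diversify away across the portfolio and widens the tail of the loss distribution. The size of the gap in this toy setting should not be expected to match the 4% reported above, which comes from a richer model.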
Based on Bellotti (2011), A simulation study of Basel II expected loss distributions
for a portfolio of credit cards. Journal of Financial Services Marketing
Example 4: Misspecification
Using Logit versus Poisson link function
In the context of large defaultable bond portfolios, Lucas and Verhoef*
experiment with Logit and Poisson link function.
• Note: there is a good rationale for using a Poisson link function since
default time can be modelled as a Poisson process.
How do the models perform in estimating expected loss?
* Lucas A and Verhoef B (2012), Aggregating Credit and Market Risk: the Impact
of Model Specification, working paper, Tinbergen Institute, VU University
Example 4 continued
For two segments, they report these results:
[Table: Expected Loss and VaR (99.9%) for the Low quality and Medium quality segments]
• Hardly any model misspecification problem for Expected Loss
• But, importantly, for VaR, Logit underestimates (relative to Poisson).
“model specification matters … This is surprising, as the shape of the link
function is deemed to be less important for computing capital
requirements.” *
Example 5: Robustness over time
Use of time-varying risk factors for loss forecasting
One approach to dealing with changing risk levels over time is to include
macroeconomic time series.
Survival models are a good way to do this since macroeconomic and
behavioural data can be included as time-varying covariates (TVCs).
• Model time to default as a failure event.
Experiment on portfolio of UK credit card data: *
» Training data: 400,000 credit cards over period 1999 to 2004.
» Forecast for 150,000 credit cards from 2005 to mid-2006.
* Bellotti and Crook (2009), Forecasting and stress testing credit card default using
dynamic models, working paper, Credit Research Centre, Edinburgh
Example 5: Results
Interest rate and unemployment rate are statistically significant when included in the model.
We compare default rate (DR) forecasts between models with application
variables (AV) only (eg age, income, employment status, housing status, at
application), behavioural variables (BV) and macroeconomic variables.
MAD = mean absolute difference between estimated and observed DR.
This shows an improvement in aggregate forecasts when macroeconomic
data is included in the model.
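The MAD measure can be computed as below. This is a sketch; the function name and the monthly default rates are hypothetical and are not results from the study.

```python
def mean_absolute_difference(estimated, observed):
    """Mean absolute difference between estimated and observed default rates."""
    if len(estimated) != len(observed):
        raise ValueError("series must have the same length")
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(estimated)


# Hypothetical monthly default rates over a three-month forecast period.
estimated_dr = [0.020, 0.030, 0.025]
observed_dr = [0.010, 0.050, 0.025]

mad = mean_absolute_difference(estimated_dr, observed_dr)
print(f"MAD = {mad:.4f}")
```

A lower MAD means the aggregate forecast tracks observed default rates more closely over the forecast period.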
• There is a genuine problem of model risk.
» We have seen some suggestive examples.
• We need to understand the sources of model risk.
• We need to know the consequences of model risk and how to
measure it.
• We need to find ways to manage model risk:
» Develop methods to reduce or control it, and
» Incorporate model risk in our decision making.
