### parameter estimation in large scale kinetic models of microorganisms

```
The Systems Biology Modelling Cycle
EBI-BioPreDyn Workshop
12-15 May, 2014, UK
Parameter Estimation in Large-Scale Kinetic Models of Microorganisms
Alejandro F. Villaverde
(Bio)-Process Engineering group
IIM-CSIC
What is a kinetic model? (I)
Many biological processes are non-stationary, time-dependent, dynamic.
Example: metabolism
Central carbon metabolism (CCM) of E. coli
Chassagnole et al., Biotechnol. Bioeng. 79(1), 2002
What is a kinetic model? (II)
• Kinetic model: mathematical model of a dynamic system
• Includes mathematical expressions (rate laws) for the rates at which the biochemical reactions take place
• → These rate equations describe fluxes as a function of concentrations
Example: kinetic model of E. coli’s CCM
Mass balance equations:
Example: kinetic model of E. coli’s CCM (in COPASI)
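
# A minimal sketch, not taken from the slides: a kinetic model is a set of
# mass balances dx/dt = N * v(x, p), where N is the stoichiometric matrix and
# v(x, p) holds the rate laws. The three-reaction pathway (uptake -> x1 -> x2 ->
# out), the Michaelis-Menten parameters and their values are all hypothetical,
# and a fixed-step RK4 integrator stands in for the stiff solvers used in
# practice (e.g. LSODA in COPASI).

# Stoichiometric matrix: rows = species (x1, x2), columns = reactions (v1, v2, v3)
N = [[1, -1, 0],
     [0, 1, -1]]


def rates(x, p):
    """Rate laws: constant uptake v1, then two Michaelis-Menten steps v2, v3."""
    x1, x2 = x
    v1 = p["k_in"]
    v2 = p["Vmax2"] * x1 / (p["Km2"] + x1)
    v3 = p["Vmax3"] * x2 / (p["Km3"] + x2)
    return [v1, v2, v3]


def rhs(x, p):
    """Mass balances: dx_i/dt = sum_j N[i][j] * v_j(x, p)."""
    v = rates(x, p)
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]


def simulate(x0, p, t_end, dt=0.01):
    """Integrate with classic 4th-order Runge-Kutta; returns a list of (t, x)."""
    x, t = list(x0), 0.0
    trajectory = [(t, list(x))]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x, p)
        k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], p)
        k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], p)
        k4 = rhs([xi + dt * k for xi, k in zip(x, k3)], p)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += dt
        trajectory.append((t, list(x)))
    return trajectory


# Usage: with these made-up values the pathway relaxes to the steady state in
# which every flux equals k_in, i.e. x1 = x2 = 0.5.
params = {"k_in": 1.0, "Vmax2": 2.0, "Km2": 0.5, "Vmax3": 3.0, "Km3": 1.0}
t_last, x_last = simulate([0.0, 0.0], params, t_end=50.0)[-1]
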
Why use kinetic models?
• Think of an example application: an industrial fermentation process
• We would like to understand (and ideally improve) how a particular metabolite is produced in a bioreactor
• Dynamic process: different events can affect the outcome
“Genome-scale kinetic models of metabolism are important for rational design of the metabolic engineering required for industrial biotechnology applications. They allow one to predict the alterations needed to optimize the flux or yield of the compounds of interest, while keeping the other functions of the host organism to a minimal, but essential, level.”
Large-scale metabolic models: From reconstruction to differential equations
K Smallbone, P Mendes. Industrial Biotechnology 2013, 9: 179–184
Kinetic models vs. GEMs
• GEMs = “GEnome-scale Metabolic models”
• GEMs focus on stoichiometry, not dynamics
• GEMs include a large set of reactions, without kinetic detail
• Constraint-based methods (e.g. flux balance analysis, FBA) use GEMs to calculate steady-state fluxes [GEMs are also called constraint-based models]
• However, GEMs cannot predict how behavior emerges from dynamic concentration changes of cellular components
• → To do this, kinetic models are needed
Kinetic models from GEMs
• It’s possible to start from a constraint-based model to build a kinetic model
• Procedure:
2. Add generic rate laws (linlog, Michaelis-Menten-like kinetics)
3. Estimate unknown kinetic constants
• Smallbone & Mendes presented a pipeline for creating thermodynamically consistent kinetic models, using limited data and ensuring consistency with known data and kinetic constants
Large-scale metabolic models: From reconstruction to differential equations
K Smallbone, P Mendes. Industrial Biotechnology 2013, 9: 179–184
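
# A minimal sketch, not from the slides and not Smallbone & Mendes' actual
# pipeline: two generic rate laws of the kind listed above, each parameterized
# so that it reproduces a reference flux v_ref at reference concentrations x_ref
# (for instance a steady state taken from the constraint-based model).
# Function names, arguments and numbers are hypothetical.
import math


def linlog_rate(x, x_ref, v_ref, elasticities, e=1.0, e_ref=1.0):
    """Linlog kinetics: v = v_ref * (e/e_ref) * (1 + sum_i eps_i * ln(x_i/x_ref_i))."""
    log_terms = sum(eps * math.log(xi / xr)
                    for eps, xi, xr in zip(elasticities, x, x_ref))
    return v_ref * (e / e_ref) * (1.0 + log_terms)


def mm_vmax_from_reference(v_ref, s_ref, km):
    """Choose Vmax so that v = Vmax*s/(Km+s) equals v_ref at s = s_ref."""
    return v_ref * (km + s_ref) / s_ref


def mm_rate(s, vmax, km):
    """Irreversible Michaelis-Menten rate law."""
    return vmax * s / (km + s)


# Usage: both rate laws return v_ref at the reference state. Away from it, the
# remaining unknown constants (elasticities, Km) are what step 3 has to estimate.
v_lin = linlog_rate([2.0, 1.0], x_ref=[1.0, 1.0], v_ref=3.0, elasticities=[0.5, -0.2])
vmax = mm_vmax_from_reference(v_ref=3.0, s_ref=1.0, km=0.5)
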
What is a “large-scale” kinetic model?
• Large-scale models have (at least):
– dozens of reactions and species
– hundreds of parameters
• Example: E. coli’s CCM model
– 18 species (= dynamic states)
– 30 reactions
– 139 parameters
Which models of microorganisms exist, and where to find them?
• Several large-scale (LS) kinetic models of microorganisms have been built, mostly for E. coli and S. cerevisiae
• Talk by P. Mendes on Thursday: “Large-scale kinetic models of E. coli and yeast”
• Model building takes time and resources.
• Are there (LS) kinetic models available?
• Yes! See databases, e.g.:
• BioModels http://www.ebi.ac.uk/biomodels-main/
• CellML http://models.cellml.org/cellml
(although most of these models are not really LS)
BioPreDyn-bench
• Collection of benchmark problems for parameter estimation (PE) in LS models
• Includes:
– Yeast, metabolic
– 2 × E. coli (metabolic; metabolic + transcriptional regulation)
– CHO, metabolic
– D. melanogaster, development
– Generic signaling network
• Available on the web (very soon!):
http://www.iim.csic.es/~gingproc/biopredynbench/index.html
• MATLAB, AMIGO, COPASI, C, SBML
So why are kinetic models not widely used (yet)?
• Kinetic models: very useful, but… still an exception in biotech applications
• Problem: incomplete knowledge of
– Regulatory interactions
– Kinetic parameters
• This leads to limited accuracy of predictions
• → Parameter estimation (PE) is one of the ways of addressing this problem
How to build a kinetic model?
Model building steps:
1. Define the purpose of the model
2. Establish the network structure
(“wiring diagram”) of the model
3. Determine kinetic rate expressions
4. Model structure = network structure + kinetics
5. Determine the parameters
6. Validate the model
“Kinetic models in industrial biotechnology – Improving cell factory performance”
J Almquist, M Cvijovic, V Hatzimanikatis, J Nielsen, M Jirstrand. Metabolic Engineering 2014
Parameter determination
• Parameter values are sometimes established one by one, either from targeted experiments measuring them directly or from other types of a priori information on individual parameter values.
• Alternatively, parameter values can be determined simultaneously using parameter estimation (PE) methods
• Parameter estimation as an optimization problem (previous talk by Eva Balsa-Canto)
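
# A minimal sketch, not from the slides: casting parameter estimation as an
# optimization problem means minimizing a misfit between simulated and measured
# outputs. Here the cost is a weighted least-squares sum,
#   J(p) = sum_k ((y_model(t_k, p) - y_data_k) / sigma_k)^2,
# and the "model" is a hypothetical first-order decay x(t) = x0 * exp(-k t)
# (the analytic solution of dx/dt = -k x), so no ODE solver is needed.
import math


def model_output(t, params):
    """Hypothetical model: first-order decay with parameters x0 and k."""
    x0, k = params
    return x0 * math.exp(-k * t)


def cost(params, t_data, y_data, sigma):
    """Weighted sum of squared residuals between model output and data."""
    return sum(((model_output(t, params) - y) / s) ** 2
               for t, y, s in zip(t_data, y_data, sigma))


# Usage: synthetic, noise-free "measurements" generated with p_true = (2.0, 0.3),
# so the cost is zero at p_true and positive for any other parameter vector.
t_data = [0.0, 1.0, 2.0, 4.0, 8.0]
p_true = (2.0, 0.3)
y_data = [model_output(t, p_true) for t in t_data]
sigma = [0.1] * len(t_data)
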
Parameter estimation
Overview of PE methods
• Local vs. global:
– Local methods converge to the closest optimum
– When several optima exist, global optimization (GO) methods must be used
• Deterministic vs. stochastic:
– Deterministic GO methods guarantee that the solution is the global optimum, but the computational effort is very high
– Stochastic GO methods do not guarantee the global optimality of the solution, but they are frequently capable of finding excellent solutions in reasonable computation times
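
# A minimal sketch, not from the slides, of why local methods can fail: plain
# gradient descent converges to the optimum closest to its starting point. The
# one-dimensional test function f(x) = x^4 - 3x^2 + x is hypothetical. It has a
# local minimum near x = 1.13 and its global minimum near x = -1.30. Restarting
# the local method from several points (a "multistart") is the simplest global
# strategy, but it offers no guarantee of finding the global optimum.

def f(x):
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x, step=0.01, n_iter=2000):
    """Local method: follow the negative gradient from the initial guess x."""
    for _ in range(n_iter):
        x -= step * df(x)
    return x


def multistart(starts):
    """Run the local method from each start and keep the best solution."""
    candidates = [gradient_descent(x0) for x0 in starts]
    return min(candidates, key=f)


# Usage: starting at x = 2 gives the local minimum, starting at x = -2 gives
# the global one, and the multistart keeps the latter.
x_from_right = gradient_descent(2.0)
x_from_left = gradient_descent(-2.0)
x_best = multistart([-2.0, -0.5, 0.5, 2.0])
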
Parameter estimation: Optimization methods
• LOCAL NLP (nonlinear programming) solvers: converge to the optimum closest to the initial guess; may end up in local solutions
• GLOBAL NLP solvers
Metaheuristics
• Heuristic: procedure based on expert knowledge, not on formal analysis
• Metaheuristic: general-purpose heuristic method designed to guide an underlying problem-specific heuristic
• A metaheuristic is therefore a general algorithmic framework that can be applied to different optimization problems with relatively few modifications
Metaheuristic approaches are a particularly efficient class of stochastic GO methods. They combine mechanisms for exploring the search space and exploiting the knowledge obtained.
PE in LS kinetic models in biology
• The difficult problem of PE of LS kinetic models:
– Nonlinear systems
– Multi-modal problems (several local minima)
– Need for time-series data (usually scarce)
– Lack of identifiability
– Overfitting
– Aligning the model with the data…
– Computational issues (integrators, tolerances, …)
– Different timescales: stiffness
– CPU times can be very large (days, weeks…)
– → Stochastic (or hybrid) GO methods (metaheuristics)
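
# A minimal sketch, not from the slides, of why stiffness matters. For the
# fast-decaying test equation dx/dt = -lam * x (lam = 1000 is a hypothetical
# "fast" rate), explicit Euler with step dt multiplies x by (1 - lam*dt) at each
# step and blows up once lam*dt > 2. Implicit (backward) Euler multiplies x by
# 1 / (1 + lam*dt) and stays stable for any dt. This is why implicit stiff
# solvers, such as the BDF methods available in LSODA and CVODES, are used for
# models that mix fast and slow timescales.

def explicit_euler(lam, x0, dt, n_steps):
    x = x0
    for _ in range(n_steps):
        x = x + dt * (-lam * x)
    return x


def implicit_euler(lam, x0, dt, n_steps):
    x = x0
    for _ in range(n_steps):
        # Solve x_new = x + dt * (-lam * x_new) for x_new (linear, so exact)
        x = x / (1.0 + lam * dt)
    return x


# Usage: with dt = 0.01, lam*dt = 10. Explicit Euler grows by a factor of 9 in
# magnitude per step, while implicit Euler decays toward zero.
x_explicit = explicit_euler(1000.0, 1.0, 0.01, 20)
x_implicit = implicit_euler(1000.0, 1.0, 0.01, 20)
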
Some classic, stochastic, nature-inspired GO methods
• Genetic Algorithms: a population of candidate solutions is evolved toward better solutions. Each candidate solution has a set of properties (chromosomes) that can be mutated
• Swarm intelligence: Ant Colony Optimization, Particle Swarm Optimization… mimic the movement of agents in a swarm
• Simulated Annealing mimics the annealing process in metallurgy: slow cooling of a material to produce crystals (the temperature controls the probability of accepting worse solutions)
• Etc.
doi: 10.5923/j.eee.20120204.09
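
# A minimal sketch, not from the slides, of simulated annealing on a
# hypothetical one-dimensional function. A random neighbour is always accepted
# if it is better. If it is worse by delta > 0, it is accepted with the
# Metropolis probability exp(-delta / T). The temperature T is lowered
# geometrically, so worse moves become rarer as the search proceeds. Step size,
# initial temperature and cooling factor are arbitrary illustrative choices.
import math
import random


def acceptance_probability(delta, temperature):
    """Metropolis criterion: 1 for improvements, exp(-delta/T) otherwise."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def simulated_annealing(f, x0, lb, ub, t0=10.0, cooling=0.995,
                        n_iter=3000, step=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temperature = t0
    for _ in range(n_iter):
        # Propose a Gaussian neighbour and clip it to the bounds
        y = min(max(x + rng.gauss(0.0, step), lb), ub)
        fy = f(y)
        if rng.random() < acceptance_probability(fy - fx, temperature):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        temperature *= cooling  # geometric cooling schedule
    return best_x, best_f


# Usage on the double-well function f(x) = x^4 - 3x^2 + x, started inside the
# basin of its local minimum (x = 2). The best point visited is returned.
best_x, best_f = simulated_annealing(lambda x: x ** 4 - 3 * x ** 2 + x,
                                     2.0, -2.0, 2.0)
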
PE methods: the eSS family
Scatter Search (SS): population-based metaheuristic (Glover 1977).
Main differences with GA:
• SS orients its exploration systematically, relative to a set of reference points (RefSet). This allows it to exploit the information gathered by each solution.
• Besides, SS includes the Improvement Method (local search)
Five-method template:
1. Diversification Generation Method
2. Improvement Method
3. Reference Set Update Method
4. Subset Generation Method
5. Solution Combination Method
PE methods: eSS
1. Diversification Generation Method: generates solutions
2. Improvement Method: enhances solutions
3. RefSet Update Method: selects a reference set of solutions (according to quality or diversity)
4. Subset Generation Method: produces subsets of solutions from the RefSet
5. Solution Combination Method
“Scatter search for chemical and bio-process optimization”
JA Egea, M Rodríguez-Fernández, JR Banga, R Martí. J Glob Optim (2007) 37:481–503
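
# A deliberately simplified sketch of the five-method scatter search template
# described above. It is not eSS and not the SSmGO/MEIGO code. The
# compass-search improvement method, the RefSet size, the 50/50
# quality/diversity split and the line-based combination are hypothetical
# choices made for illustration. Unlike eSS, it applies the local search to every
# new solution, which would be far too expensive when each cost evaluation
# requires an ODE simulation.
import itertools
import math
import random


def improve(f, x, lb, ub, step=0.5, tol=1e-8):
    """Improvement Method: compass search, a simple derivative-free local search."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] = min(max(y[i] + d, lb[i]), ub[i])
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def update_refset(candidates, size):
    """Reference Set Update Method: best solutions by quality, then by diversity."""
    ranked = sorted(candidates, key=lambda s: s[1])
    refset = []
    for s in ranked:
        if len(refset) == size // 2:
            break
        if all(math.dist(s[0], r[0]) > 1e-6 for r in refset):
            refset.append(s)
    remaining = [s for s in ranked if s not in refset]
    while len(refset) < size and remaining:
        # Most diverse: maximize the distance to the closest RefSet member
        far = max(remaining, key=lambda s: min(math.dist(s[0], r[0]) for r in refset))
        refset.append(far)
        remaining.remove(far)
    return refset


def scatter_search(f, lb, ub, ref_size=6, n_div=30, n_iter=20, seed=0):
    rng = random.Random(seed)
    dim = len(lb)
    # Diversification Generation Method: random points in the search box
    pool = [[rng.uniform(lb[i], ub[i]) for i in range(dim)] for _ in range(n_div)]
    refset = update_refset([improve(f, x, lb, ub) for x in pool], ref_size)
    for _ in range(n_iter):
        children = []
        # Subset Generation Method: all pairs of RefSet members
        for (xa, _), (xb, _) in itertools.combinations(refset, 2):
            # Solution Combination Method: random point on the line through xa and xb
            lam = rng.uniform(-0.5, 1.5)
            child = [min(max(a + lam * (b - a), lb[i]), ub[i])
                     for i, (a, b) in enumerate(zip(xa, xb))]
            children.append(improve(f, child, lb, ub))
        refset = update_refset(refset + children, ref_size)
    return min(refset, key=lambda s: s[1])


# Usage on a simple 2-D test function (the sphere); for a real PE problem, f
# would simulate the model and return the least-squares cost.
x_best, f_best = scatter_search(lambda x: sum(v * v for v in x), [-5.0, -5.0], [5.0, 5.0])
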
PE methods: the eSS family
Enhanced Scatter Search (eSS):
• Advanced implementation of the SS metaheuristic
• Combines SS with local methods (hybrid methodology) to accelerate convergence to the optimum
• Includes several improvements over the original method
• Developed for parameter estimation in LS biological problems
Egea JA, Martí R, Banga JR: An evolutionary method for complex-process optimization. Computers and Operations Research 2010, 37(2):315–324.
eSS
PE methods: the eSS family – extensions and implementations
• CeSS: parallel cooperative version of eSS
• SSmGO toolbox: eSS in MATLAB
http://www.iim.csic.es/~gingproc/ssmGO.html
• AMIGO: includes eSS, in MATLAB
http://www.iim.csic.es/~amigo/
• MEIGO: includes eSS & CeSS in MATLAB & R (& Python interface to R)
http://www.iim.csic.es/~gingproc/meigo.html
• COPASI also includes SS in its latest release
• SS implementation in C presented at this workshop (poster)
Example: PE of a LS kinetic model (I)
• LS kinetic model of yeast (UNIMAN, University of Manchester)
• Largest model included in BioPreDyn-bench (B1)
• 1759 parameters, 285 reactions, 276 species
• Implementation difficulties
• Numerical problems: integration errors (COPASI: LSODA; MATLAB: CVODES)
Example: PE of a LS kinetic model (II)
• Ready-to-run implementations in AMIGO and COPASI
• PE settings:
– Parameter bounds: [0.2×nominal, 5×nominal]
– In AMIGO: eSS + DHC (dynamic hill climbing, as local solver)
– Maximum computation time allowed: 1 week
• Results: see next slides
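
# A minimal sketch, not from the slides, of the bound settings above. Each
# parameter may vary between 0.2x and 5x its nominal value, i.e. by a factor of
# 5 in either direction, which is a symmetric interval in log space. Drawing
# initial guesses uniformly in log10(p) rather than in p is a common choice for
# such bounds. The function names and the nominal values are hypothetical.
import math
import random


def relative_bounds(nominal, low=0.2, high=5.0):
    """Lower and upper bounds as fixed multiples of the nominal values."""
    return [low * p for p in nominal], [high * p for p in nominal]


def sample_log_uniform(lb, ub, rng):
    """One random parameter vector, uniform in log10 space within [lb, ub]."""
    return [min(max(10 ** rng.uniform(math.log10(l), math.log10(u)), l), u)  # clip guards round-off
            for l, u in zip(lb, ub)]


# Usage: the geometric mean of each bound pair recovers the nominal value.
nominal = [2.0, 0.05, 130.0]
lb, ub = relative_bounds(nominal)
rng = random.Random(1)
samples = [sample_log_uniform(lb, ub, rng) for _ in range(1000)]
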
Example: PE of a LS kinetic model (III)
FITS
Example: PE of a LS kinetic model (IV)
Convergence curve
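
# A minimal sketch, not from the slides, of how a convergence curve like this
# one can be recorded: wrap the cost function so that every evaluation logs
# (evaluation count, elapsed time, best cost so far). The class name and its
# fields are hypothetical; tools such as AMIGO or MEIGO produce their own logs.
import time


class ConvergenceLogger:
    def __init__(self, cost):
        self.cost = cost
        self.n_evals = 0
        self.best = float("inf")
        self.history = []  # (n_evals, seconds since start, best cost so far)
        self._t0 = time.perf_counter()

    def __call__(self, params):
        value = self.cost(params)
        self.n_evals += 1
        if value < self.best:
            self.best = value
        self.history.append((self.n_evals, time.perf_counter() - self._t0, self.best))
        return value


# Usage: the logged best-so-far values never increase. Plotting them against
# evaluations or CPU time gives the convergence curve.
logged = ConvergenceLogger(lambda p: (p[0] - 1.0) ** 2)
for guess in ([3.0], [0.0], [2.0], [1.5], [1.0]):
    logged(guess)
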
Example: PE of a LS kinetic model (V)
Final remarks
• Kinetic modeling: adequate modeling framework for dynamic systems
• LS kinetic models are not yet widely used in systems biology, because uncertainties (e.g. in kinetic parameters) limit their applicability
• Parameter estimation is necessary to address this issue
• PE in LS kinetic models is problematic (and costly)
• Stochastic or hybrid GO methods are preferred
• Tomorrow, 10:30h: practical session on PE and optimal experimental design (OED)
Recommended recent bibliography:
On kinetic models:
• “Kinetic models in industrial biotechnology – Improving cell factory performance”
J Almquist, M Cvijovic, V Hatzimanikatis, J Nielsen, M Jirstrand. Metabolic Engineering 2014
• “Advancing metabolic models with kinetic information”
H Link, D Christodoulou, U Sauer. Current Opinion in Biotechnology 2014, 29:8–14
• “Modeling metabolic systems: the need for dynamics”
H-S Song, F DeVilbiss, D Ramkrishna. Current Opinion in Chemical Engineering 2013, 2:373–382
Yeast model (and others):
• “Yeast 5 – an expanded reconstruction of the Saccharomyces cerevisiae metabolic network”
BD Heavner, K Smallbone, B Barker, P Mendes, LP Walker. BMC Systems Biology 2012, 6:55
• “Large-scale metabolic models: From reconstruction to differential equations”
K Smallbone, P Mendes. Industrial Biotechnology 2013, 9:179–184
• “BioPreDyn-bench: a suite of benchmark problems for dynamic modelling in systems biology”
AF Villaverde, D Henriques, K Smallbone, S Bongard et al. In preparation
PE methods:
• “An evolutionary method for complex-process optimization”
JA Egea, R Martí, JR Banga. Computers and Operations Research 2010, 37(2):315–324
• “A cooperative strategy for parameter estimation in large scale systems biology models”
AF Villaverde, JA Egea, JR Banga. BMC Systems Biology 2012, 6:75
• “MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics”
JA Egea, D Henriques, T Cokelaer, AF Villaverde et al. BMC Bioinformatics 2014, arXiv:1311.5735