### Resolution and Refinement

```
Resolution: Implications in Refinement
Swanand Gore & Gerard Kleywegt
May 6th 2010, 12-1 pm
Macromolecular Crystallography Course
Outline
• Intuitive idea of resolution – why higher order diffraction is better.
• Parameters, model, observations, refinement – more data is better.
• Observations, parameters, over-fitting in crystallographic refinement.
• Features that can be modeled at various resolutions.
• Refinement practices at low and high resolution.
Idealized diffraction in 1D
[Figure: diffraction orders h1, h2, h3 and -h1, -h2, -h3. Images scanned from David Blow’s book]
Idealized diffraction in 1D
Assuming:
• B = 0
• Occupancy = 1
• Uniform scattering power in all directions.
• Phase angles = 0
[Figure: diffraction orders -h3 to h3, with resolution increasing with order. Images scanned from David Blow’s book]
Idealized diffraction in 1D
• Higher order diffraction
• Higher Fourier coefficients
• Higher frequency wave in real space
• Sharper signal
• Greater resolution
Images scanned from David Blow’s book
What separation can be resolved?
• Nominal resolution
  – The h-th order diffracted wave samples the lattice at an interval of a/h.
  – a/h is the crystallographic resolution that is routinely quoted.
  – In a cell with orthogonal axes a, b, c, reflection hkl comes from planes separated by
    d = 1 / √[ (h/a)² + (k/b)² + (l/c)² ]
  – For an orthorhombic cell of 100, 95, 90 Å and highest order reflection 50, 52, 48:
    1/d² = 0.2500 + 0.2996 + 0.2844 = 0.8341, so the resolution is ~1.09 Å.
• For non-orthogonal axes, corrections apply.
• Resolution intuitively means the least distance between objects below which they cannot be distinguished.
  – For 3D crystallography, it is ~0.92*dmin, almost the same as nominal.
[Figure labels: O, O, blob]
Images from B. Rupp’s book
Image from Gerard’s ppt.
Atomic scatterers in 1D
• Peaks get sharper as higher resolution Fourier coefficients are included.
[Figure: Fourier coefficients & phases passed through a resolution filter; atom labels C, O, C, O, C, O]
Images from B. Rupp’s book
Occupancy and B factors
• Peaks get broader due to larger B factors and shorter due to lower occupancy.
Images from B. Rupp’s book
Data truncation
• Happens naturally due to B factors.
• Truncated data leads to incomplete reverse FT, which causes ripples.
• Ripples around heavy atoms can ‘drown’ nearby lighter atoms.
• Ripples can seem to originate from real atoms.
[Figure annotations: N’s at 0.5 occupancy? O at 0.5 occupancy?]
Images from B. Rupp’s book
Diffracting duck in 2D
• Leaving out higher order diffraction data will reduce the detail retrieved through reverse transform.
• Leaving out lower resolution data will blur the boundaries.
• Randomly absent data is not too problematic for maps.
  – Doesn’t matter if Rfree set is or is not used in map calc?
Images from Kevin Cowtan’s website.
Make everything as simple as possible, but not simpler…..
F_h exp(iφ_h) = V Σ_i f_i o_i exp(2πi h·x_i) exp(-B_i sin²θ/λ²)
Noise.
Errors in data collection.
Static, dynamic disorder.
Estimate phases.
Model xyz, B, o.
Model solvent.
…..
ρ(x) = (1/V) Σ_h F_h exp(-2πi h·x + iφ_h)
Images from B. Rupp’s book
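Worked example (a sketch with an assumed B factor, not a value from the course, showing how the exp(-B sin²θ/λ²) term attenuates high-resolution terms; uses Bragg's law in the form sin²θ/λ² = 1/(4d²)):
• Assume B = 20 Å²
  – At d = 4 Å: sin²θ/λ² = 1/64 ≈ 0.0156, factor = exp(-0.3125) ≈ 0.73
  – At d = 2 Å: sin²θ/λ² = 1/16 = 0.0625, factor = exp(-1.25) ≈ 0.29
  – At d = 1 Å: sin²θ/λ² = 1/4 = 0.25, factor = exp(-5) ≈ 0.007
• The contribution of this atom falls steeply with resolution, which is why large B factors limit the observable data.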
Choosing resolution cutoff
• B factors and scattering factors impose a natural cutoff on what can be observed.
• Reliability of measurement is indicated by S/N ratio and completeness.
• Signal to noise ratio
  – A low SNR does not matter too much if a proper maximum likelihood target is used to weigh in error estimates.
  – High <I/σ(I)> matters when collecting data for phasing.
• Completeness
  – Low completeness in the highest resolution shell does not confer the level of detail to the map that the nominal resolution implies.
  – Effective resolution = dmin · C^(-1/3)
  – Randomly or systematically missing data creates undesirable effects in reverse FT.
  – Completeness > 0.95
• Number of reflections increases as the inverse cube of dmin.
  – 2/(3z) · π · V_UC / dmin³
  – Not unique due to centro-symmetry and spacegroup symmetry
Images from B. Rupp’s book
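Worked example (a sketch with assumed numbers, not values from the course; z is read here as the number of symmetry operators, which is an assumption):
• Effective resolution with incomplete data, assuming dmin = 2.0 Å and C = 0.80
  – C^(-1/3) = 0.80^(-1/3) ≈ 1.077
  – Effective resolution ≈ 2.0 × 1.077 ≈ 2.15 Å, noticeably worse than nominal
• Reflection count, assuming V_UC = 855,000 Å³ (a 100 × 95 × 90 Å orthogonal cell), dmin = 2.0 Å and z = 4
  – Full sphere: (4/3)π · V_UC / dmin³ ≈ 447,700 reflections
  – Halved by centro-symmetry and divided by z: 447,700 / 8 ≈ 56,000 unique reflections
  – Halving dmin to 1.0 Å multiplies the count by 2³ = 8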
Model and refinement
• Model is defined as a set of parameters and a set of functions over parameters, designed to explain observations.
• Refinement
  – Is an algorithmic process of fitting a model to explain observations, by assigning optimal values to parameters.
  – Reduces the differences between observations and model-calculated values of observations.
• A linear model in 2D consists of 2 parameters
  – Y = mX + c
• Some models are more accurate than others, depending on quality of refinement.
• Refinement is necessary when observations contain errors and there are enough observations to refine the parameters.
[Figure labels: Observations, Well-refined model, Ill-refined model, 1, m, c]
Model and refinement
• A linear model in 2D
  – consists of 2 parameters: Y = mX + c
  – 1 observation, howsoever accurate, is not sufficient if model has 2 parameters
  – 2 distinct accurate observations are sufficient to determine the linear model
  – 3 accurate observations over-determine the model
  – But observations generally contain random error! A greater number of observations leads to error cancellation and a more accurate model
• Quality of modelling
  – Model with too few params can lead to under-fitting
  – Model with too many params can lead to over-fitting
• Many models can be imagined
  – Choice of model (linear, quadratic, higher polynomial?)
  – Quality of refinement (R value)
[Figure labels: Well-determined model; Under-determined, over-fitted model; Fitting to error too!]
Images from B. Rupp’s book
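Worked example (a sketch with made-up observations, to illustrate an over-determined fit of Y = mX + c by least squares):
• Three observations (X, Y) with small random errors: (0, 1.1), (1, 2.9), (2, 5.2)
• Least-squares estimates
  – Means: <X> = 1, <Y> = 9.2/3 ≈ 3.067
  – m = Σ(X - <X>)(Y - <Y>) / Σ(X - <X>)² = 4.1 / 2 = 2.05
  – c = <Y> - m·<X> ≈ 3.067 - 2.05 = 1.017
• Residuals Y - (mX + c): +0.083, -0.167, +0.083
  – No line passes exactly through all three points; the fit spreads the error across them.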
Model and refinement
• In presence of errors, refinement quality does not indicate model quality
  – Well-refined model is of bad quality if it was fitted to erroneous observations.
• Hence, observations not subject to refinement are required to assess the accuracy.
  – R and “free” R
  – M1: 0.2, 0.3
  – M2: 0.2, 0.4
  – M1 better than M2
• Free R and data/param ratio help in comparing models with different numbers of parameters
  – MA: 0.2, 0.3. d/p = 15/2 = 7.5
    • Under-fit
  – MB: 0.1, 0.2. d/p = 15/3 = 5
    • Optimal
  – MC: 0.01, 0.25. d/p = 15/10 = 1.5
    • Overfitting = Low d/p, high Rfree
[Figure labels: M1, M2, MA, MB, MC, Occam’s valley]
Images from B. Rupp’s book
A crystallographic model
• Biochemical entities
  – Biopolymers
    • polypeptides, polynucleotides, carbohydrates
  – Small-molecule ligands (ions, organic)
    • Physiologically relevant, e.g. heme, ions
    • Synthesized molecules, e.g. a drug candidate
  – Solvent
• Coordinates, Displacement
  – Unique x,y,z
  – Partial, multiple, absent (occupancy)
  – Isotropic or anisotropic B factors
  – TLS approximation
• Crystallographic etc.
  – Cell, symmetry, NCS
  – Bulk solvent correction (Ksol, Bsol)
http://www.cgl.ucsf.edu/chimera/feature_highlights/ellipsoids.png
B factor putty from Antonyuk et al. 10.1073/pnas.0809170106
www.ruppweb.org/xray/tutorial/Crystal_sym.htm
Quick note on NCS, TLS
• Non-crystallographic symmetry
  – Molecule/s -> ASU -> locally-related ASUs -> Unitcell -> Crystal
  – Sometimes ASU can consist of multiple, nearly identical subunits.
  – The transformation operator between subunits is local and distinct from space-group operators.
  – Subunits need not be identical because they are in different environments; differences do not indicate problems!
  – This additional symmetry can be used in refinement (restraints, constraints) and validation.
• Translation-libration-screw
  – Overall anisotropy = lattice disorder + inter-molecular motions + intramolecular rigid body motions within molecule + atomic anisotropy
  – Paradigm shift from atom-level anisotropy modelling to anisotropic movements of rigid bodies
  – 1d: a point (3) through which rotation axis (2) will pass + ratio (1) of rotation to translation on that axis = 6
  – 2d: 2 points + 2 ratios + 2 orthogonal axes (3) + 2 more ratios = 13
  – 3d: 3 points + 3 ratios + 3 orthogonal axes (3) + 6 more screws = 20(ish)
  – TLS group granularity can range from full domain to sidechain
Images from Rupp book and Martyn Winn ppt
Counting parameters
• Average-case parameters
  – Per atom 4 params
    • 3 params for coordinates
    • 1 param for isotropic B factor
  – No hydrogens, 1 water per residue
  – 8 atoms per residue
  – N * 8 * 4 = 32 N
• Increasing the parameters
  – 6 params per atom for anisotropic B factor (>2x)
  – Refining occupancy (1.25x) or multiple occupancy
  – Hydrogens modeled explicitly (8 per residue) (2x)
  – Multiple models (M x)
• Reducing parameters
  – 20 params per TLS group
    • 5 groups: 20 * 5 groups of 40 res each = 100
    • => 32 * 200 to 100 (1/64 x for 200 res protein)
  – Strict NCS (1/n x for n-fold)
Restraint counts taken from: http://ccp4wiki.org/~ccp4wiki/wiki/images/9/9f/Winn_prague09_data_parameters.pdf
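Worked example (a sketch applying the average-case counts above to an assumed 200-residue protein):
• Atoms: 200 residues × 8 atoms = 1600
• xyz + isotropic B: 1600 × 4 = 6400 params
• xyz + isotropic B + occupancy: 1600 × 5 = 8000 params (1.25x)
• xyz + anisotropic B: 1600 × (3 + 6) = 14400 params (2.25x)
• TLS-only displacement model with 5 groups: 5 × 20 = 100 params, 1/64 of 6400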
1clm, calmodulin, 1.8Å
• 1132 protein atoms + 4 Ca + 71 waters = 4828 xyzB
• #unique reflections = 10610
• Data / params = 2.2
1exr, calmodulin, 1Å
• 1467 protein atoms with alt conf + 5 Ca + 178 waters
• 4950 xyz + 9900 anisotropic B + 316 occupancy = 15166 params
• #unique reflections = 77150
• Data / params = 4.6
1h6v, 3Å
• 6 TLS groups = 120 params
• 22514 protein atoms + 552 ligand atoms + 9 waters
• xyzB = 92300 (residual)
• #unique reflections = 69328 (5% free)
• d/p = 69328/92300 = 0.7
Data to parameters ratio
• r = (number of unique reflections) / (number of parameters)
  – Graph for a calmodulin 1up5, ~2500 atoms, xyzB
  – r < 1, i.e. under-determined, at resolutions worse than 2.5Å (dmin > 2.5Å)
  – Reflections-based refinement is possible only for r > 10, i.e. resolution approaching 1Å!
  – But most PDB entries have r ~ 2-5
• There must be more observations provided to refinement than only the reflections
  – Reflections = observations specific to a particular MX experiment
  – But there are other more general observations applicable to any MX refinement
  – Covalent geometry, steric clashes, ….
Image from B. Rupp’s book
Observations to parameters ratio
• Observations = reflections + constraints and restraints based on well-known features of macromolecules
• o/p > d/p
• Bonds, angles, planarity, chirality…
  – Tricky to estimate the difference due to dependences, but generally sufficient to make refinement possible
  – 1exr: 1Å, 22732 restraints
  – o/p = (22732 + 77150) / 15166 = 6.1 > 4.6 = d/p
[Figure: energy vs. length. Hangman: CONvict = CONstraint. Bungee jumper: RElaxation = REstraint.]
Images from Gerard’s slides
Restraint counts taken from: http://ccp4wiki.org/~ccp4wiki/wiki/images/9/9f/Winn_prague09_data_parameters.pdf
Observations to parameters ratio
• o/p > d/p for 1h6v at 3Å
– Restraints (including NCS) = 209378
– o/p = (209378+69328)/92300 = 3
– d/p = 0.7 < 3 = o/p
• 2 components of refinement residuals
– Data-based
• Changes model (xyzB..) to reduce the difference between Fo and Fc
– Knowledge-based
• Changes model (xyz) to take values of geometric features towards idealized values
– Qtot = wx Qx + Qgeom
– Small wx : greater stress on geometric correctness
• Low resolution, low d/p
– Large wx : model is allowed to deviate from ideal geometry
• High resolution, high d/p
Restraint counts taken from: http://ccp4wiki.org/~ccp4wiki/wiki/images/9/9f/Winn_prague09_data_parameters.pdf
Greater d/p => more detail (given decent phases)
0.95Å
Image from http://www.crystal.uwa.edu.au/px/alice/projects/SCOA_atomic.html, 1mxt
Images from Rupp’s book
Lower d/p => lower detail
Decent phases often not available
2g34, 5Å
1z56, 3.9Å
Pics of 2g34, 1z56 with coot using EDS maps
Lower d/p => lower detail
2bf1, 4Å
Pics of 2bf1 with coot using EDS maps
All resolutions not equal…
From Gerard’s slides and Phil Evans
Levels of detail interpretable at various resolutions
Protein feature                     Resolution (Å)   Nucleic acid feature    Resolution (Å)
Helix                               9                Double helix            20
Sheet                               4                Single strand           12
Main chain                          3.7              Stacked base pairs      4
Aromatic sidechains                 3.5              Phosphates              3.5
Small sidechains                    3.2              Purine or pyrimidine?   3.2
Sidechain conformations             2.9              Individual bases        2.7
Carbonyl, peptide                   2.7              Ribose pucker           2.4
Ordered waters                      2.7              Individual atoms        1.5
Central dimple of aromatic ring     2.4
Correct stereochemistry at Ile CB   2.2
Proline pucker                      2.0
Individual atoms                    1.5
Orbitals and bonds (beyond 1Å)!
From David Blow’s book
Rules of thumb at all resolutions for
model-building and refinement
• Be very conservative till a majority of backbone is identified and produces
stable refinement
• Prioritize: Backbone > side-chains > small-mols > waters
• Be aware of prevalent modeling practices at your resolution
• Whole model contributes to quality of region of interest.
• Use similar structures for comparison and copying.
• Use quality criteria often.
Low resolution refinement
• Low resolution structures offer great biological insights.
  – Mainly for complexes e.g. 70S ribosome at 7Å, SIV gp120 envelope glycoprotein at 4Å
  – Large complexes generally diffract to lower resolution.
    • Components may have physiologically relevant conformations only in complexed states.
    • High impact
  – In absence of better resolution, low resolution data must be used.
  – Low resolution does not have to mean low quality!
• Basic guidelines for model building and refinement.
  – Low d/p => Be cautious of biasing the model
  – Make extensive use of information in addition to reflections
  – Use as few parameters as possible
  – Increase params only when confident
Images from Karmali et al. Acta Cryst. 2009.
Low resolution refinement
• Build model with fewer parameters
– Mainchain-only model
– Constrain B factor values to be isotropic and
constant.
– Full occupancies only.
– TLS to model anisotropic motions of rigid
domains.
– Strictly constrained or restrained NCS to reduce
params many-fold
– No waters or small molecules, use only ‘bulk
solvent’
Low resolution
refinement
• Model cautiously
– Initial tracing
• Build regions that are likely to be
seen clearly
• Good packing, low B factors, bulky
group, electron-rich groups
• core, mainchain, helices, big
sidechains, bases, phosphates
– Sequence registry
• Beware of register and topology
errors
• Guess sequence register from bulky
sidechains
• Extend the register by trial and error
• Check sequence register with a
homologous structure
• Truncate to Gly wherever unsure of
residue identity
From Gerard’s slides
Low resolution refinement
– Try copying fragments from other high resolution
structures when there is clear homology
– Treat ligands extra-carefully
• Copy high-quality observed conformation or
predicted low energy conformation
• Restrain tightly unless there is density and other
clues to deviate
Axel T. Brunger et al. 2009. Acta Cryst D 65 128–133 X-ray structure determination at low resolution.
Low resolution refinement: density modification tools
• Expected solvent density
  – define solvent boundary
  – followed by solvent flattening / flipping, histogram matching
Images from B. Rupp’s book and from Acta Cryst. (2003). D59, 1881-1890. The phase problem. G. Taylor
Brunger 2006, Low resolution crystallography. Acta Cryst.
https://wasatch.biochem.utah.edu/chris/tutorial/Density_Modification.pdf a ppt on DM
Low resolution refinement: density modification
• Averaging maps of NCS-restrained copies
Image from B. Rupp’s book.
Brunger 2006, Low resolution crystallography. Acta Cryst.
https://wasatch.biochem.utah.edu/chris/tutorial/Density_Modification.pdf a ppt on DM
Low resolution refinement: density modification
• B-factor sharpening
  – High-resolution reflections get attenuated most by B factors
  – Application of negative B factors can artificially up-weight high-res terms to obtain a more detailed but possibly noisier map
Brunger 2006, Low resolution crystallography. Acta Cryst.
https://wasatch.biochem.utah.edu/chris/tutorial/Density_Modification.pdf a ppt on DM
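Worked example (a sketch with an assumed sharpening B factor, applying the scale factor exp(-B_sharp · sin²θ/λ²) with sin²θ/λ² = 1/(4d²)):
• Assume B_sharp = -50 Å²
  – At d = 10 Å: factor = exp(50/400) = exp(0.125) ≈ 1.13
  – At d = 5 Å: factor = exp(50/100) = exp(0.5) ≈ 1.65
  – At d = 3 Å: factor = exp(50/36) ≈ exp(1.39) ≈ 4.0
• High-resolution terms are scaled up several-fold relative to low-resolution terms, along with their noise.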
Low resolution refinement
• Refinement techniques
  – Rigid body refinement
    • A fragment is constrained to be internally rigid and has only 6 degrees of freedom
    • B factor is isotropic and constant
    • Powerful first step of refinement needing only low resolution data
    • Arbitrary rigid fragments (high quality helices, high-resolution domain structures) can be optimized for location and orientation relative to each other to yield better phases and maps
  – Torsion angle refinement
    • Bonds, angles, chirality, planarity are not variables; only torsion angles are refined
    • Protein is divided into rigid subgroups to sample thoroughly a limited conformational space
    • Higher radius of convergence, reduced overfitting
Image from Schwieters, C.D. & Clore, G.M. (2001) Internal coordinates for molecular dynamics and minimization in structure determination and refinement. J. Magn. Reson. 152, 288-302
Nice tutorial at http://speedy.st-and.ac.uk/~naismith/workshop/torsion.pdf
See Axel Brunger’s papers on torsion angle refinement
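Worked example (a sketch of parameter counts; the protein size, the 3-domain split and the average of 4 torsion angles per residue are assumptions for illustration):
• Assumed 200-residue protein with 8 atoms per residue = 1600 atoms
• Cartesian coordinates: 1600 × 3 = 4800 params
• Rigid body refinement with 3 rigid domains: 3 × 6 = 18 params
• Torsion angle refinement with ~4 torsions per residue (φ, ψ and about two χ angles): 200 × 4 = 800 params, about 1/6 of the Cartesian count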
Low resolution refinement
• Solving multiple times
  – Try to automate as much as possible the process of model building and refinement, and then repeat it
  – Consensus substructures are more reliable, average them
  – Regions with differences are unreliable, remove them
  – Gives an idea of precision
• Gradual increase in number of parameters
  – Mainchain -> bulky sidechains -> sequence register -> other sidechains
  – Finally known small mol binders with known binding site can be modelled if reasonable density appears
• Validation
  – Keep track of Ramachandran and sidechain rotamers
  – Remove unlikely parts of mainchain and sidechain
  – Do not restrain Rama distribution or sidechains to rotamers during refinement, it may give false validation results
• Read what others are doing for low resolution
  – e.g. Axel Brunger’s literature, CCP4 & phenix tools, CCP4bb
Images from wikipedia and Furnham et al. Structure 2006.
High resolution refinement
• High resolution structures provide atomic insights
  – Packing, binding
  – Flexibility
  – Enzyme mechanisms
  – Hydration
• Basic guidelines for model building and refinement
  – High d/p => Be cautious of under-fitting!
  – Make greater use of data than in low res case
  – Make as detailed a model as possible, esp of interesting regions
  – Check all empty density critically
High resolution refinement
• Allow model to deviate from geometry when
data is strong
– Weight on xray term can be slowly increased
to reveal any unusual geometry without risking
model bias
• Use automation to fit biopolymers
– Trace secondary structure automatically, in
coot or with phenix tools
– Trace mainchain and build sidechains using
programs, e.g. with buccaneer, warpNtrace,
Rapper
– Do this multiple times to identify regions
requiring manual attention
Validation tools: can they indicate the information content of macromolecular crystal structures? EJ Dodson et al. Volume 6, Issue 6, 1998, 685-690.
Image from Terwilliger et al. papers in Acta Cryst D on automatic chain tracing.
High resolution refinement
• Explain all unoccupied
density
– Is it due to ligands?
• Build expected ligands
• Search unexpected small-mols
• E.g. coot or phenix ligand tools
– Is it due to multi-conformer
sidechains?
– Is it water?
Images from B. Rupp’s book and Terwilliger et al. Acta Cryst. 2005.
High resolution refinement
• Build waters
– Peak-pick semi-automatically to
form a reasonable hydration
network with sidechains
• Model hydrogens
– When difference density is visible
Image from B. Rupp’s book
Atomic resolution crystallography reveals how changes in pH shape the protein microenvironment. Lyubimov et al. Nature Chemical Biology 2, 259 - 264 (2006)
High resolution refinement
• Verify correct sidechain orientations of NQH
  – Manually or automatically flip NQH sidechains to improve H-bonding
  – Model more sidechain conformations if necessary
• Use non-standard atomic scattering models
  – At subatomic resolution, model electron density with a nonspherical multipolar model, or model bonds as scatterers
Image from B. Rupp’s book
Afonine et al. Acta Cryst. (2007). D63, 1194–1197
Jelsch et al. PNAS 2000 97 7 3171.
High resolution refinement
• Even in high res, maintain order of
  – bb > sc > ligand
  – Anisotropy, multiconformers, waters, hydrogens
• Invest more parameters around the regions of interest
  – Multi-conformers
  – Anisotropy
  – Waters near active site
  – Possibility of multiple ligands
  – Releasing constraints / restraints
Image from Antonyuk S V et al. PNAS 2005;102:12041-12046
Image from David Blow’s book.
Summary
• Resolution is the least distance between Bragg planes with observable reflection. Two atoms closer than resolution cannot be observed distinctly using data at that resolution.
• Resolution dictates the detail revealed by electron density maps.
  – Low resolution => low detail
  – High resolution => high detail
• Parameters in the model must be chosen to suit the resolution.
• Over-fitting can be detected using Rfree and data to parameter ratio.
• Knowledge-based constraints and restraints augment experimental data to make refinement possible.
• Geometric target is weighted more than crystallographic data at low resolution. Model is allowed to diverge from ideal geometry at high resolution.
• Greater detail should be modelled at higher resolution to make best use of data.
Acknowledgements
• Alejandro & IPMont MX organizers
• Sameer Velankar, Jawahar Swaminathan (EBI)
• Online resources
– Kevin Cowtan
– Rupp web