Charles Y. Tan, PhD
USP Statistics Expert Committee
 Introduction of <1210>
 Key topics
 Accuracy and Precision
 Linearity
 LOD, LOQ, range
 Summary
USP <1210>
 United States Pharmacopeia
 General Chapters
 <1210> Statistical Tools for Method Validation
 Current status: a draft is published in Pharmacopeial
Forum 40(5) [Sept-Oct 2014]
 Seek public comments
Purpose of <1210>
 A companion chapter to <1225> Validation of
Compendial Procedures
 USP <1225> and ICH Q2(R1)
 USP <1033> Biological Assay Validation
 Statistical tools
 TOST, statistical equivalence
 Statistical power, experimental design
 tolerance intervals, prediction intervals
 Risk assessment, Bayesian analysis
 AIC for calibration model selection
Recent Framework
 Life cycle perspective
 procedure design
 performance qualification / validation
 ongoing performance verification
 ATP: Analytical Target Profile
 Pre-specified acceptance criteria
 Assume established
 Validation: confirmatory step
 Statistical interpretation of “validation”
Performance Characteristics
 Different statistical treatments
 Tier 1: accuracy and precision
 Statistical “proof” ATP is met
 Equivalence test / TOST
 Sample size / power, DOE
 Tier 2: linearity, LOD
 Relaxed evidential standard, estimation
 Sample size / power optional
USP General Chapter <1210>
Statistical Tools for Method Validation
Accuracy and Precision
 Separate Assessment Of Accuracy And Precision
 Confidence interval within acceptance criteria from ATP
 Combined Validation Of Accuracy And Precision
 γ-expectation tolerance interval: 100γ% prediction
interval for a future observation,
Pr (-λ ≤ Y ≤ λ) ≥ γ
 γ-content tolerance interval: 100γ% confidence of all
future observations
 Bayesian tolerance interval
Experimental Condition
 Yij = μ + Ci + Eij
 Ci: experimental condition
 combination of ruggedness factors: analyst, equipment,
or day
 DOE: experience the full domain of operating
 As independent as possible
 Eij: replication within each condition
 One-way analysis (w/ random factor): why?
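One way to see why the one-way random layout matters: replicates that share a condition also share the same Ci, so they are not independent, and the between-condition mean square (not the pooled replicate variance) drives the uncertainty of the mean. The sketch below estimates the variance components for a balanced design using the standard ANOVA method of moments. The function name and the data are hypothetical and only illustrate the model Yij = μ + Ci + Eij.

```python
import numpy as np

def one_way_variance_components(y):
    """Balanced one-way random-effects ANOVA.

    y: 2-D array, rows = experimental conditions (Ci), columns = replicates (Eij).
    """
    y = np.asarray(y, dtype=float)
    a, n = y.shape                       # a conditions, n replicates each
    cond_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Mean square between conditions and mean square within conditions
    msb = n * np.sum((cond_means - grand_mean) ** 2) / (a - 1)
    mse = np.sum((y - cond_means[:, None]) ** 2) / (a * (n - 1))
    var_e = mse                          # replication variance
    var_c = max((msb - mse) / n, 0.0)    # condition variance, truncated at zero
    return {"mean": grand_mean, "msb": msb, "mse": mse,
            "var_c": var_c, "var_e": var_e, "var_total": var_c + var_e}

# Hypothetical recoveries (% of nominal): 4 conditions x 3 replicates
y = [[99.0, 100.0, 101.0],
     [100.0, 101.0, 102.0],
     [98.0, 99.0, 100.0],
     [101.0, 102.0, 103.0]]
print(one_way_variance_components(y))
```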
Separate Assessment
 Closed form formulas:
 Accuracy: classic confidence interval for bias
 Precision: confidence interval for total variability under
one-way layout (Graybill and Wang)
 Power and sample size calculation
 Statement of the parameters: bias, variance
 E.g., CI of bias: [-0.4%, 1.1%], within ±5% (ATP)
 E.g., CI of total variability: ≤2.4%, within 3% (ATP)
 Implicit risk level: 95% confidence intervals
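The separate assessment can be sketched for a balanced one-way layout as follows. The bias interval uses the between-condition mean square with a-1 degrees of freedom. The upper bound on total variability uses the modified large-sample construction usually attributed to Graybill and Wang, for a sum of mean squares with positive coefficients. The function name, the default alpha, and the data are assumptions for illustration; the draft chapter's exact formulas may differ in detail.

```python
import numpy as np
from scipy import stats

def bias_ci_and_total_sd_bound(y, true_value, alpha=0.05):
    """y: rows = conditions, columns = replicates (balanced design)."""
    y = np.asarray(y, dtype=float)
    a, n = y.shape
    cond_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = n * np.sum((cond_means - grand_mean) ** 2) / (a - 1)
    mse = np.sum((y - cond_means[:, None]) ** 2) / (a * (n - 1))

    # Accuracy: two-sided 100(1 - 2*alpha)% CI for bias.
    # Comparing it to the ATP limits is a TOST at level alpha.
    bias = grand_mean - true_value
    half = stats.t.ppf(1 - alpha, a - 1) * np.sqrt(msb / (a * n))
    bias_ci = (bias - half, bias + half)

    # Precision: total variance = var_C + var_E = (1/n) MSB + (1 - 1/n) MSE
    c1, c2 = 1.0 / n, 1.0 - 1.0 / n
    df1, df2 = a - 1, a * (n - 1)
    var_total = c1 * msb + c2 * mse
    # Modified large-sample 100(1 - alpha)% upper bound
    h1 = df1 / stats.chi2.ppf(alpha, df1) - 1.0
    h2 = df2 / stats.chi2.ppf(alpha, df2) - 1.0
    upper_var = var_total + np.sqrt((h1 * c1 * msb) ** 2 + (h2 * c2 * mse) ** 2)
    return bias_ci, np.sqrt(upper_var)

# Hypothetical recoveries (% of nominal), true value 100
y = [[99.0, 100.0, 101.0],
     [100.0, 101.0, 102.0],
     [98.0, 99.0, 100.0],
     [101.0, 102.0, 103.0]]
bias_ci, sd_upper = bias_ci_and_total_sd_bound(y, true_value=100.0)
print(bias_ci, sd_upper)
```

Validation passes the separate assessment when the bias interval lies inside the ATP bias limits and the upper bound on total SD lies below the ATP precision limit.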
Combine Accuracy and Precision
 Statement of observation(s)
 Closed form formulas, but a bit more complicated
99%-expectation tolerance interval: e.g., [-4.3%, 5.0%] within
±10% (ATP)
99%-content tolerance interval: e.g., [-5.9%, 6.6%] within ±15%
 Bayesian tolerance interval
 “the aid of an experienced statistician is recommended”
 Simpler Alternative: directly assess the risk with the λ
given in ATP
 Pr (-λ ≤ deviation from truth ≤ λ|data)
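The direct risk statement can be illustrated in a deliberately simplified setting. The sketch assumes independent, normally distributed deviations from truth (no condition effect Ci) and a noninformative prior, under which the posterior predictive distribution of a future deviation is a Student t with n-1 degrees of freedom. In that special case the γ-expectation tolerance interval and Pr(-λ ≤ deviation ≤ λ | data) come from the same distribution. The function name and data are hypothetical; the one-way layout needs the more involved formulas the chapter refers to.

```python
import numpy as np
from scipy import stats

def expectation_interval_and_risk(dev, lam, gamma=0.99):
    """dev: deviations from truth (e.g. % of nominal); lam: ATP limit."""
    dev = np.asarray(dev, dtype=float)
    n = dev.size
    m, s = dev.mean(), dev.std(ddof=1)
    # Predictive scale for one future observation (simplifying assumption: iid normal)
    scale = s * np.sqrt(1.0 + 1.0 / n)
    t = stats.t.ppf((1.0 + gamma) / 2.0, n - 1)
    interval = (m - t * scale, m + t * scale)   # gamma-expectation interval
    pred = stats.t(df=n - 1, loc=m, scale=scale)
    prob_within = pred.cdf(lam) - pred.cdf(-lam)
    return interval, prob_within

# Hypothetical deviations from truth, in % of nominal
dev = [-1.0, 0.0, 1.0, 0.5, -0.5]
interval, prob = expectation_interval_and_risk(dev, lam=10.0)
print(interval, prob)
```

If the γ-expectation interval lies inside ±λ, the computed probability is at least γ, so the two views lead to the same decision in this setting.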
Scale of Analysis
 Pooling variances is central to stat analysis
 Variance estimates with df=2 are highly unstable
 Need to pool across samples, levels
 Variance at mass or concentration scale/unit
 Increase with level
 Solutions:
 Normalize with constants, e.g., label claim
Normalizing by observed averages makes stat analysis too
 Log transformation
 %NSD and %RSD
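The log-transformation route can be sketched as follows: on the log scale a constant relative error becomes a constant variance, so the within-group variances can be pooled across samples and levels, and the pooled value converts back to a %RSD through the lognormal relation RSD = sqrt(exp(s²) - 1). The function name and the data are hypothetical.

```python
import numpy as np

def pooled_rsd_from_logs(groups):
    """groups: list of replicate results, one list per sample or level."""
    ss, df = 0.0, 0
    for g in groups:
        logs = np.log(np.asarray(g, dtype=float))
        ss += np.sum((logs - logs.mean()) ** 2)   # within-group sum of squares
        df += logs.size - 1
    var_log = ss / df                             # pooled variance on the log scale
    rsd_percent = 100.0 * np.sqrt(np.exp(var_log) - 1.0)
    return var_log, df, rsd_percent

# Hypothetical triplicates at three levels: SD grows with level, RSD does not
groups = [[9.0, 10.0, 11.0], [90.0, 100.0, 110.0], [900.0, 1000.0, 1100.0]]
var_log, df, rsd = pooled_rsd_from_logs(groups)
print(df, rsd)
```

Each level contributes only 2 degrees of freedom on its own; pooling gives 6 here, which is the point of choosing a scale on which the variance is stable.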
Linearity
 Internal performance characteristic
 External view: accuracy and precision
 Transparency => credibility
 Appropriateness of standard curve fitting
 A model
 A range
 Better than the alternatives (all models are
 Proportional model: Y = β1X + ε
 Straight line: Y = β0 + β1X + ε
 Quadratic model: Y = β0 + β1X + β2X² + ε
Current Practices
 Pearson correlation coefficient
 Anscombe's quartet
 Lack-of-fit F test
 independent replicate
 Mandel’s F-test, the quality coefficient, and the Mark–
Workman test
 Test of significance
 Evidential standard: low, since it gives the benefit of the doubt to
the model you want
 Good precision may be “penalized” with a high false rejection rate
 Poor precision is “rewarded” with false confirmation of the
simpler and more convenient model
Anscombe's Quartet
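The point of Anscombe's quartet for linearity assessment can be checked numerically: four data sets with very different shapes (a noisy line, a curve, a line with one outlier, a single leverage point) share essentially the same Pearson correlation coefficient. The data below are Anscombe's published values; the variable names are illustrative.

```python
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Pearson r for each data set; all four come out at roughly 0.816
r_values = {name: np.corrcoef(x, y)[0, 1] for name, (x, y) in quartet.items()}
for name, r in r_values.items():
    print(name, round(r, 3))
```

A correlation coefficient near 1 therefore says little about whether a straight-line calibration model is appropriate.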
Two New Proposals
 Equivalence test, TOST, in concentration units
 Define maximum allowable bias due to calibration in ATP
 Construct 90% confidence interval for the bias comparing the
proposed model to a slightly more flexible model
 Closed form formula, complex
 Evidential standard: could be high, depends on the allowable bias
 Akaike Information Criterion, AICc
 Compare the AICc of the proposed model to a slightly more
flexible model (smaller wins)
 Very simple calculations
 Evidential standard: most likely among candidates
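The AICc comparison can be sketched for least-squares polynomial fits with normal errors, using AICc = n·ln(RSS/n) + 2k + 2k(k+1)/(n-k-1), where k counts the regression coefficients plus the residual variance. The function name and the calibration data are hypothetical, and unweighted least squares is an assumption made here for simplicity.

```python
import numpy as np

def aicc_polynomial(x, y, degree):
    """AICc of an unweighted least-squares polynomial fit of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 2                      # polynomial coefficients + error variance
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction

# Hypothetical calibration data: straight line plus small alternating noise
x = np.arange(10.0)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)

aicc_line = aicc_polynomial(x, y, degree=1)   # proposed model
aicc_quad = aicc_polynomial(x, y, degree=2)   # slightly more flexible model
print("straight line" if aicc_line < aicc_quad else "quadratic", "is preferred")
```

The smaller AICc wins; the extra penalty on the quadratic model means it is preferred only when it reduces the residual sum of squares by enough to justify the additional parameter.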
Different Burden of Proof
 Hypothesis Testing: Neyman-Pearson
 Frame the issue: null versus alternative hypotheses
 Goal: reject the null hypothesis
 Null hypothesis: protected regardless of amount of data
 Decision standard: beyond reasonable doubt
 Legal analogy: criminal trial
 Information Criteria: Kullback-Leibler
 Frame the issue: a set of candidate models
 Goal: find the best approximation to the truth
 Best: most parsimonious model given the data at hand
 Decision standard: most likely among candidates
 Legal analogy: civil trial
 Stepping-stone or tactical questions: information criteria are apt
alternatives to hypothesis tests
LOD: Using Prediction Bounds
Range and LOQ
 Range
 suitable level of precision and accuracy
 Both upper and lower limits
 acceptable precision and accuracy
 lower limit
 LOQ versus LOD
 Only one is needed for each use
 LOQ for quantitative tests
 LOD for qualitative limit tests
 LOQ calculation in ICH Q2: candidate starting values
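The ICH Q2 calculation referred to above is LOD = 3.3σ/S and LOQ = 10σ/S, where S is the calibration slope and σ is a standard deviation of the response. The sketch takes σ as the residual standard deviation of a straight-line fit, which is one of the options ICH Q2(R1) lists. The function name and data are hypothetical, and the results are candidate starting values to be confirmed experimentally, not validated limits.

```python
import numpy as np

def ich_q2_starting_limits(x, y):
    """Candidate LOD and LOQ from a straight-line calibration (ICH Q2 formulas)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    sigma = np.sqrt(np.sum(resid ** 2) / (x.size - 2))   # residual SD of the fit
    return {"slope": slope, "sigma": sigma,
            "lod": 3.3 * sigma / slope, "loq": 10.0 * sigma / slope}

# Hypothetical low-level calibration data (concentration, response)
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
resp = [2.1, 3.9, 6.2, 7.8, 10.1]
print(ich_q2_starting_limits(conc, resp))
```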
Summary
 A draft of USP <1210> is published, seeking public comments
 A step in the right direction?
 More than a bag of tools
 Implement modern validation concepts with a
statistical structure
 More tool development needed
 More statistician involvement needed in
pharmacopeia and ICH development
