Bayesian modeling of nonsampling error
Alan M. Zaslavsky
Harvard Medical School

General setup for nonsampling error
• Focus on measurement error problem
  – Item responses with error
  – Item or unit nonresponse as a special error response
  – …or nonresponse as part of error for aggregates
• Y = data measured with error
• Y* = latent “true” values (object of inference)
  – Might be observed for part of data (calibration)
• X = covariates
  – Assumed (for presentation) correct and complete
  – Include design information

Objective of inference
• Estimate statistics of “true” values f(Y*)
• Estimate parameters of models
  – From likelihood standpoint: inference from L(θ | Y*, X)
  – (Specifically) from Bayesian standpoint, draw from P(θ | Y*, X)
• Both possible if we have draws of Y*
  – Multiple imputation for valid inferences

Two ways to factorize distribution
• Predictive factorization: P(Y, Y* | X, β, β*) = P(Y* | Y, X, β*) P(Y | X, β)
  – Direct prediction of Y* for imputation
• “Scientific” factorization: P(Y, Y* | X, β, β*) = P(Y | Y*, X, β) P(Y* | X, β*)
  – First factor is observation (measurement error) model
  – Second factor is model for true relationships

More on “scientific” factorization
• Separates two distinct processes
  – Information might be from different sources
  – Possibility of more (or different) generalizability
• Models are more interpretable
  – Incorporate prior information for specification and parameters
  – Easier to assess “congeniality” of models?
    • Compare model for P(Y* | X, β*) with model involving θ
  – Simplifications? e.g.
    P(Y | Y*, X, β) = P(Y | Y*, β)

Inference with “scientific” factorization
• Computations via Gibbs sampler
  – Imputation of Y* by Bayes’s theorem
  – Complete-data inferences for β, β*
• Inferences of scientific interest (θ)
  – Multiple imputation inference using Y*
  – Direct from model if θ = θ(β*)

Possible sources for measurement error model parameters (β)
• Calibration study
  – Sample of (Y, Y*) pairs to identify the two parameters
  – For robustness, important to build in adequate flexibility to avoid identifying off unverified model assumptions about P(Y | X, β, β*)
• Prior studies (also used Bayesianly as prior)
  – Previous calibration model estimates, if measurement process is consistent
  – Synthesis of accumulated survey methodology

Example 1: Correction for underreporting in study of chemotherapy for colorectal cancer
• Provision of guideline-recommended adjuvant chemotherapy a critical issue in quality of care for cancer
• Cancer registries as a source of chemo data
  – Excellent population coverage
  – Underreporting of treatment

California study
• Cancer registry data
  – Statewide coverage
  – About 70,000 cases over 5 years in relevant stages (appropriate for chemotherapy)
• Calibration survey
  – Request medical record data from physicians
  – Limited in time (1+ year) and space (3 of 10 regions)
  – 1956 cases in sample, 1449 (74%) respond

Reporting of adjuvant therapy
• Follow-up survey response rate higher …
  – at HMO-affiliated and high-volume hospitals
  – when chemo reported in original record
• 82% of adjuvant therapy was reported to Registry (among “respondents”)
  – Substantial underestimation if Registry alone used
  – More complete in teaching hospitals, HMO affiliates, high volume hospitals, younger and rectal cancer patients
Cress et al., Medical Care 2003

Naïve estimation of administration of adjuvant chemotherapy
• Analysis based only on “gold standard” survey + Registry data in sample
• Strong variation by patient characteristics
  – Age (less if older), marital status
  – Race
    (less if Black, more if Hispanic, Asian)
  – Income (upward gradient with higher income)
• Substantial unexplained hospital-level variation
Ayanian et al., J Clinical Oncology 2003

Limitations of standard analytic approaches
• Survey respondents alone:
  – Small portion of available California data (1449/70,000)
  – Single area of state
  – Unrepresentative due to survey nonresponse
  – Confounding of survey response, reporting, treatment variation (e.g. volume effects)
• Registry data alone:
  – Underreporting of chemotherapy
  – Reporting is nonuniform

Combining Registry and survey data
• Combine
  – power of large Registry data
  – correction for underreporting based on survey
• Simple correction based on:
    P(reported chemo) = P(chemo) P(report | chemo)
  Therefore:
    P(chemo) = P(reported chemo) / P(report | chemo)

Registry plus simple correction
• In survey:
    P(reported chemo) = 59%
    P(report | chemo) = 82%
    P(chemo) = 59%/82% ≈ 71%
• Outside survey (mostly rest of state):
    P(reported chemo) = 49%
    P(report | chemo) = 82%
    P(chemo) = 49%/82% ≈ 60%
Depends on assumption that reporting is similar in the two areas

Model-based methodology (Yucel and Zaslavsky)
• Disaggregated model
  – Take into account individual effects on both chemotherapy and reporting
  – Take into account hospital variation in both chemotherapy and reporting
• Imputation of chemo for individual cases
  – Allow fitting of any desired models
  – Multiple imputation to obtain proper measures of uncertainty with imputed data

Models for reporting and therapy
• Logit or Probit regression for therapy (outcome)
  – Patient p has characteristics x_hp: age, sex, race/ethnicity, comorbidity score (Charlson), tumor stage/site, income category
  – Hospital h has characteristics z_h: volume, ACOS-certified registry, teaching
  – Random effect γ_h for hospital h
    logit P(chemo_hp) = β x_hp + λ z_h + γ_h
• Similar model (with or without random effect) for reporting given therapy
  – Random effects for reporting & therapy could be correlated

Two versions of
hierarchical model
[Diagram, two panels: (a) single random effect, (b) bivariate RE. Labels: Outcome, Reporting, Parameters, Latent “true” status, Observed status]

Fitting the model
• Full Bayesian specification
  – Diffuse priors for coefficients, (co)variances
• Fit via Gibbs sampling: alternately
  – Impute true chemo status for non-survey cases
  – Draw random hospital effects γ
  – Draw “fixed” coefficients β, λ and variance components Σ

Imputing chemo status (Bayes’s theorem)
• Example: consider individual (not in survey) for whom models give
  – Prior P(chemo) = 70%
  – Prior P(reporting | chemo) = 80%
• If chemo reported, then true chemo = 1
• If chemo not reported:
  – P(no chemo, no report) = 30%
  – P(chemo, no report) = 70% × 20% = 14%
  – P(chemo | no report) = 14%/(14% + 30%) ≈ 32%
  – Impute chemo = 1 with probability 32%

Computing: probit via latent variables
• Probit model: Φ⁻¹(P(Y_hp = 1)) = β x_hp + λ z_h + γ_h
  – Equivalently: Y_hp = 1 ↔ ε_hp < β x_hp + λ z_h + γ_h, where ε_hp ~ N(0,1) is a normal latent variable (Albert & Chib 1993)
  – Equivalently, Y_hp = 1 ↔ u_hp = β x_hp + λ z_h + γ_h − ε_hp > 0
  – Observing Y_hp implies truncated normal posterior for u_hp given higher-level parameters β, λ, γ_h
• Given a draw of u_hp, higher levels reduce to normal multilevel model with observation u_hp and fixed variance = 1 at bottom level (well-known problem)
  – independent of the discrete data or imputed values
  – direct generalization to correlated bivariate response

“Restricted” inference for robustness
• Two kinds of information involved in inference for “reporting” model
  – “Direct” in survey sample (1449 cases): Y | Y*, parameters, X
  – “Indirect” in remaining area (~74,000+ cases): Y | parameters, X (combines outcome & reporting models)
  – Possibly sensitive to model misspecification?
• Ad hoc solution: Restrict likelihood for reporting model to direct data from reporting survey cases
  – Throw away some information from others
  – Greater robustness to slight misspecification?
  – Reparametrize Σ as regression γ(R) | γ(O) & marginal γ(O)

Direct interpretation of fitted model
• Effects broadly similar to those in naïve (sample only) analyses.
  – Volume effect on reporting but not on chemo
  – Lower chemo rate outside survey region
• Substantial hospital random effects in both reporting and therapy rates
  – Indication of substantial unexplained variation: a problem (from health services standpoint)!
  – Reporting completeness and therapy rates not (residually) correlated

Using imputations to estimate effect of chemotherapy on survival
• Re-fit model including 2-year survival as predictor of chemotherapy
• Using imputed corrected chemotherapy, fit model with chemotherapy (and other variables) as predictor of survival
  – Correct variances with multiple imputation
  – Missing info ≈70% for chemo, 1-4% for other variables
• Finds significant positive effect (OR=1.26) of chemo on survival
  – [Are the severity controls good enough?]

Modeling critical with missing data
• Several kinds of missing data:
  – Unreported chemotherapy
  – Nonresponse to followback (validation) survey
  – Areas excluded from followback survey
• Potential for confounding if unjustifiable MCAR (or insufficiently conditional MAR) assumptions are made
  – MCAR = Missing Completely at Random: missingness independent of everything
  – MAR = Missing at Random: missingness independent of unobserved, conditional on observed

Some counterintuitive results!
Hospital Volume Low Med High 63 73 78 81 81 92 All 75 87 54 44 51 68 66 53 63 72 62 48 58 71 70 44 63 71 74 53 69 69 73 47 67 69
Survey response rate
Reporting completeness in survey
Chemotherapy rates by registry
  Survey respondents 60
  Survey nonrespondents 40
  All 52
Chemotherapy rates by survey 77
Chemotherapy rates by hybrid method
  Survey respondents 80
  Survey nonrespondents 40
  All 65
Chemotherapy rates under model 67

Limitations and potential design improvements
• Major limitation: calibration survey is unrepresentative (in known ways)
  – Only covers some areas (trial implementation)
  – Differences by region in reporting are plausible
  – Can evaluate sensitivity to alternative assumptions
• Could improve design for ongoing studies
  – Sample across entire area
  – Quality improvement for both therapy and reporting

Example 2: Adjustment for measurement bias of 1990 Post Enumeration Survey
• Post-Enumeration Survey provides estimates of proportional error in Decennial Census estimates
  – Includes whole-household and within-household under- and overenumerations
  – Tabulated for poststrata of individuals defined by household-level (region, urbanicity) and individual-level (age, sex, race/ethnicity) variables

Notation for undercount estimation (Zaslavsky 1993, JASA)
• k = domain index
• c_k = population share of domain k
• y*_k = true census underenumeration rate
• ŷ_k = (biased) estimate of y*_k from survey
• y_k = E ŷ_k = expectation, b_k = y_k − y*_k = bias
• b̂_k = unbiased estimate of b_k, E b̂_k = b_k
• Constraints: Σ c_k y*_k = Σ c_k y_k = Σ c_k ŷ_k = Σ c_k b_k = 0 (sum of errors in shares is 0).
• Sampling variance of ŷ = Var(ŷ | y) = V_y

Components and variance of b̂_k
• Sources of bias estimates (total error model)
  – Small calibration studies to estimate process errors (matching, geocoding, fabrications)
  – Model-based estimates of correlation bias
  – Uncertainty about imputation model
• Var(b̂ − b) = V_b includes
  – Sampling variances from calibration studies,
  – Uncertainty across correlation bias models,
  – (Multiple) imputation variance and model uncertainty

A naïve approach and its problems
• Simple bias corrected estimate is ŷ − b̂
  – Unbiased estimator of y*
  – Variance is V_y + V_b and V_b is likely to be large
  – Problem for non-Bayesian approaches: if we have very little data to estimate something, must we assume that it could be “anything”?
• Alternative (Bayesian) approach: introduce reasonable prior beliefs
  – Bias terms b_k are a collection centered around 0
  – Characterize variability by variance component
  – Similar argument for undercount terms y_k

Hierarchical model for estimation and bias correction
• “Sampling” model:
  \[
  \begin{pmatrix} \hat{y} \\ \hat{b} \end{pmatrix}
  \sim N\left( \begin{pmatrix} y \\ b \end{pmatrix},
  \begin{pmatrix} V_y & 0 \\ 0 & V_b \end{pmatrix} \right)
  \]
  – Not exactly “sampling” since some model uncertainty is included in V_b
• “Structural” (Level 2) model:
  \[
  \begin{pmatrix} y \\ b \end{pmatrix}
  \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma_y^2 U & \rho_{yb}\,\sigma_y \sigma_b U \\ \rho_{yb}\,\sigma_y \sigma_b U & \sigma_b^2 U \end{pmatrix} \right)
  \]
  – Undercount and bias terms each drawn from common distribution
  – Proportional covariance structures for each and for correlation of the two
  – Matrix U based on a prior “similarity” of domains (number of common characteristics)

Priors and inference
• Fairly vague priors for variance components, correlation
  – These represent assessments of degree of variation in bias, undercount and how they relate across domains
  – Key to this inference is existence of collection of domains
• Inference via Gibbs sampler
• Extensive simulations
  – Compare to uniform shrinkage, hypothesis testing approaches, etc.
  – Suggested that full hierarchical Bayes model would outperform competitors

Analyses with 1992 data
• Data combined 3 sources
  – 1990 census
  – Post-Enumeration Survey
  – Various sources of bias component estimates
• Estimates:
  – Substantial differential undercount, ~y 1.2%
  – Substantial differential bias, ~b 3.2%

Refinement: misaligned domains (Zaslavsky 1992, Proc. SRMS)
• Domains for bias estimates might differ from those for y
  – e.g. if they combine the main domains
  – Observation is b̂_0 = X b̂
• Modifies the sampling model:
  \[
  \begin{pmatrix} \hat{y} \\ \hat{b}_0 \end{pmatrix}
  \sim N\left( \begin{pmatrix} y \\ Xb \end{pmatrix},
  \begin{pmatrix} V_y & 0 \\ 0 & X V_b X' \end{pmatrix} \right)
  \]
• Applied to 1992 data:
  – 357 poststrata, 51 poststratum groups, but only 10 evaluation poststrata

Other potential applications
• Domain-level estimates
  – No gold standard data for individuals
  – No individual-level corrections
• Many applications where there are small evaluation samples for a measure
  – Welfare or food stamp payment error
  – Quality evaluations in medical care

Example 3: Imputation of households to correct for enumeration error
• Setting: Census (or survey) of households with errors of enumeration
  – Whole-household errors
  – Within-household errors
  – [Assumption (here) that all errors are omissions]
• Objective: To (multiply) impute corrected rosters.
  – Add persons to households
  – Impute additional households

Bayesian imputation strategy (Zaslavsky 2004; Zaslavsky & Rubin 1989 Proc.
ARC)
• Based on “scientific” factorization
  – Prevalence model: distribution of households by compositional type (roster of members by poststratum), P(Y*bk=t | bk)
    bk = (latent) parameter of block b
  – Observational model: probability of observed types (with error), P(Ybk=u | Y*bk=t, b)

Model specifics
• Prevalence models
  – x(t) summarizes characteristics of type t
  – Prevalence proportional to exp(x(t) · bk) · h(t)
    • h(t) is (nonparametric) general prevalence of type t
• Observational model
  – Loglinear model based on probabilities of omission of individuals
  – Terms for dependence of omissions within household
  – Could be based on (hypothetical) dataset …
  – … and/or calibrated to match aggregate omission rate estimates by poststratum

Imputations
• Draw Y*bk by Bayes’s theorem
  – Possible values are those types that could “lose” one or more members yielding observed Ybk
  – Draw from all possible values of t
• Special type for unobserved households
  – Count imputed using SOUP (unbiased) prior
  – True types imputed similar to others
• Gibbs sampler to estimate all parameters

General summary of examples
• All are “Bayesian” in drawing corrected values from posterior distributions
  – “Scientific” factorization for interpretability (Examples 1 and 3)
  – “Observations” might have simple (Ex. 1, 2) or complex (Ex. 3) structure
• Bayesian also in
  – Incorporating prior information
  – Pooling across collections of units (“shrinkage”)
  – Hierarchical specification of complex models
  – Probability representation of model uncertainty (Ex. 2)

Program to move forward
• Systematic quantitative meta-analysis of information on nonresponse errors
• Models for various types of nonresponse error
• Think more about how to combine information from data and model uncertainty
• Standard algorithms and software
• Integrate with analyses of nonresponse, item missing data, etc.
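
Illustrative sketch for Example 1
The two Example 1 calculations (the simple Registry correction and the Bayes's-theorem imputation of true chemo status for a case outside the survey) can be sketched in a few lines of Python. This is only a minimal illustration under the assumption stated on the slides that a reported treatment is never a false positive. The function names are hypothetical and do not come from the original analysis, and the actual model draws these probabilities from fitted patient- and hospital-level regressions rather than taking them as fixed inputs.

```python
import random


def corrected_rate(p_reported, p_report_given_chemo):
    """Simple correction: P(chemo) = P(reported chemo) / P(report | chemo)."""
    return p_reported / p_report_given_chemo


def posterior_chemo(p_chemo, p_report_given_chemo, reported):
    """P(true chemo = 1 | Registry record), by Bayes's theorem.

    Assumption (from the slides): if chemo is reported, true chemo = 1.
    """
    if reported:
        return 1.0
    chemo_no_report = p_chemo * (1.0 - p_report_given_chemo)
    no_chemo_no_report = 1.0 - p_chemo
    return chemo_no_report / (chemo_no_report + no_chemo_no_report)


def impute_chemo(p_chemo, p_report_given_chemo, reported, rng):
    """Draw one imputed true chemo status (0 or 1) from its posterior."""
    p = posterior_chemo(p_chemo, p_report_given_chemo, reported)
    return int(rng.random() < p)


# Inputs from the "Imputing chemo status" slide: prior 70%, reporting 80%.
p = posterior_chemo(0.70, 0.80, reported=False)
print(f"P(chemo | no report) = {p:.3f}")

# Several independent draws for the same case, as in multiple imputation.
rng = random.Random(2024)
print("Five imputations:", [impute_chemo(0.70, 0.80, False, rng) for _ in range(5)])

# Simple correction for the area outside the survey (49% reported, 82% reporting).
print(f"Corrected rate outside survey = {corrected_rate(0.49, 0.82):.3f}")
```

In a full Gibbs sampler this imputation would alternate with draws of the hospital random effects and the regression coefficients, so the two probabilities passed in here would change at every iteration.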