MEASUREMENT
Goal: To develop reliable and valid measures using state-of-the-art measurement models.
Members: Chang, Berdes, Gehlert, Gibbons, Schrauf, Weiss

Why Item Response Theory?
Classical Test Theory (Traditional) | Item Response Theory (Modern)
Measures of precision are fixed for all scores | Precision measures vary across scores
Longer scales increase reliability | Shorter, targeted scales can be equally reliable (short forms)
Scale properties are sample dependent | Item and scale properties are invariant within a linear transformation (DIF)
Comparing person scores depends on the item set | Person scores are comparable across different item sets (CAT)
Comparing respondents requires parallel scales | Different scales can be placed on a common metric (instrument linking/equating)
Mixed item formats lead to unbalanced impact on total scale scores | Mixed item formats are easily handled
Summed scores are on an ordinal scale | Scores are on an interval scale
(no equivalent) | Graphical tools for item and scale analysis

Item Response Theory (IRT)
- A family of mathematical descriptions of what happens when a person meets a test or survey question
- Relates characteristics of items (item parameters) and characteristics of persons (person latent traits) to the probability of a correct or rating/categorical response
- Models test-taking behavior at the item level

Item-Person Map
[Figure: persons and items arrayed on a common latent-trait scale, running from poor trait levels, where items are likely ("easy"), to good trait levels, where items are unlikely ("hard"); person latent trait on one axis, item location on the other. Chang & Gehlert (2002).]
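The shared person-item scale behind the map can be sketched numerically. A minimal illustration, assuming a standard Rasch-type logistic response function with the conventional scaling constant D = 1.7 (the function name and example values are mine, not from the slides):

```python
import math

def p_correct(theta, b, D=1.7):
    """Rasch-type response probability; depends only on theta - b."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))

# Persons and items share one scale: when a person's trait level equals an
# item's location, the endorsement probability is exactly 0.5.
print(p_correct(theta=1.0, b=1.0))               # prints 0.5
# An "easy" (low-b) item is likely even for a lower-trait person:
print(round(p_correct(theta=-1.0, b=-2.5), 2))   # prints 0.93
```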
Dichotomous Unidimensional IRT Models

P_i(θ) = c_i + (1 − c_i) / (1 + e^(−D·a_i·(θ − b_i)))

- 3-PL: difficulty (b), discrimination (a), and guessing (c)
- 2-PL: difficulty (b) and discrimination (a); c_i = 0
- 1-PL (Rasch): difficulty (b) only; c_i = 0 and a_i constant across items
[Figure: item characteristic curves plotting probability of success (0.0-1.0) against ability (θ).]

Polytomous IRT Models
- 1-PL (threshold): Partial Credit, Rating Scale
- 2-PL (threshold and discrimination): Generalized Partial Credit, Graded Response, Nominal
[Figure: item characteristic curves for item 0001 under the Partial Credit Model (normal metric) — "Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports" — showing category probabilities across ability (−3 to 3) for 1 = Yes, limited a lot; 2 = Yes, limited a little; 3 = No, not limited at all.]

Potential Advantages of Using IRT in "Geriatric" Pain Assessment
- Refine existing instruments
- Evaluate item and scale characteristics
- Evaluate different response formats
- Detect differential item functioning
- Evaluate person fit (clinical diagnosis)
- Equate/link instruments
- Establish item banks and brief forms
- Develop computerized adaptive testing

Item Banking and CAT
[Figure: item pools A-E (sets of questions) are calibrated with IRT into an overall mental health item bank (catalogued; hierarchically structured), which supplies brief forms, CAT, and new items; example response-probability curves shown for depression.]

Principles of Adaptive Testing
- IRT pre-calibrated item bank
- Initial item selection
- Test scoring method
- Item selection during test administration
- Stopping rules

Item Bank
- A set of carefully IRT-calibrated questions
- Items cover the entire latent-trait continuum
- Items represent differing amounts of the trait
- Items represent differing amounts of information
- Basis for tailored/adaptive testing
- Items can be selected to maximize precision and retain clinical relevance

Item Banking is Interdisciplinary
- Psychometricians
- Information scientists
- Clinicians/healthcare providers
- Outcomes researchers
- Content experts
- …
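The nesting of the dichotomous models above — 3-PL reducing to 2-PL and then to 1-PL — can be written directly from the response function. A minimal sketch (function names and parameter values are illustrative):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3-PL: guessing floor c plus a logistic rise governed by a and b."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# The simpler models are special cases:
def p_2pl(theta, a, b):
    return p_3pl(theta, a, b, c=0.0)          # no guessing parameter

def p_1pl(theta, b, a_common=1.0):
    return p_3pl(theta, a_common, b, c=0.0)   # discrimination constant across items

# Even a very low-trait respondent answers correctly with probability >= c:
print(round(p_3pl(-3.0, a=1.0, b=0.0, c=0.2), 3))   # prints 0.205
```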
Approaches to Develop Item Banks
Top-down approach vs. bottom-up approach.
[Figure: domain hierarchy — Health branches into Physical (Physical Functioning, Pain, Symptom), Mental (Depression, Anxiety), Social, and Spiritual.]

Development and Maintenance of an Item Bank
- How to best calibrate existing items?
  – Model selection
  – Whose item parameters to use?
  – Standardization?
  – Generic vs. disease-specific
- Item parameter drift
  – Anchor or re-calibrate?
- How to write and best test new items?

Adaptive Test
An adaptive test is a tailored, individualized measure which involves selecting a set of test items for each individual that best measures the psychological characteristics of that person (Weiss, 1985).
Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. Dec 1985;53(6):774-789.

Why Computerized Adaptive Testing?
- Selects questions based on previous responses
- Tailors item and test difficulties
- Eliminates floor and ceiling effects
- Requires fewer questions to arrive at an accurate estimate
- Automates question administration, data recording, scoring, and prompt reporting
- Allows for immediate feedback

CAT Algorithm
1. Administer an item of median difficulty (or a screening item).
2. Score the item.
3. Estimate the latent trait (theta).
4. If the termination criterion is satisfied, stop; otherwise choose and administer the next item with maximum information and return to step 2.

Increase of Accuracy of Ability or Latent-Trait Estimation in CAT
For each item added to the test, the width of the interval around the ability (θ) estimate decreases.
[Figure: interval width narrowing as items 1 through 5 are administered.]

Potential Problems with CAT in Pain and Health Outcomes Measurement
- Context effects
- Unbalanced content
- Time frame
- Response categories
- Multidimensionality
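The CAT loop described above can be sketched end to end. A toy illustration under the 2-PL model — the grid-search trait estimation and all names here are simplifications of mine, not the slides' method:

```python
import math

def p_2pl(theta, a, b, D=1.7):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, D=1.7):
    # Fisher information of a 2-PL item: I = (D*a)^2 * P * (1 - P)
    p = p_2pl(theta, a, b, D)
    return (D * a) ** 2 * p * (1.0 - p)

def estimate_theta(responses, grid):
    """Crude maximum-likelihood trait estimate over a fixed grid."""
    def loglik(theta):
        total = 0.0
        for (a, b), u in responses:
            p = p_2pl(theta, a, b)
            total += math.log(p if u else 1.0 - p)
        return total
    return max(grid, key=loglik)

def cat_session(bank, answer, max_items=5):
    """bank: list of (a, b) item parameters; answer(a, b) -> 0/1 response."""
    grid = [g / 10.0 for g in range(-30, 31)]
    theta_hat = 0.0                      # start near the middle of the scale
    responses, remaining = [], list(bank)
    for _ in range(min(max_items, len(bank))):
        # Select the unused item with maximum information at the current estimate.
        item = max(remaining, key=lambda ab: item_information(theta_hat, *ab))
        remaining.remove(item)
        responses.append((item, answer(*item)))
        theta_hat = estimate_theta(responses, grid)  # re-score after each item
    return theta_hat

# Simulated respondent who endorses exactly the items below theta = 0.5:
bank = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
print(cat_session(bank, lambda a, b: 1 if b < 0.5 else 0))   # prints 0.5
```

A production stopping rule would typically end the session once the standard error of theta falls below a threshold, rather than after a fixed item count.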
What kind of short form?
Example item and response formats:
- "I was bothered by things that usually don't bother me" — Rarely or none of the time (less than 1 day) / Some or a little of the time (1-2 days) / Occasionally or a moderate amount of time (3-4 days) / All of the time (5-7 days)
- Question 1: 0 = I do not feel sad; 1 = I feel sad; 2 = I am sad all the time and I can't snap out of it; 3 = I am so sad or unhappy that I can't stand it
- "Are you basically satisfied with your life?" True/False

MORE Research Still Needed for Effective CAT Implementation
- Item production
- Item statistics
- Item exposure
- Maintaining a valid bank of items for test construction
- Fairness
- Delivery options
- Effects of modes of administration
- Cost-benefit considerations

Infrastructure of a National Geriatric Pain Item Bank
[Figure: individual researchers, pharmaceutical industries, non-profit institutions, and government agencies submit items, subject to consortium approval, to a national "central" item bank built from collector, analyzer, builder, and retriever components; IRT analyses yield item parameters, and subscribers and the public receive customized information retrieval, CAT, and (automated) brief forms.]

An Integrated Solution for Pain and Outcomes Assessments
[Figure: the PROsIT system architecture — data-collection clients (physician station, physician PDA, field-survey PDA, clinic survey station, patient field-survey laptop) connect through components including a security manager, XML/XSL parser, survey collector, service dispatcher, device detector, survey analyzer, survey builder, survey retriever, and survey designer to a subject profile, survey data warehouse, and survey archive, with system maintenance by a system administrator and data analysis/mining for pharmaceutical and university research.]
Chang, C.-H., & Yang, D. (2003, April 15). Patient-Reported Outcomes Information Technology: The PROsIT™ System. ISPOR CONNECTIONS, 9(2), 5-6.
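One way to picture the catalogued, hierarchically structured bank such an infrastructure maintains is as typed records that carry item text, response categories, calibration parameters, and a domain path. A hypothetical sketch — the field names, identifiers, and parameter values are all illustrative, not the PROsIT schema:

```python
from dataclasses import dataclass

@dataclass
class BankItem:
    """One calibrated entry in a hierarchically catalogued item bank."""
    item_id: str
    text: str
    domain_path: tuple            # e.g. ("Health", "Mental", "Depression")
    response_categories: list
    a: float                      # discrimination
    thresholds: list              # category threshold/location parameters
    source: str = "unspecified"   # contributing researcher or agency

bank = [
    BankItem(
        item_id="DEP-0001",
        text="I was bothered by things that usually don't bother me",
        domain_path=("Health", "Mental", "Depression"),
        response_categories=[
            "Rarely or none of the time (less than 1 day)",
            "Some or a little of the time (1-2 days)",
            "Occasionally or a moderate amount of time (3-4 days)",
            "All of the time (5-7 days)",
        ],
        a=1.3,                    # illustrative values, not real calibrations
        thresholds=[-0.5, 0.4, 1.2],
    ),
]

# A retriever component can then filter the catalogue by domain:
depression_items = [i for i in bank if "Depression" in i.domain_path]
```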