### Item Response Theory (IRT) and its Application to Health

```
MEASUREMENT Goal

To develop reliable and valid
measures using state-of-the-art
measurement models
– Members: Chang, Berdes, Gehlert,
Gibbons, Schrauf, Weiss

Why Item Response Theory?

Classical test theory vs. Item Response Theory (Modern)

Classical: Measures of precision fixed for all scores
IRT: Precision measures vary across scores

Classical: Longer scales increase reliability
IRT: Shorter, targeted scales can be equally reliable (Short Form)

Classical: Scale properties are sample dependent
IRT: Item & scale properties are invariant within a linear transformation (DIF)

Classical: Comparing person scores dependent on item set
IRT: Person scores comparable across different item sets (CAT)

Classical: Comparing respondents requires parallel scales
IRT: Different scales can be placed on a common metric

Classical: Mixed item formats lead to unbalanced impact on total scale scores
IRT: Easily handles mixed item formats

Classical: Summed scores are on an ordinal scale
IRT: Scores on interval scale

IRT: Graphical tools for item and scale analysis

Item Response Theory (IRT)
A family of mathematical descriptions of
what happens when a person meets a test
or survey question
Relates characteristics of items (item
parameters) and characteristics of persons
(person latent traits) to the probability of a
correct or rating/categorical response
Models the test-taking behavior at the item
level

Item-Person Map
[Figure: items (Q) located along the person latent trait continuum, which runs from Poor to Good; item location runs from Likely (“easy”) to Unlikely (“hard”).]
Chang & Gehlert (2002).

Dichotomous Unidimensional IRT Models

1-PL (Rasch)
– Difficulty (b)
2-PL
– Difficulty (b)
– Discriminating (a)
3-PL
– Difficulty (b)
– Discriminating (a)
– Guessing (c)

P_i(\theta) = c_i + (1 - c_i) \frac{1}{1 + e^{-D a_i (\theta - b_i)}}

[Figure: item characteristic curve. Probability of Success (0 to 1.0) against Ability (θ), with lower asymptote c_i, location b_i, and slope at b_i equal to a constant x a_i.]

Polytomous IRT Models

– 1-PL (threshold): Partial Credit, Rating Scale
– 2-PL (threshold & discriminating): Generalized Partial Credit
– Nominal

[Figure: Item Characteristic Curve: 0001, Partial Credit Model (Normal Metric). Probability (0 to 1.0) against Ability (-3 to 3), one curve per response category: 1=Yes, Limited a lot; 2=Yes, Limited a little; 3=No, Not Limited at all.]
* Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports

in “Geriatric” Pain Assessment
Refine existing instruments
Evaluate item and scale characteristics
Evaluate different response formats
Detect differential item functioning
Evaluate person fit (clinical diagnosis)
Establish item banks and brief forms

Item Banking and CAT
[Figure: Item Pool (Sets of Questions A to F, plus new items), IRT, Item Bank (Catalogued; Hierarchically Structured), Brief Forms, and CAT. Plot panels show Probability of Response (0.00 to 1.00) against Depression and Overall Mental Health (-3 to 3).]

Testing
IRT pre-calibrated item bank
Initial item selection
Test scoring method
Item selection during test
Stopping rules

Item Bank
Set of carefully IRT-calibrated questions
Items cover the entire latent trait continuum
Items represent differing amounts of trait
Items represent differing amounts of
information
Items can be selected to maximize precision
and retain clinical relevance

Item Banking is Interdisciplinary
Psychometricians
Information scientists
Clinicians/healthcare providers
Outcomes researchers
Content experts
…

Approaches to Develop Item Banks
Top-Down Approach
Bottom-Up Approach
[Figure: hierarchy of health domains with nodes Health, Physical, Mental, Social, Spiritual, Physical Functioning, Pain, Symptom, Depression, and Anxiety.]

Development and Maintenance of an Item Bank

How to best calibrate existing items?
– Model selection
– Whose item parameters to use?
– Standardization?
– Generic vs. disease-specific

Item parameter drift
– Anchor or Re-calibrate?

How to write and best test new items?

An adaptive test is a tailored,
individualized measure which
involves selecting a set of test
items for each individual that best
measures the psychological
characteristics of that person
(Weiss, 1985)
Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. Dec 1985;53(6):774-789.

Why Computerized
Adaptive testing selects questions based on
previous responses
Tailored item and test difficulties
Eliminates floor and ceiling effects
Requires fewer questions to arrive at an
accurate estimate
recording, scoring, and prompt reporting
Allows for immediate feedback

CAT Algorithm
1. Start with an item selected by difficulty (or a screening item)
2. Score Item
3. Estimate Latent Trait (Theta)
4. Termination Criterion Satisfied?
   – Yes: Stop
   – No: Choose Next Item with Maximum Information, then return to step 2

Increase of Accuracy of Ability or Latent Trait Estimation in CAT
For each item added to the test, the width of the interval decreases.
[Figure: interval estimates of Ability (θ) after Item 1, Items 1-2, 1-3, 1-4, and 1-5.]

Potential Problems with CAT in Pain and Health Outcomes Measurement

Context effects
Unbalanced content
Time frame
Response categories

Multidimensionality
What kind of short form?
Rarely or none of the time (less than 1 day)
Some or a little of the time (1-2 days)
Occasionally or a moderate amount of time (3-4 days)
All of the time (5-7 days)
1. I was bothered by things that
usually don't bother me
Question 1
0 I do not feel sad.
2 I am sad all the time and I can’t snap out of it.
3 I am so sad or unhappy that I can’t stand it.
Are you basically satisfied with your life?
True/False

MORE Research Still Needed for Effective CAT Implementation
Item production
Item statistics
Item exposure
Maintaining a valid bank of items for test construction
Fairness
Delivery options
Effects of modes of
Cost-benefit
considerations

Infrastructure of a National Geriatric Pain Item Bank
[Figure: diagram labels. Subscriber: Individual Researchers, Pharm. Industries, Non-profit Institutions, Government Agencies. Public. National “Central” Item Bank: Collector, Analyzer, Builder, Retriever. Consortium Approval; IRT Analyses; Item Parameters; Customized Information Retrieval; CAT; (automated) Brief Form.]

An Integrated Solution for Pain and Outcomes Assessments
[Figure: the PROsIT System. Diagram labels: Data Collection; Data Analysis/Mining; Physician Station; Physician PDA; Field Survey PDA; Clinic Survey Station; Patient; Field Survey Laptop; Pharmaceutical Research; University Research; Security Manager; XML/XSL Parser; Survey Collector; Service Dispatcher; Device Detector; Survey Analyzer; Survey Builder; Survey Retriever; Survey Designer; System Maintenance; Subject Profile; Survey Data Warehouse; Survey Archive.]
Chang, C.-H., & Yang, D. (2003, April 15). Patient-Reported Outcomes Information Technology: The PROsIT™ System. ISPOR CONNECTIONS, 9(2), 5-6.
```
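
The slides give the 3-PL response function and the CAT loop (score the item, estimate theta, pick the next item with maximum information, check the termination criterion) but no worked example. The following is a minimal Python sketch of both, not the authors' implementation. Several choices are assumptions that the slides do not specify: `D = 1.7` as the normal-metric scaling constant, EAP scoring with a standard normal prior as the "test scoring method", a standard-error target as the stopping rule, starting at theta = 0, and the function names themselves.

```python
import math

D = 1.7  # assumed scaling constant (logistic curve on the normal metric)

def p_3pl(theta, a, b, c=0.0):
    """Probability of a keyed response under the 3-PL model.

    c = 0 gives the 2-PL; c = 0 with a common a gives the 1-PL (Rasch).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c=0.0):
    """Fisher information contributed by one 3-PL item at theta."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

# Quadrature grid for theta, from -4 to 4 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap_estimate(responses, bank):
    """EAP estimate of theta and its posterior SD (assumed N(0, 1) prior).

    responses maps item id -> 0/1; bank maps item id -> (a, b, c).
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for item, u in responses.items():
            p = p_3pl(theta, *bank[item])
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.4, max_items=10):
    """Run the CAT loop until the SE target or the item limit is reached.

    answer(item_id) must return the respondent's 0/1 response.
    Returns (theta, se, list of administered item ids in order).
    """
    responses = {}
    theta, se = 0.0, float("inf")
    while len(responses) < min(max_items, len(bank)) and se > se_target:
        remaining = [i for i in bank if i not in responses]
        # Select the unused item that is most informative at the current theta.
        nxt = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        responses[nxt] = answer(nxt)
        theta, se = eap_estimate(responses, bank)
    return theta, se, list(responses)

if __name__ == "__main__":
    # Hypothetical calibrated bank: item id -> (a, b, c). Values are made up.
    bank = {
        "q1": (1.0, -2.0, 0.0),
        "q2": (1.2, -1.0, 0.0),
        "q3": (1.5, 0.0, 0.0),
        "q4": (1.2, 1.0, 0.0),
        "q5": (1.0, 2.0, 0.0),
    }
    # Simulated respondent who endorses every item located below 0.5.
    theta, se, used = run_cat(bank, lambda i: 1 if bank[i][1] < 0.5 else 0)
    print(used, round(theta, 2), round(se, 2))
```

Because each administered item adds information, the posterior SD returned by `eap_estimate` shrinks as the test proceeds, which is the narrowing interval the slides describe. A production CAT would also need item exposure control and content balancing, both of which the slides list as open issues.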