Borrowing the Strength of Unidimensional
Scaling to Produce Multidimensional
Educational Effectiveness Profiles
PRESENTATION AT THE 12TH ANNUAL
MARYLAND ASSESSMENT CONFERENCE
COLLEGE PARK, MD
OCTOBER 18, 2012
JOSEPH A. MARTINEAU
JI ZENG
MICHIGAN DEPARTMENT OF EDUCATION
Background
2
• Prior research shows that using unidimensional measures of multidimensional achievement constructs can distort value-added.
  ◦ Martineau, J. A. (2006). Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Value-Added Accountability. Journal of Educational and Behavioral Statistics, 31(1), 35-62.
  ◦ Construct-irrelevant variance can become considerable in value-added measures when a construct is multidimensional but is modeled in value-added as unidimensional.
  ◦ A common misunderstanding is that if the multiple constructs are highly correlated, value-added should not be distorted.
  ◦ The correct understanding is that if value-added on the multiple constructs is highly correlated, value-added should not be distorted.
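The difference between highly correlated constructs and highly correlated value-added can be illustrated with a small simulation. This sketch is not taken from the cited study: the function names, the number of schools and students, and every standard deviation are illustrative assumptions. Student status on the two dimensions shares a strong common component, while each school's effects on the two dimensions are drawn independently.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(n_schools=200, n_students=20, seed=1):
    """Return (student-level score correlation, school-level VA correlation)."""
    rng = random.Random(seed)
    post1, post2, va1, va2 = [], [], [], []
    for _ in range(n_schools):
        # Assumption: school effects on the two dimensions are independent.
        g1, g2 = rng.gauss(0, 0.3), rng.gauss(0, 0.3)
        gains1, gains2 = [], []
        for _ in range(n_students):
            common = rng.gauss(0, 1)            # shared student ability
            pre1 = common + rng.gauss(0, 0.3)   # dimension 1 pre-test
            pre2 = common + rng.gauss(0, 0.3)   # dimension 2 pre-test
            y1 = pre1 + g1 + rng.gauss(0, 0.2)
            y2 = pre2 + g2 + rng.gauss(0, 0.2)
            post1.append(y1)
            post2.append(y2)
            gains1.append(y1 - pre1)
            gains2.append(y2 - pre2)
        # Crude value-added proxy: mean gain per school on each dimension.
        va1.append(sum(gains1) / n_students)
        va2.append(sum(gains2) / n_students)
    return pearson(post1, post2), pearson(va1, va2)

student_r, school_va_r = simulate()
print(f"student-level correlation between dimensions: {student_r:.2f}")
print(f"school-level correlation between value-added: {school_va_r:.2f}")
```

Under these assumptions the student scores correlate strongly across dimensions while the school value-added estimates do not, which is the situation in which a unidimensional value-added measure would mix two distinct school effects.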


Background
3
• Prior research shows that the choice of dimension/domain within a construct changes value-added significantly.
  ◦ Lockwood, J. R., et al. (2007). The Sensitivity of Value-Added Teacher Effect Estimates to Different Mathematics Achievement Measures. Journal of Educational Measurement, 44(1), 47-67.
  ◦ Depending on choices made in value-added modeling, the correlation between teacher value-added on Procedures and Problem Solving ranged from 0.01 to 0.46.
  ◦ This surprisingly low correlation in value-added indicates that, at least in this situation, one needs to model value-added in both dimensions rather than unidimensionally.
  ◦ This is the only work I am aware of to date that has inspected inter-construct value-added correlations.


Background
4
• Prior research shows that commonly used factor analytic techniques underestimate the number of dimensions in a multidimensional construct.
  ◦ Zeng, J. (2010). Development of a Hybrid Method for Dimensionality Identification Incorporating an Angle-Based Approach. Unpublished doctoral dissertation, University of Michigan.
  ◦ Common dimensionality identification procedures make the unwarranted assumption that all shared variance among indicator variables arises because the indicator variables measure the same construct (shared variance can also arise because the indicator variables are influenced by a common exogenous variable).
  ◦ Because of this unwarranted assumption, commonly used dimensionality identification techniques underestimate the number of dimensions in a data set.

Background
5
• Scaling constructs as multidimensional is a difficult task
  ◦ Multidimensional Item Response Theory (MIRT) is time-consuming and costly to run
  ◦ Replicating MIRT analyses can be challenging (there are multiple subjective decision points along the way)
  ◦ Identifying the number of dimensions in MIRT can be challenging
  ◦ Once the number of dimensions is identified, identifying which items load on which dimensions in MIRT can also be challenging
  ◦ The factor analysis techniques underlying MIRT are techniques for data reduction, not dimension identification
Background
6
• Short of resolving the considerable difficulties in analytically identifying dimensions within a construct (and replicating such analyses), can another approach be used?
• Propose using/trusting content experts' identifications of dimensions within constructs (e.g., the divisions agreed upon by the writers of content standards) as the best currently available identification of dimensions, for example…
  ◦ Within English language proficiency, producing reading, writing, listening, and speaking scales.
  ◦ Within Mathematics, producing number & operations, algebra, geometry, measurement, and data analysis/statistics scales.
Background
7
• However, separately scaling each dimension can also be difficult and costly compared to running a traditional unidimensional IRT calibration
  ◦ Confirmatory MIRT
  ◦ Bi-factor IRT model
  ◦ Separate unidimensional calibration and year-to-year equating of each dimension score
• Another option:
  ◦ Unidimensionally calibrate the total score
  ◦ Unidimensionally equate the total score from year to year
  ◦ Use (fixed) item parameters from the unidimensional calibration to create the multiple dimension scores as specified by content experts
  ◦ Use of this method needs to be investigated
• Practical necessity for Smarter Balanced Assessment Consortium
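A minimal sketch of the fixed-parameter idea in the "Another option" steps, assuming a Rasch (1-PL) model and a simple grid-search maximum-likelihood estimator. The item difficulties, the domain tags, and the function names (`ml_theta`, `profile_scores`) are hypothetical; operational scoring would use the actual calibrated parameters and the program's own estimation software.

```python
import math

# Hypothetical item bank: (domain, Rasch difficulty taken from the
# unidimensional calibration of the total score). Values are made up.
ITEMS = [
    ("reading", -1.0), ("reading", 0.0), ("reading", 1.0),
    ("writing", -0.5), ("writing", 0.5), ("writing", 1.5),
]

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses, difficulties):
    """Log-likelihood of scored (0/1) responses at a given theta."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = p_correct(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def ml_theta(responses, difficulties, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search ML ability estimate with item parameters held fixed."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n_points)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))

def profile_scores(responses):
    """Total score plus one score per expert-defined domain."""
    scores = {"total": ml_theta(responses, [b for _, b in ITEMS])}
    for domain in sorted({d for d, _ in ITEMS}):
        idx = [i for i, (d, _) in enumerate(ITEMS) if d == domain]
        scores[domain] = ml_theta([responses[i] for i in idx],
                                  [ITEMS[i][1] for i in idx])
    return scores

# One student's scored responses, in the same order as ITEMS.
scores = profile_scores([1, 1, 0, 1, 0, 0])
print(scores)
```

The design point is that no item parameter is re-estimated for the domain scores: each domain score uses only that domain's items, with difficulties inherited from the total-score calibration, so the year-to-year equating of the total score carries over to every domain.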
Purpose
8
• Investigate the feasibility and validity of relying on unidimensional total score calibration as a basis for creating multidimensional profile scores…
  ◦ For reporting multidimensional student achievement scores
  ◦ For reporting multidimensional value-added measures
• Investigate the impact of separate versus fixed calibration of multidimensional achievement scores on…
  ◦ Student achievement scores
  ◦ Value-added scores
• …as compared to the impact of other common decisions in scaling, outcome selection, and value-added modeling
Methods
9
• Decisions Modeled in the Analyses
  ◦ Psychometric decisions
    - Choice of psychometric model: 1-PL vs. 3-PL; PCM vs. GPCM
    - Estimation of sub-scores: separate calibration for each dimension vs. fixed calibration based on unidimensional parameters
  ◦ Choice of outcome metric
    - Which sub-score is modeled
  ◦ Value-added modeling decisions
    - Inclusion of demographics in models
    - Number of pre-test covariates (for covariate adjustment models)
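For readers less familiar with the psychometric model contrast, the sketch below shows the standard 1-PL and 3-PL item response functions for dichotomous items. The function names and the example parameter values are illustrative assumptions, and the optional scaling constant (often written D = 1.7) is omitted. The polytomous analogues (PCM and GPCM) are not shown.

```python
import math

def irf_3pl(theta, a, b, c):
    """3-PL: discrimination a, difficulty b, lower asymptote (guessing) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_1pl(theta, b):
    """1-PL (Rasch): common discrimination of 1 and no guessing."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)

# Compare the two curves for a hypothetical item with difficulty 0.
for theta in (-2.0, 0.0, 2.0):
    p1 = irf_1pl(theta, b=0.0)
    p3 = irf_3pl(theta, a=1.5, b=0.0, c=0.2)
    print(f"theta={theta:+.1f}  1-PL={p1:.3f}  3-PL={p3:.3f}")
```

Because the 3-PL curve has its own slope and a nonzero floor, the same response pattern can map to a different ability estimate than under the 1-PL, which is one reason the choice of model can change downstream value-added results.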
Methods
10
• Outcomes
  ◦ Correlations in student achievement metrics compared across each psychometric choice and outcome choice
  ◦ Correlations in value-added modeling compared across each choice
  ◦ Classification consistency in value-added compared across each choice for
    - Three-category classification decisions: based on confidence intervals around point estimates placing programs/schools into three categories: (1) above average, (2) statistically indistinguishable from the average, and (3) below average
    - Four-category classification decisions: based on sorting programs'/schools' point estimates into quartiles, representing arbitrary cut points for classification
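The two classification rules and the consistency measure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: value-added is taken to be centered at zero (so "average" is 0), the confidence interval is a normal-theory 95% interval, consistency is the proportion of units given the same label under two modeling choices, and the eight estimates and the standard error are invented.

```python
import statistics

def classify_three(estimate, se, z=1.96):
    """Label from a confidence interval around a zero-centered estimate."""
    if estimate - z * se > 0:
        return "above"
    if estimate + z * se < 0:
        return "below"
    return "indistinguishable"

def classify_quartiles(estimates):
    """Quartile label (1-4) for each point estimate."""
    q1, q2, q3 = statistics.quantiles(estimates, n=4)
    return [1 + (e > q1) + (e > q2) + (e > q3) for e in estimates]

def consistency(labels_a, labels_b):
    """Proportion of units classified identically under two choices."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

# Hypothetical value-added estimates for the same 8 schools under two
# modeling choices (e.g., fixed vs. separate calibration).
est_a = [-0.40, -0.22, -0.10, -0.03, 0.02, 0.09, 0.25, 0.41]
est_b = [-0.38, -0.24, -0.12, 0.01, -0.02, 0.08, 0.27, 0.39]
se = 0.08  # assumed common standard error

three_cat = consistency([classify_three(e, se) for e in est_a],
                        [classify_three(e, se) for e in est_b])
four_cat = consistency(classify_quartiles(est_a), classify_quartiles(est_b))
print(three_cat, four_cat)
```

In this toy example the small reshuffling near the median leaves every confidence-interval label unchanged but moves two schools across a quartile cut, which is the mechanism by which quartile classifications can be less stable than statistical-decision classifications.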
Methods
11
• Data
  ◦ Michigan English Language Proficiency Assessment (ELPA)
    - Level III (Grades 3-5)
    - 3,391 students, each with 3 measurement occasions (10,173 total scores)
  ◦ Measures
    - Total
    - Reading (domain)
    - Writing (domain)
    - Listening (domain)
    - Speaking (domain)
  ◦ Calibrated the ELPA as a unidimensional measure using both the 1-PL/Partial Credit Model and the 3-PL/Generalized Partial Credit Model
  ◦ Created domain scores both from fixed parameters from the unidimensional calibration and in separate calibrations for each domain
Methods
12
• Data
  ◦ Michigan Educational Assessment Program (MEAP) Mathematics
  ◦ Grades 7 and 8 (not on a vertical scale)
  ◦ Over 110,000 students per grade
  ◦ Measures
    - Total (using items from the two domains)
    - Number & Operations (domain)
    - Algebra (domain)
  ◦ Calibrated the MEAP Math tests as unidimensional measures using both 1-PL and 3-PL models
  ◦ Created domain scores both from fixed parameters from the unidimensional calibration and in separate calibrations for each domain
Methods
13
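• Value-added modeling the ELPA
  ◦ 3-level HLM nesting test occasion (t) within student (i) within English language learner program (j) to obtain program value-added

$$
\begin{aligned}
Y_{tij} &= \pi_{0ij} + \pi_{1ij}\,\text{time}_{tij} + e_{tij} \\
\pi_{0ij} &= \beta_{00j} + \boldsymbol{\beta}_{0}'\mathbf{X}_{ij} + r_{0ij} \\
\pi_{1ij} &= \beta_{10j} + \boldsymbol{\beta}_{1}'\mathbf{X}_{ij} + r_{1ij} \\
\beta_{00j} &= \gamma_{000} + \boldsymbol{\gamma}_{00}'\mathbf{W}_{j} + u_{00j} \\
\beta_{10j} &= \gamma_{100} + \boldsymbol{\gamma}_{10}'\mathbf{W}_{j} + u_{10j}
\end{aligned}
$$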
Methods
14
• Value-added modeling the ELPA
  ◦ VAMs were run in a fully crossed design with…
    - All outcomes (R, W, L, S)
    - PCM- and GPCM-calibrated outcomes
    - Fixed and separately calibrated outcomes
    - With and without demographics in the VAMs
  ◦ 32 real-data applications across design factors
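The count of 32 applications follows from fully crossing the four ELPA design factors (4 outcomes × 2 models × 2 calibrations × 2 demographic settings). A short enumeration, with the dictionary keys chosen here only for illustration:

```python
from itertools import product

# Design factors for the ELPA value-added models, as listed on the slide.
FACTORS = {
    "outcome": ["Reading", "Writing", "Listening", "Speaking"],
    "model": ["PCM", "GPCM"],
    "calibration": ["Fixed", "Separate"],
    "demographics": [False, True],
}

# One dict per cell of the fully crossed design.
conditions = [dict(zip(FACTORS, combo)) for combo in product(*FACTORS.values())]

print(len(conditions))   # number of value-added model runs
print(conditions[0])     # first cell of the design
```

Each element of `conditions` identifies one value-added model run, which makes it straightforward to loop over the design and collect the estimates that are later correlated across choices.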
Methods
15
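• Value-added modeling MEAP mathematics
  ◦ 2-level HLM covarying grade-8 outcomes on grade-7 outcomes with students (i) nested within schools (j)

$$
\begin{aligned}
Y_{ij} &= \beta_{0j} + \beta_{1j}\,Y^{(1)}_{ij,t-1} + \beta_{2j}\,Y^{(2)}_{ij,t-1} + \boldsymbol{\beta}'\mathbf{X}_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \boldsymbol{\gamma}_{0}'\mathbf{W}_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j} \\
\beta_{2j} &= \gamma_{20} + u_{2j}
\end{aligned}
$$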
Methods
16
• Value-added modeling MEAP mathematics
  ◦ VAMs were run in a fully crossed design with…
    - Both outcomes (algebra and number & operations)
    - 1-PL and 3-PL calibrated outcomes
    - Fixed and separately calibrated outcomes
    - With and without demographics
    - With either one or two pre-test covariates
  ◦ 32 real-data applications across design factors
Results
17
ELPA
Results: ELPA Student-Level Outcomes
18
• Correlations across fixed vs. separate calibrations

Model choice   Content Area   Correlation
PCM            Reading        0.997
PCM            Writing        0.995
PCM            Listening      0.997
PCM            Speaking       1.000
GPCM           Reading        0.997
GPCM           Writing        0.997
GPCM           Listening      0.994
GPCM           Speaking       1.000
Results: ELPA Student-Level Outcomes
19
• Correlations across model choice (PCM vs. GPCM)

Calibration choice   Content Area   Correlation
Fixed                Reading        0.972
Fixed                Writing        0.983
Fixed                Listening      0.967
Fixed                Speaking       0.982
Separate             Reading        0.978
Separate             Writing        0.983
Separate             Listening      0.977
Separate             Speaking       0.982
Results: ELPA Student-Level Outcomes
20
• Correlations across content areas

Model   Calibration   Content Area   Reading   Writing   Listening   Speaking
PCM     Fixed         Reading        -         0.636     0.627       0.371
PCM     Fixed         Writing                  -         0.537       0.385
PCM     Fixed         Listening                          -           0.368
PCM     Fixed         Speaking                                       -
PCM     Separate      Reading        -         0.622     0.626       0.373
PCM     Separate      Writing                  -         0.519       0.375
PCM     Separate      Listening                          -           0.365
PCM     Separate      Speaking                                       -
GPCM    Fixed         Reading        -         0.655     0.662       0.402
GPCM    Fixed         Writing                  -         0.559       0.407
GPCM    Fixed         Listening                          -           0.405
GPCM    Fixed         Speaking                                       -
GPCM    Separate      Reading        -         0.639     0.648       0.395
GPCM    Separate      Writing                  -         0.543       0.400
GPCM    Separate      Listening                          -           0.394
GPCM    Separate      Speaking                                       -

• Low to moderate inter-dimension correlations
• However, Rasch dimensionality analysis from WINSTEPS identified the total score as a unidimensional score
Results: ELPA Program District-Level Value-Added
Outcomes
21
• Impact of fixed versus separate calibration

Correlations
                 No Demos          Demos
Content Area     PCM     GPCM      PCM     GPCM
Reading          1.000   0.987     1.000   0.992
Writing          1.000   0.997     1.000   0.997
Listening        1.000   0.987     1.000   0.987
Speaking         1.000   1.000     1.000   1.000
min = 0.987, max = 1.000, mean = 0.997, SD = 0.005

3-Category Consistency
                 No Demos          Demos
Content Area     PCM     GPCM      PCM     GPCM
Reading          0.996   0.996     1.000   0.991
Writing          1.000   0.996     1.000   0.991
Listening        1.000   1.000     1.000   0.996
Speaking         1.000   1.000     1.000   1.000
min = 0.991, max = 1.000, mean = 0.998, SD = 0.003

4-Category Consistency
                 No Demos          Demos
Content Area     PCM     GPCM      PCM     GPCM
Reading          0.982   0.875     0.982   0.902
Writing          0.973   0.946     0.982   0.946
Listening        0.991   0.897     0.991   0.906
Speaking         1.000   1.000     1.000   1.000
min = 0.875, max = 1.000, mean = 0.961, SD = 0.043
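As a worked check of the summary row for the correlation panel on this slide, the sketch below recomputes the minimum, maximum, mean, and SD from the 16 fixed-versus-separate correlations (4 content areas × PCM/GPCM × with/without demographics). The slide does not say whether the SD is a sample or a population SD; the sample formula is assumed here, and both round to the reported 0.005.

```python
import statistics

# The 16 fixed-vs-separate value-added correlations reported on the slide.
correlations = [
    1.000, 0.987, 1.000, 0.997, 1.000, 0.987, 1.000, 1.000,  # no demographics
    1.000, 1.000, 1.000, 1.000,                              # demographics, PCM
    0.992, 0.997, 0.987, 1.000,                              # demographics, GPCM
]

summary = {
    "min": min(correlations),
    "max": max(correlations),
    "mean": round(statistics.mean(correlations), 3),
    "sd": round(statistics.stdev(correlations), 3),  # sample SD (assumption)
}
print(summary)
```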
Results: ELPA Program District-Level Value-Added
Outcomes
22
• Correlations between Listening and Reading VA
  (Rows: Listening VA; columns: Reading VA; Sep = Separate)

                        No Demos                        Demos
                        PCM            GPCM             PCM            GPCM
                        Fixed  Sep     Fixed  Sep       Fixed  Sep     Fixed  Sep
No Demos  PCM   Fixed   0.371  0.371   0.301  0.327     0.303  0.302   0.228  0.245
No Demos  PCM   Sep     0.372  0.371   0.303  0.328     0.304  0.303   0.230  0.247
No Demos  GPCM  Fixed   0.360  0.361   0.387  0.392     0.301  0.302   0.316  0.321
No Demos  GPCM  Sep     0.376  0.377   0.389  0.397     0.327  0.328   0.320  0.329
Demos     PCM   Fixed   0.330  0.330   0.292  0.308     0.318  0.317   0.261  0.275
Demos     PCM   Sep     0.329  0.330   0.294  0.309     0.318  0.318   0.263  0.277
Demos     GPCM  Fixed   0.304  0.305   0.341  0.342     0.307  0.309   0.329  0.332
Demos     GPCM  Sep     0.328  0.329   0.346  0.350     0.333  0.335   0.332  0.339

• Min = 0.228, Max = 0.397
• Mean = 0.322, SD = 0.037
Results: ELPA Program District-Level Value-Added
Outcomes
23
• Correlations between Listening and Writing VA
  (Rows: Listening VA; columns: Writing VA; Sep = Separate)

                        No Demos                        Demos
                        PCM            GPCM             PCM            GPCM
                        Fixed  Sep     Fixed  Sep       Fixed  Sep     Fixed  Sep
No Demos  PCM   Fixed   0.358  0.359   0.369  0.366     0.342  0.343   0.353  0.353
No Demos  PCM   Sep     0.359  0.360   0.370  0.367     0.343  0.344   0.354  0.354
No Demos  GPCM  Fixed   0.403  0.403   0.420  0.412     0.385  0.385   0.401  0.396
No Demos  GPCM  Sep     0.368  0.368   0.383  0.376     0.354  0.355   0.370  0.364
Demos     PCM   Fixed   0.362  0.362   0.373  0.371     0.361  0.362   0.372  0.371
Demos     PCM   Sep     0.363  0.364   0.374  0.372     0.362  0.363   0.374  0.372
Demos     GPCM  Fixed   0.395  0.395   0.410  0.405     0.397  0.397   0.412  0.407
Demos     GPCM  Sep     0.364  0.364   0.378  0.373     0.365  0.365   0.379  0.374

• Min = 0.342, Max = 0.420
• Mean = 0.373, SD = 0.019
Results: ELPA Program District-Level Value-Added
Outcomes
24
• Correlations between Listening and Speaking VA
  (Rows: Listening VA; columns: Speaking VA; Sep = Separate)

                        No Demos                          Demos
                        PCM              GPCM             PCM            GPCM
                        Fixed   Sep      Fixed  Sep       Fixed  Sep     Fixed  Sep
No Demos  PCM   Fixed    0.002   0.002   0.026  0.026     0.005  0.005   0.032  0.032
No Demos  PCM   Sep      0.004   0.004   0.028  0.028     0.007  0.007   0.033  0.033
No Demos  GPCM  Fixed    0.068   0.068   0.102  0.102     0.081  0.081   0.108  0.108
No Demos  GPCM  Sep      0.051   0.051   0.080  0.080     0.061  0.061   0.086  0.086
Demos     PCM   Fixed   -0.005  -0.005   0.025  0.025     0.001  0.001   0.028  0.028
Demos     PCM   Sep     -0.004  -0.004   0.027  0.027     0.002  0.002   0.029  0.029
Demos     GPCM  Fixed    0.065   0.065   0.097  0.097     0.075  0.075   0.101  0.101
Demos     GPCM  Sep      0.047   0.047   0.076  0.076     0.056  0.056   0.080  0.080

• Min = -0.005, Max = 0.108
• Mean = 0.046, SD = 0.035
Results: ELPA Program District-Level Value-Added
Outcomes
25
• Correlations between Reading and Writing VA
  (Rows: Reading VA; columns: Writing VA; Sep = Separate)

                        No Demos                        Demos
                        PCM            GPCM             PCM            GPCM
                        Fixed  Sep     Fixed  Sep       Fixed  Sep     Fixed  Sep
No Demos  PCM   Fixed   0.389  0.390   0.393  0.386     0.335  0.336   0.341  0.338
No Demos  PCM   Sep     0.392  0.393   0.396  0.389     0.338  0.339   0.344  0.341
No Demos  GPCM  Fixed   0.466  0.464   0.480  0.466     0.442  0.440   0.455  0.443
No Demos  GPCM  Sep     0.455  0.454   0.468  0.455     0.420  0.419   0.432  0.422
Demos     PCM   Fixed   0.365  0.365   0.370  0.365     0.374  0.374   0.379  0.372
Demos     PCM   Sep     0.369  0.369   0.374  0.369     0.379  0.379   0.384  0.377
Demos     GPCM  Fixed   0.453  0.450   0.465  0.454     0.478  0.476   0.491  0.477
Demos     GPCM  Sep     0.440  0.438   0.452  0.442     0.464  0.462   0.476  0.461

• Min = 0.335, Max = 0.491
• Mean = 0.412, SD = 0.047
Results: ELPA Program District-Level Value-Added
Outcomes
26
• Correlations between Reading and Speaking VA
  (Rows: Reading VA; columns: Speaking VA; Sep = Separate)

                        No Demos                        Demos
                        PCM            GPCM             PCM            GPCM
                        Fixed  Sep     Fixed  Sep       Fixed  Sep     Fixed  Sep
No Demos  PCM   Fixed   0.121  0.121   0.132  0.132     0.131  0.131   0.136  0.136
No Demos  PCM   Sep     0.122  0.122   0.134  0.134     0.132  0.132   0.138  0.138
No Demos  GPCM  Fixed   0.129  0.129   0.174  0.174     0.152  0.152   0.179  0.179
No Demos  GPCM  Sep     0.134  0.134   0.172  0.172     0.154  0.154   0.177  0.177
Demos     PCM   Fixed   0.122  0.122   0.136  0.136     0.125  0.125   0.134  0.134
Demos     PCM   Sep     0.125  0.125   0.139  0.139     0.128  0.128   0.138  0.138
Demos     GPCM  Fixed   0.163  0.163   0.205  0.205     0.171  0.171   0.203  0.203
Demos     GPCM  Sep     0.162  0.162   0.199  0.199     0.168  0.168   0.197  0.197

• Min = 0.121, Max = 0.205
• Mean = 0.151, SD = 0.026
Results: ELPA Program District-Level Value-Added
Outcomes
27
• Correlations between Speaking and Writing VA
  (Rows: Speaking VA; columns: Writing VA; Sep = Separate)

                        No Demos                        Demos
                        PCM            GPCM             PCM            GPCM
                        Fixed  Sep     Fixed  Sep       Fixed  Sep     Fixed  Sep
No Demos  PCM   Fixed   0.151  0.150   0.169  0.180     0.158  0.157   0.180  0.189
No Demos  PCM   Sep     0.151  0.150   0.169  0.180     0.158  0.157   0.180  0.189
No Demos  GPCM  Fixed   0.207  0.205   0.225  0.236     0.209  0.208   0.231  0.240
No Demos  GPCM  Sep     0.207  0.205   0.225  0.236     0.209  0.208   0.231  0.240
Demos     PCM   Fixed   0.173  0.172   0.192  0.202     0.167  0.165   0.189  0.197
Demos     PCM   Sep     0.173  0.172   0.192  0.202     0.167  0.165   0.189  0.197
Demos     GPCM  Fixed   0.216  0.215   0.235  0.246     0.212  0.210   0.233  0.243
Demos     GPCM  Sep     0.216  0.215   0.235  0.246     0.212  0.210   0.233  0.243

• Min = 0.150, Max = 0.246
• Mean = 0.199, SD = 0.029
Results: ELPA Program District-Level Value-Added
Outcomes
28
• Impact of choice of psychometric model

Correlations
                 No Demos          Demos
Content Area     Fixed   Sep       Fixed   Sep
Reading          0.837   0.900     0.834   0.887
Writing          0.988   0.987     0.988   0.986
Listening        0.929   0.945     0.942   0.955
Speaking         0.975   0.975     0.980   0.980
min = 0.834, max = 0.988, mean = 0.943, SD = 0.052

3-Category Consistency
                 No Demos          Demos
Content Area     Fixed   Sep       Fixed   Sep
Reading          0.973   0.982     0.978   0.987
Writing          0.996   0.991     0.996   0.996
Listening        0.987   0.987     0.982   0.987
Speaking         0.964   0.964     0.969   0.969
min = 0.964, max = 0.996, mean = 0.982, SD = 0.011

4-Category Consistency
                 No Demos          Demos
Content Area     Fixed   Sep       Fixed   Sep
Reading          0.567   0.634     0.580   0.634
Writing          0.902   0.866     0.920   0.893
Listening        0.728   0.728     0.768   0.754
Speaking         0.795   0.795     0.839   0.839
min = 0.567, max = 0.920, mean = 0.765, SD = 0.113
Results: ELPA Program District-Level Value-Added
Outcomes
29
• Impact of Including/Not Including Demographics

Correlations
                 PCM               GPCM
Content Area     Fixed   Sep       Fixed   Sep
Reading          0.915   0.915     0.931   0.922
Writing          0.978   0.978     0.979   0.982
Listening        0.982   0.982     0.980   0.981
Speaking         0.993   0.993     0.997   0.997
min = 0.915, max = 0.997, mean = 0.969, SD = 0.030

3-Category Consistency
                 PCM               GPCM
Content Area     Fixed   Sep       Fixed   Sep
Reading          0.991   0.987     0.987   0.982
Writing          0.987   0.987     0.987   0.973
Listening        0.991   0.991     0.987   0.982
Speaking         0.991   0.991     0.996   0.996
min = 0.973, max = 0.996, mean = 0.988, SD = 0.006

4-Category Consistency
                 PCM               GPCM
Content Area     Fixed   Sep       Fixed   Sep
Reading          0.808   0.817     0.750   0.741
Writing          0.830   0.821     0.848   0.839
Listening        0.924   0.911     0.911   0.915
Speaking         0.902   0.902     0.911   0.911
min = 0.741, max = 0.924, mean = 0.859, SD = 0.060
Results
30
MEAP Mathematics
Results: MEAP Math Student-Level Outcomes
31
• Correlations among variables based on psychometric decisions
  (Grade 7 above the diagonal, Grade 8 below; Alg = Algebra, N&O = Number & Operations, Sep = Separate)

                  Alg     Alg     Alg     Alg     N&O     N&O     N&O     N&O
                  1-PL    1-PL    3-PL    3-PL    1-PL    1-PL    3-PL    3-PL
                  Fixed   Sep     Fixed   Sep     Fixed   Sep     Fixed   Sep
Alg 1-PL Fixed    -       1.000   0.943   0.941   0.775   0.775   0.775   0.743
Alg 1-PL Sep      1.000   -       0.943   0.941   0.775   0.775   0.775   0.742
Alg 3-PL Fixed    0.900   0.901   -       0.996   0.748   0.748   0.748   0.751
Alg 3-PL Sep      0.891   0.893   0.984   -       0.746   0.745   0.746   0.748
N&O 1-PL Fixed    0.684   0.685   0.677   0.666   -       1.000   1.000   0.941
N&O 1-PL Sep      0.684   0.685   0.676   0.665   1.000   -       1.000   0.941
N&O 3-PL Fixed    0.670   0.671   0.691   0.682   0.936   0.935   -       0.941
N&O 3-PL Sep      0.667   0.668   0.688   0.679   0.935   0.934   0.998   -
Results: MEAP Math Student-Level Outcomes
32
• Very high correlations based on fixed versus separate calibrations
Results: MEAP Math Student-Level Outcomes
33
• Very high correlations based on fixed versus separate calibrations
Results: MEAP Math Student-Level Outcomes
34
• Not as high correlations based on 1-PL versus 3-PL calibrations
Results: MEAP Math Student-Level Outcomes
35
• Moderate to high correlations across dimensions
Results: MEAP Mathematics School-Level Value-Added
Outcomes
36
• Impact of fixed versus separate calibration

Correlations
                       1 pre-test covariate              2 pre-test covariates
                       No Demos        Demos             No Demos        Demos
Content Area           1-PL    3-PL    1-PL    3-PL      1-PL    3-PL    1-PL    3-PL
Algebra                1.000   0.995   1.000   0.992     1.000   0.985   1.000   0.985
Number & Operations    1.000   0.977   1.000   0.956     1.000   0.988   1.000   0.983

3-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
                       No Demos        Demos             No Demos        Demos
Content Area           1-PL    3-PL    1-PL    3-PL      1-PL    3-PL    1-PL    3-PL
Algebra                0.989   0.968   0.987   0.973     0.987   0.935   0.989   0.960
Number & Operations    0.989   0.923   0.994   0.935     0.990   0.946   0.989   0.966

4-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
                       No Demos        Demos             No Demos        Demos
Content Area           1-PL    3-PL    1-PL    3-PL      1-PL    3-PL    1-PL    3-PL
Algebra                0.995   0.926   0.993   0.883     0.992   0.856   0.986   0.848
Number & Operations    0.989   0.827   0.984   0.712     0.993   0.875   0.983   0.817
Results: MEAP Mathematics School-Level Value-Added
Outcomes
37
• Impact of choice of outcome (Algebra vs. Number)

Correlations
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       No Demos        Demos             No Demos        Demos
Calibration Type       1-PL    3-PL    1-PL    3-PL      1-PL    3-PL    1-PL    3-PL
Fixed Parameter        0.548   0.608   0.361   0.391     0.652   0.697   0.574   0.609
Separate               0.549   0.649   0.366   0.436     0.653   0.711   0.576   0.614

3-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       No Demos        Demos             No Demos        Demos
Calibration Type       1-PL    3-PL    1-PL    3-PL      1-PL    3-PL    1-PL    3-PL
Fixed Parameter        0.637   0.667   0.649   0.703     0.703   0.751   0.716   0.774
Separate               0.637   0.691   0.650   0.726     0.705   0.749   0.713   0.784

4-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       No Demos        Demos             No Demos        Demos
Calibration Type       1-PL    3-PL    1-PL    3-PL      1-PL    3-PL    1-PL    3-PL
Fixed Parameter        0.399   0.424   0.322   0.337     0.447   0.475   0.404   0.412
Separate               0.397   0.429   0.322   0.350     0.444   0.484   0.405   0.436
Results: MEAP Mathematics School-Level Value-Added
Outcomes
38
• Impact of choice of psychometric model

Correlations
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       No Demos        Demos             No Demos        Demos
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.939   0.963   0.883   0.934     0.918   0.961   0.925   0.962
Separate               0.938   0.962   0.876   0.937     0.925   0.962   0.873   0.938

3-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       No Demos        Demos             No Demos        Demos
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.890   0.901   0.851   0.912     0.867   0.921   0.837   0.915
Separate               0.886   0.907   0.841   0.918     0.876   0.918   0.839   0.915

4-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       No Demos        Demos             No Demos        Demos
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.732   0.763   0.611   0.673     0.679   0.773   0.602   0.677
Separate               0.717   0.775   0.604   0.685     0.701   0.770   0.610   0.670
Results: MEAP Mathematics School-Level Value-Added
Outcomes
39
• Impact of Including/Not Including Demographics

Correlations
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       1-PL            3-PL              1-PL            3-PL
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.964   0.815   0.813   0.717     0.984   0.822   0.895   0.775
Separate               0.962   0.819   0.806   0.780     0.983   0.825   0.877   0.793

3-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       1-PL            3-PL              1-PL            3-PL
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.880   0.772   0.771   0.713     0.928   0.774   0.841   0.771
Separate               0.875   0.767   0.774   0.724     0.927   0.775   0.831   0.756

4-Cat Consistency
                       1 pre-test covariate              2 pre-test covariates
Multidimensional       1-PL            3-PL              1-PL            3-PL
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.775   0.551   0.572   0.464     0.864   0.557   0.646   0.508
Separate               0.774   0.556   0.544   0.522     0.858   0.552   0.635   0.547
Results: MEAP Mathematics School-Level Value-Added
Outcomes
40
• Impact of covarying on one vs. two pre-test scores

Correlations
                       No Demographics                   Includes Demographics
Multidimensional       1-PL            3-PL              1-PL            3-PL
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.937   0.965   0.923   0.964     0.941   0.947   0.930   0.951
Separate               0.937   0.965   0.937   0.962     0.941   0.948   0.941   0.942

3-Cat Consistency
                       No Demographics                   Includes Demographics
Multidimensional       1-PL            3-PL              1-PL            3-PL
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.855   0.884   0.851   0.889     0.889   0.918   0.872   0.744
Separate               0.859   0.889   0.878   0.883     0.885   0.922   0.885   0.755

4-Cat Consistency
                       No Demographics                   Includes Demographics
Multidimensional       1-PL            3-PL              1-PL            3-PL
Calibration Type       Alg     Num     Alg     Num       Alg     Num     Alg     Num
Fixed Parameter        0.734   0.764   0.696   0.753     0.715   0.687   0.704   0.713
Separate               0.729   0.768   0.727   0.754     0.716   0.693   0.714   0.698
Conclusions
41
• Practically important impacts on value-added metrics and value-added classifications
  ◦ Choice of psychometric model
  ◦ Including/not including demographics
  ◦ Including/not including multiple pre-test values
• Prohibitive impacts on value-added metrics and value-added classifications
  ◦ Choice of outcome (i.e., domain within construct)
• Practically negligible impacts on value-added metrics and value-added classifications
  ◦ Separate versus fixed calibrations of domains within construct
Conclusions, continued…
42
• Need to pay attention to modeling domains within constructs if constructs can reasonably be considered multidimensional
  ◦ Of the common psychometric and statistical modeling decisions one can make, the choice of which subscore to use as an outcome is the most influential
  ◦ Because subscores give different profiles of both student achievement and program/school value-added, each subscore should be modeled to the degree possible
• 4-category (i.e., quartile) classifications on value-added are appreciably impacted by every psychometric and statistical modeling choice evaluated here, but 3-category classifications are not
  ◦ Discourage more than three categories
  ◦ RTTT requires at least four categories
Conclusions, continued…
43
• The 3- vs. 4-category distinction is actually a proxy for
  ◦ Statistical decision categories (3 categories)
  ◦ Arbitrary cut point categories (4 categories)
• Can leverage unidimensional calibrations of multidimensional achievement scales to create multidimensional profiles of value-added
  ◦ Except where using four categories of classifications
Limitations
44
• Inductive reasoning
  ◦ Results are likely to hold in similar circumstances
  ◦ Still will need to investigate feasibility of using fixed parameters from unidimensional calibration for specific circumstances if those circumstances are high stakes
  ◦ This is a proof of concept
• PCM and GPCM models were run using different software (WINSTEPS vs. PARSCALE)
Contact Information
45
• Joseph A. Martineau, Ph.D.
  ◦ Executive Director
  ◦ Bureau of Assessment & Accountability
  ◦ Michigan Department of Education
  ◦ [email protected]
• Ji Zeng, Ph.D.
  ◦ Psychometrician
  ◦ Bureau of Assessment & Accountability
  ◦ Michigan Department of Education
  ◦ [email protected]
