PowerPoint: The MAP test

“You can attach a meaningless number
to anything.
Even a child.”
The graphs & statistical analysis for this presentation came from
This graph shows the most precise results obtained from MAP
tests in the classroom of an expert teacher. Yet, 10 out of 38 of
the growth scores (26%) fall outside a reasonable range.
Further, scores did not match the student performance that the
teacher was observing in the classroom.
• MAP is based on the assumption that all
students learn information in the same order,
and all schools teach information in the same order.
• The RIT scores
(RIT score = the final number
assigned to each student by NWEA’s MAP test)
are no more accurate than these assumptions.
What’s RIT?
That’s a loaded question…
IRT is Item Response Theory.
NWEA uses a specific IRT model conceived by the Danish
mathematician Georg Rasch (1901-1980).
They’ve apparently rebranded “IRT” as “RIT”.
It’s proprietary….
• NWEA claims that characteristics of the RIT scale include:
It is an achievement scale.
It is an accurate scale.
It is an equal interval scale.
It helps to measure growth over time.
It has the same meaning regardless of grade or age of the student.
• MAP levels are based on academic levels
(curriculum sequencing), not cognitive levels
(Bloom's Taxonomy).
• A high RIT score can represent low level
cognition in advanced curriculum material.
• Thus, MAP is not a good measure of gifted
performance and should not be used to guide
decisions for gifted programs (as it has been
repurposed in Seattle Public Schools).
• MAP does not provide information as to why
the student is having difficulty with specific
• MAP does not even provide specific
information as to what material students need
to work on.
• As a result, MAP does not provide useful
information that can guide teachers in
meeting students' real needs.
• Teachers don’t see the test, and the test isn’t
tied to their classes or curricula
• All decisions based on test results should
consider the 2-sigma range. (The Greek letter sigma
denotes the standard deviation, a standard statistic
that describes the size of measurement error within
a set of measurements. A lower sigma means more
believable data. As astronomers say, “The road to
hell is paved with 3-sigma.”)
• NWEA reports sigma to be about 3 (except for
high RIT scores where sigma is even larger.)
• Thus, all decisions should consider the possibility
that each reported score could have been 6 points
(2 sigma) higher, or 6 points lower.
• Many schools that have retested with MAP have
seen evidence that sigma is actually larger than 3.
• RIT scores are actually determined by averaging
together strand scores.
• Since a typical MAP test has 50 questions and 5
strands, only 10 questions were asked for each
strand.
• Uncertainty for a 10 question multiple choice test
is rather large.
• This is where the dream of the statistician meets
the reality of the young student. Students can’t
do tests for hours, and schools don’t have
computer resources for students to do tests all
the time. So you ask just a few questions, and you
get results with a lot of, well, sigma.
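The arithmetic in the bullets above can be sketched in a few lines of Python. This is only an illustration, not NWEA's scoring method: the example score of 210 is hypothetical, and the last function assumes a plain fixed-length test where each answer is an independent right-or-wrong outcome (a binomial model), which is simpler than MAP's adaptive scoring.

```python
import math

def two_sigma_band(score, sigma):
    """Return the (low, high) range two sigma either side of a score."""
    return score - 2 * sigma, score + 2 * sigma

def questions_per_strand(total_questions, strands):
    """Number of questions available to estimate each strand score."""
    return total_questions // strands

def proportion_standard_error(p_correct, n_questions):
    """Standard error of the fraction correct under a binomial model.

    Assumption (not from the presentation): answers are independent
    and each has the same chance p_correct of being right.
    """
    return math.sqrt(p_correct * (1 - p_correct) / n_questions)

# Hypothetical reported score of 210 with the sigma of about 3 cited above.
low, high = two_sigma_band(score=210, sigma=3)
print(low, high)  # 204 216

# 50 questions spread over 5 strands leaves 10 questions per strand.
n = questions_per_strand(50, 5)
print(n)  # 10

# With 10 questions and a 50% chance of a correct answer, the fraction
# correct is uncertain by roughly 16 percentage points (one sigma).
print(round(proportion_standard_error(0.5, n), 3))  # 0.158
```

Under these assumptions, a strand built on 10 questions carries an uncertainty several times larger than one built on the full 50.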
One means of checking the reliability of a test is to compare its results to another
test. This is particularly important if the goal is to increase the scores on the other
test. Such would be the case for high stakes testing mandated by NCLB.
Below is a comparison of the results of a MAP test to the results of an End of Grade
Test (EOG) required by the state. Reliability was low.
MAP ranked about 5 students a year higher in performance, and 4 students a year
lower in performance (23% total), than the EOG did.
NWEA's technical manual uses r-values to estimate the reliability of the tests. The r-values they report range from 0.76 to 0.93 with most of the values being between 0.80
and 0.89. But what does this mean for those wishing to use the tests to guide
instruction? One can simulate the r-values to estimate what percentage of scores will
lie within a reliable range for given r-values. An r-value of 0.92 could easily mean that
over 25% of the data lies more than 6 points away from the true score. Yet 6 points
constitutes a year's normal growth for over half of the tests.
As simulated in the graph above, even with an r-value of 0.92, 29% of the data is in error
by more than a year's normal growth. This imprecision would lead to serious errors in
identifying student needs, and then tracking the students incorrectly.
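A simulation of this kind can be sketched as follows. It is a minimal illustration under stated assumptions, not the analysis behind the graph: it treats r as a classical reliability coefficient (error variance = (1 - r) × observed variance), draws normally distributed errors, and uses a hypothetical spread of 15 RIT points for observed scores. The percentages it prints depend heavily on that assumed spread, and on whether one score or the difference between two scores is being simulated, so they should not be expected to match the figures quoted above.

```python
import math
import random

def simulate_fraction_outside(r, sd_observed, tolerance, n=100_000, seed=0):
    """Simulated fraction of scores whose error exceeds `tolerance` points."""
    rng = random.Random(seed)
    # Assumption: classical test theory, error variance = (1 - r) * observed variance.
    sd_error = sd_observed * math.sqrt(1.0 - r)
    outside = sum(1 for _ in range(n) if abs(rng.gauss(0.0, sd_error)) > tolerance)
    return outside / n

def analytic_fraction_outside(r, sd_observed, tolerance):
    """Same quantity from the normal distribution, as a cross-check."""
    sd_error = sd_observed * math.sqrt(1.0 - r)
    return math.erfc(tolerance / (sd_error * math.sqrt(2.0)))

# sd_observed=15.0 is a hypothetical placeholder, not a value from NWEA.
for r in (0.76, 0.80, 0.89, 0.92, 0.93):
    frac = simulate_fraction_outside(r, sd_observed=15.0, tolerance=6.0)
    print(f"r = {r:.2f}: {frac:.1%} of scores off by more than 6 points")
```

Replacing sd_observed with the spread actually seen in a school's own scores gives a more relevant estimate.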
• For high RIT scores (scores that are normal for 8th
grade and higher) the SEM (standard error of
measurement) is larger than a typical year's
growth.
• Thus, MAP is not only unable to identify what
skills a student may need, MAP is unable to
precisely determine the grade level at which
a student is performing.
• For very high RIT scores, negative growth is more common
than positive growth.
• This implies that either MAP is totally unreliable for high
performers, or that MAP testing reinforces educational
strategies that are counterproductive for high achievers.
So, if the MAP test is not so useful,
what should teachers look for ?
• If your school is considering adopting a standardized
test, ask the salesperson to demonstrate how the
score reporting will help you.
• Give specific common examples from your experience.
"I have students who can solve two-step equations
with minimal difficulty, but frequently confuse the
properties for negatives. How will the test report let
me know what specific help these students need?"
• Be sure the testing company can give good answers to
these questions.
Ask the test salesperson to
demonstrate how the test
high level learning
accelerated knowledge.
• Formative assessments that identify specific strengths,
as opposed to tests that rank for tracking purposes.
• Parents & Teachers: Learn more about testing.
• Do not accept the “results” of any single type of
assessment at face value. All tests have
measurement errors, nowhere more so than in
testing students, particularly when those
students have no stake in the outcome.
• Test results are commonly used inappropriately
by administration for invalid purposes like
student placement and teacher evaluation.
• If you get a bad review based on test results
alone, file a grievance. Insist the administrator
demonstrate that the test was both precise and
accurate enough to justify the results.
