Assessment in the Service of Learning:
the roles and design of high-stakes tests
Hugh Burkhardt
MARS: Mathematics Assessment Resource Service
Shell Center, University of Nottingham and UC Berkeley
Oakland Schools, October 2012
Structure of this talk
A word on the Common Core
The roles of assessment
Tasks and tests
Task difficulty and levels of understanding
Computer-based testing
Testing can be designed to serve learning.
Content: Getting Richer
Practices: Much deeper and richer
The Practices in CCSS-M:
• Make sense of problems and persevere in solving them
• Reason abstractly and quantitatively
• Construct and critique viable arguments
• Model with mathematics
• Use appropriate tools strategically
• Attend to precision
• Look for and make use of structure
• Look for and express regularity in repeated reasoning
The Roles of Assessment
The traditional view:
• Tests are “just measurement”, “valid” and
“reliable” (if a little strange looking)
Reality: tests have three roles:
• Measuring a few aspects of math-related
• Defining the goals by which students and
teachers are judged
• Driving the curriculum
This implies a huge responsibility on those who
High-stakes assessment implicitly
• Exemplifies performance objectives
• For most teachers, and the public, test tasks are
assumed to exemplify the standards – so they
effectively replace them
• Determines the pattern of teaching and learning
• FACT: most teachers ‘teach to the test’, a
perfectly reasonable “bottom line”
Taking the standards seriously implies designing
tests that meet them: “Tests worth teaching to”
that enable all students to show what they can do
Mathematical Practices
“Proficient students expect mathematics to make sense.
They take an active stance in solving mathematical problems.
When faced with a non-routine problem, they have the courage
to plunge in and try something, and
they have the procedural and conceptual tools to carry through.
They are experimenters and inventors, and can adapt known
strategies to new problems.
They think strategically”.
How far do our current tests assess this? Not far?
Tasks and Tests
Levels of mathematical expertise
It is useful to distinguish task levels, showing increasing
emphasis on mathematical practices.
• Novice Tasks
Short items, each focused on a specific concept or skill, as
set out in the standards (cf. ELA: spelling, grammar)
• Apprentice Tasks
Rich tasks with scaffolding, structured so that students
are guided through a “ramp” of increasing challenge
• Expert Tasks
Rich tasks in the form in which they might naturally arise – in the
real world or in pure mathematics (cf. ELA: writing)
Task examples
Some Expert Tasks
Tasks that are not predigested.
Problems as they might arise:
in the world outside the math classroom
in really doing math
Expert Tasks
Traffic Jam
1. Last Sunday an accident caused a traffic jam 11 miles
long on a two-lane highway.
How many cars do you think were in the traffic jam?
Explain your thinking and show all your calculations.
Write down any assumptions you make.
(Note: a mile is approximately equal to 5,000 feet.)
2. When the accident was cleared, the cars drove away from
the front, one car from each of the lanes every two
seconds. Estimate how long it took before the last car
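One possible line of reasoning for Traffic Jam can be sketched as follows. The 11 miles, two lanes, 5,000 feet per mile and one car per lane every two seconds come from the task; the 20 feet of road per car (car length plus gap) is an assumed value, and part 2 is read here as the time until the last car in each lane is released. Different assumptions give different, equally defensible estimates.

```python
# Rough estimate for the Traffic Jam task.
FEET_PER_MILE = 5_000   # approximation given in the task
JAM_MILES = 11          # from the task
LANES = 2               # from the task
SECONDS_PER_CAR = 2     # each lane releases one car every two seconds

FEET_PER_CAR = 20       # ASSUMPTION: car length plus the gap to the next car


def cars_per_lane(feet_per_car=FEET_PER_CAR):
    """Estimate how many cars fit in one lane of the jam."""
    return JAM_MILES * FEET_PER_MILE // feet_per_car


def cars_in_jam(feet_per_car=FEET_PER_CAR):
    """Estimate the number of cars in all lanes of the jam."""
    return cars_per_lane(feet_per_car) * LANES


def minutes_to_clear(feet_per_car=FEET_PER_CAR):
    """Estimate the minutes until the last car in a lane is released."""
    return cars_per_lane(feet_per_car) * SECONDS_PER_CAR / 60


print(cars_in_jam())                 # 5500
print(round(minutes_to_clear(), 1))  # 91.7
```

Changing `feet_per_car` shows how sensitive the answer is to the stated assumption, which is the kind of reasoning the task asks students to make explicit.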
Airplane turnaround
• How quickly could they do it?
Ponzi Pyramid Schemes
Max has just received this email
From: A. Crook
To: B. Careful
Do you want to get rich quick?
Just follow the instructions carefully below
and you may never need to work again:
1. Below there are 8 names and addresses.
Send $5 to the name at the top of this list.
2. Delete that name and add your own name and
address at the bottom of the list.
3. Send this email to 5 new friends.
Ponzi continued
• If that process goes as planned, how much money
would be sent to Max?
• What could possibly go wrong? Explain your
answer clearly.
• Why do they make Ponzi schemes like this illegal?
This task involves
Formulating the problem mathematically
Understanding exponential growth
Knowing it can’t go on for ever, and why
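The exponential growth behind the first question can be sketched as below. The 8 names, 5 friends and $5 payment come from the email; the function names are illustrative only. The sketch assumes that everyone follows the instructions exactly, which is precisely the assumption the second question invites students to challenge.

```python
# Chain-letter arithmetic for the Ponzi task, assuming nobody drops out.
NAMES_ON_LIST = 8        # from the email
FRIENDS_PER_PERSON = 5   # from the email
PAYMENT = 5              # dollars sent to the top name, from the email


def recipients_at_level(level, fanout=FRIENDS_PER_PERSON):
    """People who receive the email `level` forwarding steps after Max."""
    return fanout ** level


def money_sent_to_max():
    """Max's name starts at the bottom of the list he sends out.

    It moves up one place per forwarding step, so it is at the top of the
    list received by the people 8 steps after Max; each of them sends $5.
    """
    payers = recipients_at_level(NAMES_ON_LIST)
    return payers * PAYMENT


def people_needed():
    """Everyone who must join after Max for his payout to be complete."""
    return sum(recipients_at_level(k) for k in range(1, NAMES_ON_LIST + 1))


print(money_sent_to_max())  # 1953125
print(people_needed())      # 488280
```

Nearly half a million new participants are needed for one person's payout, and every one of them expects the same, which is why the process cannot go on for ever.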
(3, 4, 5), (5, 12, 13), (7, 24, 25) and (9, 40, 41)
satisfy the condition that natural numbers (a, b, c)
are related by $c^2 = a^2 + b^2$
• Investigate the relationships between the lengths of the
sides of triangles which belong to this set
• Use these relationships to find the numerical values of at
least two further Pythagorean Triples which belong to this set
• Investigate rules for finding the perimeter and area of
triangles which belong to this set when you know the
length of the shortest side.
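One pattern a student might find in the four given triples is that the shortest side a is odd, b = (a² − 1)/2 and c = b + 1. The sketch below checks that conjecture and the perimeter and area rules that follow from it. The function names are illustrative, and this is only one family of Pythagorean triples, not a description of all of them.

```python
def triple_from_shortest(a):
    """Return (a, b, c) with c = b + 1 for an odd shortest side a >= 3."""
    if a < 3 or a % 2 == 0:
        raise ValueError("this family needs an odd shortest side of at least 3")
    b = (a * a - 1) // 2
    return a, b, b + 1


def perimeter(a):
    """Perimeter a + b + c, which simplifies to a * (a + 1) in this family."""
    return a * (a + 1)


def area(a):
    """Area a * b / 2, which simplifies to a * (a*a - 1) / 4 in this family."""
    return a * (a * a - 1) // 4


# Check the conjectured rules against the definition for several cases.
for a in range(3, 40, 2):
    a_, b, c = triple_from_shortest(a)
    assert a_ * a_ + b * b == c * c
    assert perimeter(a) == a_ + b + c
    assert area(a) == a_ * b // 2

print(triple_from_shortest(11))  # (11, 60, 61)
print(triple_from_shortest(13))  # (13, 84, 85)
```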
Which sport? task from “the literature”, 1982
Which sport will give a graph like this?
Describe in detail how your answer fits the graph – as in a
radio commentary
Table tiles
Maria makes square tables, then sticks
tiles to the top.
Square tables have sides that are
multiples of 10 cm.
Maria uses quarter tiles at the corners
and half tiles along edges.
How many tiles of each type are needed
for a 40 cm x 40 cm square?
Describe a method for quickly
calculating how many tiles of each type
are needed for larger, square table tops.
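A possible general method is sketched below. It rests on an assumption about the tile layout that the task text alone does not state: whole tiles are squares with 10 cm diagonals, set corner to corner in diamond fashion, so that half tiles fill the edges and quarter tiles fill the corners. Under a different layout the counts would differ. The function name is illustrative.

```python
def tiles_needed(side_cm):
    """Count tiles for a square table top, ASSUMING diamond-set tiles
    whose diagonals are 10 cm long."""
    if side_cm <= 0 or side_cm % 10 != 0:
        raise ValueError("side must be a positive multiple of 10 cm")
    n = side_cm // 10
    return {
        "whole": n * n + (n - 1) * (n - 1),  # two interleaved square grids
        "half": 4 * (n - 1),                 # n - 1 along each of 4 edges
        "quarter": 4,                        # one at each corner
    }


# Sanity check: the tiles must cover the table exactly.
# A whole tile with 10 cm diagonals has area 10 * 10 / 2 = 50 cm^2.
for n in range(1, 10):
    t = tiles_needed(10 * n)
    covered = 50 * t["whole"] + 25 * t["half"] + 12.5 * t["quarter"]
    assert covered == (10 * n) ** 2

print(tiles_needed(40))  # {'whole': 25, 'half': 12, 'quarter': 4}
```

The area check is independent of how the counts were derived, so it is a useful test of any rule a student proposes under the same layout assumption.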
Apprentice tasks
• Expert tasks with added scaffolding to:
• ease entry
• reduce strategic demand
• Ramp of difficulty within the task, with increasing:
• complexity
• abstraction
• demand for explanation
Balanced Assessment in Mathematics (BAM) tests
are of this kind – complementing state tests (Novice)
Apprentice tasks: design
Guide students through a ramp of challenge
“Patchwork” gives:
• Multiple examples that ease understanding
• Specific numerical cases to explore – counting
• A helpful representation – the table
only then
• Asks for a generalization – rule, formula
• Presents an inverse problem
A step in growing expertise: “climbing with a guide”
Task Difficulty
The difficulty of a task depends on various factors:
• Complexity
• Unfamiliarity
• Technical demand
• Autonomy expected of the student
• Expert Tasks fully involve the mathematical practices and all four
aspects, so must not be too technically demanding
• Apprentice Tasks involve the mathematical practices at a
modest level, with little student autonomy
• Novice Tasks present mainly technical demand, so this can be
“up to grade”, including concepts and skills just learnt
Levels of understanding
Explanation: chains of reasoning (2nd sentence?)
Adaptation: requires non-routine problems
Extension: offer opportunities
Jean Piaget
The Practices in CCSS-M:
• Make sense of problems and persevere in solving them
• Reason abstractly and quantitatively
• Construct and critique viable arguments
• Model with mathematics
• Use appropriate tools strategically
• Attend to precision
• Look for and make use of structure
• Look for and express regularity in repeated reasoning
These haven’t been
a focus of testing …
but they will be – maybe
Smarter Balanced Assessment Consortium
(Just google SBAC)
Here are some of the headlines.
SMARTER Balanced “content spec”
• Claim #1 Concepts & Procedures “Students can explain and apply
mathematical concepts and interpret and carry out mathematical
procedures with precision and fluency.”
• Claim #2 Problem Solving “Students can solve a range of complex
well-posed problems in pure and applied mathematics, making
productive use of knowledge and problem solving strategies.”
• Claim #3 Communicating Reasoning “Students can clearly and
precisely construct viable arguments to support their own
reasoning and to critique the reasoning of others.”
• Claim #4 Modeling and Data Analysis “Students can analyze
complex, real-world scenarios and can construct and use
mathematical models to interpret and solve problems.”
PARCC so far seems less specific; mainly CCSSM content standards
Total Score for Mathematics
Content and Procedures Score
• Grade 3 C&P Sub-scores: Operations & Algebraic Thinking; Number/Ops – Fractions; Measurement & Data
• Grade 4 C&P Sub-scores: Operations & Algebraic Thinking; Number/Ops – Base 10; Number/Ops – Fractions; Measurement & Data
• Grade 5 C&P Sub-scores: Number/Ops – Base 10; Number/Ops – Fractions; Measurement & Data
• Grade 6 C&P Sub-scores: Number System; Ratio & Proportion; Expressions & Equations
• Grade 7 C&P Sub-scores: Number System; Ratio & Proportion; Expressions & Equations
• Grade 8 C&P Sub-scores: Expressions & Equations
• High School C&P Sub-scores: Number & Quantity
Total Score for Mathematics
Content and Procedures Score, Reasoning Score, Modeling Score
So: A large part of the exam will be
devoted to things we haven’t tested
but – there is THE CAT
Computer-based testing
Promises of cheap, instant, adaptive testing
Great strengths and, even after 70 years, weaknesses
Key questions: for rich tasks, does CBT provide
• Effective handling of the testing process?
• Better ways for presenting tasks?
• A natural medium for students to work on math?
• Effective ways to capture a student’s reasoning?
• Reliable ways to score a student’s response?
• Effective ways for collecting and reporting results?
Computer-based testing: summary
Best way to manage high-stakes testing
Fine on its own for Novice level tasks (short items)
Expert and Apprentice tasks essentially involve:
• long chains of autonomous student reasoning
• sketching and doodling: diagrams, numbers, equations
This needs
• image capture (paper, scan, or ? off tablet screen)
• human scoring (on screen): responses too diverse for computer scoring
Can improve testing in various ways; for analysis see
Educational Designer lead article in Issue 5, out soon
SBAC test structure
Three components planned:
• CAT: computer-adaptive on-line test
• “set of rich constructed response items”
• “a classroom-based performance task”
(up to 2 periods)
Task types: extended examples in content spec
PARCC also has “end of course” CAT +
• “periodic assessments” during year – nature open
to “creative input” by educators and vendors
some impressions and comments
plans and challenges
Some seem desperate to stick with computer-based testing
Here’s a sample PARCC “modeling” item.
Madlibs on a math test?
Think about WYTIWYG (What You Test Is What You Get)!
20 days of test prep – playing
math-related video games
Cost-effective human scoring?
Some standard approaches
• on-screen professional scorers
• on-screen trained teacher-scorers
• get-together training-scoring meetings
Factors to be weighed (cf ~ $2,000 per year)
marginal cost
consistency (“reliability”), with monitoring
professional development gain
non-productive test prep class time saved
Needs collaboration at system level: math, assessment, PD
Some comments
• Realising these goals takes people outside their
comfort zone, particularly in the design of:
o rich tasks that work well with kids in exams
o implementation mechanisms that work smoothly
o processes that will have public credibility
• Cost containment requires some integration of
curriculum, tests and professional development
• Lots of experience, worldwide and some US, using:
o “literature” of rich tasks
o modes of teacher involvement (eg SVMI)
is inevitable from:
• fear of time, cost, litigation, …. anything new
• psychometric tradition and habit:
o testing is “just measurement”
o focus on statistics, ignoring systematic error, i.e. not
measuring what you’re interested in
• overestimating in-house expertise
(principles fine; tasks often lousy)
Good outcomes will depend on close collaboration
of assessment folk, math folk, outside expertise
But the quality of the tests is crucial
• “It is now widely recognized that high stakes assessments establish
the ceiling with regard to performance expectations in most
classrooms – or, to put it another way, the lower the bar, the lower
people will aim. Accordingly, SBAC will seek to ensure that a
student’s success on its assessments depends on a learning
program that reflects the Common Core State Standards for
Mathematics in a rich and balanced way. This is the nation’s best
chance – for the next decade at least – to move the system in the
right directions.”
• “Because of the high stakes testing “ceiling effect” described above,
credibly assigned scores on performance tasks will need to be a
major part of the score reporting.”
From an earlier draft of the SBAC content specs
Structure of this talk
A word on the Common Core
The roles of assessment
Tasks and tests
Task difficulty and levels of understanding
Computer-based testing
Testing can be designed to serve learning.
Thank you
Lessons and tasks:
also ISDDE report on assessment
