Formative assessment: Challenges and opportunities

Formative assessment in mathematics:
opportunities and challenges
Dylan Wiliam (@dylanwiliam)
Seminar at Teachers College, Columbia University
October 2013
A research agenda for formative assessment






Definitional issues
Domain-specificity issues
Effectiveness issues
Communication issues
Implementation issues
Adoption issues
Definitional issues
The evidence base for formative assessment
Fuchs & Fuchs (1986)
Natriello (1987)
Crooks (1988)
Bangert-Drowns, et al. (1991)
Dempster (1991, 1992)
Elshout-Mohr (1994)
Kluger & DeNisi (1996)
Black & Wiliam (1998)








Nyquist (2003)
Brookhart (2004)
Allal & Lopez (2005)
Köller (2005)
Brookhart (2007)
Wiliam (2007)
Hattie & Timperley (2007)
Shute (2008)
Definitions of formative assessment
“We use the general term assessment to refer to all those activities
undertaken by teachers—and by their students in assessing
themselves—that provide information to be used as feedback to
modify teaching and learning activities. Such assessment becomes
formative assessment when the evidence is actually used to adapt the
teaching to meet student needs” (Black & Wiliam, 1998 p. 140)
“the process used by teachers and students to recognise and respond
to student learning in order to enhance that learning, during the
learning” (Cowie & Bell, 1999 p. 32)
“assessment carried out during the instructional process for the
purpose of improving teaching or learning” (Shepard et al., 2005 p.
275)
“Formative assessment refers to frequent, interactive
assessments of students’ progress and understanding to identify
learning needs and adjust teaching appropriately” (Looney,
2005, p. 21)
“A formative assessment is a tool that teachers use to measure
student grasp of specific topics and skills they are teaching. It’s a
‘midstream’ tool to identify specific student misconceptions and
mistakes while the material is being taught” (Kahl, 2005 p. 11)
“Assessment for Learning is the process of seeking and interpreting
evidence for use by learners and their teachers to decide where the
learners are in their learning, where they need to go and how best to
get there” (Assessment Reform Group, 2002 pp. 2-3)
“Assessment for learning is any assessment for which the first priority
in its design and practice is to serve the purpose of promoting
students’ learning. It thus differs from assessment designed primarily
to serve the purposes of accountability, or of ranking, or of certifying
competence. An assessment activity can help learning if it provides
information that teachers and their students can use as feedback in
assessing themselves and one another and in modifying the teaching
and learning activities in which they are engaged. Such assessment
becomes “formative assessment” when the evidence is actually used
to adapt the teaching work to meet learning needs.” (Black, Harrison,
Lee, Marshall & Wiliam, 2004 p. 10)
Theoretical questions
Need for clear definitions
  So that research outcomes are commensurable
Theorization and definition
  Possible variables
  Category (instruments, outcomes, functions)
  Beneficiaries (teachers, learners)
  Timescale (months, weeks, days, hours, minutes)
  Consequences (outcomes, instruction, decisions)
  Theory of action (what gets formed?)
Formative assessment: a new definition
“An assessment functions formatively to the extent that
evidence about student achievement elicited by the
assessment is interpreted and used, by teachers,
learners, or their peers, to make decisions about the next
steps in instruction that are likely to be better, or better
founded, than the decisions that would have been taken
in the absence of that evidence.”
Unpacking formative assessment
Agent | Where the learner is going | Where the learner is | How to get there
Teacher | Clarifying, sharing and understanding learning intentions | Engineering effective discussions, tasks, and activities that elicit evidence of learning | Providing feedback that moves learners forward
Peer | (as for teacher) | Activating students as learning resources for one another (spans both columns)
Learner | (as for teacher) | Activating students as owners of their own learning (spans both columns)
Definitional issues: potential research

How can formative assessment be
defined and what are the
consequences of different definitions,
for psychometrics, for communication,
and for adoption?
Domain specificity issues
Pedagogy and didactics



Some aspects of formative assessment are generic
Some aspects of formative assessment are
domain-specific
There is a continuing debate about what aspects of
formative assessment are generic (pedagogy) and
which are domain-specific (didactics)
Clarifying, sharing and
understanding learning intentions
A standard middle school math problem…



Two farmers have adjoining fields with a common boundary that is not
straight. This is inconvenient for plowing. How can they divide the two
fields so that the boundary is straight, but the two fields have the
same area as they had before?
How many rectangles?
\[ \frac{m(m-1)}{2} \times \frac{n(n-1)}{2} \]
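The slide's diagram is not preserved, so the following is a sketch under an assumption: the formula counts rectangles whose sides lie on a grid of m lines in one direction and n lines in the other. Under that reading, each rectangle is fixed by choosing 2 of the m lines and 2 of the n lines. The function names are illustrative only.

```python
from itertools import combinations

def count_rectangles_formula(m: int, n: int) -> int:
    """Closed form from the slide: m(m-1)/2 times n(n-1)/2."""
    return (m * (m - 1) // 2) * (n * (n - 1) // 2)

def count_rectangles_brute(m: int, n: int) -> int:
    """Enumerate every pair of distinct lines in each direction (assumed grid reading)."""
    count = 0
    for _top, _bottom in combinations(range(m), 2):
        for _left, _right in combinations(range(n), 2):
            count += 1  # each choice of two lines per direction bounds one rectangle
    return count

# A 2-by-2 grid of unit squares is drawn with 3 lines in each direction.
print(count_rectangles_formula(3, 3), count_rectangles_brute(3, 3))
```

The brute-force count agrees with the closed form for small grids, which is a check on the reading of the formula rather than a proof.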
Engineering effective discussions,
activities, and classroom tasks that elicit
evidence of learning
Questioning in math: Diagnosis
In which of these right-angled triangles is $a^2 + b^2 = c^2$?
[Figure: six right-angled triangles, labelled A to F, with the side labels a, b and c placed in different positions]
Diagnostic item: medians
What is the median for the following data set?
38  74  22  44  96  22  19  53
a. 22
b. 38 and 44
c. 41
d. 46
e. 70
f. 77
g. This data set has no median
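A short sketch of why each option in this item is diagnostic, reading the data set as 38, 74, 22, 44, 96, 22, 19, 53 (the order is reconstructed from the slide). The mapping of options to student errors is an interpretation, not something the slide states: each numerical option happens to match a different statistic a student might compute instead of the median.

```python
import statistics

data = [38, 74, 22, 44, 96, 22, 19, 53]

def middle_pair(values):
    """Return the two central values of an even-length list, in the order given."""
    mid = len(values) // 2
    return values[mid - 1], values[mid]

candidates = {
    "mode": statistics.mode(data),
    "middle pair of the sorted data": middle_pair(sorted(data)),
    "median": statistics.median(data),
    "mean": statistics.mean(data),
    "mean of the middle pair without sorting": sum(middle_pair(data)) / 2,
    "range": max(data) - min(data),
}

for name, value in candidates.items():
    print(f"{name}: {value}")
```

On this reading, a student's choice of option points to a specific misunderstanding (for example, forgetting to sort, or confusing median with mean), which is what makes the item useful for diagnosis.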
Diagnostic item: means
What can you say about the means of the following
two data sets?
Set 1:  10  12  13  15
Set 2:  10  12  13  15  0
A. The two sets have the same mean.
B. The two sets have different means.
C. It depends on whether you choose to count the zero.
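The arithmetic behind this item can be sketched as follows, taking Set 1 as 10, 12, 13, 15 and Set 2 as the same values plus 0. The variable names are illustrative; the third calculation shows what a student who chooses to "not count the zero" would obtain.

```python
from statistics import mean

set_1 = [10, 12, 13, 15]
set_2 = [10, 12, 13, 15, 0]

mean_1 = mean(set_1)
mean_2 = mean(set_2)  # the zero is a data value, so it counts toward n
mean_2_ignoring_zero = mean([x for x in set_2 if x != 0])  # the misconception

print(mean_1, mean_2, mean_2_ignoring_zero)
```

Both sets sum to 50, but Set 2 has five values rather than four, so the means differ.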
Providing feedback that moves
learners forward
Getting feedback right is hard
Response type | Feedback indicates performance falls short of goal | Feedback indicates performance exceeds goal
Change behavior | Increase effort | Exert less effort
Change goal | Reduce aspiration | Increase aspiration
Abandon goal | Decide goal is too hard | Decide goal is too easy
Reject feedback | Feedback is ignored | Feedback is ignored
Activating students:
as learning resources for one another
as owners of their own learning
+/–/interesting: responses for “+”
I got that ball-park estimates are supposed to be simple
I know that you have to look at it and say “OK”
I know that when I am adding the number I end up with must be bigger than the one I started at
I get most of the problems
It was easy for me because on the first one it says 328 so I took the 2 and made it a 12
I know that we would have to regroup
I know how to do plus and minus because we have been doing it for a long time
I get it when you cross out a number and make it a new one
I know that when you can’t – from both colomes you go to the third colome and take that from it
I know that when my answer is right the ball park estimate is close to it
+/–/interesting: responses for “–”
I am still a tiny bit confused about subtraction regrouping
I am a little bit confused about ball park estimates
I get confused because sometimes I don’t get the problem
I am confused when you subtract really big numbers like 1,000 something
I’m still a little bit confused about regrouping
Minus is confusing when you have to regroup twice
Minus is a little bit hard when you have to regroup
I don’t understand when you borrow which colome you borrow from when both are 0
I am still confused about showing what I did to solve the problem
I am a little confused about when you need to subtract
+/–/interesting: responses for “interesting”
Carrying the number over to the next number
It’s interesting how some people go to the nearest hundred while some go to the nearest ten
It’s interesting how some have to regroup twice
It’s pretty interesting about how you have to work really hard
I am interested in borrowing because I didn’t just get it yet. I want to really get to know it
I find it weird that you could just keep going from colome to colome when you need to borrow
On the ball park estimate it is easy but sometimes hard
I really think that regrouping is pretty amazing
It is cool how addition and subtraction regrouping is just moving numbers and you could get it right easily
Domain-specificity issues: potential research


How much domain-specific knowledge does a
teacher need in order to be able to implement
high-quality formative assessment routines
consistently?
Can domain-specific formative assessment tools be
independent of a particular curriculum?
The effectiveness issue
Effects of formative assessment
Standardized effect size: differences in means, measured
in population standard deviations
Source | Effect size
Kluger & DeNisi (1996) | 0.41
Black & Wiliam (1998) | 0.4 to 0.7
Wiliam et al. (2004) | 0.32
Hattie & Timperley (2007) | 0.96
Shute (2008) | 0.4 to 0.8
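The effect sizes in this table are differences in means expressed in population standard deviations. A minimal sketch of that calculation follows; the function name, the scores, and the population SD of 10 are hypothetical values chosen for illustration and are not taken from any of the studies listed.

```python
from statistics import fmean

def standardized_effect_size(treatment, control, population_sd):
    """Difference in group means, measured in population standard deviations."""
    if population_sd <= 0:
        raise ValueError("population_sd must be positive")
    return (fmean(treatment) - fmean(control)) / population_sd

# Hypothetical test scores for two small groups.
treatment_scores = [52, 58, 55, 61, 49]  # mean 55
control_scores = [50, 47, 53, 51, 49]    # mean 50

d = standardized_effect_size(treatment_scores, control_scores, population_sd=10)
print(d)
```

Note that many studies divide by a pooled sample standard deviation rather than a known population value, which is one reason reported effect sizes are not always directly comparable.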
Understanding meta-analysis:
“I think you’ll find it’s a bit more
complicated than that” (Goldacre, 2008)
Understanding meta-analysis

A technique for aggregating results from different
studies by converting empirical results to a
common measure (usually effect size)
Standardized effect size is defined as:
\[ d = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{\sigma_{\text{population}}} \]

Problems with meta-analysis
  The “file drawer” problem
  Variation in population variability
  Selection of studies
  Sensitivity of outcome measures
The “file drawer” problem
The importance of statistical power


The statistical power of an experiment is the
probability that the experiment will yield an effect that
is large enough to be statistically significant.
In single-level designs, power depends on
  significance level set
  magnitude of effect
  size of experiment
The power of most social science experiments is low
  Psychology: 0.4 (Sedlmeier & Gigerenzer, 1989)
  Neuroscience: 0.2 (Button et al., 2013)
  Education: 0.4
Only lucky experiments get published…
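The link between low power and the "file drawer" problem can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real literature: the true effect (0.3 SD), group size (30), known SD of 1, two-sided 5% z-test, and the rule that only significant results are "published" are all hypothetical choices. The three parameters correspond to the three factors listed above: significance level, magnitude of effect, and size of experiment.

```python
import random
from statistics import fmean

def simulate_experiments(true_effect=0.3, n_per_group=30, trials=2000, seed=0):
    """Simulate two-group experiments with known SD = 1 and a two-sided z-test."""
    rng = random.Random(seed)
    z_critical = 1.96                           # two-sided 5% significance level
    standard_error = (2 / n_per_group) ** 0.5   # SE of a difference in means when SD = 1
    observed, published = [], []
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        effect = fmean(treated) - fmean(control)  # already in SD units because SD = 1
        observed.append(effect)
        if abs(effect) / standard_error > z_critical:
            published.append(effect)              # only significant results are kept
    power = len(published) / trials
    return power, fmean(observed), fmean(published)

power, mean_all, mean_published = simulate_experiments()
print(f"estimated power: {power:.2f}")
print(f"mean effect, all experiments: {mean_all:.2f}")
print(f"mean effect, significant only: {mean_published:.2f}")
```

With these assumed settings the analytic power is roughly 0.2, so most simulated experiments fail to reach significance, and the ones that do tend to overestimate the true effect. A meta-analysis that sees only the significant results would therefore be biased upward.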
Variation in variability
Annual growth in achievement, by age
[Figure: annual growth in achievement (SDs, axis from 0.0 to 1.6) plotted against age (5 to 16). Source: Bloom, Hill, Black, and Lipsey (2008)]
A 50% increase in the rate of learning for six-year-olds is equivalent to an effect size of 0.76
A 50% increase in the rate of learning for 15-year-olds is equivalent to an effect size of 0.1
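The two annotations on this chart follow from multiplying the annual growth (in SDs) by the fractional increase in the rate of learning. In the sketch below, the annual growth values of 1.52 SD at age 6 and 0.20 SD at age 15 are assumptions, back-calculated from the 0.76 and 0.1 quoted on the slide rather than read from the chart.

```python
def effect_size_of_speedup(annual_growth_sd: float, fractional_increase: float) -> float:
    """Extra learning in one year, in SDs, from a proportional increase in learning rate."""
    return annual_growth_sd * fractional_increase

growth_age_6 = 1.52   # assumed: 0.76 / 0.5, back-calculated from the slide
growth_age_15 = 0.20  # assumed: 0.1 / 0.5, back-calculated from the slide

effect_age_6 = effect_size_of_speedup(growth_age_6, 0.5)
effect_age_15 = effect_size_of_speedup(growth_age_15, 0.5)
print(round(effect_age_6, 2), round(effect_age_15, 2))
```

The same proportional improvement in learning therefore yields very different effect sizes depending on the age of the students.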
Variation in variability


Studies with younger children will produce larger
effect size estimates
Studies with restricted populations (e.g., children
with special needs, gifted students) will produce
larger effect size estimates
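One way to see why restricted populations produce larger estimates: the same raw gain is divided by a smaller standard deviation. The numbers below (a 5-point gain, an SD of 15 across all students, and an SD of 7.5 within a narrower subgroup) are hypothetical and chosen only to show the mechanism.

```python
def standardized_effect(raw_gain: float, sd: float) -> float:
    """Express a raw score gain in units of the chosen standard deviation."""
    return raw_gain / sd

raw_gain = 5.0             # hypothetical gain in test-score points
sd_full_population = 15.0  # hypothetical SD across all students
sd_restricted = 7.5        # hypothetical SD within a narrower subgroup

effect_full = standardized_effect(raw_gain, sd_full_population)
effect_restricted = standardized_effect(raw_gain, sd_restricted)
print(round(effect_full, 2), round(effect_restricted, 2))
```

Halving the spread of scores doubles the effect size, even though the intervention produced an identical raw gain.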
Selection of studies
Feedback in STEM subjects
Review of 9000 papers on feedback in mathematics, science and technology
Only 238 papers retained
  Background papers: 24
  Descriptive papers: 79
  Qualitative papers: 24
  Quantitative papers: 111
    Mathematics: 60
    Science: 35
    Technology: 16
Ruiz-Primo and Li (2013)
Classification of feedback studies
1. Who provided the feedback (teacher, peer, self, or technology-based)?
2. How was the feedback delivered (individual, small group, or whole
class)?
3. What was the role of the student in the feedback (provider or
receiver)?
4. What was the focus of the feedback (e.g., product, process, self-regulation
for cognitive feedback; or goal orientation, self-efficacy for
affective feedback)?
5. On what was the feedback based (student product or process)?
6. What type of feedback was provided (evaluative, descriptive, or
holistic)?
7. How was feedback provided or presented (written, video, oral, or
video)?
8. What was the referent of feedback (self, others, or mastery criteria)?
9. How, and how often was feedback given in the study (one time or
multiple times; with or without pedagogical use)?
Main findings
Characteristic of studies included | Maths | Science
Feedback treatment is a single event lasting minutes | 85% | 72%
Reliability of outcome measures | 39% | 63%
Validity of outcome measures | 24% | 3%
Dealing only or mainly with declarative knowledge | 12% | 36%
Schematic knowledge (e.g., knowing why) | 9% | 0%
Multiple feedback events in a week | 14% | 17%
Sensitivity to instruction
Sensitivity of outcome measures
Distance of assessment from the curriculum
  Immediate
    e.g., science journals, notebooks, and classroom tests
  Close
    e.g., where an immediate assessment asked about number of
    pendulum swings in 15 seconds, a close assessment asks about the
    time taken for 10 swings
  Proximal
    e.g., if an immediate assessment asked students to construct boats
    out of paper cups, the proximal assessment would ask for an
    explanation of what makes bottles float
  Distal
    e.g., where the assessment task is sampled from a different domain
    and where the problem, procedures, materials and measurement
    methods differed from those used in the original activities
  Remote
    e.g., standardized national achievement tests
Ruiz-Primo, Shavelson, Hamilton, and Klein (2002)
Impact of sensitivity to instruction
[Figure: effect size for close and proximal assessments]
Effectiveness issues: potential research


Under what kind of conditions does
the implementation of formative
assessment practices in classrooms
lead to student improvement?
What kinds of increases in the rate of
student learning are possible?
Communication issues
Dissemination models






Gas-pump attendant
FedEx
IKEA
Sherpa
Gardener
PhD supervisor
So much for the easy bit…
Ideas
Theorization
Products
Evidence of impact
Advocacy
Communication issues: potential research

How can the vision of effective formative
assessment practice be communicated to
teachers?
Implementation issues
Hand hygiene in hospitals
Study | Focus | Compliance rate
Preston, Larson, & Stamm (1981) | Open ward | 16%
Preston, Larson, & Stamm (1981) | ICU | 30%
Albert & Condie (1981) | ICU | 28% to 41%
Larson (1983) | All wards | 45%
Donowitz (1987) | Pediatric ICU | 30%
Graham (1990) | ICU | 32%
Dubbert (1990) | ICU | 81%
Pettinger & Nettleman (1991) | Surgical ICU | 51%
Larson, et al. (1992) | Neonatal ICU | 29%
Doebbeling, et al. (1992) | ICU | 40%
Zimakoff, et al. (1992) | ICU | 40%
Meengs, et al. (1994) | ER (Casualty) | 32%
Pittet, Mourouga, & Perneger (1999) | All wards | 48%
Pittet, Mourouga, & Perneger (1999) | ICU | 36%
Source: Pittet (2001)
Implementation issues


What are the practical obstacles to the
introduction of formative assessment
practices, and how can they be
overcome?
What kinds of tools and supports can
be provided for teachers, and what
needs to be developed locally?
Adoption issues
The story so far…
1993-1998
  Review of research on formative assessment
1998-2003
  Face-to-face implementations with groups of teachers
2003-2008
  Attempts to produce faithful implementations at scale
2008-2013
  Creating the conditions for implementations at scale
Adoption issues: potential research

How can we support leaders in
prioritizing changes that make the
most difference to student outcomes?
Comments? Questions?
www.dylanwiliam.net
