Coding and interpreting log stream data
Patrick Griffin
Assessment Research Centre, MGSE
Presented at the Annual Meeting of AERA, April 6th, 2014, Philadelphia
Data coding in computer-based problem solving assessments
• Current coding processes mainly use a dichotomous success-failure (present/absent) scoring system.
• Greiff, Wustenberg & Funke (2012) identified three measures that represent dynamic problem solving (DPS):
– Model Building,
– Forecasting, and
– Information Retrieval.
These are applied to a series of steps in complex problems; students are scored as false (0) or true (1) on the task.
• The ATC21S project draws inferences about how students solve problems as well as about the outcome, using a series of automated dichotomous scores, rubrics and partial credit approaches.
ATC21S approach

ATC21S defines five broad components of collaborative problem solving (CPS) (Hesse, 2014):
• social skills (participation, perspective taking, social regulation);
• cognitive skills (task regulation and knowledge building).
Within these five components, students are assessed on three ordered levels of performance on 19 elements.
Purpose of the assessments

• 11 assessment tasks tapping into different and overlapping skills within this framework.
• Provides teachers with:
– information to interpret students’ capacity in collaborative problem solving subskills;
– a profile of each student’s performance for formative instructional purposes.
Unobtrusive assessments of problem solving

Zoanetti (2010):
• Moved individual problem solving from maths to games.
– Recorded interactions between the problem solver and the task environment in an unobtrusive way.

ATC21S (2009-2014):
• Collaborative problem solving tasks capture interactions between:
– the problem solvers working together;
– the individual problem solver and the task.
Activity log files

• Following Zoanetti, the files generated as automatic records of these types of student–task interactions are referred to as ‘session log files’.
• They contain free-form text files with delimited strings of text, referred to as ‘process stream data’.
Process stream data

• A MySQL database architecture recorded interactions with the task environment to describe solution processes in an unobtrusive way.
• Process stream data describe distinct key-strokes and mouse events such as typing, clicking, dragging, cursor movements, hovering time and action sequences, each recorded with a timestamp.
• Sequential numbering of events, together with timestamps, enabled analysis of action sequences and inactivity.
• ‘Process stream’ data thus describe the time-stamped record of student–task interaction (Zoanetti, 2010).
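As a concrete illustration, the following minimal sketch parses such delimited, time-stamped records; the pipe-delimited field layout (seq|timestamp|student|event|detail) is a hypothetical example for this sketch, not the actual ATC21S log schema:

```python
from dataclasses import dataclass

@dataclass
class Event:
    seq: int          # sequential event number
    timestamp: float  # seconds since the session started
    student: str      # 'A' or 'B'
    event_type: str   # e.g. 'chat', 'startDrag', 'dropShute'
    detail: str       # event-specific payload

def parse_log(lines):
    """Parse pipe-delimited process stream records into Event objects."""
    events = []
    for line in lines:
        seq, ts, student, etype, detail = line.strip().split("|", 4)
        events.append(Event(int(seq), float(ts), student, etype, detail))
    return events

# parse_log(['3|12.4|A|dropShute|dropShuteM'])[0].event_type -> 'dropShute'
```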
Common and unique indicators
• ‘Common’ or ‘Global’ events apply to all tasks;
• ‘Unique’ or ‘Local’ events are specific to particular tasks, due to the nature of the behaviours and interactions those tasks elicit.
Application example: Laughing Clowns

[Figure: screenshots of the Student A and Student B views of the Laughing Clowns task]
Interpreting the log stream data (1)

| Event type | Process stream data format | Explanation of data captured |
|---|---|---|
| Session Start | Student student_id has commenced task task_id | Records the start of a task with unique student and task identification |
| Session End | Student student_id has completed task task_id | Records the end of a task with unique student and task identification |
| Chat | Text Message: “free form of message using the chat box” | Captures the contents of the chat message the students used to communicate with their partner |
| Ready To Progress | Requested to move to page: page_id | Indicates whether the student is ready to progress or not, and records the navigation endpoint to which he or she is ready to progress for multipage tasks |
| Other Click | Screen x coords: x_coordinate; Screen y coords: y_coordinate | Captures the coordinates of the task screen if the student has clicked anywhere outside the domain of the problem |
Interpreting the log stream data (2)

| Event type | Process stream data format | Explanation of data captured |
|---|---|---|
| StartDrag | startDrag: ball_id; x,y coordinates of the ball at the start of the drag | Records the identifier of the ball being dragged by the student and its coordinates at the start of the drag |
| StopDrag | stopDrag: ball_id; x,y coordinates of the ball at the end of the drag | Records the identifier of the ball being dragged by the student and its coordinates at the end of the drag |
| DropShute | dropShutePosofShuteId: ball_id; x,y coordinates where the ball was dropped | Records the identifier of the ball, its coordinates and the value of the clown head shute when it was dropped by the student |
| Check box | SelectionValue: option_value | Captures data on whether students agree or disagree on how their machine works |
Session logs and chat stream

• Process and click stream data are accumulated and stored in session logs.
• A chat box tool captures text exchanged between students and stores it in string data format.
• All chat messages were recorded with a timestamp.
Recording action and chat data
Interpreting counts and chats

• Each task’s process log stream was examined for behaviours indicative of the cognitive and social skills defined by Hesse (2014) that could be captured algorithmically.
• Indicators were coded as rule-based indicators through an automated algorithmic process similar to that described by Zoanetti (2010).
• Zoanetti showed how process data (e.g., counts of actions) could be interpreted as indicators of behavioural variables (e.g., error avoidance or learning from mistakes).
• For example, in the Laughing Clowns task a count of the ‘dropShute’ actions (dropping the balls into the clown’s mouth) can indicate how well the student managed their resources (the balls), as sketched below.
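A minimal sketch of that count, reusing the hypothetical Event records from the parsing sketch earlier (position labels follow the dropShute table above):

```python
from collections import Counter

def drop_shute_counts(events, student):
    """Count one student's 'dropShute' actions by shute position.
    Coverage of all three positions suggests systematic exploration, while
    a large total may indicate poor management of the balls."""
    return Counter(e.detail for e in events
                   if e.event_type == "dropShute" and e.student == student)
```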
Direct and inferred indicators

• Indicators that can be captured in all tasks are labelled ‘global’. They include total response time, response time to partner questions, action counts, and other behaviours observed regardless of the task.
• Indicators that are task-specific are labelled ‘local’. There are two categories of local indicators: direct and inferred.
• Direct indicators are those that can be identified clearly, such as a student performing a particular action.
• Inferred indicators relate to such things as sequences of action/chat within the data. Patterns of indicators are used to infer the presence of behaviour indicative of elements in the Hesse conceptual framework.
Coding indicative actions

• Each indicator was coded with a unique ID code. Using the example of the unique ID code ‘U2L004A’ (decoded programmatically in the sketch after this list):
– ‘U2’: the Laughing Clowns task;
– ‘L’: a ‘local’ indicator specific to that task (‘G’ would indicate a global indicator applicable to all tasks);
– ‘004’: the fourth indicator created for this task;
– ‘A’: applicable to student A.
• Programming algorithms search for and capture the coded data from the process stream log files.
• Counts of actions within indicators are converted into either dichotomous or partial credit scores.
• Panels used an iterative process to map indicators onto the Hesse framework until a stable allocation was agreed upon.
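A small sketch of decoding such ID codes; the pattern below is inferred from the single example above, not from a published specification:

```python
import re

# Matches codes like 'U2L004A': task code, scope (L=local, G=global),
# three-digit indicator number, and the student the indicator applies to.
INDICATOR_ID = re.compile(r"^(?P<task>U\d+)(?P<scope>[LG])(?P<num>\d{3})(?P<student>[AB])$")

def decode_indicator_id(code):
    """Split an indicator ID into named parts, raising on malformed codes."""
    match = INDICATOR_ID.match(code)
    if match is None:
        raise ValueError(f"unrecognised indicator code: {code}")
    return match.groupdict()

# decode_indicator_id('U2L004A')
# -> {'task': 'U2', 'scope': 'L', 'num': '004', 'student': 'A'}
```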
Algorithms and scoring rules

| Indicator code | Details and scoring rule | Algorithm | Output |
|---|---|---|---|
| U2L004A, U2L004B | Task name: Laughing Clowns. Systematic approach: all positions have been covered. Scoring rule: threshold value. | Step 1: Find all drop-ball occurrences captured as dropShute and their corresponding positions as dropShuteL, dropShuteR, dropShuteM. Step 2: Count all occurrences of the action recorded under ‘dropShute’ and their unique positions from the log. Step 3: Increase the value of the indicator by one if one or more ‘dropShute’ occurs in the form of dropShuteR, dropShuteL and dropShuteM. Step 4: If the total number of unique dropShutes (dropShuteR, dropShuteL and dropShuteM) from the log is less than three, the value of the indicator is defined as -1 to indicate missing data. | Count values |
| Global001A, Global001B | Time (in seconds) spent on the task before first action (interpreted as reading time). Acceptable time to first action given reading load. Scoring rule: threshold time. | Step 1: Find the starting time when a student joins a collaborative session. Step 2: Find the record of the first action. Step 3: Find the time of that record (from step 2). Step 4: Calculate the time difference (between step 1 and step 3), indicating the time before first action. | Time |
| Global005A, Global005B | Interactive chat blocks: count the number of chat blocks (A, B) with no intervening actions. Consecutive chats from the same player count as 1 (e.g., A,B,A,B = 2 chat blocks; A,B,A,B,A,B = 3 chat blocks; AA,B,A,BB = 2 chat blocks). Scoring rule: threshold number. | Step 1: Find all consecutive chats from students A and B without any intervening action from A or B, treating two or more consecutive chats from a single student as one chat. Step 2: Increase the value of the indicator by one for each block found. | Count values |
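As an illustration, here is a minimal Python sketch of the Global005 rule, reusing the hypothetical Event records from the parsing example above; the collapse-then-pair logic reproduces the worked examples in the table (A,B,A,B = 2 blocks; AA,B,A,BB = 2 blocks):

```python
def count_chat_blocks(events):
    """Count interactive chat blocks (Global005, as described above).
    Within each run of chats uninterrupted by actions, consecutive chats
    from the same student collapse to one; each A-B pair in the collapsed
    run then counts as one block."""
    blocks = 0
    run = []  # collapsed sequence of chatters in the current run
    for e in events:
        if e.event_type == "chat":
            if not run or run[-1] != e.student:
                run.append(e.student)
        else:
            blocks += len(run) // 2  # any action closes the current run
            run = []
    return blocks + len(run) // 2  # close the final run
```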
Coded data and variable identification
Defining indicators
Using indicator data

• Scores from a set of indicators function similarly to a set of conventional test items, requiring stochastic independence of indicators.
• Most indicators were scored ‘1’ if the behaviour was present and ‘0’ if it was absent for each student. In the Clowns task a player needs to leave a minimum number of balls for his/her partner in order for the task to be completed successfully: if true, ‘1’; if not, ‘0’.
• Frequency-based indicators could be converted into polytomous scores based on threshold values and an iterative judgement and calibration process.
Forming a dichotomous indicator from frequency data

Polytomous indicator from frequency data
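Both conversions can be sketched minimally as below; the threshold values are illustrative placeholders, since the actual cut-points were set through the panels’ iterative judgement and calibration process:

```python
def dichotomise(count, threshold=1):
    """Score 1 if the behaviour occurred at least `threshold` times, else 0."""
    return 1 if count >= threshold else 0

def polytomise(count, thresholds=(1, 3, 6)):
    """Convert a frequency count into a partial credit score of
    0..len(thresholds): one point per ascending threshold met."""
    return sum(count >= t for t in thresholds)

# polytomise(4) -> 2 (meets the thresholds 1 and 3, but not 6)
```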
Separating the players: scoring A and B

• Collaboration cannot be summarised by a single indicator such as ‘students communicated’; it involves communication, cooperation and responsiveness.
• For collaboration, the presence of chat linked to action (pre and post a chat event) was used to infer collaboration, cooperation or responsiveness linked to the Hesse framework.
• The patterns of player-partner (A-B) interaction were examined.
• A series of three sequences of player-partner interaction was found to be adequate, yielding the following possible player-partner combinations:
1) A, B, A;
2) A, B, B;
3) A, A, B.
• These combinations apply only to the action of the initiating student (A). Each student was coded separately in the data file, so the perspective changed when the other student (B) was scored.
Assigning to A and B

| Type | Measurement | Combination | Perspective from student A | Perspective from student B |
|---|---|---|---|---|
| Interactive chat-action-chat blocks | count | player + player + partner | AAB | BBA |
| | count | player + partner + partner | ABB | BAA |
| | count | player + partner + player | ABA | BAB |
| Interactive chat-action-action blocks | count | player + player + player | AAA | BBB |
| | count | player + player + partner | AAB | BBA |
| | count | player + partner + player | ABA | BAB |
| Interactive chat-chat-action blocks | count | player + partner + partner | ABB | BAA |
| | count | player + partner + player | ABA | BAB |
| Interactive action-action-chat blocks (AAC) | count | player + partner + partner | ABB | BAA |
| | count | player + partner + player | ABA | BAB |
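A sketch of how such combinations might be counted from each student’s perspective, again using the hypothetical Event records from earlier; it handles only the player/partner relabelling, not the chat/action typing of the blocks:

```python
from collections import Counter

def combination_counts(events, player):
    """Count overlapping three-event actor combinations from one student's
    perspective. The raw sequence A,A,B counts as 'player + player + partner'
    when scoring student A; the mirrored raw sequence B,B,A counts the same
    way when scoring student B, matching the AAB/BBA pairing in the table."""
    roles = ["player" if e.student == player else "partner" for e in events]
    return Counter(" + ".join(roles[i:i + 3]) for i in range(len(roles) - 2))
```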
Mapping indicators to the Hesse framework

The empirical data were checked against the relevant skill in the conceptual framework (Hesse, 2014):
• relative difficulty was checked for consistency with the framework;
• each indicator was mapped to the relevant skill it was intended to measure;
• the definition of each indicator was refined to clarify the link between the algorithm and the construct;
• frequency was used as a proxy measure of difficulty (see the sketch after this list).
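For illustration, one standard way to turn frequency into a difficulty-like quantity is the proportion-to-logit transform sketched below; the source does not specify the exact transform used, so this is an assumption:

```python
import math

def difficulty_proxy(n_exhibited, n_students):
    """Map the proportion of students exhibiting an indicator onto a logit
    scale, so that rarer behaviours receive higher 'difficulty' values.
    Illustrative only: not necessarily the project's exact method."""
    p = n_exhibited / n_students
    p = min(max(p, 1e-6), 1 - 1e-6)  # guard against log(0) at the extremes
    return math.log((1 - p) / p)

# difficulty_proxy(20, 100) -> about 1.39 (rare, hence relatively 'hard')
```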
Indicator review cycle

• IRT analysis yielded a hierarchy of the descriptors.
• The substantive order was checked for meaning within a broader collaborative problem solving framework.
• An iterative review process ensured that the conceptual descriptors were supported by empirical item locations, which in turn inform the construct continuum.
Domains of indicators: social and cognitive

• Clusters of indicators were interpreted to identify levels of progression.
• The indicators were divided into the two dimensions (social or cognitive) based on their previous mapping, and then into the five components.
• Skills within each dimension were identified to represent the progression from novice to expert.
Parameter invariance and fit

• Multiple calibrations allowed for comparison and analysis of item parameters.
• Parameter stability was maintained after the number of indicators was reduced from over 450 to fewer than 200.
• The removal of poorly fitting indicators reduced the standard errors of the item parameters while maintaining the reliability of the overall set.
Calibration of Laughing Clowns task

| Item | Estimate | Error | MNSQ (unweighted) | CI (unweighted) | T | MNSQ (weighted) | CI (weighted) | T |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.106 | 0.046 | 1.13 | (0.91, 1.09) | 2.8 | 1.02 | (0.75, 1.25) | 0.2 |
| 2 | -0.686 | 0.04 | 0.97 | (0.91, 1.09) | -0.5 | 0.98 | (0.95, 1.05) | -0.7 |
| 3 | 1.01 | 0.04 | 1.00 | (0.91, 1.09) | -0.1 | 1.00 | (0.94, 1.06) | 0.0 |
| 4 | 0.454 | 0.039 | 1.00 | (0.91, 1.09) | -0.1 | 1.00 | (0.97, 1.03) | -0.2 |
| 5 | -0.435 | 0.039 | 1.00 | (0.91, 1.09) | 0.0 | 1.00 | (0.96, 1.04) | -0.1 |
| 6 | -1.409 | 0.042 | 0.98 | (0.91, 1.09) | -0.5 | 0.99 | (0.90, 1.10) | -0.2 |
| 7 | -0.218 | 0.039 | 1.01 | (0.91, 1.09) | 0.3 | 1.01 | (0.97, 1.03) | 0.9 |
| 8 | 0.895 | 0.039 | 0.98 | (0.91, 1.09) | -0.4 | 0.99 | (0.95, 1.05) | -0.5 |
| 9 | 0.267 | 0.039 | 1.06 | (0.91, 1.09) | 1.3 | 1.06 | (0.98, 1.02) | 4.5 |
| 10 | -0.657 | 0.04 | 1.00 | (0.91, 1.09) | 0.0 | 1.00 | (0.95, 1.05) | 0.0 |
| 11 | 1.094 | 0.04 | 0.98 | (0.91, 1.09) | -0.5 | 0.99 | (0.94, 1.06) | -0.5 |
| 12 | 0.424 | 0.039 | 1.05 | (0.91, 1.09) | 1.0 | 1.04 | (0.97, 1.03) | 2.8 |
| 13 | -0.523 | 0.039 | 0.98 | (0.91, 1.09) | -0.5 | 0.98 | (0.96, 1.04) | -0.9 |
| 14 | -1.416 | 0.042 | 0.97 | (0.91, 1.09) | -0.7 | 0.98 | (0.90, 1.10) | -0.3 |
| 15 | -0.011 | 0.039 | 0.99 | (0.91, 1.09) | -0.2 | 0.99 | (0.98, 1.02) | -1.0 |
| 16 | 1.464 | 0.039 | 1.04 | (0.92, 1.08) | 1.0 | 1.02 | (0.93, 1.07) | 0.5 |
| 17 | -2.714 | 0.045 | 0.94 | (0.92, 1.08) | -1.6 | 0.99 | (0.80, 1.20) | -0.1 |
| 18 | -0.646 | 0.166 | 0.98 | (0.94, 1.06) | -0.7 | 0.98 | (0.97, 1.03) | -0.9 |
[Figure: item-person map for the Laughing Clowns calibration, plotting the distribution of cases (each 'X' represents 2.9 cases) against the indicator difficulty estimates on the logit scale; indicators 16, 11, 3 and 8 sit at the hard end, and 6, 14 and 17 at the easy end.]
Stability of indicator difficulty estimates across countries
Challenges for the future

• Scaling all 11 tasks.
• Comparing one-, two- and five-dimensional models.
• Stability of indicator estimates over language, curriculum and other characteristics.
• Simplifying the coding process.
• Using chat that includes grammatical errors, non-standard syntax, abbreviations, and synonyms or ’text-speak’.
• Capturing these text data in a coded form.
• Building in complexity and simplicity, without loss of meaning, as an a priori design feature of task construction.
• Design templates for task development and scoring.
