PSLC DataShop Introduction
Slides current to DataShop version 4.1.8
John Stamper
DataShop Technical Director
The DataShop Team
• John Stamper
– DataShop Technical Director
• Sandy Demi
– QA (Quality Assurance – Testing)
• Brett Leber
– Interaction Designer
• Alida Skogsholm
– DataShop Manager, Developer
• Duncan Spencer
– DataShop Developer
• Shanwen Yu
– DataShop Developer
What is DataShop?
• Central Repository
– Secure place to store & access research data
• Every LearnLab and every study
– Supports various kinds of research
• Primary analysis of study data
• Exploratory analysis of course data
• Secondary analysis of any data set
• Analysis & Reporting Tools
– Focus on student-tutor interaction data
– Learning curves & error reports provide summary and low-level
views of student performance
– Performance Profiler aggregates across various levels of
granularity (problem, dataset levels, knowledge components, …)
– Data Export
• Tab-delimited tables you can open with your favorite spreadsheet
program or statistical package
– New tools are created to meet researchers' highest-priority needs
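Because Data Export produces plain tab-delimited text, any language can read it. The minimal Python sketch below assumes a small hypothetical export. Its column names (`Anon Student Id`, `Problem Name`, `Step Name`, `Outcome`) are illustrative, not a guaranteed DataShop export layout.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited export; the column names are
# illustrative assumptions, not a guaranteed DataShop export layout.
SAMPLE_EXPORT = (
    "Anon Student Id\tProblem Name\tStep Name\tOutcome\n"
    "stu_01\tAREA-1\tcircle-area\tHINT\n"
    "stu_01\tAREA-1\tcircle-area\tCORRECT\n"
    "stu_02\tAREA-1\tcircle-area\tINCORRECT\n"
)


def read_export(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def count_outcomes(rows):
    """Tally how often each value of the (assumed) Outcome column appears."""
    counts = {}
    for row in rows:
        counts[row["Outcome"]] = counts.get(row["Outcome"], 0) + 1
    return counts


rows = read_export(SAMPLE_EXPORT)
print(count_outcomes(rows))
```

A spreadsheet or statistical package would read the same file by choosing "tab" as the delimiter.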
How do I get data in?
Tutors log to a standard log format.
Data is captured in files or sent to the logging database.
Older data can be transformed to the tutor logging format.
• …converts log data to a uniform XML format
• …anonymizes participants
• …imports data into a relational database
• …provides visualization and reports for researchers to analyze their data
• …can export the data to a tab-delimited format
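The anonymization and conversion steps above can be sketched as follows. This is a minimal illustration, not DataShop's implementation. It assumes a keyed hash (HMAC-SHA256) to produce stable pseudonymous student IDs. The XML element names (`log`, `transaction`, `student`, `step`, `outcome`) and the `SECRET_KEY` value are hypothetical.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical secret kept by the repository; never published with the data.
SECRET_KEY = b"replace-with-a-private-key"


def anonymize(student_id: str) -> str:
    """Map a real participant ID to a stable pseudonym via keyed hashing."""
    digest = hmac.new(SECRET_KEY, student_id.encode("utf-8"), hashlib.sha256)
    return "Stu_" + digest.hexdigest()[:16]


def to_xml(records):
    """Convert raw log records (dicts) into a simple uniform XML document.

    The element names are illustrative, not DataShop's actual schema.
    """
    root = ET.Element("log")
    for rec in records:
        tx = ET.SubElement(root, "transaction")
        ET.SubElement(tx, "student").text = anonymize(rec["student"])
        ET.SubElement(tx, "step").text = rec["step"]
        ET.SubElement(tx, "outcome").text = rec["outcome"]
    return ET.tostring(root, encoding="unicode")


raw = [
    {"student": "jsmith", "step": "circle-area", "outcome": "HINT"},
    {"student": "jsmith", "step": "circle-area", "outcome": "CORRECT"},
]
print(to_xml(raw))
```

A keyed hash maps the same student to the same pseudonym every time. Without the key, the mapping cannot be recomputed from the published IDs.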
How do I get data in?
• Directly
– Some tutors are logging directly to the PSLC logging database
– CTAT-based tutors (when configured correctly)
• Indirectly
– Other tutors log to their own file formats or their own databases
– These data require a conversion process
– Many studies are in this category
Getting to DataShop
• Explore data through the DataShop tools
• Where is DataShop?
– http://pslcdatashop.org
– Linked from the DataShop homepage and learnlab.org:
• http://pslcdatashop.web.cmu.edu/about/
• http://learnlab.org/technologies/datashop/index.php
Creating an account
• On DataShop's home page, click
"Sign up now". Complete the form to
create your DataShop account.
• If you’re a CMU student, staff member, or faculty member, click
“Log in with WebISO” to create your account.
Getting access to datasets
• By default, you will have access to the
public datasets.
• Of these, we recommend three for getting started:
– Geometry Area (1996-1997)
– Joint Explanation - Electric Fields - Pitt - Spring 2007
– Chinese Vocabulary Fall 2006
• For access to other datasets, contact us:
[email protected]
DataShop – Dataset selection
• Datasets you can view or edit. You have to be a project member or PI
for the dataset to appear here.
• Private datasets you can’t view. Email us and the PI to get access.
• Public datasets that you can view only.
Important Terms
• For definitions of KC (Knowledge Component) and other terms,
see http://pslcdatashop.org/help?page=terms
Important Terms
• Transaction
– A transaction is an interaction between the student
and the tutoring system.
– Students may make incorrect entries or ask for hints
before getting a step correct. Each hint request,
incorrect attempt, or correct attempt is a transaction;
and a step can involve one or more transactions.
• Step
– A step is an observable part of the solution to a
problem. Because steps are observable, they are
partly determined by the user interface available to
the student for solving the problem.
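The following minimal sketch makes the transaction/step distinction concrete by tallying the transactions that make up each step. The `Transaction` class, its fields, and the outcome labels are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One student-tutor interaction (field names are illustrative assumptions)."""
    student: str
    step: str
    outcome: str  # one of "HINT", "INCORRECT", "CORRECT"


def summarize_steps(transactions):
    """Count hints, incorrect attempts, and correct attempts per (student, step)."""
    summary = {}
    for tx in transactions:
        key = (tx.student, tx.step)
        summary.setdefault(key, Counter())[tx.outcome] += 1
    return summary


log = [
    Transaction("stu_01", "circle-area", "HINT"),
    Transaction("stu_01", "circle-area", "INCORRECT"),
    Transaction("stu_01", "circle-area", "CORRECT"),
    Transaction("stu_01", "square-area", "CORRECT"),
]
for (student, step), counts in summarize_steps(log).items():
    print(student, step, dict(counts), "transactions:", sum(counts.values()))
```

Here the step "circle-area" involves three transactions, while "square-area" was solved with a single correct transaction.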
Important Terms
• Sample
– a subset of the data
– the Sample Selector provides the ability to filter on
various aspects of the data
– You can use samples to:
• Compare across conditions
• Narrow the scope of data analysis to a specific time range,
set of students, problem category, or unit of a curriculum
– The columns available to filter on are organized into
categories. The categories are:
• Condition, Dataset Level, Problem, School, Student, Tutor
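A sample is essentially a filter over rows. The sketch below assumes a hypothetical `Row` type with a few columns drawn from the Condition, Problem, and Student categories, plus a timestamp. It is not the Sample Selector's implementation. It only illustrates comparing conditions and narrowing a time range.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    # Illustrative columns; real filter columns depend on the dataset.
    student: str
    condition: str
    problem: str
    time: datetime


def select_sample(rows, conditions=None, students=None, start=None, end=None):
    """Return the subset of rows matching every filter that is not None."""
    sample = []
    for r in rows:
        if conditions is not None and r.condition not in conditions:
            continue
        if students is not None and r.student not in students:
            continue
        if start is not None and r.time < start:
            continue
        if end is not None and r.time > end:
            continue
        sample.append(r)
    return sample


rows = [
    Row("stu_01", "worked-examples", "AREA-1", datetime(2007, 2, 1, 10, 0)),
    Row("stu_02", "control", "AREA-1", datetime(2007, 2, 1, 10, 5)),
    Row("stu_03", "worked-examples", "AREA-2", datetime(2007, 3, 1, 9, 0)),
]
treatment = select_sample(rows, conditions={"worked-examples"})
control = select_sample(rows, conditions={"control"})
feb_only = select_sample(rows, start=datetime(2007, 2, 1), end=datetime(2007, 2, 28))
print(len(treatment), len(control), len(feb_only))
```

Running the same analysis on `treatment` and `control` is the "compare across conditions" use of samples.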
Important Terms
• KC, Knowledge Component
– a piece of information that can be used to accomplish
tasks, perhaps along with other knowledge
components. Knowledge component is a
generalization of everyday terms like concept,
principle, fact, or skill, and cognitive science terms
like schema, production rule, misconception, or facet.
• Opportunity
– An opportunity is a chance for a student to
demonstrate whether he or she has learned a given
knowledge component. An opportunity exists each
time a step is present with the associated knowledge component.
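Opportunity counts are the x-axis of learning curves. Each time a step tagged with a KC appears for a student, that student's opportunity count for the KC increases by one. The minimal sketch below assumes chronologically ordered `(student, step, kcs)` tuples as input. This input format is hypothetical.

```python
from collections import defaultdict


def number_opportunities(step_events):
    """Assign an opportunity number to each (student, KC) pair.

    step_events: chronologically ordered (student, step, kcs) tuples, where
    kcs is the set of knowledge components a step is tagged with (assumed input).
    Returns a list of (student, step, kc, opportunity_number) tuples.
    """
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    numbered = []
    for student, step, kcs in step_events:
        for kc in sorted(kcs):
            seen[(student, kc)] += 1
            numbered.append((student, step, kc, seen[(student, kc)]))
    return numbered


events = [
    ("stu_01", "circle-area", {"area-formula"}),
    ("stu_01", "square-area", {"area-formula"}),
    ("stu_01", "compose-area", {"area-formula", "subtract-areas"}),
]
for row in number_opportunities(events):
    print(row)
```

A step tagged with two KCs counts as an opportunity for each of them.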
Important Terms
• Observation
– An observation is a group of transactions for a
particular student working on a particular step
within a problem view. If within these
constraints there is only one transaction
recorded, an observation will still exist for that
single transaction.
– Put another way, an observation is available
each time a student takes an opportunity to
demonstrate a knowledge component.
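The observation rule can be sketched as a group-by over student, problem view, and step. A group still counts as an observation when it holds a single transaction. The tuple layout below is an illustrative assumption.

```python
from collections import defaultdict


def group_observations(transactions):
    """Group transactions by (student, problem, problem view, step).

    Each transaction is a (student, problem, problem_view, step, outcome) tuple;
    this layout is an illustrative assumption. A group with a single
    transaction is still an observation.
    """
    groups = defaultdict(list)
    for student, problem, view, step, outcome in transactions:
        groups[(student, problem, view, step)].append(outcome)
    return dict(groups)


txs = [
    ("stu_01", "AREA-1", 1, "circle-area", "HINT"),
    ("stu_01", "AREA-1", 1, "circle-area", "CORRECT"),
    ("stu_01", "AREA-1", 2, "circle-area", "CORRECT"),  # second view of the problem
    ("stu_02", "AREA-1", 1, "circle-area", "CORRECT"),
]
obs = group_observations(txs)
print(len(obs))
```

Four transactions yield three observations here. The second view of AREA-1 by stu_01 is a separate observation.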
Important Terms
• LFA, Learning Factors Analysis
– a logistic regression method which uses a set
of customized Item-Response models to
predict how a student will perform for each
knowledge component on each learning
opportunity. LFA was developed at Carnegie
Mellon by Hao Cen, Kenneth Koedinger, and
Brian Junker. In DataShop, the LFA algorithm
is run over each KC model of each dataset,
producing data that populates predicted
learning curves and other reports.
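LFA is commonly described as fitting an Additive Factor Model. In that model, the log-odds of a correct response is the sum of a student proficiency θ and, for each KC on the step, an easiness β plus a learning rate γ times the number of prior opportunities T: logit(p) = θ + Σ_k (β_k + γ_k·T_k). The sketch below only computes predictions from already-fitted parameters. The parameter values are hypothetical, and neither model fitting nor DataShop's own implementation is shown.

```python
import math


def afm_probability(theta, kcs, beta, gamma, opportunities):
    """Predicted probability of a correct response under an additive factor model.

    theta: student proficiency; kcs: KCs tagged on the step;
    beta[k]: easiness of KC k; gamma[k]: learning rate of KC k;
    opportunities[k]: prior opportunities the student has had on KC k.
    """
    logit = theta + sum(beta[k] + gamma[k] * opportunities[k] for k in kcs)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical fitted parameters for a single KC.
beta = {"area-formula": -1.0}
gamma = {"area-formula": 0.25}

# Predicted learning curve: probability correct at opportunities 0..4.
curve = [
    afm_probability(0.0, ["area-formula"], beta, gamma, {"area-formula": t})
    for t in range(5)
]
print([round(p, 3) for p in curve])
```

With a positive learning rate, predicted success rises with each opportunity. A predicted learning curve plots exactly this quantity, usually shown as error rate (1 − p).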
Important Terms
• AIC, Akaike Information Criterion
– a measure of the goodness of fit of a statistical model,
in this case, the LFA model. It is an operational way of
trading off the complexity of the estimated model
against how well the model fits the data [2]. In this way,
it penalizes the model based on its complexity (the
number of parameters). A lower AIC value is better.
• BIC, Bayesian Information Criterion
– a measure of goodness of fit of the LFA model. The
BIC penalizes free parameters more strongly than
does the Akaike information criterion (AIC) [3]. A lower
BIC value is better.
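Both criteria are simple functions of a fitted model's maximized log-likelihood ln L, its number of parameters k, and, for BIC, the number of observations n: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The two models compared below are hypothetical. They show how BIC's heavier penalty can favor the simpler model when AIC does not.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical KC models: (maximized log-likelihood, number of parameters).
models = {"A": (-520.0, 10), "B": (-515.0, 14)}
n_obs = 1000
for name, (ll, k) in models.items():
    print(name, "AIC", aic(ll, k), "BIC", round(bic(ll, k, n_obs), 1))
```

Here model B's better fit outweighs its four extra parameters under AIC (1058 vs. 1060). Under BIC each parameter costs ln 1000 ≈ 6.9 instead of 2, so model A wins.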
