The W’s of Data

The W’s of Data
Do data have to be numbers?
They can be, but they don’t have to be.
Without context, data are useless!
Consider 17, 21, 44, and 76
Are those data?
Data Handout
The Five W’s of Data
Answering the Five W’s of Data provides the
context of the data.
 Who
 What
 When
 Where
 Why
 And if possible How
 Rows
of data correspond to individual cases about
whom (or which, if not people) we record some characteristics.
 Respondents – individuals who answer a survey
 Subjects or participants – people on whom we experiment
 Experimental units – inanimate subjects of an experiment
 Data values are sometimes just called observations,
without being clear about the Who
From the data sheet
 Who?
 Variables
– the characteristics recorded about
each individual
 Variables are usually recorded in the columns of a
data table
 Variables identify What has been measured
 They may seem simple, but think!
 Quantitative variables have measurement units; for categorical variables, it’s natural to
count how many cases belong in each category.
 The units tell how each value has been measured
 Categorical
variables – name categories and
answer questions about how cases fall into those categories.
Also called qualitative variables.
Ex. Gender, year in school, nationality, etc.
 Quantitative
variable – answers a question about
the quantity of what is measured
Ex. Height, weight, income, etc.
 Just
because the data are numbers does not
make them quantitative
Ex. Zip codes
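The ZIP code point can be checked with a small sketch. The ZIP codes below are made up for illustration, and the variable names are hypothetical. Counting cases per ZIP code answers a real question; averaging the codes runs without error but measures nothing.

```python
from collections import Counter
from statistics import mean

# Hypothetical ZIP codes for five customers (made up for illustration).
zip_codes = ["10001", "60614", "10001", "94110", "60614"]

# Categorical use: count how many cases fall in each category.
counts = Counter(zip_codes)
print(counts["10001"])  # how many customers live in ZIP 10001

# Quantitative misuse: the arithmetic works, but the result has no units
# and does not answer any "how much" question.
meaningless_average = mean([int(z) for z in zip_codes])
print(meaningless_average)
```

Storing the codes as strings rather than integers is one simple way to signal that they are labels, not quantities.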
From the data sheet
 What?
 It’s
the questions we ask of a variable that shape how
we think about it.
 Ex. An end of class survey asks “How valuable do you
think this course will be to you?”
 1 = worthless, 2 = slightly, 3 = middling, 4 = reasonably, 5 = invaluable
 Is the educational value categorical or quantitative?
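One way to see that the answer depends on the question asked: the same ratings can be summarized both ways. The responses below are hypothetical, and treating the scale as quantitative assumes the five ratings are equally spaced, which the survey itself does not guarantee.

```python
from collections import Counter
from statistics import mean

# Rating labels taken from the survey scale above.
labels = {1: "worthless", 2: "slightly", 3: "middling",
          4: "reasonably", 5: "invaluable"}

# Hypothetical responses from seven students (made up for illustration).
responses = [4, 5, 3, 4, 2, 4, 5]

# Categorical question: how many students chose each rating?
rating_counts = Counter(labels[r] for r in responses)
print(rating_counts)

# Quantitative question: what is the average rating?
# Assumption: the steps between ratings are equal in size.
average_rating = mean(responses)
print(round(average_rating, 2))
```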
From the data sheet
 Are
variables qualitative or quantitative?
 Why?
Counts count
 When
Amazon considers offering free shipping, it might first
analyze how purchases are shipped.
 Counting summarizes the categorical variable,
shipping method.
 We also use counts to measure quantities such as the
number of classes you are taking or how many songs
you own.
 Two ways to use counts:
Count the cases in each category of a categorical
variable. The category labels are the What and the
individuals counted are the Who.
The counts themselves are not data, but they are
something we summarize about the data.
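A minimal sketch of the first way to use counts, with made-up purchases. The field names `order_id` and `shipping_method` and the method labels are assumptions for illustration, not Amazon's actual data.

```python
from collections import Counter

# Hypothetical purchases: each row is one case (the Who),
# and shipping_method is the categorical variable (the What).
purchases = [
    {"order_id": "A1", "shipping_method": "Ground"},
    {"order_id": "A2", "shipping_method": "Second-day"},
    {"order_id": "A3", "shipping_method": "Ground"},
    {"order_id": "A4", "shipping_method": "Overnight"},
    {"order_id": "A5", "shipping_method": "Ground"},
]

# Count the cases in each category; the counts summarize the data.
method_counts = Counter(p["shipping_method"] for p in purchases)

for method, n in method_counts.most_common():
    print(f"{method}: {n}")
```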
 Back
to Amazon’s shipping
Shipping Method
No. of purchases
 What
is the categorical variable?
 What?
 Who?
 Why?
 The
second way is when the focus is on the number
of something, which is measured by counting.
 Ex. Amazon might track the growth in the number of
teenage customers each month to forecast CD sales.
 What?
 Who?
No. of Teenage Customers
 Units?
 Why?
Is teen a category? Is it a quantitative variable?
 Is
your student ID number a quantitative variable?
 Why?
 Other examples of identifiers include UPS tracking
numbers, Social Security numbers, and driver’s license numbers.
 Identifier variables do not tell us anything useful
about the category because there is exactly one
individual in each.
 They are used to:
Combine data from different sources
Protect confidentiality
Provide unique labels
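As a sketch of the first use, combining data from different sources, here are two hypothetical tables keyed by student ID. All IDs, field names, and values are made up for illustration. The ID is only a label for matching records; it is never averaged or counted by category.

```python
# Hypothetical registrar records, keyed by student ID.
registrar = {
    "1001": {"year": "Sophomore"},
    "1002": {"year": "Senior"},
    "1003": {"year": "Junior"},
}

# Hypothetical survey responses, keyed by the same student ID.
survey = {
    "1002": {"rating": 4},
    "1001": {"rating": 5},
}

# Match records on the identifier; keep only IDs present in both sources.
combined = {
    student_id: {**registrar[student_id], **survey[student_id]}
    for student_id in registrar.keys() & survey.keys()
}

for student_id in sorted(combined):
    print(student_id, combined[student_id])
```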
We need more information…
 We
must know Who, What, and Why to analyze data, but to
understand more we would also like to know When,
Where, and How.
 When can make a difference in the data.
Example: the number of women with jobs outside the home
in 1900 versus the number of women with jobs outside the
home in 2000.
 Where
can make a difference in the data
Example: the number of high school students participating
in ice hockey in Florida versus the number participating in ice
hockey in Minnesota.
 How
data are collected matters
Surveys, interviews, observation, etc.
How could surveys be flawed, especially internet surveys?
 Medical
researchers at a large city hospital
investigating the impact of prenatal care on newborn
health collected data from 882 births during 1998-2000.
They kept track of the mother’s age, the number of
weeks the pregnancy lasted, the type of birth
(cesarean, induced, natural), the level of prenatal care
the mother had (none, minimal, adequate), the birth
weight and sex of the baby, and whether the baby
exhibited health problems (none, minor, major).
 Identify the W’s, name the variables, specify for each
variable whether its use indicates it should be treated
as categorical or quantitative, and identify the units in
which it was measured or note that they were not provided.
 Homework
p. 16 2-12 even
