### The W's of Data

```
The W’s of Data

Data
• Does data have to be numbers?
• It can be, but it doesn’t have to be.
• Without context, it’s useless!
• Consider 17, 21, 44, and 76.
• Are those data?
Data Handout
The Five W’s of Data
Answering the Five W’s of Data provides the context of the data.
• Who
• What
• When
• Where
• Why
• And, if possible, How

Who
• Rows of data correspond to individual cases about whom (or which,
  if not people) we record some characteristics.
• Respondents – individuals who answer a survey
• Subjects or participants – people on whom we experiment
• Experimental units – inanimate subjects for experiments
• Data values may also be called observations, without being clear
  about the Who.
From the data sheet
• Who?
What
• Variables – the characteristics recorded about each individual
• Variables are usually recorded in the columns of a data table.
• Variables identify What has been measured.
• They may seem simple, but think!
• Quantitative variables have measurement units. For categorical
  variables, it’s natural to count how many cases belong in each
  category.
• The units tell how each value has been measured (scale).
Variables
• Categorical variables – name categories and answer questions about
  how cases fall into those categories. Also called qualitative
  variables.
  - Ex. Gender, year in school, nationality, etc.
• Quantitative variables – tell us the quantity of what is measured.
  - Ex. Height, weight, income, etc.
• Just because the data are numbers does not make them quantitative.
  - Ex. Zip codes
From the data sheet
• What?
Why
• It’s the questions we ask of a variable that shape how we think
  about it and how we treat it.
• Ex. An end of class survey asks “How valuable do you think this
  course will be to you?”
  1 = worthless, 2 = slightly, 3 = middling, 4 = reasonably, 5 = invaluable
• Is the educational value categorical or quantitative?
From the data sheet
• Are variables qualitative or quantitative?
• Why?
Counts count
• When Amazon offers free shipping, they might first analyze how
  purchases are shipped.
• Counting summarizes the categorical variable, shipping method.
• We also use counts to measure quantities, such as the number of
  classes you are taking or how many songs you own.
• Two ways to use counts:
  - Count the cases in each category of a categorical variable. The
    category labels are the What and the individuals counted are the
    Who.
  - The counts themselves are not data, but they are something we
    summarize about the data.
Example
• Back to Amazon’s shipping

  Shipping Method    No. of purchases
  Ground             20,345
  Second-day          7,890
  Overnight           5,432

• What is the categorical variable?
• What?
• Who?
• Why?
• The second way is when the focus is on the number of something,
  which is measured by counting.
• Ex. Amazon might track the growth in the number of teenage
  customers each month to forecast CD sales.
• What?
• Who?

  Month       No. of Teenage Customers
  January     123,456
  February    234,567
  March       345,678
  April       456,789

• Units?
• Why?
  - Is teen a category? Is it a quantitative variable?
Identifiers
• Is your student ID number a quantitative variable?
• Why?
• Other examples of identifiers include UPS tracking numbers, social
  security numbers, and driver’s license numbers.
• Identifier variables do not tell us anything useful about the
  category because there is exactly one individual in each.
• They are used to:
  - Combine data from different sources
  - Protect confidentiality
  - Provide unique labels
• We must know Who, What, and Why to analyze data, but to understand
  more we would also like to know When, Where, and How.
• When can make a difference in the data.
  - Example: the number of women with jobs outside the home in 1900
    and the number of women with jobs outside the home in 2000.
• Where can make a difference in the data.
  - Example: the number of high school students participating in ice
    hockey in Florida and the number participating in ice hockey in
    Minnesota.
• How data are collected matters.
  - Survey, interviews, observation, etc.
  - How could surveys be flawed, especially internet surveys?
Example
• Medical researchers at a large city hospital investigating the
  impact of prenatal care on newborn health collected data from 882
  births during 1998-2000.
They kept track of the mother’s age, the number of
weeks the pregnancy lasted, the type of birth
(cesarean, induced, natural), the level of prenatal care