7.SP.4 - wcpssccmathtraining2013

```
Wendell B. Barnwell II
Major Topics / Strands
A. Interpreting Categorical and Quantitative Data (ID)
• Exploring Data
B. Conditional Probability and the Rules of Probability (CP)
C. Making Inferences and Justifying Conclusions (IC)
• Statistical Inference
Why Statistics?
 Arthur Benjamin
 TED Talk 2009
 Teach Statistics over Calculus
m/watch?v=BhMKmovNjvc
Why Statistics? (cont’d)
 Most people will take at most one Statistics class in their lives.
 That includes everyone from future senators to sales clerks, …, as well as presidents, CEOs, jurors, doctors, and other decision makers.
 It’s our job to teach them how to make informed decisions!
Prudential Age Commercial
 Awesome data collection example.
m/watch?v=C3qj88J7-jA
Types of Variables!!
 Categorical Data: M&M colors, gender, whether an individual has a cellular phone
 Quantitative Data: height, armspan, distance from home
Graphing Variables
 Categorical Data: pie chart, bar chart, two-way table
 Quantitative Data: dotplot, stemplot, histogram, scatterplot, time plot
Common Core Math 1 Goals
Summarize, represent, and interpret data on a
single count or measurement variable.
 S-ID.1 Represent data with plots on the real number
line (dot plots, histograms, and box plots).
 S-ID.2 Use statistics appropriate to the shape of the
data distribution to compare center (median, mean)
and spread (interquartile range, standard deviation) of
two or more different data sets.
 S-ID.3 Interpret differences in shape, center, and
spread in the context of the data sets, accounting for
possible effects of extreme data points (outliers).
Develop understanding of statistical variability.
6.SP.1. Recognize a statistical question as one that anticipates variability in the data related to the
question and accounts for it in the answers.
6.SP.2. Understand that a set of data collected to answer a statistical question has a distribution which
can be described by its center, spread, and overall shape.
6.SP.3. Recognize that a measure of center for a numerical data set summarizes all of its values with a
single number, while a measure of variation describes how its values vary with a single number.
Summarize and describe distributions.
6.SP.4. Display numerical data in plots on a number line, including dot plots, histograms, and box
plots.
6.SP.5. Summarize numerical data sets in relation to their context.
a) Reporting the number of observations.
b) Describing the nature of the attribute under investigation, including how it was measured and its
units of measurement.
c) Giving quantitative measures of center (median and/or mean) and variability (interquartile range
and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations
from the overall pattern with reference to the context in which the data were gathered.
d) Relating the choice of measures of center and variability to the shape of the data distribution and
the context in which the data were gathered.
 Use random sampling to draw inferences about a population.
 7.SP.1. Understand that statistics can be used to gain information about a population by
examining a sample of the population; generalizations about a population from a sample
are valid only if the sample is representative of that population. Understand that random
sampling tends to produce representative samples and support valid inferences.
 7.SP.2 Use data from a random sample to draw inferences about a population with an
unknown characteristic of interest. Generate multiple samples (or simulated samples) of
the same size to gauge the variation in estimates or predictions.
 Draw informal comparative inferences about two populations.
 7.SP.3 Informally assess the degree of visual overlap of two numerical data distributions
with similar variabilities, measuring the difference between the centers by expressing it
as a multiple of a measure of variability. For example, the mean height of players on the
basketball team is 10 cm greater than the mean height of players on the soccer team, about
twice the variability (mean absolute deviation) on either team; on a dot plot, the
separation between the two distributions of heights is noticeable.
 7.SP.4 Use measures of center and measures of variability for numerical data from
random samples to draw informal comparative inferences about two populations. For
example, decide whether the words in a chapter of a seventh-grade science book are
generally longer than the words in a chapter of a fourth-grade science book.
Activity #1 – Tennis Balls
 Using a ruler, measure the diameter of a tennis ball to the nearest millimeter.
 Write your measurement on a Post-it note and place it on the board above our number line.
 Describe the distribution.
Activity #2 – Peanuts!
 Don’t freak out; there are none in the room!
 My students took a sample of unshelled peanuts and measured the length of each peanut in millimeters.
 We then created a line plot.
Middle School Foundation
 8.SP.4. Understand that patterns of association can also be
seen in bivariate categorical data by displaying frequencies
and relative frequencies in a two-way table. Construct and
interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use
relative frequencies calculated for rows or columns to
describe possible association between the two variables.
For example, collect data from students in your class on
whether or not they have a curfew on school nights and
whether or not they have assigned chores at home. Is there
evidence that those who have a curfew also tend to have
chores?
Activity #3 – M&M Data
 Take a pack of snack-size M&Ms and compare it to a pack of regular-size M&Ms.
 Create a two-way table to compare these data.
 How can we compare this data?
 How can we graph this data?
Middle School Foundation
 Investigate patterns of association in bivariate data.
 8.SP.1. Construct and interpret scatter plots for bivariate
measurement data to investigate patterns of association between
two quantities. Describe patterns such as clustering, outliers,
positive or negative association, linear association, and nonlinear
association.
 .
 8.SP.2 Know that straight lines are widely used to model
relationships between two quantitative variables. For scatter
plots that suggest a linear association, informally fit a straight
line, and informally assess the model fit by judging the closeness
of the data points to the line.
Activity 4 – Typhoons in the Pacific
 I adapted this problem from Problem #6 on the 2013 AP Statistics exam.
Common Core Math 2 Goals
 S-CP.1 Describe events as subsets of a sample space
(the set of outcomes) using characteristics (or
categories) of the outcomes, or as unions,
intersections, or complements of other events
("or," "and," "not") with visual representations
including Venn diagrams.
 S-CP.2 Understand that two events A and B are
independent if the probability of A and B
occurring together is the product of their
probabilities, and use this characterization to
determine if they are independent.
 S-CP.3 Understand the conditional probability of A
given B as P(A and B)/P(B), and interpret
independence of A and B as saying that the
conditional probability of A given B is the same as the
probability of A, and the conditional probability of B
given A is the same as the probability of B.
 S-CP.4 Construct and interpret two-way frequency
tables of data when two categories are associated with
each object being classified. Use the two-way table as a
sample space to decide if events are independent and
to approximate conditional probabilities.
 S-CP.5 Recognize and explain the concepts of
conditional probability and independence in
everyday language and everyday situations.
 S-CP.6 Find the conditional probability of A given
B as the fraction of B's outcomes that also belong
to A, and interpret the answer in terms of the
model.
 CCSS.Math.Content.7.SP.C.5 Understand that the probability of a chance event
is a number between 0 and 1 that expresses the likelihood of the event
occurring. Larger numbers indicate greater likelihood. A probability near 0
indicates an unlikely event, a probability around 1/2 indicates an event that is
neither unlikely nor likely, and a probability near 1 indicates a likely event.
 CCSS.Math.Content.7.SP.C.6 Approximate the probability of a chance event by
collecting data on the chance process that produces it and observing its long-run relative frequency, and predict the approximate relative frequency given
the probability.
 CCSS.Math.Content.7.SP.C.7 Develop a probability model and use it to find
probabilities of events. Compare probabilities from a model to observed
frequencies; if the agreement is not good, explain possible sources of the
discrepancy.
 CCSS.Math.Content.7.SP.C.7a Develop a uniform probability model by
assigning equal probability to all outcomes, and use the model to determine
probabilities of events.
 CCSS.Math.Content.7.SP.C.7b Develop a probability model (which may not be
uniform) by observing frequencies in data generated from a chance process.
 CCSS.Math.Content.7.SP.C.8 Find probabilities of compound events using
organized lists, tables, tree diagrams, and simulation.
 CCSS.Math.Content.7.SP.C.8a Understand that, just as with simple events, the
probability of a compound event is the fraction of outcomes in the sample
space for which the compound event occurs.
 CCSS.Math.Content.7.SP.C.8b Represent sample spaces for compound events
using methods such as organized lists, tables and tree diagrams. For an event
described in everyday language (e.g., "rolling double sixes"), identify the
outcomes in the sample space which compose the event.
 CCSS.Math.Content.7.SP.C.8c Design and use a simulation to generate
frequencies for compound events.
 CCSS.Math.Content.8.SP.A.4 Understand that patterns of association can also
be seen in bivariate categorical data by displaying frequencies and relative
frequencies in a two-way table. Construct and interpret a two-way table
summarizing data on two categorical variables collected from the same
subjects. Use relative frequencies calculated for rows or columns to describe
possible association between the two variables.
Why Probability?
 Looking at games of chance
 Card games, lotteries, fantasy sports, horse racing
 Looking at social science data
 Life and death, the medical field, biostatistics
 Looking at scientific data
 Variation in individual measurements is random (example: tennis ball diameter measurements)
Chance Behavior
Chance behavior is unpredictable
in the short run but has a regular
and predictable pattern in the long
run.
Randomness
We call a phenomenon random if
individual outcomes are uncertain
but there is nonetheless a regular
distribution of outcomes in a large
number of repetitions.
Principles of Randomness
1. Long series of independent trials
2. The idea is empirical. We can estimate a real-world probability by actually observing many trials (ex. simulation, combining class data).
3. Short runs give only a rough estimate; several hundred trials or simulations are usually needed before the estimated probability settles down.
Definition of Probability
 The probability of any outcome of a
random phenomenon is the proportion
of times the outcome would occur in a
very long series of repetitions. That is,
the probability is a long-term relative
frequency.
Interpreting Probabilities
 Ex. (a) – There is a .3 chance of rain tomorrow.
 How do you interpret this statement?
 Answer: Over a long run of days with the same conditions as tomorrow, it rains on about 30% of them.
 Meteorologists may have examined 100 days, 200 days, maybe more, but probably not just 10 days, 3 of which had rain.
 Ex. (b) – Your probability of winning at this lottery
game is 1/1000.
 How do you interpret this statement?
 Answer: Over a very long run of plays under the same conditions, about 1 play in every 1,000 wins.
 That does not guarantee a win within any particular 1,000 plays; it may take 1,000, 2,000, or more plays before the proportion of wins settles near 1/1000.
Must Be Independent!!!
 To interpret probability as a long-run relative frequency, the repeated trials must be independent.
 The outcome of one trial does not influence the outcome of another.
 Example: rolling a die. Rolling a 3 does not influence the probability of rolling a 6 on the next roll.
Sample Space
 The sample space S of a random phenomenon is the set of all possible outcomes.
Event
 An event is any outcome or a set of
outcomes of a random phenomenon.
 An event is a subset of the sample space.
Probability Model
 A probability model is a mathematical
description of a random phenomenon
consisting of two parts:
 A sample space
 A way of assigning probabilities to events
Example #1
 Consider a situation in which shoppers were categorized by gender (M or F) and the type of music purchased (C = classical, R = rock, K = country, and P = rap).
a) What is the sample space?
b) What probability is associated with each outcome?
c) List the outcomes in the event that a shopper purchased classical music.
d) List the outcomes in the event that the shopper was male.
Example #2
 An observer stands at the bottom of a freeway off-ramp and records the turning direction (L = left, R = right) of each of three successive vehicles.
 What is the sample space?
What is the probability of each outcome?
Which outcomes make up the event that exactly one car turns right?
Which outcomes make up the event that exactly one car turns left?
Which outcomes make up the event that all three cars turn the same direction?
Assigning Probability
 Some outcomes are equally likely and some are not.
 Students need to be aware that the total number of outcomes is the denominator of a probability only when the outcomes are equally likely.
Equally Likely Events
(a) Whether a fair die lands on 1,2,3,4,5 or 6.
(b) The sum of two fair dice landing on 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12
(c) A fair coin landing on heads or tails when
tossed
(d) A fair coin landing on heads or tails when
spun on its side.
(e) A tennis racquet landing with the label “up”
or “down” when spun on its end
(f) Your grade in this course being A, B, C, D, or F
(g) Whether or not California experiences a
catastrophic earthquake within the next year
(h) Whether or not your server correctly brings
you the meal you ordered in a restaurant
(i) Whether or not there is intelligent life on
Mars
(j) Whether or not a woman will be elected
President in the next election.
(k) Whether or not a woman will be elected
President before the year 2010.
(l) Colors of Reese’s Pieces candies: orange,
yellow and brown
Probability Example #1
 The Heart Association claims that only 10% of U.S. adults over age 30 can pass the President’s Physical Fitness Commission’s minimum requirements. In a group of 4 randomly chosen adults, what is the probability that 2 can pass and 2 cannot?
Probability Example #2
Simulation
 The imitation of chance behavior, based on
a model that accurately reflects the
situation, is called a simulation.
Simulation Steps
 State: What is the question of interest about some
chance process?
 Plan: Describe how to use a chance device to imitate one repetition of the process. Explain clearly how to identify the outcomes of the chance process and what variable to measure.
 Do: Perform many repetitions of the simulation.
 Conclude: Use the results to answer the question of interest.
Probability Example #3
 Eric Staal, center for the Carolina Hurricanes, is off to a strong start this NHL season. He is getting about 8 shots on goal a game and scoring on about a third of them. What is the probability that Eric scores exactly 4 goals in a game?
Common Core Math 3 Goals
 Understand and evaluate random processes underlying statistical
experiments
 S-IC.1 Understand statistics as a process for making inferences about
population parameters based on a random sample from that population.
 Make inferences and justify conclusions from sample surveys, experiments, and
observational studies
 S-IC.3 Recognize the purposes of and differences among sample surveys,
experiments, and observational studies; explain how randomization relates to
each.
 S-IC.4 Use data from a sample survey to estimate a population mean or
proportion; develop a margin of error through the use of simulation models for
random sampling.
 S-IC.5 Use data from a randomized experiment to compare two treatments; use
simulations to decide if differences between parameters are significant.
 S-IC.6 Evaluate reports based on data.
 Use random sampling to draw inferences about a
population.
 7.SP.1. Understand that statistics can be used to gain
information about a population by examining a sample of the
population; generalizations about a population from a sample
are valid only if the sample is representative of that population.
Understand that random sampling tends to produce
representative samples and support valid inferences.
 7.SP.2 Use data from a random sample to draw inferences about a
population with an unknown characteristic of interest. Generate
multiple samples (or simulated samples) of the same size to
gauge the variation in estimates or predictions.
 Draw informal comparative inferences about two
populations.
 7.SP.3 Informally assess the degree of visual overlap of
two numerical data distributions with similar
variabilities, measuring the difference between the
centers by expressing it as a multiple of a measure of
variability.
 7.SP.4 Use measures of center and measures of
variability for numerical data from random samples to
draw informal comparative inferences about two
populations.
Activity #1 – The “1 in 6 wins” Game
 As a special promotion for its 20-ounce bottles of soda,
a soft drink company printed a message on the inside
of each cap. Some of the caps said “Please try again”,
while others said “You’re a winner!” The company
advertised the promotion with the slogan “1 in 6 wins a
prize.” Seven friends each buy one 20-ounce
bottle of the soda at a local convenience store. The
clerk is surprised when three of them win a prize. Is
this group of friends just lucky, or is the company’s
claim inaccurate?
Activity #2 –Sleep Deprivation
 Source: Rossman et al., NSF Project
 Researchers have established that sleep deprivation has a
harmful effect on visual learning. But do these effects linger
for several days, or can a person “make up” for sleep
deprivation by getting a full night’s sleep on subsequent
nights? A recent study investigated this question by
randomly assigning 21 subjects to one of two groups: one
group was deprived of sleep on the night following training
and pre-testing with a visual discrimination task, and the
other group was permitted unrestricted sleep on that first
night. Both groups were then allowed as much sleep as they
wanted on the following two nights. All subjects were then
re-tested on the third day.
Sleep Deprivation Data
 Subjects’ performance on the test was recorded as the
minimum time (in milliseconds) between stimuli
appearing on a computer screen for which they could
accurately report what they had seen on the screen.
• Sleep deprivation (n = 11): -14.7, -10.7, -10.7,
2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8
• Unrestricted sleep (n = 10): -7.0, 11.6, 12.1,
12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6
Did sleep deprivation cause a difference in performance?
 Or is there another possible explanation?
Rerandomizing Simulation
 Place 21 cards (subjects) in a bag
 If there is no difference in treatment effects, each subject’s value would be the same as in the original study no matter which group he or she was assigned to.
 How large a difference in group means do we get from different random assignments alone?
 Mix your cards and draw 10 to represent the
unrestricted group.
 Compare your mean to 19.82. Report.
Physical simulation can be tedious…
Final Thoughts – Sleep Deprivation
 Research question: Do the effects of sleep deprivation on visual learning last for several days?
 Idea: suppose there’s no “treatment effect.”
 Are the differences due only to random assignment?
 “Re-randomize” many times.
 What would you conclude?
Closing Thoughts / Questions
 2 variable statistics (scatterplot, models)
```
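For the S-ID.2 and 6.SP.5(c) goals, and for the 7.SP.3 idea of expressing a difference in centers as a multiple of variability, here is a minimal Python sketch. It computes the mean, median, IQR, and mean absolute deviation (MAD) for the sleep deprivation data from the slides. The quartile rule is an assumption: it uses the "median of each half" convention common in school classes, and calculators or statistical software may use another rule and report slightly different quartiles. Averaging the two MADs is also only one reasonable choice of "a measure of variability" when the two groups have similar spreads.

```python
from statistics import mean, median

# Sleep deprivation data from the slides (n = 11 and n = 10).
deprived = [-14.7, -10.7, -10.7, 2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8]
unrestricted = [-7.0, 11.6, 12.1, 12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6]


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    ASSUMPTION: the classroom "median of the halves" rule, where the middle
    value is left out of both halves when n is odd.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)


def iqr(data):
    """Interquartile range Q3 - Q1."""
    q1, q3 = quartiles(data)
    return q3 - q1


def mad(data):
    """Mean absolute deviation: the average distance of the values from their mean."""
    m = mean(data)
    return mean(abs(x - m) for x in data)


for name, data in [("deprived", deprived), ("unrestricted", unrestricted)]:
    print(f"{name}: mean={mean(data):.2f}, median={median(data):.2f}, "
          f"IQR={iqr(data):.2f}, MAD={mad(data):.2f}")

# 7.SP.3 idea: the difference in centers as a multiple of a typical MAD.
diff = mean(unrestricted) - mean(deprived)
typical_mad = (mad(deprived) + mad(unrestricted)) / 2
print(f"difference in means = {diff:.2f}, about {diff / typical_mad:.1f} MADs")
```

Students can compute these by hand first and then use a script like this to check their work.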
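The two-way table standards (8.SP.4, S-CP.4) and the M&M activity both come down to computing relative frequencies within each row and then comparing the rows. The Python sketch below shows that step. The function name `row_relative_frequencies` is hypothetical, and the curfew/chores counts are made up only to demonstrate the mechanics. Replace them with your own class data, or use pack type as the rows and M&M colors as the columns.

```python
from fractions import Fraction


def row_relative_frequencies(table):
    """Convert each row of a two-way table of counts into relative frequencies.

    `table` maps a row category to a dict of {column category: count}.
    """
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: Fraction(n, total) for col, n in counts.items()}
    return result


# HYPOTHETICAL counts for illustration only (not real survey data).
# Rows: curfew on school nights? Columns: assigned chores at home?
survey = {
    "curfew": {"chores": 12, "no chores": 4},
    "no curfew": {"chores": 3, "no chores": 6},
}

for row, freqs in row_relative_frequencies(survey).items():
    print(row, {col: f"{float(p):.0%}" for col, p in freqs.items()})
```

If the row percentages for "chores" differ noticeably between the curfew and no-curfew rows, that suggests an association between the two variables.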
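The S-CP.2, S-CP.3, and S-CP.6 goals define independence as P(A and B) = P(A) · P(B) and conditional probability as P(A | B) = P(A and B) / P(B). The Python sketch below checks both definitions on the sample space for two fair dice, assuming all 36 ordered outcomes are equally likely. The events A, B, and C are chosen for illustration and do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered pairs.
S = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) as the fraction of the equally likely outcomes that are in the event."""
    return Fraction(len(event), len(S))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)


A = {s for s in S if s[0] % 2 == 0}  # first die is even
B = {s for s in S if sum(s) == 7}    # sum is 7
C = {s for s in S if sum(s) == 8}    # sum is 8

# A and B are independent: P(A and B) equals P(A) * P(B), so P(A | B) = P(A).
print(prob(A & B), prob(A) * prob(B), cond_prob(A, B) == prob(A))
# A and C are not: P(A and C) differs from P(A) * P(C), so P(A | C) != P(A).
print(prob(A & C), prob(A) * prob(C), cond_prob(A, C) == prob(A))
```

Finding the fraction of B's outcomes that also belong to A (the S-CP.6 wording) gives the same value as `cond_prob`.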
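The "Chance Behavior" and "Definition of Probability" slides describe probability as a long-term relative frequency, and the weather and lottery interpretations rely on the same idea. The Python sketch below simulates rolling a die and shows the running proportion of 6s, which bounces around in the short run and settles near 1/6 in the long run. It assumes Python's pseudo-random generator is an adequate stand-in for a fair die. The seed value is arbitrary and only makes the run reproducible.

```python
import random


def running_relative_frequency(n_trials, target=6, seed=None):
    """Roll a simulated fair die n_trials times and return the relative
    frequency of `target` after each roll."""
    rng = random.Random(seed)
    hits = 0
    freqs = []
    for i in range(1, n_trials + 1):
        if rng.randint(1, 6) == target:  # randint includes both endpoints
            hits += 1
        freqs.append(hits / i)
    return freqs


freqs = running_relative_frequency(100_000, seed=2013)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} rolls: {freqs[n - 1]:.4f}  (1/6 = {1 / 6:.4f})")
```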
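For the "Sample Space" slide and Examples #1 and #2, the sample spaces can be listed systematically, much like an organized list or tree diagram. The Python sketch below does this with `itertools.product`. For the off-ramp example it assumes that each car turns left or right with probability 1/2, independently of the others. The slide does not state this, so treat it as a modeling assumption to discuss with students. The shopper outcomes are only listed, because the slide gives no data for assigning probabilities to them.

```python
from fractions import Fraction
from itertools import product

# Example #2: the turning direction (L or R) of three successive vehicles.
offramp = ["".join(turns) for turns in product("LR", repeat=3)]
print(offramp)  # 8 outcomes, from "LLL" to "RRR"


# ASSUMPTION (not stated on the slide): each car turns L or R with
# probability 1/2, independently, so the 8 outcomes are equally likely.
def p(event):
    """Probability of an event given as a list of off-ramp outcomes."""
    return Fraction(len(event), len(offramp))


one_right = [o for o in offramp if o.count("R") == 1]
one_left = [o for o in offramp if o.count("L") == 1]
all_same = [o for o in offramp if len(set(o)) == 1]
for name, event in [("exactly one right", one_right),
                    ("exactly one left", one_left),
                    ("all the same direction", all_same)]:
    print(name, event, p(event))

# Example #1: shoppers by gender (M/F) and music type (C, R, K, P).
# Only the sample space is listed; these 8 outcomes should NOT be assumed
# equally likely without data.
shoppers = [g + m for g, m in product("MF", "CRKP")]
classical = [s for s in shoppers if s.endswith("C")]
male = [s for s in shoppers if s.startswith("M")]
print(shoppers, classical, male)
```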
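Item (b) on the "Equally Likely Events" slide and the "rolling double sixes" example in 7.SP.C.8b both depend on the 36 ordered outcomes of two fair dice. The Python sketch below lists those outcomes, counts how many give each sum, and finds P(double sixes). It assumes the dice are fair, so all 36 ordered pairs are equally likely. The counts show that the 11 possible sums are not equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs
sum_counts = Counter(a + b for a, b in rolls)

# The 11 possible sums are NOT equally likely.
for total in range(2, 13):
    print(total, sum_counts[total], Fraction(sum_counts[total], len(rolls)))

double_sixes = [r for r in rolls if r == (6, 6)]
print("P(double sixes) =", Fraction(len(double_sixes), len(rolls)))
```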
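Probability Examples #1 (the Heart Association fitness test) and #3 (Eric Staal's shots) can both be modeled as a fixed number of independent trials with the same success probability. In other words, they fit a binomial model. The slides do not name a method, so that model is an assumption here: it requires that the 4 adults pass or fail independently, and that each of the 8 shots scores with probability 1/3 independently of the others. Under those assumptions, the Python sketch below gives exact answers of 243/5000 = 0.0486 and 1120/6561 ≈ 0.171. It also runs a seeded simulation for comparison, and the seed is arbitrary.

```python
import random
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success prob p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability Example #1: 4 adults, each passes with probability 0.10.
heart = binomial_pmf(2, 4, Fraction(1, 10))
print("2 of 4 pass:", heart, float(heart))

# Probability Example #3: 8 shots, each scores with probability 1/3.
staal = binomial_pmf(4, 8, Fraction(1, 3))
print("exactly 4 goals:", staal, float(staal))


def simulate(k, n, p, reps=100_000, seed=None):
    """Estimate P(exactly k successes) by repeating n trials many times."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() < p for _ in range(n)) == k  # count successes, compare to k
        for _ in range(reps)
    )
    return hits / reps


sim = simulate(4, 8, 1 / 3, seed=1)
print("simulated:", sim)
```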
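The "1 in 6 wins" game follows the State / Plan / Do / Conclude steps from the "Simulation Steps" slide. The Python sketch below is a minimal version. It assumes that each cap wins with probability exactly 1/6, independently of the others, and it uses a hypothetical die mapping in which a roll of 1 means "You're a winner!". For comparison, it also computes the exact binomial probability of 3 or more winners among 7 bottles. The seed is arbitrary.

```python
import random
from fractions import Fraction
from math import comb

N_FRIENDS = 7
P_WIN = Fraction(1, 6)  # the company's claim


def exact_at_least(k, n=N_FRIENDS, p=P_WIN):
    """P(at least k winners among n caps) under a binomial model."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def simulate_at_least(k, reps=10_000, seed=None):
    """Do: each repetition rolls a die once per friend; a roll of 1 is a win.
    Returns the proportion of repetitions with at least k winners."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        winners = sum(rng.randint(1, 6) == 1 for _ in range(N_FRIENDS))
        if winners >= k:
            count += 1
    return count / reps


sim = simulate_at_least(3, seed=6)
print("exact:", exact_at_least(3), float(exact_at_least(3)))
print("simulated:", sim)
```

Conclude: the exact value is 331/3456 ≈ 0.096. If the 1-in-6 claim is true, a group of seven would get three or more winners roughly once in every ten groups. That is unusual, but probably not rare enough to be strong evidence by itself that the company's claim is inaccurate.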
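The "Rerandomizing Simulation" slide notes that physical simulation with 21 cards can be tedious. A computer can do the same re-randomization thousands of times. The Python sketch below pools the 21 values from the sleep deprivation slides, shuffles them, and deals 10 to the "unrestricted" group. It then records how often that group's mean is at least the observed 19.82. The number of repetitions and the seed are arbitrary choices, and the function name `rerandomize` is hypothetical.

```python
import random
from statistics import mean

deprived = [-14.7, -10.7, -10.7, 2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8]
unrestricted = [-7.0, 11.6, 12.1, 12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6]


def rerandomize(group_a, group_b, reps=10_000, seed=None):
    """Shuffle all values (the 'cards'), deal len(group_a) of them to group A,
    and return (observed mean of A, proportion of shuffles whose group-A mean
    is at least as large as the observed one)."""
    rng = random.Random(seed)
    cards = group_a + group_b  # new list, so the original data are not reordered
    observed = mean(group_a)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(cards)
        # statistics.mean sums exactly, so the result does not depend on card order
        if mean(cards[:n_a]) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps


observed, p_value = rerandomize(unrestricted, deprived, seed=21)
print(f"observed unrestricted mean = {observed:.2f}")
print(f"proportion of re-randomizations with mean >= observed: {p_value:.4f}")
```

A small proportion means random assignment alone rarely produces a difference this large. Because the subjects were randomly assigned, that supports the conclusion that the sleep condition itself affected performance on day three.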