Chapter 6 Powerpoint - Peacock

The Standard Deviation
as a Ruler and the
Normal Model
Chapter 6
Objectives:
• Standardized values
• Z-score
• Transforming data
• Normal Distribution
• Standard Normal Distribution
• 68-95-99.7 rule
• Normal percentages
• Normal probability plot
The Standard Deviation as a
Ruler
• The trick in comparing very different-looking values is to use standard
deviations as our rulers.
• The standard deviation tells us how the
whole collection of values varies, so it’s a
natural ruler for comparing an individual to
a group.
• As the most common measure of variation,
the standard deviation plays a crucial role
in how we look at data.
Standardizing with z-scores
• We compare individual data values to their
mean, relative to their standard deviation
using the following formula:
$z = \dfrac{y - \bar{y}}{s}$
• We call the resulting values standardized
values, denoted as z. They can also be
called z-scores.
Standardizing with z-scores
(cont.)
• Standardized values have no units.
• z-scores measure the distance of each
data value from the mean in standard
deviations.
• A negative z-score tells us that the data
value is below the mean, while a positive
z-score tells us that the data value is
above the mean.
Benefits of Standardizing
• Standardized values have been converted
from their original units to the standard
statistical unit of standard deviations from
the mean (z-score).
• Thus, we can compare values that are
measured on different scales, with
different units, or from different
populations.
WHY STANDARDIZE A VALUE?
• Gives a common scale.
• We can compare two
different distributions with
different means and
standard deviations.
2.15 SD
Z=-2.15
This Z-Score
tells us it is
2.15 Standard
Deviations
from the mean
• Z-Score tells us how
many standard deviations
the observation falls away
from the mean.
• Observations greater
than the mean are
positive when
standardized and
observations less than
the mean are negative.
Example: Standardizing
• The men’s combined skiing event in the
winter Olympics consists of two races: a
downhill and a slalom. In the 2006 Winter
Olympics, the mean slalom time was 94.2714
seconds with a standard deviation of 5.2844
seconds. The mean downhill time was
101.807 seconds with a standard deviation of
1.8356 seconds. Ted Ligety of the U.S., who
won the gold medal with a combined time of
189.35 seconds, skied the slalom in 87.93
seconds and the downhill in 101.42 seconds.
• On which race did he do better compared
with the competition?
Solution:
$z = \dfrac{y - \bar{y}}{s}$
• Slalom time (y): 87.93 sec.
Slalom mean ($\bar{y}$): 94.2714 sec.
Slalom standard deviation (s): 5.2844 sec.
$z_{Slalom} = \dfrac{87.93 - 94.2714}{5.2844} = -1.2$
• Downhill time (y): 101.42 sec.
Downhill mean ($\bar{y}$): 101.807 sec.
Downhill standard deviation (s): 1.8356 sec.
$z_{Downhill} = \dfrac{101.42 - 101.807}{1.8356} = -0.21$
• The z-scores show that Ligety’s time in the slalom
is farther below the mean than his time in the
downhill. Therefore, his performance in the slalom
was better.
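The arithmetic in this solution can be checked with a short Python sketch. The function name `z_score` is an illustrative choice and does not come from the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies from the mean."""
    return (y - mean) / sd

# Ted Ligety's times and the field's summary statistics from the example
z_slalom = z_score(87.93, 94.2714, 5.2844)
z_downhill = z_score(101.42, 101.807, 1.8356)

print(round(z_slalom, 2), round(z_downhill, 2))  # -1.2 -0.21
```

In a timed race the more negative z-score is the better performance, because lower times are faster.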
Your Turn: WHO SCORED BETTER?
• Timmy gets a 680 on the math of the SAT.
The SAT score distribution is normal with a
mean of 500 and a standard deviation of
100. Little Jimmy scores a 27 on the math
of the ACT. The ACT score distribution is
normal with a mean of 18 and a standard
deviation of 6.
• Who does better? (Hint: standardize both
scores then compare z-scores)
TIMMY DOES BETTER
• Timmy: $z = \dfrac{680 - 500}{100} = 1.8$
• Little Jimmy: $z = \dfrac{27 - 18}{6} = 1.5$
• Timmy’s z-score is farther above the mean, so he does better than Little Jimmy, who is only 1.5 SDs above the mean.
• Little Jimmy does better than average and is 1.5 SDs above the mean, but Timmy beats him because he is 0.3 SD farther above it.
Combining z-scores
• Because z-scores are standardized
values, measure the distance of each data
value from the mean in standard
deviations and have no units, we can also
combine z-scores of different variables.
Example: Combining z-scores
• In the 2006 Winter Olympics men’s
combined event, Ted Ligety of the U.S.
won the gold medal with a combined time
of 189.35 seconds. Ivica Kostelic of
Croatia skied the slalom in 89.44 seconds
and the downhill in 100.44 seconds, for a
combined time of 189.88 seconds.
• Considered in terms of combined z-scores,
who should have won the gold medal?
Solution
• Ted Ligety:
$z_{Slalom} = \dfrac{87.93 - 94.2714}{5.2844} = -1.2$
$z_{Downhill} = \dfrac{101.42 - 101.807}{1.8356} = -0.21$
• Combined z-score: -1.41
• Ivica Kostelic:
$z_{Slalom} = \dfrac{89.44 - 94.2714}{5.2844} = -0.91$
$z_{Downhill} = \dfrac{100.44 - 101.807}{1.8356} = -0.74$
• Combined z-score: -1.65
• Using standardized scores, Kostelic would
have won the gold.
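As a rough check of the combined z-scores, the sketch below sums the two standardized times for each skier. The helper names (`z_score`, `combined_z`) and the constants are illustrative, not from the slides. Because the code does not round the individual z-scores before adding them, the last digit can differ slightly from the hand calculation.

```python
def z_score(y, mean, sd):
    """Standardize y using the given mean and standard deviation."""
    return (y - mean) / sd

# (mean, standard deviation) of each race, from the example
SLALOM = (94.2714, 5.2844)
DOWNHILL = (101.807, 1.8356)

def combined_z(slalom_time, downhill_time):
    """Sum of the slalom and downhill z-scores for one skier."""
    return z_score(slalom_time, *SLALOM) + z_score(downhill_time, *DOWNHILL)

ligety = combined_z(87.93, 101.42)
kostelic = combined_z(89.44, 100.44)

# The lower (more negative) combined z-score is the better result
winner = "Kostelic" if kostelic < ligety else "Ligety"
print(f"Ligety {ligety:.3f}, Kostelic {kostelic:.3f}, better: {winner}")
```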
Your Turn:
• The distribution of SAT scores has a mean
of 500 and a standard deviation of 100.
The distribution of ACT scores has a mean
of 18 and a standard deviation of 6. Jill
scored a 680 on the math part of the SAT
and a 30 on the ACT math test. Jack
scored a 740 on the math SAT and a 27
on the math ACT.
• Who had the better combined SAT/ACT
math score?
Solution:
• Jill
$z_{SAT} = \dfrac{680 - 500}{100} = 1.8$
$z_{ACT} = \dfrac{30 - 18}{6} = 2.0$
• Combined math score: 3.8
• Jack
$z_{SAT} = \dfrac{740 - 500}{100} = 2.4$
$z_{ACT} = \dfrac{27 - 18}{6} = 1.5$
• Combined math score: 3.9
• Jack did better with a combined math
score of 3.9, to Jill’s combined math score
of 3.8.
Linear Transformation of Data
• Linear transformation
• Changes the original variable x into the new variable
xnew given by
xnew = a + bx
• Adding the constant a shifts all values of x upward
or downward by the same amount.
• Multiplying by the positive constant b changes the
size of the values or rescales the data.
Shifting Data
• Shifting data:
• Adding (or subtracting) a constant
amount to each value just adds (or
subtracts) the same constant to (from)
the mean. This is true for the median
and other measures of position too.
• In general, adding a constant to every
data value adds the same constant to
measures of center and percentiles, but
leaves measures of spread unchanged.
Example: Adding a Constant
• Given the data: 2, 4, 6, 8, 10
• Center: mean = 6, median = 6
• Spread: s = 3.2, IQR = 6
• Add a constant 5 to each value, new data 7, 9,
11, 13, 15
• New center: mean = 11, median = 11
• New spread: s = 3.2, IQR = 6
• Effects of adding a constant to each data
value
• Center increases by the constant 5
• Spread does not change
• Shape of the distribution does not change
Shifting Data (cont.)
• The following histograms show a shift from
men’s actual weights to kilograms above
recommended weight:
Rescaling Data
• Rescaling data:
• When we divide or multiply all the data
values by any constant value, all
measures of position (such as the
mean, median and percentiles) and
measures of spread (such as the range,
IQR, and standard deviation) are
divided and multiplied by that same
constant value.
Example: Multiplying by a Constant
• Given the data: 2, 4, 6, 8, 10
• Center: mean = 6, median = 6
• Spread: s = 3.2, IQR = 6
• Multiply each value by the constant 3, new data:
6, 12, 18, 24, 30
• New center: mean = 18, median = 18
• New spread: s = 9.5, IQR = 18
• Effects of multiplying each value by a constant
• Center increases by a factor of the constant
(times 3)
• Spread increases by a factor of the constant
(times 3)
• Shape of the distribution does not change
Rescaling Data (cont.)
• The men’s weight data set measured weights in
kilograms. If we want to think about these weights in
pounds, we would rescale the data:
Summary of Effect of a Linear
Transformation
• Multiplying each observation by a positive
number b multiplies both measures of
center (mean and median) and measures
of spread (IQR and standard deviation) by
b.
• Adding the same number a (either positive
or negative) to each observation adds a to
measures of center and to quartiles, but
does not change measures of spread.
• Linear transformations do not change the
shape of a distribution.
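The summary above can be tried on the small data set 2, 4, 6, 8, 10 from the earlier examples. This is a minimal sketch using Python's standard `statistics` module. The helper `summarize` is a made-up name for illustration.

```python
import statistics

data = [2, 4, 6, 8, 10]

def summarize(values):
    """Return (mean, median, sample standard deviation)."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

shifted = [x + 5 for x in data]    # x_new = a + b*x with a = 5, b = 1
rescaled = [3 * x for x in data]   # x_new = a + b*x with a = 0, b = 3

for label, values in [("original", data), ("shifted", shifted), ("rescaled", rescaled)]:
    mean, median, sd = summarize(values)
    print(f"{label:9s} mean={mean} median={median} s={sd:.2f}")
```

Shifting moves the mean and median by 5 and leaves s alone, while rescaling multiplies all three by 3. The unrounded s is about 3.16, so the rescaled s is about 9.49.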
Back to z-scores
• Standardizing data into z-scores shifts the
data by subtracting the mean and rescales
the values by dividing by their standard
deviation.
• Standardizing into z-scores does not
change the shape of the distribution.
• Standardizing into z-scores changes
the center by making the mean 0.
• Standardizing into z-scores changes
the spread by making the standard
deviation 1.
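A small sketch of this idea, assuming the sample mean and sample standard deviation are used to standardize. The function name `standardize` is illustrative only.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score: subtract the mean, divide by the SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

z = standardize([2, 4, 6, 8, 10])

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
```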
Standardizing Data into z-scores
When Is a z-score BIG?
• A z-score gives us an indication of how
unusual a value is because it tells us how
far it is from the mean.
• A data value that sits right at the mean,
has a z-score equal to 0.
• A z-score of 1 means the data value is 1
standard deviation above the mean.
• A z-score of –1 means the data value is 1
standard deviation below the mean.
When Is a z-score BIG?
• How far from 0 does a z-score have to be
to be interesting or unusual?
• There is no universal standard, but the
larger a z-score is (negative or positive),
the more unusual it is.
• Remember that a negative z-score tells us
that the data value is below the mean,
while a positive z-score tells us that the
data value is above the mean.
When Is a z-score Big? (cont.)
• There is no universal standard for z-scores, but there is a model that shows up
over and over in Statistics.
• This model is called the Normal model
(You may have heard of “bell-shaped
curves.”).
• Normal models are appropriate for
distributions whose shapes are unimodal
and roughly symmetric.
• These distributions provide a measure of
how extreme a z-score is.
Smooth Curve (model) vs Histogram
• Sometimes the overall pattern is
so regular that it can be described
by a Smooth Curve.
• Can help describe the location of
individual observations within the
distribution.
Smooth Curve (model) vs Histogram
• The distribution of a histogram depends on the choice of
classes, while with a smooth curve it does not.
• Smooth curve is a mathematical model of the
distribution.
• How?
• The smooth curve describes what proportion of the
observations fall in each range of values, not the
frequency of observations like a histogram.
• Area under the curve represents the proportion of
observations in an interval.
• The total area under the curve is 1.
Smooth Curve or Mathematical Model
• Always on or above the horizontal axis.
• Total Area under curve = 1
Normal Distributions
(normal Curves)
• One Particular class of distributions or
model.
1. Symmetric
2. Single Peaked
3. Bell Shaped
• All have the same overall shape.
DESCRIBING A NORMAL
DISTRIBUTION
The exact curve for a particular normal distribution is
described by its Mean (μ) and Standard Deviation (σ).
μ located at the center of
the symmetrical curve
σ controls
the spread
Notation: N(μ,σ)
More Normal Distribution
• The Mean (μ) is located at the center of
the single peak and controls location of the
curve on the horizontal axis.
• The standard deviation (σ) is located at the
inflection points of the curve and controls
the spread of the curve.
Inflection Points
• The point on the curve where the curve changes from
falling more steeply to falling less steeply (change in
curvature – concave down to concave up).
Inflection point
Inflection point
• Located one standard deviation (σ) from the mean (μ).
Are not Normal Curves
• Why?
a) Normal curve gets closer and closer to the horizontal axis, but never touches it.
b) Normal curve is symmetrical.
c) Normal curve has a single peak.
d) Normal curve tails do not curve away from the horizontal axis.
When Is a z-score Big? (cont.)
• There is a Normal model for every possible combination
of mean and standard deviation.
• We write N(μ,σ) to represent a Normal model with a
mean of μ and a standard deviation of σ.
• We use Greek letters because this mean and standard
deviation are not numerical summaries of the data. They
are part of the model. They don’t come from the data.
They are numbers that we choose to help specify the
model.
• Such numbers are called parameters of the model.
When Is a z-score Big? (cont.)
• Summaries of data, like the sample mean
and standard deviation, are written with
Latin letters. Such summaries of data are
called statistics.
• When we standardize Normal data, we still
call the standardized value a z-score, and
we write
$z = \dfrac{y - \mu}{\sigma}$
When Is a z-score Big? (cont.)
• Once we have standardized, we need only
one model:
• The N(0,1) model is called the standard
Normal model (or the standard Normal
distribution).
• Be careful—don’t use a Normal model for
just any data set, since standardizing does
not change the shape of the distribution.
Standardizing Normal
Distributions
• All normal distributions are the same
general shape and share many common
properties.
• Normal distribution notation: N(μ,σ).
• We can make all normal distributions the
same by measuring them in units of
standard deviation (σ) about the mean (μ).
• This is called standardizing and gives us
the Standard Normal Curve.
Standardizing & Z - SCORES
• We can standardize a variable that has a
normal distribution to a new variable that
has the standard normal distribution using
the formula:
$z = \dfrac{y - \mu}{\sigma}$
• Substitute your variable as y.
• Subtract the mean from your variable.
• Then divide by your Standard Deviation.
• BAM! Pops out your z-score.
Standardize a Normal Curve to the Standard Normal Curve
$z = \dfrac{y - \mu}{\sigma}$
The Standard Normal Distribution
• Shape – normal curve
• Mean (μ) = 0
• Standard Deviation (σ) = 1
• Horizontal axis scale – Z score
• No vertical axis
• Z-score: $z = \dfrac{y - \mu}{\sigma}$
• Notation: N(0,1), the standard Normal case of N(μ,σ)
When Is a z-score Big? (cont.)
• When we use the Normal model, we are
assuming the distribution is Normal.
• We cannot check this assumption in
practice, so we check the following
condition:
• Nearly Normal Condition: The shape of
the data’s distribution is unimodal and
symmetric.
• This condition can be checked with a
histogram or a Normal probability plot
(to be explained later).
The 68-95-99.7 Rule (Empirical Rule)
• Normal models give us an idea of how
extreme a value is by telling us how likely
it is to find one that far from the mean.
• We can find these numbers precisely, but
until then we will use a simple rule that
tells us a lot about the Normal model…
The 68-95-99.7 Rule (cont.)
• It turns out that in a Normal model:
• about 68% of the values fall within one standard deviation
of the mean; (µ – σ to µ + σ)
• about 95% of the values fall within two standard
deviations of the mean; (µ – 2σ to µ + 2σ ) and,
• about 99.7% (almost all!) of the values fall within three
standard deviations of the mean. (µ – 3σ to µ + 3σ)
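The three percentages can be reproduced from the standard Normal model. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `area_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a Normal model lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k) * 100:.2f}%")
```

The exact values are close to, but not exactly, 68, 95, and 99.7 percent, which is why the rule says "about".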
The 68-95-99.7 Rule (cont.)
• The following shows what the 68-95-99.7
Rule tells us:
More 68-95-99.7% Rule
Using the 68-95-99.7 Rule
• SOUTH AMERICAN RAINFALL
• The distribution of rainfall in South
American countries is approximately
normal with a (mean) µ = 64.5 cm and
(standard deviation) σ = 2.5 cm.
• The next slide will demonstrate the
empirical rule of this application.
N(64.5,2.5)
• 68% of the countries receive rainfall between 64.5(μ) – 2.5(σ) = 62 cm and 64.5(μ) + 2.5(σ) = 67 cm.
• 68% = 62 to 67
• 95% of the countries receive rainfall between 64.5(μ) – 5(2σ) = 59.5 cm and 64.5(μ) + 5(2σ) = 69.5 cm.
• 95% = 59.5 to 69.5
• 99.7% of the countries receive rainfall between 64.5(μ) – 7.5(3σ) = 57 cm and 64.5(μ) + 7.5(3σ) = 72 cm.
• 99.7% = 57 to 72
• The middle 68% of the countries (µ ± σ) have rainfall between 62 and 67 cm.
• The middle 95% of the countries (µ ± 2σ) have rainfall between 59.5 and 69.5 cm.
• Almost all of the data (99.7%) is within 57 to 72 cm (µ ± 3σ).
Example: IQ Test
• The scores of a referenced
population on the IQ Test are
normally distributed with μ=100 and
σ=15.
1) Approximately what percent of
scores fall in the range from 70 to
130?
2) A score in what range would
represent the top 16% of the
scores?
Example: IQ Test
μ=100
σ=15
1) 70 to 130 is μ ± 2σ, so it contains about 95% of the scores.
2) About 68% of scores lie within one σ of μ, leaving about 16% in each tail, so the top 16% of scores are those above μ + σ = 115.
Your Turn:
• Runner’s World reports that the times of
the finishes in the New York City 10-km
run are normally distributed with a mean of
61 minutes and a standard deviation of 9
minutes.
1) Find the percent of runners who take
more than 70 minutes to finish.
16%
2) Find the percent of runners who finish in
less than 43 minutes.
2.5%
The First Three Rules for Working
with Normal Models
• Make a picture.
• Make a picture.
• Make a picture.
• And, when we have data, make a
histogram to check the Nearly Normal
Condition to make sure we can use the
Normal model to model the distribution.
Finding Normal Percentiles by
Hand
• When a data value doesn’t fall exactly 1, 2,
or 3 standard deviations from the mean,
we can look it up in a table of Normal
percentiles.
• Table Z in Appendix D provides us with
normal percentiles, but many calculators
and statistics computer packages provide
these as well.
Finding Normal Percentiles by Hand (cont.)
• Table Z is the standard Normal table. We have to convert
our data to z-scores before using the table.
• The figure shows us how to find the area to the left when
we have a z-score of 1.80:
Standard Normal
Distribution Table
• Gives area under the
curve to the left of a
positive z-score.
• Z-scores are in the 1st
column and the 1st
row
• 1st column – whole
number and first
decimal place
• 1st row – second
decimal place
Standard Normal
Distribution Table
• Also gives areas to the
left of negative z-scores.
• The curve is
symmetrical, therefore
the area to the left of a
negative z-score is the
same as the area to the
right of the same positive
z-score.
Table Z
• The table entry for each value z is the area
under the curve to the LEFT of z.
USING THE Z TABLE
• You found your z-score to be 1.40 and you want to find the area to the left of 1.40.
1. Find 1.4 in the left-hand column of the Table
2. Find the remaining digit 0 as .00 in the top row
3. The entry opposite 1.4 and under .00 is 0.9192. This is the area we seek: 0.9192
Other Types of Tables
Using Left-Tail Style Table
1. For areas to the left of a specified z value, use the table
entry directly.
2. For areas to the right of a specified z value, look up the
table entry for z and subtract the area from 1. (can also
use the symmetry of the normal curve and look up the
table entry for –z).
3. For areas between two z values, z1 and z2 (where z2 > z1),
subtract the table area for z1 from the table area for z2.
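The three rules above map directly onto a cumulative distribution function, which plays the role of the left-tail table. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The function names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal model, N(0, 1)

def area_left(z):
    """Rule 1: the table entry itself."""
    return Z.cdf(z)

def area_right(z):
    """Rule 2: subtract the table entry from 1."""
    return 1 - Z.cdf(z)

def area_between(z1, z2):
    """Rule 3: table entry for z2 minus table entry for z1 (z2 > z1)."""
    return Z.cdf(z2) - Z.cdf(z1)

print(round(area_left(1.40), 4))          # 0.9192
print(round(area_right(-2.15), 4))        # 0.9842
print(round(area_between(0.5, 1.89), 4))  # 0.2792
```

The last value differs from a table-based answer in the fourth decimal place because the table entries are themselves rounded before subtracting.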
More using Table Z (left tailed table)
Use table directly
Example: Find Area Greater
Than a Given Z-Score
• Find the area from the standard normal
distribution that is greater than -2.15
THE ANSWER IS 0.9842
• Find the corresponding Table Z value using
the z-score -2.15.
• The table entry is 0.0158
• However, this is the area to the left of -2.15
• We know the total area of the curve = 1, so
simply subtract the table entry value from 1
• 1 – 0.0158 = 0.9842
• The next slide illustrates these areas
Practice using Table Z to find areas under
the Standard Normal Curve
1. z < 1.58
2. z < -.93
3. z > -1.23
4. z > 2.48
5. .5 < z < 1.89
6. -1.43 < z < 1.43
Answers:
1. .9429 (directly from table)
2. .1762 (directly from table)
3. .8907 (1 - .1093 for z < -1.23, or use symmetry: z < 1.23)
4. .0066 (1 - .9934 for z < 2.48, or use symmetry: z < -2.48)
5. .2791 (.9706 for z < 1.89 minus .6915 for z < .5)
6. .8472 (.9236 for z < 1.43 minus .0764 for z < -1.43)
CAUTION!
• The average statistics student will look up
a z-value in Table Z and use the entry
corresponding to that z-value, not paying
attention to if the problem asks for the area
to the right or to the left of that z-value
• BUT, YOU as an AP stats student should
always be more meticulous and make sure
your answer is reasonable in the context of
the problem
Using the TI-83/84 to Find the Area
Under the Standard Normal Curve
• Under the DISTR menu, the 2nd entry is
“normalcdf”.
• Calculates the area under the Standard Normal
Curve between two z-scores (-1.43<z<.96).
• Syntax normalcdf(lower bound, upper bound).
Upper and lower bounds are z-scores.
• To find the area above or below a single z-score, use a
large positive value for the upper bound (e.g., 100) or a
large negative value for the lower bound (e.g., -100),
respectively.
Practice using the TI-83/84 to find areas
under the standard normal curve
1. -2.35 < z < 1.52
2. .85 < z < 1.56
3. -3.5 < z < 3.5
4. 0 < z < 1
5. z < 1.63
6. z > .85
7. z > 2.86
8. z < -3.12
9. z > 1.5
10. z < -.92
Answers:
1. .9264
2. .1383
3. .9995
4. .3413
5. .9484
6. .1977
7. .0021
8. .0009
9. .0668
10. .1789
Using TI-83/84 to Find Areas Under the
Standard Normal Curve Without Z-Scores
• The TI-83/84 can find areas under the
standard normal curve without first changing
the observation x to a z-score
• Syntax: normalcdf(lower bound, upper bound, mean,
standard deviation). To find the area below or above a
single value, use a very large negative observation value
for the lower bound or a very large positive one for the
upper bound, respectively.
• Example: N(136,18) 100<x<150
• Answer: .7589
• Example: N(2.5,.42) x>3.21
• Answer: .0455
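For readers without a TI-83/84, the two examples above can be approximated in Python. The function below is a hypothetical stand-in that mimics the calculator's argument order. It is not the calculator's implementation, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# N(136, 18), 100 < x < 150
area_1 = normalcdf(100, 150, 136, 18)

# N(2.5, .42), x > 3.21; math.inf stands in for "a very large upper bound"
area_2 = normalcdf(3.21, math.inf, 2.5, 0.42)

print(round(area_1, 4), round(area_2, 4))
```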
Procedure for Finding Normal Percentiles
1. State the problem in terms of the observed
variable y.
• Example : y > 24.8
2. Standardize y to restate the problem in terms
of a z-score.
• Example: z > (24.8 - μ)/σ, therefore z > ?
3. Draw a picture to show the area under the
standard normal curve to be calculated.
4. Find the required area using Table Z or the
TI-83/84 calculator.
Example 1:
• The heights of men are approximately
normally distributed with a mean of 70 inches and
a standard deviation of 3 inches. What proportion
of men are more than 6 feet tall?
Answer:
1. State the problem in terms of y. (6’=72”)
$y > 72$
2. Standardize and state in terms of z.
$z = \dfrac{y - \mu}{\sigma}$, so $z > \dfrac{72 - 70}{3} = .67$
3. Draw a picture of the area under the curve to be
calculated.
4. Calculate the area under the curve.
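Step 4 is left open on the slide. One way to carry it out is sketched below with `statistics.NormalDist` (Python 3.8 or later). The variable names are illustrative. The code uses the unrounded z-score of 2/3, so its result can differ in the third decimal place from a Table Z lookup at z = .67.

```python
from statistics import NormalDist

# Normal model for men's heights in inches, as given in the example
heights = NormalDist(mu=70, sigma=3)

z = (72 - 70) / 3                          # step 2: standardize
proportion_taller = 1 - heights.cdf(72)    # step 4: area to the right of 72

print(round(z, 2), round(proportion_taller, 3))
```

Under this model roughly a quarter of men are taller than 6 feet.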
Example 2:
• Suppose family incomes in a town are
normally distributed with a mean of $1,200
and a standard deviation of $600 per
month. What percentage of
families have incomes between $1,400
and $2,250 per month?
Answer:
1. State the problem in terms of y.
$1400 < y < 2250$
2. Standardize and state in terms of z.
$\dfrac{1400 - 1200}{600} < z < \dfrac{2250 - 1200}{600}$
$.33 < z < 1.75$
3. Draw a picture.
4. Calculate the area.
Your Turn:
• The Chapin Social Insight (CSI) Test
evaluates how accurately the subject
appraises other people. In the reference
population used to develop the test, scores
are approximately normally distributed with
mean 25 and standard deviation 5. The
range of possible scores is 0 to 41.
1. What percent of subjects score above a
32 on the CSI Test?
2. What percent of subjects score at or
below a 13 on the CSI Test?
3. What percent of subjects score between
16 and 34 on the CSI Test?
Solution:
1) What percent of subjects score above a
32 on the CSI Test?
1. $y > 32$
2. $z > \dfrac{32 - 25}{5} = 1.4$
3. Picture
4. 8.1%
Solution:
2) What percent of subjects score at or
below a 13 on the CSI Test?
1. $y \le 13$
2. $z \le \dfrac{13 - 25}{5} = -2.4$
3. Picture
4. .82%
Solution:
3) What percent of subjects score between
16 and 34 on the CSI Test?
1. $16 < y < 34$
2. $\dfrac{16 - 25}{5} < z < \dfrac{34 - 25}{5}$, so $-1.8 < z < 1.8$
3. Picture
4. 92.8%
From Percentiles to Scores: z in
Reverse
• Sometimes we start with areas and need
to find the corresponding z-score or even
the original data value.
• Example: What z-score represents the first
quartile in a Normal model?
z in Reverse
• Given a normal distribution proportion (area under the
standard normal curve), find the corresponding
observation value.
• Table Z – find the area in the table nearest the given
proportion and read off the corresponding z-score.
• TI-83/84 Calculator – Use the DISTR menu, 3rd entry
invNorm. Syntax: invNorm(area,[μ,σ]), where area is the
left-tail area, that is, the area to the left of the z-score
(or observation y) wanted.
From Percentiles to Scores: z in
Reverse (cont.)
• Look in Table Z for an area of 0.2500.
• The exact area is not there, but 0.2514 is
pretty close.
• This figure is associated with z = –0.67, so
the first quartile is 0.67 standard deviations
below the mean.
Inverse Normal Practice
Proportion (area under curve, left tail) | Z-score using Table Z | Z-score using TI-83/84
1. .3409 | -.41 | -.4100
2. .7835 | .78 | .7841
3. .9268 | 1.45 | 1.4524
4. .0552 | -1.60 | -1.5964
Procedure for Inverse Normal
Proportions
1. Draw a picture showing the given
proportion (area under the curve).
2. Find the z-score corresponding to the
given area under the curve.
3. Unstandardize the z-score.
4. Solve for the observational value y and
answer the question.
Example 1: SAT VERBAL
SCORES
• SAT Verbal scores are approximately
normal with a mean of 505 and a standard
deviation of 110
• How high must a student score in order to
place in the top 10% of all students taking
the verbal section of the SAT.
Analyze the Problem and
Picture It.
• The problem wants to know the SAT score
y with the area 0.10 to its right under the
normal curve with a mean of 505 and a
standard deviation of 110. Well, isn't that
the same as finding the SAT score y with
the area 0.9 to its left? Let's draw the
distribution to get a better look at it.
1. Draw a picture showing the given
proportion (area under the curve).
(Figure: Normal curve with y = 505 at the center and the unknown score y = ? marked to its right.)
2. Find Your Z-Score
1. Using Table Z - Find the entry closest to
0.90. It is 0.8997. This is the entry
corresponding to z = 1.28. So z = 1.28 is
the standardized value with area 0.90 to
its left.
2. Using TI-83/84 – DISTR/invNorm(.9). It is
1.2816.
3. Unstandardize
• Now, you will need to unstandardize to
transform the solution from the z, back to
the original y scale. We know that the
standardized value of the unknown y is z =
1.28. So y itself satisfies:
$\dfrac{y - 505}{110} = 1.28$
4. Solve for y and Summarize
• Solve the equation for y:
$y = 505 + (1.28)(110) = 645.8$
• The equation finds the y that lies 1.28 standard
deviations above the mean on this particular normal
curve. That is the "unstandardized" meaning of z = 1.28.
• Answer: A student must score at least 646 to place in the
highest 10%
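The four steps can also be run in Python as a check. This sketch uses `inv_cdf` from `statistics.NormalDist` (Python 3.8 or later) in place of Table Z or invNorm. The variable names are illustrative.

```python
from statistics import NormalDist

sat_verbal = NormalDist(mu=505, sigma=110)

# Step 2: z-score with area 0.90 to its left (top 10% to its right)
z = NormalDist().inv_cdf(0.90)

# Steps 3-4: unstandardize, y = mean + z * sd
cutoff_by_hand = 505 + z * 110

# Same answer in one call on the N(505, 110) model
cutoff_direct = sat_verbal.inv_cdf(0.90)

print(round(z, 4), round(cutoff_by_hand, 1), round(cutoff_direct, 1))
```

Using the unrounded z gives a cutoff near 646, consistent with the answer above.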
Example 2:
• A four-year college will accept any student
ranked in the top 60 percent on a national
examination. If the test score is normally
distributed with a mean of 500 and a
standard deviation of 100, what is the
cutoff score for acceptance?
Answer:
1. Draw picture of given proportion.
2. Find the z-score. The top 60 percent leaves an area of .40 to the left of the cutoff. From TI-83/84, invNorm(.4) is z = -.25.
3. Unstandardize: $-0.25 = \dfrac{y - 500}{100}$
4. Solve for y and answer the question.
y = 475, therefore the minimum score the college will
accept is 475.
Your Turn:
• Intelligence Quotients are normally
distributed with a mean of 100 and a
standard deviation of 16. Find the 90th
percentile for IQ’s.
Answer:
1. Draw picture of given proportion.
2. Find the z-score. From TI-83/84, invNorm(.9) is z = 1.28.
3. Unstandardize: $1.28 = \dfrac{y - 100}{16}$
4. Solve for y and answer the question.
y = 120.48, so the 90th percentile for IQs
is 120.48. In other words, 90% of people have IQs
below 120.48 and 10% have IQs above 120.48.
Are You Normal? How Can You
Tell?
• When you actually have your own data,
you must check to see whether a Normal
model is reasonable.
• Looking at a histogram of the data is a
good way to check that the underlying
distribution is roughly unimodal and
symmetric.
Are You Normal? How Can You
Tell? (cont.)
• A more specialized graphical display that
can help you decide whether a Normal
model is appropriate is the Normal
probability plot.
• If the distribution of the data is roughly
Normal, the Normal probability plot
approximates a diagonal straight line.
Deviations from a straight line indicate that
the distribution is not Normal.
Are You Normal? How Can You
Tell? (cont.)
• Nearly Normal data have a histogram and
a Normal probability plot that look
somewhat like this example:
Are You Normal? How Can You
Tell? (cont.)
• A skewed distribution might have a
histogram and Normal probability plot like
this:
Summary Assessing Normality
(Is The Distribution Approximately Normal)
1. Construct a Histogram or Stemplot. See if the shape of
the graph is approximately normal.
2. Construct a Normal Probability Plot (TI-83/84). A
normal Distribution will be a straight line. Conversely,
non-normal data will show a nonlinear trend.
3. Determine the proportion of observations within one,
two, and three standard deviations of the mean and
compare with the 68-95-99.7 Rule for normal
distributions.
Assess the Normality of the Following Data
• 9.7, 93.1, 33.0, 21.2, 81.4, 51.1, 43.5, 10.6,
12.8, 7.8, 18.1, 12.7
• Histogram – skewed right
• Normal Probability Plot – clearly not linear
• 68-95-99.7 Rule – mean = 32.92 & standard
deviation = 29
1. μ ± σ = 3.92 to 61.92: 10 obs./12 total obs. = 83%
2. μ ± 2σ = -25.08 to 90.92: 11/12 = 92%
3. μ ± 3σ = -54.08 to 119.92: 12/12 = 100%
Distribution doesn’t follow 68-95-99.7 Rule
• Distribution is not Normal.
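The 68-95-99.7 comparison for this data set can be reproduced with a few lines of Python. This is a sketch using the sample mean and sample standard deviation. The helper name `share_within` is illustrative.

```python
import statistics

data = [9.7, 93.1, 33.0, 21.2, 81.4, 51.1, 43.5, 10.6,
        12.8, 7.8, 18.1, 12.7]

mean = statistics.mean(data)
sd = statistics.stdev(data)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

print(f"mean = {mean:.2f}, s = {sd:.2f}")
for k, expected in [(1, 68), (2, 95), (3, 99.7)]:
    print(f"within {k} SD: {share_within(k):.0%} (Normal model: about {expected}%)")
```

With only 12 observations the percentages move in steps of about 8 points, so this check is best read alongside the histogram and the Normal probability plot.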
What Can Go Wrong?
• Don’t use a Normal model when the
distribution is not unimodal and symmetric.
What Can Go Wrong? (cont.)
• Don’t use the mean and standard
deviation when outliers are present—the
mean and standard deviation can both be
distorted by outliers.
• Don’t round off too soon.
• Don’t round your results in the middle of a
calculation.
• Don’t worry about minor differences in
results.
What have we learned?
• The story data can tell may be easier to
understand after shifting or rescaling the
data.
• Shifting data by adding or subtracting
the same amount from each value
affects measures of center and position
but not measures of spread.
• Rescaling data by multiplying or
dividing every value by a constant
changes all the summary statistics—
center, position, and spread.
What have we learned? (cont.)
• We’ve learned the power of standardizing
data.
• Standardizing uses the SD as a ruler to
measure distance from the mean (z-scores).
• With z-scores, we can compare values
from different distributions or values
based on different units.
• z-scores can identify unusual or
surprising values among data.
What have we learned? (cont.)
• We’ve learned that the 68-95-99.7 Rule
can be a useful rule of thumb for
understanding distributions:
• For data that are unimodal and
symmetric, about 68% fall within 1 SD
of the mean, 95% fall within 2 SDs of
the mean, and 99.7% fall within 3 SDs
of the mean.
What have we learned? (cont.)
• We see the importance of Thinking about
whether a method will work:
• Normality Assumption: We
sometimes work with Normal tables
(Table Z). These tables are based on
the Normal model.
• Data can’t be exactly Normal, so we
check the Nearly Normal Condition by
making a histogram (is it unimodal,
symmetric and free of outliers?) or a
normal probability plot (is it straight
enough?).
Assignment
• Exercises pg. 129 – 133: #1 – 19 odd, 23,
25, 29, 37, 39, 43, 45, 47
• Read Ch-7, pg. 146 - 163