MATH 2400

Chapter 5 Notes
Regression Line
Uses data to create a linear equation in the form y = ax + b where
“a” is the slope of the line (unit rate of change)
“b” is the y-intercept (initial value)
Can be used to generalize a set of data, to estimate a value, or to
predict a value.
Example 1 (Exercise 5.1)
We expect a car’s highway gas mileage to be related to its city gas
mileage. Data for all 1040 vehicles in the government’s 2010 Fuel
Economy Guide give the regression line
HWY MPG = 6.554 + (1.016 × CITY MPG)
for predicting highway mileage from city mileage.
a) What is the slope of this line? Say in words what the numerical
value of the slope tells you.
b) What is the intercept? Explain why the value of the intercept is not
statistically meaningful.
c) Find the predicted highway mileage for a car that gets 16 mpg in
the city. Do the same for a car with city mileage 28 mpg.
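The arithmetic for part (c) can be checked with a short Python sketch. The function name predict_hwy_mpg is only an illustrative choice; the two coefficients are taken directly from the regression line given in the exercise.

```python
def predict_hwy_mpg(city_mpg):
    """Predicted highway mileage from HWY MPG = 6.554 + 1.016 * CITY MPG."""
    intercept = 6.554  # predicted highway mpg at a city mpg of 0
    slope = 1.016      # change in predicted highway mpg per 1 mpg of city mileage
    return intercept + slope * city_mpg


for city in (16, 28):
    print(f"city {city} mpg -> predicted highway {predict_hwy_mpg(city):.2f} mpg")
```

The predictions come out to about 22.8 mpg and 35.0 mpg.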
Example 2 (Exercise 5.2…sort of)
You use the same bottle of body wash every day. The volume was
initially 355 ml. You estimate you use 7 ml of body wash each day.
What is the equation of the regression line for predicting the volume of
body wash left in the bottle after each day?
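One way to set this up, sketched in Python: the initial volume plays the role of the y-intercept and the daily use acts as a negative slope. The name volume_left is hypothetical, and the sketch assumes exactly 7 ml is used every day.

```python
def volume_left(day):
    """Estimated ml of body wash left after `day` days of use."""
    initial_volume_ml = 355  # y-intercept: volume on day 0
    daily_use_ml = 7         # the slope is -7 ml per day
    return initial_volume_ml - daily_use_ml * day


print(volume_left(0))   # volume before any use
print(volume_left(10))  # volume after ten days
```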
Least-Squares Regression Line
$\hat{y} = ax + b$
where $a = r \dfrac{s_y}{s_x}$
and $b = \bar{y} - a\bar{x}$.
$s_y$ represents the standard deviation of the response variable.
$s_x$ represents the standard deviation of the explanatory variable.
$r$ represents the correlation coefficient.
$\bar{x}$ represents the mean of the explanatory variable.
$\bar{y}$ represents the mean of the response variable.
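The slope and intercept formulas can be sketched in Python as follows. This uses the convention from the start of these notes (a is the slope, b is the y-intercept). The function names and the small data set are hypothetical, chosen so the points lie exactly on y = 2x + 1 and the result is easy to check.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r: average product of standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a*x + b, with a = r*s_y/s_x and b = y_bar - a*x_bar."""
    r = correlation(xs, ys)
    a = r * stdev(ys) / stdev(xs)
    b = mean(ys) - a * mean(xs)
    return a, b


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.3f}x + {b:.3f}")
```

Because the sample points are perfectly linear, the sketch should recover a slope close to 2 and an intercept close to 1.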
Example 3
This table displays data on 8 U.S. airports and their total number of
passengers for the years 1992 and 2005. Use the 1992 data as the
explanatory variable and the 2005 data as the response variable.
Create a least-squares regression line and use that line to estimate
how many passengers Raleigh-Durham International had in 2005 if the
airport had 4.9 million passengers in 1992.
r and $r^2$
• r tells us if there is a positive or negative relationship between the
explanatory variable and the response variable.
• r also tells us how strong the linear relationship between the
variables is.
• $r^2$ tells us what fraction of the variation in the response variable
is explained by its linear relationship with the explanatory variable.
• $1 - r^2$ tells us what fraction of the variation in the response
variable is not explained by that linear relationship.
Ex: If r = 0.6, then $r^2$ = 0.36. 36% of the variation in the response
variable is explained by the linear relationship and 64% is not.
Ex: If r = -1, then $r^2$ = 1. 100% of the variation in the response
variable is explained by the linear relationship and 0% is not.
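The two examples above are arithmetic on r, which a few lines of Python can reproduce. The function name explained_split is hypothetical.

```python
def explained_split(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    r_squared = r ** 2
    return r_squared, 1 - r_squared


for r in (0.6, -1):
    explained, unexplained = explained_split(r)
    print(f"r = {r}: explained {explained:.0%}, unexplained {unexplained:.0%}")
```

Note that the sign of r disappears when it is squared, so r = 0.6 and r = -0.6 give the same split.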
Example 4
Example 5
Residuals
A residual is the difference between an observed value of the response
variable and the value predicted by the regression line. That is, a
residual is the prediction error that remains after we have chosen the
regression line:
Residual = observed y - predicted y = $y - \hat{y}$
Residuals…continued
A residual plot makes it easier to see
unusual observations and patterns.
In a residual plot, the regression line corresponds to the horizontal
line at residual = 0 (think about it…).
Residual Graphing
Use the following data to create a least-squares regression line and
plot the residuals on the graph provided.
AGE    HEIGHT
0      20
1      31
2      36
3      39
4      43
5      46
6      48
7      51
8      54
9      56
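A minimal Python sketch of this exercise is given below. It assumes the data pair up as ages 0 through 9 with the ten height values in order, and it computes the slope as S_xy / S_xx, which is algebraically the same as r times s_y / s_x. The function name least_squares_line is an illustrative choice.

```python
from statistics import mean

ages = list(range(10))                              # explanatory variable
heights = [20, 31, 36, 39, 43, 46, 48, 51, 54, 56]  # response variable


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


slope, intercept = least_squares_line(ages, heights)

# Residual = observed y - predicted y
residuals = [y - (slope * x + intercept) for x, y in zip(ages, heights)]

print(f"height-hat = {slope:.3f} * age + {intercept:.3f}")
for age, res in zip(ages, residuals):
    print(f"age {age}: residual {res:+.3f}")
```

With these data the residuals are negative at age 0, positive for ages 1 to 6, and negative for ages 7 to 9. That curved pattern is the kind of thing a residual plot makes easy to spot.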
CAUTION!!!
• Correlation and regression lines describe only linear relationships.
• Correlation and least-squares regression lines are not resistant to
influential data (data drastically outside the norm). We should always
plot our data and look for observations that might be influential.
• Ecological Correlation is based on averages rather than on
individuals.
Ex: There is a large positive correlation between average income and
number of years of education. The correlation is smaller if we compare
the incomes of individuals with number of years of education. The
correlation based on average income ignores the large variation in the
incomes of individuals having the same amount of education.
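The caution about influential observations can be illustrated with a small Python sketch. All of the numbers here are hypothetical: eight points lying close to a line, plus one made-up observation far from the rest.

```python
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Correlation coefficient r computed from sums of squared deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data lying close to the line y = 2x
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]
r_without = correlation(xs, ys)

# Add one hypothetical influential observation far from the pattern
r_with = correlation(xs + [20], ys + [1.0])

print(f"r without the extra point: {r_without:.3f}")
print(f"r with the extra point:    {r_with:.3f}")
```

A single observation is enough to turn a correlation near 1 into a negative one here, which is why plotting the data first matters.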
CAUTION!!!
Extrapolation is the use of a regression line for prediction far outside
the range of values of the explanatory variable that you used to obtain
the line.
Ex: Using the least-squares regression line for the height of the child
from ages 0-9 to predict their height at age 30.
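The extrapolation example can be made concrete with a short Python sketch. It assumes the age and height data from the Residual Graphing exercise pair up as ages 0 through 9 with the ten height values in order; the notes do not state the units of height, so none are assumed here.

```python
from statistics import mean

ages = list(range(10))
heights = [20, 31, 36, 39, 43, 46, 48, 51, 54, 56]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


slope, intercept = least_squares_line(ages, heights)

# Age 30 is far outside the observed range of 0 to 9
predicted_at_30 = slope * 30 + intercept
print(f"largest observed height: {max(heights)}")
print(f"predicted height at age 30: {predicted_at_30:.1f}")
```

The line predicts a height of roughly 134 at age 30, more than double the largest observed value of 56, because it assumes the childhood growth rate continues unchanged.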
Always consider possible lurking variables before drawing conclusions
based on correlation or regression.
Correlation = Causation???
NO!!!
A serious study once found that people with two cars live longer than
people who own only one car. Owning three cars is even better, and so
on. There is a substantial positive correlation between number of cars
x and length of life y.
Lurking variables?
HW 5.17
HW 5.25
HW 5.27
HW 5.29
HW 5.53
