Powerpoint 3.2 C

AP Statistics Section 3.2 C
Coefficient of Determination
A residual plot is a graphical tool for
evaluating how well a linear model fits
the data. The numerical quantity that
tells us how well the least-squares line
(LSL) does at predicting values of the
response variable y is called the
coefficient of determination.
The symbol is $r^2$. Some computer
packages call it “_____”.
We have seen instances where the
least-squares regression line does not
fit the data, and therefore, does not
help predict the values of the
response variable, y, as x changes. In
such cases, our “best guess” for the
value of y at any given value of x is
simply $\bar{y}$,
the mean of the y values.
The idea of $r^2$ is this: How much
better is the LSL at prediction than
if we just used $\bar{y}$ as our prediction
each time?
Once again we consider the NEA vs Fat Gain
example from Section 3.2 A. The LSL and the
horizontal line at $\bar{y}$ have been drawn in the
residual plot to the right. Which line
comes closer to the actual y-values?
We know that the LSL minimizes the sum
of the squared residuals.
For this data:
$$\sum \text{residual}^2 = \sum (y - \hat{y})^2 = 7.663$$
We will call this SSE,
for sum of squared errors.
If we use $\hat{y} = \bar{y}$ to make predictions, then our
prediction errors would be the vertical distances
of the points away from the horizontal line.
For this data:
$$\sum (y - \bar{y})^2 = 19.4575$$
We will call this SST,
for sum of squared total variation.
The difference SST - SSE (in this case
19.4575 - 7.663 = 11.7945) shows how much the
LSL reduces the total variation in
the responses y.
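The two sums above can be computed directly from data. The following is a minimal sketch, not the slides' own method: the five (x, y) pairs are toy values invented for illustration (they are NOT the NEA vs Fat Gain measurements), and the variable names are assumptions.

```python
# Toy data invented for illustration; NOT the NEA vs Fat Gain measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a for the line y_hat = a + b*x.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# SSE: squared vertical distances from the points to the LSL.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# SST: squared vertical distances from the points to the horizontal line at y_bar.
sst = sum((yi - y_bar) ** 2 for yi in y)

print(f"SSE = {sse:.4f}, SST = {sst:.4f}, SST - SSE = {sst - sse:.4f}")
```

For any data set SSE cannot exceed SST, because the horizontal line at the mean is itself one of the candidate lines that the LSL is chosen to beat.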
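We define the coefficient of
determination, $r^2$, as the fraction of
the variation in the values of y that is
explained by the least-squares
regression line. We can calculate $r^2$ as
$$r^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}}$$
For the NEA vs Fat Gain data:
$$r^2 = \frac{19.4575 - 7.663}{19.4575} \approx 0.606$$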
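As a quick arithmetic check, the fraction can be recomputed from the two sums quoted in this section (SSE = 7.663 and SST = 19.4575). This is only a sketch; the variable names are illustrative.

```python
# Values quoted in this section for the NEA vs Fat Gain data.
sse = 7.663     # sum of squared residuals about the LSL
sst = 19.4575   # sum of squared deviations about the mean of y

explained = sst - sse          # variation removed by using the LSL
r_squared = explained / sst    # fraction of variation explained
unexplained = sse / sst        # fraction left unexplained

print(round(explained, 4))     # 11.7945
print(round(r_squared, 3))     # 0.606
print(round(unexplained, 3))   # 0.394
```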
We have already seen how to
calculate $r^2$ on our calculators (i.e.
the same way we found r). Find $r^2$
on your calculator for the NEA vs
Fat Gain data.
$r^2 = 0.606$
Many factors, such as metabolism,
affect the variation in the y-values. We can say
60.6% of the variation in fat gain is explained
by the least-squares regression line relating fat
gain and non-exercise activity. The other 39.4% is
individual variation among the subjects that is
not explained by the linear relationship.
Facts about Least-Squares Regression
The distinction between
explanatory and response variables
is essential in regression. This
means we cannot reverse the roles
of the two variables to make
predictions. Be sure you know
which variable is the explanatory
variable.
There is a close connection
between correlation and the slope
of the least-squares line. We know
$$b = r \frac{s_y}{s_x}$$
Along the regression line, a change
of one standard deviation in x
corresponds to a change of r
standard deviations in y.
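The link between the slope and the correlation can be checked numerically. The sketch below uses toy data invented for illustration (not the NEA vs Fat Gain data) and compares the slope obtained from r, s_x and s_y with the slope computed directly from the least-squares formula; all names are assumptions.

```python
from statistics import mean, stdev

# Toy data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)   # sample standard deviations (divide by n - 1)

# Correlation r: sum of products of standardized values, divided by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

# Slope from the correlation and the standard deviations.
b_from_r = r * s_y / s_x

# Slope computed directly from the least-squares formula.
b_direct = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Moving one standard deviation in x changes the prediction by r standard deviations in y.
change_in_y_hat = b_direct * s_x

print(f"{b_from_r:.4f} {b_direct:.4f}")
print(f"{change_in_y_hat:.4f} {r * s_y:.4f}")
```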
The least-squares regression line of
y on x always passes through the
point $(\bar{x}, \bar{y})$.
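One way to see this fact without assuming it is to fit the line by solving the two normal equations of least squares and then evaluate the fitted line at the mean of x. This is a sketch under stated assumptions: the data are toy values invented for illustration, and the normal-equations approach is swapped in here for demonstration rather than taken from the slides.

```python
# Toy data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations for y_hat = a + b*x:
#   n*a     + sum_x*b  = sum_y
#   sum_x*a + sum_xx*b = sum_xy
# Solved here with Cramer's rule.
det = n * sum_xx - sum_x * sum_x
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

x_bar = sum_x / n
y_bar = sum_y / n

# Prediction at x_bar; it should match y_bar.
y_hat_at_x_bar = a + b * x_bar
print(f"{y_hat_at_x_bar:.4f} {y_bar:.4f}")
```

Dividing the first normal equation by n gives a + b*x_bar = y_bar, which is exactly the statement that the line passes through the point of means.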
The correlation r describes the
strength of a straight-line relationship.
In the regression setting, the square of
the correlation, $r^2$, is the fraction of
the variation in the values of y that is
explained by the least-squares
regression of y on x.
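The agreement between the squared correlation and the explained fraction of variation can be verified on a small example. The sketch below uses toy data invented for illustration (not the NEA vs Fat Gain data); the names are assumptions.

```python
from math import sqrt

# Toy data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)   # this is also SST

# Correlation between x and y.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares line and its sum of squared residuals.
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

fraction_explained = (s_yy - sse) / s_yy

print(f"r^2 = {r ** 2:.4f}, (SST - SSE)/SST = {fraction_explained:.4f}")
```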
