### Chapter 12

Adapted by Peter Au, George Brown College
McGraw-Hill Ryerson

Part 1 Basic Multiple Regression
Part 2 Using Squared and Interaction Terms
Part 3 Dummy Variables and Advanced Statistical Inferences (Optional)

12.1 The Multiple Regression Model
12.2 Model Assumptions and the Standard Error
12.3 The Least Squares Estimates and Point Estimation and Prediction
12.5 The Overall F Test
12.6 Testing the Significance of an Independent Variable
12.7 Confidence and Prediction Intervals

12.8 The Quadratic Regression Model (Optional)
12.9 Interaction (Optional)

12.10 Using Dummy Variables to Model Qualitative Independent Variables
12.11 The Partial F Test: Testing the Significance of a Portion of a Regression Model

Part 1
• Simple linear regression uses one independent variable to
explain the dependent variable
• Some relationships are too complex to be described using a
single independent variable
• Multiple regression models use two or more independent
variables to describe the dependent variable
• This allows multiple regression models to handle more
complex situations
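• In principle, there is no fixed limit to the number of independent variables a model can use, but the number of observations n must be larger than the number of regression parameters, k + 1
• Like simple regression, multiple regression has only one dependent variable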
• The linear regression model relating y to x1, x2, …, xk is

$$y = \mu_{y|x_1,x_2,\ldots,x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where
• $\mu_{y|x_1,x_2,\ldots,x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk
• β0, β1, β2, …, βk are the regression parameters relating the mean value of y to x1, x2, …, xk
• ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk
• Consider the following data table (Table 12.1), which relates two independent variables, x1 and x2, to the dependent variable y

| Observation | x1 | x2 | y |
|---|---|---|---|
| 1 | 28.0 | 18 | 12.4 |
| 2 | 28.0 | 14 | 11.7 |
| 3 | 32.5 | 24 | 12.4 |
| 4 | 39.0 | 22 | 10.8 |
| 5 | 45.9 | 8 | 9.4 |
| 6 | 57.8 | 16 | 9.5 |
| 7 | 58.1 | 1 | 8.0 |
| 8 | 62.5 | 0 | 7.5 |

• The plot of y against x1 shows that y tends to decrease in a straight-line fashion as x1 increases
• This suggests that if we wish to predict y on the basis of x1 only, the simple linear regression model y = β0 + β1x1 + ε may adequately relate y to x1
• The plot of y against x2 shows that y tends to increase in a straight-line fashion as x2 increases
• This suggests that if we wish to predict y on the basis of x2 only, the simple linear regression model y = β0 + β1x2 + ε may adequately relate y to x2
• The experimental region is defined to be the range of the combinations of the observed values of x1 and x2
• The mean value of y when IV1 (independent variable one) is x1 and IV2 is x2 is $\mu_{y|x_1,x_2}$ (mu of y given x1 and x2)
• Consider the equation $\mu_{y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, which relates mean y values to x1 and x2
• This is a linear equation in two variables; geometrically, it is the equation of a plane in three-dimensional space
• We need to make certain assumptions about the error term ε
• At any given combination of values of x1, x2, …, xk, there is a population of error term values that could occur
• The model is

$$y = \mu_{y|x_1,x_2,\ldots,x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

• The assumptions for multiple regression are stated about the model error terms, the ε's
1. Mean of Zero Assumption: At any given combination of values of x1, x2, …, xk, the mean of the error terms is equal to 0
2. Constant Variance Assumption: The variance of the error terms, σ², is the same for every combination of values of x1, x2, …, xk
3. Normality Assumption: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk
4. Independence Assumption: The values of the error terms are statistically independent of each other
• The sum of squared errors (residuals) is

$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$

• The point estimate of the residual variance σ² is

$$s^2 = MSE = \frac{SSE}{n - (k + 1)}$$

• This formula is slightly different from simple regression, where the denominator is n − 2 (the special case k = 1)
• The point estimate of the residual standard deviation σ is

$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - (k + 1)}}$$

• MSE is the mean square error defined above
• This formula, too, is slightly different from simple regression
• n − (k + 1) is the number of degrees of freedom associated with SSE
• Using Table 12.6, the SSE for the Table 12.1 model (n = 8, k = 2) is 0.674, so

$$s^2 = MSE = \frac{SSE}{n - (k + 1)} = \frac{0.674}{8 - 3} = 0.1348$$

$$s = \sqrt{s^2} = \sqrt{0.1348} \approx 0.3671$$

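The arithmetic above can be checked with a short Python sketch. The helper name `regression_standard_error` is our own (hypothetical), not something from the text; it simply divides SSE by the n − (k + 1) error degrees of freedom and takes the square root. With SSE = 0.674 it reproduces s² = 0.1348 and gives s ≈ 0.367; the slide's 0.3671 presumably reflects the unrounded SSE of 0.6737.

```python
import math


def regression_standard_error(sse: float, n: int, k: int) -> tuple[float, float]:
    """Return (s^2, s) for a model with k independent variables and n observations."""
    df_error = n - (k + 1)  # degrees of freedom associated with SSE
    if df_error <= 0:
        raise ValueError("need more observations than regression parameters")
    mse = sse / df_error  # s^2 = MSE
    return mse, math.sqrt(mse)  # s = square root of MSE


# Table 12.1 example: n = 8 observations, k = 2 independent variables
s2, s = regression_standard_error(0.674, n=8, k=2)
print(f"s^2 = {s2:.4f}, s = {s:.3f}")
```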
• Estimation/prediction equation

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

• ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
• It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
• b0, b1, b2, …, bk are the least squares point estimates of the parameters β0, β1, β2, …, βk
• x01, x02, …, x0k are specified values of the independent predictor variables x1, x2, …, xk
• A formula exists for computing the least squares point estimates b0, b1, …, bk in multiple regression
• This formula is written using matrix algebra and is presented in Appendix F, available on Connect
• In practice, the estimates can easily be computed using Excel, MegaStat, or many other statistical software packages
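The matrix formula from Appendix F is not reproduced in this chapter. As a stand-in, here is a minimal plain-Python sketch, assuming only the standard library, that solves the normal equations (XᵀX)b = Xᵀy by Gaussian elimination for the Table 12.1 data. The function name `fit_least_squares` is hypothetical. Solving the normal equations directly is adequate for a tiny, well-behaved example like this one; statistical packages generally prefer more numerically stable factorizations.

```python
def fit_least_squares(rows, y):
    """Least squares estimates [b0, b1, ..., bk] from the normal equations (X'X) b = X'y.

    rows: one list [x1, ..., xk] per observation; y: the observed responses.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept b0
    n, p = len(X), len(X[0])  # p = k + 1 parameters
    # Augmented matrix [X'X | X'y], one row per parameter
    A = [
        [sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(p)
    ]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


# Table 12.1: (x1, x2) for each observation, and y
x = [(28.0, 18), (28.0, 14), (32.5, 24), (39.0, 22),
     (45.9, 8), (57.8, 16), (58.1, 1), (62.5, 0)]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

b0, b1, b2 = fit_least_squares(x, y)
y_hat = [b0 + b1 * x1 + b2 * x2 for x1, x2 in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}, SSE = {sse:.4f}")
```

The residual sum of squares `sse` comes out at about 0.674, agreeing with the SSE used for the standard error above.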
1. Total variation is given by the formula $\sum (y_i - \bar{y})^2$
2. Explained variation is given by the formula $\sum (\hat{y}_i - \bar{y})^2$
3. Unexplained variation is given by the formula $\sum (y_i - \hat{y}_i)^2$
4. Total variation is the sum of explained and unexplained variation
5. R² is the ratio of explained variation to total variation:

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

• The multiple coefficient of determination, R², is the proportion of the total variation in the n observed values of the dependent variable that is explained by the multiple regression model
• The multiple correlation coefficient R is the square root of R²
• With simple linear regression, r takes on the sign of b1
• A multiple regression model has several bi's, which may have different signs
• For this reason, R is always taken to be nonnegative (the positive square root)
• To interpret the direction of the relationship between the x's and y, you must look at the sign of the appropriate bi coefficient
• Adding an independent variable to a multiple regression model never lowers R², and in practice it almost always raises it
• R² will usually rise slightly even if the new variable has no relationship to y
• The adjusted R² corrects for this tendency in R²
• As a result, it gives a better estimate of the importance of the independent variables

$$\bar{R}^2 = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-(k+1)}\right)$$

• The bar notation indicates adjusted R²
• Excel multiple regression output for the Table 12.1 data gives n = 8, k = 2, explained variation = 24.87502, and total variation = 25.54875

$$R^2 = \frac{24.87502}{25.54875} = 0.97363$$

$$\bar{R}^2 = \left(0.97363 - \frac{2}{8-1}\right)\left(\frac{8-1}{8-(2+1)}\right) = 0.963081$$

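Both quantities are simple ratios, so a short Python sketch can reproduce them from the explained and total variation in the Excel output. The function names `r_squared` and `adjusted_r_squared` are hypothetical helpers, not part of Excel or MegaStat.

```python
def r_squared(explained: float, total: float) -> float:
    """Multiple coefficient of determination: explained / total variation."""
    return explained / total


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = (R^2 - k/(n-1)) * ((n-1)/(n-(k+1)))."""
    return (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))


# Values from the Excel output for Table 12.1 (n = 8, k = 2)
r2 = r_squared(24.87502, 25.54875)
r2_bar = adjusted_r_squared(r2, n=8, k=2)
print(f"R^2 = {r2:.5f}, adjusted R^2 = {r2_bar:.6f}")
```

The adjusted formula is algebraically the same as the common form 1 − (1 − R²)(n − 1)/(n − (k + 1)); either can be used.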
• Hypotheses
• H0: β1 = β2 = … = βk = 0 versus
• Ha: At least one of β1, β2, …, βk ≠ 0
• Test statistic

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]}$$

• Reject H0 in favor of Ha if:
• F(model) > Fα*, or
• p-value < α
*Fα is based on k numerator and n − (k + 1) denominator degrees of freedom
• Test statistic

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]} = \frac{24.8751/2}{0.6737/(8-3)} = 92.31$$

• F test at the α = 0.05 level of significance
• Fα is based on 2 numerator and 5 denominator degrees of freedom
• F(model) = 92.31 > 5.79 = F.05, and the p-value is reported as 0.000 (that is, less than 0.001), which is less than α = 0.05
• Reject H0 at the α = 0.05 level of significance
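The overall F statistic can be recomputed from the explained and unexplained variation. Python's standard library has no F distribution, so this sketch copies the critical value F.05 = 5.79 (2 and 5 degrees of freedom) from the slide rather than computing it; SciPy's `scipy.stats.f.ppf(0.95, 2, 5)` is one way to compute it if SciPy is available. The helper name `overall_f` is hypothetical.

```python
def overall_f(explained: float, unexplained: float, n: int, k: int) -> float:
    """F(model) = (explained / k) / (unexplained / [n - (k + 1)])."""
    return (explained / k) / (unexplained / (n - (k + 1)))


f_model = overall_f(24.8751, 0.6737, n=8, k=2)
F_05 = 5.79  # F.05 with 2 and 5 degrees of freedom, taken from an F table
reject_h0 = f_model > F_05
print(f"F(model) = {f_model:.2f}, reject H0: {reject_h0}")
```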
• If we reject H0, the F test tells us only that at least one independent variable is significantly related to y
• The natural question is: which one(s)?
• That question is addressed in the next section
• A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
• Significance test hypotheses
• H0: βj = 0 versus
• Ha: βj ≠ 0
• If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of a Type I error equal to α) if and only if the appropriate rejection point condition holds
• Or, equivalently, if the corresponding p-value is less than α

| Alternative | Reject H0 if | p-value |
|---|---|---|
| Ha: βj ≠ 0 | \|t\| > tα/2* | Twice the area under the t distribution to the right of \|t\| |
| Ha: βj > 0 | t > tα | Area under the t distribution to the right of t |
| Ha: βj < 0 | t < −tα | Area under the t distribution to the left of t |

*That is, t > tα/2 or t < −tα/2
tα/2, tα, and all p-values are based on n − (k + 1) degrees of freedom
• Test statistic

$$t = \frac{b_j}{s_{b_j}}$$

• A 100(1 − α)% confidence interval for βj is

$$\left[b_j \pm t_{\alpha/2}\, s_{b_j}\right]$$

• tα, tα/2, and p-values are based on n − (k + 1) degrees of freedom
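Here is a small Python sketch of the t statistic, the two-sided rejection rule, and the confidence interval for βj. The values of b_j and s_bj below are hypothetical, chosen only for illustration; the critical value t.025 = 2.571 is the t table value for 5 degrees of freedom (n = 8, k = 2, as in Table 12.1). The helper names are our own.

```python
def t_statistic(b_j: float, s_bj: float) -> float:
    """t = b_j / s_bj for testing H0: beta_j = 0."""
    return b_j / s_bj


def confidence_interval(b_j: float, s_bj: float, t_half_alpha: float) -> tuple[float, float]:
    """100(1 - alpha)% interval [b_j - t_{alpha/2} s_bj, b_j + t_{alpha/2} s_bj]."""
    margin = t_half_alpha * s_bj
    return b_j - margin, b_j + margin


# Hypothetical estimate and standard error (illustration only, not from the text)
b_j, s_bj = 2.0, 0.5
t = t_statistic(b_j, s_bj)
t_025 = 2.571  # t_{.025} with n - (k + 1) = 5 degrees of freedom, from a t table
lo, hi = confidence_interval(b_j, s_bj, t_025)
reject = abs(t) > t_025  # two-sided test of H0: beta_j = 0 at alpha = 0.05
print(f"t = {t:.3f}, 95% CI = [{lo:.4f}, {hi:.4f}], reject H0: {reject}")
```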
• It is customary to test the significance of every independent variable in a regression model
• If we can reject H0: βj = 0 at the 0.05 level of significance, then we have strong evidence that the independent variable xj is significantly related to y
• If we can reject H0: βj = 0 at the 0.01 level of significance, we have very strong evidence that the independent variable xj is significantly related to y
• The smaller the significance level α at which H0 can be rejected, the stronger the evidence that xj is significantly related to y
• Whether the independent variable xj is significantly related to y in a particular regression model depends on what other independent variables are included in the model
• That is, changing the set of independent variables can cause a significant variable to become insignificant or cause an insignificant variable to become significant
• This issue is addressed in a later section on multicollinearity
• A sales manager evaluates the performance of sales representatives by using a multiple regression model that predicts sales performance on the basis of five independent variables
• x1 = number of months the representative has been employed by the company (Time)
• x2 = sales of the company’s product and competing products in the sales territory (market potential, MktPoten)
• x3 = dollar advertising expenditure in the territory (Adver)
• x4 = weighted average of the company’s market share in the territory for the previous four years (MktShare)
• x5 = change in the company’s market share in the territory over the previous four years (Change)
• y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε
• Using MegaStat, a regression model was computed from the collected data
• Because the p-values associated with Time, MktPoten, Adver, and MktShare are all less than 0.01, we have very strong evidence that these variables are significantly related to y and, thus, are important in this model
• The p-value associated with Change is 0.0530, suggesting weaker evidence that this variable is important
• The point on the regression surface corresponding to particular values x01, x02, …, x0k of the independent variables is

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

• It is unlikely that this value will exactly equal the mean value of y for these x values
• Therefore, we need to place bounds on how far the predicted value might be from the actual value
• We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y
• Both the confidence interval for the mean value of y
and the prediction interval for an individual value of y
employ a quantity called the distance value
• With simple regression, we were able to calculate the
distance value fairly easily
• However, for multiple regression, calculating the
distance value requires matrix algebra
• See Appendix F on Connect for more details
• Assume that the regression assumptions hold
• The formula for a 100(1 − α)% confidence interval for the mean value of y is

$$\left[\hat{y} \pm t_{\alpha/2}\, s_{\hat{y}}\right], \qquad s_{\hat{y}} = s\sqrt{\text{Distance value}}$$

• This is based on n − (k + 1) degrees of freedom
• Assume that the regression assumptions hold
• The formula for a 100(1 − α)% prediction interval for an individual value of y is

$$\left[\hat{y} \pm t_{\alpha/2}\, s_{(y-\hat{y})}\right], \qquad s_{(y-\hat{y})} = s\sqrt{1 + \text{Distance value}}$$

• This is based on n − (k + 1) degrees of freedom
Data (Sales Territory Performance Case)

| Sales | Time | MktPoten | Adver | MktShare | Change |
|---|---|---|---|---|---|
| 3669.88 | 43.10 | 74065.11 | 4582.88 | 2.51 | 0.34 |
| 3473.95 | 108.13 | 58117.30 | 5539.78 | 5.51 | 0.15 |
| 2295.10 | 13.82 | 21118.49 | 2950.38 | 10.91 | -0.72 |
| 4675.56 | 186.18 | 68521.27 | 2243.07 | 8.27 | 0.17 |
| 6125.96 | 161.79 | 57805.11 | 7747.08 | 9.15 | 0.50 |
| 2134.94 | 8.94 | 37806.94 | 402.44 | 5.51 | 0.15 |
| 5031.66 | 365.04 | 50935.26 | 3140.62 | 8.54 | 0.55 |
| 3367.45 | 220.32 | 35602.08 | 2086.16 | 7.07 | -0.49 |
| 6519.45 | 127.64 | 46176.77 | 8846.25 | 12.54 | 1.24 |
| 4876.37 | 105.69 | 42053.24 | 5673.11 | 8.85 | 0.31 |
| 2468.27 | 57.72 | 36829.71 | 2761.76 | 5.38 | 0.37 |
| 2533.31 | 23.58 | 33612.67 | 1991.85 | 5.43 | -0.65 |
| 2408.11 | 13.82 | 21412.79 | 1971.52 | 8.48 | 0.64 |
| 2337.38 | 13.82 | 20416.87 | 1737.38 | 7.80 | 1.01 |
| 4586.95 | 86.99 | 36272.00 | 10694.20 | 10.34 | 0.11 |
| 2729.24 | 165.85 | 23093.26 | 8618.61 | 5.15 | 0.04 |
| 3289.40 | 116.26 | 26879.59 | 7747.89 | 6.64 | 0.68 |
| 2800.78 | 42.28 | 39571.96 | 4565.81 | 5.45 | 0.66 |
| 3264.20 | 52.84 | 51866.15 | 6022.70 | 6.31 | -0.10 |
| 3453.62 | 165.04 | 58749.82 | 3721.10 | 6.35 | -0.03 |
| 1741.45 | 10.57 | 23990.82 | 860.97 | 7.37 | -1.63 |
| 2035.75 | 13.82 | 25694.86 | 3571.51 | 8.39 | -0.43 |
| 1578.00 | 8.13 | 23736.35 | 2845.50 | 5.15 | 0.04 |
| 4167.44 | 58.54 | 34314.29 | 5060.11 | 12.88 | 0.22 |
| 2799.97 | 21.14 | 22809.53 | 3552.00 | 9.14 | -0.74 |

• Using the Sales Territory Performance Case
• The point prediction of sales corresponding to:
• Time = 85.42
• MktPoten = 35,182.73
• Adver = 7,281.65
• MktShare = 9.64
• Change = 0.28
• Using the regression model from before:

$$\hat{y} = -1{,}113.7879 + 3.6121(85.42) + 0.0421(35{,}182.73) + 0.1289(7{,}281.65) + 256.9555(9.64) + 324.5334(0.28) = 4{,}181.74$$

(that is, 418,174 units)
• This point prediction is given at the bottom of the MegaStat output in Figure 12.7, which we repeat here:
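The point prediction is just a weighted sum, as this Python sketch shows. The coefficients are the rounded values displayed on the slide, listed in the order b0, Time, MktPoten, Adver, MktShare, Change; the function name `point_prediction` is hypothetical. With these rounded coefficients the sum comes to about 4,182.48 rather than MegaStat's 4,181.74, presumably because MegaStat works with unrounded coefficients.

```python
# Rounded coefficients from the slide: b0, then Time, MktPoten, Adver, MktShare, Change
coefficients = [-1113.7879, 3.6121, 0.0421, 0.1289, 256.9555, 324.5334]
# Specified values x01, ..., x05 in the same variable order
x0 = [85.42, 35182.73, 7281.65, 9.64, 0.28]


def point_prediction(b, x):
    """y_hat = b0 + b1*x01 + b2*x02 + ... + bk*x0k."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


y_hat = point_prediction(coefficients, x0)
print(f"y_hat = {y_hat:.2f}")
```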
• 95% confidence interval

$$\left[\hat{y} \pm t_{\alpha/2}\, s\sqrt{\text{Distance value}}\right] = \left[4181.74 \pm (2.093)(430.232)\sqrt{0.109}\right] = [4181.74 \pm 296.829] = [3884.91,\ 4478.58]$$

• 95% prediction interval

$$\left[\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \text{Distance value}}\right] = \left[4181.74 \pm (2.093)(430.232)\sqrt{1 + 0.109}\right] = [4181.74 \pm 948.137] = [3233.60,\ 5129.88]$$

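Both intervals share the same structure and differ only in the "1 +" under the square root, which this Python sketch makes explicit. The helper name `mean_ci_and_pi` is hypothetical. Because the distance value is rounded to 0.109 on the slide, the half-widths come out at about 297.3 and 948.3 instead of 296.829 and 948.137; the slide presumably used the unrounded distance value.

```python
import math


def mean_ci_and_pi(y_hat, t_half_alpha, s, distance_value):
    """Return (confidence interval for mean y, prediction interval for an individual y)."""
    ci_margin = t_half_alpha * s * math.sqrt(distance_value)  # uses s * sqrt(DV)
    pi_margin = t_half_alpha * s * math.sqrt(1 + distance_value)  # uses s * sqrt(1 + DV)
    return ((y_hat - ci_margin, y_hat + ci_margin),
            (y_hat - pi_margin, y_hat + pi_margin))


# Sales territory case: t_{.025} with 25 - (5 + 1) = 19 df is 2.093
ci, pi = mean_ci_and_pi(4181.74, t_half_alpha=2.093, s=430.232, distance_value=0.109)
print("95% CI:", [round(v, 2) for v in ci])
print("95% PI:", [round(v, 2) for v in pi])
```

The prediction interval is always wider, since it must also cover the variability of a single value of y around its mean.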
Part 2
• One useful form of linear regression is the quadratic regression model
• Assume that we have n observations of x and y
• The quadratic regression model relating y to x is y = β0 + β1x + β2x² + ε, where
• β0 + β1x + β2x² is the mean value of the dependent variable y when the value of the independent variable is x
• β0, β1, and β2 are unknown regression parameters relating the mean value of y to x
• ε is an error term that describes the effects on y of all factors other than x and x²
• Even though the quadratic model employs the squared term x² and, as a result, assumes a curved relationship between the mean value of y and x, this model is a linear regression model
• This is because β0 + β1x + β2x² expresses the mean value of y as a linear function of the parameters β0, β1, and β2
• As long as the mean value of y is a linear function of the regression parameters, we have a linear regression model
• The human resources department administers a stress questionnaire to 15 employees, who rate their stress level on a scale from 0 (no stress) to 4 (high stress)
• Work performance was measured as the number of projects completed by the employee per year, averaged over the last five years
$$\hat{y} = 25.7152 + 4.9762x - 1.01905x^2$$

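A fitted quadratic with a negative x² coefficient opens downward, so it has a single peak at x = −b1/(2b2). This Python sketch, assuming the fitted equation reads ŷ = 25.7152 + 4.9762x − 1.01905x², evaluates the curve and locates that peak; according to this fitted equation, predicted performance is highest at a moderate stress level of about 2.44 on the 0 to 4 scale. The function name `predicted_performance` is hypothetical.

```python
# Least squares estimates from the fitted quadratic: b0, b1, b2
b0, b1, b2 = 25.7152, 4.9762, -1.01905


def predicted_performance(x: float) -> float:
    """y_hat = b0 + b1*x + b2*x**2 for a stress rating x."""
    return b0 + b1 * x + b2 * x ** 2


# Setting the derivative b1 + 2*b2*x to zero gives the vertex of the parabola
x_peak = -b1 / (2 * b2)
print(f"peak at x = {x_peak:.2f}, y_hat = {predicted_performance(x_peak):.2f}")
```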
• We have only looked at the simple case with a single independent variable x
• That gave us the following quadratic regression model: y = β0 + β1x + β2x² + ε
• However, we are not limited to a single independent variable
• The following would also be a valid quadratic regression model:
y = β0 + β1x1 + β2x1² + β3x2 + β4x3 + ε
• Multiple regression models often contain interaction variables
• These are variables that are formed by multiplying two independent variables together
• For example, x1·x2
• In this case, the x1·x2 variable would appear in the model along with both x1 and x2
• We use interaction variables when the relationship between the mean value of y and one of the independent variables depends on the value of another independent variable
• Consider a company that runs both radio and television ads
• It is reasonable to assume that raising either ad amount would raise sales
• However, it is also reasonable to assume that the effectiveness of television ads depends, in part, on how much is spent on radio ads
• Thus, an interaction variable would be appropriate
• These last two figures imply that the more is spent on one type of advertising, the smaller the slope for the other type of advertising
• That is, the slope of one line depends on the value of the other variable
• This means that there is interaction between x1 and x2
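The idea that "the slope of one line depends on the value of the other variable" follows directly from the model y = β0 + β1x1 + β2x2 + β3x1x2 + ε: holding x2 fixed, the slope in x1 is β1 + β3x2. The Python sketch below uses hypothetical coefficients (illustration only, not estimates from the text) with a negative β3, which reproduces the pattern in the figures: the more is spent on x2, the smaller the slope for x1.

```python
# Hypothetical coefficients for y = b0 + b1*x1 + b2*x2 + b3*x1*x2 (illustration only)
b0, b1, b2, b3 = 10.0, 2.0, 3.0, -0.5


def mean_y(x1: float, x2: float) -> float:
    """Mean value of y under the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2: float) -> float:
    """Change in mean y per unit increase in x1, holding x2 fixed: b1 + b3*x2."""
    return b1 + b3 * x2


for x2 in (1.0, 2.0, 3.0):
    print(f"x2 = {x2}: slope of the line in x1 = {slope_in_x1(x2)}")
```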
• Froid Frozen Foods Experiment
• It is fairly easy to construct data plots to check for interaction when a carefully designed experiment is carried out
• It is often not possible to construct the necessary plots with less structured data
• If an interaction is suspected, we can include the interaction term in the model and see whether it is significant
• When an interaction term (say x1x2) is important to a model, it is the usual practice to leave the corresponding linear terms (x1 and x2) in the model no matter what their p-values are
Part 3
• So far, we have only looked at including quantitative data in a regression model
• However, we may wish to include descriptive qualitative data as well
• For example, we might want to include the sex of respondents
• We can model the effects of different levels of a qualitative variable by using what are called dummy variables
• Also known as indicator variables
• A dummy variable always has a value of either 0 or 1
• For example, to model sales at two locations, we would code the first location as 0 and the second as 1
• Operationally, it does not matter which is coded 0 and which is coded 1
• Suppose that Electronics World, a chain of stores
that sells audio and video equipment, has
gathered the data in Table 12.13
• These data concern store sales volume in July of
last year (y, measured in thousands of dollars), the
number of households in the store’s area (x,
measured in thousands), and the location of the
store
• Location dummy variable

$$D_M = \begin{cases} 1 & \text{if a store is in a mall location} \\ 0 & \text{otherwise} \end{cases}$$

• Consider having three categories, say A, B, and C
• We cannot code this using one dummy variable
• Coding A = 0, B = 1, and C = 2 would be invalid
• It assumes that the difference between A and B is the same as the difference between B and C
• We must use multiple dummy variables
• Specifically, a qualitative variable with a categories requires a − 1 dummy variables
• For A, B, and C, we would need two dummy variables
• x1 is 1 for A, zero otherwise
• x2 is 1 for B, zero otherwise
• If x1 and x2 are both zero, the observation must be in category C
• This is why a third dummy variable is not needed
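The a − 1 coding rule can be sketched in a few lines of Python. The function name `dummy_code` is hypothetical; it treats the last listed category as the baseline, which gets all zeros, matching the A, B, C scheme above where C is identified by x1 = x2 = 0.

```python
def dummy_code(category: str, levels: list[str]) -> list[int]:
    """Code a qualitative value with len(levels) - 1 dummy variables.

    The last level in `levels` is the baseline and is coded as all zeros.
    """
    if category not in levels:
        raise ValueError(f"unknown category: {category}")
    return [1 if category == level else 0 for level in levels[:-1]]


levels = ["A", "B", "C"]  # C is the baseline category
for cat in levels:
    print(cat, dummy_code(cat, levels))  # [x1, x2]
```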
• Geometrical interpretation of the sales volume model y = β0 + β1x + β2DM + β3xDM + ε
• So far, we have only considered dummy variables as standalone variables
• The model so far is y = β0 + β1x + β2D + ε, where D is a dummy variable
• However, we can also look at the interaction between a dummy variable and other variables
• That model would take the form y = β0 + β1x + β2D + β3xD + ε
• With an interaction term, both the intercept and the slope are shifted: when D = 1, the intercept becomes β0 + β2 and the slope becomes β1 + β3
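The intercept and slope shifts can be seen by evaluating the model for D = 0 and D = 1. The coefficients in this Python sketch are hypothetical (illustration only), and `line_for_group` is a made-up helper name.

```python
# Hypothetical coefficients for y = b0 + b1*x + b2*D + b3*x*D (illustration only)
b0, b1, b2, b3 = 5.0, 1.25, 3.0, 0.5


def line_for_group(d: int) -> tuple[float, float]:
    """Return (intercept, slope) of the line relating mean y to x when D = d."""
    return b0 + b2 * d, b1 + b3 * d


print(line_for_group(0))  # (5.0, 1.25)
print(line_for_group(1))  # (8.0, 1.75)
```

Without the xD term (b3 = 0), only the intercept would differ between the two groups, giving parallel lines.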
• So far, we have seen dummy variables used to code categorical variables
• Dummy variables can also be used to flag unusual events that have an important impact on the dependent variable
• These unusual events can be one-time events
• Impact of a strike on sales
• Impact of a major sporting event coming to town
• Or they can be recurring events
• Hot temperatures on soft drink sales
• Cold temperatures on coat sales
• So far, we have looked at testing single slope coefficients using the t test
• We have also looked at testing all the coefficients at once using the overall F test
• The partial F test allows us to test the significance of any set of independent variables in a regression model
• We can use this F test to test the significance of a portion of a regression model
• The model: y = β0 + β1x + β2DM + β3DD + ε
• DM and DD are dummy variables
• This is called the complete model
• We now look at just the reduced model: y = β0 + β1x + ε
• Hypotheses to test
• H0: β2 = β3 = 0 versus
Ha: At least one of β2 and β3 does not equal zero
• The SSE for the complete model is $SSE_C$ = 443.4650
• The SSE for the reduced model is $SSE_R$ = 2,467.8067
$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(2{,}467.8067 - 443.4650)/2}{443.4650/(15 - 4)} = 25.1066$$

• We compare F with F.01 = 7.21
• Based on k − g = 2 numerator degrees of freedom
• And n − (k + 1) = 11 denominator degrees of freedom
• Note that k − g denotes the number of regression parameters set to 0 under H0
• Since F = 25.1066 > 7.21, we reject the null hypothesis at α = 0.01
• We conclude that at least two locations appear to have different effects on mean sales volume
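The partial F statistic needs only the two SSE values and the degrees of freedom. In this Python sketch, `partial_f` is a hypothetical helper; g = 1 is the number of independent variables in the reduced model (x only), so k − g = 2 parameters are set to 0. The critical value F.01 = 7.21 is copied from the slide because the standard library has no F distribution.

```python
def partial_f(sse_reduced: float, sse_complete: float, n: int, k: int, g: int) -> float:
    """Partial F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


f_stat = partial_f(2467.8067, 443.4650, n=15, k=3, g=1)
F_01 = 7.21  # F.01 with 2 and 11 degrees of freedom, from an F table
print(f"F = {f_stat:.4f}, reject H0 at alpha = 0.01: {f_stat > F_01}")
```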
• The multiple regression model employs at least two independent variables to describe the dependent variable
• Some ways to judge a model's overall utility are the standard error, the multiple coefficient of determination, the adjusted multiple coefficient of determination, and the overall F test
• Squared terms can be used to model quadratic relationships, while cross-product terms can be used to model interaction relationships
• Dummy variables can be used to model qualitative independent variables
• The partial F test can be used to evaluate a portion of the regression model