A Broad Overview of Key Statistical Concepts

Introduction to the
multiple linear regression model
Regression models with
more than one predictor (or term)
Example 1
Is brain and body size
predictive of intelligence?
• Sample of n = 38 college students
• Response (Y): intelligence based on PIQ
(performance) scores from the (revised)
Wechsler Adult Intelligence Scale.
• Potential predictor (x1): Brain size based on
MRI scans (given as count/10,000).
• Potential predictor (x2): Height in inches.
• Potential predictor (x3): Weight in pounds.
Example 1
Scatter plot matrix
[Figure: scatter plot matrix of PIQ, MRI, Height, and Weight]
Scatter plot matrix
• Tells us about 2D marginal relationships
between each pair of variables without
regard to other variables.
• The challenge is how the 2D relationships
relate to how the response y depends on all
3 predictors simultaneously.
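A small illustration of this challenge, as a sketch only: the data below are synthetic and noise-free, invented for the example (they are not the PIQ sample), and the code assumes NumPy is available. A predictor can show a positive slope in its 2D marginal plot while its coefficient in the joint model is negative.

```python
import numpy as np

# Synthetic, noise-free data invented for illustration (not the PIQ sample).
x1 = np.arange(10.0)
x2 = x1 + np.tile([0.0, 1.0], 5)   # x2 is strongly correlated with x1
y = 3.0 * x1 - 2.0 * x2            # joint model: the coefficient of x2 is -2

# Marginal (2D) relationship: regress y on x2 alone.
marginal_slope = np.polyfit(x2, y, 1)[0]

# Joint relationship: regress y on x1 and x2 together (with an intercept).
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"marginal slope of y on x2: {marginal_slope:.3f}")
print(f"joint coefficient of x2:   {coef[2]:.3f}")
```

The marginal slope is positive because x2 rises with x1, and x1 has a large positive effect on y; the joint fit separates the two.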
Example 1
Marginal response plots
[Figure: marginal response plots of PIQ against MRI, Height, and Weight]
Marginal response plots
• Scatter plot of response y vs. each predictor.
• Suggest how response y depends on each
predictor without regard to other predictors.
• Provide a visual lower bound for the
goodness-of-fit that can be achieved by the
full regression model.
Example 1
A potential
multiple linear regression model
$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$
where …
• $Y_i$ is intelligence (PIQ) of student i
• $x_{i1}$ is brain size (MRI) of student i
• $x_{i2}$ is height (Height) of student i
• $x_{i3}$ is weight (Weight) of student i
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
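A minimal sketch of how a model of this form could be fit by least squares, assuming NumPy is available. The data are simulated, and the coefficient values, the error standard deviation, and all variable names are hypothetical; only the sample size n = 38 is borrowed from Example 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 38                                        # same sample size as Example 1
true_beta = np.array([5.0, 1.5, -2.0, 0.5])   # hypothetical beta_0 .. beta_3
sigma = 0.1                                   # hypothetical error std. dev.

# Simulated stand-ins for the three predictors x_i1, x_i2, x_i3.
predictors = rng.normal(size=(n, 3))

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), predictors])

# Response generated from the model with independent normal errors.
y = X @ true_beta + rng.normal(scale=sigma, size=n)

# Least-squares estimates of beta_0 .. beta_3.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimate of the error variance: SSE / (n - number of coefficients).
residuals = y - X @ beta_hat
s2 = residuals @ residuals / (n - X.shape[1])

print("estimated coefficients:", np.round(beta_hat, 3))
print("estimated error variance:", round(float(s2), 4))
```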
Example 1
Potential research questions
• Which predictors explain some of the
variation in PIQ?
• What is the effect of brain size on PIQ?
• What is the PIQ of an individual with a
given brain size, height, and weight?
Predictors
• As before, the x variables. Also called
explanatory variables or independent
variables.
• Most often numerical measurements, such
as age, weight, length, and temperature.
• But they can also be categorical, such as
gender, race, and species.
Terms
Terms are functions of the predictor variables,
such as:
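$u_1 = x_1 x_2$
$u_2 = x_1^2$
$u_3 = \log_e x_2$
$u_4 = x_1$
Linear regression model as a function of terms:
$Y_i = \beta_0 + \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3 + \beta_4 u_4 + \epsilon_i$
$Y_i = \beta_0 + \beta_1 x_1 x_2 + \beta_2 x_1^2 + \beta_3 \log_e x_2 + \beta_4 x_1 + \epsilon_i$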
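A short sketch of how terms are computed from predictors, reading the four example terms as u1 = x1·x2, u2 = x1², u3 = ln(x2), and u4 = x1. The function names and the coefficient values are hypothetical. The point it illustrates is that the mean response is linear in the coefficients even though it is not linear in x1 and x2.

```python
import math

def terms(x1: float, x2: float) -> tuple[float, float, float, float]:
    """Return (u1, u2, u3, u4) for one observation; requires x2 > 0."""
    return (x1 * x2, x1 ** 2, math.log(x2), x1)

def mean_response(beta, x1, x2):
    """beta_0 + sum of beta_j * u_j: linear in the betas, not in x1 and x2."""
    b0, *slopes = beta
    return b0 + sum(b * u for b, u in zip(slopes, terms(x1, x2)))

beta = (1.0, 0.5, -0.2, 2.0, 0.1)   # hypothetical beta_0 .. beta_4
print(terms(2.0, 3.0))
print(mean_response(beta, 2.0, 3.0))
```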
Types of terms
• The predictors themselves.
• Powers of predictors.
• Transformations of predictors.
• Interactions.
• Binary (or categorical) predictors.
Simple linear regression model
with a transformed predictor
$Y_i = \beta_0 + \beta_1 \log_{10} x_i + \epsilon_i$
where …
• $Y_i$ is proportion of items correctly recalled for person i
• $x_i$ is time since person i initially memorized the list
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
Visualizing simple linear regression
model with a transformed predictor
[Figure: regression plot of prop against log10time]
prop = 0.846415 - 0.182427 log10time
S = 0.0233881
R-Sq = 99.0 %
R-Sq(adj) = 98.9 %
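As a sketch of how the fitted equation prop = 0.846415 - 0.182427 log10time can be used, the function below evaluates it at a given time. The function name is hypothetical, and time is taken to be in whatever units the original data used, which the slide does not state.

```python
import math

def predicted_prop(time: float) -> float:
    """Predicted proportion recalled from the fitted equation; needs time > 0."""
    return 0.846415 - 0.182427 * math.log10(time)

# Each tenfold increase in time lowers the prediction by the slope, 0.182427.
for t in (1, 10, 100, 1000):
    print(t, round(predicted_prop(t), 6))
```

Because the predictor enters through log10, equal ratios of time (not equal differences) produce equal changes in the predicted proportion.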
A first order model
with two predictors
$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$
where …
• $Y_i$ is life of power cell i (number of cycles)
• $x_{i1}$ is charge rate of power cell i (amperes)
• $x_{i2}$ is ambient temperature of power cell i (degrees Celsius)
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
Visualizing a first order model
with two predictors
A first order model
with more than 2 predictors
$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$
where …
• $Y_i$ is intelligence (PIQ) of student i
• $x_{i1}$ is brain size (MRI) of student i
• $x_{i2}$ is height (Height) of student i
• $x_{i3}$ is weight (Weight) of student i
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
Visualizing a first order model
with more than 2 predictors
A second order polynomial model
with one predictor
$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \epsilon_i$
where …
• $Y_i$ is length of bluegill (fish) i (in mm)
• $x_i$ is age of bluegill (fish) i (in years)
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
Visualizing a second order
polynomial model with one predictor
[Figure: regression plot of length against age with fitted quadratic curve]
length = 13.6224 + 54.0493 age - 4.71866 age**2
S = 10.9061
R-Sq = 80.1 %
R-Sq(adj) = 79.6 %
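A brief sketch that evaluates the fitted quadratic length = 13.6224 + 54.0493 age - 4.71866 age**2 at a few ages (the function name is hypothetical). It shows the effect of the negative squared term: the predicted gain in length per additional year shrinks as age increases.

```python
def predicted_length(age: float) -> float:
    """Predicted bluegill length (mm) at a given age (years), fitted equation."""
    return 13.6224 + 54.0493 * age - 4.71866 * age ** 2

# Year-over-year gains in predicted length decrease with age.
for age in range(1, 6):
    gain = predicted_length(age + 1) - predicted_length(age)
    print(age, round(predicted_length(age), 2), round(gain, 2))
```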
A second order polynomial model
with 2 predictors


$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \epsilon_i$
where …
• $Y_i$ is grade point average of student i
• $x_{i1}$ is verbal test score of student i
• $x_{i2}$ is math test score of student i
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
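One consequence of the squared and cross-product terms can be written out as a short derivation. This is standard calculus applied to the mean function of the second order model with 2 predictors, not something stated on the slide: the change in mean grade point average per unit change in verbal score is not constant, and it depends on the math score through $\beta_{12}$.

```latex
\begin{aligned}
E(Y_i) &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}
          + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} \\
\frac{\partial E(Y_i)}{\partial x_{i1}} &= \beta_1 + 2\beta_{11} x_{i1} + \beta_{12} x_{i2}
\end{aligned}
```

If $\beta_{11} = \beta_{22} = \beta_{12} = 0$, the derivative reduces to $\beta_1$ and the model is the first order model with two predictors.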
Visualizing second order
polynomial model with 2 predictors
A first order model
with one binary predictor
$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$
where …
• $Y_i$ is birth weight of baby i
• $x_{i1}$ is length of gestation of baby i
• $x_{i2} = 1$ if mother smokes and $x_{i2} = 0$ if not
and … the independent error terms $\epsilon_i$ follow a normal
distribution with mean 0 and equal variance $\sigma^2$.
Visualizing a first order model
with one binary predictor
The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking
[Figure: Weight (grams) against Gestation (weeks), grouped by Smoking (0, 1)]
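A minimal sketch of how the fitted equation Weight = -2390 + 143 Gest - 245 Smoking reads for the two groups (the function name is hypothetical). Setting Smoking to 0 or 1 gives two parallel lines with the same slope, 143 grams per week, and intercepts that differ by 245 grams.

```python
def predicted_weight(gest_weeks: float, smokes: bool) -> float:
    """Predicted birth weight (grams) from the fitted equation."""
    smoking = 1 if smokes else 0          # binary predictor coded 0/1
    return -2390 + 143 * gest_weeks - 245 * smoking

# At every gestation length the two predictions differ by the same 245 grams.
for weeks in (36, 38, 40):
    print(weeks, predicted_weight(weeks, False), predicted_weight(weeks, True))
```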
