### ECONOMETRICS I

```ECONOMETRICS I
CHAPTER 1: THE NATURE OF
REGRESSION ANALYSIS
Textbook: Damodar N. Gujarati (2004) Basic Econometrics,
4th edition, The McGraw-Hill Companies
HISTORICAL ORIGIN OF THE TERM
REGRESSION
• The term regression was introduced by Francis
Galton.
• He found that, although there was a tendency for
tall parents to have tall children and for short
parents to have short children, the average height
of children born of parents of a given height
tended to move or “regress” toward the average
height in the population as a whole. This
tendency is called Galton’s law of universal
regression.
THE MODERN INTERPRETATION OF
REGRESSION
• Regression analysis is concerned with the study of the
dependence of one variable, the dependent variable,
on one or more other variables, the explanatory
variables, with a view to estimating and/or predicting
the (population) mean or average value of the former
in terms of the known or fixed (in repeated sampling)
values of the latter.
Examples of Regression Analysis
1. Reconsider Galton’s law of universal
regression.
We want to find out how the average height
of sons changes, given the father’s height.
Look at the scatter diagram or scattergram
on the next slide.
Figure 1.1 Hypothetical distribution of sons’ heights
corresponding to given heights of fathers.
2. Consider the heights of boys measured at
fixed ages.
Notice that corresponding to any given age we
have a range of heights. Therefore, knowing
the age, we may be able to predict the
average height corresponding to that age.
Figure 1.2 Hypothetical distribution of heights
corresponding to selected ages.
5. A labor economist may want to study the rate
of change of money wages in relation to the
unemployment rate.
Figure 1.3
6. From monetary economics it is known that, other things remaining
the same, the higher the rate of inflation π, the lower the
proportion k of their income that people would want to hold in the
form of money, as depicted in Figure 1.4 (next slide).
A quantitative analysis of this relationship will enable the monetary
economist to predict the amount of money, as a proportion of their
income, that people would want to hold at various rates of
inflation.
Figure 1.4 Money holding in relation to
the inflation rate π
STATISTICAL AND DETERMINISTIC
RELATIONSHIPS
• In regression analysis we are concerned with
what is known as the statistical, not functional
or deterministic, dependence among variables.
Deterministic dependence is the kind found in
classical physics.
• In statistical relationships among variables we
essentially deal with random or stochastic
variables. These variables have probability
distributions.
REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the
dependence of one variable on other
variables, it does not necessarily imply
causation.
• A statistical relationship per se cannot logically
imply causation.
REGRESSION VERSUS CORRELATION
• In correlation analysis we try to measure
the strength or degree of linear association
between two variables. The correlation
coefficient measures this strength of (linear)
association.
• In regression analysis we try to estimate the
average value of one variable on the basis of
the fixed values of other variables.
• In correlation analysis we treat any two
variables symmetrically. There is no distinction
between variables. Both variables are
considered random.
• Most regression theory is based on the
assumption that the dependent variable is
stochastic but the explanatory variables are
fixed or nonstochastic.
TERMINOLOGY
Dependent variable     Explanatory variable
Explained variable     Independent variable
Predictand             Predictor
Regressand             Regressor
Response               Stimulus
Endogenous             Exogenous
Outcome                Covariate
Controlled variable    Control variable
• In a simple (two-variable) regression analysis
we study the dependence of a variable on
only a single explanatory variable, such as that
of consumption expenditure on real income.
• In a multiple regression analysis we study the
dependence of one variable on more than one
explanatory variable, such as that of money
demand on interest rates, income, and
inflation.
• The term random is a synonym for the term
stochastic. A random (stochastic) variable is a
variable that can take on any set of values,
positive or negative, with a given probability.
NOTATION
• Y: dependent variable
• X1, X2, … , Xk: explanatory variables
• Xk: kth explanatory variable
• Xki: ith observation on variable Xk (cross-sectional data)
• Xkt: tth observation on variable Xk (time series data)
• N (or T): the total number of observations or values in
the population.
• n (or t): the total number of observations in the
sample. (time series data)
TYPES OF DATA
• There are three main types of data for
empirical analysis:
1. Time series data
2. Cross-sectional data
3. Pooled data
Time series data
• A time series is a set of observations on the
values that a variable takes at different times.
Cross-sectional data
• Cross-sectional data are data on one or more
variables collected at the same point in time.
GPA    study hours/week
3.5    10
2.7    8
1.9    9
2.3    5
2.0    8
2.2    6
2.5    3
Pooled data
• Pooled data combine elements of both time
series and cross-sectional data.
time    GPA    study hs/week
2000    2.5    9
2000    2.7    8
2000    2.3    6
2005    1.9    5
2005    3.1    12
2010    2.4    7
2010    2.0    5
2010    3.9    11
2010    1.2    2
• Panel data is a special type of pooled data in
which the same cross-sectional unit is
surveyed over time.
person    time    GPA    study hs/week
1         2010    2.5    9
1         2011    2.7    7
1         2012    2.3    6
2         2010    1.9    8
2         2011    3.1    12
2         2012    2.4    6
3         2010    2.0    5
3         2011    3.9    11
3         2012    1.2    2
Sources of Data
• Government agencies (Department of
Commerce...)
• International agencies (World Bank...)
• Surveys
In the social sciences the data that one generally
obtains are nonexperimental in nature, that is, not
subject to the control of the researcher.
The quality of the data used in economics is
often not that good, for several reasons:
1. Possibility of observational errors.
2. Approximations and roundoffs.
3. Nonresponse to surveys may cause
selectivity bias.
4. The sampling methods used in obtaining the
data may vary so widely that it is often
difficult to compare results across samples.
5. Economic data are generally available at a
highly aggregate level (GNP...). Such highly aggregated
data may not tell us much about the individual
or micro-level units.
6. Because of confidentiality, certain data can be
published only in highly aggregate form
(health data...).
The researcher should always keep in mind that
the results of research are only as good as
the quality of data.
```
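
The contrast the slides draw between correlation and regression can be made concrete with the seven-student cross-sectional table (GPA and study hours per week). The sketch below is illustrative only. It assumes ordinary least squares as the way to fit the line, a method this chapter does not introduce, and the helper names `cross_dev_sum`, `correlation`, and `ols_line` are made up for the example. It computes the correlation coefficient, which treats the two variables symmetrically, and then fits two different lines, one for GPA on hours and one for hours on GPA, to show that regression depends on which variable is taken as dependent.

```python
from statistics import mean

# Cross-sectional table from the slides: one row per student.
gpa = [3.5, 2.7, 1.9, 2.3, 2.0, 2.2, 2.5]
hours = [10, 8, 9, 5, 8, 6, 3]


def cross_dev_sum(a, b):
    """Sum of products of deviations of a and b from their sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))


def correlation(a, b):
    """Sample correlation coefficient; symmetric in its two arguments."""
    return cross_dev_sum(a, b) / (cross_dev_sum(a, a) * cross_dev_sum(b, b)) ** 0.5


def ols_line(y, x):
    """Intercept and slope of the least-squares line of y on x (assumed method)."""
    slope = cross_dev_sum(x, y) / cross_dev_sum(x, x)
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


# Correlation: the order of the variables does not matter.
r_hours_gpa = correlation(hours, gpa)
r_gpa_hours = correlation(gpa, hours)

# Regression: the choice of dependent variable changes the fitted line.
a_gpa, b_gpa = ols_line(gpa, hours)      # GPA explained by study hours
a_hours, b_hours = ols_line(hours, gpa)  # study hours explained by GPA

print(f"correlation(hours, GPA) = {r_hours_gpa:.4f}")
print(f"correlation(GPA, hours) = {r_gpa_hours:.4f}")
print(f"GPA on hours:  GPA   = {a_gpa:.4f} + {b_gpa:.4f} * hours")
print(f"hours on GPA:  hours = {a_hours:.4f} + {b_hours:.4f} * GPA")
```

The two fitted slopes are not reciprocals of each other; their product equals the squared correlation coefficient. This is one way to see that regression, unlike correlation, distinguishes the dependent variable from the explanatory one.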