### Chapter 2 in Undergraduate Econometrics

```
Chapter 2: Probability
A random variable (r.v.) is a variable whose
value is unknown until it is observed. The value
of a random variable results from an experiment.
Experiments can be either controlled (laboratory)
or uncontrolled (observational). Most economic
variables are random and are the result of
uncontrolled experiments.
Random Variables
A discrete random variable can take on only a finite (or countably infinite) number of values, such as:
• The number of visits to a doctor’s office
• Number of children in a household
• Flip of a coin
• Dummy (binary) variable: D=0 if male, D=1 if female
A continuous random variable can take any real value (not just whole numbers)
in an interval on the real number line such as:
• Gross Domestic Product next year
• Price of a share in Microsoft
• Interest rate on a 30 year mortgage
Probability Distributions of Random Variables
• All random variables have probability distributions that
describe the values the random variable can take on and the
associated probabilities of these values.
• Knowing the probability distribution of a random variable
gives us some indication of the value the r.v. may take on.
Probability Distribution for Discrete Random Variable
Expressed as a table, graph or function
1. Suppose X = # of tails when a coin is flipped twice. X can take on
the values 0, 1 or 2. Let f(x) be the associated probabilities:
Table
X   f(x)
0   0.25
1   0.50
2   0.25
Graph: bar chart of f(x) against x = 0, 1, 2 with bar heights 0.25, 0.50 and 0.25.
Probability is represented as height on this bar graph.
2. Suppose X is a binary variable that can take on two values: 0 or 1.
Furthermore, assume P(X=1) = p and P(X=0) = (1-p)
Function:
P(X=x) = f(x) = p^x (1-p)^(1-x) for x = 0, 1
Table
X f(x)
0 (1-p)
1 p
Suppose p = 0.10
Then X takes on 0 with probability
0.90 and X takes on 1 with
probability 0.10
Facts about discrete probability distribution functions
1. Each probability P(X=x) = f(x) must lie between 0 and 1: 0 ≤ f(x) ≤ 1
2. The sum of the probabilities must be 1. If X can take on n different values, then:
f(x1) + f(x2) + . . . + f(xn) = 1
Probability Distribution (Density) for Continuous Random Variables
Expressed as a function or graph.
Continuous r.v.’s can take on an infinite number of values in a given interval
– A table isn’t appropriate to express the pdf
EX: f(x) = 2x for 0 ≤ x ≤ 1
         = 0  otherwise
Because a continuous random variable has an uncountably infinite number of values,
the probability of any single value occurring is zero.
P(X = a) = 0
Instead, we ask “What is the probability that X is between a and b?”
P[a < X < b] = ?
The probability P[a < X < b] is the proportion of the time, over many repetitions of the
experiment, that X will fall between a and b.
Probability is represented as area under the function. Total area must be 1.0.
Graph: f(x) = 2x for x from 0 to 1, rising from 0 to a height of 2. Area of triangle is 1.0.
Probability that x lies between 0 and 1/2:
P[0 ≤ X ≤ 1/2] = 0.25
[Area of any triangle is ½*Base*Height]
Uniform Random Variable: u is distributed uniformly between a and b
• p.d.f. is a horizontal line between a and b of height 1/(b – a)
• f(u) = 1/(b – a) if a ≤ u ≤ b
       = 0 otherwise
EX: Spin a dial on a clock
a = 0 and b = 12
Find the probability that u lies between 1 and 2
Graph: f(u) is flat at height 1/12 for u between 0 and 12; the probability is the area under f(u) between u = 1 and u = 2.
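Worked answer for the clock example (a sketch, treating the probability as the area of a rectangle under f(u)):
P[1 ≤ u ≤ 2] = base × height = (2 – 1) × 1/12 = 1/12 ≈ 0.083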
In calculus, the integral of a function defines the
area under it:
P[a ≤ X ≤ b] = ∫_{a}^{b} f(x) dx
For continuous random variables it is the
area under f(x), and not f(x) itself, which
defines the probability of an event. We will NOT be
integrating functions; when necessary we use tables
and/or computers to calculate the necessary
probability (integral).
Rules of Summation
Rule 1: Σ_{i=1}^{n} xi = x1 + x2 + . . . + xn
Rule 2: Σ_{i=1}^{n} a = na
Rule 3: Σ_{i=1}^{n} a xi = a Σ_{i=1}^{n} xi
Rule 4: Σ_{i=1}^{n} (xi + yi) = Σ_{i=1}^{n} xi + Σ_{i=1}^{n} yi
Rules of Summation (continued)
Rule 5: Σ_{i=1}^{n} (a xi + b yi) = a Σ_{i=1}^{n} xi + b Σ_{i=1}^{n} yi
Rule 6: x̄ = (1/n) Σ_{i=1}^{n} xi = (x1 + x2 + . . . + xn)/n
From Rule 6, we can prove (in class) that:
Σ_{i=1}^{n} (xi – x̄) = 0
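Sketch of that proof, splitting the sum and using Rule 2 (x̄ is a constant) together with Rule 6 (the sum of the xi equals n x̄):
Σ_{i=1}^{n} (xi – x̄) = Σ_{i=1}^{n} xi – Σ_{i=1}^{n} x̄ = n x̄ – n x̄ = 0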
Rules of Summation (continued)
Rule 7: Σ_{i=1}^{n} f(xi) = f(x1) + f(x2) + . . . + f(xn)
Notation: Σ_x f(xi) = Σ_i f(xi) = Σ_{i=1}^{n} f(xi)
Rule 8: Σ_{i=1}^{n} Σ_{j=1}^{m} f(xi,yj) = Σ_{i=1}^{n} [f(xi,y1) + f(xi,y2) + . . . + f(xi,ym)]
The order of summation does not matter:
Σ_{i=1}^{n} Σ_{j=1}^{m} f(xi,yj) = Σ_{j=1}^{m} Σ_{i=1}^{n} f(xi,yj)
The Mean of a Random Variable
The mean of a random variable is its mathematical
expectation, or expected value. For a discrete random
variable, this is:
E(X) = Σ_{i=1}^{n} xi f(xi)
= x1f(x1) + x2f(x2) + . . . + xnf(xn)
where n is the number of values X can take on.
It is a probability-weighted average of the possible values the
random variable X can take on. This is a sum for discrete r.v.’s
and an integral for continuous r.v.’s.
• E(X) tells us the “long-run” average value for X. It is not necessarily a value
that X will take on in any single draw.
• If you were to randomly draw values of X from its pdf an infinite
number of times and average these values, you would get E(X).
• E(X) = μ. This Greek letter “mu” is not used in your text but is commonly
used to denote the mean of X.
Example: Roll a fair die
E(X) = Σ_{i=1}^{6} xi f(xi)
= 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
= 21/6 = 3.5
Interpretation: In a large number of rolls of a fair die, one-sixth of the values will be 1’s, one-sixth of the values will
be 2’s, etc., and the average of these values will be 3.5.
Mathematical Expectation
• Think of E(.) as an operator that requires you to weight by probabilities any
expression inside the parentheses, and then sum:
• E(g(x)) = Σ_{i=1}^{n} g(xi)f(xi)
= g(x1)f(x1) + g(x2)f(x2) + . . . + g(xn)f(xn)
Rules of Mathematical Expectation
• E(c) = c where c is a constant
• E(cX) = cE(X) where c is a constant and X is a random variable
• E(a + cX) = a + cE(X) where a and c are constants and X is a random variable.
Variance of a Random Variable
• Like the mean, the variance of a r.v. is an expected value, but it is the expected
value of the squared deviations from the mean
• Let g(x) = (x – E(x))²
• Variance σ² = Var(x) = E[(x – E(x))²]
= Σ g(xi)f(xi)
= Σ (xi – E(x))² f(xi)
• It measures the amount of dispersion in the possible values for X.
• Unit of measurement is X units squared
• When we create a new random variable as a linear transformation of X:
y = a + cx
We know that E(y) = a + cE(x)
But
Var(y) = c²Var(x)
(proof in class) This property tells us that the amount of variation in y is determined by
the amount of variation in x and the constant c. The additive constant a in no way
alters the amount of variation in the values of y.
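Sketch of that proof, substituting y = a + cx and E(y) = a + cE(x) into the definition of variance:
Var(y) = E[(y – E(y))²] = E[(a + cx – a – cE(x))²] = E[c²(x – E(x))²] = c²Var(x)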
• E[(x – E(x))²] = E[x² – 2E(x)x + E(x)²]
= E(x²) – 2E(x)E(x) + E(x)²
= E(x²) – 2E(x)² + E(x)²
= E(x²) – E(x)²
• Run the E(.) operator through, pulling out constants and stopping on random
variables. Remember that E(x) is itself a constant, so
E(E(x)) = E(x)
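Worked example (a sketch applying Var(x) = E(x²) – E(x)² to the fair die, for which E(x) = 3.5):
E(x²) = (1 + 4 + 9 + 16 + 25 + 36)(1/6) = 91/6 ≈ 15.17
Var(x) = 91/6 – (3.5)² = 91/6 – 12.25 = 35/12 ≈ 2.92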
Standard Deviation
• Because variance is in squared units of the r.v., we can take the square root of the
variance to obtain the standard deviation.
σ = √(σ²) = √Var(x)
Be sure to take the square root after you square and sum the deviations from the
mean.
Joint Probability
• An experiment can randomly determine the outcome of more than one variable.
• When there are 2 random variables of interest, we study the joint probability
density function.
• When there are more than 2 random variables of interest, we study the
multivariate probability density function.
For a discrete joint pdf, probability is expressed
in a matrix:
Let X = return on stocks, Y = return on bonds
         X = -10   X = 0   X = 10   X = 20   f(y)
Y = 6      0        0       0.10     0.10
Y = 8      0        0.10    0.30     0.20
Y = 10     0.10     0.10    0        0
f(x)
P(X=x,Y=y) = f(x,y)
e.g. P(X=10,Y=8) = 0.30
• Marginal Probability Distribution: what is the probability distribution for X
regardless of what values Y takes on?
f(x) = Σ_y f(x,y)
What is the probability distribution for Y regardless of what values X takes on?
f(y) = Σ_x f(x,y)
• Conditional Probability Distribution:
What is the probability distribution for X given that Y takes on a particular value?
f(x|y) = f(x,y)/f(y)
What is the probability distribution for Y given that X takes on a particular value?
f(y|x) = f(x,y)/f(x)
• Covariance: A measure that summarizes the joint probability
distribution between two random variables.
cov(x,y) = E[(x – E(x))(y – E(y))]
= Σ_x Σ_y (xi – E(x))(yj – E(y)) f(xi,yj)
It measures the joint association between 2 random variables. Try asking:
“When X is large, is Y more or less likely to also be large?”
If the answer is that Y is likely to be large when X is large, then we say X and Y
have a positive relationship. Cov(x,y) > 0
If the answer is that Y is likely to be small when X is large, then we say that X
and Y have a negative relationship. Cov(x,y) < 0.
cov(x,y) = E[(x – E(x))(y – E(y))]
= E[xy – E(x)y – xE(y) + E(x)E(y)]
= E(xy) – E(x)E(y) – E(x)E(y) + E(x)E(y)
= E(xy) – E(x)E(y)   (useful!!)
• Correlation
Covariance has awkward units of measurement.
Correlation removes all units of measurement by
dividing covariance by the product of the standard
deviations:
ρxy = Cov(x,y)/(σx σy)
and –1 ≤ ρxy ≤ 1
What does correlation look like?
Figure: four scatter-plot panels labelled ρ = 0, ρ = .7, ρ = .3 and ρ = .9.
Statistical Independence
Two random variables are statistically independent if knowing the value
that one will take on does not reveal anything about what value the
other may take on:
f(x|y) = f(x) or f(y|x) = f(y)
This implies that f(x,y) = f(x)f(y) if X and Y are independent.
If 2 r.v.’s are independent, then their covariance will necessarily be equal
to 0.
Functions of more than one Random Variable
Suppose that X and Y are two random variables. If we form a weighted sum of them, we
create a new random variable that has the following mean and variance:
Z = aX + bY
E(Z) = E(aX + bY) = aE(X) + bE(Y)
Var(Z) = Var(aX + bY)
= a²Var(X) + b²Var(Y) + 2abCov(X,Y)
If X and Y are independent, then Cov(X,Y) = 0 and
Var(Z) = Var(aX + bY)
= a²Var(X) + b²Var(Y) (see page 31)
Normal Probability Distribution
• Many random variables tend to have a normal distribution (a well
known bell shape)
• Theoretically, x ~ N(β, σ²) where E(x) = β and Var(x) = σ²
The probability density function is
f(x) = [1/√(2πσ²)] exp[–(x – β)²/(2σ²)],  –∞ < x < ∞
Graph: bell-shaped curve of f(x) against x, with points a and b marked on the x axis.
Normal Distribution (con’t)
• A family of distributions, each with its own mean and variance. The mean
anchors the distribution’s center and the variance captures the spread of the
bell-shaped curve.
• Finding the area under the curve would require integrating the p.d.f., which is too
complicated. A computer-generated table gives all the probabilities we need for a
normal r.v. that has mean 0 and variance 1.
To use the table (pg. 389), we need to take a normal
random variable x ~ N(β, σ²) and transform it by subtracting the mean and dividing by
the standard deviation. This is a linear transformation of x that creates a new random variable
that has mean 0 and variance 1:
z = (x – β)/σ where z ~ N(0,1)
Statistical inference: drawing conclusions about a population based on a sample
Population quantities and their sample counterparts (T = sample size):
Mean:
  E(X) = μ = Σ xi f(xi)
  x̄ = (1/T) Σ_{t=1}^{T} xt
Variance:
  Var(X) = σ² = E[(X – E(X))²] = E[(X – μ)²]
  sx² = Σ_{t=1}^{T} (xt – x̄)² / (T – 1)
Standard deviation:
  σx = √(σx²) = √Var(X)
  sx = √(sx²)
Covariance:
  Cov(X,Y) = E[(X – μx)(Y – μy)] = E(XY) – μx μy
  Sxy = [1/(T – 1)] Σ_{t=1}^{T} (xt – x̄)(yt – ȳ)
Correlation:
  ρxy = Cov(X,Y) / √(Var(X) Var(Y))
  r = Sxy / √(sx² sy²) = Σ (xt – x̄)(yt – ȳ) / √[Σ (xt – x̄)² Σ (yt – ȳ)²]
```
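
The joint-distribution slides (marginal, conditional, covariance, correlation) can be checked numerically. The following is a minimal Python sketch, assuming the stocks/bonds table reads with the X values across the columns and the Y values down the rows. The function names (`marginal_x`, `conditional_x_given_y`, `covariance`, and so on) are illustrative and do not come from the text.

```python
import math

# Joint pmf f(x, y) from the stocks/bonds table (assumed layout:
# X values across the columns, Y values down the rows).
X_VALUES = [-10, 0, 10, 20]          # X = return on stocks
TABLE = {                            # keys are Y = return on bonds
    6:  [0.00, 0.00, 0.10, 0.10],
    8:  [0.00, 0.10, 0.30, 0.20],
    10: [0.10, 0.10, 0.00, 0.00],
}
joint = {(x, y): p for y, row in TABLE.items() for x, p in zip(X_VALUES, row)}


def marginal_x(joint):
    """f(x) = sum over y of f(x, y)."""
    fx = {}
    for (x, _y), p in joint.items():
        fx[x] = fx.get(x, 0.0) + p
    return fx


def marginal_y(joint):
    """f(y) = sum over x of f(x, y)."""
    fy = {}
    for (_x, y), p in joint.items():
        fy[y] = fy.get(y, 0.0) + p
    return fy


def conditional_x_given_y(joint, y0):
    """f(x | y0) = f(x, y0) / f(y0)."""
    fy0 = marginal_y(joint)[y0]
    return {x: p / fy0 for (x, y), p in joint.items() if y == y0}


def expectation(g, joint):
    """E[g(X, Y)]: weight g by the joint probabilities, then sum."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def covariance(joint):
    """cov(x, y) = E(xy) - E(x)E(y)."""
    ex = expectation(lambda x, y: x, joint)
    ey = expectation(lambda x, y: y, joint)
    return expectation(lambda x, y: x * y, joint) - ex * ey


def correlation(joint):
    """Covariance divided by the product of the standard deviations."""
    ex = expectation(lambda x, y: x, joint)
    ey = expectation(lambda x, y: y, joint)
    var_x = expectation(lambda x, y: x * x, joint) - ex ** 2
    var_y = expectation(lambda x, y: y * y, joint) - ey ** 2
    return covariance(joint) / math.sqrt(var_x * var_y)


print("f(x):", marginal_x(joint))
print("f(y):", marginal_y(joint))
print("f(x | Y=8):", conditional_x_given_y(joint, 8))
print("cov:", covariance(joint), "corr:", correlation(joint))
```

With this table the sketch gives a covariance of about -8 and a correlation of about -0.67, so in this example high stock returns tend to go with low bond returns.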
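
The normal-distribution slides rely on a printed table of N(0,1) probabilities. As a hedged sketch of the same standardization step done by computer, the Python below uses `math.erf` from the standard library to evaluate the standard normal c.d.f. The helper names and the example values (mean 10, variance 4) are hypothetical and do not come from the text.

```python
import math


def standard_normal_cdf(z):
    """P[Z <= z] for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_prob_between(a, b, mean, variance):
    """P[a <= X <= b] for X ~ N(mean, variance).

    Standardize first: z = (x - mean) / standard deviation,
    then take the difference of two N(0, 1) probabilities.
    """
    sd = math.sqrt(variance)
    z_a = (a - mean) / sd
    z_b = (b - mean) / sd
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)


# Hypothetical example (not from the slides): X ~ N(10, 4), so sd = 2.
# P[8 <= X <= 12] is the probability of being within one sd of the mean.
p = normal_prob_between(8.0, 12.0, mean=10.0, variance=4.0)
print(round(p, 4))
```

Because 8 and 12 standardize to z = -1 and z = 1, the result is the familiar "within one standard deviation" probability of roughly 0.68.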