Chapter 6: Probability and Simulation

The study of randomness
• Probability is the study of chance.
• Section 6.1 focuses on simulation, since actual observations are often not feasible.
• When we produce data by random sampling or randomized comparative experiments, the laws of probability answer the question “what would happen if we did this many times?”
• Probability is the basis of inference.
6.1 Simulation
A couple plans to have children until they have a
girl or until they have 4 children, whichever comes
first. What are the chances that they will have a
girl among their children?
Let a flip of a fair coin represent a birth: heads = girl, tails = boy. (Since both outcomes are equally likely, the coin is an accurate imitation of the situation.)
Flip the coin until a head appears or 4 times,
whichever comes first.
If this coin flipping procedure is repeated many times,
then the proportion of times that a head appears
within the first 4 flips should be a good estimate of the
true likelihood of the couple’s having a girl.
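The coin-flip procedure above can be sketched in Python as a check. This is a minimal sketch, assuming Python's pseudo-random generator stands in for a physical coin; the function name `family_has_girl` is illustrative only. The exact value 1 - (1/2)^4 = 15/16 is printed alongside, since the only way to have no girl is four boys in a row.

```python
import random

def family_has_girl(rng: random.Random) -> bool:
    """One repetition: flip until a head (girl) appears or 4 flips are made."""
    for _ in range(4):
        if rng.random() < 0.5:  # heads = girl, probability 1/2 on each flip
            return True
    return False  # 4 tails = 4 boys, no girl

rng = random.Random(2024)  # fixed seed so the run is reproducible
reps = 100_000
estimate = sum(family_has_girl(rng) for _ in range(reps)) / reps
exact = 1 - 0.5 ** 4  # the only way to have no girl is 4 boys in a row
print(f"simulated {estimate:.4f}, exact {exact:.4f}")
```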
What’s another tool we could use to simulate births?
• The more repetitions, the closer a result’s relative frequency will get to its true probability.
• Independence: the result of one trial (coin toss, die roll) has no effect on the result of the next trial.
Simulation Steps
 1.
State the problem or describe the random
 2.
 3.
1. State the problem: Toss a coin 10 times. What is the probability of a run of at least 3 consecutive heads or at least 3 consecutive tails?
2. State the assumptions (there are 2):
A head or a tail is equally likely to occur on each toss
Tosses are independent of each other
3. Assign digits to represent outcomes (aim for efficiency):
In a random digit table, even and odd digits occur with the same long-term relative frequency (50%).
One digit simulates one toss of the coin.
Odd digits represent heads; even digits represent tails.
Successive digits in the table simulate independent tosses.
4. Simulate many repetitions:
Looking at 10 consecutive digits in Table B simulates one repetition.
Read many groups of 10 digits from the table to simulate many repetitions.
Be sure to keep track of whether or not the event we want (a run of at least 3 heads or at least 3 tails) occurs on each repetition.
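Here are the first 3 repetitions, starting at line 101 in Table B (odd = H, even = T):
19223 95034 → H H T T H H H T H T → run of 3: yes
05756 28713 → T H H H T T T H H H → run of 3: yes
96409 12531 → H T T T H H T H H H → run of 3: yes
22 more repetitions were done, for a total of 25. 23 of the 25 had a run of 3 or more heads or tails.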
5. State your conclusions:
We estimate the probability of a run of size 3 or more by the proportion of repetitions that had one: 23/25 = 0.92.
*Of course, 25 repetitions are not enough to be confident that our estimate is accurate, so we can tell a computer to do thousands of repetitions (or TRIALS) for us. A long simulation finds that the true probability is about 0.826.
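A computer version of this simulation might look like the following Python sketch. It assumes Python's pseudo-random generator can stand in for Table B and uses the same assignment as the table (odd digit = heads); the helper names are hypothetical. As a cross-check it also enumerates all 2^10 = 1024 equally likely toss sequences, which gives the exact probability 846/1024 ≈ 0.826.

```python
import random
from itertools import product

def has_run(tosses, length=3):
    """True if `tosses` contains at least `length` identical outcomes in a row."""
    run = 1
    for prev, cur in zip(tosses, tosses[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

def digits_to_tosses(digits):
    """Odd digit = H, even digit = T."""
    return ["H" if int(d) % 2 else "T" for d in digits]

# The three table repetitions shown above
table_b = ["1922395034", "0575628713", "9640912531"]
print([has_run(digits_to_tosses(r)) for r in table_b])

# Many repetitions, with a pseudo-random generator in place of Table B
rng = random.Random(101)
reps = 100_000
estimate = sum(has_run(rng.choices("HT", k=10)) for _ in range(reps)) / reps

# Exact answer: check all 2**10 equally likely sequences of 10 tosses
exact_count = sum(has_run(seq) for seq in product("HT", repeat=10))
print(f"simulated {estimate:.3f}, exact {exact_count}/1024 = {exact_count / 1024:.3f}")
```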
Assigning digits
Some ways are more efficient than others.
Example: Choose a person at random from a group of which
70% are employed. One digit simulates one person:
0, 1, 2, 3, 4, 5, 6 = employed
7, 8, 9 = not employed
Using 00-69 for employed and 70-99 for not employed would also have worked, but it is less efficient b/c it requires twice as many digits and ten times as many labels.
Example 2: Choose one person at random from a group of
which 73% are employed:
Now 00-72 = employed, 73-99 = not employed
Example 3: Choose one person at random from a group of
which 50% are employed, 20% are unemployed, and 30% are
not in the labor force:
0-4 = employed, 5-6 = unemployed, 7-9 = not in labor force.
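The idea behind these examples (use the fewest digits that give every category a whole number of labels, then hand out consecutive labels) can be written as a small helper. This is an illustrative sketch; `assign_digits` is a hypothetical name, and proportions are passed as decimal strings so that `Fraction` converts them exactly.

```python
from fractions import Fraction

def assign_digits(proportions, n_digits):
    """Give each category a block of consecutive labels out of 10**n_digits.

    `proportions` maps category -> decimal string such as "0.73", so the
    arithmetic is exact. Raises ValueError if n_digits is too few.
    """
    total = 10 ** n_digits
    ranges, start = {}, 0
    for name, p in proportions.items():
        count = Fraction(p) * total  # how many labels this category gets
        if count.denominator != 1:
            raise ValueError(f"{name} = {p} needs more than {n_digits} digit(s)")
        end = start + int(count) - 1
        ranges[name] = (str(start).zfill(n_digits), str(end).zfill(n_digits))
        start = end + 1
    if start != total:
        raise ValueError("proportions must add up to 1")
    return ranges

print(assign_digits({"employed": "0.70", "not employed": "0.30"}, 1))
print(assign_digits({"employed": "0.73", "not employed": "0.27"}, 2))
print(assign_digits({"employed": "0.50", "unemployed": "0.20",
                     "not in labor force": "0.30"}, 1))
```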
Frozen Yogurt Sales example
• Orders of frozen yogurt flavors (based on sales) have the following relative frequencies: 38% chocolate, 42% vanilla, 20% strawberry. We want to simulate customers entering the store and ordering yogurt.
• How would you simulate 10 frozen yogurt sales based on recent history, using Table B?
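One possible answer, sketched in Python: because the proportions have two decimal places, use pairs of digits, here 00-37 for chocolate, 38-79 for vanilla, and 80-99 for strawberry. This particular assignment is an assumption (any labels with the same counts work), and the pseudo-random digits stand in for Table B.

```python
import random
from collections import Counter

def flavor(pair: str) -> str:
    """One possible assignment of two-digit labels:
    00-37 chocolate (38%), 38-79 vanilla (42%), 80-99 strawberry (20%)."""
    n = int(pair)
    if n <= 37:
        return "chocolate"
    if n <= 79:
        return "vanilla"
    return "strawberry"

rng = random.Random(10)
digits = "".join(rng.choice("0123456789") for _ in range(20))  # stands in for 20 table digits
sales = [flavor(digits[i:i + 2]) for i in range(0, 20, 2)]       # 10 customers
print(digits, sales)

# Check the assignment over many simulated customers
counts = Counter(flavor(f"{rng.randrange(100):02d}") for _ in range(100_000))
print({name: n / 100_000 for name, n in counts.items()})
```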
Randomizing with Calculator
• Block of 5 random digits from table
• Rolling a die 7 times
• 10 numbers from 00-99
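On a TI-83/84 these tasks are usually done with `randInt(lower, upper, n)`. A rough Python equivalent, assuming the standard `random` module (whose `randint` includes both endpoints), is sketched below.

```python
import random

rng = random.Random()  # unseeded, so each run gives new values, like a calculator

block_of_5 = [rng.randint(0, 9) for _ in range(5)]        # 5 random digits
die_rolls = [rng.randint(1, 6) for _ in range(7)]         # rolling a die 7 times
numbers_00_99 = [rng.randint(0, 99) for _ in range(10)]   # 10 numbers from 00-99

print("".join(map(str, block_of_5)))
print(die_rolls)
print([f"{n:02d}" for n in numbers_00_99])
```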
Gymnast example
The silver medalist scored 38.211 in the all-around competition (the sum of the vault, uneven bars, balance beam, and floor exercise scores). What are the chances that Carly Patterson will beat her?
Distributions of her past scores:
Event   Mean    SD
Vault   9.314   0.216
Bars    9.553   0.122
Beam    9.461   0.203
Floor   9.543   0.099
Randomly simulate 100 sets of her scores on the calculator:
randNorm(9.314, .216, 100) → L1 (simulates 100 vault scores)
randNorm(9.553, .122, 100) → L2 (bars)
randNorm(9.461, .203, 100) → L3 (beam)
randNorm(9.543, .099, 100) → L4 (floor)
L1 + L2 + L3 + L4 → L5 (simulated total score)
• Calculate one-variable stats on L5: the mean total score is 37.854 with a standard deviation of .207.
Using the 68-95-99.7 rule, we expect about 95% of Carly’s total scores to be between 37.44 and 38.268 (37.854 ± 2 x .207).
A score near the top of that range (above 38.211) would win.
*She did win…
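The same simulation can be run for many more repetitions in Python. This sketch assumes, like the calculator commands, that the four event scores are independent and normally distributed with the means and SDs above; it also estimates the probability that her total beats 38.211 directly instead of going through the 68-95-99.7 rule.

```python
import math
import random
import statistics

# (mean, SD) for each event, taken from the randNorm commands
EVENTS = {"vault": (9.314, 0.216), "bars": (9.553, 0.122),
          "beam": (9.461, 0.203), "floor": (9.543, 0.099)}
SILVER_SCORE = 38.211

rng = random.Random(2004)
reps = 100_000
# Each repetition draws one independent normal score per event and adds them
totals = [sum(rng.gauss(mu, sd) for mu, sd in EVENTS.values()) for _ in range(reps)]

mean_total = statistics.fmean(totals)
sd_total = statistics.stdev(totals)
p_win = sum(t > SILVER_SCORE for t in totals) / reps
print(f"mean {mean_total:.3f}, SD {sd_total:.3f}, P(total > {SILVER_SCORE}) = {p_win:.3f}")

# Theoretical SD of a sum of independent scores: square root of the summed variances
sd_theory = math.sqrt(sum(sd ** 2 for _, sd in EVENTS.values()))
```

Under that independence assumption the exact mean of the total is 9.314 + 9.553 + 9.461 + 9.543 = 37.871 and its SD is sqrt(.216² + .122² + .203² + .099²) ≈ .335, so a long run should settle near those values.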
6.2 Probability Models
• The proportion of heads in a few tosses will be erratic, but after thousands of tosses it will approach the expected probability of .5.
• Probability models have two parts:
A list of possible outcomes (the sample space S)
A probability for each outcome.
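A short Python sketch of both ideas: a probability model written as outcomes paired with probabilities that sum to 1, and the running proportion of heads, which wanders for small numbers of tosses and settles near .5 as tosses accumulate. The seed and checkpoints are arbitrary choices.

```python
import random

coin_model = {"H": 0.5, "T": 0.5}  # outcomes (sample space S) -> probabilities
assert abs(sum(coin_model.values()) - 1) < 1e-12

rng = random.Random(1)
heads = 0
for n in range(1, 100_001):
    heads += rng.random() < coin_model["H"]  # a True comparison counts as 1 head
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"{n:>7} tosses: proportion of heads = {heads / n:.4f}")
final_proportion = heads / 100_000
```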
Sample Space
• To specify S, we must state what constitutes an individual outcome, then which outcomes can occur (S can be simple or complex).
Ex: coin tossing, S = {H, T}
Ex: US Census: if we draw a random sample of 50,000 US households, as the survey does, S contains every possible sample of 50,000 households that could be drawn, not just the 50,000 households in one sample.
Rolling two dice
• At a casino, there are 36 possible outcomes when we roll 2 dice and record the up-faces in order (first die, second die).
• Gamblers care only about the total number of dots facing up, so the sample space for that is:
S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
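The two sample spaces can be listed with Python's `itertools.product`, as an illustration:

```python
from collections import Counter
from itertools import product

ordered = list(product(range(1, 7), repeat=2))  # (first die, second die)
totals = Counter(a + b for a, b in ordered)     # total dots -> number of ordered outcomes

print(len(ordered))                  # size of the ordered sample space
print(sorted(totals))                # the gambler's sample space {2, ..., 12}
print(dict(sorted(totals.items())))  # how many ordered outcomes give each total
```

The totals are not equally likely: 7 arises from 6 of the 36 ordered outcomes, while 2 and 12 arise from only 1 each.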
Techniques for finding outcomes
1. Tree diagram, e.g., for tossing a coin, then rolling a die
2. Multiplication principle: 2 x 6 = 12 outcomes for the same example
3. Organized list:
H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6
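For the coin-then-die example, `itertools.product` builds the organized list directly, and its length matches the multiplication principle. A small illustrative sketch:

```python
from itertools import product

# Organized list: every (coin, die) pair, written like "H1"
outcomes = [f"{coin}{die}" for coin, die in product("HT", range(1, 7))]
print(", ".join(outcomes))

# Multiplication principle: 2 coin outcomes x 6 die outcomes
print(len(outcomes), 2 * 6)
```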
With/without replacement
• If you take a card from a deck of 52, don’t put it back, then draw your 2nd card, etc., that’s without replacement.
Ex: how many different 3-digit numbers can you make if no digit repeats (leading 0 allowed)? 10 x 9 x 8 = 720
• If you take a card, write it down, put it back, draw your 2nd card, etc., that’s with replacement.
Ex: 3-digit numbers where digits may repeat: 10 x 10 x 10 = 1000
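Both counts can be checked by brute force in Python, assuming (as 10 x 9 x 8 does) that a leading 0 is allowed: `itertools.permutations` draws digits without replacement and `itertools.product` draws them with replacement.

```python
from itertools import permutations, product

digits = "0123456789"
# Without replacement: a digit, once used, cannot be used again
without_replacement = sum(1 for _ in permutations(digits, 3))
# With replacement: each position can be any of the 10 digits
with_replacement = sum(1 for _ in product(digits, repeat=3))

print(without_replacement, 10 * 9 * 8)
print(with_replacement, 10 * 10 * 10)
```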
Probability Rules
1. Any probability is a number between 0 and 1
2. The sum of the probabilities of all possible outcomes must equal 1.
3. If 2 events have no outcomes in common (they
can’t occur together), the probability that one OR the
other occurs is the sum of their individual probabilities
Ex: If one event occurs in 40% of all trials, another event happens in 25% of all trials, and the 2 can never occur together, then one or the other occurs on 65% of all trials.
4. The probability that an event doesn’t occur is 1
minus the probability that it does occur
Ex: If an event happens in 70% of all trials, it fails to occur
in the other 30%
Venn diagrams help!
Ex: What is the probability of rolling a total of 5 with 2 dice?
The 4 outcomes that give a total of 5 are disjoint, so
P(total of 5) = P(1,4) + P(2,3) + P(3,2) + P(4,1)
= 1/36 + 1/36 + 1/36 + 1/36
= 4/36 = 1/9 ≈ .111
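Rules 2-4 can be checked on this example with exact fractions. This is an illustrative sketch using Python's `fractions` module; it treats each of the 36 ordered outcomes as having probability 1/36.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 ordered (first, second) outcomes
p_each = Fraction(1, len(outcomes))              # each has probability 1/36

total_prob = sum(p_each for _ in outcomes)       # rule 2: must equal 1
fives = [o for o in outcomes if sum(o) == 5]     # (1, 4), (2, 3), (3, 2), (4, 1)
p_five = sum(p_each for _ in fives)              # rule 3: disjoint outcomes add
p_not_five = 1 - p_five                          # rule 4: complement

print(total_prob, fives, p_five, float(p_five), p_not_five)
```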
Independence &
the Multiplication Rule
To find the probability of BOTH events A and B occurring when A and B are independent, multiply: P(A and B) = P(A)P(B).
• Example: Suppose you plan to toss a coin twice and want to find the probability of getting a head on both tosses.
• A = first toss is a head, B = second toss is a head. So P(A and B) = (1/2)(1/2) = 1/4. We expect to flip 2 heads on 25% of all trials. The more times we repeat this, the closer the proportion of trials with 2 heads will get to 25%.
The multiplication rule applies only to independent
events; can’t use it if events are not independent!
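A quick simulation of the two-toss example, as a sketch: each toss is generated separately, so the second knows nothing about the first, and the proportion of trials with two heads should land near (1/2)(1/2) = .25. The seed and number of trials are arbitrary.

```python
import random

def two_heads(rng: random.Random) -> bool:
    first = rng.random() < 0.5   # event A: first toss is a head
    second = rng.random() < 0.5  # event B: drawn separately, so independent of A
    return first and second

rng = random.Random(12)
trials = 100_000
estimate = sum(two_heads(rng) for _ in range(trials)) / trials
print(f"simulated {estimate:.4f}, multiplication rule {0.5 * 0.5}")
```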
Independent or not?
Coin toss: Independent (I). The coin has no memory, and coin tossers cannot influence the fall of the coin.
Drawing from a deck of cards (without replacement): Not independent (NI). On the first pick, the probability of red is 26/52 = .5. Once we see the first card is red, the probability of a red card on the 2nd pick is 25/51 ≈ .49.
Taking an IQ test twice in succession: NI (learning on the first attempt influences the second).
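The card example can be checked exactly by listing every ordered pair of cards drawn without replacement. This sketch assumes a standard deck of 26 red and 26 black cards and uses exact fractions.

```python
from fractions import Fraction
from itertools import permutations

deck = ["red"] * 26 + ["black"] * 26
pairs = list(permutations(range(52), 2))  # every ordered (first, second) draw, no replacement
first_red = [(i, j) for i, j in pairs if deck[i] == "red"]

p_second_red = Fraction(sum(deck[j] == "red" for _, j in pairs), len(pairs))
p_second_red_given_first_red = Fraction(
    sum(deck[j] == "red" for _, j in first_red), len(first_red))

print(p_second_red, p_second_red_given_first_red, float(p_second_red_given_first_red))
```

Since P(2nd red | 1st red) = 25/51 is not equal to P(2nd red) = 1/2, the two draws are not independent.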
More applications of
Probability Rules
• If two events A and B are independent, then their complements are also independent.
Ex: 75% of voters in a district are Republicans. If an interviewer chooses 2 voters at random, the probability that the first is a Republican and the 2nd is not a Republican is .75 x .25 = .1875.
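The voter example extends to all four combinations. This sketch treats the two choices as independent, as the example does (reasonable when the district is large), and checks that the four joint probabilities add to 1.

```python
from itertools import product

p = {"Republican": 0.75, "not Republican": 0.25}
# Independent choices: multiply the individual probabilities
joint = {(first, second): p[first] * p[second] for first, second in product(p, repeat=2)}

for (first, second), prob in joint.items():
    print(f"1st {first}, 2nd {second}: {prob:.4f}")
```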
6.3 General Probability Rules
Addition Rule for Disjoint events: P(A or B) = P(A) + P(B)
General Addition Rule for Unions of 2 events: P(A or B) = P(A) + P(B) - P(A and B)
Deb and Matt are waiting anxiously to hear if they’ve been promoted. Deb guesses her probability of getting promoted is .7, Matt’s is .5, and the probability of both being promoted is .3. The probability that at least one is promoted = .7 + .5 - .3 = .9. The probability that neither is promoted is 1 - .9 = .1.
The simultaneous occurrence of 2 events (such as Deb and Matt both getting promoted) is called a joint event, and its probability is called a joint probability.
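The Deb and Matt numbers can be checked in a few lines of Python, as a sketch. It also splits the Venn diagram into its four regions, which must add to 1, and shows that the events are not independent, since .7 x .5 = .35 differs from .3.

```python
p_deb, p_matt, p_both = 0.7, 0.5, 0.3

p_at_least_one = p_deb + p_matt - p_both  # general addition rule
p_neither = 1 - p_at_least_one            # complement rule
p_deb_only = p_deb - p_both               # Venn region: Deb but not Matt
p_matt_only = p_matt - p_both             # Venn region: Matt but not Deb

print(f"at least one: {p_at_least_one:.2f}, neither: {p_neither:.2f}")
print(f"Deb only: {p_deb_only:.2f}, Matt only: {p_matt_only:.2f}, both: {p_both:.2f}")
# Independent events would need P(both) = P(Deb) P(Matt) = .35
print("independent?", abs(p_deb * p_matt - p_both) < 1e-9)
```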
Conditional Probability
The probability that we assign to an event can change if we know some other event has occurred.
P(A|B): the probability that event A will happen under the condition that event B has occurred.
Ex: The probability of drawing an ace is 4/52 = 1/13. If you are dealt 4 cards and exactly one of them is an ace, the probability of getting an ace on the 5th card dealt is 3/48 = 1/16 (a conditional probability: getting an ace given that one was dealt in the first 4).
General multiplication rule: P(A and B) = P(A)P(B|A).
In words, this says that for both of 2 events to occur, first one must occur, and then, given that the first event has occurred, the second must occur.
Remember: in P(B|A), B is the event whose probability we are computing and A represents the info we are given.
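Conditional probability can be read as restricting attention to the trials where the given event happened. This Python sketch deals 5 cards many times, keeps only the deals whose first 4 cards contain exactly one ace (an assumption about what "one of them is an ace" means), and estimates the chance that the 5th card is an ace; it should land near 3/48 = .0625.

```python
import random

deck = ["ace"] * 4 + ["other"] * 48
rng = random.Random(52)
kept = fifth_is_ace = 0
for _ in range(100_000):
    hand = rng.sample(deck, 5)      # deal 5 cards without replacement, in order
    if hand[:4].count("ace") == 1:  # keep only deals where the condition holds
        kept += 1
        fifth_is_ace += hand[4] == "ace"

estimate = fifth_is_ace / kept
print(f"kept {kept} deals; P(5th is ace | exactly 1 ace in first 4) = {estimate:.4f}")
print(f"exact: 3/48 = {3 / 48:.4f}")
```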
Extended Multiplication rules
• The union of a collection of events is the event that ANY of them occurs (at least one of them).
• The intersection of a collection of events is the event that ALL of them occur.
Only 5% of male high school basketball, baseball, and football
players go on to play at the college level. Of these only 1.7%
enter major league professional sports. About 40% of the
athletes who compete in college and then reach the pros have
a career of more than 3 years. Define these events:
A = competes in college
B = competes pro
C = pro career longer than 3 years
P(A) = .05
P(B|A) = .017
P(C|A and B) = .400
What is the probability that a HS athlete will have a pro career of more than 3 years? The probability we want is therefore
P(A and B and C) = P(A)P(B|A)P(C|A and B)
= .05 x .017 x .40 = .00034
So only about 3 of every 10,000 high school athletes can expect to compete in college and have a pro career of more than 3 years.
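The chain of conditional probabilities is a straight product. A minimal sketch with the numbers above (variable names are illustrative):

```python
p_A = 0.05           # P(A): competes in college
p_B_given_A = 0.017  # P(B | A): competes pro, given college
p_C_given_AB = 0.40  # P(C | A and B): pro career > 3 years, given college and pros

# Extended multiplication rule: P(A and B and C) = P(A) P(B|A) P(C|A and B)
p_ABC = p_A * p_B_given_A * p_C_given_AB
print(f"P(A and B and C) = {p_ABC:.5f}, about {p_ABC * 10_000:.1f} per 10,000 athletes")
```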
Extended tree diagram + chat room example
47% of internet users aged 18 to 29 participate in chat rooms, as do 21% of those aged 30 to 49 and 7% of those 50 and over.
We also need to know that 29% of all internet users are 18-29 (event A1), 47% are 30 to 49 (A2), and the remaining 24% are 50 and over (A3).
What is the probability that a randomly chosen user of the internet participates in chat rooms (event C)?
Tree diagram: the probability written on each segment is the conditional probability of an internet user following that segment, given that he or she has reached the node from which it branches. To get P(C), multiply along each branch that ends in chatting and add the results: P(C) = .2518.
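The tree computation can be sketched in Python: multiply along each branch that ends in C, then add the branches. Variable names are illustrative; the percentages are the ones given above.

```python
# First level of the tree: age group; second level: chats (C) given the age group
p_age = {"A1": 0.29, "A2": 0.47, "A3": 0.24}         # 18-29, 30-49, 50+
p_chat_given = {"A1": 0.47, "A2": 0.21, "A3": 0.07}  # P(C | age group)

# Multiply along each branch that ends in C ...
p_branch = {a: p_age[a] * p_chat_given[a] for a in p_age}
# ... then add those branches to get P(C)
p_chat = sum(p_branch.values())

for a, p in p_branch.items():
    print(f"P({a} and C) = {p:.4f}")
print(f"P(C) = {p_chat:.4f}")
```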
Bayes Rule
Another question we might ask: what percent of adult chat room participants are age 18 to 29?
P(A1|C) = P(A1 and C) / P(C)
= .1363/.2518 = .5413
*Only 29% of internet users are 18-29, but about 54% of chatters are, so knowing that someone chats increases the probability that they are young!
Formula sans tree diagram:
P(C) = P(A1)P(C|A1) + P(A2)P(C|A2) + P(A3)P(C|A3)
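Bayes's rule can be applied to all three age groups at once. This sketch reuses the percentages above, computes P(C) with the formula above, and then divides each branch probability by P(C); variable names are illustrative.

```python
p_age = {"A1": 0.29, "A2": 0.47, "A3": 0.24}         # 18-29, 30-49, 50+
p_chat_given = {"A1": 0.47, "A2": 0.21, "A3": 0.07}  # P(C | age group)

# Law of total probability: P(C) = sum of P(Ai) P(C | Ai)
p_chat = sum(p_age[a] * p_chat_given[a] for a in p_age)
# Bayes's rule: P(Ai | C) = P(Ai) P(C | Ai) / P(C)
posterior = {a: p_age[a] * p_chat_given[a] / p_chat for a in p_age}

for a in p_age:
    print(f"{a}: P({a}) = {p_age[a]:.2f}  ->  P({a} | C) = {posterior[a]:.4f}")
```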
6.3 Need to Know Summary
Complement of an event A contains all outcomes not in A
Union (A U B) of events A and B = all outcomes in A, in B, or in
both A and B
Intersection (A^B) contains all outcomes that are in both A and
B, but not in A alone or B alone.
General Addition Rule: P(AUB) = P(A) + P(B) – P(A^B)
Multiplication Rule: P(A^B) = P(A)P(B|A)
Conditional Probability P(B|A) of an event B, given that event A
has occurred: P(B|A) = P(A^B)/P(A) when P(A) > 0
If A and B are disjoint (mutually exclusive) then P(A^B) = 0 and
P(AUB) = P(A) + P(B)
A and B are independent when P(B|A) = P(B)
Venn diagram or tree diagrams useful for organization.
