Chapter 9
Estimating ABILITY with Confidence Intervals

Objectives
Students will be able to:
1) Construct confidence intervals to estimate a proportion or a mean
2) Construct confidence intervals to estimate a difference between two proportions or a difference between two means
3) Interpret confidence intervals in the context of the data
4) Calculate the margin of error for a confidence interval
5) Use technology to calculate confidence intervals

• In the 2013 NFL regular season, quarterback Matt Ryan completed 439 out of 651 pass attempts, for a completion percentage of 67.4%.
• Would you be confident in making a claim that in the 2013 NFL season, Matt Ryan had an ABILITY to complete a pass of exactly 67.4%? Why or why not? Discuss using previous concepts learned in this course.
• Would you be more or less confident if you made the claim that Ryan’s ABILITY to complete a pass fell between two values, say between 55% and 76%?
• In this chapter, we’re going to look at how we can use an athlete’s PERFORMANCES to estimate their ABILITY by creating an interval of values that their ABILITY is likely to fall between.
• Some examples:
– In 2010, Josh Hamilton’s ABILITY to get a hit was somewhere between 0.317 and 0.401.
– NFL teams have the ABILITY to score somewhere between 2.1 and 4.5 more points at home than on the road.

The Idea of a Confidence Interval
• Remember that the law of large numbers says it is impossible to know an athlete’s ABILITY exactly unless we could observe an infinite number of PERFORMANCES in the same context.
• Instead of claiming to know an athlete’s exact ABILITY, we can provide an interval of values that their ABILITY is between.
• The interval of plausible values for an athlete’s ABILITY, or for the difference in an athlete’s ABILITY in two different contexts, is known as a confidence interval.
• We use an interval of plausible values rather than a single value to increase our chances of arriving at the correct estimate of an athlete’s ABILITY.
• An interval can be too wide to be useful, though. Imagine a weather forecast that gives an extremely wide range for tomorrow’s temperature.
• Obviously we would be pretty confident in this interval. However, it doesn’t really tell us anything about the weather.
• We don’t know if it will be hot or cold.
• Similarly, we wouldn’t want to create a confidence interval saying something like LeBron’s ABILITY to make a three-pointer is between 0% and 100%. This really doesn’t tell us much about what LeBron can do on a basketball court.
• Later on we will learn how to calculate a confidence interval.
• Right now, let’s concentrate on what information a confidence interval provides.

Interpreting Confidence Intervals
• A confidence interval is constructed so that we know how much confidence we should have in the interval.
• All of the intervals we construct in this course will use a 95% confidence level. This means that if we were to calculate intervals for lots and lots of athletes using our methods, about 95% of the intervals would succeed in containing the ABILITY of the athlete for whom the interval was calculated.
• In 2010, Josh Hamilton had a batting average of 0.359 (186 hits in 518 at-bats).
• Since his average is based on only 518 at-bats (not an infinite amount), it is very unlikely that his ABILITY to get a hit was exactly 0.359.
• We would probably have close to 0% confidence that his batting average would be exactly 0.359 if Hamilton had millions and millions of at-bats under the same conditions.
• We can calculate a 95% confidence interval for Hamilton’s ABILITY to get a hit.
• Using his 2010 PERFORMANCES, the interval is 0.317 to 0.401.
• This means that in millions and millions of at-bats under the same conditions, we would expect his batting average to end up being between 0.317 and 0.401.
• In short, we can say that we are 95% confident that the interval of plausible values from 0.317 to 0.401 contains Hamilton’s ABILITY to get a hit in 2010.
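One way to see what "95% confident" means is to simulate it. The sketch below is only an illustration under assumptions that are not in the slides: a made-up athlete whose true ABILITY is 0.36, 518 attempts per simulated season, 2000 simulated seasons, and an interval built as the observed proportion plus or minus 2 estimated standard deviations. The function names are hypothetical.

```python
import math
import random


def proportion_interval(successes, attempts):
    """Observed proportion +/- 2 estimated standard deviations."""
    p_hat = successes / attempts
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return p_hat - margin, p_hat + margin


def coverage_rate(true_ability, attempts, athletes, seed=0):
    """Share of simulated athletes whose interval contains the true ABILITY."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    contained = 0
    for _ in range(athletes):
        # Each attempt succeeds with probability equal to the true ABILITY.
        successes = sum(rng.random() < true_ability for _ in range(attempts))
        low, high = proportion_interval(successes, attempts)
        if low <= true_ability <= high:
            contained += 1
    return contained / athletes


# Assumed values, chosen only for illustration.
rate = coverage_rate(true_ability=0.36, attempts=518, athletes=2000)
print(f"Share of intervals containing the true ABILITY: {rate:.3f}")
```

With these settings the printed share should land near 0.95. Any single interval either contains the true ABILITY or it does not; the 95% describes how often the method succeeds.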
• The generic statement you can use to interpret a confidence interval for an athlete’s ABILITY is: We are 95% confident that the interval of plausible values from ____ to ____ includes ____’s ABILITY to ____.
• We can also use confidence intervals to estimate the difference in an athlete’s ABILITY in two different contexts.
• Example: In 2010, Hamilton hit 0.271 when facing lefties, as opposed to 0.401 when facing righties. He PERFORMED 0.130 batting average points better when facing right-handed pitchers.
• Again, since this is based on 518 total at-bats, it’s unlikely that this is the exact difference in his ABILITY.
• A 95% confidence interval for his difference in ABILITY is from 0.043 to 0.217.
• Interpretation: We are 95% confident that the interval of plausible values from 0.043 to 0.217 contains the true difference in Hamilton’s ABILITY to get a hit against right-handed pitchers and his ABILITY to get a hit against left-handed pitchers.
• Generic statement: We are 95% confident that the interval of plausible values from ____ to ____ includes the difference in ____’s ABILITY to ____ in context 1 and context 2.

Using Confidence Intervals to Make Decisions
• Do Josh Hamilton’s PERFORMANCES in 2010 provide convincing evidence that he has a greater ABILITY to get a hit against right-handed pitchers than against left-handed pitchers? What would our hypotheses be for this question?
• Remember, his average against lefties was .271 and his average against righties was .401. The test statistic will be the difference in batting averages (0.130).
• The p-value for this test is approximately 0. What does that tell us?
– Reject the null, and we have convincing evidence that Hamilton was a better hitter against right-handed pitchers.
• We can also use confidence intervals to address the hypotheses.
• The null hypothesis says that Hamilton has the same ABILITY to get a hit vs righties and lefties. Therefore, if this null is correct, what should be the true difference in his ABILITY (righty – lefty)?
• 0.
If there is no difference in his ABILITY, then when you take the difference, you should get 0.
• If the alternative hypothesis is correct, then the true difference (righty – lefty) should be greater than 0.
• The confidence interval for Hamilton’s difference in ABILITY was between 0.043 and 0.217.
• This entire interval is positive. 0 is not a part of the interval. Every value in the interval suggests that his ABILITY to get a hit was higher against righties than lefties.
• The hypothesis test and the confidence interval gave us the same conclusion. However, the confidence interval gives us more information. It tells us just how much better he was against righties (between .043 and .217 batting average points).
• Was Hamilton a better hitter at home than on the road?
• His home PERFORMANCE was .390.
• His road PERFORMANCE was .327.
• A 95% confidence interval for the true difference in his ABILITY to get a hit at home and his ABILITY to get a hit on the road is -0.021 to 0.147.
• This interval includes the value 0, so it is possible there is no difference in his ABILITY. We do not have convincing evidence that he was a better hitter at home.

The Structure of Confidence Intervals
• Confidence intervals contain two components:
– 1) A single-value estimate (a single number that represents our best guess for an athlete’s ABILITY)
– 2) A margin of error (a value that is added to and subtracted from the single-value estimate)
• A confidence interval is: single-value estimate ± margin of error
• A 95% confidence interval for LeBron James’s ABILITY to make a three-point shot in the 2007-2008 regular season is: 0.315 ± 0.049
• This means that the interval goes from 0.266 to 0.364.
• Interpretation: We are 95% confident that the interval of plausible values from 0.266 to 0.364 contains LeBron’s ABILITY to make a three-point shot in the 2007-2008 regular season.
• The single-value estimate of 0.315 is LeBron’s observed three-point shooting percentage in the 2007-2008 regular season (his PERFORMANCE during the season).
• His observed PERFORMANCE is certainly our best guess for his true ABILITY.
However, it’s likely incorrect due to RANDOM CHANCE being involved.
• To compensate for RANDOM CHANCE, we include the margin of error.
• Keep in mind we are making a 95% confidence interval. Let’s think about a Normal distribution for a minute. What do we know about 95% of the data in a Normal distribution?
– About 95% of the data is within two standard deviations of the mean.
• Therefore, to calculate the margin of error, we need to estimate the standard deviation and multiply that value by 2.
• For our purposes, we’ll use a formula to estimate the standard deviation.
• However, the formula for standard deviation changes based on the quantity we are trying to estimate.
– For example, the formula to estimate the standard deviation for LeBron’s ABILITY to make a three-pointer is different from the formula for estimating the difference in ABILITY to make a three-point shot when playing at home and when playing on the road.

Calculating a Confidence Interval for a Proportion
• Let’s say we want to estimate LeBron’s ABILITY to make a three-point shot. We would have to try to estimate the proportion of three-point shots that he would make if he could keep shooting three-pointers indefinitely under the same conditions as in the 2007-2008 regular season.

Notation and Formula
• $\hat{p}$ = the observed proportion of successes (the PERFORMANCE, written as a decimal)
• $n$ = the number of attempts
• 95% confidence interval: $\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
• In the 2007-2008 regular season, James made 113 out of 359 three-point shots.
• Therefore, we now know our values: $n = 359$ and $\hat{p} = 113/359 \approx 0.315$.
• Using the formula gives 0.315 ± 0.049, so our confidence interval goes from 0.266 to 0.364, as we saw earlier.
• Thus, we are 95% confident that the interval of plausible values from 0.266 to 0.364 contains LeBron’s ABILITY to make a three-pointer during the 2007-2008 season.
• Keep in mind that this formula will only work when we are estimating ABILITY with a single proportion (think categorical variables: Ch 2).
• It will not work for estimating ABILITY that is measured by a mean or even a difference in proportions.
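The LeBron calculation can be checked with a few lines of Python. This is a minimal sketch, assuming the interval is the observed proportion plus or minus 2 times sqrt(p(1 - p)/n), which is the form that reproduces the 0.266 to 0.364 interval quoted above; the function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(successes, attempts):
    """Return (estimate, margin, low, high) for a single proportion.

    Assumed form: p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / attempts).
    """
    p_hat = successes / attempts  # single-value estimate, as a decimal
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / attempts)  # margin of error
    return p_hat, margin, p_hat - margin, p_hat + margin


# LeBron James, 2007-2008 regular season: 113 made three-pointers in 359 attempts.
p_hat, margin, low, high = proportion_ci(113, 359)
print(f"estimate {p_hat:.3f}, margin {margin:.3f}, interval {low:.3f} to {high:.3f}")
```

Note that the function takes counts rather than a percent, so the proportion is always in decimal form when it goes into the formula.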
• Also, for the formula you must use the proportion (decimal) equivalent for the PERFORMANCE, not the percent equivalent.
• Note: there must be at least 15 successes and 15 failures to use this interval.
• Since this is so much fun, let’s try some more! In the first two games of the 2009-2010 regular season, Kobe Bryant made 17 of 45 shot attempts. Calculate a 95% confidence interval for Kobe’s ABILITY to make a shot in the 2009-2010 regular season.
• Here $\hat{p} = 17/45 \approx 0.378$ and $n = 45$, so the interval is 0.378 ± 0.145.
• We are 95% confident that the interval of plausible values from 0.233 to 0.523 contains Kobe’s ABILITY to make a shot during the 2009-2010 regular season.
• For this interval we only used Kobe’s PERFORMANCES for two games. What do you think would happen to the width of the interval if we used Kobe’s PERFORMANCES from the entire season? Would the interval get wider or narrower? For the entire season, Kobe made 716 of 1569 shot attempts. Calculate a new 95% confidence interval.
• When using n=45, our interval went from 0.233 to 0.523.
• When using n=1569, our interval went from 0.431 to 0.481.
• The margin of error will be smaller when the interval is calculated using a larger number of observations.
• Increasing sample size decreases variability, resulting in a more precise interval.

Calculating a Confidence Interval for a Mean
• In the previous section, we used categorical data to create a confidence interval.
• We can also use numerical data to create confidence intervals for a player’s ABILITY.
• One of the most common numerical measures of ABILITY in basketball is scoring average, or mean points per game.

Notation and Formula
• $\bar{x}$ = the observed mean, $s$ = the observed standard deviation, $n$ = the number of observations
• 95% confidence interval: $\bar{x} \pm 2\dfrac{s}{\sqrt{n}}$
• Keep in mind that just as with calculating a 95% confidence interval for a proportion, we want to have a large enough sample size in order for our “95% confidence” to be accurate.
• Generally speaking, a sample size of at least 30 is recommended, especially for a distribution that is not Normal.
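The interval for a mean can be sketched the same way. The code below assumes the interval is the observed mean plus or minus 2 times s divided by the square root of n, and it uses the LeBron James season summary and the Walter Payton game totals from the examples that follow. The function names are hypothetical.

```python
import math
import statistics


def mean_ci_from_stats(mean, sd, n):
    """Interval for a mean from summary statistics.

    Assumed form: mean +/- 2 * sd / sqrt(n).
    """
    margin = 2 * sd / math.sqrt(n)
    return mean - margin, mean + margin


def mean_ci_from_data(values):
    """Same interval, computing the mean and sample st. dev. from raw data."""
    return mean_ci_from_stats(
        statistics.mean(values), statistics.stdev(values), len(values)
    )


# LeBron James, 2007-2008: 30.0 points per game, st. dev. 8.04, 75 games.
lebron_low, lebron_high = mean_ci_from_stats(30.0, 8.04, 75)
print(f"LeBron: {lebron_low:.2f} to {lebron_high:.2f} points per game")

# Walter Payton, 1985-1986: rushing yards in each of 16 games.
payton = [120, 39, 62, 6, 63, 132, 112, 118, 192, 107, 132, 102, 121, 111, 53, 81]
payton_low, payton_high = mean_ci_from_data(payton)
print(f"Payton: {payton_low:.1f} to {payton_high:.1f} yards per game")
```

Because the code does not round the mean and standard deviation along the way, its last digit can differ slightly from a hand calculation.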
• Let’s create a confidence interval for LeBron’s ABILITY to score points in the 2007-2008 regular season.
• During the regular season, he scored 2250 points in 75 games. That’s an average of 30.0 points per game. This is our best estimate of his ABILITY.
• His observed standard deviation was 8.04 points. We now have all the information we need: the interval is 30.0 ± 1.86.
• We are 95% confident that the interval of plausible values from 28.14 points per game to 31.86 points per game contains LeBron James’s ABILITY to score points in the 2007-2008 regular season.
• Let’s try another. Here are the rushing totals (in yards) for each game of the 1985-1986 NFL season for running back Walter Payton, of the Chicago Bears.
120 39 62 6 63 132 112 118 192 107 132 102 121 111 53 81
• Calculate a 95% confidence interval for Payton’s rushing ABILITY in the 1985-1986 regular season.
• We need to calculate our mean and standard deviation. Create a list to calculate the mean and st. dev.: the mean is about 96.9 yards and the st. dev. is about 44.6 yards, with n = 16 games.
• We are 95% confident that the interval of plausible values from 74.6 to 119.2 yards contains Payton’s rushing ABILITY in the 1985-1986 regular season.

Confidence Intervals for Difference of Two Proportions
• We will create a confidence interval for a difference in two proportions when 1) we are using categorical data and 2) we are interested in comparing an athlete’s ABILITY in two different contexts (examples: home vs road; day vs night; grass vs turf).

95% Confidence Interval Formula
• $(\hat{p}_A - \hat{p}_B) \pm 2\sqrt{\dfrac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \dfrac{\hat{p}_B(1-\hat{p}_B)}{n_B}}$, where A and B are the two contexts.
• Note: As with confidence intervals for a single proportion, this formula will only work if we have at least 15 successes and 15 failures in each setting.
• In football, does “icing the kicker” really work?
• From 2000 to 2009, kickers in the NFL made 377 of 488 field goal attempts (77.3%) without a timeout being called before the attempt (without being iced). This compares to kickers making 157 of 197 field goal attempts (79.7%) after being iced.
• Based on these numbers, it looks like icing the kicker might actually be a bad strategy.
• Let’s construct a 95% confidence interval for the true difference in the ABILITY of kickers to make a field goal when they are iced and when they are not.
• Let’s first determine our values.
• Instead of using “A” and “B”, I’ll use “I” for iced and “N” for not iced: $\hat{p}_I = 157/197 \approx 0.797$ with $n_I = 197$, and $\hat{p}_N = 377/488 \approx 0.773$ with $n_N = 488$.
• The interval is 0.024 ± 0.069.
• We are 95% confident that the interval of plausible values from -0.045 to 0.093 includes the true difference in the ABILITY of kickers to make a field goal when iced and their ABILITY to make a field goal when not iced.
• The negative value would indicate that kickers have a lower ABILITY when iced. So this interval means the true success rate for iced kickers could be lower by up to 0.045 or higher by up to 0.093.
• You could have reversed the order of subtraction when constructing the interval. If you did, the interval would have gone from -0.093 to 0.045, but the interpretation would be the same.
• Let’s try creating another interval.
• In the 2009 regular season, first baseman and left-handed hitter Ryan Howard of the Philadelphia Phillies had 126 hits in 394 at-bats when facing a right-handed pitcher and only 46 hits in 222 at-bats when facing a left-handed pitcher. Calculate a 95% confidence interval for the difference in Howard’s ABILITY to get a hit against a right-handed pitcher and his ABILITY to get a hit against a left-handed pitcher.
• Let’s determine our values. Remember, Howard had 126 hits in 394 at-bats when facing a right-handed pitcher ($\hat{p}_R \approx 0.320$) and only 46 hits in 222 at-bats when facing a left-handed pitcher ($\hat{p}_L \approx 0.207$).
• The interval is 0.113 ± 0.072.
• We are 95% confident that the interval of plausible values from 0.041 to 0.185 contains the true difference in Howard’s ABILITY to get a hit against right-handed pitchers and his ABILITY to get a hit against left-handed pitchers.
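Both examples in this section can be reproduced with a short Python sketch. It assumes the interval is the difference in observed proportions plus or minus 2 times the square root of the sum of the two p(1 - p)/n terms, and it adds the check for at least 15 successes and 15 failures in each setting. The function names are hypothetical. The code does not round intermediate values, so its third decimal can differ from the hand-calculated intervals.

```python
import math


def enough_data(successes, attempts, minimum=15):
    """Check the 'at least 15 successes and 15 failures' condition."""
    return successes >= minimum and (attempts - successes) >= minimum


def diff_proportions_ci(x_a, n_a, x_b, n_b):
    """Interval for (proportion in context A) - (proportion in context B).

    Assumed form: (p_a - p_b) +/- 2 * sqrt(p_a(1-p_a)/n_a + p_b(1-p_b)/n_b).
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    sd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - 2 * sd, diff + 2 * sd


# Icing the kicker: iced (157 of 197) minus not iced (377 of 488).
assert enough_data(157, 197) and enough_data(377, 488)
kick_low, kick_high = diff_proportions_ci(157, 197, 377, 488)
print(f"Kickers: {kick_low:.3f} to {kick_high:.3f}")
print("0 is a plausible difference:", kick_low <= 0 <= kick_high)

# Ryan Howard, 2009: vs right-handers (126 of 394) minus vs left-handers (46 of 222).
assert enough_data(126, 394) and enough_data(46, 222)
howard_low, howard_high = diff_proportions_ci(126, 394, 46, 222)
print(f"Howard: {howard_low:.3f} to {howard_high:.3f}")
print("0 is a plausible difference:", howard_low <= 0 <= howard_high)
```

The last line of each example applies the decision rule from earlier in the chapter: if 0 is inside the interval, "no difference in ABILITY" remains plausible.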
Confidence Intervals for the Difference of Two Means
• We will create a confidence interval for the difference in two means when 1) we are using numerical data and 2) we are interested in comparing an athlete’s ABILITY in two different contexts (examples: home vs road; day vs night; grass vs turf).
• The difference from confidence intervals for two proportions is that those deal with categorical data.

95% Confidence Interval Formula
• $(\bar{x}_A - \bar{x}_B) \pm 2\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}$, where A and B are the two contexts.
• Note: We need both sample sizes to be large enough for the 95% to be accurate. Samples are considered large if they each have 30 or more observations.
• In the modern NFL, the passing game rules. But was this always the case? Let’s compare using data from 2009 and 1979.
• In 2009, the mean passing PERFORMANCE of the 32 NFL teams was 218.5 yards, with a standard deviation of 44.1 yards.
• In 1979, the mean passing PERFORMANCE of the 28 NFL teams was 180.4 yards, with a standard deviation of 35.7 yards.
• The interval is 38.1 ± 20.6.
• We are 95% confident that the interval of plausible values from 17.5 to 58.7 yards contains the true difference in passing ABILITY for NFL teams in 2009 and 1979.
• Because all of the plausible values are greater than 0, we have convincing evidence that teams in 2009 had a greater ABILITY to pass than did the teams in 1979.
• Let’s try another.
• Here are the points allowed by the New England Patriots in their 2009 regular season games at home and on the road:
Home: 24 10 21 0 17 14 10 7
Away: 16 20 7 35 38 22 10 34
• Calculate and interpret a 95% confidence interval for the difference in the Patriots’ ABILITY to play defense at home and their ABILITY to play defense on the road.
• Let’s get our numbers. At home the mean is 12.875 points allowed with a st. dev. of about 7.79; on the road the mean is 22.75 points allowed with a st. dev. of about 11.79. Each sample has 8 games.
• We are 95% confident that the interval of plausible values from -19.869 to 0.119 includes the true difference in the Patriots’ ABILITY to play defense at home and their ABILITY to play defense on the road.
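Here is a minimal Python sketch of both examples in this section. It assumes the interval is the difference in observed means plus or minus 2 times the square root of the sum of the two s²/n terms; the function names are hypothetical.

```python
import math
import statistics


def diff_means_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Interval for (mean in context A) - (mean in context B).

    Assumed form: (mean_a - mean_b) +/- 2 * sqrt(sd_a^2/n_a + sd_b^2/n_b).
    """
    sd = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    diff = mean_a - mean_b
    return diff - 2 * sd, diff + 2 * sd


def diff_means_ci_from_data(values_a, values_b):
    """Same interval, starting from two lists of raw data."""
    return diff_means_ci(
        statistics.mean(values_a), statistics.stdev(values_a), len(values_a),
        statistics.mean(values_b), statistics.stdev(values_b), len(values_b),
    )


# NFL passing yards: 2009 (32 teams) minus 1979 (28 teams).
nfl_low, nfl_high = diff_means_ci(218.5, 44.1, 32, 180.4, 35.7, 28)
print(f"Passing, 2009 minus 1979: {nfl_low:.1f} to {nfl_high:.1f} yards")

# Patriots 2009 points allowed: home minus away.
home = [24, 10, 21, 0, 17, 14, 10, 7]
away = [16, 20, 7, 35, 38, 22, 10, 34]
pats_low, pats_high = diff_means_ci_from_data(home, away)
print(f"Patriots, home minus away: {pats_low:.3f} to {pats_high:.3f} points")
```

The Patriots interval just barely reaches above 0, so by the decision rule from earlier in the chapter, "no difference in ABILITY" is still a plausible value.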
Using Technology to Calculate Confidence Intervals
• The TI-84 calculator can calculate confidence intervals for us. Let’s look at how we can do this.
• Note: Confidence intervals calculated with technology will be slightly different from confidence intervals calculated by hand. The calculator uses a more exact multiplier than 2 (about 1.96 for proportions, and a t-based multiplier for means) and does not round intermediate values.

Confidence Interval for a Proportion
• Let’s use Josh Hamilton’s numbers in 2010 for our example:
– 186 hits in 518 at-bats
1) Press STAT, scroll to TESTS, choose A: 1-PropZInt.
2) Enter the number of successful PERFORMANCES for “x”, the total number of attempts for “n”, and the desired confidence level for “C-Level”.
3) Select Calculate.

Confidence Interval for a Mean
• Let’s use Walter Payton’s numbers from a previous example. We have to enter the data into a list.
120 39 62 6 63 132 112 118 192 107 132 102 121 111 53 81
1) Press STAT, scroll to TESTS, choose 8: TInterval.
2) Inpt: Data, select the list, Freq: 1, C-Level: .95 (if you have the mean and st. dev., choose Stats instead of Data).
3) Select Calculate.

Confidence Interval for a Difference in Proportions
• Hamilton vs righties: 141 hits in 352 at-bats
• Hamilton vs lefties: 45 hits in 166 at-bats
1) Press STAT, scroll to TESTS, choose B: 2-PropZInt.
2) Enter the first context data for x1 and n1, and the second context data for x2 and n2.
3) Select Calculate.

Confidence Interval for a Difference in Means
1) Enter the data sets into two lists.
2) Press STAT, scroll to TESTS, choose 0: 2-SampTInt.
3) Choose Data, select the lists, make sure both Freq are 1, and choose “No” for Pooled.
4) Select Calculate.
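To see why technology and hand calculation disagree slightly, here is a small Python sketch for Hamilton's 186 hits in 518 at-bats. It assumes the calculator's proportion interval uses the Normal critical value for the chosen confidence level (about 1.96 for 95%) where the hand method uses 2. The function name is hypothetical, and this is an illustration of the idea, not the calculator's actual program.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, attempts, multiplier):
    """Observed proportion +/- multiplier * sqrt(p_hat * (1 - p_hat) / attempts)."""
    p_hat = successes / attempts
    margin = multiplier * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return p_hat - margin, p_hat + margin


c_level = 0.95
# Normal critical value that leaves (1 - c_level) / 2 in each tail: about 1.96.
z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2)

hand_low, hand_high = proportion_ci(186, 518, 2)
tech_low, tech_high = proportion_ci(186, 518, z_star)

print(f"multiplier 2:      {hand_low:.4f} to {hand_high:.4f}")
print(f"multiplier {z_star:.4f}: {tech_low:.4f} to {tech_high:.4f}")
```

The second interval is a little narrower than the first because its multiplier is a little smaller than 2.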