Chapter 18: Sampling Distribution Models
AP Statistics

Overview of Chapter
• We have already discussed samples and descriptive statistics, such as sample proportions and sample means.
• We know that if we take a large enough sample, our results should be close to what we would get if we asked the entire population (as long as the sample is random, etc.).
• In this chapter, we look at many samples to help us do many things. Perhaps the most important of those is determining what is statistically significant.

Modeling the Distribution of Sample Proportions
Suppose that a poll was conducted in September in which 1000 people were asked if they supported sending more troops to Afghanistan, and 45% said yes. A few days later, a different polling organization asked the same question of 1000 people and instead found that 42% said yes. Which one is correct? Should we be surprised by these different results? Why or why not?

To answer those questions, we assume that one of those proportions is “correct” and then imagine what would happen if we looked at the results of many, many different samples of 1000 people. How much would those samples differ? What would the distribution of the proportions who said yes look like?

What we would find is that the distribution of the proportions from those many, many samples is symmetric and unimodal, centered on the true population proportion (or the value we are treating as the true proportion). Because the distribution is symmetric and unimodal, we can model the sample proportions with a Normal model, AS LONG AS CERTAIN ASSUMPTIONS AND CONDITIONS ARE SATISFIED!
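The thought experiment above can be sketched as a small simulation. This is an illustration rather than part of the original slides: it assumes the 45% result is the true proportion, and the function name `simulate_sample_proportions`, the 2000 repetitions, and the seed are arbitrary choices made for the sketch.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's proportion of 'yes'."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each respondent independently says yes with probability p.
        yes_count = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(yes_count / n)
    return proportions


# Assumption: treat the first poll's 45% as the "true" population proportion.
p_hats = simulate_sample_proportions(p=0.45, n=1000, num_samples=2000)

center = statistics.mean(p_hats)    # where the sample proportions are centered
spread = statistics.stdev(p_hats)   # how much they vary from sample to sample
share_at_or_below_42 = sum(1 for x in p_hats if x <= 0.42) / len(p_hats)

print(f"mean of sample proportions: {center:.4f}")
print(f"SD of sample proportions:   {spread:.4f}")
print(f"share of samples at or below 42%: {share_at_or_below_42:.3f}")
```

If the description above is right, the mean should land close to 0.45, and only a small share of the simulated polls should come out as low as 42%, which gives a sense of how surprising the second poll would be.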
Once we can establish the use of the Normal model, we can find the standard deviation of the sampling distribution. The model is then
$N\left(p, \sqrt{\frac{pq}{n}}\right)$
where $p$ is the population proportion, $q = 1 - p$, and $n$ is the sample size.

[Figure: Visual of How a Model of a Sampling Distribution of Proportions Is Formed]
[Figure: Summary of Modeling the Distribution of Sample Proportions]
[Figure: Normal Model for the Distribution of the Percent of Americans Who Believe We Should Send More Troops to Afghanistan]

Assumptions and Conditions
We can use the Normal model for the distribution of sample proportions ONLY IF two assumptions are met:
1. The sampled values must be independent of each other.
2. The sample size, n, must be large enough.

It is difficult (sometimes impossible) to check those assumptions directly. Instead, we verify certain conditions that provide information about the assumptions. Those conditions are:
1. Randomization Condition
2. 10% Condition
3. Success/Failure Condition

The Three Conditions for Using a Normal Model for the Sampling Distribution of Proportions
• Randomization Condition: The sample should be an SRS (or we should at least be very confident that it is not biased).
• 10% Condition: If the sample has not been made with replacement, the sample size must be no larger than 10% of the population.
• Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.

Thoughts about Sampling Distribution Models
• A proportion is no longer something we just compute; we now see it as a random quantity that has a distribution.
• These models can now tell us how much variation to expect when we sample (and what we shouldn’t expect).
• Sampling distributions act as a bridge between the real world of data and an imaginary model.
This bridge and the resulting model have huge implications in statistics.

Example #1
Assume that 30% of all students at a university wear contact lenses. We randomly pick 100 students and want to know the approximate probability that more than one-third of those students wear contacts. (In the process of answering this question, specify the appropriate model, the mean, and the standard deviation. Be sure to verify that the conditions are met.)

Modeling Distributions of Sample Means
[Figure: distribution of the numbers on the face of a die when 10,000 dice are rolled]
[Figure: distributions of the mean of the rolls when rolling 2, 3, 5, and 20 dice]
What do you notice?

The distribution of sample means (like that of sample proportions) will be symmetric and unimodal when the sample is large enough. As long as a few assumptions/conditions are met, that distribution can be modeled using the Normal model. This concept (along with a few other important points) is called the Central Limit Theorem (CLT). Because of its importance, it is sometimes called the Fundamental Theorem of Statistics.

The Central Limit Theorem
Very simply, the Central Limit Theorem states: The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample size, the better the approximation will be.
• The sampling distribution of any mean becomes more nearly Normal as the sample size grows.
• The shape of the population distribution does NOT matter: with a large enough sample, the distribution of sample means will be approximately Normal.
• We need to verify two assumptions: the observations are independent, and they were collected with randomization. We use conditions to help us check those important assumptions.

Conditions for the Central Limit Theorem
To justify those assumptions, you can check these three conditions:
1. Randomization Condition: Data must be sampled randomly.
2. 10% Condition: If the sample has not been made with replacement, the sample size must be no larger than 10% of the population. This supports the Independence Assumption.
3. Large Enough Sample Size: This gets discussed more in Chapter 24, but for now just think about how your sample size relates to the population size.

Important Information about the Central Limit Theorem
The CLT does NOT talk about the distribution of the data from the sample. It talks about the sample means and sample proportions of many different random samples drawn from the same population.

Normal Model for the Distribution of Sample Means
A few things to remember:
• The sampling distribution will be centered at the population mean.
• Means have smaller standard deviations than individuals.
• The standard deviation of the sample mean falls as the sample size grows.
The relationship between the standard deviation of the mean and the sample size is given by the formula
$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
The Normal model for the distribution of sample means therefore has the parameters
$N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

Example #2
Assume that SAT scores are normally distributed with a mean of 500 and a standard deviation of 10. Describe the distribution of sample means if we randomly pick 50 students. Verify that the conditions are met. Do you think it would be unreasonable to have a randomly selected group of 50 students with a mean of 550? Justify, using statistics.

Standard Error
The standard error is what we call our estimate of the standard deviation of a sampling distribution when we don’t know the population proportion or the population standard deviation.
For the sampling distribution of sample proportions: $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$
For the sampling distribution of sample means: $SE(\bar{y}) = \frac{s}{\sqrt{n}}$

[Figure: Sampling Distribution Models (Visual of Logic)]

Problems to Look Out For
• Don’t confuse the sampling distribution with the distribution of the sample.
• Beware of observations that are not independent.
• Watch out for small samples from skewed populations. It takes a large sample size to “undo” the skewness and produce a symmetric sampling distribution.
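
The last caution can be illustrated with a short simulation. This is a sketch under stated assumptions, not material from the original slides: the population is taken to be exponential with mean 1 (chosen only because it is strongly right-skewed), and the helper names `skewness` and `sample_means`, the sample sizes, and the 3000 repetitions are arbitrary choices.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(n, num_samples, rng):
    """Means of num_samples random samples of size n from a right-skewed population."""
    # Assumption: exponential population with mean 1 (hypothetical choice).
    means = []
    for _ in range(num_samples):
        total = sum(rng.expovariate(1.0) for _ in range(n))
        means.append(total / n)
    return means


rng = random.Random(1)

# Skewness of the simulated sampling distribution for several sample sizes.
skew_by_n = {n: skewness(sample_means(n, 3000, rng)) for n in (2, 10, 50, 200)}

for n, sk in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {sk:.2f}")
```

With very small samples the sampling distribution of the mean should still look clearly right-skewed, while the skewness should shrink toward 0 as n grows, which is why a Normal model is risky for small samples from skewed populations.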