STAT 110 - Section 5
Lecture 18
Professor Hao Wang
University of South Carolina
Spring 2012

Last time
• How to measure spread: variance and standard deviation
• Density curve
• Normal density curve: Mean and SD
http://www.stat.tamu.edu/jhardin/applets/signed/Normal.html

The 68-95-99.7 Rule
For a normal distribution, approximately:
• 68% of the data falls within 1 standard deviation of the mean
• 95% of the data falls within 2 standard deviations of the mean
• 99.7% of the data falls within 3 standard deviations of the mean

Apply the 68-95-99.7 Rule
• The Health and Nutrition Examination Study of 1976-1980 (HANES) studied the heights of adults (aged 18-24) and found that the heights follow a normal distribution with the following:
• Women: mean (μ): 65.0 inches; standard deviation (σ): 2.5 inches
• Men: mean (μ): 70.0 inches; standard deviation (σ): 2.8 inches
• Using the 68-95-99.7 rule, what can we say about the population of heights?

The 68-95-99.7 Rule
Approximately what percent of a standard normal distribution will fall between -1 and 1?
A – 16%  B – 32%  C – 64%  D – 68%  E – 95%

Approximately what percent of a standard normal distribution will fall between 0 and 1?
A – 16%  B – 32%  C – 34%  D – 68%  E – 95%

Example – IQ Scores for 12 Year Olds
IQ scores for 12 year olds follow a normal distribution with a mean of 100 and a standard deviation of 16. What percent of 12 year olds will have an IQ between 84 and 116?
A – 5%  B – 32%  C – 68%  D – 95%  E – 99.7%

IQ scores for 12 year olds follow a normal distribution with a mean of 100 and a standard deviation of 16. What percent of 12 year olds will have an IQ higher than 132?
A – 32%  B – 16%  C – 5%  D – 2.5%  E – 0.3%

IQ scores for 12 year olds follow a normal distribution with a mean of 100 and a standard deviation of 16. What percent will have IQ scores lower than 116?
A – 16%  B – 32%  C – 50%  D – 68%  E – 84%

The starting salaries in a field are approximately normally distributed with a mean of $40,000 and a standard deviation of $5,000.
What can we say about the percent of people who make between $30,000 and $50,000?
A) Could be any percent
B) Is approximately 68%
C) Must be at least 75%
D) Must be at least 88.9%
E) Is approximately 95%

• Observations expressed in terms of standard deviations above or below the mean are called Standard Scores.
• The standard score is the number of standard deviations above or below the mean at which an observation is located.
• If the observation is below the mean, the standard score will be negative.
• If the observation is above the mean, the standard score will be positive.

Use standard score
• Jennie scored 600 on the verbal part of the SAT. Her friend Gerald took the ACT and scored a 21 on the verbal part. SAT scores are normally distributed with mean 500 and standard deviation 100. ACT scores are normally distributed with mean 18 and standard deviation 6. Assuming that both tests measure the same kind of ability, who has the higher score?
• Who performs better?

A woman is told her weight has a standard score of 1. This means her weight is
A. 1 pound above the mean
B. 1 pound below the mean
C. 1 standard deviation above the mean
D. 1 standard deviation below the mean

• Math SAT scores follow a normal distribution with a mean of 500 and standard deviation of 100.
• Calculate the standard score for a score of 630.
A. 1.3
B. 1.1
C. -1.3
D. -1.1

• Two students get a 65 on two different tests. Student A has a standard score of -1 while Student B has a standard score of -2. Which student had the better performance on the test?
A. Student A
B. Student B
C. Both students gave equal performances.

Percentiles (revisited)
pth percentile – a value such that at least p% of the observations lie at or below it and at least (100-p)% lie at or above it.

Approximately what value must a standard normal observation take to be at the 97.5th percentile?
A – -2  B – -1  C – 0  D – 1  E – 2

• Recall the distribution of SAT math scores follows a normal distribution with a mean of 500 and a standard deviation of 100. What score do you need so that only 2.5% did better than you?
• Recall the distribution of SAT math scores follows a normal distribution with a mean of 500 and a standard deviation of 100. What score are you at if 84% did better than you?
• Recall the distribution of SAT math scores follows a normal distribution with a mean of 500 and a standard deviation of 100. What percentage of people scored higher than you if you scored 780?

Chapter 14 – Describing Relationships
Most statistical studies examine data on more than one variable. The steps for describing two variables at once are the same as those we used earlier in the semester with just one variable:
• Plot the data.
• Look for overall patterns and deviations from those patterns.
• Use numerical summaries.

Scatterplots
scatterplot – shows the relationship between two quantitative variables measured on the same individuals
• Values of one variable appear on the x-axis. This is typically the one doing the explaining – the explanatory, predictor, or independent variable.
• Values of the other variable appear on the y-axis. This is typically the one being explained – called the response or dependent variable.

Scatterplot
Example: When water flows across farmland, some of the soil is washed away, resulting in erosion. An experiment was conducted to investigate the effect of the rate of water flow on the amount of soil washed away. Flow is measured in liters/second and the eroded soil is measured in kilograms.

flow rate (liters/sec)    eroded soil (kg)
0.31                      0.82
0.85                      1.95
1.26                      2.18
2.47                      3.01
3.75                      6.07

Scatterplot
• Is there an explanatory variable?
• What’s the response variable?
• Which variable should be on the x-axis?
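
One way to try the plot yourself, sketched under some assumptions: flow rate is treated as the explanatory variable (it is what the experimenters controlled), the helper name text_scatterplot is hypothetical, and the axis ranges 0 to 4 and 0 to 7 are simply chosen to cover the five measurements. The output is a rough text-only picture, not a substitute for a real graphing tool.

```python
# Flow-rate / eroded-soil data from the example above.
flow_rate = [0.31, 0.85, 1.26, 2.47, 3.75]    # explanatory variable (x), liters/sec
eroded_soil = [0.82, 1.95, 2.18, 3.01, 6.07]  # response variable (y), kg


def text_scatterplot(xs, ys, x_max, y_max, width=40, height=14):
    """Return text rows with a '*' at each (x, y) point (hypothetical helper).

    Assumes every x is in [0, x_max] and every y is in [0, y_max].
    """
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in zip(xs, ys):
        col = round(x / x_max * width)
        row = height - round(y / y_max * height)  # row 0 is the top of the plot
        grid[row][col] = "*"
    rows = ["|" + "".join(r) for r in grid]
    rows.append("+" + "-" * (width + 1))          # x-axis
    return rows


plot_lines = text_scatterplot(flow_rate, eroded_soil, x_max=4, y_max=7)
for line in plot_lines:
    print(line)
```

The points should climb from lower left to upper right, which is the overall pattern to look for before computing any numerical summary.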
[Figure: Flow Rate vs Eroded Soil. x-axis: Flow Rate (liters/sec); y-axis: Eroded Soil (kg)]

Measuring Strength Through Correlation
A Linear Relationship
Correlation, represented by the letter r:
• Indicator of how closely the values fall to a straight line.
• Measures linear relationships only; that is, it measures how close the individual points in a scatterplot are to a straight line.

Correlation Example: Verbal SAT and GPA
Scatterplot of GPA and verbal SAT score. The correlation is .485, indicating a moderate positive relationship. Higher verbal SAT scores tend to indicate higher GPAs as well, but the relationship is nowhere close to being exact.

Example: Husbands’ and Wives’ Ages and Heights
• Scatterplot of British husbands’ and wives’ ages; r = .94
• Scatterplot of British husbands’ and wives’ heights (in millimeters); r = .36
Husbands’ and wives’ ages are likely to be closely related, whereas their heights are less likely to be so.
Source: Marsh (1988, p. 315) and Hand et al. (1994, pp. 179-183)

Occupational Prestige and Suicide Rates
Plot of suicide rate versus occupational prestige for 36 occupations. The correlation is .109, so there is not much of a relationship. If the outlier is removed, r drops to .018.
Source: Labovitz (1970, Table 1) and Hand et al. (1994, pp. 395-396)

Example: Professional Golfers’ Putting Success
Scatterplot of distance of putt and putting success rates. Correlation r = −.94. The negative sign indicates that as distance goes up, success rate goes down.
Source: Iman (1994, p. 507)

Which one has r = -0.86? Which one has r = 0.52? (A was -0.86)

Summary: Features of Correlations
1 Correlation of +1 indicates a perfect linear relationship between the two variables; as one increases, so does the other. All individuals fall on the same straight line (a deterministic linear relationship).
2 Correlation of –1 also indicates a perfect linear relationship between the two variables; however, as one increases, the other decreases.
3 Correlation of zero could indicate no linear relationship between the two variables, or that the best straight line through the data on a scatterplot is exactly horizontal.
4 A positive correlation indicates that the variables increase together.
5 A negative correlation indicates that as one variable increases, the other decreases.
6 Correlations are unaffected if the units of measurement are changed. For example, the correlation between weight and height remains the same regardless of whether height is expressed in inches, feet or millimeters (as long as it isn’t rounded off).
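
Feature 6 can be checked numerically. The sketch below is only an illustration: it assumes the usual Pearson formula for r (the slides do not give a formula), uses a hypothetical helper named correlation, and reuses the flow-rate and eroded-soil measurements from the scatterplot example. The unit conversions (liters to milliliters, kilograms to grams) are chosen just for the demonstration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r for two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Data from the erosion example: flow in liters/sec, eroded soil in kg.
flow_rate = [0.31, 0.85, 1.26, 2.47, 3.75]
eroded_soil = [0.82, 1.95, 2.18, 3.01, 6.07]
r_original = correlation(flow_rate, eroded_soil)

# Same measurements in different units: milliliters/sec and grams.
flow_ml = [1000 * x for x in flow_rate]
soil_g = [1000 * y for y in eroded_soil]
r_converted = correlation(flow_ml, soil_g)

print(round(r_original, 3), round(r_converted, 3))
```

The two printed values should agree, because multiplying every observation by a positive constant rescales the numerator and denominator of r by the same factor. The value is also positive and fairly close to 1, matching the upward pattern in the erosion scatterplot.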