Report

Introductory Statistics for Laboratorians Dealing with High Throughput Data Sets
Centers for Disease Control

Interpreting Scores

What do the numbers mean?
Johnny came home from 4th grade and told his mother he’d made 100 on his test.
• That’s good!
• But it was a 200-point test.
• That’s bad!
• But it was a very difficult test, and Johnny’s score was one of the highest in the district.
• That’s good!
• But Johnny wasn’t the only one who got 100; the average score on the test was 100.
• That’s not so good.

What have we learned?
• A raw score by itself is meaningless.
• To interpret a person’s score you must know how everybody else scored.
• For a score to have meaning, you have to know where that score is in the distribution.

The two main things we need to know to interpret a score are:
• How far it is from the mean
• How spread out the scores are

The Deviation Score
• The deviation score is commonly used in statistics to make a score more interpretable.
• Deviation score: how far the score is from the mean.

Some Notation
• In statistics the raw score is symbolized by an upper case $X$.
• The mean of the raw scores is symbolized by $\bar{X}$.
• The deviation score is symbolized by a lower case $x$.
• The deviation score is computed by subtracting the mean from the score: $x = X - \bar{X}$
• If someone scores at the mean, the deviation score will be zero.
• If someone scores above the mean, the deviation score will be a positive number.
• If the score is below the mean, the deviation score will be a negative number.
• If Johnny had come home and told his mother that his deviation score on the test was 0, she would have known immediately that he was average.
• (Johnny’s mother is a professor of statistics at the local college.)

But that is not all.
• While the distance of a person’s score from the mean is more meaningful than the raw score, the interpretation of that distance depends on how spread out the scores are.
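The deviation-score notation above can be illustrated with a minimal Python sketch. The class scores below are made up for illustration (only Johnny's 100 and the class mean of 100 come from the story), and `deviation_scores` is a hypothetical helper name, not something from the course materials.

```python
from statistics import mean

def deviation_scores(scores):
    """Return x = X - mean for every raw score X in the list."""
    m = mean(scores)
    return [X - m for X in scores]

# Hypothetical class results; the middle value is Johnny's 100,
# and the mean of the list is also 100.
scores = [80, 90, 100, 110, 120]
x = deviation_scores(scores)

print(x)       # below-average scores are negative, Johnny's is 0
print(sum(x))  # deviation scores always sum to zero
```

A score at the mean maps to 0, scores above the mean map to positive numbers, and scores below it map to negative numbers, which matches the bullets above.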
The Importance of Dispersion
• For example, if Johnny tells his mother he scored 10 points above the mean on a test, we know right away that he is above average.
• The question is, how much above average?
• If the average score on the test is 55, Johnny scores 65, and that is the highest score on the test, then scoring 10 points above the mean is very good (see Figure 1).
• If, on the other hand, the highest score on the test is 100, then a 65 is not so great.

[Figure 1: distribution of test scores, with Johnny's score = 65 marked; x-axis: Score]

So?
• What we really need is a way to express a score that takes into account both how far the score is from the mean and how spread out the scores are.

z-Scores
• The standard deviation is the parameter that measures the dispersion or spread of the distribution.
• z-scores measure the distance from the mean in standard deviation units: $z = \frac{X - \bar{X}}{s}$

z-Scores
• If a person scores 1 standard deviation (SD) above the mean, the z-score will be +1.
• If they score 1 SD below the mean, the z-score will be -1.
• If they score 2 SDs above the mean, the z-score will be +2.
• If they score at the mean, the z-score will be zero.
• Etc.

Areas Under the Normal Curve
• The proportion of the area under the normal curve can be interpreted as the probability that a score appears in that area.
• Areas here are shown for standard deviation units.
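As a minimal sketch of the z-score formula and the area idea above: the mean of 55 and Johnny's score of 65 come from the dispersion example, but the two standard deviations (5 and 20) are made-up values chosen only to show how the same 10-point deviation can mean very different things. `z_score` and `area_within` are illustrative helper names; `NormalDist` is part of Python's standard `statistics` module (Python 3.8+).

```python
from statistics import NormalDist

def z_score(X, mean, s):
    """Distance of raw score X from the mean, in standard deviation units."""
    return (X - mean) / s

# Same deviation score (+10), two assumed spreads.
print(z_score(65, 55, 5))    # tightly clustered scores: 2 SDs above the mean
print(z_score(65, 55, 20))   # widely spread scores: only half an SD above

def area_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    # Roughly 68%, 95%, and 99.7% of the area, respectively.
    print(f"within {k} SD of the mean: {area_within(k):.1%}")
```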
Areas Under the Curve
• As shown here, the percentage of the distribution in a standard deviation band is the same for any normal distribution, however wide or narrow the curve.

Problem 10: Compute z-Scores

| Subject | Score ($X$) | $x = X - \bar{X}$ | $x^2$ | z-score |
|---|---|---|---|---|
| S1 | 1 | | | |
| S2 | 4 | | | |
| S3 | 4 | | | |
| S4 | 5 | | | |
| S5 | 5 | | | |
| S6 | 6 | | | |
| S7 | 7 | | | |
| S8 | 8 | | | |

N =     Total =     Mean =     SS =     s =

Formulas: $x = X - \bar{X}$, $s = \sqrt{SS / N}$, $z = \frac{x}{s}$

Problem 10: Compute z-Scores

| Subject | Score ($X$) | $x = X - \bar{X}$ | $x^2$ | z-score |
|---|---|---|---|---|
| S1 | 1 | -4 | 16 | -2 |
| S2 | 4 | -1 | 1 | -0.5 |
| S3 | 4 | -1 | 1 | -0.5 |
| S4 | 5 | 0 | 0 | 0 |
| S5 | 5 | 0 | 0 | 0 |
| S6 | 6 | 1 | 1 | 0.5 |
| S7 | 7 | 2 | 4 | 1 |
| S8 | 8 | 3 | 9 | 1.5 |

N = 8     Total = 40     Mean = 5     SS = 32     s = 2

Formulas: $x = X - \bar{X}$, $s = \sqrt{SS / N}$, $z = \frac{x}{s}$

Problem 11: Properties of z-Scores

| Subject | z-scores (from Problem 10) | Deviation score of the z’s | Squared deviations of z’s |
|---|---|---|---|
| S1 | | | |
| S2 | | | |
| S3 | | | |
| S4 | | | |
| S5 | | | |
| S6 | | | |
| S7 | | | |
| S8 | | | |

N =     Total of z’s =     Mean of z’s =     SS of z’s =     Standard deviation of z’s =

Problem 11: Properties of z-Scores

| Subject | z-scores (from Problem 10) | Deviation score of the z’s | Squared deviations of z’s |
|---|---|---|---|
| S1 | -2 | -2 | 4 |
| S2 | -0.5 | -0.5 | 0.25 |
| S3 | -0.5 | -0.5 | 0.25 |
| S4 | 0 | 0 | 0 |
| S5 | 0 | 0 | 0 |
| S6 | 0.5 | 0.5 | 0.25 |
| S7 | 1 | 1 | 1 |
| S8 | 1.5 | 1.5 | 2.25 |

N = 8     Total of z’s = 0     Mean of z’s = 0     SS of z’s = 8     Standard deviation of z’s = 1

Using the Standard Normal Distribution
• Because all normal distributions share the same properties, we can use the standard normal distribution (the distribution of z-scores) for our computations and get the same results.
• In the distribution with a mean of 64.5 and a standard deviation of 2.5, 68% of the distribution is between 62 and 67 (-1 SD to +1 SD).
• In the standard normal distribution (with mean 0 and standard deviation 1), 68% of the distribution is between -1 SD and +1 SD.

[Figure: N(64.5, 2.5) => N(0, 1); height x is converted to z, the standardized height (no units)]

Problem 12: Women’s Heights
• The average woman is 64.5 inches tall.
• Mean = 64.5
• Standard Deviation = 2.5

Problem 12: Women’s Heights
• Maria is 67 inches tall (5’ 7”).
• What is Maria’s z-score?
• What percent of women are shorter than Maria?
• What percent are taller?

Problem 12: Women’s Heights
• Alexis is 62 inches tall (5’ 2”).
• What is Alexis’ z-score?
• What percent of women are shorter than Alexis?
• What percent are taller?

Problem 12: Women’s Heights
• Barbie is 69.5 inches tall (5’ 9.5”).
• What is Barbie’s z-score?
• What percent of women are shorter than Barbie?
• What percent are between Alexis and Barbie?

Problem 12: Women’s Heights
• Leela is 68 ¾ inches tall (5’ 8 ¾”).
• What is Leela’s z-score?
• Can we compute the percent of women who are shorter than Leela?
• Why or why not?

Problem 12: Women’s Heights
• Leela is 68 ¾ inches tall.
• Her z-score is (68.75 - 64.5) / 2.5 = 1.7
• Use http://davidmlane.com/hyperstat/z_table.html to compute the percent of women who are shorter than Leela.

Problem 12: Women’s Heights
• How tall do you have to be to be taller than 50% of the women?
• How tall do you have to be to be taller than 84% of the women?
• How tall do you have to be to be taller than 97.6% of the women?

Problem 12: Women’s Heights
• Use http://davidmlane.com/hyperstat/z_table.html for the following problems:
• How tall do you have to be to be taller than 95% of the women?
• How tall do you have to be to be taller than 99% of the women?
• We can be sure that 95% of the women are between what heights?

Problem 13
• Use http://davidmlane.com/hyperstat/z_table.html for the following problem:
• You have been timing how long it takes to get to work in the mornings. The mean is 22.6 minutes with a standard deviation of 8.16 minutes.
• You have to be at work at 8:30 am at the latest.
• How many minutes before 8:30 do you have to leave to be 95% confident that you will get there at or before 8:30?
• When do you have to leave to be 99% sure you’ll be there by 8:30?
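The worksheet problems above can also be checked in code rather than with the online z table. The following is a minimal sketch using only Python's standard `statistics` module (Python 3.8+), not the method prescribed by the course. It assumes, as the problems imply, that heights and commute times are normally distributed. The helper names `percent_shorter` and `height_at_percentile` are illustrative. `pstdev` is used because the worksheet divides SS by N (population standard deviation), not N - 1.

```python
from statistics import NormalDist, mean, pstdev

# Problems 10 and 11: z-scores from the eight raw scores.
scores = [1, 4, 4, 5, 5, 6, 7, 8]
m = mean(scores)
s = pstdev(scores)                  # sqrt(SS / N), as on the worksheet
z = [(X - m) / s for X in scores]
print(z)
print(mean(z), pstdev(z))           # z-scores have mean 0 and SD 1

# Problem 12: women's heights, N(64.5, 2.5).
heights = NormalDist(mu=64.5, sigma=2.5)

def percent_shorter(height_in):
    """Percent of women shorter than the given height (inches)."""
    return 100 * heights.cdf(height_in)

def height_at_percentile(p):
    """Height (inches) that is taller than a proportion p of women."""
    return heights.inv_cdf(p)

maria = 67
print((maria - heights.mean) / heights.stdev)   # Maria's z-score
print(round(percent_shorter(maria), 1))         # percent shorter than Maria
print(round(height_at_percentile(0.95), 2))     # taller than 95% of women

# Problem 13: commute times, assumed N(22.6, 8.16) in minutes.
commute = NormalDist(mu=22.6, sigma=8.16)
minutes_95 = commute.inv_cdf(0.95)  # leave this many minutes before 8:30
minutes_99 = commute.inv_cdf(0.99)
print(round(minutes_95, 1), round(minutes_99, 1))
```

`cdf` answers "what percent are below this value?" and `inv_cdf` answers the reverse question, "what value has this percent below it?", which are the two directions in which the z table is read.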