Please check, just in case…

Announcements
1. Terminology Treasure Hunt due in two weeks (Oct 29). Please check the resources provided in the folder on e-reserves and bring them with you to class NEXT WEEK.
2. I need two volunteers to meet me at my office next week before class to help with materials. Volunteers needed after class too.
3. Make an appointment to meet with me if you have any questions about upcoming assignments or class topics/concepts.

Quick questions, quandaries, comments, or concerns?

APA Tip of the Day: Ampersand
When there are two authors for a reference you cite, you need to cite both of them every time. When you cite them in a sentence, but not within parentheses, use “and.” When you put the citation within parentheses, use an ampersand (“&”).

Examples of “and” and the ampersand:
• According to Gomez and Garcia (2012), “this is very interesting” (p. 107).
• “This is very interesting” (Gomez & Garcia, 2012, p. 107).

Topic: Psychometric terminology
October 8, 2013

Tonight’s Terminology
• Reliability (review)
• Validity (review)
• Chronological age
• Raw score
• Norm-referenced (quick review)
• Normal curve

Definitions
Reliability:
1) “Reliability refers to the results obtained with an assessment instrument and not to the instrument itself.”
2) “An estimate of reliability always refers to a particular type of consistency” (e.g., over time, interrater reliability, with different tasks).
3) “Reliability is a necessary but not sufficient condition for validity.”
4) “Reliability is primarily statistical.”
(Linn & Gronlund, 2000, pp. 108-109)

Definitions, cont.
Validity:
1.) “Validity refers to the appropriateness of the interpretation of the results of an assessment procedure for a given group of individuals, not to the procedure itself.”
2.) “Validity is a matter of degree; it does not exist on an all-or-none basis.”
3.) “Validity is always specific to some particular use or interpretation.
No assessment is valid for all purposes.”
4.) “Validity is viewed as a unitary concept based on various kinds of evidence.”
5.) “Validity involves an overall evaluative judgment. It requires an evaluation of the degree to which interpretations and use of assessment results are justified by supporting evidence and in terms of the consequences of those interpretations and uses.”
(Linn & Gronlund, 2000, pp. 75-76)

“Validity is an evaluation of the adequacy and appropriateness of the interpretations and use of assessment results.” (Linn & Gronlund, 2000, p. 73)

Chronological Age?
Chronos = sequential time

Raw score
The number of items correct on a test. Without other information on the test, the raw score is meaningless; it must be interpreted for each child using the information from the test manual.

Norm-referenced Tests
Describe “performance in terms of the relative position held in some known group (e.g., typed better than 90 percent of the class members).” (Linn & Gronlund, 2000, p. 42)
NR assessments compare individual performance against others’ performance.

What is a Normal Distribution?
It is the idea that for a number of human characteristics, such as height or weight, most of the “cases” will cluster around the middle, with fewer at the high and low ends. So, if you plot the number of cases at each possible data point, you end up with the “bell curve.”

Ways to talk about the “middle” (measures of central tendency)
• Mean: the arithmetic average (the sum of the scores divided by the number of scores)
• Median: the score that half of the scores fall above and half fall below
• Mode: the most frequent score
In a normal distribution, the mean, median, and mode are exactly the same.

Why is a “normal curve” important?
• Norm-referenced assessments are based on the assumption that human abilities, like height and shoe size, are “normally distributed.”
• A normal distribution allows us to calculate a number of important statistics, which allow us to compare an individual’s score with the scores of their norm group.

Testing terminology related to the “normal curve”:
Measures of Variability:
• Range
• Standard deviation
Standard score
• Z-score
• Deviation IQ
• Stanine
• Percentile rank
• Age equivalency
• Grade equivalency

Measures of Variability
These tell us how spread out (or dispersed) the scores are.

#1 - Range
• A very rough measure of variability.
• It tells us the distance between the highest and the lowest score.
• If there are unusually high and/or low scores, this can be misleading.
• What is the range of height in our class?

#2 - Standard Deviation
“A measure of the variability, or dispersion, of a distribution of scores” (Harcourt, 2000, p. 8).

Standard Deviation
This tells us how closely grouped together or how far apart the raw scores on a particular test are. A standard deviation corresponds to a particular percentage of the scores, both above and below the mean. In a normal distribution:
• About two thirds (68%) of scores fall within 1 standard deviation of the mean.
• More than 95% of all scores fall within 2 standard deviations of the mean.
• Less than 2.5% of scores fall more than 2 standard deviations below the mean.

Standard Deviation, cont.
So, if the scores are very closely grouped, there will not be a lot of distance between the high and low raw scores for the majority of the students.
If there is a large amount of difference between most students’ scores (that is, high variability), one standard deviation will include a wider range of raw scores.

Standard Deviation, cont.
“Standard deviation is such an accurate measure of variability that if a distribution is reasonably normal, then by knowing only two numbers, the mean and the standard deviation, it is possible to reconstruct and redraw the distribution. Whenever the shape of a distribution of scores approaches normal, the standard deviation can be used as a measuring rod to lay off distances from the mean.”
http://library.athabascau.ca/caap6/613M1lesson1.pdf

“Normal” has a statistical definition: scores within the average range, not really, really high and not really, really low. Really high and really low are usually set in terms of standard deviations.

Scores more than two standard deviations above or below the mean typically qualify someone for special education (either as gifted or with a disability), because more than 95% of scores fall within 2 standard deviations of the mean.

So…
Standard deviations are a way of considering a particular score with reference to its distance from the mean and the percentage of scores that fall within that distance. This helps us figure out how high or low a particular score is in comparison to other scores.

Also…
Standard deviation is an important statistic because it lets us calculate “standard scores.”

Standard Score:
This is a way of representing raw scores that tells us how far above or below average that score is, in a way that is comparable across tests and across time.
Standard scores “translate” raw scores into a common way of representing scores.
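
To make the "measuring rod" idea concrete, here is a minimal Python sketch. The raw scores are made up for illustration, and the function names (`describe`, `sd_units_from_mean`, `share_within`) are hypothetical helpers, not part of any test manual. It computes the mean and standard deviation of a small set of scores, checks what share of scores fall within 1 and 2 standard deviations, and expresses one raw score as a distance from the mean in standard deviation units.

```python
from statistics import mean, pstdev

# Hypothetical raw scores for one class (made up for illustration).
raw_scores = [12, 15, 17, 18, 19, 20, 20, 21, 22, 23, 25, 28]

def describe(scores):
    """Return the mean and (population) standard deviation of the scores."""
    return mean(scores), pstdev(scores)

def sd_units_from_mean(score, scores):
    """How many standard deviations a score lies above (+) or below (-) the mean."""
    avg, sd = describe(scores)
    return (score - avg) / sd

def share_within(scores, k):
    """Fraction of scores that fall within k standard deviations of the mean."""
    avg, sd = describe(scores)
    return sum(abs(s - avg) <= k * sd for s in scores) / len(scores)

avg, sd = describe(raw_scores)
print(f"mean = {avg:.2f}, standard deviation = {sd:.2f}")
print(f"share within 1 SD: {share_within(raw_scores, 1):.0%}")
print(f"share within 2 SD: {share_within(raw_scores, 2):.0%}")
print(f"a raw score of 25 is {sd_units_from_mean(25, raw_scores):+.2f} SD from the mean")
```

With only twelve scores the shares will not match the 68% and 95% figures exactly; those percentages describe an idealized normal distribution, and a small class only approximates it.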
Different Types of Standard Scores:
• Z-Score
• Deviation IQ
• Stanine
• T-score
• Percentile
• Age Equivalency
• Grade Equivalency

Z-Scores
This simply tells you how many standard deviations above or below the mean a student’s score fell.
If a student’s raw score is 1 ½ standard deviations below the mean, what is his/her z-score?

Standard Deviation & z-scores

Deviation IQ
This is where 100 equals the average score, and each standard deviation equals 15 points. So, hypothetically, about 68% (roughly 2/3) of the population would receive a deviation IQ score between 85 and 115.
People popularly refer to this number as someone’s “IQ.” It is really just one of a number of ways of changing a raw score into a different kind of score that can be compared across tests or people.

Stanine
This is another kind of standard score, where the entire range of possible raw scores is divided up into nine groups. The lowest group of scores falls in Stanine 1. The highest group of scores falls in Stanine 9. Stanine 5 includes scores right in the middle (average): scores that fall between the 40th and 60th percentiles would be in Stanine 5.

Stanines
• Good for BROAD measurement of performance.
• Pros
  • Can compare scores across tests
  • Intervals are equal
• Cons
  • Rough estimate of performance (not as sensitive as other standard scores)

Percentile Rank
This tells us what percentage of children scored below your child on a particular measure (e.g., height, weight, test score).
For example, if your infant daughter is in the 35th percentile for weight, that means that 35% of girls at the same age as your daughter weigh less than she does. 65% of girls at the same age weigh the same or more.
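
The conversions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a scoring procedure from any test manual: the function names are hypothetical, the infant weights are made up, and `percentile_from_z` assumes the scores follow a normal curve exactly. `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import NormalDist

def deviation_iq(z):
    """Deviation IQ: the mean is 100 and each standard deviation is 15 points."""
    return 100 + 15 * z

def percentile_from_z(z):
    """Percent of a normal distribution that falls below the given z-score."""
    return 100 * NormalDist().cdf(z)

def percentile_rank(score, norm_group):
    """Percent of the norm group that scored below `score`."""
    below = sum(s < score for s in norm_group)
    return 100 * below / len(norm_group)

# A score 1.5 standard deviations below the mean has a z-score of -1.5.
z = -1.5
print(f"z = {z}: deviation IQ = {deviation_iq(z):.1f}, "
      f"percentile (normal curve) = {percentile_from_z(z):.1f}")

# Hypothetical weights (kg) for a small norm group of infants, made up for illustration.
weights = [5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5, 6.8, 7.2]
print(f"percentile rank of 5.8 kg: {percentile_rank(5.8, weights):.0f}")
```

Note that `percentile_rank` counts only scores strictly below the given score, matching the definition used in the example above; some sources count scores at or below instead.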
Standard Deviation & Percentile Rank

Percentile Ranks
Pros
• Generally easy to understand.
Cons
• Need to be careful, because the intervals between ranks are not equal. For example, the difference in underlying scores between the 50th and 60th percentiles is smaller than the difference between the 70th and 80th percentiles.

Age Equivalency
This tells us that a student’s score is similar to the average score of children at X age.
Ex. If your son’s language development (as measured by a particular test) is at the age equivalent of 4-3, it means that he scored on that particular test at the same level as the average child who was 4 years and 3 months old.

Age Equivalents
Pros
• Easy for parents to understand.
Cons
• Not equal units of measurement.
• Basically just a “ballpark” figure.
• Gives the erroneous idea that individuals with developmental delays are just like a child, but in an older body.

Grade Level Equivalency
• Very misunderstood concept.
• This compares a child’s score to the average performance of students at different grade levels, if they were to take the same test as your student.

Grade Level Equivalency Example
If your second grader has a grade level equivalency of 3.1 in language arts, it means that your child scored the same as the average third grader in the first month of the school year would have scored on the second grade test.
This does not mean that your child is doing third grade work or that he should be moved up into the third grade. It does mean that he is performing, on second grade work, more like a third grader would.

Please take a minute for the minute paper.