Preliminary Concepts

Why Do We Need Statistics?

Statistics is about making decisions. Consider the examples below.

- You would like to know whether your English skills are good enough to take psychology courses in English. Should you focus on practicing English and take more courses on reading and writing? The simplest way is to take a test and see your score. Let’s say it is 110. Is it a good score?
- You need to buy a simple calculator. How much will you pay for it? Is there a shop in your neighborhood where you can buy the same calculator at a lower price? Let’s say there are sixteen shops in your town. How will you find the cheapest one?
- Which is the better teaching technique: giving out the course notes and presentations, or requiring students to take notes during class?
- You would like to know which goalkeeper performed better last season. You need to count, for each goalkeeper, the number of shots stopped, the number of penalty kicks saved, and the number of games played.
- You hate ÖSYM and the university exam. You believe that the LYS has nothing to say about a student’s future success at university. That is, students’ GPA (Grade Point Average) cannot be predicted from their LYS score. How can you prove it?

Even in everyday life, we need to make decisions under ambiguous conditions, or under conditions in which there is a huge amount of information. In such conditions, we need an effective tool to organize the information that we have. Statistics provides such a mathematical tool, by which we can summarize the existing information and/or make predictions or inferences.

Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances (Davidian, M. and Louis, T. A., 10.1126/science.1218685).
Statisticians apply statistical thinking and methods to a wide variety of scientific, social, and business endeavors in such areas as astronomy, biology, education, economics, engineering, genetics, marketing, medicine, psychology, public health, and sports, among many others.

"The best thing about being a statistician is that you get to play in everyone else's backyard." (John Tukey, Bell Labs, Princeton University)

Basic Concepts: Descriptive and Inferential

Two kinds of statistics can be distinguished.

Descriptive statistics (deduction) is the discipline of quantitatively describing the main features of a collection of data. Suppose that you visited each shop in your town and checked the prices of the calculators.

Shop                         Casio ZX (TL)   Sharp Q (TL)   Yumatu 1.0 (TL)
Teknosa                      5               6              2
Mediamarkt                   6               4              3
Migros                       6               5              3
BİM                          4               6              1
KİPA                         5               4              3
Selim Kırtasiye              7               5              3
Cafer’in Yeri                6               5              3
Elektro world                5               5              2
Tchibo                       7               5              3
Ayfer Kırtasiye              6               4              4
Cümbüş Kırtasiye             6               5              3
Koçtaş                       7               5              2
Abdurrahman Abinin Tezgahı   4               5              3
Bizim sokaktaki tezgah       3               5              2
Sizin sokaktaki tezgah       4               4              3

Basic Concepts: Descriptive Statistics

Let’s say you decided to buy the Casio. Now we need to rearrange our table, sorting it by the Casio price, to see the lowest price for the Casio.

As the table sorted by Casio price shows, Bizim sokaktaki tezgah offers the best price (3 TL). Looking closely at the table, we can see other characteristics of the distribution. The most common price for the Casio is 6 TL: thirty-three percent of the shops sell it for 6 TL (5/15*100 = 33.33). The highest price for the Casio is 7 TL, and twenty percent of the shops offer that price (3/15*100 = 20.00). Based on the present table, several other deductions can be made: for instance, which shop offers the best price for the Yumatu, or in which shop we see the biggest difference between the prices of the Sharp and the Yumatu.
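The counts and percentages quoted for the Casio prices can be reproduced with a short Python sketch. The dictionary simply re-enters the Casio column of the price table; the function name describe and its output keys are illustrative choices, not part of the original material.

```python
from collections import Counter

# Casio ZX prices (TL) for the fifteen shops, copied from the price table.
casio_prices = {
    "Teknosa": 5, "Mediamarkt": 6, "Migros": 6, "BİM": 4, "KİPA": 5,
    "Selim Kırtasiye": 7, "Cafer'in Yeri": 6, "Elektro world": 5,
    "Tchibo": 7, "Ayfer Kırtasiye": 6, "Cümbüş Kırtasiye": 6,
    "Koçtaş": 7, "Abdurrahman Abinin Tezgahı": 4,
    "Bizim sokaktaki tezgah": 3, "Sizin sokaktaki tezgah": 4,
}


def describe(prices):
    """Summarize a shop -> price mapping (hypothetical helper)."""
    counts = Counter(prices.values())      # how many shops charge each price
    n = len(prices)
    cheapest_shop = min(prices, key=prices.get)
    most_common_price, most_common_count = counts.most_common(1)[0]
    highest_price = max(prices.values())
    return {
        "cheapest_shop": cheapest_shop,
        "lowest_price": prices[cheapest_shop],
        "most_common_price": most_common_price,
        "most_common_percent": round(100 * most_common_count / n, 2),
        "highest_price": highest_price,
        "highest_percent": round(100 * counts[highest_price] / n, 2),
    }


summary = describe(casio_prices)
for key, value in summary.items():
    print(key, value)
```

The same function could be applied to the Sharp or Yumatu column to answer the other questions raised about the table.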
Prices sorted by Casio price (highest first):

Shop                         Casio ZX (TL)   Sharp Q (TL)   Yumatu 1.0 (TL)
Selim Kırtasiye              7               5              3
Tchibo                       7               5              3
Koçtaş                       7               5              2
Mediamarkt                   6               4              3
Migros                       6               5              3
Cafer’in Yeri                6               5              3
Ayfer Kırtasiye              6               4              4
Cümbüş Kırtasiye             6               5              3
Teknosa                      5               6              2
KİPA                         5               4              3
Elektro world                5               5              2
BİM                          4               6              1
Abdurrahman Abinin Tezgahı   4               5              3
Sizin sokaktaki tezgah       4               4              3
Bizim sokaktaki tezgah       3               5              2

Basic Concepts: Inferential Statistics

Inferential statistics (induction) aims to make predictions based on the analysis of numeric data. Inferential statistics is about probability. With the aid of inferential statistics, we can see whether our predictions are better than chance.

Let’s turn back to our example about your English skills. When you get a certain score on a test (110 points in our example), at least three questions arise:

Q1: Is this your true score?
Q2: What is the meaning of your score?
Q3: Can we take the score on this test as a predictor of prospective (future) success in psychology courses?

Q1: Is this your true score? Were you tired when you took the test? Or did the test cover subjects that you are very familiar with? Or were you simply lucky? (Lucky guesses are an inevitable part of multiple-choice tests.)

One way to see whether your score was affected by chance or other factors is to complete an equivalent test or the same test again. Of course, you would learn the items if you took the same test. Let’s say you find equivalent tests and complete ten of them, so that together with the original test you have eleven scores.

Test Number   Score
1             110
2             96
3             98
4             96
5             112
6             98
7             106
8             106
9             96
10            98
11            89
Mean          100.45

So, which of them is your true score? Should we accept the mean as your true score? Note, however, that you never actually scored 100.45, and it does not seem possible to obtain such a score on the test.
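The mean of the eleven test scores can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative.

```python
from statistics import mean

# The eleven test scores from the table (test 1 is the original score of 110).
scores = [110, 96, 98, 96, 112, 98, 106, 106, 96, 98, 89]

mean_score = mean(scores)

print("number of tests:", len(scores))
print("mean score:", round(mean_score, 2))
print("lowest and highest score:", min(scores), max(scores))
# The mean is a summary value; it need not equal any score actually obtained.
print("mean was actually obtained:", mean_score in scores)
```

The last line makes the point of the text concrete: the mean summarizes the scores without being one of them.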
So, what we need to do first is to find out (estimate) your true score.

Q2: What is the meaning of your score? What is the range of scores that can be obtained on the test? Let’s say the possible range is between 25 and 150. Is that enough to say your score is OK? What you need is a reference point. If you find a way to compare your score with a particular score, you can decide whether your English is good or bad. There are two kinds of reference points.

First, you can ask your classmates to complete the same test and simply evaluate your rank among their scores. Let’s say you are better than sixty percent of your classmates on that test. Shall we take that as evidence of your superiority in English? Let’s examine the table.

Rank   Name    Test Score   Cumulative Percent
1      Ayda    148          100
2      Funda   146          90
3      Melda   146          90
4      You     110          70
5      Selda   109          60
6      Selma   109          60
7      Şeyma   108          40
8      Ceyda   108          40
9      Arda    108          40
10     Hülya   107          10

As you can see, 30% of your classmates got the highest scores. The difference between the next higher score (146) and your score is 36 points, while the difference between your score and the next lower score (109) is 1 point. Do you still think your proficiency is better than that of most of your classmates?

As a second way, you can compare your score with national cut point(s). For instance, test developers might publish a chart for interpreting the scores: 20-70 beginner, 71-90 intermediate, 91-110 upper-intermediate, 111-150 advanced. Your original score was 110. That score is the upper limit for upper-intermediate. That is, your score is right at the border between upper-intermediate and advanced. According to the manual, you should be categorized as upper-intermediate. Do you agree with that?
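Both kinds of reference point can be sketched in Python. The class scores and the cut points are taken from the text; the function names (percent_below, cumulative_percent, classify) are hypothetical helpers introduced only for illustration.

```python
# Test scores of the ten people in the class table (including you).
class_scores = {
    "Ayda": 148, "Funda": 146, "Melda": 146, "You": 110, "Selda": 109,
    "Selma": 109, "Şeyma": 108, "Ceyda": 108, "Arda": 108, "Hülya": 107,
}

# Cut points from the interpretation chart: (lowest, highest, label).
CUT_POINTS = [
    (20, 70, "beginner"),
    (71, 90, "intermediate"),
    (91, 110, "upper-intermediate"),
    (111, 150, "advanced"),
]


def percent_below(score, scores):
    """Percentage of the group scoring strictly below the given score."""
    return 100 * sum(s < score for s in scores) / len(scores)


def cumulative_percent(score, scores):
    """Percentage of the group scoring at or below the given score."""
    return 100 * sum(s <= score for s in scores) / len(scores)


def classify(score):
    """Return the chart label whose range contains the score."""
    for low, high, label in CUT_POINTS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the chart")


all_scores = list(class_scores.values())
your_score = class_scores["You"]

# Reference point 1: rank among classmates.
print(percent_below(your_score, all_scores))        # share of the class you beat
print(cumulative_percent(your_score, all_scores))   # cumulative percent
gap_up = min(s for s in all_scores if s > your_score) - your_score
gap_down = your_score - max(s for s in all_scores if s < your_score)
print(gap_up, gap_down)

# Reference point 2: national cut points.
print(classify(your_score), classify(your_score + 1))
```

The rank alone hides how unevenly the scores are spaced, which is why the sketch also reports the gaps to the neighboring scores.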
So, the second thing that we need to infer is whether your score differs significantly from a meaningful reference point.

Q3: Can we take the score on this test as a predictor of prospective (future) success in psychology courses? On what basis? Let’s say some famous psychologists took the same test right at the beginning of their first semester at Çağ University.

Psychologist    English Proficiency   GPA, 1st Year
Erikson         145                   85
Skinner         135                   82
Sigmund Freud   95                    78
Pavlov          90                    77
Reich           90                    76

As can be seen in the table, there is a relation between the test scores and GPA. As the proficiency scores increase, GPA increases. This pattern is called positive correlation. In the case of negative correlation, one score decreases as the other increases. If the correlation (relation) between proficiency and GPA is strong enough, then we can infer your future success.

Considering the table, we can see that your proficiency score (110) is between Skinner’s and Freud’s. So, your GPA will most probably be between 78 and 82. Congratulations, you have the potential to become a better psychologist than Freud.

So, the third thing that we need to infer is whether we can predict your GPA from your proficiency score.

In sum, descriptive statistics is about describing certain characteristics of the sample or population, whereas inferential statistics is about predicting certain characteristics of the population by evaluating the characteristics of the sample. At this point, we need to define what a sample and a population are.

Basic Concepts: Population and Sample

A population can be defined as including all people or items with the characteristic one wishes to understand. Let’s say you believe that blonde girls are not that clever.
In this example, all blonde girls in the world are your population. The characteristic that you are interested in is their level of intelligence.

To clarify your hypothesis, you need to limit your population. So, are you also interested in girls who have dyed their hair blonde? Probably not. Then you should restate your argument: naturally blonde girls are not that clever. Once you define your population properly, you can start collecting data on the characteristic that you are interested in.

A sample is a subset of the population that we can reach and collect data from. Let’s say you are really eager to conduct a study on the level of intelligence of naturally blonde girls. Since it is not possible to reach every blonde girl in the world, you need to find a subgroup and give them your IQ test.

Sampling is a vital issue for statistics and research methods. The main purpose of sampling is to obtain the most representative subset of the population. If your sample is not representative, your findings will not be valid.

In the 1936 American presidential election, Roosevelt, a Democrat, was being challenged by the Republican Alf Landon. One of the leading magazines of the day, Literary Digest, surveyed voter preferences by mailing questionnaires to 10 million people whose names were gathered from lists of automobile and telephone owners. Over two million people responded, and the results indicated that Landon would beat Roosevelt by a landslide.

In fact, Roosevelt beat Landon by one of the largest margins ever. This was one of the largest surveys ever taken. How could it have been so wrong? The US was in the middle of the Great Depression in 1936, and only a minority of people were financially secure enough to own a car or a telephone. They tended to vote Republican.
Most other Americans were worried about buying enough food to feed their families, and they tended to vote Democratic.

To ensure representativeness, inferential statistics requires random sampling. With random sampling, we ensure that each possible sample of the same size has an equal probability of being selected from the population. For instance, suppose that we wish to select five persons at random from our current statistics class. What we need to do is write the name of each class member on a slip of paper, put those slips in a gallon jar, shake and tumble the contents of the jar well, and withdraw five slips from the lot.

Basic Concepts: Variables and Constants

A variable is a characteristic that can take on different values. Considering our hypothesis about blondes, you can see that the variable we are interested in is the level of intelligence. When we measure blonde girls’ intelligence, we can see that their scores are not identical. In fact, statistics is about variability. With the aid of statistical techniques, we try to organize and understand the variability in nature.

A constant is a characteristic that is identical for each member of the sample. For instance, hair color and gender would be constants in our hypothetical study on the level of intelligence of blonde girls. Additionally, constants delimit the applicability of our findings. Even if we observed an intelligence deficiency in blonde girls, it would not say anything about redheads or blonde boys.

Scales of Measurement

Measurement is the process of assigning numbers to observations. Let’s discuss how we measure the properties below.

- Weight of a box: a weighing machine
- Length of a table: a ruler
- Beauty of a competitor in a beauty contest: (?)
- Gender of a participant: (?)
- Success of a football team in the league: (?)

What about the meaning of the numbers that we assign?
Are they the same? If the weighing machine shows zero, can we take that number as an indicator of no weight at all? What about the judge in the beauty contest? If he assigns zero to a competitor, does it mean she has no beauty? Let’s say Galatasaray won 30 games last year, and Fiskobirlik won 15 games. Can we say Galatasaray won twice as many games as Fiskobirlik? Let’s say the rank of Galatasaray is 2 and that of Fiskobirlik is 12. Does that mean Galatasaray’s rank of success is 6 times higher than Fiskobirlik’s? What about the beauty contest? If Aylin wins the contest and Jale comes third, does it mean Aylin is three times more beautiful than Jale?

Apparently, numbers have different meanings in these situations. To distinguish these kinds of situations, we need to identify four kinds of measurement scales:

- Nominal scales
- Ordinal scales
- Interval scales
- Ratio scales

Scales of Measurement: Nominal Scales

Nominal scales are the simplest kind of scale. Some variables are qualitative rather than quantitative in nature: for instance, biological sex, types of cheese, brand names of cell phones, etc. Numbers in nominal scales have no meaning other than indicating different categories. If we assign 1 to males and 2 to females, there is no implication that females are “more than” males on some dimension.

Nominal scales have only two requirements:

- The categories have to be mutually exclusive: an observation cannot fall into more than one category.
- The categories have to be exhaustive: there must be enough categories for all observations.

Examples: Male and female are mutually exclusive and exhaustive categories for biological sex. What about gender (social sex)? Some individuals in the biological female category might feel much more like they are male. So, we need to include other categories like gay, lesbian, transsexual, etc.
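The two requirements of a nominal scale can be expressed as a simple check: every observation must match at least one category (exhaustive) and at most one category (mutually exclusive). The sketch below is illustrative only; the cheese names, the numeric codes, and the function names are hypothetical and do not come from the course material.

```python
def matching_codes(observation, coding):
    """Return every category code whose value set contains the observation."""
    return [code for code, values in coding.items() if observation in values]


def check_nominal_scale(observations, coding):
    """Check a category coding against the two nominal-scale requirements."""
    uncovered = [o for o in observations if len(matching_codes(o, coding)) == 0]
    ambiguous = [o for o in observations if len(matching_codes(o, coding)) > 1]
    return {
        "exhaustive": not uncovered,          # every observation has a category
        "mutually_exclusive": not ambiguous,  # no observation has two categories
        "uncovered": uncovered,
        "ambiguous": ambiguous,
    }


# Hypothetical coding of cheese types; the codes 1, 2, 3 are labels only.
cheese_coding = {1: {"cheddar"}, 2: {"feta"}, 3: {"gouda"}}
observed = ["feta", "cheddar", "brie"]

result = check_nominal_scale(observed, cheese_coding)
print(result)
```

Here the coding fails the exhaustiveness requirement because one observation has no category, which mirrors the argument for adding categories when the existing ones do not cover everyone.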
Scales of Measurement: Ordinal Scales

- A more complex scale than the nominal one
- The categories must still be mutually exclusive and exhaustive
- They also indicate the order of magnitude of some variable
- The outcome of an ordinal scale is a set of ranks

Examples:

- Socio-economic status (SES): low, middle, high
- College students: freshmen, sophomores, juniors, and seniors

Numbers can be assigned to the categories, but those numbers have no meaning other than the rank order. Let’s consider our SES example: is the difference between low and middle equal to the difference between middle and high?

Scales of Measurement: Interval Scales

The next major level of complexity is the interval scale. Interval scales have all the properties that ordinal scales have. Additionally, the intervals (distances) between scores have the same meaning anywhere on the scale.

Examples:

- Level of depression on the Beck Depression Scale
- Pain
- Temperature scales: Celsius and Fahrenheit

Let’s discuss the Celsius scale. The difference between 10°C and 20°C is equal to the difference between 20°C and 30°C. That is, the energy you need to raise the temperature of a certain amount of water from 10°C to 20°C is equal to the energy needed to raise it from 20°C to 30°C. What about 0°C? Does it mean there is no heat?

Scales of Measurement: Ratio Scales

Ratio scales are the most complex and advanced scales. They possess all the properties of interval scales and, in addition, have an absolute zero point. Grams for weight and centimeters for height are some examples. If something is zero grams, then it has no mass. If something is zero centimeters, then it has no length. Kelvin is a good example. Unlike Celsius and Fahrenheit, Kelvin has an absolute zero point. That is, at zero kelvin a substance would have no molecular motion (energy) and, therefore, no heat.

Why does the absolute zero point matter? Imagine we want to measure the temperature of our classroom on the Celsius scale. Let’s say it is 30°C.
One of our friends says it was 15°C last winter. So, does that mean it is now twice as hot as last winter? No, it doesn’t. Since the zero point of the Celsius scale is not absolute, we can move it up or down. Let’s say we decided to move it 10°C lower. Our new Celsius scale would then show 40°C for the current temperature and 25°C for last winter. So, it is not meaningful to assert that a temperature of 30°C is twice as hot as one of 15°C, or that a rise from 30°C to 33°C is a 10% increase.

Final notes about scales

The ratio scale subsumes all other scales: Ratio > Interval > Ordinal > Nominal

Computation with the scores:

- Nominal scales: clustering
- Ordinal scales: clustering and rank order
- Interval scales: addition and subtraction
- Ratio scales: addition, subtraction, multiplication, and division

Variables and Computational Accuracy

Variables may be either discrete (Turkish: kesikli) or continuous (Turkish: sürekli).

Discrete variables can take on only certain values. For instance, the number of students in our classroom is discrete. It is 43 this week, but it was 42 last week. No value can fall between these two.

Continuous variables can take on any value. For instance, temperature can be 29°C, 29.4°C, or 30°C.

Even though a variable may be continuous in theory, the process of measurement always reduces it to a discrete one. Imagine that the true weight of a tomato is 0.23138 kilogram. A standard weighing machine is not that sensitive. It would measure weight to the nearest thousandth of a kilogram, so it would show 0.231. Is that a problem?

Within the limits of the recording equipment, it is up to the investigator to determine the degree of accuracy appropriate to the problem at hand. If you want to buy a tomato, 0.00038 kilogram is not important. What if you would like to buy gold?

1 kg of tomatoes is 1.20 TL. So, 0.00038 kg is 0.000456 TL.
1 g of gold is 101 TL.
So, 0.00038 kg (0.38 g) is 38.38 TL.

In psychology, we also need to be very careful about computational accuracy. If a psychologist works on a theoretical construct that is not directly related to individuals’ well-being, accuracy will not be that important. On a paper-and-pencil attitude measure, it will not matter much if a participant rates his/her favorability toward an attitude object as 7 while his/her true attitude is 8. What about intelligence, aptitude, or skills?
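The tomato and gold arithmetic from this section can be checked with a small sketch. It uses Python's decimal module to avoid floating-point noise; the prices and the true weight are the ones given in the text, while the variable names are illustrative. Note that 0.00038 kg equals 0.38 g, so at 101 TL per gram the gold price is converted to a per-kilogram price before multiplying.

```python
from decimal import Decimal

true_weight_kg = Decimal("0.23138")

# A scale that reads to the nearest thousandth of a kilogram.
displayed_kg = true_weight_kg.quantize(Decimal("0.001"))
error_kg = true_weight_kg - displayed_kg

tomato_price_per_kg = Decimal("1.20")             # TL per kilogram
gold_price_per_kg = Decimal("101") * 1000         # 101 TL per gram, in TL per kg

tomato_error_cost = error_kg * tomato_price_per_kg
gold_error_cost = error_kg * gold_price_per_kg

print("displayed weight (kg):", displayed_kg)
print("measurement error (kg):", error_kg)
print("cost of the error for tomatoes (TL):", tomato_error_cost)
print("cost of the error for gold (TL):", gold_error_cost)
```

The same measurement error is negligible for tomatoes but costs tens of lira for gold, which is why the required accuracy depends on the problem at hand.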