Section 1-1 Overview
Created by Tom Wegleitner, Centreville, Virginia

Overview
A common goal of surveys and other data-collecting tools is to collect data from a small part of a larger group so that we can learn something about the larger group. In this section we will look at some ways to describe data.

Definitions
Data: observations (such as measurements, genders, survey responses) that have been collected.
Statistics: a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
Population: the complete collection of all elements (scores, people, measurements, and so on) to be studied. The collection is complete in the sense that it includes all subjects to be studied.
Census: the collection of data from every member of the population.
Sample: a sub-collection of elements drawn from a population.

A researcher wants to study the effects of smoking on cholesterol level in Jackson County.
What would be his population? All adults in Jackson County who smoke at least one pack per day.
What could possibly be his sample? Some reasonable number of smokers in Jackson County who smoke at least one pack per day.

A sociologist hypothesizes that the average annual income of households in Marianna is less than $25,000. To test her hypothesis, she samples 500 households in the city and determines the income of each.
Describe the population. The set of all households in Marianna.
Describe the sample. The sample must be a subset of the population. In this case, it is the 500 households selected by the sociologist.

“Cola War” is the popular term for the intense competition between Coca-Cola and Pepsi displayed in their marketing campaigns. Their campaigns have featured movie and television stars, rock videos, athletic endorsements, and claims of consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test. Each consumer is asked to state a preference for Brand A or Brand B.
What is the population? The population of interest is the set of all consumers of “cola” products.
What is the sample? The sample is the 1,000 cola consumers selected from the population of all cola consumers.

Section 1-2 Types of Data

Definitions
Parameter: a numerical measurement describing some characteristic of a population.
Statistic: a numerical measurement describing some characteristic of a sample.
Quantitative data: numbers representing counts or measurements. Example: weights of supermodels.
Qualitative (or categorical or attribute) data: data that can be separated into different categories that are distinguished by some nonnumeric characteristic. Example: genders (male/female) of professional athletes.

Classify each variable as qualitative or quantitative.
• Colors of automobiles in a dealer’s showroom.
• Number of seats in movie theaters.
• Classification of patients based on nursing care needed (complete, partial, or self care).
• Lengths of newborn cats of a certain species.
• Number of complaint letters received by an airline per month.

Working with Quantitative Data
Quantitative data can be further divided into discrete and continuous types.

Definitions
Discrete data: result when the number of possible values is either finite or ‘countable’ (the possible values can be listed as 0, 1, 2, 3, . . .). Example: the number of eggs that hens lay.
Continuous (numerical) data: result from infinitely many possible values that correspond to some continuous scale covering a range of values without gaps, interruptions, or jumps. Example: the amount of milk that a cow produces, e.g. 2.343115 gallons per day.

Classify each variable as discrete or continuous.
• Number of cartons of milk manufactured each day.
• Temperatures of airplane interiors at a given airport.
• Incomes of college students on work study programs.
• Weights of newborn calves.
• Number of tomatoes on each plant in a field.

Levels of Measurement
Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides.

Definitions
Nominal level of measurement: characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high). Example: survey responses (yes, no).
Ordinal level of measurement: involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. Example: course grades A, B, C, D, or F.
Interval level of measurement: like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present). Example: the years 1000, 2000, 1776, and 1492.
Ratio level of measurement: the interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, both differences and ratios are meaningful.

Summary: Levels of Measurement
Nominal - categories only
Ordinal - categories with some order
Interval - differences but no natural starting point
Ratio - differences and a natural starting point

Classify each as nominal, ordinal, interval, or ratio level data.
• Horsepower of motorcycle engines.
• Ratings of newscasts in Houston (poor, fair, good, excellent).
• Temperature of automatic popcorn poppers.
• Time required by drivers to complete a course.
• Marital status of respondents to a survey on savings accounts.
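The four levels of measurement summarized above can be read as a short chain of yes/no questions. The following Python sketch is one possible way to write that chain down; the function name `level_of_measurement` and its three parameters are assumptions made for illustration and do not come from the slides.

```python
def level_of_measurement(ordered: bool,
                         differences_meaningful: bool,
                         natural_zero: bool) -> str:
    """Return the level of measurement implied by three yes/no properties."""
    if not ordered:
        return "nominal"    # categories only
    if not differences_meaningful:
        return "ordinal"    # categories with some order
    if not natural_zero:
        return "interval"   # differences, but no natural starting point
    return "ratio"          # differences and a natural starting point


# Examples from the definitions in this section.
print(level_of_measurement(False, False, False))  # survey responses (yes, no)
print(level_of_measurement(True, False, False))   # course grades A through F
print(level_of_measurement(True, True, False))    # calendar years
# Assumed extra example (not from the slides): weights, where zero means no weight.
print(level_of_measurement(True, True, True))
```

The calendar years show why the last question matters: the year 2000 is 1000 years after the year 1000, which is a meaningful difference, but it is not "twice" the year 1000, because year zero does not mark the absence of time.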
Recap
In Sections 1-1 and 1-2 we have looked at:
• Basic definitions and terms describing data
• Parameters versus statistics
• Types of data (quantitative and qualitative)
• Levels of measurement

Key Concepts
Sample data must be collected in an appropriate way, such as through a process of random selection. If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
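To tie the recap together, here is a minimal Python sketch of random selection, loosely modeled on the household-income example from Section 1-1. The population is simulated: the income range, the population size of 10,000, and the fixed seed are all assumptions chosen for illustration, not real data. The sketch uses only the standard library (`random.sample` and `statistics.mean`).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: annual incomes (in dollars) of 10,000 households.
# These values are simulated for illustration only.
population = [random.randint(10_000, 60_000) for _ in range(10_000)]

# A census uses every member of the population, so this mean is a parameter.
parameter = statistics.mean(population)

# A simple random sample of 500 households, drawn without replacement.
sample = random.sample(population, 500)

# The mean of the sample is a statistic; it estimates the parameter.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

Because every household has the same chance of being selected, the sample mean tends to land near the population mean. A sample collected in a non-random way, for example only households that volunteer, carries no such assurance.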