SAMPLING ISSUES: PART I

LEONIE HUDDY, STONY BROOK UNIVERSITY
MATTHEW BAUM, HARVARD UNIVERSITY

OUTLINE
I. Major Sources of Survey Error
II. Coverage Error
  1. Coverage Problems in US (Phone & Web)
  2. Coverage Issues in Sweden
  3. Implications for Experiments
III. Non-Response Error
  1. Rates in the US
  2. Factors Influencing Response Rates
  3. Response Rates in Sweden
IV. Survey Mode Errors

I. MAJOR SOURCES OF SURVEY ERROR (ALWIN/GROVES)
1. Coverage error: Error due to failure to include some elements of the population in the sampling frame (e.g., cell phones in an RDD landline study in the US, non-computer households in a web survey)
2. Sampling error: Error due to sampling a subset rather than the entire population.
3. Non-response error: Error due to failure to obtain data from all selected population elements (young males harder to reach; Latinos reluctant)
4. Measurement error: Error that occurs when the observed value differs from the true value (e.g., over-reporting of voter turnout in the ANES)
These errors also apply to survey experiments.

II. COVERAGE ERROR (GROVES)
DEFINITIONS
Sampling frame: the set of lists or procedures intended to identify all elements of the target population; e.g., RDD, national registry (SPAR), US Postal Mail Delivery System
Coverage:
• Undercoverage: some population elements are missing from the sampling frame (e.g., cell phone users, who are disproportionately young and less affluent, in an RDD landline study; older respondents who lack computers or broadband in a web survey)
• Ineligible units (e.g., non-working phone numbers)
• Clustering: several target elements linked to a single frame element (several people with 1 phone number)
• Duplication: a single target element linked to multiple frame units (a person listed more than once in a national registry)

2.
NON-COVERAGE PROBLEMS IN TELEPHONE SAMPLES

NON-LANDLINE HOUSEHOLDS
• younger, more mobile, less affluent
• more ethnic and racial minorities
• live in rural areas, south, central cities
• Reaching 25% of the US population

Solution?
• Base the sample on a mix of cell phone-only and landline households, and eliminate those with landlines from the cell phone sample
• Post-stratification weights based on demographic factors

[Figure: The growing cell-only population by age (Pew 2010)]
[Figure: Age composition of landline phone samples (Pew 2010)]

3. NON-COVERAGE ON WEB: % BROADBAND AT HOME (PEW)

                          2005   2006   2007   2009
All adult Americans        30%    42%    47%    60%
Gender
  Male                     31     45     50     61
  Female                   27     38     44     58
Age
  18-29                    38     55     63     76
  30-49                    36     50     59     67
  50-64                    27     38     40     56
  65+                       8     13     15     26
Race/Ethnicity
  White (not Hispanic)     31     42     48     63
  Black (not Hispanic)     14     31     40     52
Education
  Less than high school    10     17     21     24
  High school grad         20     31     34     46
  Some college             35     47     58     73
  College +                47     62     70     83
Income
  Under $30K               15     21     30     42
  $30K-50K                 27     43     46     62
  $50K-$75K                35     48     58     73
  Over $75K                57     68     76     83

4. NON-COVERAGE ISSUES IN SWEDEN
Telephone & Mail
• Sweden has a low rate of cell-only households, ≤ 5% (Heckel & Wiese 2012; in Telephone Surveys in Europe, ed. Häder, Häder, & Kühne; Springer, Heidelberg)
• Non-coverage is far less of a problem in Sweden because samples are drawn from the national SPAR registry
• No sampling frame is ever 100% complete, so there may still be minor non-coverage issues
Internet
• “Sweden has a unique position in the world when it comes to Internet use, not only because it is one of the countries with highest share of Internet users in the world but also because Internet use is more widely spread in Swedish society compared to other countries, in terms of age and educational level (Findahl 2007; 2008b). Among younger Swedes 16–25 years old almost all (97%) use the Internet at least once a month; among older Swedes 56–65 years old Internet use is currently as high as 75%.
The corresponding figure among individuals 66–75 years old is lower, however, at 51% (Findahl 2008a).” (quoted in Källmén et al. 2011)

SPAR: NATIONAL SWEDISH POPULATION REGISTRY
Statens personadressregister (SPAR) includes all persons who are registered as resident in Sweden.
• The data in SPAR are updated each day with data from the Swedish Population Register.
• SPAR is specifically regulated in Swedish law by the Act (1998:527) on statens personadressregister, the Regulation (1998:1234) on statens personadressregister, and the Swedish Tax Agency Regulation on handing out data from SPAR (SKVFS 2011:06).
The aim of SPAR is clear from the purposes set out in article 3 of the Act. It states that personal data in SPAR may be processed to:
• update, supplement and verify personal information, or
• select names and addresses for direct marketing, public service announcements or other comparable activities.
Processing data in this respect is the same as handing out the data electronically. Data in SPAR are, after a decision by the Swedish Tax Agency, handed out electronically at cost price.

III. NON-RESPONSE ERROR
Two key types of non-response:
• Non-contact: the failure to reach the chosen respondent
• Refusal: the chosen respondent does not cooperate
Response rates have declined precipitously in the US over the last 2 decades:
• Contact rates by telephone dropped dramatically after 2000 and the introduction of caller ID
• Refusal rates are higher in urban areas

RESPONSE RATES IN THE U.S.
Response Rate = Number of people who completed an interview / total number of eligible respondents contacted (including not at home, refused, etc.)
• Household CAPI or in-person surveys: in the U.S. these are around 50-60% at university research centers.
• Telephone surveys: in the US, 40-50% at university centers using very stringent and expensive methods; lower for typical phone surveys at university centers (25-35%); much lower for marketing and media (6-20%).
• Mail surveys: very variable; possible to get a 15-20% RR with follow-up, but it depends on the population.
• Web surveys: depends on the population. Could be as high as 50-70% within an organization with a known email list and organizational support, or <1% with a random group (e.g., banner ad recruitment).

WHAT INFLUENCES RESPONSE RATES (RR)?
1. SURVEY MODE: Highest for household in-person interviews, generally lowest on the web.
2. RESPONDENT SUBGROUP: Non-response is often higher in cities; it can also vary with age (the young are harder to contact) and gender (men are harder to contact).
3. TYPE OF SURVEY ORGANIZATION: Academic polls vs. commercial. RR is typically higher when the survey is conducted by an academic or non-profit organization.
4. UNIT OF INTERVIEW: Higher RR if anyone in the home or a surrogate can be interviewed.
   • US National Health Interview Survey (NHIS): non-response rate is xx%
   • Sweden Census (SCB) contacts relatives of respondents to increase RR (from Jacob Sohlberg)
5. EFFORT TO REACH NON-RESPONDENTS: A greater number of contact attempts, use of financial incentives, refusal conversion, and a longer interviewing period all raise RR, but all increase costs.
6. SURVEY TOPIC AND RESPONDENT INTEREST: Slightly higher RR on surveys about topics in the news or those in which the respondent is very involved and interested.

WHO IS MISSING? US NON-RESPONDENTS
1. Age: Under-represent the young
   • Largely due to non-contact
2. Gender: Under-represent men
   • More difficult to contact and refuse more often
3. Race/Ethnicity:
   • Over-represent blacks by phone
   • Under-represent blacks in household in-person surveys
Typical solution: weight respondents to demographic population benchmarks

NON-RESPONSE RATES IN SWEDEN
• Vary by survey mode (mail, phone, web, IVR)
• RR remains high in mail surveys

RESPONSE RATES WEB VS.
MAIL: SWEDEN (KÄLLMÉN ET AL. 2011)
Two random samples of 1250 individuals each were drawn from the same national register (DAFA-SPAR) covering all Swedish individuals (aged 17-71) with a registered address.
1. Electronic, web-based response group: received a postcard with the same introductory text, a URL link and a log-in code to the electronic version of the questionnaire
2. Paper-and-pen response group
Two reminders were sent, three and six weeks after the main mailing.
After the first mailing, 314 individuals (25%) responded to the AUDIT paper version and 167 (13%) responded to the web-based version. Following the first reminder, the total number of responses was 483 (39%) in the paper group and 230 (18%) in the web-based group. After the second reminder, the final number of responses for the paper version was 663 (53.6%): 276 men and 344 women (43 did not disclose their gender). For the web-based version of the AUDIT, the final number of responses was 324 (26.2%): 140 men and 184 women.

WEB VS. INTERACTIVE VOICE RESPONSE (IVR) (SINADINOVIC ET AL. 2011)

                          Web      IVR      Respondent choice   Total
                                            (Internet/IVR)
Initial sampling pool     2 000    2 000    1 000               5 000
Incorrect addresses          24       40       14                  78
Final sampling pool       1 976    1 960      986               4 922
Cumulative n
  Response, no reminder     393      312      226                 931
  After 1 reminder          639      557      380               1 576
  After 2 reminders         753      665      443               1 861
Response rate             38.1%    33.9%    44.9%               37.8%

5. SO WHAT? IMPLICATIONS FOR SURVEY EXPERIMENTS
The major problem with coverage and non-response errors is sample bias: overly educated, too sophisticated, older, etc.
Does this matter when running an experiment with random assignment?
It depends on:
1. Heterogeneous experimental treatment effect
2.
Well-theorized and well-measured sources of experimental treatment heterogeneity (an issue to which we will return when discussing measurement issues)
The following slides cover 2 examples concerning heterogeneous experimental treatment effects that depend on level of political sophistication (involvement or partisanship).

FROM DRUCKMAN AND KAM, 2011
• The external validity of a single experimental study must be assessed in light of an entire research agenda, and in light of the goal of the study (e.g., testing a theory or searching for facts).
• Assessment of external validity involves multiple dimensions, including the sample, context, time, and conceptual operationalization. There is no reason per se to prioritize the sample as the source of an inferential problem.
• The nature of the sample, and the use of students, matters in certain cases. However, a necessary condition is a heterogeneous (or moderated) treatment effect. Then the impact depends on:
  o If the heterogeneous effect is theorized, the sample only matters if there is virtually no variance on the moderator.
• The range of heterogeneous, non-theorized cases may be much smaller than often thought. Indeed, when it comes to a host of politically relevant variables, student samples do not significantly differ from non-student samples.
• There are cases where student samples are desirable since they facilitate causal tests or make for more challenging assessments.
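The two conditions above can be illustrated with a small arithmetic sketch. It is not taken from Druckman and Kam or from any study cited here: the group shares, the effect sizes, and the function names are all hypothetical values chosen for illustration. The sketch compares the average treatment effect in the population with the one obtained from a sample biased towards highly sophisticated respondents, first when the effect is the same for everyone and then when it differs by sophistication. It then applies post-stratification weights on the measured moderator.

```python
# Hypothetical shares of each moderator group (illustration values only).
population_shares = {"low_sophistication": 0.6, "high_sophistication": 0.4}
biased_sample_shares = {"low_sophistication": 0.2, "high_sophistication": 0.8}

# Hypothetical treatment effects within each group.
homogeneous_effects = {"low_sophistication": 0.10, "high_sophistication": 0.10}
heterogeneous_effects = {"low_sophistication": 0.20, "high_sophistication": 0.05}


def average_treatment_effect(shares, effects):
    """Average effect = sum over groups of (group share * group effect)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(shares[group] * effects[group] for group in shares)


def poststratification_weights(sample_shares, target_shares):
    """Weight for each group = target (population) share / sample share."""
    weights = {}
    for group, target_share in target_shares.items():
        sample_share = sample_shares.get(group, 0.0)
        if sample_share == 0.0:
            # No variance on the moderator: this group's effect cannot be
            # estimated from the sample, so weighting cannot repair it.
            raise ValueError(f"no sample members in group {group!r}")
        weights[group] = target_share / sample_share
    return weights


def weighted_treatment_effect(sample_shares, effects, weights):
    """Average effect in the sample after applying group-level weights."""
    total = sum(sample_shares[g] * weights[g] for g in sample_shares)
    weighted_sum = sum(sample_shares[g] * weights[g] * effects[g] for g in sample_shares)
    return weighted_sum / total


# Homogeneous effect: the biased sample gives the same answer as the population.
homog_population = average_treatment_effect(population_shares, homogeneous_effects)
homog_sample = average_treatment_effect(biased_sample_shares, homogeneous_effects)

# Heterogeneous effect: the biased sample understates the population effect.
heterog_population = average_treatment_effect(population_shares, heterogeneous_effects)
heterog_sample = average_treatment_effect(biased_sample_shares, heterogeneous_effects)

# With the moderator measured, weighting the sample back to population shares
# recovers the population average effect.
weights = poststratification_weights(biased_sample_shares, population_shares)
heterog_reweighted = weighted_treatment_effect(
    biased_sample_shares, heterogeneous_effects, weights
)

print(f"homogeneous:   population {homog_population:.2f}, sample {homog_sample:.2f}")
print(f"heterogeneous: population {heterog_population:.2f}, sample {heterog_sample:.2f}")
print(f"heterogeneous, reweighted sample: {heterog_reweighted:.2f}")
```

In this sketch the weighting step fails by design when a group is entirely absent from the sample, which mirrors the point that a theorized moderator only helps if the sample has some variance on it.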
SOURCE OF SURVEY ERROR (ALWIN)

Non-observed (bias)        Non-observed (variance)
- coverage bias            - coverage error variance
- sampling bias            - sampling error variance
- nonresponse bias         - nonresponse error variance

Observed (bias)            Observed (variance)
- interviewer bias         - interviewer error variance
- respondent bias          - respondent error variance
- instrument bias          - instrument error variance
- mode bias                - mode error variance

EFFECTS OF SAMPLE BIAS (COVERAGE AND/OR NON-RESPONSE): UNPREDICTABLE EFFECTS IN EXPERIMENTS
Bias can either enhance, dampen, or have no effect on the experimental outcome.

Example 1: From The Ambivalent Partisan (Lavine et al.)
Most sophisticated LEAST affected by ideology in presence of a partisan cue
In the following example, researchers are interested in whether partisan labels would override ideological content in support of a policy. The answer varies with the mix of ambivalent vs. strong, univalent partisans in the sample. Bias in the sample towards strong partisans would lead to stronger overall effects of a partisan cue.

KNOWLEDGE NETWORKS POLICY STUDY; THE AMBIVALENT PARTISAN (LAVINE, JOHNSON, STEENBERGEN IN PRESS)
Policy Only Condition:
Congress has recently debated two policy measures dealing with benefits to social welfare recipients.
The first policy, POLICY 1, calls for $1000 per month for a family of one child, with an additional $200 dollars for each additional child. These benefits are intended to last 7 years. Recipients would also receive $2,000 a year in food stamps and extra subsidies for housing and child care. (Generous)
The second policy, POLICY 2, calls for $400 per month for a family of one child, with an additional $50 dollars for one additional child. These benefits are intended to last for 3 years. Recipients would also receive $500 a year in food stamps but no extra subsidies for housing or child care.
(Less Generous)

Policy + Cue Condition:
Democrats and Republicans in Congress have recently debated two policy measures dealing with benefits to social welfare recipients.
The first policy, POLICY 1, proposed by Republicans, calls for $1000 per month for a family of one child, with an additional $200 dollars for each additional child. These benefits are intended to last 7 years. Under this Republican plan, recipients would also receive $2,000 a year in food stamps and extra subsidies for housing and child care. (Generous)
The second policy, POLICY 2, proposed by Democrats, calls for $400 per month for a family of one child, with an additional $50 dollars for one additional child. These benefits are intended to last for 3 years. Under this Democratic plan, recipients would also receive $500 a year in food stamps but no extra subsidies for housing or child care. (Less Generous)

PREDICTED MARGINAL EFFECT OF LIBERAL VS. CONSERVATIVE POLITICAL ORIENTATION ON PREFERENCE FOR THE MORE GENEROUS POLICY PROPOSAL: KNOWLEDGE NETWORKS PANEL
[Figure: marginal effect (y-axis) by condition (Policy Only, Policy + Cue) for univalent and ambivalent partisans]

EFFECTS OF SAMPLE BIAS: EXAMPLE 2
Policy Support and Emotive Visual Imagery (Huddy & Gunthorsdottir, 2000)
Highly involved MOST affected by visual cue
In this example, the goal was to understand the impact on policy support of a positive or negative image of an animal that would be saved by an environmental policy.
The effects varied with one’s position on environmental issues, and so the findings would be stronger in a sample with a bias towards pro-environment views.

STIMULUS MATERIALS
The design of this study is a 2 (pro- or anti-environment message) × 5 (no animal, cute mammal, ugly mammal, cute insect, ugly insect) between-subjects factorial design. The stimulus material consisted of flyers emulating pro- and anti-environment fundraising letters.
All flyers, whether pro- or anti-environment, were about the same fictitious environmental dilemma, in which mining would assist an impoverished population living in the Guatemalan rainforest but would destroy the habitat of a geographically restricted animal. The pro-environment flyer argued for the protection of the animal; the anti-environment flyer argued that human needs outweigh environmental concerns. Both the name of the fictitious animal, Guatemalan Cobyx, and the fictitious organization, Club Berneaud International (CBI), were held constant.

[Figure 1: Emotive Visual Imagery: Cute and Ugly Animals in the Pro and Anti-Environment Flyer. Panels: Cute Insect, Cute Mammal, Ugly Insect, Ugly Mammal]

[Figure: Predicted Levels of Action for a Pro-Environment Organization Among Strongest Environment Supporters, by emotive image (No Picture, Monkey, Butterfly, Bat, Bug) for high and low involvement. Note: Predicted levels of action calculated at a value of .25 on the pro-environment scale.]

IV. SURVEY MODE ERRORS: NON-RESPONSE, NON-COVERAGE, AND MEASUREMENT ERROR
1. Survey Mode Errors Can Conflate Several Sources of Error
In practice, mode effects can reflect a different sample population, non-coverage, and non-response errors.
• Can eliminate population differences by randomly assigning respondents to mode from within the same population (e.g., SPAR)
• Still get large differences in response rate by mode in Sweden; e.g., Källmén et al.

DIFFERENCES DUE TO RESPONSE RATE & SURVEY MODE: AUDIT SCORES TO IDENTIFY PROBLEM DRINKING, > 8 FOR MEN; > 6 FOR WOMEN (KÄLLMÉN ET AL. 2011)

Gender   Response mode   n     Mean AUDIT score   Std. Dev.   Size of difference
Men      Electronic      140   5.80               4.77        .25
         Paper           239   4.73               4.20
Women    Electronic      184   4.12               4.29        .21
         Paper           294   3.39               2.59

2. MODE & MEASUREMENT ERROR
Origins of measurement differences by mode
(1) Interviewers affect responses (e.g., telephone vs.
web). Get decreased reporting of undesirable attitudes and behavior in personal interviews
(2) Comprehension is affected by aural (phone) vs. visual (web) mode
• Get visual layout effects, primacy, recency
• Typically get a primacy effect on paper, recency on phone
• More positive responses to scales on phone (when respondents do not see the scale)
(3) Different types of questions are asked in different modes
• On the web, use different kinds of response formats for multiple vs. single responses (not comparable to phone), e.g., checklists and grids
• Show cards in personal interviews
• Can include longer lists of response options in person, by mail, or on the web

MODE BIAS ALTERS LINK BETWEEN GENDER & # OF SEXUAL PARTNERS (TOURANGEAU ET AL. 2000)
[Figure]

3. MOVE TO MIXED MODE SURVEY DESIGNS (DILLMAN)
Benefits of Mixed Mode Designs:
• Lower cost: start with the least expensive method
• Improve timeliness
  o The 2003 NSF earned degrees survey asked respondents which mode was best and used it in 2006. Improved response time.
• Reduce coverage error
  o Access to different kinds of people
• Easier to provide incentives in some modes
  o By mail in an initial mailing
• Improve RR and reduce non-response error
  o Do it in sequence
• Reduce measurement error on sensitive questions
But mixed modes create numerous complications for survey experiments.

SPECIALIZED POPULATIONS ON THE WEB
• On occasion, may need to seek out special populations which are readily accessible on the web.
Mediator and Participant Recruitment Details, SMIS Studies

                                1 Culture Wars   2 Partisan       3 Partisan       4 Campaign Ads   5 Political      Blog Average     6 Political
                                (2006)           Identity (2007)  Identity (2008)  (2007)           Metaphors (2007)                  Metaphors (2008)
Data Collection Dates           6/6-7/31, 2006   5/16-6/4, 2007   3/17-5/2, 2008   3/10-5/5, 2007   6/23-7/15, 2007  --               4/15-5/13, 2008
Mediator Type                   Blogs/Forums     Blogs/Forums     Blogs/Forums     Blogs/Forums     Blogs/Forums^1   Blogs/Forums     RAs^3
Mediators Contacted             100              100              178              198              50               125.5            4
Mediators Participated          24               4                23               18               6                15               4
Mediator Response Rate          24%              4%               13%              9%               12%              12.4%            100%
Participants (N)                2248             630              3219             1452             297^2            1569.2           141
Yield: Particip. / # Mediators  93.7             157.5            140.0            80.7             49.5             104.3            35.3

                                      1 Culture Wars,   2 Partisan        3 Partisan        ANES, 2008
                                      2006              Identity, 2007    Identity, 2008
Political Participation
  Attend political meetings, rallies  37.7              --                --                7.8
  Campaign button, sticker            49.3              --                --                16.2
  Persuade other voters               76.5              --                --                41.2
  Candidate donation – 2004/2008^a    40.1              39.9              54.7              10.3
  Party donation – 2004/2008^b        33.4              35.6              37.3              7.2
  Volunteer for pres. candidate^c     --                29.7              30.8              3.7
  Volunteer for party/organization^c  --                27.2              20.7
  Vote – 2004, 2008^d                 91.7              96.5              98.3              70.6
Constraint^d
  PID & Ideology                      .74**             .90**             .65**             .56**
  PID & Democratic Vote Choice        .78**             .92**             .66**             .75**
  Ideology & Democratic Vote          .83**             .92**             .65**             .58**
  Church Attendance & Dem. Vote       -.45**            -.35**            -.25**            -.16**
  Biblical Orthodox & Dem. Vote       -.61**            -.58**            -.35**            -.18**

SPECIALIZED POPULATIONS
S15.3 European MSM Internet Survey (EMIS): differences in sexually transmissible infection testing in European countries
U Marcus, et al. Sex Transm Infect 2011;87:A19. doi:10.1136/sextrans-2011-050102.64
Methods
From June through August 2010, the European MSM Internet Survey (EMIS) mobilised more than 180 000 respondents from 38 European countries to complete an online questionnaire in one of 25 languages.
The questionnaire covered sexual happiness, HIV and STI testing and diagnoses, unmet prevention needs, intervention performance, HIV-related stigma and gay-related discrimination. Recruitment was organised predominantly online, through gay social media, and links and banners on more than 100 websites for MSM all over Europe.

REFERENCES
Druckman, James N. and Cindy D. Kam. 2011. “Students as Experimental Participants: A Defense of the ‘Narrow Data Base.’” In James N. Druckman, Donald P. Green, James H. Kuklinski, and Arthur Lupia, eds., Handbook of Experimental Political Science.
Cassese, Huddy, Hartman, Mason & Weber. 2012. Socially-Mediated Internet Surveys (SMIS): Recruiting Participants for Online Experiments. Under review.
Dillman, Don A. 2009. Internet, Mail and Mixed Mode Surveys: The Tailored Design Method. 3rd ed. Hoboken, NJ: Wiley. ISBN: 9780471698685 (cloth).
Källmén, Håkan, Kristina Sinadinovic, Anne H. Berman, and Peter Wennberg. 2011. Nordic Studies on Alcohol and Drugs, vol. 28.
Groves, Robert M. et al. 2009. Survey Methodology. 2nd ed. Hoboken, NJ: John Wiley & Sons.
Heckel & Wiese. 2012. In Telephone Surveys in Europe, ed. Häder, Häder, & Kühne. Heidelberg: Springer.
Sinadinovic, Kristina, Peter Wennberg, and Anne H. Berman. 2011. Drug and Alcohol Dependence 114:55-60.
Lavine, Johnson, Steenbergen. In press. The Ambivalent Partisan.
Tourangeau, Roger, Lance Rips and Kenneth Rasinski. 2000. The Psychology of Survey Response. New York: Cambridge University Press. ISBN: 0521576296.
Huddy, Leonie and Anna Gunthorsdottir. 2000. The Persuasive Effects of Emotive Visual Imagery: Superficial Manipulation or A Deepening of Conviction? Political Psychology 21:745-778.