Sampling Issues

SAMPLING ISSUES: PART I
LEONIE HUDDY, STONY BROOK UNIVERSITY
MATTHEW BAUM
HARVARD UNIVERSITY
OUTLINE
I. Major Sources of Survey Error
II. Coverage Error
1. Coverage Problems in US (Phone & Web)
2. Coverage Issues in Sweden
3. Implications for Experiments
III. Non-Response Error
1. Rates in the US
2. Factors Influencing Response Rates
3. Response rates in Sweden
IV. Survey Mode Errors
I. MAJOR SOURCES OF SURVEY
ERROR (ALWIN/GROVES)
1. Coverage error: Error due to failure to include some
elements of the population in the sampling frame (e.g., cell
phones in an RDD landline study in the US, non-computer
households in a web survey)
2. Sampling error: Error due to sampling a subset rather
than the entire population.
3. Non-response error: Error due to failure to obtain data
from all selected population elements (e.g., young males are harder to
reach; Latinos are more reluctant)
4. Measurement error: Error that occurs when the observed value
is different from the true value (e.g., over-reporting of voter
turnout in the ANES)
These errors also apply to survey experiments.
II. COVERAGE ERROR (GROVES)
DEFINITIONS
Sampling frame: set of lists or procedures intended to identify all
elements of the target population; e.g., RDD, national registry
(SPAR), US Postal Mail Delivery System
Coverage:
Undercoverage - some population elements are missing from the
sample frame (e.g., cell phone users who are disproportionately
young and less affluent in an RDD landline study; older
respondents who lack computers or broadband in a web survey)
Ineligible units (non-working phone #s)
Clustering of elements at a single frame element (several people
with 1 phone number)
Duplication: single target element linked to multiple frame units (a
person listed more than once in a national registry)
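A minimal sketch of why undercoverage matters, assuming the standard decomposition in which the covered-population mean differs from the full target-population mean by the uncovered share times the gap between covered and uncovered means. The function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def coverage_bias(covered_mean, uncovered_mean, uncovered_share):
    """Bias of the covered-population mean relative to the full target population.

    bias = uncovered_share * (covered_mean - uncovered_mean)
    """
    if not 0.0 <= uncovered_share <= 1.0:
        raise ValueError("uncovered_share must be between 0 and 1")
    return uncovered_share * (covered_mean - uncovered_mean)


# Hypothetical example: 25% of the population is missing from the frame,
# and the missing group scores 0.40 on some outcome versus 0.60 for the covered group.
bias = coverage_bias(covered_mean=0.60, uncovered_mean=0.40, uncovered_share=0.25)
print(f"coverage bias: {bias:.3f}")
```

The bias vanishes when either the uncovered share is zero or the covered and uncovered groups do not differ on the outcome.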
2. NON-COVERAGE PROBLEMS
IN TELEPHONE SAMPLES
NON-LANDLINE HOUSEHOLDS
• younger, more mobile, less affluent
• more ethnic and racial minorities
• live in rural areas, south, central cities
• Reaching 25% of the US population
Solution?
• Base sample on a mix of cell phone-only and
landline households and eliminate those with
landlines from the cell phone sample
• Post-stratification weights based on
demographic factors
THE GROWING CELL-ONLY POPULATION BY AGE (PEW 2010)
AGE COMPOSITION OF LANDLINE PHONE SAMPLES (PEW, 2010)
3. Non-Coverage on Web:
% Broadband at Home (Pew)

                          2005   2006   2007   2009
All adult Americans        30%    42%    47%    60%
Gender
  Male                     31     45     50     61
  Female                   27     38     44     58
Age
  18-29                    38     55     63     76
  30-49                    36     50     59     67
  50-64                    27     38     40     56
  65+                       8     13     15     26
Race/Ethnicity
  White (not Hispanic)     31     42     48     63
  Black (not Hispanic)     14     31     40     52
Education
  Less than high school    10     17     21     24
  High school grad         20     31     34     46
  Some college             35     47     58     73
  College +                47     62     70     83
Income
  Under $30K               15     21     30     42
  $30K-50K                 27     43     46     62
  $50K-$75K                35     48     58     73
  Over $75K                57     68     76     83
4. NON-COVERAGE
ISSUES IN SWEDEN
Telephone & Mail
• Sweden has a low rate of cell-only households: ≤ 5% (Hecke & Weise
2012; in Telephone Surveys in Europe, ed. Häder, Häder, &
Kühne; Springer, Heidelberg).
• Non-coverage is far less of a problem in Sweden because samples are
drawn from the national SPAR registry.
• No sampling frame ever covers 100% of the population, so there may
still be minor non-coverage issues.
Internet
• “Sweden has a unique position in the world when it comes to Internet
use, not only because it is one of the countries with highest share of
Internet users in the world but also because Internet use is more widely
spread in Swedish society compared to other countries, in terms of age
and educational level (Findahl 2007; 2008b). Among younger Swedes 16–
25 years old almost all (97%) use the Internet at least once a month;
among older Swedes 56–65 years old Internet use is currently as high as
75%. The corresponding figure among individuals 66–75 years old is
lower, however, at 51% (Findahl 2008a).” (quoted in Källmén et al. 2011)
SPAR – NATIONAL SWEDISH
POPULATION REGISTRY
• Statens personadressregister (SPAR) includes all persons who are
registered as resident in Sweden.
• The data in SPAR are updated each day with data from the Swedish
Population Register.
• SPAR is specifically regulated in Swedish law by the Act
(1998:527) on statens personadressregister, by the Regulation
(1998:1234) on statens personadressregister, and by the Swedish Tax
Agency Regulation on handing out data from SPAR (SKVFS
2011:06).
The aim of SPAR is clear from the purposes set out in article 3 of the
Act. It states that personal data in SPAR may be processed to:
• update, supplement and verify personal information, or
• select names and addresses for direct marketing, public service
announcements or other comparable activities.
Processing data in this respect is the same as handing out the data
electronically. Data in SPAR are, after a decision by the Swedish Tax
Agency, handed out electronically at cost price.
III. NON-RESPONSE
ERROR
Two key types of non-response:
• Non-contact: the failure to reach the chosen respondent
• Refusal: the chosen respondent does not cooperate
Response rates have declined precipitously in the US over the last 2
decades:
• Contact rates by telephone dropped dramatically after
2000 and the introduction of caller ID
• Refusal rates are higher in urban areas
RESPONSE RATES IN THE U.S.
Response Rate = Number of people who completed an
interview/total number of eligible respondents
contacted (including not at home, refused, etc.)
• Household CAPI or in-person surveys: in the US these are
around 50-60% at university research centers.
• Telephone surveys: in the US, 40-50% at university centers
using very stringent and expensive methods; lower for typical
phone surveys at university centers (25-35%); much lower for
marketing and media surveys (6-20%).
• Mail surveys: very variable; possible to get a 15-20% RR with
follow-up, but it depends on the population.
• Web surveys: depends on the population. Could be as high as
50-70% within an organization with a known email list and
organizational support, or <1% with a random group (e.g.,
banner ad recruitment).
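The response rate definition on this slide can be sketched as a small calculation. The disposition categories and counts below are hypothetical, and the function follows only the slide's simple definition (completed interviews divided by all eligible respondents contacted or attempted), not any formal industry standard.

```python
def response_rate(dispositions):
    """Completed interviews divided by all eligible sample members.

    `dispositions` maps an outcome label to a count; every label is treated
    as an eligible case (completed, refused, not at home, etc.).
    """
    eligible = sum(dispositions.values())
    if eligible <= 0:
        raise ValueError("need at least one eligible case")
    return dispositions.get("completed", 0) / eligible


# Hypothetical final dispositions for a phone survey.
example = {"completed": 300, "refused": 450, "not at home": 250}
rr = response_rate(example)
print(f"response rate: {rr:.1%}")
```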
WHAT INFLUENCES RESPONSE RATES
(RR)?
1. SURVEY MODE: highest for household in-person interviews, generally lowest
on the web.
2. RESPONDENT SUBGROUP: Non-response is often higher in cities; it can also
vary with age (the young are harder to contact) and gender (men are harder to
contact).
3. TYPE OF SURVEY ORGANIZATION: academic vs. commercial polls. RR is
typically higher when the survey is conducted by an academic or non-profit organization.
4. UNIT OF INTERVIEW: Higher RR if anyone in the home or a surrogate can be
interviewed.
• US National Health Interview Survey (NHIS): non-response rate is xx%
• Sweden Census (SCB) contacts relatives of respondents to increase RR (from
Jacob Sohlberg)
5. EFFORT TO REACH NON-RESPONDENTS: A greater number of contact
attempts, use of financial incentives, refusal conversion, and a longer interviewing
period all increase RR, but also increase costs.
6. SURVEY TOPIC AND RESPONDENT INTEREST: Slightly higher RR on surveys about topics
in the news or those in which the respondent is very involved and interested.
WHO IS MISSING?—US
NON-RESPONDENTS
1. Age: The young are under-represented
• Largely due to non-contact
2. Gender: Men are under-represented
• More difficult to contact and refuse more often
3. Race/Ethnicity:
• Blacks are over-represented in phone surveys
• Blacks are under-represented in household in-person surveys
Typical Solution: Weight respondents to demographic
population benchmarks
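As an illustration of weighting to demographic benchmarks, here is a minimal post-stratification sketch on a single variable. The age groups, sample composition, and population shares are all hypothetical; real applications usually weight on several variables at once (e.g., by raking), which this sketch does not attempt.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, pop_share in population_shares.items():
        if counts[group] == 0:
            raise ValueError(f"no respondents in group {group!r}; cannot weight")
        weights[group] = pop_share / (counts[group] / n)
    return weights


# Hypothetical sample that under-represents the young.
sample = ["18-29"] * 10 + ["30-64"] * 60 + ["65+"] * 30
benchmarks = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}  # assumed population shares

weights = poststratification_weights(sample, benchmarks)
for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
```

Under-represented groups receive weights above 1 and over-represented groups receive weights below 1, so the weighted sample matches the benchmark distribution on that variable.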
NON-RESPONSE
RATES IN SWEDEN
• Varies by sample mode (mail, phone, web, IVR)
• RR Remains high in mail surveys
RESPONSE RATES WEB
VS. MAIL: SWEDEN
KÄLLMÉN ET AL 2011
Two random samples of 1250 individuals each were drawn from the
same national register (DAFA-SPAR) of all Swedish individuals (aged
17-71) having a registered address.
1. Electronic, web-based response group: received a postcard with
the same introductory text, a URL link and a log-in code to the
electronic version of the questionnaire.
2. Paper-and-pen response group: two reminders were sent, three and
six weeks after the main mailing.
After the first mailing, 314 individuals (25%) responded to the AUDIT
paper version and 167 (13%) responded to the web-based version.
Following the first reminder, the total number of responses was 483
(39%) in the paper group and 230 (18%) in the web-based group.
After the second reminder the final number of responses for the paper
version was 663 (53.6%), 276 men and 344 women (43 did not disclose
their gender). For the web-based version of the AUDIT, the final number
of responses was 324 (26.2%), 140 men and 184 women.
WEB VS. INTERACTIVE VOICE RESPONSE (IVR)
SINADINOVIC ET AL, 2011

                          Web      IVR      Respondent choice   Total
                                            (Internet/IVR)
Initial sampling pool     2 000    2 000    1 000               5 000
Incorrect addresses          24       40       14                  78
Final sampling pool       1 976    1 960      986               4 922
Cumulative n
  Response no reminder      393      312      226                 931
  After 1 reminder          639      557      380               1 576
  After 2 reminders         753      665      443               1 861
Response rate             38.1%    33.9%    44.9%               37.8%
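The response rates in the Sinadinovic et al. table can be recomputed from the final sampling pools and the cumulative response counts. This is a small sketch using the figures reported on the slide; the variable and function names are illustrative only.

```python
# Final sampling pools and cumulative response counts, as reported on the slide.
final_pool = {"Web": 1976, "IVR": 1960, "Respondent choice": 986, "Total": 4922}
cumulative_n = {
    "Web": [393, 639, 753],
    "IVR": [312, 557, 665],
    "Respondent choice": [226, 380, 443],
    "Total": [931, 1576, 1861],
}


def cumulative_response_rates(pool, counts):
    """Cumulative response rate (percent) after each contact wave."""
    return [100.0 * n / pool for n in counts]


rates = {
    arm: cumulative_response_rates(final_pool[arm], cumulative_n[arm])
    for arm in final_pool
}

for arm, arm_rates in rates.items():
    waves = ", ".join(f"{r:.1f}%" for r in arm_rates)
    print(f"{arm}: {waves}")
```

The last value for each arm is the final response rate after two reminders.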
5. SO WHAT?
IMPLICATIONS FOR
SURVEY EXPERIMENTS
Major problem with coverage and non-response errors is
sample bias – overly educated, too sophisticated, older, etc.
Does this matter when running an experiment with random
assignment?
It depends on:
1. Whether the experimental treatment effect is heterogeneous
2. Whether the sources of experimental treatment heterogeneity are
well theorized and well measured (an issue to which
we will return when discussing measurement issues)
The following slides cover 2 examples concerning
heterogeneous experimental treatment effects that depend
on level of political sophistication (involvement or
partisanship).
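A minimal numerical sketch of the point above: with random assignment, a biased sample changes the estimated average treatment effect only when the effect differs across subgroups. The subgroup shares and effect sizes are hypothetical and chosen purely for illustration.

```python
def average_treatment_effect(shares, effects):
    """Average treatment effect as a share-weighted mean of subgroup effects."""
    if set(shares) != set(effects):
        raise ValueError("shares and effects must cover the same subgroups")
    return sum(shares[g] * effects[g] for g in shares)


# Hypothetical subgroup mix in the population and in a sample biased
# towards politically sophisticated respondents.
population = {"low sophistication": 0.7, "high sophistication": 0.3}
biased_sample = {"low sophistication": 0.3, "high sophistication": 0.7}

# Hypothetical treatment effects: identical across groups vs. moderated by sophistication.
homogeneous = {"low sophistication": 0.10, "high sophistication": 0.10}
heterogeneous = {"low sophistication": 0.20, "high sophistication": 0.02}

for label, effects in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    pop_ate = average_treatment_effect(population, effects)
    sample_ate = average_treatment_effect(biased_sample, effects)
    print(f"{label}: population ATE = {pop_ate:.3f}, biased-sample ATE = {sample_ate:.3f}")
```

In the homogeneous case the two estimates coincide; in the heterogeneous case the biased sample understates the effect because the less responsive subgroup is over-represented.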
FROM DRUCKMAN AND KAM, 2011
The external validity of a single experimental study must be assessed in light of an
entire research agenda, and in light of the goal of the study (e.g., testing a theory
or searching for facts).
Assessment of external validity involves multiple dimensions including the
sample, context, time, and conceptual operationalization. There is no reason
per se to prioritize the sample as the source of an inferential problem.
The nature of the sample—and the use of students—matters in certain cases.
However, a necessary condition is: a heterogeneous (or moderated) treatment
effect. Then the impact depends on:
• If the heterogeneous effect is theorized, the sample only matters if there is
virtually no variance on the moderator.
The range of heterogeneous, non-theorized cases may be much smaller than
often thought. Indeed, when it comes to a host of politically relevant variables,
student samples do not significantly differ from non-student samples.
There are cases where student samples are desirable since they facilitate causal
tests or make for more challenging assessments.
SOURCE OF SURVEY ERROR
(ALWIN)
                 Bias                 Variance
Non-observed     coverage bias        coverage error variance
                 sampling bias        sampling error variance
                 nonresponse bias     nonresponse error variance
Observed         interviewer bias     interviewer error variance
                 respondent bias      respondent error variance
                 instrument bias      instrument error variance
                 mode bias            mode error variance
EFFECTS OF SAMPLE BIAS (COVERAGE AND / OR
NON-RESPONSE): UNPREDICTABLE EFFECTS IN
EXPERIMENTS
Bias can either enhance, dampen, or have no effect on the
experimental outcome
Example 1: From The Ambivalent Partisan (Lavine et al)
Most sophisticated LEAST affected by ideology in presence
of a partisan cue
In the following example, researchers are interested in
whether partisan labels would override ideological content in
support of a policy.
The answer varies with the mix of ambivalent vs. strong,
univalent partisans in the sample. Bias in the sample
towards strong partisans would lead to stronger overall
effects of a partisan cue.
KNOWLEDGE NETWORKS POLICY STUDY; THE
AMBIVALENT PARTISAN (LAVINE, JOHNSON,
STEENBERGEN IN PRESS)
Policy Only Condition: Congress has recently debated two policy measures dealing with
benefits to social welfare recipients.
The first policy, POLICY 1, calls for $1000 per month for a family of one child, with an
additional $200 dollars for each additional child. These benefits are intended to last 7 years.
Recipients would also receive $2,000 a year in food stamps and extra subsidies for housing
and child care. (Generous)
The second policy, POLICY 2, calls for $400 per month for a family of one child, with an
additional $50 dollars for one additional child. These benefits are intended to last for 3 years.
Recipients would also receive $500 a year in food stamps but no extra subsidies for housing or
child care. (Less Generous)
Policy + Cue Condition: Democrats and Republicans in Congress have recently debated two
policy measures dealing with benefits to social welfare recipients.
The first policy, POLICY 1, proposed by Republicans, calls for $1000 per month for a family of
one child, with an additional $200 dollars for each additional child. These benefits are intended
to last 7 years. Under this Republican plan, recipients would also receive $2,000 a year in food
stamps and extra subsidies for housing and child care. (Generous)
The second policy, POLICY 2, proposed by Democrats, calls for $400 per month for a family of
one child, with an additional $50 dollars for one additional child. These benefits are intended to
last for 3 years. Under this Democratic plan, recipients would also receive $500 a year in food
stamps but no extra subsidies for housing or child care. (Less Generous)
PREDICTED MARGINAL EFFECT OF LIBERAL VS.
CONSERVATIVE POLITICAL ORIENTATION ON PREFERENCE
FOR THE MORE GENEROUS POLICY PROPOSAL:
KNOWLEDGE NETWORKS PANEL
[Figure: marginal effect (y-axis) in the Policy Only and Policy + Cue conditions, for Univalent and Ambivalent partisans]
EFFECTS OF SAMPLE
BIAS: EXAMPLE 2
Policy Support and Emotive Visual Imagery (Huddy &
Gunthorsdottir, 2000)
Highly involved MOST affected by visual cue
In this example, the goal was to understand the impact on
policy support of a positive or negative image of an animal
that would be saved by an environmental policy
The effects varied with one’s position on environmental
issues and so the findings would be stronger in a sample
with a bias towards pro-environment views
STIMULUS
MATERIALS
The design of this study is a 2 (pro- or anti-environment
message) × 5 (no animal, cute mammal, ugly mammal, cute
insect, ugly insect) between-subjects factorial design.
The stimulus material consisted of flyers emulating pro- and
anti-environment fundraising letters. All flyers, whether pro- or
anti-environment, were about the same fictitious environmental
dilemma, in which mining would assist an impoverished
population living in the Guatemalan rainforest but would destroy
the habitat of a geographically restricted animal.
The pro-environment flyer argued for the protection of the
animal; the anti-environment flyer argued that human needs
outweigh environmental concerns.
Both the name of the fictitious animal, Guatemalan Cobyx, and
the fictitious organization, Club Berneaud International (CBI),
were held constant.
Figure 1
Emotive Visual Imagery: Cute and Ugly Animals
in the Pro and Anti-Environment Flyer
[Images: Cute Insect, Cute Mammal, Ugly Insect, Ugly Mammal]
Predicted Levels of Action for a Pro-Environment
Organization Among Strongest Environment Supporters
[Figure: predicted level of action (0-9) by emotive image (No Picture, Monkey, Butterfly, Bat, Bug), for High Involvement and Low Involvement respondents]
Note: Predicted levels of action calculated at a value of .25 on the pro-environment
scale.
IV. SURVEY MODE ERRORS: NON-RESPONSE, NON-COVERAGE, AND
MEASUREMENT ERROR
1. Survey Mode Errors Can Conflate Several
Sources of Error
In practice, mode effects can reflect a different sample
population, non-coverage, and non-response errors.
Population differences can be eliminated by randomly assigning
respondents to mode from within the same population (e.g.,
SPAR).
Large differences in response rate by mode still occur in
Sweden; e.g., Källmén et al. 2011.
DIFFERENCES DUE TO RESPONSE RATE & SURVEY MODE:
AUDIT SCORES TO IDENTIFY PROBLEM DRINKING, > 8 FOR
MEN; > 6 FOR WOMEN (KÄLLMÉN ET AL 2011)
Gender   Response mode   n     Mean AUDIT score   Std. Dev.   Size of difference
Men      Electronic      140   5.80               4.77        .25
         Paper           239   4.73               4.20
Women    Electronic      184   4.12               4.29        .21
         Paper           294   3.39               2.59
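The slide does not say how the "size of difference" column was computed. One common choice for a standardized difference between two group means is Cohen's d with a pooled standard deviation, sketched below with the means, standard deviations, and group sizes from the table. This is an assumption about the metric, so the results may not match the reported values exactly.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Electronic vs. paper AUDIT scores, from the table.
d_men = cohens_d(5.80, 4.77, 140, 4.73, 4.20, 239)
d_women = cohens_d(4.12, 4.29, 184, 3.39, 2.59, 294)

print(f"men: d = {d_men:.2f}")
print(f"women: d = {d_women:.2f}")
```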
2. MODE & MEASUREMENT ERROR
Origins of measurement differences by mode
(1) Interviewers affect responses (e.g., telephone vs. web)
• Get decreased reporting of undesirable attitudes and behavior in personal interviews
(2) Comprehension affected by aural (phone) vs. visual (web) mode
• Get visual layout effects, primacy, recency
• Typically get a primacy effect on paper, recency on phone
• More positive responses to scales on phone (when respondents do not see the scale)
(3) Ask different types of questions in different modes
• On the web use different kinds of response formats for multiple vs. single responses (not
comparable to phone), e.g., checklists and grids
• Show cards in personal interviews
• Can include longer lists of response options in person, by mail, or on the web
MODE BIAS ALTERS LINK BETWEEN GENDER & # OF
SEXUAL PARTNERS, TOURANGEAU ET AL 2000
3. MOVE TO MIXED MODE SURVEY DESIGNS
(DILLMAN)
Benefits of Mixed Mode Designs:
• Lower Cost; Start with least expensive method
• Improve Timeliness
  • In the 2003 NSF earned degrees survey, respondents were asked which mode was best and it was used in 2006.
  Improved response time.
• Reduce Coverage Error
  • Access to different kinds of people
• Easier to Provide Incentives in some Modes
  • By mail in an initial mailing
• Improve RR and Reduce Non-response Error
  • Do it in sequence
• Reduce Measurement Error on sensitive questions
But creates numerous complications for survey experiments
SPECIALIZED POPULATIONS
ON THE WEB
• On occasion, may need to seek out special populations
which are readily accessible on the web.
Mediator and Participant Recruitment Details, SMIS Studies

                          1           2           3           4           5           Blog        6
                          Culture     Partisan    Partisan    Campaign    Political   Average     Political
                          Wars        Identity    Identity    Ads         Metaphors               Metaphors
                          (2006)      (2007)      (2008)      (2007)      (2007)                  (2008)
Data Collection Dates     6/6-7/31,   5/16-6/4,   3/17-5/2,   3/10-5/5,   6/23-7/15,  --          4/15-5/13,
                          2006        2007        2008        2007        2007                    2008
Mediator Type             Blogs/      Blogs/      Blogs/      Blogs/      Blogs/      Blogs/      RAs³
                          Forums      Forums      Forums      Forums      Forums¹     Forums
Mediators Contacted       100         100         178         198         50          125.5       4
Mediators Participated    24          4           23          18          6           15          4
Mediator Response Rate    24%         4%          13%         9%          12%         12.4%       100%
Participants (N)          2248        630         3219        1452        297²        1569.2      141
Yield: Particip. /        93.7        157.5       140.0       80.7        49.5        104.3       35.3
# Mediators
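The mediator response rate and the yield row in the SMIS recruitment table follow from the counts above them: response rate is mediators participated over mediators contacted, and yield is participants per participating mediator. The sketch below recomputes both. It assumes the Study 5 participant count is 297 (the extracted "2972" appears to be 297 followed by a footnote marker, which is the only reading consistent with the reported yield of 49.5 and the reported average).

```python
# (mediators contacted, mediators participated, participants, reported yield)
# Study 5 participants taken as 297; see the assumption in the text above.
studies = {
    "1 Culture Wars (2006)": (100, 24, 2248, 93.7),
    "2 Partisan Identity (2007)": (100, 4, 630, 157.5),
    "3 Partisan Identity (2008)": (178, 23, 3219, 140.0),
    "4 Campaign Ads (2007)": (198, 18, 1452, 80.7),
    "5 Political Metaphors (2007)": (50, 6, 297, 49.5),
    "6 Political Metaphors (2008)": (4, 4, 141, 35.3),
}


def mediator_response_rate(contacted, participated):
    """Share of contacted mediators who agreed to take part."""
    return participated / contacted


def participant_yield(participants, participated):
    """Participants recruited per participating mediator."""
    return participants / participated


results = {}
for name, (contacted, participated, participants, reported) in studies.items():
    rate = mediator_response_rate(contacted, participated)
    yld = participant_yield(participants, participated)
    results[name] = (rate, yld, reported)
    print(f"{name}: response rate {rate:.0%}, yield {yld:.1f} (reported {reported})")
```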
                                      1               2                    3                    ANES,
                                      Culture Wars,   Partisan Identity,   Partisan Identity,   2008
                                      2006            2007                 2008
Political Participation
  Attend political meetings, rallies  37.7            --                   --                   7.8
  Campaign button, sticker            49.3            --                   --                   16.2
  Persuade other voters               76.5            --                   --                   41.2
  Candidate donation – 2004/2008ᵃ     40.1            39.9                 54.7                 10.3
  Party donation – 2004/2008ᵇ         33.4            35.6                 37.3                 7.2
  Volunteer for pres. candidateᶜ      --              29.7                 30.8                 3.7
  Volunteer for party/organizationᶜ   --              27.2                 20.7
  Vote – 2004, 2008ᵈ                  91.7            96.5                 98.3                 70.6
Constraintᵈ
  PID & Ideology                      .74**           .90**                .65**                .56**
  PID & Democratic Vote Choice        .78**           .92**                .66**                .75**
  Ideology & Democratic Vote          .83**           .92**                .65**                .58**
  Church Attendance & Dem. Vote       -.45**          -.35**               -.25**               -.16**
  Biblical Orthodox & Dem. Vote       -.61**          -.58**               -.35**               -.18**
SPECIALIZED
POPULATIONS
S15.3 European MSM Internet Survey (EMIS): differences in
sexually transmissible infection testing in European countries
U Marcus, et al.
Sex Transm Infect 2011;87:A19 doi:10.1136/sextrans-2011-050102.64
Methods: From June through August 2010, the European MSM
Internet Survey (EMIS) mobilised more than 180 000
respondents from 38 European countries to complete an online
questionnaire in one of 25 languages. The questionnaire
covered sexual happiness, HIV and STI-testing and diagnoses,
unmet prevention needs, intervention performance, HIV-related
stigma and gay-related discrimination. Recruitment was
organised predominantly online, through gay social media, and
links and banners on more than 100 websites for MSM all over
Europe.
REFERENCES
Druckman, James N. and Cindy D. Kam. 2011. “Students as
Experimental Participants: A Defense of the ‘Narrow Data Base.’” In
James N. Druckman, Donald P. Green, James H. Kuklinski, and Arthur
Lupia, eds., Handbook of Experimental Political Science.
Cassese, Huddy, Hartman, Mason & Weber. 2012. Socially-Mediated
Internet Surveys (SMIS): Recruiting Participants for Online Experiments,
under review.
Dillman, Don A. 2009. Internet, Mail and Mixed Mode Surveys: The
Tailored Design Method. 3rd ed. Hoboken, NJ: Wiley. ISBN:
9780471698685 (cloth)
Källmén, Håkan, Kristina Sinadinovic, Anne H. Berman, and Peter
Wennberg. 2011. Nordic Studies on Alcohol and Drugs, Vol. 28.
Groves, Robert M. et al. 2009. Survey Methodology. 2nd ed.
Hoboken, NJ: John Wiley & Sons.
Hecke & Weise. 2012. In Telephone Surveys in Europe, ed. Häder,
Häder, & Kühne; Springer, Heidelberg.
Sinadinovic, Kristina, Peter Wennberg, and Anne H. Berman. 2011.
Drug and Alcohol Dependence, 114:55-60.
Lavine, Johnson, Steenbergen. In press. The Ambivalent
Partisan.
Tourangeau, Roger, Lance Rips and Kenneth Rasinski. 2000.
The Psychology of Survey Response. New York: Cambridge
University Press. ISBN: 0521576296.
Huddy, Leonie and Anna Gunthorsdottir. 2000. The
Persuasive Effects of Emotive Visual Imagery: Superficial
Manipulation or A Deepening of Conviction? Political
Psychology. 21:745-778.
