Chap 9 Analyzing Bias and Assuring Fairness

Chapter 9 Analyzing Bias and Assuring
Fairness p206
• Unfair Discrimination
• Item & Test Bias
• Test-Score Banding
– Bias defined
• “Systematic group differences in item responses, test scores,
or other assessments for reasons unrelated to the trait.”
– Cultural bias defined
• “if an acceptable response depends on skills or information
common in one culture but not in the other.”
– Discrimination defined
• “Making distinctions”
– Not the same as unfair discrimination
• Define “unfair” discrimination
• What is the difference between the two? Give an example.
DISCRIMINATION
• Discrimination Based on Group Membership
– Protected groups
• Race
• Color
• Religion
• Gender
• National origin
• LGBT?
Distributional Differences
Group Mean Differences (Give an example for each below)
1. Two groups are biased samples (from their respective
populations)
E.g. extensive, uncritical recruiting for the lower-scoring group
The test would not be biased (why not?)
2. Two groups are representative (not biased if they actually differ
on the trait)
3. Test items require experiences not common to the lower-scoring
group (not biased if the experiences are required)
4. Test administration conditions differ for the two groups
Racial Differences in IQ
• Few believe there are no race differences
– Means for:
• East Asians 105
• Europeans (Whites) 100
• Blacks 85
– Cohen effect size
• Hispanics: .6 to .8 SD < Whites
• Blacks: 1 SD < Whites
• Many argue about the causes
• Predictability of IQ is comparable for Blacks
and Whites
Race Differences in IQ (Furnham ’08, p 207)
• Three plausible explanations
1. Evidence of biological & genetic differences
between races
2. Evidence of sociocultural, economic & political
forces for differences
-distinct from racial characteristics
-But confounded with them
3. Differences are only artifacts of test design,
administration, or measurement
-no real differences
Black-White Racial Differences in IQ
• Greater variation within groups than between
– 16% Blacks score above the White mean
– For a cutoff score of 70 for special education
• There will be 1 White for every 7 Blacks
– Black/White differences are constant over time
and life span
– Differences are present prior to school entry
– Differences are not constant for different types of
measures of intelligence
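The 16% and 1-to-7 figures above follow from simple normal-curve arithmetic. The sketch below reproduces them under stated assumptions: scores are normally distributed, both groups share a standard deviation of 15 (the slides give the means of 100 and 85 but not the SD), and the ratio compares proportions within each group rather than head counts. The variable names are illustrative only.

```python
from statistics import NormalDist

SD = 15  # assumption: common standard deviation, not stated on the slides

higher = NormalDist(mu=100, sigma=SD)  # group mean of 100 quoted on the slides
lower = NormalDist(mu=85, sigma=SD)    # group mean of 85 quoted on the slides

# Share of the lower-scoring group that scores above the other group's mean.
share_above = 1 - lower.cdf(higher.mean)

# Ratio of the two groups' proportions falling below a cutoff score of 70.
cutoff = 70
ratio_below = lower.cdf(cutoff) / higher.cdf(cutoff)

print(f"share above the other group's mean: {share_above:.1%}")
print(f"ratio of proportions below {cutoff}: {ratio_below:.1f}")
```

With these assumptions the share comes out near 16% and the ratio near 7, which matches the slide. Different SDs or non-normal distributions would shift both numbers.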
Black & White Differences in IQ
(implications for workforce) Gottfredson (2002)
• 22% of Whites & 59% of Blacks have IQ < 90
– Considerably fewer Blacks (proportionately) are
competitive for mid-level jobs:
• fire fighting, skilled trades, many clerical jobs
– Mean IQ is about 100 (1 SD above the Black mean)
– 80 is the threshold for being competitive in the lowest-level jobs
» 4 times as many Blacks (30%) as Whites (7%) fall below
that threshold
Implications for Black / White IQ
Differences
• On the higher end of the distribution (IQ = 125)
– Score of 125 = mean for professionals (e.g. lawyers,
physicians, engineers, high-level executives etc.)
• Black / White ratio is only 1:30 at this level
• Conclusion: Disparate impact
• with legal and political tension…
• Is “particularly acute in the most complex, most socially
desirable jobs” (Gottfredson, ’02, p. 41).
• Differences in Other Distributional Characteristics
(Table 9.1, p. 211)
– Note: group means are different, but variability is
greater
– At lower selection ratios, differences in proportions may
disappear.
• Discrimination as Systematic Measurement Error
– If the error is systematic and greater for one
group than for the other (e.g. test-taking habits)
– the result can be unfair even if not illegal
ANALYSIS OF BIAS AND ADVERSE
IMPACT IN TEST USE
• Test bias
• Unwanted sources of variance in scores from different
groups
• Adverse impact
• Social, political or legal term (effects of test use)
ANALYSIS OF BIAS AND ADVERSE
IMPACT IN TEST USE
• Test Bias as Differential Psychometric Validity
– Bias = “when groups matched on the trait have different
scores because of one or more sources of variances related
to group membership”
1. The “meaning inferred” from scores (not the test itself) may or may not be
biased
2. It is group related (not just for a single individual)
3. Groups must be assumed to be equal on the trait
4. The definition emphasizes sources of group variance
(potentially identifiable), not group means
- e.g. “stereotype threat” (Steele & Aronson, ’95)
ANALYSIS OF BIAS AND ADVERSE IMPACT
IN TEST USE
• Adverse Impact (legal term, not statistical)
– Mean differences alone do not indicate bias
• How does this “attitude problem” force adversarial roles?
• What’s a better term?
– Adverse impact reasons:
1. Chance (not due to bias)
2. Measurement problems
3. Nature of test use
4. Differences in distribution sizes
5. Reliable sub-group approaches to test taking
6. True population differences in trait (not due to bias)
Note: Table 9.2, p. 216
• Criterion Bias (criterion must be valid)
DIFFERENTIAL ITEM FUNCTIONING
(DIF)
• DIF preferred over ‘bias’
– “Simple-minded item difficulty statistics”
• You can’t consider the item by itself (its difficulty depends on the
trait distribution and is thus confounded with it)
– Court cases:
• Golden Rule Insurance Company v. Washburn (‘84)
– Mandated that group item difficulty could not differ by more
than .15!!
• Allen v. Alabama State Board of Education (‘85)
– More restrictive – not more than .05 max difference!!!
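The difficulty-difference rule in the two court cases above can be sketched as a simple check. This is only the "simple-minded" statistic the slide criticizes (raw proportion correct per group, confounded with each group's trait distribution), not a DIF analysis. The response data, group labels, and function names are hypothetical. Only the .15 and .05 limits come from the slide.

```python
def item_difficulty(responses):
    """Proportion of correct answers for one item (responses are 0/1)."""
    return sum(responses) / len(responses)


def flag_items(group_a, group_b, max_diff):
    """Return the items whose group difficulty gap exceeds max_diff.

    group_a, group_b: dicts mapping item name -> list of 0/1 responses.
    """
    flagged = []
    for item in group_a:
        gap = abs(item_difficulty(group_a[item]) - item_difficulty(group_b[item]))
        if gap > max_diff:
            flagged.append(item)
    return flagged


# Hypothetical responses: 10 examinees per group, 3 items.
group_a = {"item1": [1] * 8 + [0] * 2, "item2": [1] * 7 + [0] * 3, "item3": [1] * 9 + [0] * 1}
group_b = {"item1": [1] * 8 + [0] * 2, "item2": [1] * 6 + [0] * 4, "item3": [1] * 7 + [0] * 3}

flagged_15 = flag_items(group_a, group_b, max_diff=0.15)  # limit of .15
flagged_05 = flag_items(group_a, group_b, max_diff=0.05)  # limit of .05

print(flagged_15)
print(flagged_05)
```

In this made-up data the gaps are 0, .10, and .20, so the tighter limit flags more items. Neither limit says whether a gap reflects bias or a real difference on the trait.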
ACTING ON THE FINDINGS
• Corrective Actions (4) Under the Uniform Guidelines – p 218
– Should we maximize the criterion performance or avoid the appearance of
discriminatory practice?
– To ease tensions, how should the Ferguson police dept. deal with the imbalance between
Black and White police officers relative to the population’s racial mix?
• Score Adjustments
– Race norming in the U.S. Employment Service (GATB)
• Scores of Hispanics, Blacks, and Whites were reported as percentile ranks within groups
• What effect did this have?
– Employment Quotas
• USTES
• Are quotas acceptable in other countries?
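To make the score-adjustment question above concrete, here is a minimal sketch of within-group percentile ranking. The groups, scores, and function names are hypothetical, and the percentile-rank formula (percent below plus half of ties) is one common convention, not necessarily the one used for the GATB.

```python
def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half."""
    below = sum(s < score for s in reference_scores)
    ties = sum(s == score for s in reference_scores)
    return 100 * (below + 0.5 * ties) / len(reference_scores)


def within_group_ranks(scores_by_group):
    """Rank every score against its own group only."""
    return {
        group: [percentile_rank(s, scores) for s in scores]
        for group, scores in scores_by_group.items()
    }


# Hypothetical raw scores for two groups.
raw = {"A": [40, 50, 60, 70], "B": [30, 40, 50, 60]}
ranks = within_group_ranks(raw)

print(ranks["A"])
print(ranks["B"])
```

In this example a raw score of 50 is reported as 37.5 in group A and 62.5 in group B, and the two groups end up with identical reported distributions. That is the effect the slide asks about.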
Analysis of Bias (cont’d)
• “Ranges of Indifference” in Test Score Bands
– Band Width
• They exist whatever you do…so how to decide?
• Standard error of the difference in scores ($s_d = s_m\sqrt{2}$, where $s_m$ is the standard error of measurement)
• Adjustment in band width should be based on judgments re:
loss of utility
– Decisions Within Bands
– Fixed Bands (don’t slither down)
– Sliding Bands (slither down)
– Rubber Bands
• What are these used for?
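The band-width idea above can be sketched in a few lines. The slide supplies only the relation between the standard error of the difference and the standard error of measurement. Everything else here is an assumption for illustration: the SEM is derived from a test SD and reliability using the classical formula, the multiplier is 1.96, and the scores are hypothetical.

```python
import math


def band_width(sd, reliability, c=1.96):
    """Band width as c times the standard error of the difference."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    sed = sem * math.sqrt(2)               # standard error of the difference
    return c * sed


def top_band(scores, width):
    """Scores treated as equivalent to the current top score."""
    top = max(scores)
    return [s for s in scores if s >= top - width]


scores = [95, 90, 86, 83, 80, 70]          # hypothetical applicant scores
width = band_width(sd=10, reliability=0.84)

first_band = top_band(scores, width)

# Sliding band: once the top scorer is selected, the band is re-anchored
# on the next highest remaining score, so it "slithers down".
remaining = [s for s in scores if s != 95]
slid_band = top_band(remaining, width)

print(round(width, 2))
print(first_band)
print(slid_band)
```

A fixed band would instead keep the first band's lower limit until everyone in that band had been selected. A wider band gives more room for decisions within the band at the cost of some utility, which is the judgment the slide refers to.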
