
Introduction to Statistics: Political Science (Class 2)
Central Limit Theorem, T-statistics, and using split sample analysis and multivariate regression to deal with confounds

Today…
• A review of what standard errors and T-statistics tell us
• Multivariate regression

The goal of statistical analysis?
• We want to know: *true* “population” mean or relationship
• What we have: sample of the units we are interested in
• Thus we estimate the mean or relationship
– What is an estimate?

Actually we estimate 2 things
• Estimate of mean or relationship
– We know how to get this (calculate the mean or find the best fit line)
• Estimate of uncertainty
– Often (typically?): How confident can we be that a mean or relationship is not zero
– We can’t measure our uncertainty directly (we’re uncertain – duh!)

The Central Limit Theorem
• In repeated sampling (if we redrew over and over and over and recalculated)…
– the average of the estimates will be centered on the population (“true”) mean
– the distribution of estimates will be approximately normal…

Like this
[Figure: “Coin toss” histogram. Vertical axis: Number of Samples (0 to 9); horizontal axis: 0% to 100%.]
This width depends on:
1. Variance in population (more variance, wider)
2. Number of cases sampled (more cases, narrower)

Mean ideology of the American public
• How would you rate yourself on the following scale?
1 = Very Liberal
2 = Liberal
3 = Somewhat Liberal
4 = Middle of the Road
5 = Somewhat Conservative
6 = Conservative
7 = Very Conservative
• If we were omniscient (or could ask every single person) we would know that the true average is 5.0
• but we’re not/we can’t… Instead we call 100 people at random… and then we do that again and again…

Estimating Mean Ideology
Sample   Mean   SE      LB (Mean-2SEs)   UB (Mean+2SEs)
1        4.8    0.167   4.466            5.134
2        5.1    0.176   4.748            5.452
3        5.3    0.19    4.92             5.68
4        4.9    0.18    4.54             5.26
5        4.7    0.168   4.364            5.036
6        5      0.176   4.648            5.352
7        5.1    0.148   4.804            5.396
8        5.2    0.2     4.8              5.6
9        4.7    0.168   4.364            5.036
10       4.9    0.124   4.652            5.148
• In any given sample we would be about 95% confident that the true population mean was somewhere within this range (LB to UB)
• Another way to think about this is that 95% of the time, our estimates of the mean will be within about +/- two standard errors of the population value
[Figure: distribution of sample means centered on 5.0, with “One Standard Error” marked.]

Same idea with regression coefficient
• If we were able to redraw new samples over and over and re-estimate β…
• Typically (always for our purposes here) we’re testing whether a coefficient = 0
• So T can be thought of as: how many SEs the coefficient is from 0

                   Coef    SE Coef   T       P
Democracy Scores   0.259   0.023     11.34   0.000
Constant           23.21   0.253     91.82   0.000

[Figure: distribution centered on 0, with T = -11.34 and T = 11.34 marked.]
• If the true relationship were 0 (no relationship), getting an estimated coefficient with a T-value with an absolute value greater than 11.34 by chance would be extremely unlikely (about 1 in 1,000,000,000,000,000,000,000,000,000,000)
• So we can be confident rejecting the null hypothesis (What’s the null? Why do we set things up this way?)

1- v. 2-tailed tests
• 1-tailed: You have strong prior expectations about direction of relationship (if relationship turns out to be in the other direction you can’t reject the null, even w/a large t-statistic)
• 2-tailed: No strong priors about direction of relationship; a more conservative test

Causal relationships
• Identifying associations is nice, but usually we want to identify causality
• Two primary threats
– Reverse causation (we’ll table this for now and talk about it in a few weeks)
– Confounding variables
• Need to rule out alternative explanations

Bush was particularly unpopular at the end of his presidency… How much did bad feelings about Bush help Obama?
[Diagram: Feelings about Bush, “?”, Feelings about Obama.]

Measuring “reverse coattails” effect
• …I'll read the name of a person and I'd like you to rate that person using something we call the feeling thermometer. Ratings between 50 degrees and 100 degrees mean that you feel favorable and warm toward the person. Ratings between 0 degrees and 50 degrees mean that you don't feel favorable toward the person and that you don't care too much for that person. You would rate the person at the 50 degree mark if you don't feel particularly warm or cold toward the person.

Bivariate regression
Y = β0 + β1X + u
SO… Obama FT = β0 + β1(Bush FT) + u
Obama FT = 80.4 + (-0.43*Bush FT)
[Figure: Obama Feeling Thermometer (vertical axis) plotted against Bush Feeling Thermometer (horizontal axis, 0 to 100).]

           Coef.   SE     T        P-value
Bush FT    -.43    .018   -24.12   0.000
Constant   80.4    .852   94.37    0.000
R-squared = 0.203

What else might explain this (strong!) relationship?
• Other factors that might affect evaluations of both Obama and Bush? Party Identification?
[Diagram: Bush Feeling Thermometer, Obama Feeling Thermometer, Party Identification.]

Party Identification
• Generally speaking, do you usually think of yourself as a Democrat, a Republican, an Independent, or what?
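Going back to the bivariate output and the 1- v. 2-tailed discussion above, here is a minimal sketch of how T and the two kinds of p-value relate. It uses only the rounded coefficient and SE shown in the table, so the T it computes differs slightly from the reported -24.12. The function names are made up for illustration, and the p-values rely on a normal approximation to the t distribution, which is an assumption that is only reasonable with a large sample.

```python
import math

def t_statistic(coef, se, null_value=0.0):
    """How many standard errors the estimate is from the null value."""
    return (coef - null_value) / se

def two_tailed_p(t):
    """P(|Z| >= |t|) under a standard normal (large-sample assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))

def one_tailed_p(t, expected_sign):
    """P-value when the direction was predicted in advance.

    expected_sign is +1 if we predicted a positive relationship,
    -1 if we predicted a negative one.
    """
    return 0.5 * math.erfc(expected_sign * t / math.sqrt(2))

# Rounded values from the Bush FT row of the bivariate table.
t_bush = t_statistic(-0.43, 0.018)

print("T:", round(t_bush, 2))
print("2-tailed p:", two_tailed_p(t_bush))
print("1-tailed p, predicted negative:", one_tailed_p(t_bush, -1))
print("1-tailed p, predicted positive:", one_tailed_p(t_bush, +1))
```

The last line shows the point made about 1-tailed tests: if we had predicted a positive relationship, a large negative T would not let us reject the null.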
Party identification is coded:
-3 = Strong Republican
-2 = Weak Republican
-1 = Lean Republican
0 = Independent
1 = Lean Democrat
2 = Weak Democrat
3 = Strong Democrat

Party Identification and FTs
Predict Obama Feeling Thermometer
                       Coef.   SE     T        P-value
Party Identification   8.71    .234   37.16    0.000
Constant               58.1    .507   114.71   0.000

Predict Bush Feeling Thermometer
                       Coef.   SE     T        P-value
Party Identification   -8.19   .259   -31.58   0.000
Constant               43.3    .560   77.38    0.000

Accounting for a confound by splitting the sample…
• Among Democrats:
– Mean evaluation of Bush: 24.7
– Mean evaluation of Obama: 79.2
• Among Republicans:
– Mean evaluation of Bush: 65.9
– Mean evaluation of Obama: 35.5
• Let’s see what happens when we run separate regressions for Democrats and Republicans…

[Figure: Obama Feeling Thermometer (vertical axis) plotted against Bush Feeling Thermometer (horizontal axis, 0 to 100), with three fitted lines.]
• Model with all respondents: Obama FT = 80.4 + (-0.43*Bush FT)
• Democrats: Obama FT = 83.6 + (-0.18*Bush FT)
• Republicans: Obama FT = 50.4 + (-0.23*Bush FT)

Party ID as Confound
[Diagram: overlapping areas for Bush Feeling Thermometer (X), Obama Feeling Thermometer (Y), and Party Identification (Z). One region is labeled “Not this part”.]
• We only want to give Bush FT explanatory “credit” for this part of the relationship

Multivariate Regression
Y = β0 + β1X1 + β2X2 + u
Obama FT = β0 + β1(Bush FT) + β2(Party Identification) + u
(party identification -3=strong Republican; 3=strong Democrat)

Multivariate Regression
                       Coef.   St.Err   T       P
Bush FT                -.165   .019     -8.72   0.000
Party Identification   7.354   .278     26.44   0.000
Constant               65.28   .962     67.89   0.000
• Language: relationship between X1 and Y controlling for X2 (OR holding X2 constant) (more precisely: “controlling for the linear relationship between X2 and Y”)

[Diagram: overlapping areas for Bush Feeling Thermometer, Obama Feeling Thermometer, and Party Affiliation.]
• Bivariate regression: Bush FT gets “credit” for all of this overlap
• Bush FT only gets “credit” for this part of the overlap
• Party Affiliation only gets “credit” for this part of the overlap
• No variable gets “credit” for this part (but it does affect the R-squared)

Getting predicted values
Obama FT = β0 + β1(Bush FT) + β2(Party Identification) + u
Obama FT = 65.28 + (-.165)(Bush FT) + 7.354(Party Identification) + u
• What does the coefficient on the constant mean?
• Expected Value for a Strong Democrat who gave Bush a feeling thermometer rating of 50?

Notes and Next Time
• No Class on Tuesday
• Remember to look at the homework assignment in time to get TA office hour help before it’s due next Thursday!
• Next time:
– R-squared
– Non-continuous explanatory variables
– Joint significance of variables (F-tests)
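Returning to the two questions posed under “Getting predicted values”: the sketch below plugs values into the estimated multivariate equation. The coefficients are the ones reported in the regression table; the constant and function names are invented for illustration, and the error term u is left out because a predicted value is the expected value given X.

```python
# Estimates from the multivariate regression table.
B0_CONSTANT = 65.28
B1_BUSH_FT = -0.165
B2_PARTY_ID = 7.354  # party ID runs from -3 (strong Rep.) to 3 (strong Dem.)

def predicted_obama_ft(bush_ft, party_id):
    """Predicted Obama feeling thermometer for given Bush FT and party ID."""
    return B0_CONSTANT + B1_BUSH_FT * bush_ft + B2_PARTY_ID * party_id

# The constant is the prediction when every X equals 0:
# an Independent (party ID 0) who rated Bush at 0 degrees.
baseline = predicted_obama_ft(bush_ft=0, party_id=0)

# Strong Democrat (party ID 3) who rated Bush at 50 degrees:
# 65.28 + (-0.165 * 50) + (7.354 * 3)
strong_democrat = predicted_obama_ft(bush_ft=50, party_id=3)

print("Constant-only case:", round(baseline, 2))
print("Strong Democrat, Bush FT 50:", round(strong_democrat, 2))
```

Worked by hand, the second case is 65.28 - 8.25 + 22.062, or about 79.1 degrees.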