
SPSS Workshop
Research Support Center
Chongming Yang

Causal Inference
• If A, then B, under condition C
• If A, then B with 95% probability, under condition C

Student T Test (William S. Gosset's pen name = "Student")
• Assumptions
– Small sample
– Normally distributed
• t distributions: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
• df = degrees of freedom = number of independent observations

Types of T Tests
• One sample – test against a specific (population) mean
• Two independent samples – compare the means of two independent samples that represent two populations
• Paired – compare the means of repeated samples (the same cases measured twice)

One Sample T Test
• Conceptually, convert the sample mean to a t score and examine whether t falls within the acceptable region of the distribution
• $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

Two Independent Samples
• $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Paired Observation Samples
• d = difference between the first and second observations
• $t = \frac{\bar{d}}{s_d / \sqrt{n}}$

Multiple Group Issues
• Groups A, B, C need three pairwise comparisons: AB, AC, BC
– Each is made with probability .95 of avoiding a false positive
• Joint probability that all three comparisons avoid a false positive (assuming independent tests):
– .95 × .95 × .95 ≈ .86, so the chance of at least one false positive is about .14

Analysis of Variance (ANOVA)
• Completely randomized groups
• Compare group variances to infer group mean differences
• Sources of total variance
– Within groups
– Between groups
• F distribution: $F = \frac{SSB / df_1}{SSW / df_2}$
– SSB = between-groups sum of squares, $df_1 = k - 1$ for k groups
– SSW = within-groups sum of squares, $df_2 = N - k$ for N observations

[Figure: Fisher-Snedecor distribution]

F Test
• Null hypothesis: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
• Given df1, df2, and the F value,
• Determine whether the corresponding probability falls within the acceptable region of the distribution

Issues of ANOVA
• Indicates that some group difference exists
• Does not reveal which two groups differ
• Needs other tests to identify specific group differences
– Hypothesized comparisons: contrasts
– No hypothesized comparisons: post hoc tests
• ANOVA can be expressed as a multiple regression, and both are special cases of the General Linear Model (GLM)

Multiple Linear Regression
• Causes can be continuous or categorical
• Effect is a continuous measure
• $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$
• Milder causal term: predictors
• Objective: identify important predictors

Assumptions of Linear Regression
• Y and X have linear relations
• Y is continuous or interval and unbounded
• Expected (mean) value of $\varepsilon$ = 0
• $\varepsilon$ is normally distributed
• $\varepsilon$ is not correlated with the predictors
• Predictors should not be highly correlated
• No measurement error in any variable

Least Squares Solution
• Choose $\beta_0, \beta_1, \beta_2, \beta_3, \dots$ to minimize the sum of squared differences between observed and model-estimated (predicted) values: $\sum_i (y_i - \hat{y}_i)^2$
• Solved through a system of equations (the normal equations)

Explained Variance in y
• $R^2 = \frac{\left[\sum_i y_i^2 - (\sum_i y_i)^2 / n\right] - \sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2 - (\sum_i y_i)^2 / n}$
– $\sum_i y_i^2 - (\sum_i y_i)^2 / n = \sum_i (y_i - \bar{y})^2$, the total sum of squares

Standard Error of β
• $SE(\beta_j) = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - k - 1} \cdot \frac{1}{\sum_i (x_{ij} - \bar{x}_j)^2 \, (1 - R_j^2)}}$
– $R_j^2$ = R² from regressing $x_j$ on the other predictors

T Test of Significance of β
• $t = \beta / SE$
• If |t| > the critical value (p < .05)
• Then β is significantly different from zero

Confidence Intervals of β
• $\beta \pm t_{crit} \cdot SE$

Standardized Coefficient (β)
• Makes βs comparable among variables by putting them on the same scale (standardized scores)
• $\beta_{std} = \beta \cdot \frac{s_x}{s_y}$, where $s_x$ and $s_y$ are the standard deviations of x and y

Interpretation of β
• If x increases by one unit, y increases by β units, holding the other X variables constant

Model Comparisons
• Complete model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$
• Reduced model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_g x_g$
• Test: $F = MS_{drop} / MSE$
– MS = mean square; $MS_{drop} = (SSE_{reduced} - SSE_{complete}) / (k - g)$
– MSE = mean square error of the complete model, $SSE_{complete} / (n - k - 1)$

Variable Selection
• Select significant predictors from a pool of predictors
• Stepwise: undesirable, see http://en.wikipedia.org/wiki/Stepwise_regression
• Forward
• Backward (preferable)

Dummy Coding of Nominal Variables
• R = Race (1 = White, 2 = Black, 3 = Hispanic, 4 = Others)

R   d1  d2  d3
1   1   0   0
1   1   0   0
2   0   1   0
2   0   1   0
3   0   0   1
3   0   0   1
4   0   0   0
4   0   0   0

• Others (4) is the reference category: all dummies are 0
• Include all dummy variables in the model, even if not every one is significant

Interaction
• $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_2 x_3 + \dots + \beta_k x_k$
• Create a product term X2X3
• Include X2 and X3 even if their effects are not significant
• Interpret the interaction effect: the effect of X2 depends on the level of X3

Plotting Interaction
• Write out the model with main and interaction effects
• Use standardized coefficients
• Plug in some plausible values of the interacting variables and calculate y
• Use one X for the horizontal axis and the predicted y for the vertical axis
• See examples at http://frank.itlab.us/datamodel/node104.html

Diagnostics
• Linear relation of predicted and observed values (plotting)
• Collinearity
• Outliers
• Normality of residuals (save residuals as a new variable)

Repeated Measures (MANOVA, GLM)
• Measure(s) repeated over time
• Change in individual cases (within)?
• Group differences (between, categorical x)?
• Covariate effects (continuous x)?
• Interaction between within and between variables?
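The formulas on the t test, multiple-comparison, ANOVA, and regression slides above can be checked by hand with a small sketch. It is written in plain Python (standard library only) rather than SPSS syntax; every function name (one_sample_t, pooled_t, paired_t, familywise_error, anova_f, simple_regression, f_change, dummy_code) is a hypothetical helper for illustration, the regression part handles a single predictor only, and no p-values are computed, so compare the statistics with a t or F table or with the SPSS output.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical helpers for hand-checking the slide formulas (statistics only, no p-values).


def one_sample_t(xs, mu):
    """One sample: t = (mean - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(xs)
    return (mean(xs) - mu) / (stdev(xs) / math.sqrt(n)), n - 1


def pooled_t(x1, x2):
    """Two independent samples with pooled variance, df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def paired_t(first, second):
    """Paired samples: a one-sample t test of the differences d against 0."""
    return one_sample_t([a - b for a, b in zip(first, second)], 0)


def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def anova_f(groups):
    """One-way ANOVA: F = (SSB / df1) / (SSW / df2), df1 = k - 1, df2 = N - k."""
    pooled = [x for g in groups for x in g]
    grand = mean(pooled)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = len(groups) - 1, len(pooled) - len(groups)
    return (ssb / df1) / (ssw / df2), df1, df2


def simple_regression(x, y):
    """Least squares with one predictor (k = 1, so R_j^2 = 0 in the SE formula)."""
    n, k = len(x), 1
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    se = math.sqrt(sse / (n - k - 1) / sxx)
    return {"b0": b0, "b1": b1, "r2": (sst - sse) / sst, "se": se,
            "t": b1 / se, "beta_std": b1 * stdev(x) / stdev(y)}


def f_change(sse_reduced, sse_complete, n, k, g):
    """Model comparison: F = MS_drop / MSE, dropping k - g predictors."""
    ms_drop = (sse_reduced - sse_complete) / (k - g)
    mse = sse_complete / (n - k - 1)
    return ms_drop / mse


def dummy_code(values, categories, reference):
    """One 0/1 column per non-reference category."""
    levels = [c for c in categories if c != reference]
    return [[1 if v == c else 0 for c in levels] for v in values]


print(familywise_error(0.05, 3))                   # about 0.143
print(anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
print(dummy_code([1, 2, 3, 4], [1, 2, 3, 4], reference=4))  # rows of the race table
```

With only one predictor the standardized coefficient equals the correlation between x and y, so `beta_std` is the square root of `r2` when the slope is positive; with several predictors the full matrix solution (as in SPSS REGRESSION) is needed instead.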
Assumptions
• Normality
• Sphericity: the variances of the differences between every pair of repeated measures are equal
• Total sum of squares is partitioned into
– Within subjects
– Between subjects
– Error

Model
• $y_{ij} = \mu + \pi_i + \tau_j + (\pi\tau)_{ij} + \varepsilon_{ij}$
• $\mu$ = grand mean
• $\pi_i$ = constant of individual i
• $\tau_j$ = constant of the jth treatment
• $\varepsilon_{ij}$ = error of i under treatment j
• $(\pi\tau)_{ij}$ = interaction

F Test of Effects
• $F = MS_{between} / MS_{within}$ (simple repeated)
• $F = MS_{treatment} / MS_{error}$ (with treatment)
• $F = MS_{within} / MS_{interaction}$ (with interaction)

Four Types of Sums of Squares
• Type I: balanced design
• Type II: adjusting for other effects
• Type III: unbalanced design with no empty cells
• Type IV: empty cells

Exercise
• http://www.ats.ucla.edu/stat/spss/seminars/Repeated_Measures/default.htm
• Copy the data to the SPSS syntax window, select it, and run it
• Run Repeated Measures GLM
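For a design with one score per subject per treatment, the sum-of-squares partition from the Assumptions slide and the $F = MS_{treatment} / MS_{error}$ test can be sketched by hand as below. This is plain Python with hypothetical function names, assuming a complete subjects-by-treatments table (no missing cells, no between-subjects factor, no covariates); it is a hand check, not a reproduction of how SPSS's Repeated Measures GLM computes its output. Between subjects is SS_subjects; within subjects is SS_treatment plus SS_error, the subject-by-treatment residual.

```python
from itertools import combinations
from statistics import mean, variance


def repeated_measures_anova(data):
    """One-way repeated measures: data[i][j] = score of subject i under treatment j.

    Assumes a complete table (one score per cell, no missing cells).
    SS_total = SS_subjects + SS_treatment + SS_error.
    """
    n, k = len(data), len(data[0])
    scores = [x for row in data for x in row]
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in data)
    treatment_means = [mean([row[j] for row in data]) for j in range(k)]
    ss_treatment = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_error = ss_total - ss_subjects - ss_treatment  # subject-by-treatment residual
    df_treatment, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_treatment / df_treatment) / (ss_error / df_error)
    return {"ss_subjects": ss_subjects, "ss_treatment": ss_treatment,
            "ss_error": ss_error, "F": f, "df": (df_treatment, df_error)}


def difference_variances(data):
    """Variance of the differences for every pair of treatments (sphericity check)."""
    k = len(data[0])
    return {(a, b): variance([row[a] - row[b] for row in data])
            for a, b in combinations(range(k), 2)}


example = [[1, 3, 5], [2, 5, 5], [3, 4, 8]]  # made-up data: 3 subjects x 3 treatments
print(repeated_measures_anova(example))
print(difference_variances(example))
```

In the made-up example the three difference variances are 1, 1, and 4; unequal values like these suggest sphericity may not hold. SPSS's Repeated Measures GLM output includes Mauchly's test of sphericity for this check.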