
Generalized Linear Model (GZLM): Overview

Dependent Variables
• Continuous
• Discrete
  – Dichotomous
  – Polytomous
  – Ordinal
  – Count

Continuous Variables
• Quantitative variables that can take on any value within the limits of the variable
• Distance, time, or length
• Infinite number of possible divisions between any two values, at least theoretically
  "Only love can be divided endlessly and still not diminish" (Anne Morrow Lindbergh)
• More than 11 ordered values
• Scores on standardized scales such as those that measure parenting attitudes, depression, family functioning, and children's behavioral problems

Discrete Variables
• Finite number of indivisible values; cannot take on all possible values within the limits of the variable
• Dichotomous, polytomous, ordinal, and count

Dichotomous Variables
• Two categories used to indicate whether an event has occurred or some characteristic is present
• Sometimes called binary or binomial variables
  "To be or not to be, that is the question." (William Shakespeare, "Hamlet")

Dichotomous DVs
• Placed in foster care or not
• Diagnosed with a disease or not
• Abused or not
• Pregnant or not
• Service provided or not

Polytomous Variables
• Three or more unordered categories
• Categories mutually exclusive and exhaustive
• Sometimes called multicategorical or multinomial variables
  "Inanimate objects can be classified scientifically into three major categories: those that don't work, those that break down and those that get lost" (Russell Baker)

Polytomous DVs
• Reason for leaving welfare: marriage, stable employment, move to another state, incarceration, or death
• Status of foster home application: licensed to foster, discontinued application process prior to licensure, or rejected for licensure
• Changes in living arrangements of the elderly: newly co-residing with their children, no longer co-residing, or residing in institutions

Ordinal Variables
• Three or more ordered categories
• Sometimes called ordered categorical variables or ordered polytomous variables
  "Good, better, best; never let it rest till your good is better and your better is best" (Anonymous)

Ordinal DVs
• Job satisfaction: very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, or very satisfied
• Severity of child abuse injury: none, mild, moderate, or severe
• Willingness to foster children with emotional or behavioral problems: least acceptable, willing to discuss, or most acceptable

Count Variables
• Number of times a particular event occurs to each case, usually within a given:
  – Time period (e.g., number of hospital visits per year)
  – Population size (e.g., number of registered sex offenders per 100,000 population), or
  – Geographical area (e.g., number of divorces per county or state)
• Whole numbers that can range from 0 through +∞
  "Now I've got heartaches by the number, Troubles by the score, Every day you love me less, Each day I love you more" (Ray Price)

Count DVs
• Number of hospital visits, outpatient visits, services used, divorces, arrests, criminal offenses, symptoms, placements, children fostered, children adopted

General Linear Model (GLM) (selected models)
• Continuous DV: linear regression, ANOVA, t-test

Generalized Linear Model (GZLM) (selected regression models)
• Continuous DV: linear regression
• Dichotomous DV: binary logistic regression
• Polytomous DV: multinomial logistic regression
• Ordinal DV: ordinal logistic regression
• Count DV: Poisson or negative binomial regression

Generalized How?
• DV continuous or discrete
• Normal or non-normal error distributions
• Constant or non-constant variance
• Provides a unifying framework for analyzing an entire class of regression models

GLM & GZLM Similarities
• IVs are combined in a linear fashion: $\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
• A slope is estimated for each IV; each slope has an accompanying test of statistical significance and a confidence interval
• Each slope indicates the IV's independent contribution to the explanation or prediction of the DV
• The sign of each slope indicates the direction of the relationship
• IVs can be at any level of measurement
• The same methods are used for coding categorical IVs (e.g., dummy coding)
• IVs can be entered simultaneously, sequentially, or using other methods
• Product terms can be used to test interactions
• Powered terms (e.g., the square of an IV) can be used to test curvilinearity
• Overall model fit can be tested, as can the incremental improvement in a model brought about by the addition or deletion of IVs (nested models)
• Residuals, leverage values, Cook's D, and other indices are used to diagnose model problems

Common Assumptions
• Correct model specification
• Variables measured without error
• Independent errors
• No perfect multicollinearity

Correct Model Specification
• Have you included relevant IVs?
• Have you excluded irrelevant IVs?
• Do the IVs that you have included have linear or non-linear relationships with your DV (or some function of your DV, as discussed below)?
• Are one or more of your IVs moderated by other IVs (i.e., are there interaction effects)?
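The shared building blocks listed under GLM & GZLM Similarities can be sketched in a few lines of Python. These are dummy coding of a categorical IV, a product term for an interaction, a squared term for curvilinearity, and the linear combination $\alpha + \beta_1 X_1 + \dots + \beta_k X_k$. This is a minimal illustration only. The function names, category labels, and coefficient values are hypothetical and do not come from the source.

```python
def dummy_code(category, levels):
    """Dummy-code a categorical IV: one 0/1 column per non-reference level.

    The first entry of `levels` is treated as the reference category
    (an assumption for this sketch).
    """
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if category == level else 0 for level in levels[1:]]


def build_ivs(x, group, levels):
    """Return the IV values for one case.

    Columns: x, the dummy codes for `group`, x times each dummy
    (product terms for the interaction), and x squared (powered term).
    """
    dummies = dummy_code(group, levels)
    products = [x * d for d in dummies]
    return [x, *dummies, *products, x ** 2]


def linear_predictor(alpha, betas, ivs):
    """alpha + beta_1*X_1 + ... + beta_k*X_k, the same form in GLM and GZLM."""
    if len(betas) != len(ivs):
        raise ValueError("need exactly one slope per IV")
    return alpha + sum(b * v for b, v in zip(betas, ivs))


if __name__ == "__main__":
    # Hypothetical case: x = 2.0, member of group "B" (reference group "A")
    ivs = build_ivs(2.0, "B", ["A", "B", "C"])
    print(ivs)
    # Hypothetical intercept and slopes, one slope per IV column
    print(linear_predictor(1.0, [0.5] * len(ivs), ivs))
```

The GLM and the GZLM differ in what this linear combination is set equal to (the mean of the DV, or some function of it). They do not differ in how the IV columns are built.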
Variables Measured without Error
• A limitation of regression models, given that most often our variables contain some measurement error

Independent Errors
• Violations can result from study design, e.g.:
  – Clustered data, which occur when data are collected from groups
  – Temporally linked data, which occur when data are collected repeatedly over time from the same people or groups
• Violations can lead to incorrect significance tests and confidence intervals
• Examples of when this assumption might not hold:
  – Studying the effect of parenting practices on children's behavioral problems when reports of parenting practices and behavioral problems are collected from both parents in two-parent families
  – Studying the effect of parenting practices on children's behavioral problems when information about behavioral problems is collected for two or more children per family
  – Studying the effects of leader behaviors on group cohesion in small groups when information about leader behaviors and group cohesion is collected from all members of each group

No Perfect Multicollinearity
• Perfect multicollinearity exists when an IV is predicted perfectly by a linear combination of the remaining IVs
• Typically quantified by "tolerance" or the "variance inflation factor" (VIF = 1/tolerance)
• Even high (not perfect) levels of multicollinearity may pose problems (e.g., tolerance < .20, or especially < .10)

Estimating Parameters (e.g., β)
• GLM: Ordinary Least Squares (OLS) estimation
  – Estimates minimize the sum of the squared differences between observed and estimated values of the DV
  – http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
• GZLM: Maximum Likelihood (ML) estimation
  – Estimates have the greatest likelihood (i.e., the maximum likelihood) of generating the observed sample data if model assumptions are true

Testing Hypotheses
• Overall and nested models ($\beta_1 = \beta_2 = \dots = \beta_k = 0$)
  – GLM: F
  – GZLM: likelihood ratio $\chi^2$
• Individual slopes ($\beta = 0$)
  – GLM: t
  – GZLM: Wald $\chi^2$ or likelihood ratio $\chi^2$

Estimating DV with GLM
Three ways of expressing the same thing:
$\mu = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k = \eta$
• Assumed linear relationship
• μ (Greek letter mu): estimated mean value of the DV
• η (Greek letter eta): linear predictor

Estimating DV with Poisson Regression
$\ln(\mu) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
$\ln(\mu) = \eta$
• Assumed linear relationship

Single (Quantitative) IV Example
• DV = number of foster children adopted
• IV = perceived responsibility for parenting (scale scores transformed to z-scores)
• N = 285 foster mothers
• Do foster mothers who feel a greater responsibility to parent foster children adopt more foster children?

Poisson Model
$\ln(\mu) = \alpha + \beta X$
$\ln(\mu) = .018 + (.185)(X)$
• ln(μ) is the log of the estimated mean count, here the log of the mean number of children adopted
• Does not have intuitive or substantive meaning

Mathematical Functions
• Function: $\sqrt{4} = 2$
• Inverse (reverse) function: $2^2 = 4$
• Function: ln(μ), the natural logarithm of μ ("link function")
• Inverse (reverse) function: exp(η), the exponential of η ("inverse link function")
  – $e^x$ on a calculator
  – exp(x) in SPSS and Excel

Link Function
• ln(μ), the log of the estimated mean count
• Connects (i.e., links) the mean value of the DV to the linear combination of IVs
• Transforms the relationship between μ and η so that the relationship is linear
• Different GZLM models use different links
• Does not have intuitive or substantive meaning

Inverse (Reverse) Link Function
Three ways of expressing the same thing:
$\mu = \exp(\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k) = \exp(\eta) = e^{\eta}$
• μ represents values of the DV with intuitive and substantive meaning, e.g., the mean number of children adopted

Estimated Mean DV: $\eta = .018 + (.185)(X)$
• X = 0: $.018 + (.185)(0) = .018$; $e^{.018} = 1.018$; M = 1.02 children adopted
• X = 1: $.018 + (.185)(1) = .203$; $e^{.203} = 1.225$; M = 1.23 children adopted

Examples of Exponentiation
• $e^{0} = 1.00$
• $e^{.50} = 1.65$
• $e^{1.00} = 2.72$

Problem
• For discrete DVs the relationship between the DV (μ) and the linear predictor $\eta = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$ is non-linear (for the Poisson model, $\mu = e^{\eta}$)
• A one-unit increase in an IV may be associated with a different amount of change in the mean DV, depending on the initial value of the IV

Mean Number of Children Example: Non-linear Relationship
[Figure: mean number of children adopted (y-axis) by standardized parenting responsibility (x-axis)]
Standardized Parenting Responsibility        -3     -2     -1      0      1      2      3
Mean Number of Children                    0.58   0.70   0.85   1.02   1.23   1.47   1.77

Solution
• Model a linear relationship between a linear combination of one or more IVs and some function of the DV

ln(Mean Number of Children) Example: Linear Relationship
[Figure: ln(mean number of children adopted) (y-axis) by standardized parenting responsibility (x-axis)]
Standardized Parenting Responsibility        -3     -2     -1      0      1      2      3
ln(Mean Number of Children)               -0.54  -0.35  -0.17   0.02   0.20   0.39   0.57
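The log link and its inverse for the single-IV Poisson example can be checked numerically. The Python sketch below assumes the rounded estimates α = .018 and β = .185 reported above; the function names are hypothetical. It computes η and μ = exp(η) for X from -3 to 3, which reproduces the values in the two figures.

```python
import math

# Rounded estimates from the foster-mother example: ln(mu) = .018 + .185 * X
ALPHA = 0.018
BETA = 0.185


def linear_predictor(x, alpha=ALPHA, beta=BETA):
    """eta = alpha + beta * X, i.e., the log of the estimated mean count."""
    return alpha + beta * x


def link(mu):
    """Poisson (log) link: eta = ln(mu)."""
    return math.log(mu)


def inverse_link(eta):
    """Poisson inverse link: mu = exp(eta), the estimated mean count."""
    return math.exp(eta)


if __name__ == "__main__":
    print(" X  ln(mu)     mu")
    for x in range(-3, 4):  # standardized scores -3 through 3
        eta = linear_predictor(x)
        mu = inverse_link(eta)
        print(f"{x:>2}  {eta:6.2f}  {mu:5.2f}")
```

The ln(μ) column rises by the same .185 for every one-unit increase in X. The μ column does not. It rises by about .12 from X = -3 to X = -2, but by about .30 from X = 2 to X = 3. This is the non-linearity described under Problem.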