An Introduction to SEM - East Carolina University

Structural Equation Modeling
Karl L. Wuensch
Dept of Psychology
East Carolina University
• Also known as
– “causal modeling.”
– “analysis of covariance structure.”
• Two sets of variables
– Indicators – measured (observed, manifest)
variables – diagramed within rectangles
– Latent variables – factors – diagramed within
circles or ellipses
Causal Language
• Indicators and latent variables may be
classified as “independent” or “dependent”
– Even if no variables are manipulated.
– The classification is based on the causal model being tested.
• In the diagram, indicators and latent
variables may be connected by arrows
– One-headed = unidirectional causal flow
– Two-headed = direction not specified
• “Dependent” variables have one-headed
arrows pointing to them.
• “Independent” variables do not.
• “Dependent” variables also have residuals
(they are not perfectly predicted by the model).
– Called errors (e) for observed variables
– Disturbances (d) for latent variables
Two Models
• Measurement Model – how the measured
variables are related to the latent variables.
• Structural Model – how the latent variables
are related to each other.
Two Variance Covariance Matrices
• The sample matrix – computed from the
sample data.
• The estimated population matrix –
estimated from the model.
• χ² test of the null hypothesis that the model fits the data
• More useful are goodness-of-fit indices, because with a large
sample the χ² test rejects even trivially misfitting models
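To make the comparison of the two matrices concrete, here is a minimal sketch that computes the usual maximum likelihood discrepancy between a sample matrix S and a model-implied matrix Sigma, then the χ² statistic and one common fit index (RMSEA). It assumes the convention T = (N − 1)·F_ML (some programs use N instead); the function names and the two 2×2 matrices are hypothetical illustrations, not output from any real model.

```python
import numpy as np
from scipy.stats import chi2

def f_ml(S, Sigma):
    """ML discrepancy: ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - p

def fit_stats(S, Sigma, n, df):
    """Chi-square test of exact fit and RMSEA, assuming T = (N - 1) * F_ML."""
    t = (n - 1) * f_ml(S, Sigma)
    p_value = chi2.sf(t, df)
    rmsea = np.sqrt(max(t - df, 0.0) / (df * (n - 1)))
    return t, p_value, rmsea

S = np.array([[1.0, 0.5], [0.5, 1.0]])      # hypothetical sample matrix
Sigma = np.array([[1.0, 0.4], [0.4, 1.0]])  # hypothetical model-implied matrix
for n in (200, 2000):
    # Same misfit, ten times the cases: chi-square grows tenfold-ish.
    print(n, fit_stats(S, Sigma, n, df=1))
```

Holding the misfit fixed and raising N inflates χ² roughly in proportion to N, which is why the χ² p-value alone is a poor guide with large samples.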
Sample Size
• Need at least 200 cases even for a simple model.
• Rule of thumb: at least 10 cases per
estimated parameter.
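The two rules of thumb combine into a single minimum: whichever is larger. A tiny sketch, with a hypothetical helper name and the slide's numbers as defaults:

```python
def minimum_sample_size(n_params, floor=200, cases_per_param=10):
    """Larger of the two rules of thumb: at least 200 cases, and at
    least 10 cases per estimated parameter."""
    return max(floor, cases_per_param * n_params)

print(minimum_sample_size(11))  # → 200 (the 200-case floor dominates)
print(minimum_sample_size(30))  # → 300 (the 10-per-parameter rule dominates)
```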
Assumptions & Problems
• Multivariate normality.
• Linear relationships
– But can include polynomial components
• A singular matrix may crash the program
• Multicollinearity can be a problem
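Before fitting, it can help to screen the sample covariance matrix for the last two problems. This is a sketch only: the singularity tolerance and the |r| > .90 collinearity cutoff are illustrative assumptions, not standards taken from the slides or from any SEM package.

```python
import numpy as np

def diagnose_matrix(S, r_limit=0.90):
    """Flag a (near-)singular covariance matrix and highly correlated pairs.
    The tolerance and r_limit are illustrative assumptions."""
    eigvals = np.linalg.eigvalsh(S)            # eigenvalues of symmetric S
    singular = eigvals.min() <= 1e-10 * eigvals.max()
    cond = np.inf if singular else eigvals.max() / eigvals.min()
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                     # convert covariances to correlations
    i, j = np.triu_indices_from(R, k=1)
    collinear = [(a, b, R[a, b]) for a, b in zip(i, j) if abs(R[a, b]) > r_limit]
    return {"singular": bool(singular), "condition_number": cond,
            "collinear_pairs": collinear}

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 100))
data = np.column_stack([x1, x2, x1 + x2])      # third variable is an exact sum
S_singular = np.cov(data, rowvar=False)
print(diagnose_matrix(S_singular))
```

A variable that is an exact linear combination of others (as here) makes the matrix singular, which is exactly the case that can crash the estimation.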
Simple Example from T&F (Tabachnick & Fidell)
[Figure: path diagram of T&F’s simple model; the “independent” variables are shaded.]
Regression Parameters
• Regression coefficients for the paths
– May be “fixed” to value 0 (no path) or 1
– Or estimated from the data (*).
Variance/Covariance Parameters
• Variances/Covariances of the
“independent” variables
– May be estimated
– Or fixed to 1,
– Or fixed to the variance of a “marker” measured
variable (by fixing the path to the marker at 1).
• Variances for “dependent” latent variables
are usually fixed to the variance of one of
the measured variables (by fixing the path
to that measured variable at 1).
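Why something must be fixed: a latent variable has no natural scale, so rescaling its loadings and its variance in opposite directions leaves the model-implied matrix unchanged. The sketch below assumes a one-factor model, Sigma = λφλ′ + Θ, with hypothetical loadings and error variances, and shows that "factor variance fixed to 1" and "marker loading fixed to 1" imply the same matrix.

```python
import numpy as np

def implied_cov(loadings, factor_var, error_vars):
    """One-factor measurement model: Sigma = lambda * phi * lambda' + Theta."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(error_vars)

theta = [0.36, 0.51, 0.64]  # hypothetical error variances
# Option A: fix the factor variance to 1 and estimate all three loadings.
sigma_a = implied_cov([0.8, 0.7, 0.6], 1.0, theta)
# Option B: fix the marker's loading to 1; the factor variance then
# takes on the marker's scale (0.8 ** 2 = 0.64).
sigma_b = implied_cov([1.0, 0.7 / 0.8, 0.6 / 0.8], 0.64, theta)
print(np.allclose(sigma_a, sigma_b))  # → True
```

Because the data cannot distinguish the two versions, leaving both the loading and the variance free would give no unique solution.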
Model Identification
• A model is “identified” if there is a unique
solution for each of the estimated parameters.
• Determine the number of input data points
(values in the sample variance/covariance
matrix).
• This is $\frac{m(m+1)}{2}$,
• Where m = number of measured variables.
• For T&F’s simple model, m = 5, so 5(6)/2 = 15 data points.
• If the number of data points = the number
of parameters to be estimated, the model
is “just identified,” or “saturated,” and the
fit will be perfect.
• If there are fewer data points than
parameters to be estimated, the model is
“under identified” and the analysis cannot proceed.
The “Over Identified” Model
• The number of input data points exceeds
the number of parameters to be estimated.
• This is the desired situation.
• For T&F’s simple model, count the number
of asterisks in the diagram. I count 11.
• 15 input data points, 11 parameters to
estimate: we have an over identified model.
Eleven Parameters (*)
[Figure: T&F’s simple model with the eleven estimated parameters marked by asterisks (*); the “independent” variables are shaded.]
Identification of the Measurement
Model Should Be OK if
• Only one latent variable, at least three
indicators, errors not correlated.
• Two or more latent variables, each has at
least three indicators, errors not
correlated, each indicator loads on only
one latent variable, latent variables are
allowed to covary.
• Two or more latent variables, one has only
two indicators, errors are not correlated,
each indicator loads on only one latent
variable, none of the latent variables has a
variance or covariance of zero.
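The three sufficient conditions can be encoded as a quick checklist. This is a hypothetical helper, not a full identification check: returning False means only that none of these rules of thumb applies, not that the model is unidentified. It reads "one has only two indicators" as exactly one latent variable with two indicators and the rest with three or more, which is an interpretive assumption.

```python
def rule_of_thumb_applies(indicators, correlated_errors=False,
                          cross_loadings=False, latent_cov_fixed_zero=False):
    """indicators: dict mapping each latent variable to its list of indicators.
    cross_loadings: True if any indicator loads on more than one latent variable.
    latent_cov_fixed_zero: True if any latent variance/covariance is fixed at 0."""
    counts = sorted(len(v) for v in indicators.values())
    if correlated_errors or cross_loadings:
        return False                      # none of the rules covers these cases
    if len(counts) == 1:
        return counts[0] >= 3             # one latent variable, 3+ indicators
    if latent_cov_fixed_zero:
        return False                      # latent variables must be free to covary
    if counts[0] >= 3:
        return True                       # every latent variable has 3+ indicators
    return counts[0] == 2 and all(c >= 3 for c in counts[1:])

print(rule_of_thumb_applies({"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}))  # → True
```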
Identification of the Structural
Model May Be OK if
• None of the latent DVs predicts another
latent DV,
• or if one does, it is recursive
(unidirectional) and the disturbances are
not correlated
• If there are nonrecursive relationships (an
arrow from A to B and from B to A), hire an
expert in SEM.
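Whether the structural paths are recursive is a graph question: a model is nonrecursive if the one-headed arrows among latent variables form any loop. A sketch using Kahn's topological-sort algorithm, assuming paths are given as hypothetical (from, to) pairs; it does not check the second condition (uncorrelated disturbances).

```python
def is_recursive(paths):
    """True if the directed paths contain no cycle (no A -> ... -> A loop)."""
    nodes = {n for edge in paths for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in paths:
        children[src].append(dst)
        indegree[dst] += 1
    queue = [n for n in nodes if indegree[n] == 0]  # variables with no causes
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Any variable never reached sits on (or downstream of) a feedback loop.
    return visited == len(nodes)

print(is_recursive([("A", "B"), ("B", "C")]))  # → True
print(is_recursive([("A", "B"), ("B", "A")]))  # → False: time to hire the expert
```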
Error in Identification
• If there is a problem, the software will
throw an error.
• The software may suggest a way to reach
identification.
• You must tinker with the model to make it
identified.
Estimation
• Maximum Likelihood (ML) is the most common method
– An iterative procedure used to maximize fit
between the sample var/cov matrix and the
estimated population var/cov matrix.
• Generalized Least Squares estimation has
also fared well in Monte Carlo
comparisons of techniques.
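The iterative fitting idea can be sketched with a general-purpose optimizer. This is an illustration under stated assumptions, not what Amos or SAS CALIS do internally: a one-factor model with four indicators (10 data points, 8 parameters), factor variance fixed at 1, and SciPy's L-BFGS-B minimizing either the ML discrepancy or the GLS discrepancy ½·tr[(S⁻¹(S − Σ))²]. The "sample" matrix is generated from hypothetical population values, so both methods should recover the loadings.

```python
import numpy as np
from scipy.optimize import minimize

def implied_cov(params, p):
    """One-factor model, factor variance fixed at 1: lambda lambda' + diag(theta)."""
    lam, theta = params[:p], params[p:]
    return np.outer(lam, lam) + np.diag(theta)

def f_ml(S, Sigma):
    """Maximum likelihood discrepancy."""
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - S.shape[0]

def f_gls(S, Sigma):
    """Generalized least squares discrepancy, weighting by S^-1."""
    A = np.linalg.solve(S, S - Sigma)  # S^-1 (S - Sigma)
    return 0.5 * np.trace(A @ A)

def fit_one_factor(S, discrepancy):
    """Iteratively minimize the discrepancy over loadings and error variances."""
    p = S.shape[0]
    start = np.concatenate([np.full(p, 0.5), np.diag(S) / 2])
    bounds = [(None, None)] * p + [(1e-6, None)] * p  # keep error variances positive
    result = minimize(lambda x: discrepancy(S, implied_cov(x, p)), start,
                      method="L-BFGS-B", bounds=bounds)
    return result.x[:p], result.x[p:], result.fun

# Hypothetical population values used as the "sample" matrix.
true_lam = np.array([0.8, 0.7, 0.6, 0.5])
S = implied_cov(np.concatenate([true_lam, 1 - true_lam ** 2]), 4)
for name, disc in (("ML", f_ml), ("GLS", f_gls)):
    lam, theta, fmin = fit_one_factor(S, disc)
    print(name, np.round(lam, 3), round(float(fmin), 6))
```

With real data S would not match any Σ exactly, the minimized discrepancy would be positive, and the two methods would give slightly different estimates.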
Modifying and Comparing Models
• May simplify a model by deleting one or
more parameters in it.
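• The simplified model is nested within the
previous model.
• Calculate the χ² difference = the difference
between the two models’ chi-squares.
• df = number of parameters deleted.
• A significant χ² difference means that deleting
those parameters significantly worsened the fit.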
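The nested-model comparison is a one-line χ² test once both models are fitted. A sketch with a hypothetical helper and made-up fit results (χ² = 5.2 on 4 df for the full model, 12.9 on 6 df after deleting two parameters):

```python
from scipy.stats import chi2

def chi_square_difference(chisq_reduced, df_reduced, chisq_full, df_full):
    """Nested-model test; the simplified (reduced) model has the larger df."""
    diff = chisq_reduced - chisq_full
    ddf = df_reduced - df_full   # equals the number of parameters deleted
    return diff, ddf, chi2.sf(diff, ddf)

# Hypothetical fit results for a full model and a simplified model.
print(chi_square_difference(12.9, 6, 5.2, 4))  # difference 7.7 on 2 df, p ≈ .021
```

Here the deletion significantly worsens fit at α = .05, so the simpler model would be rejected.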
LM and Wald Tests
• Lagrange Multiplier Test
– Would fit be improved by estimating a
parameter that is currently fixed?
• Wald Test
– Would fixing this parameter significantly
reduce the fit?
– Available in SAS PROC CALIS, but not in Amos
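For a single parameter, the Wald test asks whether the estimate is far enough from the value it would be fixed to (usually 0), relative to its standard error: W = ((estimate − fixed value) / SE)², referred to χ² with 1 df. A sketch with a hypothetical helper and made-up numbers; the LM test is not shown because it needs derivative information from the fitted model.

```python
from scipy.stats import chi2

def wald_test(estimate, std_error, null_value=0.0):
    """Single-parameter Wald test: W = ((estimate - null) / SE)^2 ~ chi-square(1)."""
    w = ((estimate - null_value) / std_error) ** 2
    return w, chi2.sf(w, df=1)

# Hypothetical path estimate of .30 with a standard error of .15.
print(wald_test(0.30, 0.15))  # W = 4.0, p ≈ .0455
```

A small p means fixing the parameter to 0 would significantly reduce fit, so it should stay free.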
Reliability of Measured Variables
• Assume that the variance in the measured
variable is due to variance in the latent
variable (the “true scores”) + random error.
• Reliability = true variance divided by (true
plus error variance).
• Thus, estimated reliability = the r² between the
measured variable and the latent variable (the
squared standardized loading).
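A quick simulation shows why the two definitions agree. Assuming hypothetical values (loading 0.8, latent variance 1, error variance 0.36), the formula gives 0.64, and the squared correlation between simulated measured scores and the simulated latent "true scores" should land close to that.

```python
import numpy as np

def indicator_reliability(loading, factor_var, error_var):
    """True-score variance / (true-score variance + error variance)."""
    true_var = loading ** 2 * factor_var
    return true_var / (true_var + error_var)

rng = np.random.default_rng(1)
n = 200_000
xi = rng.normal(0.0, 1.0, n)               # latent true scores, variance 1
x = 0.8 * xi + rng.normal(0.0, 0.6, n)     # measured variable, error variance 0.36
r = np.corrcoef(x, xi)[0, 1]
print(indicator_reliability(0.8, 1.0, 0.36), r ** 2)
```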
