Multivariate Statistics
Least Squares ANOVA & ANCOV
Repeated Measures ANOVA
Cluster Analysis
Least Squares ANOVA
• Do ANOVA as a multiple regression.
• Each factor with k levels is represented by
k - 1 dichotomous dummy variables.
• Interactions are represented as products
of dummy variables.
A x B Factorial: Dummy Variables
• Two levels of A
– A1 = 1 if at Level 1 of A, 0 if not
• Three levels of B
– B1 = 1 if at Level 1 of B, 0 if not
– B2 = 1 if at Level 2 of B, 0 if not
• A x B interaction (2 df)
– A1B1 codes the one df
– A1B2 codes the other df
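The coding scheme above can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function names `dummy_codes` and `factorial_row` are hypothetical, and the last level of each factor is assumed to be the reference level (coded 0 on every dummy).

```python
def dummy_codes(level, k):
    """Return k-1 dummy variables (0/1) for a 1-based level of a k-level factor.

    Assumption: the last level is the reference level, coded 0 on every dummy.
    """
    return [1 if level == j else 0 for j in range(1, k)]


def factorial_row(a_level, b_level):
    """Predictor values for one case in the 2 x 3 (A x B) design."""
    (a1,) = dummy_codes(a_level, 2)   # one dummy for the two levels of A
    b1, b2 = dummy_codes(b_level, 3)  # two dummies for the three levels of B
    # Interaction dummies are products of the main-effect dummies.
    return {"A1": a1, "B1": b1, "B2": b2, "A1B1": a1 * b1, "A1B2": a1 * b2}


# Show the coding for every cell of the design.
for a in (1, 2):
    for b in (1, 2, 3):
        print(a, b, factorial_row(a, b))
```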
A x B Factorial: The Model
• Y = a + b1A1 + b2B1 + b3B2 + b4A1B1 +
b5A1B2 + error.
• Do the multiple regression.
• The regression SS represents the
combined effects of A and B (and
their interaction).
Partitioning the Sums of Squares
• Drop A1 from the full model.
– The decrease in the regression SS is the SS
for the main effect of A.
• Drop B1 and B2 from the full model.
– The decrease in the regression SS is the SS
for the main effect of B.
• Drop A1B1 and A1B2 from the full model.
– The decrease in the regression SS is the SS
for the interaction.
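The drop-one-effect procedure above can be sketched as follows. The data are hypothetical and balanced, and the helper names (`sse`, `effect_codes`, `design`, `regression_ss`) are made up for illustration. One caution that is not in the slides: this sketch uses effect coding (1, 0, -1) rather than 0/1 dummies. When the product terms stay in the model, dropping a 0/1-coded A1 tests the effect of A only at the reference level of B, not the main effect of A.

```python
def sse(X, y):
    """Error SS from regressing y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    b = [m[i][p] / m[i][i] for i in range(p)]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))


def effect_codes(level, k):
    """k-1 effect codes for a 0-based level; the last level is -1 on all codes."""
    if level == k - 1:
        return [-1.0] * (k - 1)
    return [1.0 if level == j else 0.0 for j in range(k - 1)]


# Hypothetical balanced 2 x 3 data: (level of A, level of B, Y), two cases per cell.
data = [(0, 0, 3), (0, 0, 5), (0, 1, 6), (0, 1, 8), (0, 2, 9), (0, 2, 11),
        (1, 0, 4), (1, 0, 6), (1, 1, 5), (1, 1, 7), (1, 2, 14), (1, 2, 16)]
y = [float(row[2]) for row in data]


def design(keep):
    """Design matrix containing only the named predictors."""
    X = []
    for a, b, _ in data:
        (a1,) = effect_codes(a, 2)
        b1, b2 = effect_codes(b, 3)
        cols = {"A1": a1, "B1": b1, "B2": b2, "A1B1": a1 * b1, "A1B2": a1 * b2}
        X.append([cols[name] for name in keep])
    return X


def regression_ss(keep):
    # Regression SS = total SS (error of the intercept-only model) - model error SS.
    return sse(design([]), y) - sse(design(keep), y)


ss_full = regression_ss(["A1", "B1", "B2", "A1B1", "A1B2"])
ss_a = ss_full - regression_ss(["B1", "B2", "A1B1", "A1B2"])   # drop A1
ss_b = ss_full - regression_ss(["A1", "A1B1", "A1B2"])         # drop B1, B2
ss_ab = ss_full - regression_ss(["A1", "B1", "B2"])            # drop products
print(ss_full, ss_a, ss_b, ss_ab)
```

Because these toy data have equal cell sizes, the three unique sums of squares add up to the full-model regression SS. With unequal cell sizes they generally would not.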
Unique Sums of Squares
• This method produces a unique sum of
squares for each effect, representing the
effect after eliminating overlap with any
other effects in the model.
• In SAS these are Type III sums of squares.
• Overall and Spiegel called them Method I
sums of squares.
Analysis of Covariance
• Simply put, this is a multiple regression where
there are both categorical and continuous
predictors.
• In the ideal circumstance (the grouping variables
are experimentally manipulated), there will be no
association between the covariate and the
grouping variables.
• Adding the covariate to the model may reduce
the error sum of squares and give you more
power.
Big Error = Small F, Large p
[Figure: Sums of Squares]
Add Covariate, Lower Error
[Figure: Sums of Squares]
Big F = Happy Researcher
[Figure: Sums of Squares]
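A minimal sketch of the point made above: adding a covariate that is related to Y but unrelated to the grouping variable lowers the error sum of squares and raises F. The data are hypothetical, and the helper `sse` is an illustrative pure-Python least squares routine, not anything from the slides.

```python
def sse(X, y):
    """Error SS from regressing y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    b = [m[i][p] / m[i][i] for i in range(p)]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))


# Hypothetical data: two groups (g = 0 or 1); the covariate c has the same
# values in both groups, so it is uncorrelated with group membership.
g = [0, 0, 0, 0, 1, 1, 1, 1]
c = [1, 2, 3, 4, 1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8, 3.0, 5.1, 6.9, 9.2]
n = len(y)

# ANOVA: the group dummy is the only predictor.
sse_anova = sse([[gi] for gi in g], y)
ss_group_anova = sse([[] for _ in y], y) - sse_anova
f_anova = ss_group_anova / (sse_anova / (n - 2))

# ANCOV: group dummy plus covariate; the group SS is what is lost by dropping g.
sse_ancov = sse([[gi, ci] for gi, ci in zip(g, c)], y)
ss_group_ancov = sse([[ci] for ci in c], y) - sse_ancov
f_ancov = ss_group_ancov / (sse_ancov / (n - 3))

print("error SS:", sse_anova, "->", sse_ancov)
print("F for group:", f_anova, "->", f_ancov)
```

In this toy example the group sum of squares is the same in both analyses, because the covariate is uncorrelated with group. Only the error term shrinks.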
Confounded ANCOV
• If the data are nonexperimental, or the
covariate is measured after manipulating the
independent variables, then the covariate
will be correlated with the grouping
variables.
• Including it in the model will change the
treatment sums of squares.
• And make interpretation rather slippery.
A Simple Example
• One Independent Variable (A) with three
levels
• One covariate (C)
• Y = a + b1A1 + b2A2 + b3C + b4A1C + b5A2C
+ error.
• A1C and A2C represent the interaction
between the independent variable and the
covariate.
Covariate x IV Interaction
• We drop the two interaction terms from the
model.
• If the regression SS decreases markedly,
then the relationship between the
covariate and Y varies across levels of the
independent variable.
• This violates the homogeneity of
regression assumption of the traditional
ANCOV.
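The test of the covariate x IV interaction can be sketched by fitting the model with and without the two product terms and forming an F ratio from the change in error SS (which equals the decrease in regression SS). The data below are hypothetical and built so that the slopes clearly differ across the three groups. The helper names `sse` and `row` are illustrative.

```python
def sse(X, y):
    """Error SS from regressing y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    b = [m[i][p] / m[i][i] for i in range(p)]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))


# Hypothetical data: three levels of A, four cases each, covariate C.
levels = [1] * 4 + [2] * 4 + [3] * 4
c = [1.0, 2.0, 3.0, 4.0] * 3
y = [1.1, 1.9, 3.2, 3.8,    # slope near 1 at level 1
     2.0, 4.1, 5.9, 8.1,    # slope near 2 at level 2
     2.9, 6.2, 8.8, 12.1]   # slope near 3 at level 3


def row(level, ci, with_interaction):
    """Predictors A1, A2, C and, optionally, the products A1C and A2C."""
    a1 = 1.0 if level == 1 else 0.0
    a2 = 1.0 if level == 2 else 0.0
    base = [a1, a2, ci]
    return base + [a1 * ci, a2 * ci] if with_interaction else base


sse_full = sse([row(l, ci, True) for l, ci in zip(levels, c)], y)
sse_reduced = sse([row(l, ci, False) for l, ci in zip(levels, c)], y)

df_interaction = 2                 # two product terms were dropped
df_error = len(y) - 6              # intercept plus five predictors in the full model
f_interaction = ((sse_reduced - sse_full) / df_interaction) / (sse_full / df_error)
print("F for covariate x IV interaction:", f_interaction)
```

A large F here indicates heterogeneous regression slopes, in which case the traditional ANCOV would not be appropriate for these data.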
Wuensch & Poteat, 1998
• Decision (stop or continue the research)
was not the only dependent variable.
• Subjects also were asked to indicate how
justified the research was.
• Predict justification scores from
– Idealism and relativism (covariates)
– Sex and purpose of research (grouping
variables)
Covariates Not Necessarily Nuisances
• Psychologists often think of covariates
as being nuisance variables.
• They want their effects taken out of error
variance.
• For my research, however, I had a
genuine interest in the effects of idealism
and relativism.
The Results
• There were no significant interactions.
• Every main effect was significant.
• Idealism was negatively related to
justification.
• Relativism was positively related to
justification.
• Men thought the research more justified
than did women.
• Purpose of the research had a significant
effect.
• The cosmetic testing and neuroscience
theory testing received mean justification
ratings significantly less than those of the
medical research.
• Hmmm, our students think the cosmetic
testing not justified, but they vote to
continue it anyhow.
Repeated Measures ANOVA
• In the traditional (“univariate”) approach,
subjects is treated as an additional
classification variable.
• A one-way RM ANOVA is really a two-way
ANOVA, with subjects being the second
factor.
• This analysis assumes sphericity.
• Suppose we have five levels of repeated
factor A.
• Find the standard error for the difference
between level j and level k.
• We assume that standard error is constant
across all (j, k) pairs.
• This assumption is frequently violated with
behavioral data.
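The quantity the sphericity assumption refers to can be computed directly: for every pair of levels, take each subject's difference score and find the standard error of the mean difference. The sketch below uses hypothetical scores and only three levels to stay short (the slide's example has five). The function name `se_of_differences` is illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import variance


def se_of_differences(scores):
    """Map each pair of levels (j, k) to the standard error of the mean difference.

    scores: one row per subject, one column per level of the repeated factor.
    """
    n = len(scores)
    n_levels = len(scores[0])
    result = {}
    for j, k in combinations(range(n_levels), 2):
        diffs = [row[j] - row[k] for row in scores]   # within-subject differences
        result[(j, k)] = sqrt(variance(diffs) / n)
    return result


# Hypothetical data: four subjects measured at three levels.
scores = [[1, 2, 4],
          [2, 4, 5],
          [3, 3, 9],
          [4, 7, 10]]

ses = se_of_differences(scores)
for pair, se in ses.items():
    print(pair, round(se, 3))
```

Under sphericity the population values of these standard errors are all equal. Sample values that differ widely suggest the assumption is doubtful.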
• There are procedures that correct for
violation of the assumption of sphericity.
• They reduce the degrees of freedom,
much as is done in the Welch ANOVA.
• Greenhouse-Geisser is the more
conservative procedure.
• Huynh-Feldt is the less conservative
procedure.
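As a rough sketch of how such a correction works, the Greenhouse-Geisser estimate of epsilon can be computed from the covariance matrix of the repeated measures, and both degrees of freedom are then multiplied by it. The covariance matrices below are hypothetical, and the function names `gg_epsilon` and `adjusted_df` are illustrative. The Huynh-Feldt adjustment is not shown.

```python
def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix.

    Epsilon is 1 when sphericity holds and can fall as low as 1 / (k - 1).
    """
    k = len(cov)
    row_means = [sum(r) / k for r in cov]
    grand = sum(row_means) / k
    # Double-center the (symmetric) covariance matrix.
    centered = [[cov[i][j] - row_means[i] - row_means[j] + grand
                 for j in range(k)] for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(v * v for r in centered for v in r)
    return trace ** 2 / ((k - 1) * sum_sq)


def adjusted_df(k, n, eps):
    """Numerator and denominator df for a one-way RM ANOVA, scaled by epsilon."""
    return (k - 1) * eps, (k - 1) * (n - 1) * eps


# Hypothetical covariance matrices for three levels of the repeated factor.
compound_symmetric = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]   # sphericity holds
unequal_variances = [[1, 0, 0], [0, 1, 0], [0, 0, 4]]    # sphericity violated

print(gg_epsilon(compound_symmetric))
print(gg_epsilon(unequal_variances))
print(adjusted_df(3, 10, gg_epsilon(unequal_variances)))
```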
The Multivariate Approach
• Suppose you have a one-way RM design
with five levels of the grouping variable, G.
• You treat the scores at any one level of G
as one variable, so you now have five
variables (G1 through G5), not two
variables (G and Y).
Orthogonal Contrasts
• Behind the scenes, your statistical
program creates a complete set of
orthogonal contrasts for the RM factor.
• It then tests the null that every one of
those contrasts has a mean of zero.
• If that null is rejected, you conclude the
RM factor has a significant effect.
• There is no sphericity assumption with
the multivariate-approach analysis.
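One way to sketch the test described above is with Hotelling's T-squared computed on the contrast scores. This is an assumption about what a program does internally, not something the slides specify. The data and the function names `solve` and `multivariate_rm_test` are hypothetical, and three levels are used instead of five to keep the example small.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multivariate_rm_test(scores, contrasts):
    """Return (F, df1, df2) for the null that every contrast has a mean of zero."""
    n, p = len(scores), len(contrasts)
    # Contrast scores: one new variable per contrast for every subject.
    d = [[sum(w * x for w, x in zip(c, row)) for c in contrasts] for row in scores]
    means = [sum(col) / n for col in zip(*d)]
    # Sample covariance matrix of the contrast scores.
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in d) / (n - 1)
            for j in range(p)] for i in range(p)]
    # Hotelling's T-squared: n * means' * inverse(cov) * means.
    t2 = n * sum(mi * si for mi, si in zip(means, solve(cov, means)))
    return (n - p) / (p * (n - 1)) * t2, p, n - p


# Hypothetical data: five subjects, three levels of the RM factor.
scores = [[3, 5, 6],
          [2, 4, 7],
          [4, 4, 8],
          [3, 6, 6],
          [5, 7, 9]]

helmert = [[1, -1, 0], [1, 1, -2]]      # a complete orthogonal set
successive = [[1, -1, 0], [0, 1, -1]]   # a non-orthogonal set spanning the same space

f_helmert, df1, df2 = multivariate_rm_test(scores, helmert)
f_successive, _, _ = multivariate_rm_test(scores, successive)
print(f_helmert, df1, df2)
print(f_successive)
```

Both contrast sets give the same F, because the statistic depends only on the space the contrasts span. Orthogonal sets are a convenient choice rather than a requirement of this particular test.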
Doubly Multivariate Analysis
• Suppose that you have a design with one
or more RM factor(s)
• And you also have multiple dependent
variables.
• If you take the multivariate approach to
analysis of the RM factor(s), then you
have a doubly multivariate analysis.
Effects of Cross-Species Rearing
• Wuensch (1992)
• Newborn Mus fostered onto Mus,
Peromyscus or Rattus.
• Tested in an apparatus where they could visit
four tunnels, which smelled like
– Clean pine shavings
– Mus
– Peromyscus
– Rattus
Mus musculus
Peromyscus maniculatus
Rattus norvegicus
The Design
• Dependent variables were
– Latency to first visit of each tunnel
– Number of visits to each tunnel
– Cumulative time spent in each tunnel
• Independent variables were
– Scent of tunnel (4 levels, within-subjects)
– Foster species (3 levels, between-subjects)
Doubly Multivariate Results
• There were significant effects of Foster
Species, Scent of Tunnel, and the
interaction.
• This was followed by univariate ANOVA,
Foster Species x Scent of Tunnel, on each
of the three dependent variables.
Results of the Univariate ANOVAs
• The interaction was significant for each
dependent variable.
• Conducted simple main effects analysis.
• Mus reared by Rattus had significantly
more visits to and cumulative time in the
rat-scented tunnel than did the other
groups, and shorter latencies as well.
• The other groups avoided the rat-scented
tunnel.
Cluster Analysis
• Goal is to cluster cases into groups based
on shared characteristics.
• Start out with each case being a one-case
cluster.
• The clusters are located in k-dimensional
space, where k is the number of variables.
• Compute the squared Euclidean distance
between each case and each other case.
Squared Euclidean Distance
$$\sum_{i=1}^{v} (X_i - Y_i)^2$$
• the sum across variables (from i = 1 to v)
of the squared difference between the
score on variable i for the one case (Xi)
and the score on variable i for the other
case (Yi)
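The distance defined above is a one-line computation. A minimal sketch, with a hypothetical function name and made-up scores:

```python
def squared_euclidean(x, y):
    """Sum across variables of the squared difference between two cases' scores."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Two hypothetical cases measured on v = 2 variables.
case_x = [1, 2]
case_y = [4, 6]
print(squared_euclidean(case_x, case_y))  # (1 - 4)^2 + (2 - 6)^2 = 25
```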
• The two cases closest to each other are
agglomerated into a cluster.
• The distances between entities (clusters
and cases) are recomputed.
• The two entities closest to each other are
agglomerated.
• This continues until all cases end up in
one cluster.
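The agglomeration steps above can be sketched as a short loop. The slides do not say how the distance between two multi-case clusters is defined. This sketch assumes average linkage (the mean squared Euclidean distance over all between-cluster pairs of cases), which is only one of several common choices. The data and function names are hypothetical.

```python
from itertools import combinations


def squared_euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def average_linkage(c1, c2, cases):
    """Mean squared distance over all pairs with one case in each cluster.

    Assumption: average linkage; the slides do not name a linkage rule.
    """
    total = sum(squared_euclidean(cases[i], cases[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(cases):
    """Return every cluster solution, from n one-case clusters down to one cluster."""
    clusters = [(i,) for i in range(len(cases))]   # each case starts alone
    history = [list(clusters)]
    while len(clusters) > 1:
        # Find the two closest entities and merge them.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], cases))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        history.append(list(clusters))
    return history


# Five hypothetical cases measured on two variables.
cases = [(1, 1), (1.5, 1), (5, 5), (5, 5.6), (9, 1)]
history = agglomerate(cases)
for solution in history:
    print(len(solution), "clusters:", solution)
```

With n cases, the k cluster solution is `history[n - k]`, so solutions at several levels of the analysis can be inspected side by side.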
What is the Correct Solution?
• You may have theoretical reasons to
expect a certain k cluster solution.
• Look at that solution and see if it matches
your expectations.
• Alternatively, you may try to make sense
out of solutions at two or more levels of
the analysis.
Faculty Salaries
• Subjects were faculty in Psychology at
• Variables were rank, experience, number
of publications, course load, and salary.
• The 2 cluster solution was adjuncts versus
everybody else.
• Adjuncts had lower rank, experience,
number of publications, course load, and
salary.
Three Cluster Solution
• Non-adjuncts were split into senior faculty
and junior faculty.
• Senior faculty had higher salary,
experience, rank, and number of pubs.
Four Cluster Solution
• The senior faculty were split into two
groups: The acting chair of the
department and all of the rest of the senior
faculty.
• The acting chair had a higher salary and
number of publications.
• Aziz & Zickar (2005)
• Workaholics may be defined as those
– High in work involvement,
– High in drive to work, and
– Low in work enjoyment.
• For each case, a score was obtained for
each of these three dimensions.
The Three Cluster Solution
• Workaholics
– High work involvement
– High drive to work
– Low work enjoyment
• Positively engaged workers (KLW)
– High work involvement
– Medium drive to work
– High work enjoyment
• Unengaged workers
– Low work involvement
– Low drive to work
– Low work enjoyment
• Past research/theory indicated there
should be six clusters, but the theorized
six clusters were not obtained.
