Repeated Measures Designs and Analyses
Between vs. Within Subjects Designs
Between (Independent) Subjects Designs:
Typical experimental procedure where each level of the independent variable is assigned to different groups of people.
Comparisons are made between different groups.
Within Subjects Designs:
The independent variable involves multiple assessments of the same group under different conditions.
Statistical power is increased, so fewer subjects are required.
Other independent variables may be between subjects, creating Mixed Designs.
Basic Between-Subjects Design
Two separate groups: Observed (naturally occurring) or randomly assigned to be equivalent.
Independent Variable: One group receives the experimental condition/treatment, while the other does not.
Dependent Variable(s): Measured in both groups.
Hypothesis is tested via differences between groups.
Statistical power is lower, requiring a large number of subjects.
Reminder About Between Groups ANOVA
Between groups ANOVA culminates in a ratio comparing two mathematically distinct estimates of the population variance:
Variance estimated based on differences between the means of the groups.
Composed of chance variation plus the effects of the treatment (if one exists).
Variance estimated based on the average of the variances within each of the groups.
F-ratios are generally expressed as:
\frac{\text{Systematic Variation} + \text{Unsystematic Error}}{\text{Unsystematic Error}}
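This ratio can be computed directly from raw data. A minimal NumPy sketch for the between-groups case, using an invented three-group example (the function name and data are illustrative, not from the study discussed below):

```python
import numpy as np

def one_way_anova_f(groups):
    """F-ratio for a one-way between-groups ANOVA.

    `groups` is a list of 1-D score arrays, one per group. The numerator
    (MS between) reflects treatment effects plus chance variation; the
    denominator (MS within) reflects chance variation alone.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    k = len(groups)
    n_total = all_scores.size

    # Between-groups SS: weighted squared deviations of group means
    # from the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared deviations of scores from their own
    # group mean, pooled across groups.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# For group means of 2, 3, and 5 with equal spread, F works out to 7.0:
f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
```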
Within Subjects Designs
All participants experience the Control Condition and measurement.
All participants then experience the experimental intervention and measurement.
Hypothesis tested by differences between conditions (Observation 1 vs. Observation 2) within the group.
Statistical power is increased, requiring fewer subjects.
Logic of Repeated Measures ANOVA
Repeated measures designs have greater sensitivity to differences due to treatment.
A major source of unsystematic error variance is eliminated by using the same people in each condition.
In a between-subjects analysis, the goal is to see whether the treatment accounts for a substantial amount of the total variance relative to error variation.
Between-groups variance component includes a great deal of unsystematic error.
Trying to detect a signal through a lot of noise.
Each participant acts as his or her own experimental control across conditions.
Observe more directly the systematic effects of the independent variable.
Because individual differences are stable across conditions, they are systematic and can be partialed out from the unsystematic error in the analysis.
This variance will be removed from the error term in the analysis.
This will make any systematic variation easier to detect.
There is less noise to contend with while looking for a signal.
The analysis must be re-structured quite a bit.
New variance component:
Sums of Square Within Participant, SSW: This represents the variability of each individual across experimental conditions.
The degrees of freedom associated with this sum of squares are n(k − 1), where n is the number of participants and k is the number of conditions.
SSW = \sum_{p=1}^{n} \sum_{i=1}^{k} (X_{pi} - \bar{X}_p)^2, where \bar{X}_p is participant p's mean across conditions.
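SSW can be computed directly from a participants-by-conditions score matrix. A minimal NumPy sketch with an invented 3-participant, 4-condition data set (all values hypothetical):

```python
import numpy as np

# Hypothetical scores: rows = participants (n = 3), columns = conditions (k = 4).
scores = np.array([
    [4.0, 7.0, 6.0, 7.0],
    [3.0, 8.0, 7.0, 8.0],
    [5.0, 7.0, 8.0, 8.0],
])

# Each participant's own mean across the k conditions.
participant_means = scores.mean(axis=1, keepdims=True)

# SSW: squared deviations of each score from that participant's mean,
# summed over all participants and conditions.
ss_within_participant = ((scores - participant_means) ** 2).sum()

# Degrees of freedom: n * (k - 1).
n, k = scores.shape
df_within = n * (k - 1)
```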
Sums of Square Within Participant (i.e., the extent to which individuals tend to vary across conditions) is a function of both systematic and unsystematic factors.
'Between group' differences now measure:
Systematic differences within people across conditions.
Because the same people are compared across every condition, SSM is now relatively free of unsystematic error.
If we partial the sums of squares for our IV from the sum of squares within participants, we have a new measure of the unsystematic (i.e., error) variance.
The error term for within-subjects comparisons will now be:
SSR = SSW − SSM
Degrees of freedom will be equal to the degrees of freedom for our SSW minus the degrees of freedom for our model.
Summary of basic steps for a Repeated Measures ANOVA
Calculate the sums of squares within participant (SSW).
Reflects both systematic variation due to treatment and unsystematic differences across individuals.
Calculate our 'between groups' or, more accurately, between condition sums of squares (SSM) like before.
Subtract SSM from SSW to get our new error term SSR.
Generate mean squares for our model and error terms and compute our F-ratio.
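The steps above can be sketched end to end. A minimal NumPy implementation, again on a small invented 3-participant, 4-condition data set (the function name and numbers are illustrative):

```python
import numpy as np

def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA on an n-by-k score matrix
    (rows = participants, columns = conditions), following the steps:
    SSW -> SSM -> SSR = SSW - SSM -> mean squares -> F.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand_mean = scores.mean()

    # Step 1: within-participant sum of squares.
    participant_means = scores.mean(axis=1, keepdims=True)
    ss_w = ((scores - participant_means) ** 2).sum()

    # Step 2: between-condition (model) sum of squares.
    condition_means = scores.mean(axis=0)
    ss_m = n * ((condition_means - grand_mean) ** 2).sum()

    # Step 3: residual (error) sum of squares.
    ss_r = ss_w - ss_m

    # Step 4: mean squares and the F-ratio.
    df_m = k - 1
    df_r = (n - 1) * (k - 1)
    ms_m = ss_m / df_m
    ms_r = ss_r / df_r
    return ms_m / ms_r, df_m, df_r

f, df_m, df_r = repeated_measures_anova([[4, 7, 6, 7],
                                         [3, 8, 7, 8],
                                         [5, 7, 8, 8]])
```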
Repeated measures example: Attribution of claimant need and aid allocations
Skitka & Tetlock, 1992
People use information about why a person is in need to decide whether they are deserving of allocations of scarce resources.
Ninety-seven participants each read information about several claimants that fell into four categories of reason of need.
Categories of Reason for Need:
Internal controllable – Despite his doctor’s repeated warnings about the damaging effects on his health and the probability of severe organ damage, this person continued to eat high cholesterol foods, smoke, and not exercise. As a result he now has a severe organ failure.
Internal uncontrollable – Genetically defective organ
External controllable – Employer knowingly exposed patient to a chemical
External uncontrollable – Medication patient took for a time had an unknown side effect that caused organ failure
After reading each scenario, participants rated the deservingness of each claimant for an organ transplant on a 1 (not at all deserving) to 7 (extremely deserving) scale.
Based on Skitka and Tetlock’s theory, we might expect that:
Internal controllable claimants will be seen as less deserving than claimants with other sources of need.
External uncontrollable claimants will be seen as most deserving.
Descriptive Statistics: Deservingness Ratings by Cause of Need
| Cause of Need | Mean | Std. Deviation | N |
|---|---|---|---|
| Internal Controllable | 3.8041 | 1.51263 | 97 |
| Internal Uncontrollable | 7.5756 | 1.40318 | 97 |
| External Controllable | 7.3325 | 1.98816 | 97 |
| External Uncontrollable | 7.8634 | 1.23745 | 97 |
Mauchly's Test of Sphericity
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
Tests of Within-Subjects Effects
Presents Type III Sum of Squares, degrees of freedom, Mean Square, F statistic, significance, and Partial Eta Squared for CauseofNeed under different Sphericity assumptions.
Contrasts in Repeated Measures Designs
Compute MSComp and evaluate it relative to the error term for our omnibus test.
Note: there is some disagreement over whether we should do this.
The outcomes of our comparisons can be very sensitive to violations of sphericity assumptions.
Sphericity is similar to homogeneity of variance assumptions in between subjects designs.
Field: ‘Refers to the equality of variances of the differences between treatment levels.’
If sphericity holds, the variability across people is relatively uniform across the levels of the IV.
If sphericity is violated, the omnibus error term may be too liberal for some comparisons and too conservative for others.
With this said, many researchers still use the omnibus error term for doing their planned comparisons.
Partially out of habit.
Also, at least theoretically, the omnibus error term should be more robust.
Sometimes it will be liberal or conservative for a particular comparison.
In the long run, it should yield an overall type I error rate of .05.
A quick note about Post Hoc tests in repeated-measures designs
Even minor problems with sphericity can cause major problems for post hoc tests.
Though variations on most post hoc tests can be conducted, Bonferroni-type corrections are generally considered safest.
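As a sketch of the Bonferroni approach, the following pure-Python helper runs all pairwise paired-sample t statistics and adjusts the per-comparison alpha (function name and data are hypothetical; a real analysis would also look up p-values against the t distribution with n − 1 df):

```python
from itertools import combinations
from math import sqrt

def bonferroni_paired_comparisons(conditions, alpha=0.05):
    """Pairwise paired-sample t statistics between conditions, plus the
    Bonferroni-adjusted per-comparison alpha.

    `conditions` maps a condition label to that condition's list of
    scores; lists must be aligned by participant.
    """
    pairs = list(combinations(conditions, 2))
    adjusted_alpha = alpha / len(pairs)  # Bonferroni correction
    results = {}
    for a, b in pairs:
        diffs = [x - y for x, y in zip(conditions[a], conditions[b])]
        n = len(diffs)
        mean_d = sum(diffs) / n
        var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
        # Paired t: mean difference over its standard error.
        results[(a, b)] = mean_d / sqrt(var_d / n)
    return results, adjusted_alpha

# Invented 3-participant example with three conditions:
results, adjusted = bonferroni_paired_comparisons(
    {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 5, 8]})
```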
Pairwise Comparisons of Cause of Need
Shows mean difference, standard error, significance.
Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
Effect size for repeated measures
In repeated measures designs, it is more common to present partial eta squared as your measure of effect size.
Its calculation is very simple:
\eta^2_{partial} = \frac{SS_{Effect}}{SS_{Effect} + SS_{Error}}
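As a one-line helper (the function name is illustrative; SS values come straight from the ANOVA table):

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

# e.g., an effect SS of 30 against an error SS of 10 gives 0.75.
eta_p2 = partial_eta_squared(30.0, 10.0)
```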
One last note about repeated measures…
You must make certain that the design of your study rules out carry-over effects.
Very often, you can test for the presence of these effects by manipulating your key variable within subjects, but counterbalancing the order in which participants complete tasks between subjects (i.e., mixed design).