Advanced Experiments: Research Methods in Psychology - Notes
Threats to Internal Validity
A number of factors can be confounded with your Independent Variable (IV). Ideally, the IV should be the only difference between your groups.
Any other difference might serve as an alternate explanation for your results.
Regression to the Mean
On average, things are average. Extreme scores tend to be followed by scores closer to the mean.
Consider whether any observed changes can be explained by regression to the mean.
Example: If you select participants based on extreme scores, their scores will likely be less extreme upon retesting.
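The example above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the score model (true ability plus independent noise on each test), the means and standard deviations, and the function name simulate_retest are all hypothetical choices made for illustration, not part of the notes. No treatment is applied between the two tests, so any change in the selected group is regression to the mean.

```python
import random
import statistics

def simulate_retest(n=10_000, top_fraction=0.10, seed=0):
    """Select the top scorers on test 1; compare their means on test 1 and test 2."""
    rng = random.Random(seed)
    # ASSUMPTION: observed score = stable true ability + independent measurement noise.
    ability = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]

    # Keep only participants with the most extreme (highest) test-1 scores.
    cutoff = sorted(test1)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if test1[i] >= cutoff]

    return (
        statistics.mean(test1[i] for i in selected),  # selected group, test 1
        statistics.mean(test2[i] for i in selected),  # selected group, retest
        statistics.mean(test2),                       # everyone, retest
    )

first, retest, overall = simulate_retest()
print(f"selected group, test 1: {first:.1f}")
print(f"selected group, test 2: {retest:.1f}")
print(f"everyone, test 2:       {overall:.1f}")
```

The selected group still scores above average on retest, but less extremely than on the test used to select them. A comparison group selected the same way would show the same drop, which is how it can be separated from a treatment effect.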
Maturation
As time goes by, things change naturally.
Need to have an appropriate comparison group to account for these changes.
Have you ruled out placebo effects? Placebo effects are changes in outcomes due to the expectation of an effect, rather than the treatment itself.
Task Demands
The structure of the experimental task can telegraph the researchers’ intent.
Can your participants figure out your hypothesis?
How to avoid:
A plausible cover story (as in Durgin et al.) can reduce suspicion.
Blinding participants can make manipulations less apparent.
History (Third Variables)
Are there new differences that have emerged between your conditions that are not a result of the IV?
Is your manipulation changing something other than the IV?
History refers to events that occur during the course of the study that could affect the outcome.
Selection & Attrition
No one can be compelled to cooperate in research.
Consider who is choosing to sign up for and who drops out of your study. This can introduce bias.
Testing & Instrumentation
Measures (& their effectiveness) can change over time.
Testing: We get better at a test the more times we take it. Do participants in different conditions have equal opportunities to practice any tests?
Instrumentation: Are participants taking the same test? (More of an issue with longer time-frames.) Changes in the measurement instrument or procedure can affect results.
Between-Subjects vs. Within-Subjects Designs
Between-subjects: Different participants are assigned to different conditions.
Within-subjects: The same participants are used in all conditions.
Within-subjects studies typically have much greater statistical power, because each participant serves as their own control, but they require more planning to avoid order effects (such as fatigue or practice).
Statistical power is the probability of detecting a real effect if one exists.
[Figure: Mean A compared with Mean B; comparison of the mean difference]
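A rough sketch of why the within-subjects comparison tends to be more powerful. The scores are simulated, and the baseline spread, the 3-point effect, and the function names are hypothetical. For illustration only, the same scores are analyzed twice: once as if they came from two separate groups (Welch's t, comparing Mean A with Mean B) and once as paired scores (comparing the mean within-person difference with zero).

```python
import math
import random
import statistics

def independent_t(a, b):
    """Welch's t statistic, treating a and b as two separate groups."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

def paired_t(a, b):
    """Paired t statistic: mean within-person difference divided by its standard error."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

rng = random.Random(1)
n = 30
# ASSUMPTION: people differ a lot from each other (SD 10) but are
# fairly consistent with themselves across conditions (SD 2).
baseline = [rng.gauss(50, 10) for _ in range(n)]
cond_a = [p + rng.gauss(0, 2) for p in baseline]
cond_b = [p + 3 + rng.gauss(0, 2) for p in baseline]  # hypothetical true effect: +3

print(f"t, analyzed as separate groups: {independent_t(cond_a, cond_b):.2f}")
print(f"t, analyzed as paired scores:   {paired_t(cond_a, cond_b):.2f}")
```

The mean difference is identical in both analyses. Only the error term changes: the paired analysis removes the stable differences between people, so the same effect stands out more clearly.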
Increasing Statistical Power
Statistical power is the ability to find real effects in a given study.
How can it be increased?
Conduct a within-subjects study (w/ appropriate tests).
Conduct a pretest/posttest study.
Larger effects are easier to find, so strengthening the manipulation can (potentially) help.
Increase sample size.
Provide room to vary in your DV measure (get off the floor/ceiling). Ensure your dependent variable has sufficient range to capture the full effect.
Control more random variables (reduce noise). Minimize extraneous factors that could obscure the true effect.
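Several of the points above (sample size, effect size, noise) can be seen in a quick power calculation. This is a minimal sketch, assuming a two-sided comparison of two independent group means and a normal (z-test) approximation. The function name approx_power and the example values of d and n are illustrative assumptions, and a t-based calculation would give slightly lower values in small samples.

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    d is the standardized effect size: the mean difference divided by the
    within-group standard deviation. Normal approximation only.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (20, 50, 100):
    for d in (0.3, 0.5):
        print(f"n per group = {n:3d}, d = {d}: power ~ {approx_power(d, n):.2f}")
```

Because d is the mean difference divided by the standard deviation, a stronger manipulation (bigger difference) and better control of extraneous variables (smaller standard deviation) both raise d, and so raise power, without adding participants.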
Interactions & Factorial Designs
We can combine different IVs (different factors) in one study to see the effect they have on a DV both separately and together.
a x b x c (x …), where a = number of levels of IV #1, b = number of levels of IV #2, c = number of levels of IV #3, and so on.
Sample Study: Anonymity, Grammar, and Aggression
“How do anonymity and grammar affect people’s aggressive behavior online?”
Factor 1: Anonymity (Anonymous or not)
Factor 2: Grammar (Good or poor)
DV: Aggression
2 levels of 2 different factors = 2 x 2 experiment
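The cells of a factorial design are just every combination of one level per factor. A small sketch that enumerates them for the sample study follows. The helper name design_cells and the dictionary layout are assumptions for illustration.

```python
import itertools

def design_cells(factors):
    """Return every combination of one level per factor (the cells of the design)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]

# Factors and levels from the sample study.
study = {
    "anonymity": ["anonymous", "real name"],
    "grammar": ["good", "poor"],
}

cells = design_cells(study)
label = " x ".join(str(len(levels)) for levels in study.values())
print(f"{label} design with {len(cells)} cells:")
for cell in cells:
    print(" ", cell)
```

Adding a (hypothetical) third factor with three levels would make this a 2 x 2 x 3 design with 12 cells, which is why the number of conditions grows quickly as factors are added.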
Study Design
Rate the spelling/grammar of different online comments (within subjects).
e.g., “Hay check out ths supper cool video i seen!” or “I just got back from the most amazing concert!” (How many mistakes were there?)
Counterbalanced (between subjects): “Hey, check out this super cool video I just saw!” or “i just got bak frm the most amaze concrt!”
Provide opportunity to post to commenters’ walls either anonymously or with real name (between subjects).
Measure aggression through different feedback options. Example: "poop" feedback versus no response.
Analyzing Results
What are the (main) effects of:
Grammar?
Anonymity?
Do these effects depend on each other? (Interaction)
“Simple effects” refer to the effect of one IV at a specific level of another IV.
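To make the three questions concrete, here is a sketch that computes main effects, simple effects, and an interaction contrast from a table of cell means. The numbers are HYPOTHETICAL, invented purely for illustration, and are not results from the study. The calculations are descriptive only; deciding whether an effect is reliable would still need an inferential test.

```python
# HYPOTHETICAL cell means (proportion of "poop" feedback); NOT data from the study.
cell_means = {
    ("anonymous", "good"): 0.10,
    ("anonymous", "poor"): 0.40,
    ("real name", "good"): 0.05,
    ("real name", "poor"): 0.15,
}

def marginal_mean(factor_index, level):
    """Average the cell means over the other factor (0 = anonymity, 1 = grammar)."""
    values = [m for cell, m in cell_means.items() if cell[factor_index] == level]
    return sum(values) / len(values)

# Main effects: difference between marginal means, ignoring the other factor.
main_anonymity = marginal_mean(0, "anonymous") - marginal_mean(0, "real name")
main_grammar = marginal_mean(1, "poor") - marginal_mean(1, "good")

# Simple effects: the effect of grammar at each level of anonymity.
grammar_if_anonymous = cell_means[("anonymous", "poor")] - cell_means[("anonymous", "good")]
grammar_if_real_name = cell_means[("real name", "poor")] - cell_means[("real name", "good")]

# Interaction: do the simple effects differ? (A difference of differences.)
interaction = grammar_if_anonymous - grammar_if_real_name

print(f"main effect of anonymity:        {main_anonymity:+.2f}")
print(f"main effect of grammar:          {main_grammar:+.2f}")
print(f"grammar effect when anonymous:   {grammar_if_anonymous:+.2f}")
print(f"grammar effect with real name:   {grammar_if_real_name:+.2f}")
print(f"interaction (difference of the two): {interaction:+.2f}")
```

If the two simple effects were equal, the interaction contrast would be zero, which corresponds to parallel lines when the cell means are plotted.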
Graphical Representation
A graph displaying the proportion of "poop" feedback left, with grammar on the x-axis (Good Grammar vs. Poor Grammar) and separate lines for Anonymous and Real Name conditions, can help visualize main effects and interactions.
Looking at the graph:
Main Effect of Grammar: Is there a difference in "poop" feedback between good and poor grammar, regardless of anonymity?
Main Effect of Anonymity: Is there a difference in "poop" feedback between anonymous and real name conditions, regardless of grammar?
Interaction: Does the effect of grammar on "poop" feedback depend on whether the comment is anonymous or not? If the lines are not parallel, there is likely an interaction.