Research Designs and Internal Validity Notes

Internal Validity Threats

Validities Review: Research design is evaluated across four primary types of validity:
- Construct Validity: Evaluates variables and how well they are measured or manipulated.
- Internal Validity: Evaluates the design itself and the extent to which causal claims are justified.
- External Validity: Evaluates the sample and how well the results generalize to the population.
- Statistical Validity: Evaluates the data and the strength/significance of the statistical conclusions.
Types of Claims:
- Frequency Claims: Focus on how much or how many (e.g., percentages).
- Association Claims: Focus on covariance or differences between variables.
- Causal Claims: Focus on cause-and-effect and the concept of change.
Definition of Internal Validity: The extent to which one can assume that a causal relationship exists between variables, specifically that the Independent Variable(s) cause changes to the Dependent Variable(s).
The Breakfast Example: If a study finds that those who eat breakfast every day perform better on a midterm than those who do not, it cannot be immediately concluded that breakfast leads to better performance. This is because correlational designs leave open many threats to internal validity.
General Threats to Internal Validity: These are aspects of a study that leave open the possibility of an alternative explanation.
- In correlational designs, the relationship may be $A \rightarrow B$, $B \rightarrow A$, or $A$ and $B$ may both be driven by a third variable (a confound).
- Selection Threats: These occur when the groups being compared are formed from pre-existing or non-random criteria. The groups may then differ systematically in ways other than the key independent variable, which biases the comparison.
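The third-variable case can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the variable names, sample size, and noise levels are all assumptions chosen for illustration. A hidden variable `z` drives both `a` and `b`, and neither causes the other.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confound(n=5000, seed=0):
    """Return (raw_r, adjusted_r) for two variables that share a hidden cause."""
    rng = random.Random(seed)
    # Hypothetical third variable (e.g., a trait that affects both A and B).
    z = [rng.gauss(0, 1) for _ in range(n)]
    # A and B each depend on z plus independent noise; A never influences B.
    a = [zi + rng.gauss(0, 1) for zi in z]
    b = [zi + rng.gauss(0, 1) for zi in z]
    raw_r = pearson_r(a, b)
    # Subtract z's contribution (its coefficient is 1 by construction here).
    a_resid = [ai - zi for ai, zi in zip(a, z)]
    b_resid = [bi - zi for bi, zi in zip(b, z)]
    adjusted_r = pearson_r(a_resid, b_resid)
    return raw_r, adjusted_r


raw_r, adjusted_r = simulate_confound()
print(f"correlation of A and B: {raw_r:.2f}")
print(f"after removing the third variable: {adjusted_r:.2f}")
```

In this setup A and B correlate substantially even though neither affects the other, and the correlation largely disappears once the shared cause is removed. In a real study the third variable is often unmeasured, so this adjustment is not available.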
False Experiments: Any design that includes a treatment condition but fails to include a comparison group is not a true experiment.
- One-group posttest only design: Measuring only after treatment with no baseline or control.
- One-group pretest-posttest design: Measuring before and after treatment but without a separate comparison group.
- Comparison Groups: Necessary to compare participants who receive the IV with those who do not.
Specific Internal Validity Threats:
- History Threat: Other events occurring between the pretest and posttest that might explain the observed outcome.
- Maturation Threat: The participants themselves change (grow, heal, fatigue) naturally between the pretest and the posttest.
- Testing / Practice Threat: Participants improve simply because they have had prior exposure to the specific testing procedure or style.
- Instrumentation Threat: The measurement tool or the way it is used changes between the pretest and the posttest.
- Regression to the Mean: Extremely high or low pretest scores partly reflect chance, so they tend to move closer to the average at posttest even without any treatment.
- Attrition: Participants drop out of the study between the pretest and posttest, potentially leaving a biased sample remaining.
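Regression to the mean can be demonstrated with a short simulation. The sketch below is illustrative only: the score scale, noise level, and 10% cutoff are assumptions, and no treatment is applied at any point. Each simulated person has a stable true score, and each test adds independent measurement noise.

```python
import random


def simulate_regression_to_mean(n=10000, cutoff_fraction=0.1, seed=1):
    """Return (pretest_mean, posttest_mean) for the lowest pretest scorers."""
    rng = random.Random(seed)
    # Stable true score per person (hypothetical scale: mean 100, SD 10).
    true_score = [rng.gauss(100, 10) for _ in range(n)]
    # Each observed test score is the true score plus independent noise.
    pretest = [t + rng.gauss(0, 10) for t in true_score]
    posttest = [t + rng.gauss(0, 10) for t in true_score]
    # Select the people with the most extreme (lowest) pretest scores.
    ranked = sorted(range(n), key=lambda i: pretest[i])
    k = int(n * cutoff_fraction)
    lowest = ranked[:k]
    pre_mean = sum(pretest[i] for i in lowest) / k
    post_mean = sum(posttest[i] for i in lowest) / k
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"lowest scorers at pretest:  {pre_mean:.1f}")
print(f"same people at posttest:    {post_mean:.1f}")
```

The selected group scores noticeably closer to 100 at posttest even though nothing was done to them. A one-group pretest-posttest study that enrolls only extreme scorers could mistake this shift for a treatment effect.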

Experiments

Core Requirements of an Experiment:
- Random Assignment: The researcher randomly assigns participants to the various IV conditions.
- Experimental Control: The researcher ensures that the only factor differing between experimental groups is the independent variable (controlling extraneous variables).
- Experimental Manipulation: The researcher creates two or more experimental conditions to form comparison groups.
Case Study: Practice Exams (Balch, 1998):
- Hypothesis: Students taking a practice exam will score higher on a final exam than those who do not.
- Theoretical Premise: Opportunities to accurately assess knowledge lead to better academic performance.
- Methodology: Participants were $134$ volunteers from an introductory psychology course ($n = 168$ students total) taught at Pennsylvania State University, Altoona, in Fall $1996$.
- Results: The practice-exam group scored significantly higher on a final exam one week later compared to the review-exam group.
Independent Variable (IV) Manipulation: The researcher creates different levels of the IV, known as experimental conditions or experimental groups.
- Example 1: IV = Psychotherapy (2 levels: Treatment vs. Control).
- Example 2: IV = Length of therapy (3 levels: $1$ month, $6$ months, $12$ months).
- Example 3: IV = Type of therapy (4 levels: Therapy A, B, C, and D).
Types of Control Conditions:
- No-treatment control: Receives no intervention at all.
- Placebo control: Receives a simulated treatment lacking the active elements of the IV.
- Treatment-as-usual (TAU) control: Receives a standard or alternative treatment instead of the specific one being tested.
Rationale for Active Controls: Active control conditions help mitigate:
- Placebo effects: Improvements based on participant expectations.
- Reactivity: Changes in behavior because participants know they are being observed (e.g., being shy or attention-seeking).
- Demand Characteristics: Biased responses that arise when participants guess the study's goals and adjust their behavior accordingly.
Random Assignment Approaches:
- Simple Random Assignment: Using a random process to assign a large number of participants.
- Block Randomization: Participants are assigned in blocks, with each block containing every condition once in a random order (e.g., $ABC$, then $CAB$), to ensure equal sample sizes in each condition.
- Matched-groups Design: Participants are matched on specific important traits (like current GPA) and then randomly assigned. This is useful for small samples or when strong confounds exist.
- Example (Balch, 1998): Students were ranked by grade. Adjacent ranks (Rank $1$ and $2$, etc.) were paired and then randomly assigned to ensure the groups' grade baselines were similar.
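The block and matched-pairs procedures above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure used in any particular study: the function names, participant IDs, and grade values are hypothetical, and the matched-pairs function assumes exactly two conditions.

```python
import random


def block_randomize(participants, conditions, rng):
    """Assign participants block by block; each block holds every condition once."""
    assignment = {}
    block = []
    for person in participants:
        if not block:
            # Start a new block: all conditions, in a freshly shuffled order.
            block = list(conditions)
            rng.shuffle(block)
        assignment[person] = block.pop()
    return assignment


def matched_pairs_assign(scores, conditions, rng):
    """Rank by the matching variable, pair adjacent ranks, randomize within pairs.

    `scores` maps participant -> matching variable (e.g., current grade).
    With an odd number of participants, the lowest-ranked one is left unassigned.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    for start in range(0, len(ranked) - 1, 2):
        pair = ranked[start:start + 2]
        shuffled = list(conditions)
        rng.shuffle(shuffled)
        for person, condition in zip(pair, shuffled):
            assignment[person] = condition
    return assignment


rng = random.Random(42)  # fixed seed so the example is reproducible

people = ["P1", "P2", "P3", "P4", "P5", "P6"]
print(block_randomize(people, ["A", "B", "C"], rng))

# Hypothetical grades used only to show the matching step.
grades = {"P1": 95, "P2": 91, "P3": 84, "P4": 80, "P5": 72, "P6": 70}
print(matched_pairs_assign(grades, ["practice", "review"], rng))
```

Block randomization guarantees equal group sizes whenever the number of participants is a multiple of the number of conditions. Matched-pairs assignment guarantees that each pair of similar participants is split across the two conditions, which keeps the groups' baselines close on the matching variable.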
Design Confounds:
- Confounded Constructs: Unintentionally manipulating more than just the IV.
- Observer Biases / Expectancy Effects: Researcher expectations differ across levels of the IV and can influence how participants are treated or how their responses are recorded.
- Solution: Careful construction of comparison groups and strict controls to isolate causal mechanisms.

Basic Experimental Designs

Between-groups (Between-subjects) Design: Each participant receives only one level of the independent variable.
- Randomized Posttest-only Design: Basic experiment where participants are randomized, then the DV is measured once after the IV manipulation.
- Randomized Pretest-posttest Design: Participants are measured on the DV before and after the IV manipulation.
- Limits: Requires large sample sizes for statistical power, carries risks of selection effects in small samples, and may face ethical hurdles regarding condition assignment.
Within-groups (Within-subjects) Design: Each participant receives multiple or all levels of the independent variable.
- Concurrent Measures Design: Participant is exposed to all levels of the IV at the same time.
- Repeated Measures Design: Participant is exposed to levels of the IV sequentially.
- Benefits: Requires a smaller sample size (participants serve as their own comparison), more ethical (everyone gets treated), and eliminates selection threats.
Limits of Within-groups Designs:
- Order Effect: Internal validity threat where the order of conditions affects responses.
- Carry Over Effect: Exposure to one level of the IV carries over and affects performance in subsequent levels.
- Practice Effect: People improve due to repeated measures.
- Fatigue Effect: People get tired or bored with repeated measurements.
- Calibration/Instrumentation errors: Measurement tools or observer accuracy may shift over time.
Counterbalancing (Reducing Order Effects):
- Full Counterbalancing: All possible order conditions are used with equal participants per order.
  - $2$ levels = $2$ orders.
  - $3$ levels = $6$ orders.
  - $4$ levels = $24$ orders.
  - $5$ levels = $120$ orders.
  - $6$ levels = $720$ orders.
- Latin Square: Uses only as many orders as there are levels, arranged so that each level appears in each ordinal position exactly once. In a $3 \times 3$ square, Level A appears 1st, 2nd, and 3rd across the three orders.
- Random Counterbalancing: Used when there are too many IV levels; levels are presented in a completely random or randomly selected order.
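The order counts above follow from the factorial ($k$ levels give $k!$ possible orders), which is why full counterbalancing becomes impractical quickly. The sketch below generates both kinds of order sets. The function names are hypothetical, and the Latin square shown is the simple cyclic (rotation) construction, one of several possible constructions.

```python
from itertools import permutations
from math import factorial


def full_counterbalance(levels):
    """Every possible presentation order of the IV levels."""
    return list(permutations(levels))


def latin_square(levels):
    """Cyclic Latin square: row i is the level list rotated left by i positions.

    Each level appears exactly once in each ordinal position across the rows.
    """
    n = len(levels)
    return [tuple(levels[(row + pos) % n] for pos in range(n)) for row in range(n)]


levels = ["A", "B", "C"]

orders = full_counterbalance(levels)
print(len(orders), factorial(len(levels)))  # 6 6

for order in latin_square(levels):
    print(order)
# ('A', 'B', 'C')
# ('B', 'C', 'A')
# ('C', 'A', 'B')
```

A cyclic square balances position only: each level appears once in each position, but each level is always followed by the same other level (for example, B always follows A unless A is last), so it does not balance which level precedes which.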

Quasi-experimental Design

Definition: A design in which the researcher tries to establish the criteria for causality (temporal precedence, covariance, internal validity) without full experimental control, typically without random assignment, because a true experiment is not feasible due to ethics, logistics, or generalizability goals.
Weaknesses of Non-Experimental Approaches:
- Self-reports: Limited by poor memory, illusory correlations (perceiving non-existent patterns), and history threats.
- One-group designs: No way to assess what would have happened without the treatment.
Techniques for Quasi-Experiments:
- Pretest measures: Adding these to test for selection threats when random assignment is not possible.
- Equating/Matching:
  - Randomized Matched-groups: Matching participants on traits, then randomly assigning ($\text{Matching} \rightarrow \text{Treatment}$).
  - Non-equivalent Matched-groups: Matching treatment participants with similar control participants after the fact ($\text{Treatment} \rightarrow \text{Matching}$).
- Switching Replication: Adding and removing the IV levels over time so participants serve as their own control.
Establishing Temporal Precedence:
- Cross-sectional studies: Comparing pre-existing groups (e.g., different ages) at one time point.
- Trend analysis: Examining patterns over time, though not always with the same participants.
- Longitudinal designs: Examining the same variables in the same group of participants over time.
- Time Series Analysis: Examining trends immediately before and after a naturalistic event or planned intervention.
Multiple Time Series Design: Combines control groups with trend observation. This design helps control for threats such as history, maturation, testing, and instrumentation. It involves a baseline period, a treatment period, and post-treatment observations for both a treatment group and a control group.
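The logic of comparing two groups' trends can be sketched with a simple calculation: take each group's change from baseline to post-treatment, then subtract the control group's change from the treatment group's change. This is a deliberately simplified illustration, not a full time series analysis. The function names are hypothetical, the scores are invented purely for the example, and the treatment period itself is omitted for brevity.

```python
from statistics import mean


def change_from_baseline(baseline, post):
    """Average post-treatment score minus average baseline score."""
    return mean(post) - mean(baseline)


def treatment_effect_estimate(treat_base, treat_post, ctrl_base, ctrl_post):
    """Treatment group's change minus the control group's change.

    Change shared by both groups (e.g., history, maturation, testing)
    cancels out of the difference.
    """
    return (change_from_baseline(treat_base, treat_post)
            - change_from_baseline(ctrl_base, ctrl_post))


# Invented weekly scores, for illustration only.
treatment_baseline = [50, 51, 49, 50]
treatment_post = [58, 59, 57, 58]
control_baseline = [50, 49, 51, 50]
control_post = [52, 53, 51, 52]

effect = treatment_effect_estimate(
    treatment_baseline, treatment_post, control_baseline, control_post
)
print(effect)  # 6
```

Here the treatment group rises by 8 points and the control group by 2, so 6 points of the change are attributed to the treatment. The 2-point rise in the control group is the portion that a one-group design would have wrongly credited to the treatment.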