Research Designs and Internal Validity Notes

Internal Validity Threats

  • Validities Review: Research design is evaluated across four primary types of validity:

    • Construct Validity: Evaluates variables and how well they are measured or manipulated.

    • Internal Validity: Evaluates the design itself and the extent to which causal claims are justified.

    • External Validity: Evaluates the sample and how well the results generalize to the population.

    • Statistical Validity: Evaluates the data and the strength/significance of the statistical conclusions.

  • Levels of Causality (Types of Claims):

    • Frequency Claims: Focus on how much or how many (e.g., percentages).

    • Association Claims: Focus on covariance or differences between variables.

    • Causal Claims: Focus on cause-and-effect and the concept of change.

  • Definition of Internal Validity: The extent to which one can assume that a causal relationship exists between variables, specifically that the Independent Variable(s) cause changes to the Dependent Variable(s).

  • The Breakfast Example: If a study finds that those who eat breakfast every day perform better on a midterm than those who do not, it cannot be immediately concluded that breakfast leads to better performance. This is because correlational designs leave open many threats to internal validity.

  • General Threats to Internal Validity: These are aspects of a study that leave open the possibility of an alternative explanation.

    • In correlational designs, the relationship may be A → B, B → A, or A and B may be confounded by a third variable.

    • Selection Threats: This occurs when comparing groups based on pre-existing or non-random criteria. The groups may differ systematically in ways other than the key independent variable, leading to bias.

  • False Experiments: Any design that includes a treatment condition but fails to include a comparison group is not a true experiment.

    • One-group posttest only design: Measuring only after treatment with no baseline or control.

    • One-group pretest-posttest design: Measuring before and after treatment but without a separate comparison group.

    • Comparison Groups: Necessary to compare participants who receive the IV with those who do not.

  • Specific Internal Validity Threats:

    • History Threat: Other events occurring between the pretest and posttest that might explain the observed outcome.

    • Maturation Threat: The participants themselves change (grow, heal, fatigue) naturally between the pretest and the posttest.

    • Testing / Practice Threat: Participants improve simply because they have had prior exposure to the specific testing procedure or style.

    • Instrumentation Threat: The measurement tool or the way it is used changes between the pretest and the posttest.

    • Regression to the Mean: Extremely high or low scores during the pretest may simply be statistical outliers that stabilize toward the average during the posttest.

    • Attrition: Participants drop out of the study between the pretest and posttest, potentially leaving a biased sample remaining.
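
Regression to the mean can be seen in a small simulation. The sketch below is illustrative only: it assumes each score is a stable "true ability" plus independent measurement noise, and the function name, score scale, and parameters are hypothetical rather than taken from any study in these notes. No treatment is applied, yet the group selected for extreme pretest scores still moves toward the average at posttest.

```python
import random
import statistics

def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=0):
    """Simulate pretest/posttest scores with NO treatment effect (assumed model)."""
    rng = random.Random(seed)
    # Assumption: score = stable true ability + independent measurement noise.
    ability = [rng.gauss(100, 10) for _ in range(n)]
    pretest = [a + rng.gauss(0, 10) for a in ability]
    posttest = [a + rng.gauss(0, 10) for a in ability]

    # Select the participants with the most extreme (highest) pretest scores.
    k = int(n * top_fraction)
    selected = sorted(range(n), key=lambda i: pretest[i], reverse=True)[:k]

    pre_mean = statistics.mean(pretest[i] for i in selected)
    post_mean = statistics.mean(posttest[i] for i in selected)
    return pre_mean, post_mean

pre_mean, post_mean = simulate_regression_to_mean()
print(f"Selected group pretest mean:  {pre_mean:.1f}")
print(f"Selected group posttest mean: {post_mean:.1f}")
```

Under these assumptions the selected group's posttest mean falls between its pretest mean and the overall average of 100, which is why a one-group pretest-posttest study of extreme scorers can mistake this drift for a treatment effect.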

Experiments

  • Core Requirements of an Experiment:

    • Random Assignment: The researcher randomly assigns participants to the various IV conditions.

    • Experimental Control: The researcher ensures that the only factor differing between experimental groups is the independent variable (controlling extraneous variables).

    • Experimental Manipulation: The researcher creates two or more experimental conditions to form comparison groups.

  • Case Study: Practice Exams (Balch, 1998):

    • Hypothesis: Students taking a practice exam will score higher on a final exam than those who do not.

    • Theoretical Premise: Opportunities to accurately assess knowledge lead to better academic performance.

    • Methodology: 134 volunteers from an introductory psychology course (n = 168 students total) taught at Pennsylvania State University, Altoona, in Fall 1996.

    • Results: The practice-exam group scored significantly higher on a final exam one week later compared to the review-exam group.

  • Independent Variable (IV) Manipulation: The researcher creates different levels of the IV, known as experimental conditions or experimental groups.

    • Example 1: IV = Psychotherapy (2 levels: Treatment vs. Control).

    • Example 2: IV = Length of therapy (3 levels: 1 month, 6 months, 12 months).

    • Example 3: IV = Type of therapy (4 levels: Therapy A, B, C, and D).

  • Types of Control Conditions:

    • No-treatment control: Receives no intervention at all.

    • Placebo control: Receives a simulated treatment lacking the active elements of the IV.

    • Treatment-as-usual (TAU) control: Receives a standard or alternative treatment instead of the specific one being tested.

  • Rationale for Active Controls: Used to mitigate:

    • Placebo effects: Improvements based on participant expectations.

    • Reactivity: Changes in behavior because participants know they are being observed (e.g., being shy or attention-seeking).

    • Demand Characteristics: Biased responses based on the participant's expectations of the study's goals.

  • Random Assignment Approaches:

    • Simple Random Assignment: Using a random process to assign a large number of participants.

    • Block Randomization: Randomization occurs in blocks (e.g., ABCABC) to ensure equal sample sizes in each condition.

    • Matched-groups Design: Participants are matched on specific important traits (like current GPA) and then randomly assigned. This is useful for small samples or when strong confounds exist.

    • Example (Balch, 1998): Students were ranked by grade. Adjacent ranks (Rank 1 and 2, etc.) were paired and then randomly assigned to ensure the groups' grade baselines were similar.
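
The assignment approaches above can be sketched in code. This is a minimal illustration, not the procedure from any cited study: the function names, participant IDs, and condition labels are hypothetical, and the block version assumes the common variant in which condition order is shuffled within each block.

```python
import random

def block_randomize(participants, conditions, rng):
    """Assign participants block by block; each block holds every condition once."""
    assignment = {}
    size = len(conditions)
    for start in range(0, len(participants), size):
        block = list(conditions)
        rng.shuffle(block)  # random order of conditions within this block
        for person, condition in zip(participants[start:start + size], block):
            assignment[person] = condition
    return assignment

def matched_pairs_assign(scores, rng, conditions=("practice", "review")):
    """Rank by score, pair adjacent ranks, then randomly split each pair.

    With an odd number of participants, the lowest-ranked one is left unassigned.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip: who in the pair gets which condition
        assignment[pair[0]] = conditions[0]
        assignment[pair[1]] = conditions[1]
    return assignment

rng = random.Random(42)  # fixed seed only so the example is reproducible
blocks = block_randomize([f"p{i}" for i in range(12)], ["A", "B", "C"], rng)
grades = {f"s{i}": 95 - 3 * i for i in range(8)}  # hypothetical grades
pairs = matched_pairs_assign(grades, rng)
print(blocks)
print(pairs)
```

Block randomization guarantees equal group sizes whenever the number of participants is a multiple of the number of conditions. Matched pairs guarantee that each adjacent-rank pair contributes one member to each condition, which keeps the groups' baselines close even in small samples.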

  • Design Confounds:

    • Confounded Constructs: Unintentionally manipulating more than just the IV.

    • Observer Biases / Expectancy Effects: The researcher's expectations differ across IV levels and can influence how participants are treated or how their behavior is recorded.

    • Solution: Careful construction of comparison groups and strict controls to isolate causal mechanisms.

Basic Experimental Designs

  • Between-groups (Between-subjects) Design: Each participant receives only one level of the independent variable.

    • Randomized Posttest-only Design: Basic experiment where participants are randomized, then the DV is measured once after the IV manipulation.

    • Randomized Pretest-posttest Design: Participants are measured on the DV before and after the IV manipulation.

    • Limits: Requires large sample sizes for statistical power, carries risks of selection effects in small samples, and may face ethical hurdles regarding condition assignment.

  • Within-groups (Within-subjects) Design: Each participant receives multiple or all levels of the independent variable.

    • Concurrent Measures Design: Participant is exposed to all levels of the IV at the same time.

    • Repeated Measures Design: Participant is exposed to levels of the IV sequentially.

    • Benefits: Requires a smaller sample size (participants serve as their own comparison), is more ethical (everyone gets treated), and eliminates selection threats.

  • Limits of Within-groups Designs:

    • Order Effect: Internal validity threat where the order of conditions affects responses.

    • Carry Over Effect: Exposure to one level of the IV carries over and affects performance in subsequent levels.

    • Practice Effect: People improve due to repeated measures.

    • Fatigue Effect: People get tired or bored with repeated measurements.

    • Calibration/Instrumentation errors: Measurement tools or observer accuracy may shift over time.

  • Counterbalancing (Reducing Order Effects):

    • Full Counterbalancing: All possible order conditions are used with equal participants per order.

      • 2 levels = 2 orders.

      • 3 levels = 6 orders.

      • 4 levels = 24 orders.

      • 5 levels = 120 orders.

      • 6 levels = 720 orders.

    • Latin Square: Uses as many orders as there are levels, arranged so that each level appears in each position exactly once. In a 3 × 3 square, Level A appears 1st, 2nd, and 3rd across the three orders.

    • Random Counterbalancing: Used when there are too many IV levels; levels are presented in a completely random or randomly selected order.
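
The order counts for full counterbalancing are factorials (k levels give k! orders), and a Latin square can be built by rotating the list of levels. The sketch below is one simple construction under that assumption; the function names are hypothetical, and the cyclic square shown here balances position only, not which level immediately precedes which.

```python
from itertools import permutations
from math import factorial

def full_counterbalance(levels):
    """Return every possible order of the IV levels (k! orders for k levels)."""
    return list(permutations(levels))

def latin_square(levels):
    """Cyclic Latin square: row i is the list of levels rotated by i positions."""
    k = len(levels)
    return [tuple(levels[(i + j) % k] for j in range(k)) for i in range(k)]

# Number of orders required by full counterbalancing for 2 to 6 levels.
for k in range(2, 7):
    print(f"{k} levels = {factorial(k)} orders")

# Each level appears once in each ordinal position across the three orders.
for order in latin_square(["A", "B", "C"]):
    print(order)
```

With three levels, full counterbalancing needs six orders while the Latin square needs only three, which is the practical reason to prefer it as the number of levels grows.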

Quasi-experimental Design

  • Definition: A design in which the researcher attempts to establish the criteria for causality (temporal precedence, covariance, internal validity) when a true experiment is not feasible due to ethics, logistics, or generalizability goals.

  • Weaknesses of Non-Experimental Approaches:

    • Self-reports: Limited by poor memory, illusory correlations (perceiving non-existent patterns), and history threats.

    • One-group designs: No way to assess what would have happened without the treatment.

  • Techniques for Quasi-Experiments:

    • Pretest measures: Adding these to test for selection threats when random assignment is not possible.

    • Equating/Matching:

      • Randomized Matched-groups: Matching participants on traits, then randomly assigning (Matching → Treatment).

      • Non-equivalent Matched-groups: Matching treatment participants with similar control participants after the fact (Treatment → Matching).

    • Switching Replication: Adding and removing the IV levels over time so participants serve as their own control.

  • Establishing Temporal Precedence:

    • Cross-sectional studies: Comparing pre-existing groups (e.g., different ages) at one time point.

    • Trend analysis: Examining patterns over time, though not always with the same participants.

    • Longitudinal designs: Examining the same variables in the same group of participants over time.

    • Time Series Analysis: Examining trends immediately before and after a naturalistic event or planned intervention.

  • Multiple Time Series Design: Combines control groups with trend observation. This design helps control for threats such as history, maturation, testing, and instrumentation. It involves a baseline period, a treatment period, and post-treatment observations for both a treatment group and a control group.
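
One way to summarize data from a multiple time series design is to compare each group's change from baseline and then take the difference between those changes. The notes do not prescribe this calculation; it is a simple illustrative summary, and the function names and all numbers below are hypothetical.

```python
from statistics import mean

def change_from_baseline(baseline, after):
    """Mean of the later observations minus mean of the baseline observations."""
    return mean(after) - mean(baseline)

def difference_in_changes(treat_base, treat_after, ctrl_base, ctrl_after):
    """Treatment group's change minus control group's change."""
    return (change_from_baseline(treat_base, treat_after)
            - change_from_baseline(ctrl_base, ctrl_after))

# Hypothetical repeated observations for each group and period.
treatment_baseline = [50, 51, 49]
treatment_after = [58, 59, 60]
control_baseline = [50, 50, 50]
control_after = [52, 53, 51]

estimate = difference_in_changes(
    treatment_baseline, treatment_after, control_baseline, control_after
)
print(f"Treatment change: {change_from_baseline(treatment_baseline, treatment_after)}")
print(f"Control change:   {change_from_baseline(control_baseline, control_after)}")
print(f"Difference:       {estimate}")
```

The control group's change stands in for what history, maturation, testing, and instrumentation would have produced without the treatment, so subtracting it leaves the portion of the change that those shared threats cannot explain. This holds only to the extent that both groups were exposed to the same outside influences.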