Internal Validity and the Seven Threats to Experimental Research

Conceptual Overview of Internal Validity

Internal validity is the degree of confidence a researcher has that the Independent Variable ($IV$) is responsible for the changes observed in the Dependent Variable ($DV$).
It concerns whether the relationship is causal: that the $IV$, rather than some other factor, produced the improvement or change in the $DV$.
There are seven primary threats to internal validity that researchers must identify and control to ensure the integrity of their findings.

Threat 1: Differential Selection Effects

Definition: These effects occur when participants are assigned to experimental and control groups in a way that is not truly random, leading to unequal groups from the outset of the study.
Influencing Factors: Differences in subject characteristics (e.g., gender, age, intelligence) can become the underlying reason for changes in the $DV$ rather than the actual intervention.
Examples and Scenarios:
- Gender: In a study using play-based therapy featuring dolls and puppets, girls may naturally gravitate toward the materials more than boys. If the experimental group has more girls, their progress might be attributed to gender preferences rather than the therapy itself.
- Age: In a vocabulary study teaching dictionary strategies, a group containing kindergarteners will not perform at the same level as middle schoolers. If kindergarteners are overrepresented in the experimental group, the true influence of the dictionary strategy ($IV$) on vocabulary learning ($DV$) is obscured because the strategy may not be age-appropriate.
- Intelligence (IQ): If higher-functioning students with superior IQs are assigned to the experimental group and lower-functioning students to the control group, the experimental group is likely to show more progress in a language study because of superior intellectual skills, not the intervention.
Control Method: The primary way to prevent differential selection is Random Assignment. Because every participant has an equal chance of landing in either group, characteristics such as age, IQ, and gender tend to be distributed evenly across groups, especially as the sample size grows.
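
Random assignment can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the function name `randomly_assign`, the participant IDs, the ages, and the fixed seed are all hypothetical and chosen for the demonstration.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # a seed is used here only to make the demo repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"experimental": shuffled[:midpoint], "control": shuffled[midpoint:]}

# Hypothetical roster of (participant ID, age in years); not data from any study.
ages = [5, 6, 5, 12, 13, 11, 6, 12, 5, 13]
roster = [(f"P{i:02d}", age) for i, age in enumerate(ages, start=1)]

groups = randomly_assign(roster, seed=42)
for name, members in groups.items():
    mean_age = sum(age for _, age in members) / len(members)
    print(name, [pid for pid, _ in members], f"mean age = {mean_age:.1f}")
```

With a sample this small, the two groups can still differ in mean age by chance, which is why checking group equivalence after assignment remains worthwhile.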

Threat 2: History Effect

Definition: This refers to extraneous variables or events occurring outside of the study that influence the $DV$ during the course of the research.
Long-Term Implications: The longer a study lasts, the more likely a history effect is to occur.
Example Scenario (Autism Intervention):
- A researcher evaluates a video modeling intervention for social skills in children with autism.
- During the study, the children also receive social skills training at school, extra guidance from parents at home, and private speech therapy.
- It becomes impossible to determine whether social skills improved because of the video modeling ($IV$) or because of the external "history" factors (school training, parental guidance, extra therapy).
Control Methods:
- Shortening the Study: Reducing the duration of the experiment minimizes the window for outside events to interfere.
- Environmental Control: Explicitly asking parents or participants not to engage in extra treatments or supplemental activities during the study period.

Threat 3: Maturation Effect

Definition: Maturation involves internal changes within the participant that occur naturally over time, rather than as a result of external influences.
Common Maturation Factors:
- Natural Growth: Children naturally acquire more language through immersion in their environment.
- Spontaneous Recovery: Individuals recovering from a stroke or Traumatic Brain Injury (TBI) experience natural brain recovery processes.
- Physical Healing: In voice disorders caused by vocal abuse, the voice may naturally recover if the abuse ceases.
- Mental States: Participants may experience boredom or fatigue over the course of a study.
- Physiological Cycles: Activation or wearing off of medications, hunger levels, and time of day (e.g., morning energy vs. end-of-day fatigue) can influence $DV$ performance.
Motor Fatigue Example: Apraxia treatment based on principles of motor learning involves drill and multiple exposures, which can lead to motor fatigue and potentially affect the $DV$.
Control Methods:
- Control Groups: By including a control group, researchers can measure the "baseline" change due to maturation. For example, if a control group improves by $10\%$ without intervention and the experimental group improves by $40\%$, the researcher can attribute roughly $30$ percentage points of the improvement to the $IV$.
- Reducing Study Duration: Shortening the timeframe helps limit the impact of natural development or recovery.
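
The subtraction in the control-group example can be written out as a worked equation. This is a sketch that assumes the two groups were equivalent at the start and matured at the same rate; the symbols $\Delta_{E}$ and $\Delta_{C}$ (improvement in the experimental and control groups) are introduced here for illustration only.

$$
\text{estimated effect of } IV = \Delta_{E} - \Delta_{C} = 40\% - 10\% = 30 \text{ percentage points}
$$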

Threat 4: Statistical Regression

Definition: Also known as "regression to the mean," this is the natural tendency for participants who score extremely high or extremely low on a pretest to score closer to the average (mean) on subsequent testing.
Mechanism: An extreme score on a first administration usually reflects some chance variation (e.g., an unusually good or bad day) in addition to true ability; on the second administration, the score is likely to be less extreme simply due to statistical probability, not the influence of the $IV$.
Risk: Researchers might mistakenly attribute this natural score shift to the effectiveness of their intervention.
Control Method: Utilizing a Control Group allows the researcher to account for regression, as both groups should experience the same statistical tendency to return to the mean.
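
A small simulation can make regression to the mean concrete. The sketch below assumes a simple model in which each observed score is a stable true ability plus random measurement noise; the score scale, sample size, and seed are arbitrary choices for illustration, and no intervention is applied between the two tests.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed model: observed score = stable true ability + random measurement noise.
true_ability = [rng.gauss(100, 10) for _ in range(2000)]
pretest = [t + rng.gauss(0, 10) for t in true_ability]
posttest = [t + rng.gauss(0, 10) for t in true_ability]  # no intervention at all

# Select roughly the lowest-scoring 10% on the pretest,
# as a study that recruits only "low scorers" might.
cutoff = sorted(pretest)[len(pretest) // 10]
low_scorers = [i for i, score in enumerate(pretest) if score <= cutoff]

pre_mean = mean(pretest[i] for i in low_scorers)
post_mean = mean(posttest[i] for i in low_scorers)
print(f"Low scorers: pretest mean = {pre_mean:.1f}, posttest mean = {post_mean:.1f}")
```

Under these assumptions the selected group scores higher on the second test even though nothing was done to them, which is the shift a researcher without a control group could mistake for a treatment effect.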

Threat 5: Attrition (Mortality) Effect

Definition: Attrition is the loss of participants during the course of a study, which can jeopardize the equivalence of the experimental and control groups.
Reasons for Attrition:
- Death (particularly in geriatric populations).
- Illness or lack of motivation.
- Logistical barriers like lack of transportation or financial costs (missing work).
- Intimidation by technology (e.g., elderly participants dropping out of a study requiring iPhone use for memory skills).
Impact on Validity: If only the most motivated or tech-savvy individuals remain in a study, the groups may no longer be equivalent, and the results may not generalize to the broader population.
Control Methods:
- Shortening Study Length: Less time reduces the opportunity for dropouts.
- Incentives/Support: Providing transportation or compensation to remove participation barriers.
- Screening: Selecting a specific population at the start (e.g., screening for tech literacy) to ensure the remaining sample is consistent.

Threat 6: Testing Effect

Definition: The act of taking a test can itself change how a participant performs on subsequent tests, independent of the $IV$.
Factors of the Testing Effect:
- Practice Effects: Learning the answers or format of a pretest and remembering them for the post-test.
- Test Anxiety: Nervousness during the first test might decrease or increase by the second.
- Social Desirability: A participant (e.g., someone who stutters) might pretend to be less anxious at the end of the study to avoid disappointing the researcher.
Control Methods:
- Counterbalancing: Using two different but equivalent versions of a test (Form A and Form B). Half the participants take Form A as a pretest and Form B as a post-test, while the other half reverses the order.
- Spaced Testing: Increasing the time between test administrations so participants forget specific questions/answers.
- Eliminating Pretests: Using a control group and only performing a post-test to avoid the influence of a prior test administration.
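
The counterbalancing scheme described above (half the participants take Form A first, half take Form B first) could be set up as follows. This is a minimal sketch: the function name `counterbalance`, the participant IDs, and the seed are hypothetical, and randomly choosing which participants get which order is an assumption rather than something the notes specify.

```python
import random

def counterbalance(participant_ids, seed=None):
    """Randomly give half the participants each form order (A then B, or B then A)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for pid in ids[:half]:
        schedule[pid] = {"pretest": "Form A", "posttest": "Form B"}
    for pid in ids[half:]:
        schedule[pid] = {"pretest": "Form B", "posttest": "Form A"}
    return schedule

# Hypothetical participant IDs P01..P08.
schedule = counterbalance([f"P{i:02d}" for i in range(1, 9)], seed=7)
for pid in sorted(schedule):
    print(pid, schedule[pid])
```

Because each form appears equally often as a pretest and as a post-test, any difference in difficulty between the forms is spread evenly across the two testing occasions.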

Threat 7: Instrumentation Effect

Definition: This threat involves variations in the instruments, equipment, or human observers used to measure the $DV$.
Technical Issues: Uncalibrated equipment (e.g., an audiometer or Visi-Pitch) might produce inaccurate data regarding hearing levels, pitch, or speech rate.
Human Issues: Variability in how different researchers set up equipment, place electrodes, or interpret results.
Control Methods:
- Calibration: Ensuring all mechanical and electronic equipment is functioning properly and calibrated to standard specifications.
- Standardized Training: Ensuring all researchers are trained to use the equipment identically to eliminate variability in administration.

Practical Application in Research Critiques

When evaluating a research article, students should ask the following questions to assess internal validity:

Control Groups: Does the study include a control group to account for maturation, history, and regression?
Pretest-Posttest Design: If this design is used, could a testing effect be present? Did the researchers counterbalance equivalent test forms (Forms A and B)?
Sample Size and Attrition: Did the sample size drop significantly (e.g., from $n = 30$ at the start to $n = 15$ at the end)? If so, how did attrition impact the results?
Study Duration: Is the study long (e.g., a year)? If there is no control group, could the results be due to history or maturation effects?
Baseline Scores: Did participants start with extremely low scores? If there is no control group, could the improvement be simple statistical regression to the mean?
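
For the sample-size question, the attrition rate is simple arithmetic that can be checked while reading an article. The sketch below uses the $n = 30$ to $n = 15$ example from the list; the helper name `attrition_rate` and the per-group split are hypothetical, added only to show how unequal dropout between groups might be spotted.

```python
def attrition_rate(n_start, n_end):
    """Proportion of participants lost between the start and end of a study."""
    if n_start <= 0 or not 0 <= n_end <= n_start:
        raise ValueError("expected n_start > 0 and 0 <= n_end <= n_start")
    return (n_start - n_end) / n_start

# Overall attrition for the example above: 30 participants at the start, 15 at the end.
overall = attrition_rate(30, 15)
print(f"Overall attrition: {overall:.0%}")

# Hypothetical per-group split of the same totals (not from any real study):
# 15 per group at the start, 5 experimental and 10 control participants at the end.
experimental = attrition_rate(15, 5)
control = attrition_rate(15, 10)
print(f"Experimental: {experimental:.0%}, control: {control:.0%}")
```

When the groups lose participants at clearly different rates, as in the hypothetical split, the remaining groups may no longer be comparable even if the overall rate looks acceptable.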