Lectures on Causation, Bias, and Validity

Causation, Bias, and Validity in Research

Understanding Causation and Systematic Error

  • Causation and Error: The threat to causal claims comes from data being systematically incorrect, not randomly incorrect. Random error adds noise around the true value, while systematic error shifts estimates in one direction.

  • Systematic Error: When data is systematically incorrect, it leads to inaccurate estimates of the true value of variables.

    • Example: Asking participants about alcohol consumption for a study linking it to high blood pressure. Participants may underreport their consumption levels due to social desirability, introducing a systematic error and bias.
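The alcohol example above can be sketched in a short simulation (hypothetical numbers, pure Python). Random reporting error still recovers the true average, while systematic underreporting, here assumed to be about 30 percent, biases the estimate downward:

```python
import random

random.seed(0)
true_drinks = [random.gauss(10, 2) for _ in range(10_000)]  # true weekly drinks (hypothetical)

# Random error: noisy reports centred on the truth, so the average is still recovered.
random_reports = [d + random.gauss(0, 2) for d in true_drinks]

# Systematic error: everyone underreports by roughly 30%, so the average is biased down.
biased_reports = [0.7 * d for d in true_drinks]

def mean(xs):
    return sum(xs) / len(xs)

print(f"true mean:         {mean(true_drinks):.2f}")
print(f"random-error mean: {mean(random_reports):.2f}")  # close to the truth
print(f"systematic mean:   {mean(biased_reports):.2f}")  # ~30% too low
```

The point of the sketch: averaging washes out random error but does nothing to systematic error, which is exactly why bias is the concern for causal inference.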

Sources of Bias

  • Social Desirability Bias: Participants may underreport behaviors perceived negatively or overreport behaviors perceived positively.

    • Example: Underreporting alcohol consumption or misreporting income in surveys.

  • Sampling Bias: Occurs when the sample is not representative of the target population.

    • Self-Selection: Individuals choosing to participate in a study or treatment group are often systematically different from those who do not.

      • Example: Healthier individuals choosing to take a fitness program, making them systematically different from the general population.

  • Measurement Bias: Inaccurate or systematic errors in how variables are measured.

    • Example: Inaccurate self-reports of alcohol consumption or income, which are systematic rather than random errors.

  • Selection Bias: Occurs when individuals self-select into treatment groups, often based on factors related to outcomes.

    • Problem: These self-selected individuals are systematically different from those who do not self-select.

  • Simultaneity Bias (a form of endogeneity): Occurs when the independent (x) and dependent (y) variables influence each other at the same time, so causality runs in both directions.

    • Example: Education (x) and income (y) affecting each other simultaneously.
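A minimal sketch of why simultaneity biases a naive regression, using made-up structural effects rather than real education/income data. The two equations determine x and y jointly, so x ends up correlated with y's error term and the ordinary least squares slope overshoots the true effect:

```python
import random

random.seed(1)
a, b = 0.5, 0.5  # hypothetical structural effects: x -> y is b, y -> x is a
n = 50_000
xs, ys = [], []
for _ in range(n):
    u = random.gauss(0, 1)  # shock to y
    v = random.gauss(0, 1)  # shock to x
    # Reduced form of the simultaneous system  y = b*x + u,  x = a*y + v
    x = (a * u + v) / (1 - a * b)
    y = b * x + u
    xs.append(x)
    ys.append(y)

# Naive OLS slope of y on x: cov(x, y) / var(x)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n
print(f"true effect b = {b}, naive OLS estimate = {cov / var:.2f}")  # biased upward
```

With these numbers the estimate lands near 0.8 rather than the true 0.5, because the feedback from y into x is folded into the slope.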

Confounding Variables Revisited

  • Impact on Error Term: If confounding variables are not controlled for in a regression equation, they become part of the error term.

  • Correlation with Independent Variable: The error term then contains something that is correlated with the independent variable (x) and also affects the dependent variable (y). This violates the regression assumption that the error term is uncorrelated with x, biasing the estimated coefficient.
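The omitted-confounder problem can be simulated directly (illustrative coefficients, pure Python). Here a hypothetical confounder z drives both x and y; leaving z out of the regression pushes its effect into the error term, and the naive slope on x absorbs part of it:

```python
import random

random.seed(2)
n = 50_000
zs = [random.gauss(0, 1) for _ in range(n)]           # confounder, omitted from the model
xs = [z + random.gauss(0, 1) for z in zs]             # x is partly driven by z
ys = [2 * x + 3 * z + random.gauss(0, 1)              # true effect of x on y is 2
      for x, z in zip(xs, zs)]

# Naive OLS of y on x alone: cov(x, y) / var(x)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n
print(f"true effect = 2, naive OLS slope = {cov / var:.2f}")  # inflated by the confounder
```

With these assumed coefficients the slope comes out near 3.5 instead of 2: the confounder sitting in the error term is correlated with x, which is precisely the mechanism described above.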

Direct Effect

  • A direct effect is observed when an independent variable (x) directly influences a dependent variable (y).

Internal Validity: Did the Treatment Really Cause the Outcome?

  • Definition: Internal validity refers to the extent to which a study can confidently determine that the observed outcome was truly caused by the treatment or independent variable, rather than by other extraneous factors.

  • Threats to Internal Validity: Factors that can provide alternative explanations for observed changes, making it difficult to attribute the outcome solely to the treatment. These often cause systematic differences between treatment and control groups related to the intervention or pre-existing differences.

    • History: Events or changes occurring during the study period, external to the treatment, that can independently impact participants' outcomes.

      • Example: Participants in a weight loss program also start a new diet outside the study. It becomes unclear if observed weight loss is due to the program or the new diet.

    • Maturation: Natural changes or developments in participants over time, independent of the intervention.

      • Example: Testing a reading intervention program for six-year-olds over six months. Any improvement in reading skills might be due to natural development over that period, not necessarily the program.

    • Regression to the Mean: The tendency for extreme scores (either very high or very low) to move closer towards the average on subsequent measurements, even without any intervention.

      • Example: Studying the effect of a math tutoring program on students who scored very poorly on an initial test. Post-tutoring improvements might be partly due to natural fluctuations, less anxiety, or better sleep on the second test, rather than solely the tutoring.
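Regression to the mean can be demonstrated with no intervention at all (hypothetical score scale, pure Python). Each test score is stable ability plus one-off luck; selecting the worst scorers on test 1 selects people who were partly unlucky, so their test 2 average rises on its own:

```python
import random

random.seed(3)
n = 100_000
ability = [random.gauss(500, 80) for _ in range(n)]   # stable skill (hypothetical scale)
test1 = [a + random.gauss(0, 50) for a in ability]    # score = skill + luck
test2 = [a + random.gauss(0, 50) for a in ability]    # fresh luck on the retest

# Select the bottom decile on test 1; note there is no tutoring here at all.
cutoff = sorted(test1)[n // 10]
low = [i for i in range(n) if test1[i] < cutoff]

m1 = sum(test1[i] for i in low) / len(low)
m2 = sum(test2[i] for i in low) / len(low)
print(f"selected group, test 1 mean: {m1:.0f}")
print(f"same group,     test 2 mean: {m2:.0f}")  # noticeably higher, zero intervention
```

Any tutoring study that selects on extreme initial scores will see part of this automatic rebound mixed into its measured effect, which is why a control group selected the same way is needed.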

    • Attrition (Mortality): Participants dropping out of a study, especially if they do so in a non-random way, resulting in the remaining sample being systematically different from the initial sample.

      • Problem: If those who remain are more motivated or have other systematically different characteristics, observed positive results cannot be solely attributed to the program.
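Non-random attrition can be sketched the same way (illustrative numbers). In this simulation the program's true effect is zero and improvement is driven entirely by motivation, but less-motivated participants are assumed more likely to drop out, so the completers' average looks like a program effect:

```python
import random

random.seed(4)
n = 20_000
motivation = [random.random() for _ in range(n)]
# True program effect is zero; improvement is driven by motivation alone.
improvement = [10 * m + random.gauss(0, 1) for m in motivation]

# Non-random attrition: the probability of staying equals motivation (an assumption).
stayed = [i for i in range(n) if random.random() < motivation[i]]

full = sum(improvement) / n
remaining = sum(improvement[i] for i in stayed) / len(stayed)
print(f"mean improvement, full sample: {full:.1f}")      # about 5
print(f"mean improvement, completers:  {remaining:.1f}")  # inflated by dropout
```

The completers' average is pulled up simply because the sample that remains is no longer the sample that started, which is the attrition threat in miniature.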

    • Testing: The effect of taking a test or measurement multiple times, which can influence participants' performance on later tests.

      • Example: Familiarity with the SAT format improves a student's ability to take the test, even if questions differ, leading to improved scores not solely attributable to learning content.

      • Mitigation: Researchers might modify test formats or control for the testing effect, for instance by comparing against a control group that takes the same tests without receiving the intervention.