Notes on Research Methods: Correlation, Causation, and Observational Approaches

Variables: Real-World vs. Controlled Studies

  • Research design aims to control as many variables as possible to isolate effects. The more variables you control, the more the setting resembles a lab, which can diverge from real-world complexity.
  • Under highly specific conditions, researchers can infer cause-effect more confidently. In real-world settings, many uncontrolled variables reduce this certainty.
  • Balancing realism with control is a core challenge in research design.
  • Example: why do people buy more ice cream on days with more traffic?
    • Suggested explanation: warm weather (temperature) increases both ice cream sales and beach-bound traffic.
    • This illustrates confounding variables: two things move together not because one causes the other, but because a third factor (temperature) influences both.
  • In education or other settings, you may observe correlations that prompt follow-up studies to test causality or explore mechanisms.
  • A claimed causal link can be premature when based on correlation alone; further research is needed to rule out alternative explanations.
  • Example in schools: two variables may covary (e.g., activity level and absences) but you cannot conclude one causes the other without more controlled evidence.
  • The concept of correlation without causation is central to interpreting research results responsibly.
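
The ice cream and traffic example can be sketched as a small simulation. This is only an illustration under made-up assumptions: the variable names, the slopes, and the noise levels are hypothetical and not taken from any real data set. Temperature drives both outcomes, and neither outcome affects the other. The sketch then removes the straight-line effect of temperature from each variable and correlates what is left (a simple partial correlation).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature drives both outcomes;
# ice cream sales and traffic never influence each other.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [5.0 * t + rng.gauss(0, 20) for t in temperature]
beach_traffic = [3.0 * t + rng.gauss(0, 15) for t in temperature]

raw_r = pearson_r(ice_cream_sales, beach_traffic)
partial_r = pearson_r(residuals(ice_cream_sales, temperature),
                      residuals(beach_traffic, temperature))

print(f"raw correlation:            {raw_r:.2f}")
print(f"after removing temperature: {partial_r:.2f}")
```

In this setup the raw correlation is clearly positive, while the correlation that remains after accounting for temperature is close to zero. Real data are rarely this clean: a confound may be unmeasured or may not act linearly, so this kind of adjustment is a check, not a proof.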

Correlation vs Causation and the Third-Variable Problem

  • Correlation: two variables move together in time or vary together, but this does not prove one causes the other.
  • Causation: a change in one variable produces a change in another.
  • Common mistake: assuming causation from correlation.
  • Third-variable problem: a third factor may influence both variables, creating a spurious correlation.
  • Example: among students, working out at a wellness center may be correlated with having fewer illnesses, but there could be other explanations (e.g., overall health, time management, access to resources).
  • It is possible that the relationship is real (direct causation) or that it’s due to random chance, especially with small samples.
  • Sample size matters: with smaller samples, random patterns are more likely to appear as if they reflect real relationships.
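
The point about sample size can be checked with a short simulation. This is a sketch under stated assumptions: both variables are drawn independently from a standard normal distribution, so any correlation found is pure chance, and the function name `share_of_large_r`, the threshold of 0.3, and the number of trials are arbitrary choices made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_large_r(n, trials=500, threshold=0.3, seed=1):
    """Fraction of trials in which two unrelated variables show |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


small_sample_share = share_of_large_r(n=10)
large_sample_share = share_of_large_r(n=500)

print(f"n = 10:  share of trials with |r| > 0.3 = {small_sample_share:.2f}")
print(f"n = 500: share of trials with |r| > 0.3 = {large_sample_share:.2f}")
```

With only 10 observations per trial, a sizeable share of trials show a correlation stronger than 0.3 even though no relationship exists; with 500 observations this essentially stops happening.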

Predictive Value and Limitations of Correlations

  • The value of correlational findings often lies in prediction and generating hypotheses, not in proving causation.
  • The IQ example: IQ correlates with various life outcomes (longevity, income, academic success), but correlation does not prove that IQ causes these outcomes.
  • Correlations can guide further experimentation or deeper data analysis to uncover mechanisms.
  • When observing a correlation in applied settings (e.g., school curricula and reading scores), treat it as a starting point for investigation rather than a definitive causal claim.
  • You can’t assume causality from correlation alone; other factors or variables could be driving the observed pattern.
  • In practice, researchers use correlations to predict and then design studies to test causality, control for confounds, and establish mechanisms.

Correlation Strength, Direction, and Examples of Spurious Correlations

  • Positive correlation: as one variable increases, the other tends to increase (e.g., higher study time associated with higher test scores).
  • Negative correlation: as one variable increases, the other tends to decrease (e.g., days absent and overall academic performance in some contexts).
  • Examples of spurious correlations to illustrate random patterns:
    • Number of letters in the winning word of the Scripps National Spelling Bee vs. number of people killed by venomous spiders per year (illustrates a misleading, non-causal link that can arise by chance in charts).
  • Important caveat: an estimated correlation is less stable in small samples, so small samples are more prone to showing misleadingly strong correlations.
  • Takeaway: not every observed correlation implies a meaningful or causal relationship; some are coincidental or due to hidden variables.
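
One way coincidental correlations can turn up is by looking through many unrelated data series until one happens to line up. The sketch below illustrates this with purely random numbers. It is an assumption-laden illustration, not a reconstruction of any published chart: the series length of 10 (for example, ten yearly values) and the count of 200 candidate series are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_years = 10             # hypothetical: ten yearly observations per series
n_candidates = 200       # hypothetical: number of unrelated series examined

# One target series and many candidate series, all pure noise.
target = [rng.gauss(0, 1) for _ in range(n_years)]
candidates = [[rng.gauss(0, 1) for _ in range(n_years)]
              for _ in range(n_candidates)]

correlations = [pearson_r(target, series) for series in candidates]
strongest = max(correlations, key=abs)  # the most extreme r, positive or negative

print(f"strongest correlation among {n_candidates} random series: {strongest:.2f}")
```

Even though every series is noise, the strongest match looks impressive. A striking chart found this way says little unless the relationship holds up in new data.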

Naturalistic Observation and Its Trade-offs

  • Naturalistic observation: researchers observe behavior in its natural environment without manipulation (non-experimental).
  • Benefits:
    • High ecological validity and realism; useful when controlled experiments are infeasible or unethical.
    • Good for generating hypotheses and understanding behavior in context.
    • In education, you can observe classroom dynamics and student behaviors in real settings to identify what to study next.
  • Typical process: the observer goes into a classroom, remains unobtrusive, observes during typical times, and notes what happens to form a starting point for further study.
  • Direct observation can be highly specific to the question at hand (e.g., a teacher describes a behavior problem, and the observer assesses the behavior during the relevant period).
  • Key limitations:
    • Reactivity: people may change their behavior because they know they’re being watched (observer effect).
    • Observer bias: outcomes are filtered through the observer’s own perceptions and interpretations; data can be subjective.
    • Difficult to collect large amounts of data objectively; translation from what is observed to notes and analyses can introduce errors.
    • Generalizability is limited when focusing on a single case or a small number of individuals.
  • Case studies as part of naturalistic observation:
    • Useful for rare or unique cases (e.g., Phineas Gage, a railroad worker who survived an iron tamping rod being driven through his skull) to infer potential brain-behavior relationships.
    • Limitations: poor generalizability; cannot establish causality; highly contextual.
  • Bias risks in naturalistic observation:
    • Researchers’ preconceptions can bias what is observed and how it’s interpreted.
    • Strong theoretical biases (e.g., looking for evidence to support a theory) can skew conclusions.
  • Practical considerations:
    • Ethical considerations include consent and privacy; observations in schools require careful handling of student data.

Case Studies: Depth vs Generalization

  • Phineas Gage as a classic case study example:
    • A single individual whose personality and behavior reportedly changed after one incident (an iron tamping rod driven through his skull in a railroad accident).
    • Demonstrates how damage to specific brain regions can be linked to changes in function and temperament.
    • Provides deep insight into brain-behavior relationships but cannot establish population-wide causal claims.
  • Strengths of case studies:
    • Rich, qualitative detail; can reveal mechanisms and variables that might be missed in large-scale studies.
  • Limitations of case studies:
    • Limited generalizability; cannot infer typical patterns across broader populations; not easily replicated.

Surveys, Ethics, and Practice in Non-Experimental Research

  • Non-experimental research includes surveys and observational studies where the researcher does not manipulate variables.
  • Ethical considerations in surveys:
    • Informed consent and the option to opt out: participation must be voluntary, and participants must be able to withdraw at any time.
    • Even with consent, participants may alter responses due to social desirability or wanting to present themselves in a favorable light.
    • Honesty cannot be guaranteed; respondents may lie or misreport, intentionally or unintentionally.
  • Practical issues in survey design:
    • Self-report data can be biased by memory, interpretation, and social pressures.
    • Sample size impacts reliability and generalizability; larger samples reduce random error but may raise logistical challenges.
  • Use cases for non-experimental methods:
    • When experiments are unethical or impractical (e.g., studying the effects of childhood trauma or abuse).
    • When the goal is prediction, description, or identifying associations rather than proving causality.
  • Trade-offs between experimental and non-experimental approaches:
    • Experiments: high internal validity but may lack external validity and ethical feasibility.
    • Non-experimental methods: greater ecological validity and feasibility in many topics, but weaker ability to establish causation.

Statistical and Symbolic Considerations (Key Formulas and Notation)

  • Correlation coefficient ($r$) measures the strength and direction of a linear relationship between two variables $X$ and $Y$:
    • $r = \frac{\text{cov}(X,Y)}{\sigma_X \, \sigma_Y}$
    • Where $\text{cov}(X,Y)$ is the covariance and $\sigma_X$, $\sigma_Y$ are the standard deviations of $X$ and $Y$ respectively.
  • Sample sizes mentioned in examples:
    • Example sample size: $n = 100$
    • Example sample size: $n = 500$
  • Conceptual distinctions:
    • Correlation does not imply causation; a third variable may influence both variables, or the association may be due to random variation in a small sample.
    • Prediction vs explanation: correlations are often valuable for prediction and theory generation, but not definitive evidence of causal mechanisms.

Practical Takeaways for Research Practice

  • Treat correlations as starting points for hypothesis generation and for guiding further investigation into causality.
  • Always consider potential confounds and alternative explanations when interpreting correlations.
  • Use naturalistic observation to gain context-rich insights, but be mindful of reactivity and observer bias; corroborate with other data sources.
  • Case studies provide deep, contextual understanding of rare conditions or phenomena, but do not generalize to larger populations.
  • Non-experimental methods (surveys, observational studies) are essential when experiments are unethical or impractical; apply appropriate caution in interpreting causality.
  • Ethical considerations are central when collecting data from people, especially in educational or trauma contexts; obtain consent, protect privacy, and anticipate response biases.
  • When communicating findings, clearly distinguish between correlation, prediction, and causation; avoid overstating causal claims without robust evidence.