Notes on Research Methods: Correlation, Causation, and Observational Approaches

Variables in Real-World vs. Controlled Studies

Research design aims to control as many variables as possible to isolate effects. The more variables you control, the more the setting resembles a lab, which can diverge from real-world complexity.
Under highly specific conditions, researchers can infer cause-effect more confidently. In real-world settings, many uncontrolled variables reduce this certainty.
Balancing realism with control is a core challenge in research design.
Example: why do people buy more ice cream on days with more beach traffic?
- Suggested explanation: warm weather (high temperature) increases both ice cream sales and beach-going traffic.
- This illustrates confounding variables: two things move together not because one causes the other, but because a third factor (temperature) influences both.
In education or other settings, you may observe correlations that prompt follow-up studies to test causality or explore mechanisms.
A claimed causal link can be premature when based on correlation alone; further research is needed to rule out alternative explanations.
Example in schools: two variables may covary (e.g., activity level and absences) but you cannot conclude one causes the other without more controlled evidence.
The concept of correlation without causation is central to interpreting research results responsibly.

Correlation vs Causation and the Third-Variable Problem

Correlation: two variables vary together (e.g., over time or across individuals), but this does not prove that one causes the other.
Causation: one variable directly causes a change in another.
Common mistake: assuming causation from correlation.
Third-variable problem: a third factor may influence both variables, creating a spurious correlation.
Example: working out at a wellness center may be correlated with fewer illnesses among students, but other factors could explain the link (e.g., overall health, time management, access to resources).
Alternatively, the relationship may reflect direct causation, or it may be due to random chance, especially with small samples.
Sample size matters: with smaller samples, random patterns are more likely to appear as if they reflect real relationships.

Predictive Value and Limitations of Correlations

The value of correlational findings often lies in prediction and generating hypotheses, not in proving causation.
The IQ example: IQ correlates with various life outcomes (longevity, income, academic success), but correlation does not prove that IQ causes these outcomes.
Correlations can guide further experimentation or deeper data analysis to uncover mechanisms.
When observing a correlation in applied settings (e.g., school curricula and reading scores), treat it as a starting point for investigation rather than a definitive causal claim.
You can’t assume causality from correlation alone; other factors or variables could be driving the observed pattern.
In practice, researchers use correlations to predict and then design studies to test causality, control for confounds, and establish mechanisms.
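One reason experiments can test causality where correlations cannot is random assignment. The sketch below is a hypothetical simulation: a hidden trait (labeled "motivation") raises the outcome, while a "program" has no true effect at all. When people choose for themselves whether to join, the more motivated join more often, so a naive comparison of joiners and non-joiners should show a large spurious "effect"; when a coin flip decides, the estimate should be near zero. Every quantity here is an illustrative assumption.

```python
import random
import statistics

random.seed(2)


def simulate(randomize, n=2000):
    """Estimated effect of a hypothetical program whose true effect is zero.

    Assumption (illustrative only): hidden motivation raises the outcome and, in the
    observational version, also makes people more likely to join the program.
    """
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        motivation = random.gauss(0, 1)
        if randomize:
            joins = random.random() < 0.5  # coin flip: unrelated to motivation
        else:
            joins = motivation + random.gauss(0, 1) > 0  # self-selection: confounded
        outcome = 10 * motivation + random.gauss(0, 5)  # the program itself adds nothing
        (treated_outcomes if joins else control_outcomes).append(outcome)
    return statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)


print(f"observational estimate: {simulate(randomize=False):.2f}")
print(f"randomized estimate:    {simulate(randomize=True):.2f}")
```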

Correlation Strength, Direction, and Examples of Spurious Correlations

Positive correlation: as one variable increases, the other tends to increase (e.g., higher study time associated with higher test scores).
Negative correlation: as one variable increases, the other tends to decrease (e.g., days absent and overall academic performance in some contexts).
Example of a spurious correlation produced by chance:
- Number of letters in the Scripps National Spelling Bee winner's name vs. number of people killed by venomous spiders per year (a misleading, non-causal link that can appear when unrelated yearly series are charted together).
Important caveat: sample size affects how far an observed correlation can be trusted; small samples are more prone to showing strong but misleading correlations.
Takeaway: not every observed correlation implies a meaningful or causal relationship; some are coincidental or due to hidden variables.

Naturalistic Observation and Its Trade-offs

Naturalistic observation: researchers observe behavior in its natural environment without manipulation (non-experimental).
Benefits:
- High ecological validity and realism; useful when controlled experiments are infeasible or unethical.
- Good for generating hypotheses and understanding behavior in context.
- In education, you can observe classroom dynamics and student behaviors in real settings to identify what to study next.
Typical process: the observer enters a classroom, stays unobtrusive, watches during ordinary class periods, and records what happens as a starting point for further study.
Direct observation can be highly specific to the question at hand (e.g., a teacher describes a behavior problem, and the observer assesses the behavior during the relevant period).
Key limitations:
- Reactivity: people may change their behavior because they know they’re being watched (observer effect).
- Observer bias: outcomes are filtered through the observer’s own perceptions and interpretations; data can be subjective.
- Difficult to collect large amounts of data objectively; translation from what is observed to notes and analyses can introduce errors.
- Generalizability is limited when focusing on a single case or a small number of individuals.
Case studies as part of naturalistic observation:
- Useful for rare or unique cases (e.g., Phineas Gage: railroad spike through the brain) to infer potential brain-behavior relationships.
- Limitations: poor generalizability; cannot establish causality; highly contextual.
Bias risks in naturalistic observation:
- Researchers’ preconceptions can bias what is observed and how it’s interpreted.
- Strong theoretical biases (e.g., looking for evidence to support a theory) can skew conclusions.
Practical considerations:
- Ethical considerations include consent and privacy; observations in schools require careful handling of student data.

Case Studies: Depth vs Generalization

Phineas Gage as a classic case study example:
- A single individual whose accident (a railroad spike through the brain) was followed by changes in personality and behavior.
- Demonstrates how damage to specific brain regions can be linked to changes in function and temperament.
- Provides deep insight into brain-behavior relationships but cannot establish population-wide causal claims.
Strengths of case studies:
- Rich, qualitative detail; can reveal mechanisms and variables that might be missed in large-scale studies.
Limitations of case studies:
- Limited generalizability; cannot infer typical patterns across broader populations; not easily replicated.

Surveys, Ethics, and Practice in Non-Experimental Research

Non-experimental research includes surveys and observational studies where the researcher does not manipulate variables.
Ethical considerations in surveys:
- Informed consent and the option to opt out; participation should be voluntary, and participants must be able to withdraw at any time.
- Even with consent, participants may alter responses due to social desirability or wanting to present themselves in a favorable light.
- Honesty cannot be guaranteed; respondents may lie or misreport, intentionally or unintentionally.
Practical issues in survey design:
- Self-report data can be biased by memory, interpretation, and social pressures.
- Sample size impacts reliability and generalizability; larger samples reduce random error but may raise logistical challenges.
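The difference between random error and systematic bias in self-report can be illustrated with a simulated survey. The true rate (30%) and the share of true "yes" answers misreported as "no" (10%, standing in for social desirability) are hypothetical assumptions, and the sample sizes are arbitrary. Going from 100 to 500 respondents should shrink the spread of estimates, but the average estimate should stay near 27% rather than the true 30%, because a larger sample does not remove a bias shared by all respondents.

```python
import random
import statistics

random.seed(4)

TRUE_RATE = 0.30    # assumed true share of respondents with some behavior (hypothetical)
UNDERREPORT = 0.10  # assumed share of true "yes" answers reported as "no"


def survey_estimate(n):
    """Estimated rate from one simulated survey of n respondents with biased self-report."""
    yes = 0
    for _ in range(n):
        truly_yes = random.random() < TRUE_RATE
        reports_yes = truly_yes and random.random() >= UNDERREPORT
        yes += reports_yes
    return yes / n


results = {}
for n in (100, 500):
    estimates = [survey_estimate(n) for _ in range(1000)]
    results[n] = (statistics.fmean(estimates), statistics.stdev(estimates))
    print(f"n = {n}: mean estimate = {results[n][0]:.3f}, spread (SD) = {results[n][1]:.3f}")
```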
Use cases for non-experimental methods:
- When experiments are unethical or impractical (e.g., studying the effects of childhood trauma or abuse).
- When the goal is prediction, description, or identifying associations rather than proving causality.
Trade-offs between experimental and non-experimental approaches:
- Experiments: high internal validity but may lack external validity and ethical feasibility.
- Non-experimental methods: greater ecological validity and feasibility in many topics, but weaker ability to establish causation.

Statistical and Symbolic Considerations (Key Formulas and Notation)

Correlation coefficient ($r$) measures the strength and direction of a linear relationship between two variables $X$ and $Y$:
- $r = \frac{\text{cov}(X,Y)}{\sigma_X \, \sigma_Y}$
- Where $\text{cov}(X,Y)$ is the covariance of $X$ and $Y$, and $\sigma_X$, $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
Sample sizes mentioned in examples:
- Example sample size: $n = 100$
- Example sample size: $n = 500$
Conceptual distinctions:
- Correlation does not imply causation; a third variable may influence both variables, or the association may be due to random variation in a small sample.
- Prediction vs explanation: correlations are often valuable for prediction and theory generation, but not definitive evidence of causal mechanisms.

Practical Takeaways for Research Practice

Treat correlations as starting points for hypothesis generation and for guiding further investigation into causality.
Always consider potential confounds and alternative explanations when interpreting correlations.
Use naturalistic observation to gain context-rich insights, but be mindful of reactivity and observer bias; corroborate with other data sources.
Case studies provide deep, contextual understanding of rare conditions or phenomena, but do not generalize to larger populations.
Non-experimental methods (surveys, observational studies) are essential when experiments are unethical or impractical; apply appropriate caution in interpreting causality.
Ethical considerations are central when collecting data from people, especially in educational or trauma contexts; obtain consent, protect privacy, and anticipate response biases.
When communicating findings, clearly distinguish between correlation, prediction, and causation; avoid overstating causal claims without robust evidence.