Notes on Research Methods: Correlation, Causation, and Observational Approaches
Variables and the Real-World vs. Controlled Studies
- Research design aims to control as many variables as possible to isolate effects. The more variables you control, the more the setting resembles a lab, which can diverge from real-world complexity.
- Under highly specific conditions, researchers can infer cause-effect more confidently. In real-world settings, many uncontrolled variables reduce this certainty.
- Balancing realism with control is a core challenge in research design.
- Example: why do people buy more ice cream on days with more traffic?
- Suggested explanation: warm weather (temperature) increases both ice cream sales and beach-going traffic.
- This illustrates confounding variables: two things move together not because one causes the other, but because a third factor (temperature) influences both.
- In education or other settings, you may observe correlations that prompt follow-up studies to test causality or explore mechanisms.
- A claimed causal link can be premature when based on correlation alone; further research is needed to rule out alternative explanations.
- Example in schools: two variables may covary (e.g., activity level and absences) but you cannot conclude one causes the other without more controlled evidence.
- The concept of correlation without causation is central to interpreting research results responsibly.
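The ice cream and traffic example can be sketched as a small simulation. This is an illustration under stated assumptions, not real data: the numbers, the linear relationships, and the names `ice_cream_sales`, `beach_traffic`, and `residuals` are all hypothetical. Temperature drives both outcomes, and neither outcome affects the other. Subtracting a fitted line on temperature from each outcome (one common way of statistically holding a third variable fixed) should make the apparent link shrink toward zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: temperature influences both outcomes;
# ice cream sales and beach traffic never influence each other.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [5.0 * t + rng.gauss(0, 20) for t in temperature]
beach_traffic = [3.0 * t + rng.gauss(0, 15) for t in temperature]

# Correlation between the two outcomes as observed.
raw_r = pearson_r(ice_cream_sales, beach_traffic)

# Correlation between what remains of each outcome once temperature is removed.
partial_r = pearson_r(
    residuals(ice_cream_sales, temperature),
    residuals(beach_traffic, temperature),
)

print(f"raw correlation:            {raw_r:.2f}")
print(f"after removing temperature: {partial_r:.2f}")
```

In this setup the raw correlation is strong even though there is no causal path between sales and traffic, while the correlation after removing temperature is close to zero. In a real study the confounder may be unmeasured or may act nonlinearly, so an adjustment like this is only as good as the assumptions behind it.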
Correlation vs Causation and the Third-Variable Problem
- Correlation: two variables vary together (as one changes, the other tends to change too), but this does not prove one causes the other.
- Causation: one variable directly causes a change in another.
- Common mistake: assuming causation from correlation.
- Third-variable problem: a third factor may influence both variables, creating a spurious correlation.
- Example: students who work out at a wellness center may report fewer illnesses, but the correlation could have other explanations (e.g., overall health, time management, access to resources).
- It is possible that the relationship is real (direct causation) or that it’s due to random chance, especially with small samples.
- Sample size matters: with smaller samples, random patterns are more likely to appear as if they reflect real relationships.
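The point about sample size can be checked with a short simulation. This is a minimal sketch under assumptions that are not in the notes: both variables are drawn independently from a standard normal distribution (so the true correlation is zero), the cutoff of 0.3 for a "noticeable" correlation is arbitrary, and the function name `share_of_large_correlations` is hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_large_correlations(n, trials, threshold, rng):
    """Fraction of trials in which two independent variables show |r| > threshold."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


rng = random.Random(1)  # fixed seed so the run is repeatable

# Same procedure, two sample sizes; the variables are unrelated in both cases.
share_small = share_of_large_correlations(n=10, trials=500, threshold=0.3, rng=rng)
share_large = share_of_large_correlations(n=500, trials=500, threshold=0.3, rng=rng)

print(f"n = 10:  share of trials with |r| > 0.3 = {share_small:.2f}")
print(f"n = 500: share of trials with |r| > 0.3 = {share_large:.2f}")
```

With only 10 observations per trial, a sizeable share of trials show a correlation above the cutoff purely by chance; with 500 observations this essentially never happens. The exact shares depend on the seed and the chosen cutoff.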
Predictive Value and Limitations of Correlations
- The value of correlational findings often lies in prediction and generating hypotheses, not in proving causation.
- The IQ example: IQ correlates with various life outcomes (longevity, income, academic success), but correlation does not prove that IQ causes these outcomes.
- Correlations can guide further experimentation or deeper data analysis to uncover mechanisms.
- When observing a correlation in applied settings (e.g., school curricula and reading scores), treat it as a starting point for investigation rather than a definitive causal claim.
- You can’t assume causality from correlation alone; other factors or variables could be driving the observed pattern.
- In practice, researchers use correlations to predict and then design studies to test causality, control for confounds, and establish mechanisms.
Correlation Strength, Direction, and Examples of Spurious Correlations
- Positive correlation: as one variable increases, the other tends to increase (e.g., higher study time associated with higher test scores).
- Negative correlation: as one variable increases, the other tends to decrease (e.g., days absent and overall academic performance in some contexts).
- Examples of spurious correlations to illustrate random patterns:
- Number of letters in the Scripps National Spelling Bee winning word vs. number of people killed by venomous spiders per year (illustrates a misleading, non-causal link that can arise by chance in charts).
- Important caveat: the correlation observed in a sample is less stable when the sample is small, so small samples are more prone to showing misleading correlations by chance.
- Takeaway: not every observed correlation implies a meaningful or causal relationship; some are coincidental or due to hidden variables.
Naturalistic Observation and Its Trade-offs
- Naturalistic observation: researchers observe behavior in its natural environment without manipulation (non-experimental).
- Benefits:
- High ecological validity and realism; useful when controlled experiments are infeasible or unethical.
- Good for generating hypotheses and understanding behavior in context.
- In education, you can observe classroom dynamics and student behaviors in real settings to identify what to study next.
- Process described: observer goes into a classroom, remains unobtrusive, observes during typical times, and notes what happens to form a starting point for further study.
- Direct observation can be highly specific to the question at hand (e.g., a teacher describes a behavior problem, and the observer assesses the behavior during the relevant period).
- Key limitations:
- Reactivity: people may change their behavior because they know they’re being watched (observer effect).
- Observer bias: outcomes are filtered through the observer’s own perceptions and interpretations; data can be subjective.
- Difficult to collect large amounts of data objectively; translation from what is observed to notes and analyses can introduce errors.
- Generalizability is limited when focusing on a single case or a small number of individuals.
- Case studies as part of naturalistic observation:
- Useful for rare or unique cases (e.g., Phineas Gage: an iron tamping rod driven through his skull in a railroad construction accident) to infer potential brain-behavior relationships.
- Limitations: poor generalizability; cannot establish causality; highly contextual.
- Bias risks in naturalistic observation:
- Researchers’ preconceptions can bias what is observed and how it’s interpreted.
- Strong theoretical biases (e.g., looking for evidence to support a theory) can skew conclusions.
- Practical considerations:
- Ethical considerations include consent and privacy; observations in schools require careful handling of student data.
Case Studies: Depth vs Generalization
- Phineas Gage as a classic case study example:
- An individual with a single incident (an iron tamping rod driven through his skull during railroad construction) followed by reported changes in personality and behavior.
- Demonstrates how damage to specific brain regions can be linked to changes in function and temperament.
- Provides deep insight into brain-behavior relationships but cannot establish population-wide causal claims.
- Strengths of case studies:
- Rich, qualitative detail; can reveal mechanisms and variables that might be missed in large-scale studies.
- Limitations of case studies:
- Limited generalizability; cannot infer typical patterns across broader populations; not easily replicated.
Surveys, Ethics, and Practice in Non-Experimental Research
- Non-experimental research includes surveys and observational studies where the researcher does not manipulate variables.
- Ethical considerations in surveys:
- Informed consent and the option to opt out; participation must be voluntary, and participants must be able to withdraw at any time.
- Even with consent, participants may alter responses due to social desirability or wanting to present themselves in a favorable light.
- Honesty cannot be guaranteed; respondents may lie or misreport, intentionally or unintentionally.
- Practical issues in survey design:
- Self-report data can be biased by memory, interpretation, and social pressures.
- Sample size impacts reliability and generalizability; larger samples reduce random error but may raise logistical challenges.
- Use cases for non-experimental methods:
- When experiments are unethical or impractical (e.g., studying the effects of childhood trauma or abuse).
- When the goal is prediction, description, or identifying associations rather than proving causality.
- Trade-offs between experimental and non-experimental approaches:
- Experiments: high internal validity but may lack external validity and ethical feasibility.
- Non-experimental methods: greater ecological validity and feasibility in many topics, but weaker ability to establish causation.
- Correlation coefficient ($r$) measures the strength and direction of a linear relationship between two variables $X$ and $Y$:
- $r = \frac{\text{cov}(X,Y)}{\sigma_X \, \sigma_Y}$
- Where $\text{cov}(X,Y)$ is the covariance and $\sigma_X$, $\sigma_Y$ are the standard deviations of $X$ and $Y$ respectively.
- Sample sizes mentioned in examples: $n = 100$ and $n = 500$.
- Conceptual distinctions:
- Correlation does not imply causation; a third variable may influence both variables, or the association may be due to random variation in a small sample.
- Prediction vs explanation: correlations are often valuable for prediction and theory generation, but not definitive evidence of causal mechanisms.
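The definition of $r$ given earlier in this section (covariance divided by the product of the two standard deviations) can be worked through on a tiny dataset. This is a sketch with made-up numbers: `study_hours`, `test_scores`, and `days_absent` are hypothetical values chosen so the arithmetic is easy to follow by hand, and the function name `correlation_from_definition` is likewise an assumption. Sample estimates (dividing by $n - 1$) are used for both the covariance and the standard deviations, so the $n - 1$ factors cancel.

```python
import statistics


def correlation_from_definition(xs, ys):
    """r = cov(X, Y) / (sd_X * sd_Y), using sample (n - 1) estimates throughout."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sample covariance: average product of deviations from the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev is the sample standard deviation (also divides by n - 1).
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical values for five students.
study_hours = [1, 2, 3, 4, 5]
test_scores = [2, 4, 5, 4, 5]
days_absent = [5, 4, 3, 2, 1]

r_positive = correlation_from_definition(study_hours, test_scores)
r_negative = correlation_from_definition(days_absent, test_scores)

print(f"study hours vs. test scores: r = {r_positive:.3f}")  # prints 0.775
print(f"days absent vs. test scores: r = {r_negative:.3f}")  # prints -0.775
```

By hand: the deviation products for study hours and scores sum to 6, and the squared deviations sum to 10 and 6, so $r = 6 / \sqrt{10 \cdot 6} \approx 0.775$. Reversing the order of the first variable flips the sign but not the strength. With only five observations, a value like this says very little about any wider population.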
Practical Takeaways for Research Practice
- Treat correlations as starting points for hypothesis generation and for guiding further investigation into causality.
- Always consider potential confounds and alternative explanations when interpreting correlations.
- Use naturalistic observation to gain context-rich insights, but be mindful of reactivity and observer bias; corroborate with other data sources.
- Case studies provide deep, contextual understanding of rare conditions or phenomena, but do not generalize to larger populations.
- Non-experimental methods (surveys, observational studies) are essential when experiments are unethical or impractical; apply appropriate caution in interpreting causality.
- Ethical considerations are central when collecting data from people, especially in educational or trauma contexts; obtain consent, protect privacy, and anticipate response biases.
- When communicating findings, clearly distinguish between correlation, prediction, and causation; avoid overstating causal claims without robust evidence.