Notes on Analyzing Findings (Correlation, Causation, and Experimental Methods)

Correlation and Correlational Research

Correlation means there is a relationship between two or more variables (e.g., ice cream consumption and crime).
Correlation does not necessarily imply causation.
When two variables are correlated, as one changes, the other tends to change as well.
We measure correlation with a statistic called the correlation coefficient.
The correlation coefficient, usually represented by the letter r, is a number in the range $-1 \le r \le +1$ that indicates the strength and direction of the relationship.
The sign of r indicates the direction:
- Positive correlation: variables move in the same direction (as one increases, the other increases; as one decreases, the other decreases).
- Negative correlation: variables move in opposite directions (as one increases, the other decreases).
The magnitude (how close r is to ±1) indicates strength:
- Closer to ±1: stronger relationship and more predictable changes in one variable as the other changes.
- Closer to 0: weaker relationship and less predictability.
Example strengths:
- A correlation of $r=0.9$ indicates a much stronger relationship than $r=0.3$.
If the variables are not related, the correlation coefficient is 0 ($r = 0$); in real data, unrelated variables usually give a value near 0 rather than exactly 0.
Real-world example: ice cream and crime can be positively correlated due to a confounding variable (e.g., temperature).
Scatterplots illustrate correlations; stronger correlations have data points closer to a straight line.
Figure reference: scatterplot examples show (a) a positive correlation, (b) a negative correlation, and (c) no correlation.
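To make the sign and magnitude of r concrete, here is a minimal sketch that computes the Pearson correlation coefficient from scratch. The function name `pearson_r` and the small data sets are invented for illustration; they do not come from any study mentioned in these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]
hours_gaming = [5, 4, 3, 2, 1]

r_pos = pearson_r(hours_studied, exam_score)  # variables rise together
r_neg = pearson_r(hours_gaming, exam_score)   # one rises as the other falls
print(f"positive example: r = {r_pos:.2f}")
print(f"negative example: r = {r_neg:.2f}")
```

The two results have the same magnitude and opposite signs because the second x-variable is simply the first one reversed: the strength of the relationship is unchanged, only its direction flips.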

Correlation Does Not Indicate Causation

Correlational research reveals strength and direction of relationships but does not establish cause and effect.
A correlation can occur because one variable causes the other, but it can also be due to a confounding variable.
Confounding variable example: Temperature could cause both higher ice cream sales and higher crime rates.
Causation requires experimental manipulation to rule out alternative explanations.
Causal claims from correlations are common in advertisements and news, but often unjustified (e.g., cereal consumption and healthier weight).
To avoid misleading conclusions, scientists seek experimental evidence to support causal inferences.
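The ice cream and crime example can be sketched as a small simulation. Everything in it is an assumption made for illustration: the toy model lets temperature drive both variables, gives ice cream no direct effect on crime, and uses arbitrary coefficients and noise levels. It shows how a confound can produce a strong correlation that largely disappears once the confound is held roughly constant.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_days = 2000

# Assumed toy model: temperature influences both outcomes independently.
temperature = [rng.uniform(0, 35) for _ in range(n_days)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [1.5 * t + rng.gauss(0, 10) for t in temperature]

# Across all days the two outcomes are clearly correlated.
r_all = pearson_r(ice_cream, crime)

# Hold the confound roughly constant: keep only days in a narrow temperature band.
band = [(i, c) for t, i, c in zip(temperature, ice_cream, crime) if 20 <= t < 22]
r_band = pearson_r([i for i, _ in band], [c for _, c in band])

print(f"all days:        r = {r_all:.2f}")
print(f"20-22 degrees:   r = {r_band:.2f} (n = {len(band)})")
```

Restricting to a narrow band is only a crude way of controlling for temperature. An experiment removes the confound more directly by having the researcher set the value of the suspected cause.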

Illusory Correlations and Confirmation Bias

Illusory correlations: perceived relationships between two things where none exist.
Classic example: moon phases and human behavior; meta-analysis suggests no relationship between the lunar cycle and behavior.
Why people fall for illusory correlations:
- Confirmation bias: seek information that confirms hunches while ignoring disconfirming data.
- Availability heuristic: rely on easily recalled information.
Illusory correlations can contribute to prejudicial attitudes and discriminatory behavior.

From Correlation to Causation: Experimental Methods

Experiments are designed to establish cause-and-effect relationships.
The Experimental Hypothesis: a precise hypothesis tested via an experiment, derived from observations or prior research.
Basic experimental design involves two groups: Experimental vs. Control.
- Experimental group receives the manipulation (treatment).
- Control group does not receive the manipulation.
- Differences between groups are attributed to the manipulation, assuming other variables are controlled.
Example framework: observing aggression in children after exposure to aggressive models (Bandura, 1961 Bobo doll study).
Operational definitions: precise, measurable definitions of abstract variables (e.g., aggression).
- Rationale: different researchers can replicate the study with the same definitions.
- Example: aggression could be defined as physical/verbal acts that harm objects or people, such as kicking the doll, throwing it, or saying, “stupid doll.”
Importance of clear operational definitions for replication and interpretation.

Experimental Design Details

Experimental manipulation: the treatment or variable being tested (e.g., exposure to aggressive modeling).
Control group should differ from the experimental group only in the manipulation, to rule out other factors.
Random assignment: every participant has an equal chance of being assigned to either group.
Random sampling vs. random assignment:
- Random sampling: selecting a representative subset from the population (for generalizability).
- Random assignment: allocating participants to groups to balance preexisting differences.
Sampling considerations:
- Populations are often large; samples (e.g., ~200 children) are used to generalize findings.
- Representative samples reflect the population in sex, ethnicity, SES, etc.
Random assignment and matching:
- Even with random assignment, groups can differ on important variables.
- Matching pairs participants on a variable of interest (e.g., baseline aggression) to balance groups.
- Bobo doll study used baseline aggression to ensure equivalence.
Random assignment, monitoring variables, and matching support the assumption that observed differences are due to the manipulation.
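Matching on a baseline variable can be sketched as follows. This is one simple way to do it (rank by baseline, pair neighbours, split each pair at random), not a description of the procedure actually used in the Bobo doll study. The function name, participant IDs, and baseline ratings are all hypothetical.

```python
import random

def matched_pair_assignment(baseline_scores, seed=None):
    """Pair participants with similar baseline scores, then split each pair at random."""
    rng = random.Random(seed)
    # Rank participant IDs from lowest to highest baseline score.
    ranked = sorted(baseline_scores, key=baseline_scores.get)
    experimental, control = [], []
    # Walk through the ranking two at a time; with an odd count the
    # last participant is left unassigned (a simplification in this sketch).
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides who gets the manipulation
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control

# Hypothetical baseline aggression ratings (invented for illustration).
baseline = {
    "child_1": 2, "child_2": 9, "child_3": 4, "child_4": 7,
    "child_5": 3, "child_6": 8, "child_7": 5, "child_8": 6,
}

experimental, control = matched_pair_assignment(baseline, seed=1)
mean_exp = sum(baseline[c] for c in experimental) / len(experimental)
mean_ctl = sum(baseline[c] for c in control) / len(control)
print(f"baseline means: experimental = {mean_exp:.2f}, control = {mean_ctl:.2f}")
```

Because each pair is split between the groups, the group means on the matched variable can differ by no more than the average gap within pairs, whichever way the random splits fall.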
The role of the control condition is to isolate the effect of the independent variable.
Placebo and blind designs to control for expectancy effects:
- Placebo control: control participants receive an inert treatment that looks like the real one, so the groups differ only in the active content of the treatment.
- Single-blind study: participants unaware of group assignment; researchers are aware.
- Double-blind study: both participants and researchers are unaware of group assignments, reducing experimenter and participant expectancy effects.
- Placebo example: under placebo conditions, expectations can influence outcomes (placebo effect).

Variables and Measurements

Independent variable (IV): the manipulated variable believed to cause changes in the dependent variable (DV).
Dependent variable (DV): the measured outcome of interest.
Guiding question: What effect does the IV have on the DV?
In the aggression imitation example: IV = type of observed behavior (aggressive vs. non-aggressive); DV = number of imitated aggressive behaviors.
Operationalization allows precise measurements and facilitates replication.
Some variables are difficult to measure directly (e.g., helpfulness, kindness) and require workable operational definitions.
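An operational definition turns an abstract variable into something countable. The sketch below assumes a hypothetical coding scheme in which observers log each behavior as a label; the labels, the observation logs, and the function names are invented for illustration and are not the original study's coding manual.

```python
# Hypothetical coding scheme: these behaviour labels are invented for illustration.
AGGRESSIVE_ACTS = {"kick_doll", "throw_doll", "say_stupid_doll"}

def aggression_score(observed_behaviours):
    """Dependent variable: count of behaviours matching the operational definition."""
    return sum(1 for behaviour in observed_behaviours if behaviour in AGGRESSIVE_ACTS)

def mean_score(sessions):
    """Average aggression score across the children in one condition."""
    return sum(aggression_score(s) for s in sessions) / len(sessions)

# Invented observation logs: one list per child, grouped by level of the IV.
observations = {
    "aggressive_model": [
        ["kick_doll", "play_blocks", "say_stupid_doll"],
        ["throw_doll", "kick_doll", "kick_doll"],
    ],
    "non_aggressive_model": [
        ["play_blocks", "draw"],
        ["kick_doll", "draw"],
    ],
}

means = {condition: mean_score(sessions) for condition, sessions in observations.items()}
print(means)
```

Changing the contents of `AGGRESSIVE_ACTS` changes every score, which is why two labs can only compare results if they share the same operational definition.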

Sampling, Generalization, and Randomization

Populations vs. samples:
- Population: all individuals of interest.
- Sample: a subset of the population used in the study.
Random sampling: every member of the population has an equal chance of selection; aims to produce a representative sample.
Representativeness means that the proportions of key characteristics in the sample resemble those in the population; balancing differences between groups is the separate job of random assignment.
Population example: all preschool-aged children in a city.
Practical sampling approach: select a random sample from local preschools (e.g., around 200 children) to participate.
Sampling bias risk: using a sample from a wealthy university nursery school could bias results.
Random assignment is used after sampling to form experimental and control groups.
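The two randomization steps can be sketched in sequence using Python's standard `random` module. The population of 5,000 child IDs is a hypothetical sampling frame, and the even split is an assumption of this sketch; the sample size of 200 follows the example above.

```python
import random

def draw_sample(population, sample_size, rng):
    """Random sampling: every member has an equal chance of being selected."""
    return rng.sample(population, sample_size)

def assign_groups(sample, rng):
    """Random assignment: shuffle the sample, then split it in half."""
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: IDs for all preschool-aged children in a city.
population = [f"child_{i:04d}" for i in range(5000)]

sample = draw_sample(population, 200, rng)          # step 1: who is studied
experimental, control = assign_groups(sample, rng)  # step 2: who gets the manipulation

print(len(sample), len(experimental), len(control))
```

Step 1 bears on whether the results generalize to the population. Step 2 bears on whether a difference between groups can be attributed to the manipulation.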

Reliability, Validity, and Measurement Quality

Reliability: consistency of measurement across time, raters, items, or contexts.
- Inter-rater reliability: agreement between observers.
- Internal consistency: correlation among items measuring the same construct.
- Test-retest reliability: stability of measurements over time.
- Note: High reliability does not guarantee validity.
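Inter-rater reliability can be quantified in more than one way. The sketch below shows two common options, simple percent agreement and Cohen's kappa, which corrects for the agreement expected by chance. These notes do not say which index any particular study used, and the ratings are invented for illustration.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each used their own category rates independently.
    categories = set(rater_a) | set(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codes from two observers for the same ten behaviours
# ("A" = aggressive, "N" = not aggressive).
rater_a = ["A", "A", "N", "N", "A", "N", "A", "N", "N", "A"]
rater_b = ["A", "A", "N", "N", "A", "N", "N", "N", "N", "A"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement here because with only two categories the raters would agree on about half the items by chance alone.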
Validity: accuracy of what a measure intends to assess.
- Ecological validity: generalizability to real-world contexts.
- Construct validity: whether the measure captures the intended construct.
- Face validity: whether the measure appears to assess what it should, on the surface.
Relationship between reliability and validity:
- A measure can be reliable but not valid.
- A valid measure must also be reliable: a measure that gives inconsistent results cannot be accurately capturing the intended construct.
Example related to the Bobo doll study: sex categorization and measurement challenges.

Sex, Gender, and Measurement Considerations

Biological sex vs. gender: measurement and categorization require careful definition.
Traditional methods of determining sex:
- Visual assessment (appearance) — can have low construct validity and ecological validity due to diversity in bodies (including intersex, transgender, non-binary individuals).
- Medical records or birth certificates — may not reflect biological reality for research purposes.
- Self-report — depends on categories provided and participant willingness; may mismatch biological markers.
Complexity of biological sex:
- Determinants include internal gonads, predominant hormones, chromosomal DNA (e.g., XX, XY).
- Chromosomal sex may not always align with gonadal, hormonal, or genital sex.
- Intersex conditions exist and may involve mosaic genetics or atypical anatomy.
Importance: sex should be operationally defined for research, and data collected accordingly.
In the Bobo doll study, sex categorization was likely via visual assessment or parental report, which today is recognized as potentially flawed and lacking in ecological validity.
Tools and resources exist for handling sex and gender considerations in research (UBC toolkit cited).

The Bobo Doll Study: Operationalization and Measurement Details

The Bobo doll study examined whether children imitate aggressive behavior after observing adults.
Operational definitions of aggression were critical to interpretation and replication.
In replication discussions, differences in researchers, participants, and locations can affect outcomes; replication across diverse samples strengthens causal claims.

Ethical Considerations and Limitations

Ethical constraints limit certain experimental manipulations (e.g., exposing participants to abuse).
Quasi-experimental designs: used when random assignment is unethical or impractical; causality claims are more limited.
Ethical safeguards in experiments include minimizing harm, informed consent, and debriefing where appropriate.

Analyzing Experimental Findings and Statistics

After data collection, a statistical analysis determines whether observed differences are likely due to chance.
Statistical significance is commonly judged against a preset threshold (e.g., less than a 5% chance of seeing differences at least this large if the groups were actually the same):
- p < 0.05
With random assignment, random sampling, and controlled procedures, researchers aim to claim a causal effect of the IV on the DV.
Significance supports causal inference when design controls for confounds and bias; non-significant results caution against strong conclusions.
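One way to see what a p-value measures is an exact permutation test. This technique is chosen here for illustration and is not named in these notes; the scores are invented. The test asks: if group labels were irrelevant, in what fraction of all possible regroupings would the difference in means be at least as large as the one observed?

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        # Mean of the relabelled "A" group minus mean of everyone else.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled scores as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        sum_a = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point rounding.
        if abs(mean_diff(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits

# Hypothetical aggression scores (invented for illustration).
experimental = [8, 9, 10, 11, 12]
control = [1, 2, 3, 4, 5]

p_clear = permutation_p_value(experimental, control)
p_same = permutation_p_value([1, 2, 3], [1, 2, 3])
print(f"clearly separated groups: p = {p_clear:.4f}")
print(f"identical groups:         p = {p_same:.4f}")
```

Full enumeration is only practical for small groups, since the number of regroupings grows very quickly. A small p-value says the difference would be rare under chance alone; whether the manipulation caused it still depends on the design.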

Reporting and Validity in Research Dissemination

Scientists publish in peer-reviewed journals under APA guidelines (peer review provides quality control).
Peer reviewers assess rationale, methods, ethics, and statistical analyses, and check for over-interpretation of findings.
Replication as a core scientific practice:
- Replication tests reliability and generalizability of findings.
- A replication crisis has raised concerns about reproducibility across psychology and other social sciences.
- Studies show that a substantial portion of published results may not replicate; emphasis on preregistration and large-scale, multi-lab collaborations.
The Psychological Science Accelerator is an example of a collaborative approach to improve replication and generalizability by preregistering studies and collecting data across multiple labs.
Preregistered studies and multi-lab collaboration reduce concerns about selective reporting and publication bias.

Real-World Examples and Case Studies

The Vaccine-Autism Debate:
- Early publications suggested a link between vaccines and autism, followed by large-scale epidemiological studies showing no causal link.
- Several original studies were retracted due to issues like conflict of interest and data problems.
- Public health consequences included outbreaks (e.g., measles outbreaks in 2019).
- Lesson: importance of robust methods, transparency, and retractions when warranted.
Other topics mentioned: findings linking cereal consumption to healthier weight, which call for cautious interpretation; media reporting can misrepresent correlation as causation.

Practical Concepts and Takeaways

Always distinguish correlation from causation; use experiments to test causal claims.
Be wary of confounding variables that can produce spurious associations.
Consider illusory correlations and confirmation bias when evaluating data.
Understand and implement reliability and validity to ensure data quality.
Use proper operational definitions to enable replication and interpretation.
Random sampling supports generalizability, and random assignment supports causal inference; a strong study needs both.
Use ethical guidelines and consider quasi-experimental designs when random assignment is not possible.
Recognize limitations of evidence and avoid overgeneralizing findings beyond the study design.
Embrace open science initiatives (preregistration, multi-lab replication) to improve reliability of findings.

Key Formulas and Statistical References

Correlation coefficient range and interpretation:
- $r \in [-1, +1]$, with the relationship stronger when $|r|$ is near 1 and weaker when $|r|$ is near 0.
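For reference, the standard (Pearson) definition of r is given below. The notes do not state the formula, so the notation is an assumption: $x_i$ and $y_i$ are the paired observations, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of pairs.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when the variables tend to sit on the same side of their means and negative when they tend to sit on opposite sides; the denominator rescales the result so that it always falls in $[-1, +1]$.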
Statistical significance threshold (common):
- p < 0.05 (less than a 5% probability of observing a difference at least this large if random variation alone were at work).
Example value mentioned:
- Negative correlation: $r = -0.29$ (weak negative correlation between sleep duration and GPA in a specific study).

Summary of Core Concepts

Correlation shows association, not causation.
Direction and strength of association are captured by the correlation coefficient $r \in [-1, 1]$.
Positive vs negative correlations describe the direction; magnitude describes strength.
Scatterplots visually display correlations; stronger correlations align more closely to a line.
Causality requires controlled experiments to rule out confounds; correlation alone is insufficient.
Illusory correlations and confirmation bias can lead to erroneous causal inferences.
Experimental design uses independent and dependent variables, random assignment, and control groups to infer causality.
Operational definitions are essential for clarity and replication.
Reliability and validity determine the quality of measurements; validity includes ecological, construct, and face validity.
Sampling and generalization depend on random sampling and representative samples; random assignment and matching control for group differences.
Ethical constraints may necessitate quasi-experimental designs; replication and open science practices improve reliability.
Communication of findings through peer-reviewed publication relies on rigorous review and, ideally, replication across diverse samples.
Sex, gender, and biological sex considerations require careful measurement and awareness of limitations in categorization and analysis.

Quick Reference Tips

Before claiming causation, ask: Was there random assignment? Were groups equivalent at baseline? Was there control for confounds? Is there a plausible mechanism?
Check whether the study distinguishes correlation from causality in the conclusions.
Evaluate the measurement tools for reliability and validity; consider how operational definitions influence results and replication.
Be mindful of ethical constraints and the potential for bias in data collection and interpretation.
In discussions of real-world data, distinguish descriptive correlations from predictive relationships and causal inferences.