Notes on Correlation and Causation
Correlation vs. Causation
- Smoking and lung cancer are correlated, but the observed correlation alone does not confirm causation.
- Statistical probability:
  - Smoking increases the risk of lung cancer.
  - Cancer is not guaranteed, whether from a single cigarette or from heavy smoking.
Defining Correlation
- Correlation: degree of association between two variables (e.g., smoking and lung cancer).
- Examples of correlated variables:
  - Height and weight
  - Age and TV-watching
Causation Requirements
- Temporal Order: cause precedes effect.
- Consistent Association: a reliable link exists between cause and effect.
- Elimination of Plausible Alternatives: the causal account remains the best explanation for the relationship after other plausible explanations are ruled out.
Visual Representation
- Scatterplots show the relationship between two variables.
- Each dot represents one data point; the overall pattern of the dots indicates the strength and direction of the correlation.
Types of Correlation
- Positive Correlation: as one variable increases, so does the other.
- Negative Correlation: as one variable increases, the other decreases.
- Perfect Correlation: a correlation coefficient of +1 or -1 means every data point falls exactly on a straight line.
Strength of Correlation
- Correlation Coefficient (r):
  - Ranges from -1 to +1.
  - The sign gives the direction; the magnitude gives the strength of the linear relationship.
    - Close to +1 = strong positive
    - Close to -1 = strong negative
    - Close to 0 = weak or no linear correlation
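The verbal labels above can be sketched as a small helper. The function name `describe_correlation` and the numeric cutoffs (0.1, 0.3, 0.7) are assumptions made for illustration; the notes do not fix exact thresholds, and conventions vary by field.

```python
def describe_correlation(r: float) -> str:
    """Return a rough verbal label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    magnitude = abs(r)
    # Illustrative cutoffs only (assumed, not a standard).
    if magnitude < 0.1:
        return "no meaningful linear correlation"
    if magnitude < 0.3:
        strength = "low"
    elif magnitude < 0.7:
        strength = "moderate"
    else:
        strength = "strong"

    # The sign of r gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


for value in (0.95, -0.95, 0.242, 0.02):
    print(value, "->", describe_correlation(value))
```

Under these assumed cutoffs, r = 0.242 is labelled a low positive correlation.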
Calculating Pearson’s Correlation Coefficient
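- Formula: r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], where x̄ and ȳ are the means of the two variables.
- Interpretation of results:
  - e.g., r = 0.242 indicates a low positive correlation.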
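Pearson's r can be computed directly from its standard definition: the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the two sums of squared deviations. The sketch below is a minimal standard-library version; the function name `pearson_r` and the sample numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Made-up data: y is an exact increasing (or decreasing) linear function of x.
xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))   # perfect positive association
print(pearson_r(xs, [8, 6, 4, 2]))   # perfect negative association
```

In practice a library routine (for example in NumPy or SciPy) would normally be used; writing it out by hand only makes the steps of the formula visible.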
Outlier Impact
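- The correlation coefficient is sensitive to outliers: a single extreme point can noticeably raise, lower, or even reverse r.
- Remove outliers with care, and only with a justification (e.g., a recording error).
- Always consider the underlying causes of an observed correlation.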
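To see how much a single point can matter, the sketch below computes r for a small made-up data set with and without one extreme observation. The data values and the helper name `pearson_r` are hypothetical, chosen only to illustrate the effect.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (minimal version, no input checks)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Made-up data with a clear upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
r_clean = pearson_r(xs, ys)

# The same data plus one extreme point far below the trend.
r_with_outlier = pearson_r(xs + [7], ys + [-10])

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_with_outlier:.2f}")
```

Here one added point is enough to turn a clearly positive coefficient into a negative one, which is why any decision to drop or keep such a point needs a stated reason.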
Cautions
- Correlation does not imply causation.
- An observed correlation may be coincidental, or it may arise because a third variable (a common cause) influences both.
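The common-cause case can be illustrated with a small simulation. Everything here is assumed for the sake of the example: the variable names (`temperature`, `ice_cream`, `sunburn`), the noise levels, and the sample size are invented, and the data are randomly generated rather than observed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient (minimal version, no input checks)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical common cause: temperature drives both outcomes.
temperature = [rng.gauss(0, 1) for _ in range(n)]
# Neither outcome depends on the other; each is temperature plus its own noise.
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
sunburn = [t + rng.gauss(0, 0.5) for t in temperature]

r_observed = pearson_r(ice_cream, sunburn)

# Subtract the common cause: only the independent noise terms remain.
ice_resid = [i - t for i, t in zip(ice_cream, temperature)]
sun_resid = [s - t for s, t in zip(sunburn, temperature)]
r_adjusted = pearson_r(ice_resid, sun_resid)

print(f"ice cream vs. sunburn:        r = {r_observed:.2f}")
print(f"after removing temperature:   r = {r_adjusted:.2f}")
```

The two simulated outcomes are strongly correlated even though neither influences the other, and the correlation largely disappears once the shared cause is removed.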