Notes on Correlation and Causation

  • Correlation vs. Causation

    • A correlation between smoking and lung cancer has been observed, but a correlation alone does not confirm causation.
    • Statistical probability:
      • Smoking increases the risk of lung cancer.
      • Cancer is not guaranteed, whether from a single cigarette or from heavy smoking.
  • Defining Correlation

    • Correlation: degree of association between two variables (e.g., smoking and lung cancer).
    • Examples of correlated variables:
      • Height and weight
      • Age and TV-watching
  • Causation Requirements

    • Temporal Order: cause precedes effect.
    • Consistent Association: reliable link exists between cause and effect.
    • Elimination of Plausible Alternatives: other plausible explanations are ruled out, leaving causation as the best explanation for the relationship.
  • Visual Representation

    • Scatterplots show the relationship between two variables.
    • Each dot is one data point; the overall pattern of the dots indicates the strength and direction of the correlation.
  • Types of Correlation

    • Positive Correlation: as one variable increases, so does the other.
    • Negative Correlation: as one variable increases, the other decreases.
    • Perfect Correlation: a correlation coefficient of +1 or -1 represents a perfect association.
  • Strength of Correlation

    • Correlation Coefficient (r):
      • Ranges from -1 to +1.
      • The sign gives the direction and the magnitude gives the strength of the correlation:
        • Close to +1 = strong positive
        • Close to -1 = strong negative
        • Close to 0 = little or no linear correlation
  • Calculating Pearson’s Correlation Coefficient

    • Formula:
      • r = \frac{ \sum xy - \frac{\sum x \sum y}{n} }{ \sqrt{ \left( \sum x^2 - \frac{(\sum x)^2}{n} \right) \left( \sum y^2 - \frac{(\sum y)^2}{n} \right) } }
      • Here n is the number of paired observations (x, y).
    • Interpretation of results:
      • e.g., r = 0.242 indicates a weak positive correlation.
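
The formula above can be sketched in Python as a direct, term-by-term translation. This is a minimal illustration, not a reference implementation: the function name pearson_r and the height/weight numbers are made up for the example and do not come from these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r using the sum-based formula from the notes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Numerator: sum(xy) - (sum(x) * sum(y)) / n
    numerator = sum_xy - sum_x * sum_y / n
    # Denominator: sqrt of the product of the two "sum of squares" terms
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator

# Made-up illustrative data (assumption, not from the notes).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 88]
r = pearson_r(heights_cm, weights_kg)
print(round(r, 3))
```

A value of r near +1 for this toy data reflects that taller people in the sample are also heavier; it says nothing about why.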
  • Outlier Impact

    • Correlation is sensitive to outliers: a single extreme point can noticeably change r, so remove outliers only with care and a clear justification.
    • Always consider the underlying causes of an observed correlation.
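
To illustrate how sensitive r is to outliers, the sketch below computes r for a small made-up data set, then again after adding one extreme point. The data, the helper name pearson_r, and the chosen outlier (7, -20) are assumptions for illustration only. The helper uses the deviation-from-the-mean form of Pearson's r, which is algebraically equivalent to the sum-based formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r in deviation-from-the-mean form."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with a clear upward trend (assumption, not from the notes).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
r_clean = pearson_r(xs, ys)

# Add a single extreme point far below the trend.
r_with_outlier = pearson_r(xs + [7], ys + [-20])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_with_outlier:.3f}")
```

In this toy case one point is enough to turn a strong positive r into a negative one, which is why it helps to inspect a scatterplot before trusting or discarding any single value.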
  • Cautions

    • Correlation does not imply causation.
    • An observed relationship may be coincidental, or both variables may be driven by a common cause (a confounding variable).
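
The common-cause caution can be sketched with a small simulation. Everything here is hypothetical: the variables common_cause, x, and y are synthetic, the seed and sample size are arbitrary, and x and y are constructed so that neither influences the other. They still correlate, because both depend on the same hidden variable.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r in deviation-from-the-mean form."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hidden variable that drives both x and y (synthetic, for illustration).
common_cause = [rng.gauss(0, 1) for _ in range(n)]
x = [c + rng.gauss(0, 1) for c in common_cause]  # x depends on the common cause only
y = [c + rng.gauss(0, 1) for c in common_cause]  # y depends on the common cause only

r_raw = pearson_r(x, y)

# Remove the common cause's contribution. This is possible here only because
# the data were constructed; in real data the confounder must be measured.
x_resid = [xi - c for xi, c in zip(x, common_cause)]
y_resid = [yi - c for yi, c in zip(y, common_cause)]
r_adjusted = pearson_r(x_resid, y_resid)

print(f"raw correlation:      {r_raw:.3f}")
print(f"after removing cause: {r_adjusted:.3f}")
```

The raw correlation is clearly positive, while the adjusted one is close to zero, consistent with the association being produced entirely by the shared cause rather than by x acting on y or the reverse.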