Correlation

Overview of Statistics Program

  • The statistics program does not require programming knowledge or learning R.

Correlation and Contingency Tables

  • Correlation and contingency tables may seem different, but both offer insight into the relationship between variables.

Key Theme: Correlation vs. Causation

  • Critical Reminder: Correlation does not equal causation.
      - Example: High stork populations correlate with high birth rates, but one does not cause the other.
      - Example: Ice cream consumption correlates with shark attacks, but only because both rise in summer.
      - See the amusing paper by Robert Matthews about storks and babies.
  • Important Note: Misinterpreting correlation can lead to erroneous conclusions. A correlation signals a potential relationship but does not imply direct causation.
      - Example during COVID: An increase in Yankee Candle reviews complaining that the candles had no scent correlated with spikes in COVID cases, so the reviews were used as a potential early warning indicator.
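The summer example above can be made concrete with a small simulation. This is an illustrative sketch only: the temperature range, effect sizes, and noise levels are invented, and `pearson_r` and `residuals` are helper names made up for this example. Temperature drives both series; neither series affects the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def residuals(ys, xs):
    """What remains of ys after removing a straight-line fit on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Invented data: 1000 days, with temperature as the shared cause.
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
shark_attacks = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Strong correlation, even though neither variable causes the other.
raw_r = pearson_r(ice_cream, shark_attacks)

# Once temperature is accounted for, the association largely disappears.
adjusted_r = pearson_r(
    residuals(ice_cream, temperature), residuals(shark_attacks, temperature)
)

print(f"raw r = {raw_r:.2f}, r after removing temperature = {adjusted_r:.2f}")
```

The raw correlation is strong only because both series follow the season. Removing the shared cause leaves little association behind.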

Understanding Correlation

  • Correlation measures the association between two variables: how consistently data points that fall above or below the mean of one variable also fall above or below the mean of the other.
      - Positive Correlation: As one variable increases, so does the other.
      - Negative Correlation: As one variable increases, the other decreases.
      - Zero Correlation: The variables show no consistent association.
Example Case Studies
  • Wealth and Democracy: Gustavo Silicano shows a positive correlation between GDP per capita and democracy levels.
      - The graph trends upward: as levels of democracy increase, wealth (GDP per capita) tends to increase as well.
  • Income Inequality and Foreign Aid: Negative correlation observed; higher income inequality is associated with lower foreign aid spending.
      - Example analysis: among wealthier countries, foreign aid spending tends to be lower where inequality is higher.

Correlation Coefficient (Pearson's r)

  • Pearson's r evaluates linear relationships between two variables.
      - Ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation).
        - Positive Values: Variables increase together.
        - Negative Values: One variable increases while the other decreases.
        - Zero: No linear correlation.
  • The calculation is covered only in overview; the focus is on interpreting r rather than computing it by hand.
      - Any value near +1 or -1 indicates a strong correlation, while values close to zero suggest little to no correlation.
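For readers who want to see what the software does behind the scenes, here is a minimal sketch of Pearson's r using only the Python standard library. The function name `pearson_r` and all of the data are made up for illustration. The calculation is the usual one: the summed products of each point's deviations from the two means, divided by the square root of the product of the summed squared deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations from the mean of x
    dy = [y - mean_y for y in ys]  # deviations from the mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when one variable never changes")
    return numerator / denominator


# Hypothetical data chosen to hit the three landmark values.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up in lockstep with hours
falling = [10, 8, 6, 4, 2]   # goes down in lockstep with hours

centered = [-2, -1, 0, 1, 2]
curved = [4, 1, 0, 1, 4]     # the square of each centered value

print(pearson_r(hours, rising))     # perfect positive: r = 1
print(pearson_r(hours, falling))    # perfect negative: r = -1
print(pearson_r(centered, curved))  # r = 0 despite an obvious pattern
```

The last case is why "zero" is read as "no linear correlation": the curved data follow a clear U-shaped pattern, yet r is zero because the pattern is not a straight line.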
Interpretation Guidelines
  • Correlation thresholds vary by discipline, but a general rule of thumb is:
      - |r| > 0.9: Very strong correlation.
      - 0.7 < |r| < 0.9: Strong correlation.
      - 0.3 < |r| < 0.5: Moderate correlation.
      - |r| < 0.3: Weak or negligible correlation.
Examples in Practice
  • Estimating r by eye from how tightly the data points cluster is encouraged as practice; the computer reports the exact value.
      - Example: an estimate of r = -0.75 indicates a strong negative association.
  • Statistical significance (the p-value) is used to evaluate how likely a correlation at least this strong would be to arise by random chance if the variables were actually unrelated.
      - p < 0.05 indicates a statistically significant correlation.
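One way to see what "by random chance" means is a permutation test, sketched below. This is a stand-in technique chosen because it needs only the standard library; statistics software typically reports a p-value from a t-distribution instead, so its numbers will differ slightly. The names `pearson_r` and `permutation_p_value` and all of the data are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of random re-pairings whose |r| is at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical data: twelve observations of each variable.
weeks = list(range(1, 13))
trend = [25, 22, 24, 19, 20, 15, 17, 12, 13, 9, 10, 6]   # clear downward trend
scattered = [3, 9, 1, 7, 5, 11, 2, 8, 12, 4, 10, 6]      # no clear trend

print(permutation_p_value(weeks, trend))      # small: hard to get by chance
print(permutation_p_value(weeks, scattered))  # large: easy to get by chance
```

Shuffling one variable destroys any real pairing, so the shuffled correlations show what chance alone produces. If almost none of them match the observed r, the p-value is small.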

Contingency Tables and Chi-Square Test

  • Contingency tables assess the relationship between categorical variables, often representing data that is nominal or ordinal rather than interval/ratio.
  • The chi-square test is used to evaluate the independence of two nominal/ordinal variables.
      - Observed values from the collected data are compared against the values expected under the null hypothesis (that the variables are independent).
      - The chi-square statistic is computed as:
    $$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
  • Statistical significance is assessed from the chi-square statistic; p < 0.05 leads to rejecting the null hypothesis of independence, which suggests a relationship worth exploring.
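The observed-versus-expected comparison can be sketched in a few lines of Python. The function names `chi_square` and `p_value_one_dof` and the counts in the tables are invented for illustration. The sketch assumes every row and column total is positive and applies no continuity correction, so software that applies one to 2x2 tables will report a somewhat different value. The p-value helper is valid only for one degree of freedom, which is the 2x2 case.

```python
import math


def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical 2x2 table: rows are one yes/no variable, columns another.
table = [
    [30, 10],
    [15, 25],
]
statistic, dof = chi_square(table)
p_value = p_value_one_dof(statistic)
print(f"chi-square = {statistic:.2f}, df = {dof}, significant = {p_value < 0.05}")

# A table whose rows have identical proportions gives a statistic of zero.
print(chi_square([[20, 20], [10, 10]]))
```

In the first table each expected count is a row total times a column total divided by the grand total (for example, 40 × 45 / 80 = 22.5), and the large gaps between observed and expected counts produce a statistic of about 11.43.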
Case Study Example
  • Emma Jean Stanley's research on gender inclusivity in rebel groups required contingency tables due to the nominal nature of her data.
      - Cases were classified by whether a group's ideology was gender-inclusive and whether women had leadership opportunities.
      - Chi-square results indicated a statistically significant relationship: more inclusive ideologies are associated with greater female participation.

Conclusion on Statistical Approaches

  • Avoid making causal claims from correlation alone; the critical takeaway is that correlation does not imply causation.
  • The aim is to interpret statistical output and use it in conjunction with hypothesis testing.
      - Good statistical literacy includes understanding how contingency tables work and how to interpret results from correlation analysis.