The statistics program does not require programming knowledge or learning R.
Correlation and Contingency Tables
Correlation and contingency tables may seem different, but both offer insight into the relationship between variables.
Key Theme: Correlation vs. Causation
Critical Reminder: Correlation does not equal causation.
- Example: High stork populations correlate with high birth rates, but one does not cause the other.
- Example: Ice cream consumption correlates with shark attacks, but only because both are more common in summer.
- The stork example comes from an amusing paper by Robert Matthews about storks and babies.
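The ice cream and shark attack example can be imitated with made-up numbers. The sketch below is only an illustration: the temperatures, sales figures, attack index, and noise levels are all invented assumptions, and `pearson_r` is a small helper written here, not output from the course software. Temperature drives both variables, and neither variable appears in the formula for the other.

```python
import random

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures: the shared cause (confounder).
temperature = [rng.uniform(5, 35) for _ in range(5000)]

# Invented data in arbitrary units. Each variable depends on temperature
# plus its own independent noise, never on the other variable.
ice_cream_sales = [10 * t + rng.gauss(0, 40) for t in temperature]
shark_attack_index = [0.2 * t + rng.gauss(0, 1.5) for t in temperature]

# Across all days the two variables move together.
r_all_days = pearson_r(ice_cream_sales, shark_attack_index)

# Holding temperature roughly fixed (19 to 21 degrees) removes most of
# the shared cause, so the association should largely disappear.
similar_days = [i for i, t in enumerate(temperature) if 19 <= t <= 21]
r_similar_days = pearson_r(
    [ice_cream_sales[i] for i in similar_days],
    [shark_attack_index[i] for i in similar_days],
)

print(f"r across all days:            {r_all_days:.2f}")
print(f"r on days of similar warmth:  {r_similar_days:.2f}")
```

The first value should be clearly positive and the second close to zero, which is the pattern expected when a third variable explains the association.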
Important Note: Misinterpreting correlation can lead to erroneous conclusions. Correlation signals a potential relationship but does not by itself imply direct causation.
- Example during COVID: An increase in Yankee Candle reviews complaining that the candles had no scent correlated with spikes in COVID cases (loss of smell being a COVID symptom), and was used as a potential early warning indicator.
Understanding Correlation
Correlation describes the association between two variables by examining how the data points cluster relative to the mean of each variable.
- Positive Correlation: As one variable increases, so does the other.
- Negative Correlation: As one variable increases, the other decreases.
- Zero Correlation: The variables show no association with each other.
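One way to see where the sign of a correlation comes from is to multiply each point's deviation from the mean of the first variable by its deviation from the mean of the second, then add the products up. The sketch below uses three tiny invented datasets; the function name `co_deviation` is a label chosen here for illustration.

```python
def co_deviation(xs, ys):
    """Sum of products of each point's deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]

rises_with_x = [1, 3, 2, 5, 4]   # tends to be high when x is high
falls_with_x = [5, 3, 4, 1, 2]   # tends to be low when x is high
unrelated = [4, 1, 5, 1, 4]      # no consistent tendency either way

for label, y in [("rises", rises_with_x),
                 ("falls", falls_with_x),
                 ("unrelated", unrelated)]:
    total = co_deviation(x, y)
    sign = "positive" if total > 0 else "negative" if total < 0 else "zero"
    print(f"{label:>9}: sum of products = {total:+.1f} ({sign})")
```

When points above the mean on one variable also tend to sit above the mean on the other, the products are mostly positive; when they sit on opposite sides, the products are mostly negative; when there is no pattern, the products cancel out.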
Example Case Studies
Wealth and Democracy: A positive correlation between GDP per capita and level of democracy, shown by Gustavo Silicano.
- The graph trends upward, indicating that as the level of democracy increases, wealth (GDP per capita) tends to increase.
Income Inequality and Foreign Aid: A negative correlation is observed; higher income inequality is associated with lower foreign aid spending.
- Example analysis: wealthier countries tend to spend less on foreign aid as their inequality increases.
Correlation Coefficient (Pearson's r)
Pearson's r evaluates linear relationships between two variables.
- Ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation).
- Positive Values: Variables increase together.
- Negative Values: One variable increases while the other decreases.
- Zero: No linear correlation.
The calculation method is covered only in overview; the focus is on interpreting r rather than computing it by hand.
- Any value near +1 or -1 indicates a strong correlation, while values close to zero suggest little to no correlation.
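For readers who want to see what the software is doing, here is a minimal sketch of the standard Pearson's r calculation: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The data values and variable names are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
perfect_positive = [3, 5, 7, 9, 11]   # y = 2x + 1, points on a rising line
perfect_negative = [10, 8, 6, 4, 2]   # y = 12 - 2x, points on a falling line
noisy_positive = [2, 1, 4, 3, 5]      # rises overall, with some scatter

print(f"perfect positive: r = {pearson_r(x, perfect_positive):+.2f}")
print(f"perfect negative: r = {pearson_r(x, perfect_negative):+.2f}")
print(f"noisy positive:   r = {pearson_r(x, noisy_positive):+.2f}")
```

The first two datasets lie exactly on a straight line, so r reaches its limits of +1 and -1; the third rises with scatter, so r is positive but below 1.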
Interpretation Guidelines
Thresholds for interpreting r vary by discipline, but a general rule of thumb is:
- |r| > 0.9: Very strong correlation.
- 0.7 < |r| < 0.9: Strong correlation.
- 0.3 < |r| < 0.5: Moderate correlation.
- |r| < 0.3: Weak or negligible correlation.
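The rule of thumb can be written as a small lookup. This is a sketch under stated assumptions: the list above does not say where exact boundary values (such as 0.3 or 0.9) belong, and it does not label the range from 0.5 to 0.7, so the code assigns boundaries arbitrarily and reports that range as unlabeled instead of guessing a name for it. The function name is hypothetical.

```python
def describe_correlation(r):
    """Return (strength, direction) for a Pearson's r value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and +1")

    size = abs(r)  # strength depends on magnitude only, not on sign
    if size > 0.9:
        strength = "very strong"
    elif size > 0.7:
        strength = "strong"
    elif size >= 0.5:
        # Assumption: this band has no label in the rule of thumb.
        strength = "not labeled by this rule of thumb"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak or negligible"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

for value in (0.95, -0.75, 0.4, -0.1):
    strength, direction = describe_correlation(value)
    print(f"r = {value:+.2f}: {strength}, direction: {direction}")
```

Note that the sign of r only sets the direction; -0.75 and +0.75 are equally strong.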
Examples in Practice
Estimating r by eye from how tightly the points cluster in a scatterplot is encouraged as practice, although the computer outputs the exact value.
- Example: estimating r = -0.75 and discussing it as a strong negative association.
Statistical significance (the p-value) is used to evaluate how likely a correlation at least this strong would be if chance alone were at work, that is, if there were no true correlation.
- p < 0.05 is the conventional threshold for calling a correlation statistically significant.
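The idea behind the p-value can be illustrated with a permutation test: shuffle one variable many times to break any real link, and count how often chance alone produces a correlation at least as strong as the observed one. This is a swapped-in teaching technique, not necessarily the method the statistics program uses (software typically relies on a formula-based test). The data, function names, shuffle count, and seed are all assumptions made for this sketch.

```python
import random

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # pairs x values with randomly chosen y values
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    # Adding 1 counts the observed arrangement itself, so p is never 0.
    return (at_least_as_strong + 1) / (n_shuffles + 1)

x = list(range(1, 13))
strongly_related = [24, 22, 23, 19, 18, 17, 15, 16, 12, 11, 9, 8]
unrelated = [5, 1, 4, 2, 6, 3, 3, 6, 2, 4, 1, 5]

p_related = permutation_p_value(x, strongly_related)
p_unrelated = permutation_p_value(x, unrelated)
print(f"strongly related: p = {p_related:.4f}")
print(f"unrelated:        p = {p_unrelated:.4f}")
```

A p-value below 0.05 for the first pair means random shuffling almost never matches the observed correlation; a large p-value for the second pair means chance produces correlations that strong all the time.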
Contingency Tables and Chi-Square Test
Contingency tables assess the relationship between categorical variables, often representing data that is nominal or ordinal rather than interval/ratio.
The chi-square test is used to evaluate the independence of two nominal/ordinal variables.
- Observed values from collected data compared against expected values under the null hypothesis (that the variables are independent).
- The chi-square statistic is computed as: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, summed over every cell of the table.
Statistical significance is assessed from the chi-square statistic; p < 0.05 leads to rejecting the null hypothesis of independence, which suggests a relationship worth exploring.
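The chi-square calculation can be sketched by hand for a small table. Under independence, each cell's expected count is its row total times its column total divided by the grand total. The counts below are invented, the function names are hypothetical, and the p-value helper is valid only for one degree of freedom (a 2x2 table); no continuity correction is applied, so software output may differ slightly.

```python
import math

def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

def p_value_one_dof(chi2):
    """Upper-tail p-value, valid ONLY for 1 degree of freedom (2x2 table)."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical 2x2 table: rows are groups, columns are outcomes.
table = [[30, 10],
         [15, 25]]

chi2, dof = chi_square_statistic(table)
p = p_value_one_dof(chi2)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("reject independence" if p < 0.05 else "cannot reject independence")
```

For one degree of freedom, p < 0.05 corresponds to a chi-square statistic above roughly 3.84, so the larger the gap between observed and expected counts, the stronger the evidence against independence.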
Case Study Example
Emma Jean Stanley's research on gender inclusivity in rebel groups required contingency tables due to the nominal nature of her data.
- Example of classifying cases based on whether an ideology was gender-inclusive and if women had leadership opportunities.
- Chi-square results indicated a statistically significant relationship: more gender-inclusive ideologies were associated with greater female participation.
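Building a contingency table amounts to counting how many cases fall into each combination of categories. The sketch below uses a handful of invented records, not data from the study above; the category labels and variable names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical cases: (ideology, women in leadership roles?)
cases = (
    [("inclusive", "yes")] * 3
    + [("inclusive", "no")] * 1
    + [("not inclusive", "yes")] * 1
    + [("not inclusive", "no")] * 3
)

counts = Counter(cases)  # tallies each (ideology, leadership) combination

row_labels = ["inclusive", "not inclusive"]
col_labels = ["yes", "no"]
table = [[counts[(row, col)] for col in col_labels] for row in row_labels]

# Print the table with row and column totals.
print(f"{'':<15}{'yes':>5}{'no':>5}{'total':>7}")
for label, row in zip(row_labels, table):
    print(f"{label:<15}{row[0]:>5}{row[1]:>5}{sum(row):>7}")
col_totals = [sum(col) for col in zip(*table)]
print(f"{'total':<15}{col_totals[0]:>5}{col_totals[1]:>5}{sum(col_totals):>7}")
```

A table this small is only for showing the layout; a common rule of thumb is that the chi-square test needs expected counts of about five or more in each cell.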
Conclusion on Statistical Approaches
Avoid making causal claims from correlation alone; that correlation does not imply causation is the critical takeaway.
The aim is to interpret statistical output and use it in conjunction with hypothesis testing.
- Good statistical literacy includes understanding how contingency tables work and how to interpret results from correlation analysis.