Split stats

Chapter 6 - Correlation

6.1 Introduction to Correlation

  • Personal story about receiving a guitar at 8 years old and initial struggles with playing.

  • Introduction of the concept of correlation between variables.

    • Three types of relationships:

      1. Positive Correlation: More practice = Better performance.

      2. No Correlation: Practice does not affect performance.

      3. Negative Correlation: More practice = Worse performance.

6.2 Importance of Graphical Representation

  • Emphasis on visual data exploration (scatter plots) before analysis.

  • Review Chapter 4 for instructions on presenting data graphically.

6.3 Measuring Correlation

6.3.1 Understanding Covariance

  • Covariance measures how two variables change together.

  • Variance is the average squared deviation from the mean:

    • Formula for Variance:

      • ![Variance Formula](https://latex.codecogs.com/svg.latex?s^{2}=\frac{\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}}{N-1})

  • For covariance, we measure how changes in one variable correspond to the changes in another.

    • Covariance formula:

      • ![Covariance Formula](https://latex.codecogs.com/svg.latex?cov(X,Y)=\frac{\sum_{i=1}^{N}(x_{i}-\bar{x})(y_{i}-\bar{y})}{N-1})
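
The two definitions above can be checked by hand in R. This is a minimal sketch on made-up numbers (the `practice` and `score` vectors are hypothetical and do not come from the chapter); it computes the deviations from each mean, then compares the hand-computed variance and covariance with the built-in `var()` and `cov()`.

```r
# Hypothetical toy data (not from the chapter):
# hours practised and a performance score for five learners
practice <- c(1, 2, 3, 4, 5)
score    <- c(2, 4, 5, 4, 5)
n <- length(practice)

# Deviation of each observation from its variable's mean
dev_practice <- practice - mean(practice)
dev_score    <- score - mean(score)

# Variance: sum of squared deviations divided by N - 1
var_by_hand <- sum(dev_practice^2) / (n - 1)

# Covariance: sum of cross-product deviations divided by N - 1
cov_by_hand <- sum(dev_practice * dev_score) / (n - 1)

# Compare with the built-in functions
print(c(by_hand = var_by_hand, builtin = var(practice)))
print(c(by_hand = cov_by_hand, builtin = cov(practice, score)))
```

A positive covariance here means that learners who practise more than average also tend to score above average.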

6.3.2 Standardization and Correlation Coefficient

  • Covariance depends on the units of measurement. Dividing it by the product of the two standard deviations standardizes it, giving the Pearson correlation coefficient (r):

    • Formula for Pearson's r:

      • ![Pearson's r](https://latex.codecogs.com/svg.latex?r=\frac{cov(X,Y)}{s_{x}s_{y}})

  • Interpretation of r values:

    • Ranges from -1 to +1,

      • +1: Perfect positive correlation

      • -1: Perfect negative correlation

      • 0: No linear relationship
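
As a rough illustration of the standardization step, the sketch below divides the covariance by the product of the standard deviations and compares the result with `cor()`. The data are hypothetical. It also rescales one variable (hours to minutes) to show that the covariance changes with the units while r does not.

```r
# Hypothetical toy data (not from the chapter)
practice <- c(1, 2, 3, 4, 5)   # hours practised
score    <- c(2, 4, 5, 4, 5)   # performance score

# Pearson's r: covariance divided by the product of the standard deviations
r_by_hand <- cov(practice, score) / (sd(practice) * sd(score))
r_builtin <- cor(practice, score)

# Changing the unit of measurement (hours -> minutes) changes the covariance ...
cov_hours   <- cov(practice, score)
cov_minutes <- cov(practice * 60, score)

# ... but leaves the standardized coefficient untouched
r_minutes <- cor(practice * 60, score)

print(c(r_by_hand = r_by_hand, r_builtin = r_builtin, r_minutes = r_minutes))
print(c(cov_hours = cov_hours, cov_minutes = cov_minutes))
```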

6.3.3 Significance of Correlation Coefficient

  • Statistical tests determine whether an observed correlation differs significantly from zero.

    • Discuss converting r to a z-score to assess significance.
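
The outline does not spell out the test, so the following is a sketch under assumptions: it shows two standard routes on hypothetical data. The first converts r to a z-score with Fisher's transformation (standard error 1/sqrt(N - 3)); the second uses the t statistic with N - 2 degrees of freedom, which is what `cor.test()` reports for Pearson's r.

```r
# Hypothetical toy data (not from the chapter)
practice <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
score    <- c(2, 4, 5, 4, 5, 7, 8, 7, 9, 10)
n <- length(practice)
r <- cor(practice, score)

# Route 1: Fisher's z transformation of r
z_r    <- atanh(r)               # identical to 0.5 * log((1 + r) / (1 - r))
se_z   <- 1 / sqrt(n - 3)        # standard error of z_r
z_stat <- z_r / se_z             # z-score for H0: population correlation = 0
p_z    <- 2 * pnorm(-abs(z_stat))  # two-tailed p-value

# Route 2: t statistic with N - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_t    <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(z = z_stat, p_from_z = p_z))
print(c(t = t_stat, p_from_t = p_t))
```

The two p-values will generally differ a little, because the z route relies on a normal approximation.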

6.3.4 Confidence Intervals for r

  • Confidence intervals provide a range of plausible values for the population correlation.
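
One common way to build such an interval, assumed here because the outline does not name a method, is to work on Fisher's z scale and transform the limits back to r. The sketch uses hypothetical data and a 95% level.

```r
# Hypothetical toy data (not from the chapter)
practice <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
score    <- c(2, 4, 5, 4, 5, 7, 8, 7, 9, 10)
n <- length(practice)
r <- cor(practice, score)

# Step 1: transform r to Fisher's z and get its standard error
z_r  <- atanh(r)
se_z <- 1 / sqrt(n - 3)

# Step 2: 95% limits on the z scale
z_crit   <- qnorm(0.975)
z_limits <- z_r + c(-1, 1) * z_crit * se_z

# Step 3: transform the limits back to the r scale
ci_r <- tanh(z_limits)

print(round(c(lower = ci_r[1], r = r, upper = ci_r[2]), 3))
```

Because the back-transformation is nonlinear, the interval is usually not symmetric around r.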

6.3.5 Causality Warning

  • Correlation does not imply causation: Two variables can be correlated without one causing the other.

    • Discuss the third-variable problem and direction of causality.

6.4 Data Entry for Correlation Analysis

  • Guidelines on organizing data for correlation and regression analyses (each variable in separate columns).
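
A minimal sketch of that layout, with hypothetical column names and values: each row is one case and each variable sits in its own column of a data frame.

```r
# Hypothetical data frame (names and values are illustrative only):
# one row per learner, one column per variable
guitar <- data.frame(
  participant    = 1:5,
  practice_hours = c(1, 2, 3, 4, 5),
  performance    = c(2, 4, 5, 4, 5)
)

# Inspect the structure: every variable is a separate column
str(guitar)

# Columns can then be passed directly to the correlation functions
r <- cor(guitar$practice_hours, guitar$performance)
print(r)
```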

6.5 Bivariate Correlation

6.5.1 Different Types of Correlation

  • Bivariate correlation: Relationship between two variables.

  • Partial correlation: Relationship between two variables while controlling for the effect of one or more additional variables.
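
To make the distinction concrete, here is a sketch in base R on hypothetical data with one control variable (`confidence`, an invented name). It computes the first-order partial correlation from the three bivariate correlations, then checks it by correlating the residuals left after regressing each variable on the control.

```r
# Hypothetical toy data (not from the chapter)
practice   <- c(1, 2, 3, 4, 5, 6, 7, 8)
score      <- c(2, 4, 5, 4, 5, 7, 8, 7)
confidence <- c(3, 2, 4, 5, 4, 6, 5, 7)   # variable to control for

# Bivariate correlations
r_xy <- cor(practice, score)
r_xz <- cor(practice, confidence)
r_yz <- cor(score, confidence)

# First-order partial correlation of practice and score, controlling for confidence
partial_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Equivalent check: correlate what is left of each variable
# after removing the part predicted by the control variable
res_practice  <- resid(lm(practice ~ confidence))
res_score     <- resid(lm(score ~ confidence))
partial_resid <- cor(res_practice, res_score)

print(c(bivariate = r_xy, partial = partial_formula, via_residuals = partial_resid))
```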

6.5.2 Packages for Correlation Analysis in R

  • Required packages for correlation analysis: Hmisc, ggplot2, etc.

6.5.3 Conducting Correlation in R

  • Running correlation tests using the base R functions cor() and cor.test().
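
A short sketch of both functions on a hypothetical data frame: `cor()` returns only the coefficient (or a matrix of coefficients), while `cor.test()` returns the coefficient together with a test statistic, p-value and confidence interval.

```r
# Hypothetical data frame (names and values are illustrative only)
guitar <- data.frame(
  practice_hours = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  performance    = c(2, 4, 5, 4, 5, 7, 8, 7, 9, 10)
)

# cor(): the coefficient only
r <- cor(guitar$practice_hours, guitar$performance, method = "pearson")

# cor() on a whole data frame gives a correlation matrix
r_matrix <- cor(guitar)

# cor.test(): coefficient plus significance test and confidence interval
result <- cor.test(guitar$practice_hours, guitar$performance,
                   method = "pearson", conf.level = 0.95)

print(r_matrix)
print(result)

# Individual pieces of the result can be extracted by name
estimate <- unname(result$estimate)   # Pearson's r
p_value  <- result$p.value            # two-sided p-value
ci       <- result$conf.int           # 95% confidence interval for r
```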

6.6 Interpretation of Results

  • Provide examples of interpreting outputs from statistical analyses in R.

6.7 Conclusion

  • Summary of the importance of understanding correlation in statistical analysis.