Correlation Coefficients: Appropriate Use and Interpretation Notes

Overview of Correlation Analysis

Introduction to Correlation

  • Researchers frequently aim to study associations between two observed variables and estimate the strength of this relationship.

    • Example Studies:

    • Nishimura et al. assessed the relationship between volume of infused crystalloid fluid and interstitial fluid leakage during surgery.

    • Kim et al. studied the association between opioid growth factor receptor (OGFR) expression and cell proliferation in cancer cells.

    • Research objectives can be quantitatively addressed using correlation analysis, providing insights into both the strength and direction of relationships.

    • Example of Direction: An increase in OGFR expression may correlate with an increase or decrease in cell proliferation.

Focus of Tutorial

  • This tutorial centers around the two most widely used correlation coefficients in medical research:

    • Pearson Correlation Coefficient

    • Spearman Correlation Coefficient

  • Emphasis on proper usage, interpretation, and common misunderstandings associated with these coefficients.

Pearson Product-Moment Correlation

Definition of Correlation

  • Correlation measures the monotonic association between two variables.

  • A monotonic relationship means that as one variable increases:

    • The other variable also increases (positive correlation), or

    • The other variable decreases (negative correlation).

  • This implies that changes in one variable correlate with changes in another variable in either direction.

Understanding Pearson Correlation

  • The Pearson product-moment correlation (abbreviated as r) is typically used for linear relationships between two continuous, random variables.

  • The mathematical description involves covariance:

    • Covariance measures how two variables vary together (unlike variance, which measures variability of a single variable).

    • Covariance is influenced by the measurement scale, complicating interpretation.

  • The Pearson correlation coefficient is dimensionless, ranging from -1 to +1:

    • r = 0: No linear relationship.

    • r = +1: Perfect positive correlation; all points lie on a straight line.

    • r = -1: Perfect negative correlation; all points lie on a straight line.

  • Example Illustration:

    • Figure 1 shows scatterplots of simulated data illustrating varying Pearson correlation coefficients.

    • As the absolute value of r approaches 1 (in either direction), the data points align more closely to a straight line, indicating a stronger correlation.
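The link between covariance and r can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the data values and the helper names (sample_covariance, pearson_r) are made up for this example. It shows that rescaling a variable changes the covariance but leaves r unchanged.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_covariance(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the variance of x
    sy = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sx * sy)

# Hypothetical measurements, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Changing the unit of x (multiplying by 1000) scales the covariance
# by the same factor, but r is dimensionless and stays the same.
x_scaled = [1000 * v for v in x]

cov_original = sample_covariance(x, y)
cov_scaled = sample_covariance(x_scaled, y)
r_original = pearson_r(x, y)
r_scaled = pearson_r(x_scaled, y)

print("covariance:", cov_original, "vs", cov_scaled)
print("pearson r: ", r_original, "vs", r_scaled)
```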

Assumptions of Pearson Correlation

  • Key assumptions necessary for valid inference about strength of association:

    1. Data must be derived from a random sample suitable for the population of interest.

    2. Both variables are continuous and jointly normally distributed (bivariate normal distribution).

    • Both are typically assessed via diagnostic methods for normality.

    • The relationship between the two variables must be linear.

    3. No relevant outliers present; extreme outliers may distort results.

    4. Each x-y value pair is measured independently; repeated measures complicate interpretation.
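The effect of a single extreme outlier can be illustrated with a small sketch. The data below are hypothetical and chosen only to make the point: six points with essentially no linear trend, then the same points plus one extreme pair.

```python
import math

def pearson_r(x, y):
    """Pearson r from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with essentially no linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 2, 5, 2, 4]
r_without_outlier = pearson_r(x, y)

# The same data plus a single extreme pair far from the rest.
x_with_outlier = x + [30]
y_with_outlier = y + [30]
r_with_outlier = pearson_r(x_with_outlier, y_with_outlier)

print("r without outlier:", round(r_without_outlier, 3))
print("r with outlier:   ", round(r_with_outlier, 3))
```

One added point is enough to move r from close to 0 to close to 1 here, which is why a scatter plot should be inspected before the coefficient is interpreted.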

Dealing with Assumptions Violations

  • Possible adjustments include:

    • Transforming variables so that they approximate a normal distribution.

    • Using Spearman correlation for non-normally distributed or ordinal data.

Interpretation of Correlation Coefficient

  • Approaches to interpreting the coefficient:

    • Descriptors are often assigned using cut-offs for the absolute value of r, e.g.:

    • < 0.10: Negligible correlation

    • 0.10 – 0.39: Weak correlation

    • 0.40 – 0.69: Moderate correlation

    • 0.70 – 0.89: Strong correlation

    • 0.90 – 1.00: Very strong correlation

  • These cutoffs are arbitrary and should be contextualized within the specific research question being investigated.

  • The range of values assessed matters: the same relationship may yield a higher correlation when the variables are observed over a wider range.

  • Report the observed coefficient with a confidence interval, which gives the range of plausible values in the population and conveys the uncertainty of the estimate; for example, Nishimura et al. reported an r of 0.42 with a 95% confidence interval of (0.03, 0.70).
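One common way to obtain an approximate confidence interval for a Pearson coefficient is the Fisher z-transformation, sketched below. These notes do not say which method the cited study used, so the method is an assumption here, and the sample size n = 25 is a hypothetical value chosen for illustration, not a figure taken from the study. The function name pearson_ci is also made up.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson r (Fisher z-transformation).

    r      : observed correlation coefficient
    n      : number of x-y pairs (must be greater than 3)
    z_crit : standard normal quantile, 1.96 for a 95% interval
    """
    z = math.atanh(r)             # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)   # approximate standard error on the z scale
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper

r = 0.42
n = 25  # hypothetical sample size, assumed for illustration only
low, high = pearson_ci(r, n)
print(f"r = {r}, approximate 95% CI: ({low:.2f}, {high:.2f})")

# The interval narrows as the sample size grows.
low_large, high_large = pearson_ci(r, 250)
print(f"with n = 250: ({low_large:.2f}, {high_large:.2f})")
```

A wide interval such as the one quoted above signals that the population correlation could plausibly be anywhere from negligible to strong.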

Statistical Significance

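  • Statistical significance is assessed via hypothesis testing (e.g., a t-test of the null hypothesis that the population correlation is 0).

  • Be wary: statistical significance does not equate to clinical significance. In large datasets, even a very weak correlation can be statistically significant.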
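The dependence of the test on sample size can be seen from the t statistic for the null hypothesis of zero correlation, t = r * sqrt((n - 2) / (1 - r^2)), which is compared against a t distribution with n - 2 degrees of freedom. The sketch below only computes the statistic; the p-value lookup is omitted, and the r and n values are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether the population correlation is 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))

weak_r = 0.05  # a negligible correlation by the usual descriptors

t_small_sample = t_statistic(weak_r, 30)
t_large_sample = t_statistic(weak_r, 10_000)

# For large n the two-sided 5% critical value is about 1.96.
print("n = 30:     t =", round(t_small_sample, 2))
print("n = 10000:  t =", round(t_large_sample, 2))
```

The same negligible r falls far short of the critical value in the small sample and clearly exceeds it in the large one, although the strength of the association is identical.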

Coefficient of Determination

  • The squared correlation coefficient (R²), termed the coefficient of determination, indicates the proportion of variance in one variable that is accounted for by the other.

  • Example: Assuming r = 0.42 from Nishimura et al., then R² = 0.1764 or ~18%. This suggests that 18% of variability in fluid leakage is explained by fluid volume, implying other factors also significantly influence it.

Linear Regression vs Pearson Correlation

Distinct Purposes

  • Pearson correlation and linear regression serve different research objectives despite their mathematical relationship:

    • Pearson Correlation: Measures the strength of a linear relationship but does not estimate one variable from the other.

    • Linear Regression: Estimates y values from x values, using an independent variable (x) to predict a dependent variable (y).

  • The choice depends on context: correlation suits studies in which both variables are observed, whereas regression is typical of experimental setups in which the x values are set by the investigator.
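The mathematical relationship between the two methods can be made concrete: in simple linear regression the least-squares slope equals r multiplied by the ratio of the standard deviations of y and x. The sketch below uses hypothetical data and made-up helper names to show that the slope supports prediction, while r only summarizes strength.

```python
import math

def sums(x, y):
    """Return the means plus the sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy

def pearson_r(x, y):
    _, _, sxy, sxx, syy = sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)

# Ratio of standard deviations (the n - 1 denominators cancel).
_, _, _, sxx, syy = sums(x, y)
sd_ratio = math.sqrt(syy / sxx)

print("r:", round(r, 4))
print("slope:", round(slope, 4), "equals r * sd_y / sd_x:", round(r * sd_ratio, 4))

# Regression gives an estimate of y for a new x; r alone does not.
predicted_y = intercept + slope * 6.0
print("predicted y at x = 6:", round(predicted_y, 2))
```

Note that r is symmetric in x and y, whereas regressing x on y generally gives a different line than regressing y on x.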

Spearman Rank Correlation

Overview

  • Spearman rank correlation (denoted as ρ or rs) is based on ranks rather than actual values of observations, allowing measurement of monotonic relationships without the requirement for normal distribution.

  • Useful for ordinal or non-normally distributed continuous data.

  • Also robust against outliers because it uses ranks; it spans the same range as Pearson (-1 to +1) and is interpreted similarly.
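Because Spearman correlation is the Pearson correlation of the ranks, it can be sketched by ranking both variables (tied values share the average of their ranks) and reusing the Pearson formula. The data and helper names below are made up for illustration; the example uses a relationship that is perfectly monotonic but not linear.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end (j) of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows exponentially with x (monotonic, not linear).
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print("Spearman rho:", round(rho, 3))
print("Pearson r:   ", round(r, 3))
```

Spearman reaches 1 because the ordering of the points is perfectly preserved, while Pearson stays below 1 because the points do not lie on a straight line.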

Limitations and Potential Misinterpretations

  • Observed correlations do not confirm causation; correlation implies only an association, not that one variable influences the other.

  • Misinterpretation arises when a correlation is read as a simple, direct link between two variables although it reflects a more complex, multidimensional relationship; e.g., ice cream sales and fan sales may rise together because both depend on a third factor such as hot weather.
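The ice cream and fan example can be simulated to show how two variables with no direct link can still correlate strongly. The sketch below assumes that temperature drives both sales figures; that confounder, the coefficients, and the noise levels are all assumptions made for illustration and are not given in these notes.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the simulation is reproducible

# Assumed confounder: daily temperature in degrees Celsius.
temperature = [random.uniform(10, 35) for _ in range(200)]

# Each sales figure depends on temperature plus independent noise.
# Neither variable appears in the formula for the other.
ice_cream_sales = [5 * t + random.gauss(0, 10) for t in temperature]
fan_sales = [2 * t + random.gauss(0, 5) for t in temperature]

r_sales = pearson_r(ice_cream_sales, fan_sales)
print("r between ice cream and fan sales:", round(r_sales, 2))
```

The strong correlation comes entirely from the shared dependence on temperature, so it says nothing about whether one kind of sale influences the other.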

Conclusion

  • Correlation coefficients describe the strength and direction of the association between variables.

  • Clear understanding of modeling implications, method choice (correlation vs regression), and correct interpretation of statistical outputs is critical in medical research contexts.

  • Inspect the data visually (e.g., with scatter plots) as a preliminary step before analysis.