Correlation Coefficients: Appropriate Use and Interpretation Notes
Overview of Correlation Analysis
Introduction to Correlation
Researchers frequently aim to study associations between two observed variables and estimate the strength of this relationship.
Example Studies:
Nishimura et al. assessed the relationship between volume of infused crystalloid fluid and interstitial fluid leakage during surgery.
Kim et al. studied the association between opioid growth factor receptor (OGFR) expression and cell proliferation in cancer cells.
Research objectives can be quantitatively addressed using correlation analysis, providing insights into both the strength and direction of relationships.
Example of Direction: An increase in OGFR expression may correlate with an increase or decrease in cell proliferation.
Focus of Tutorial
This tutorial centers around the two most widely used correlation coefficients in medical research:
Pearson Correlation Coefficient
Spearman Correlation Coefficient
Emphasis on proper usage, interpretation, and common misunderstandings associated with these coefficients.
Pearson Product-Moment Correlation
Definition of Correlation
Correlation measures the monotonic association between two variables.
A monotonic relationship means that as one variable increases:
The other variable also increases (positive correlation), or
The other variable decreases (negative correlation).
In either case the direction of change stays the same across the observed range, although the relationship need not be linear.
Understanding Pearson Correlation
The Pearson product-moment correlation (abbreviated as r) is typically used for linear relationships between two continuous, random variables.
The mathematical description involves covariance:
Covariance measures how two variables vary together (unlike variance, which measures variability of a single variable).
Covariance depends on the units of measurement, which complicates interpretation; dividing it by the product of the two standard deviations removes this dependence and yields r.
The Pearson correlation coefficient is dimensionless, ranging from -1 to +1:
r = 0: No linear relationship.
r = +1: Perfect positive correlation; all points lie on a straight line.
r = -1: Perfect negative correlation; all points lie on a straight line.
Example Illustration:
Figure 1 shows scatterplots of simulated data illustrating varying Pearson correlation coefficients.
As the absolute value of r approaches 1, the data points align more closely to a straight line, indicating a stronger linear correlation.
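The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names pearson_r and simulate, the seed, the sample size, and the population correlation of 0.8 are all assumptions chosen for the example.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r = cov(x, y) / (sd(x) * sd(y)), using sample (n - 1) estimates."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)

def simulate(rho, n, rng):
    """Draw n pairs from a standard bivariate normal with population correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys

rng = random.Random(0)  # fixed seed (hypothetical choice) so the sketch is repeatable
x, y = simulate(0.8, 2000, rng)

r_original = pearson_r(x, y)
# Changing the units of x rescales the covariance but leaves r unchanged.
r_rescaled = pearson_r([1000 * a for a in x], y)
# Points that lie exactly on a rising straight line give r = +1.
r_line = pearson_r([1, 2, 3, 4], [5, 7, 9, 11])

print(r_original, r_rescaled, r_line)
```

The rescaling step is the point of the example: covariance changes with the measurement scale, whereas r does not.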
Assumptions of Pearson Correlation
Key assumptions necessary for valid inference about strength of association:
Data must come from a random sample that is representative of the population of interest.
Both variables are continuous and jointly normally distributed (bivariate normal distribution).
Normality of each variable is typically checked with diagnostic methods (e.g., histograms or normal Q-Q plots).
The relationship between the two variables must be linear.
No relevant outliers present; extreme outliers may distort results.
Each x-y pair must be measured independently of the others; repeated measurements on the same subject complicate interpretation.
Dealing with Assumption Violations
Possible adjustments include:
Transforming variables (e.g., with a log transformation) so that they approximate a normal distribution.
Using Spearman correlation for non-normally distributed or ordinal data.
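As a rough sketch of the transformation option, the example below simulates right-skewed (log-normal) data and compares Pearson r before and after a log transformation. The data, the seed, the variable names, and the choice of a log transform are assumptions for illustration; in practice the appropriate transformation depends on the data.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed, hypothetical choice
rho = 0.8               # correlation on the (normal) log scale, chosen for the example

# Bivariate normal values on the log scale ...
z1 = [rng.gauss(0, 1) for _ in range(5000)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in z1]
# ... exponentiated to give skewed, non-normal measurements.
x_skewed = [math.exp(a) for a in z1]
y_skewed = [math.exp(b) for b in z2]

r_raw = pearson_r(x_skewed, y_skewed)
r_log = pearson_r([math.log(a) for a in x_skewed],
                  [math.log(b) for b in y_skewed])

print(r_raw, r_log)
```

After the log transformation the data are bivariate normal by construction, so r_log estimates the correlation of 0.8 that was built into the simulation.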
Interpretation of Correlation Coefficient
Approaches to interpreting the coefficient:
Conventional cut-offs for describing the absolute value of r, e.g.:
< 0.10: Negligible correlation
0.10 – 0.39: Weak correlation
0.40 – 0.69: Moderate correlation
0.70 – 0.89: Strong correlation
0.90 – 1.00: Very strong correlation
These cutoffs are arbitrary and should be contextualized within the specific research question being investigated.
The range of values observed matters: the same underlying relationship tends to yield a higher coefficient when the variables span a wider range.
The observed coefficient should be accompanied by a confidence interval, which gives the range of plausible values in the population and conveys the uncertainty of the estimate; for example, Nishimura et al. reported an r of 0.42 with a 95% confidence interval of (0.03, 0.70).
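One common way to obtain such an interval is the Fisher z-transformation, sketched below. These notes do not state which method or sample size Nishimura et al. used, so the function name pearson_ci and the sample size n = 25 are assumptions made purely for illustration.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation.

    z_crit = 1.96 is the two-sided 95% critical value of the standard normal.
    """
    z = math.atanh(r)              # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)      # approximate standard error on the z scale
    lower_z = z - z_crit * se
    upper_z = z + z_crit * se
    return math.tanh(lower_z), math.tanh(upper_z)  # back-transform to the r scale

# r from the example above; n = 25 is a hypothetical sample size.
lower, upper = pearson_ci(0.42, 25)

# The same r with a larger hypothetical sample gives a narrower interval.
lower_big, upper_big = pearson_ci(0.42, 400)

print(round(lower, 2), round(upper, 2))
print(round(lower_big, 2), round(upper_big, 2))
```

Under these assumptions the interval comes out close to the reported (0.03, 0.70). The wide interval shows how uncertain an estimate from a small sample can be.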
Statistical Significance
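Statistical significance is assessed with a hypothesis test, e.g., a t-test of the null hypothesis that the population correlation is 0.
Be wary: statistical significance does not equate to clinical significance. In large datasets, even a negligible correlation can be statistically significant.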
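The usual test statistic for the null hypothesis of zero correlation is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below evaluates it for two hypothetical scenarios; the function name t_statistic and both sample sizes are assumptions for illustration.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# A moderate correlation in a small hypothetical sample.
t_small_sample = t_statistic(0.42, 25)

# A negligible correlation in a very large hypothetical sample.
t_large_sample = t_statistic(0.05, 10000)

print(t_small_sample, t_large_sample)
```

The negligible correlation of 0.05 produces the larger t statistic simply because n is large, which is why a small p-value alone says little about clinical relevance.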
Coefficient of Determination
The squared correlation coefficient (r², often written R²), termed the coefficient of determination, is the proportion of the variance in one variable that is accounted for by the other.
Example: Taking r = 0.42 from Nishimura et al., R² = 0.1764, or about 18%. This suggests that about 18% of the variability in fluid leakage is explained by fluid volume, leaving about 82% attributable to other factors.
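The interpretation of the squared coefficient as explained variance can be checked numerically. The sketch below fits a least-squares line to simulated data and compares r squared with the fraction of variance the line explains; the data, seed, and variable names are hypothetical.

```python
import math
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * a + rng.gauss(0, 2) for a in x]  # hypothetical linear trend plus noise

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total sum of squares of y
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * syy)

# Least-squares line of y on x and its residual sum of squares.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Fraction of the variance in y accounted for by the fitted line.
explained_fraction = 1 - ss_residual / syy

# The arithmetic from the example above: 0.42 squared.
r_squared_reported = 0.42 ** 2

print(r ** 2, explained_fraction, r_squared_reported)
```

The first two printed values agree up to floating-point error, which is the sense in which r squared measures explained variance.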
Linear Regression vs Pearson Correlation
Distinct Purposes
Pearson correlation and linear regression serve different research objectives despite their mathematical relationship:
Pearson Correlation: Quantifies the strength and direction of a linear relationship but does not provide an equation for estimating one variable from the other.
Linear Regression: Estimates y values from x values, using an independent variable (x) to predict a dependent variable (y).
The choice depends on context: correlation suits observational studies in which both variables are random, whereas regression is typically used in experimental setups in which the x values are set by the investigator.
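The mathematical link and the practical difference between the two methods can be sketched as follows. The simulated data and all names (pearson_r, predict_y, slope_y_on_x, and so on) are hypothetical and serve only to illustrate the point.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)  # fixed seed, hypothetical choice
x = [rng.gauss(50, 10) for _ in range(300)]
y = [20 + 0.8 * a + rng.gauss(0, 8) for a in x]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sd_x, sd_y = math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1))

# Correlation is symmetric: swapping the variables gives the same value.
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# Regression is not symmetric: y on x and x on y have different slopes.
slope_y_on_x = sxy / sxx
intercept_y_on_x = mean_y - slope_y_on_x * mean_x
slope_x_on_y = sxy / syy

def predict_y(new_x):
    """Estimate y for a given x from the fitted line (regression only)."""
    return intercept_y_on_x + slope_y_on_x * new_x

print(r_xy, slope_y_on_x, slope_x_on_y, predict_y(60))
```

The slope of y on x equals r multiplied by sd_y / sd_x, so the two methods share their ingredients, but only the regression line yields an estimate of y for a given x.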
Spearman Rank Correlation
Overview
Spearman rank correlation (denoted as ρ or rs) is based on ranks rather than actual values of observations, allowing measurement of monotonic relationships without the requirement for normal distribution.
Useful for ordinal or non-normally distributed continuous data.
Also relatively robust against outliers because it uses ranks; it ranges from -1 to +1, like the Pearson coefficient, and is interpreted in a similar way.
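Spearman's coefficient can be computed as the Pearson correlation of the ranks, which the sketch below does directly. The helper names (average_ranks, spearman_rho) and the small data sets are hypothetical; tied values receive the average of their ranks.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but non-linear relationship: y grows exponentially with x.
x = [1, 2, 3, 4, 5, 6]
y_exp = [math.exp(v) for v in x]
rho_exp = spearman_rho(x, y_exp)
r_exp = pearson_r(x, y_exp)

# Data with one extreme value: its rank is only 6, whatever its size.
y_outlier = [2, 1, 4, 3, 6, 100]
rho_outlier = spearman_rho(x, y_outlier)

print(rho_exp, r_exp, rho_outlier)
```

For the exponential data the ranks agree perfectly, so Spearman's coefficient is 1 while Pearson's r is below 1. Replacing the value 100 with any other number above 6 leaves rho_outlier unchanged.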
Limitations and Potential Misinterpretations
Observed correlations do not confirm causation; a correlation indicates only an association, not that one variable influences the other.
Two variables may also be correlated because both depend on a third factor rather than on each other; e.g., ice cream sales and fan sales may be correlated because both tend to rise in hot weather.
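A small simulation can make the ice cream and fan example concrete. The sketch assumes that temperature drives both kinds of sales and that the two have no direct effect on each other; this mechanism, the numbers, and all variable names are invented for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return [b - (mean_y + slope * (a - mean_x)) for a, b in zip(x, y)]

rng = random.Random(4)  # fixed seed, hypothetical choice

# Assumed mechanism: daily temperature drives both sales figures independently.
temperature = [rng.gauss(25, 5) for _ in range(2000)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
fan_sales = [3 * t + rng.gauss(0, 6) for t in temperature]

# The raw correlation is strong although neither variable affects the other.
r_raw = pearson_r(ice_cream_sales, fan_sales)

# After removing the linear effect of temperature from both, little is left.
r_adjusted = pearson_r(residuals(ice_cream_sales, temperature),
                       residuals(fan_sales, temperature))

print(r_raw, r_adjusted)
```

The strong raw correlation nearly disappears once temperature is accounted for, which illustrates why an observed association should not be read as one variable influencing the other.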
Conclusion
Correlation coefficients quantify the strength and direction of the association between two variables.
Clear understanding of modeling implications, method choice (correlation vs regression), and correct interpretation of statistical outputs is critical in medical research contexts.
Inspect the data visually (e.g., with scatter plots) before running the analysis.