Correlation Coefficients: Appropriate Use and Interpretation Notes

This tutorial focuses on the two most widely used correlation coefficients in medical research:
- Pearson Correlation Coefficient
- Spearman Correlation Coefficient
The emphasis is on proper usage, interpretation, and common misunderstandings associated with these coefficients.

Correlation measures the monotonic association between two variables.
A monotonic relationship means that as one variable increases:
- The other variable also increases (positive correlation), or
- The other variable decreases (negative correlation).
In other words, the two variables tend to change together, either in the same direction or in opposite directions.

The Pearson product-moment correlation (abbreviated as r) is typically used to quantify the linear relationship between two continuous random variables.
The mathematical description involves covariance:
- Covariance measures how two variables vary together (unlike variance, which measures variability of a single variable).
- Covariance is influenced by the measurement scale, complicating interpretation.
The Pearson correlation coefficient is dimensionless, ranging from -1 to +1:
- r = 0: No linear relationship.
- r = +1: Perfect positive correlation; all points lie on a straight line.
- r = -1: Perfect negative correlation; all points lie on a straight line.
Example Illustration:
- Figure 1 shows scatterplots of simulated data illustrating varying Pearson correlation coefficients.
- As the absolute value of r approaches 1 (in either direction), the data points align more closely to a straight line, indicating a stronger correlation.
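
The relationship between covariance and r described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the height and weight values are invented for the example, and the function names `covariance` and `pearson_r` are hypothetical helpers. It shows that changing the measurement unit rescales the covariance but leaves r unchanged.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical measurements, invented for illustration only.
height_m = [1.50, 1.60, 1.65, 1.70, 1.80, 1.85]
weight_kg = [52.0, 58.0, 63.0, 61.0, 75.0, 80.0]
height_cm = [h * 100 for h in height_m]  # same data, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

print(f"covariance (height in m):  {cov_m:.3f}")
print(f"covariance (height in cm): {cov_cm:.3f}")  # scales with the unit
print(f"r (height in m):  {r_m:.4f}")
print(f"r (height in cm): {r_cm:.4f}")             # unaffected by the unit
```

Dividing by both standard deviations is what removes the units, which is why r is easier to interpret than the raw covariance.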

When the data are not normally distributed, possible adjustments include:
- Transforming variables to achieve a normal distribution.
- Using Spearman correlation for non-normally distributed or ordinal data.

Approaches to interpreting the coefficient include descriptive cut-offs for its absolute value, e.g.:
- <0.1: Negligible correlation
- 0.1 – 0.39: Weak correlation
- 0.40 – 0.69: Moderate correlation
- 0.70 – 0.89: Strong correlation
- 0.90 – 1.00: Very strong correlation
These cutoffs are arbitrary and should be contextualized within the specific research question being investigated.
The range of values assessed also matters: a wider range of observations tends to yield a higher correlation coefficient than a narrow one.
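
The effect of the observed range can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated (y equals x plus random noise), the seed and the cut-off range of 4 to 6 are arbitrary choices, and `pearson_r` is a hypothetical helper. The same underlying relationship gives a lower r when only a narrow slice of x is observed.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated data (hypothetical): y depends linearly on x, plus noise.
x_full = [random.uniform(0, 10) for _ in range(2000)]
y_full = [xi + random.gauss(0, 1) for xi in x_full]

# Keep only the observations whose x value falls in a narrow range.
narrow = [(xi, yi) for xi, yi in zip(x_full, y_full) if 4 <= xi <= 6]
x_narrow = [p[0] for p in narrow]
y_narrow = [p[1] for p in narrow]

r_full = pearson_r(x_full, y_full)
r_narrow = pearson_r(x_narrow, y_narrow)

print(f"r over the full range (0 to 10): {r_full:.2f}")
print(f"r over the narrow range (4 to 6): {r_narrow:.2f}")
```

The noise is identical in both cases; only the spread of x differs, so the drop in r reflects the restricted range rather than a weaker relationship.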
The observed coefficient should be accompanied by a confidence interval, which gives the range of plausible values in the population and so conveys the uncertainty of the estimate; for example, Nishimura et al. reported an r of 0.42 with a 95% confidence interval of (0.03, 0.70).
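
One common way to obtain such an interval is the Fisher z-transformation, sketched below. This is an approximation that assumes bivariate normal data, and it is not necessarily the method used in the cited study. The function name `pearson_ci` is hypothetical, and the sample sizes (25 and 250) are illustrative assumptions, not values taken from the source.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson r via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                              # transform r to the z scale
    se = 1 / math.sqrt(n - 3)                      # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical sample sizes, chosen only to show the effect of n.
lo, hi = pearson_ci(0.42, n=25)
lo_big, hi_big = pearson_ci(0.42, n=250)

print(f"n = 25:  ({lo:.2f}, {hi:.2f})")
print(f"n = 250: ({lo_big:.2f}, {hi_big:.2f})")
```

With the same observed r, the smaller sample gives a much wider interval, which is why a point estimate alone says little about the population correlation.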

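Statistical significance is assessed with a hypothesis test (e.g., a t-test of the null hypothesis that the population correlation is 0).
Be wary: statistical significance does not equate to clinical significance, particularly in large datasets, where even very weak correlations can be statistically significant.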
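
The influence of sample size on significance can be sketched with the usual test statistic, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. In the sketch below the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large n. The function names and the values of r and n are hypothetical.

```python
import math
from statistics import NormalDist

def t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; reasonable only for large n."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

r = 0.05  # a negligible correlation by the descriptive cut-offs
for n in (100, 1_000, 10_000):
    t = t_statistic(r, n)
    p = approx_two_sided_p(t)
    print(f"n = {n:>6}: t = {t:5.2f}, approximate p = {p:.4f}")
```

The coefficient is the same in every row; only the sample size changes, yet the largest sample makes this negligible correlation statistically significant.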

The squared correlation coefficient (R²), termed the coefficient of determination, indicates the proportion of variance in one variable that is accounted for by the other.
Example: Assuming r = 0.42 from Nishimura et al., then R² = 0.1764 or ~18%. This suggests that 18% of the variability in fluid leakage is explained by fluid volume, implying that other factors account for most of the variability.

Pearson correlation and linear regression serve different research objectives despite their mathematical relationship:
- Pearson Correlation: Measures the strength of a linear relationship but does not allow one variable to be estimated from the other.
- Linear Regression: Estimates y values from x values, using an independent variable (x) to predict a dependent variable (y).
The context of use matters: correlation suits observational studies, whereas regression is typically used in experimental setups where the x values are set by the investigator.
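
The mathematical relationship between the two methods can be sketched as follows. The data are invented for illustration, and `pearson_r` and `fit_line` are hypothetical helper names. The sketch checks two known identities: the least-squares slope equals r multiplied by the ratio of the standard deviations (sd of y over sd of x), and r is symmetric in x and y while the regression line is not.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares line predicting y from x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical paired observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 3.8, 5.5, 5.9]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                 # correlation is symmetric
slope, intercept = fit_line(x, y)      # regression of y on x
slope_rev, _ = fit_line(y, x)          # regression of x on y is a different line
predicted = intercept + slope * 3.5    # regression gives an estimate; r alone does not

print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
print(f"slope of y on x = {slope:.3f}, r * sd(y) / sd(x) = {r_xy * stdev(y) / stdev(x):.3f}")
print(f"slope of x on y = {slope_rev:.3f}")
print(f"predicted y at x = 3.5: {predicted:.2f}")
```

Swapping x and y leaves r unchanged but produces a different regression line, which reflects the different questions the two methods answer.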

Spearman rank correlation (denoted as ρ or rs) is based on ranks rather than actual values of observations, allowing measurement of monotonic relationships without the requirement for normal distribution.
It is useful for ordinal or non-normally distributed continuous data.
It is also robust against outliers because it uses ranks, and it takes values in the same range as Pearson (-1 to +1), although it describes monotonic rather than strictly linear association.
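
The rank-based idea can be sketched by ranking each variable and applying the Pearson formula to the ranks. This is a minimal illustration with invented data; `ranks`, `spearman_rho`, and `pearson_r` are hypothetical helper names, and ties are handled by assigning average ranks.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_curved = [math.exp(v) for v in x]                           # monotonic, not linear
y_outlier = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 160.0]      # last value is an outlier

print(f"curved:  Pearson = {pearson_r(x, y_curved):.2f}, Spearman = {spearman_rho(x, y_curved):.2f}")
print(f"outlier: Pearson = {pearson_r(x, y_outlier):.2f}, Spearman = {spearman_rho(x, y_outlier):.2f}")
```

In both cases the ordering of the observations is perfectly preserved, so Spearman stays at 1 while Pearson is pulled down by the curvature or by the single extreme value.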

Observed correlations do not confirm causation; a correlation indicates only an association, not that one variable influences the other.
The misconception arises when a simple association is read as a direct causal link, although both variables may depend on other factors within a more complex, multidimensional relationship, e.g., ice cream sales vs fan sales.

Correlation coefficients describe the strength and direction of the association between two variables.
Clear understanding of modeling implications, method choice (correlation vs regression), and correct interpretation of statistical outputs is critical in medical research contexts.
Employ visual assessments (e.g., scatter plots) as a preliminary review before analysis.