Notes on the Correlation Coefficient and Its Properties
Pearson correlation coefficient (r)
Measures the strength and direction of a linear relationship between two variables (X and Y).
Scales to [-1, +1]; values near ±1 indicate strong linear relationships, near 0 indicate weak or no linear relationship.
If all points lie exactly on a straight line, r = +1.0 when the slope is positive and r = -1.0 when it is negative; if there is no linear relationship, r ≈ 0.0.
The statistic is unitless. Under linear transformations of the data (X* = a + bX, Y* = c + dY with b ≠ 0, d ≠ 0), r is unchanged when b and d have the same sign; when they have opposite signs, the magnitude is unchanged but the sign of r reverses.
Symmetric: \(r_{XY} = r_{YX}\).
Undefined when the denominator is zero, i.e., when X or Y is constant (all points fall on a horizontal or vertical line).
Causation caveat: correlation does not imply causation; r captures association, not causal direction (especially in time-dependent or feedback processes).
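The properties listed above can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper name `pearson_r` and the five data pairs are made up for illustration, and returning `None` for a zero denominator is just one possible convention.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation scores; None when X or Y is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # zero denominator: r is undefined
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                      # symmetry
r_same = pearson_r([10 + 2 * v for v in x],      # b = 2, d = 0.5 (same sign)
                   [-3 + 0.5 * v for v in y])
r_flip = pearson_r([10 - 2 * v for v in x], y)   # b = -2, d = 1 (opposite signs)
r_const = pearson_r(x, [7] * len(x))             # constant Y

print(r, r_swapped, r_same, r_flip, r_const)
```

For these numbers r is about 0.8, the same-sign rescaling leaves it unchanged, the opposite-sign rescaling gives about -0.8, and the constant-Y case is undefined.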
Formulas
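Conceptual (covariation) formula, in terms of deviations from the means:
\[ r = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \, \sum (Y_i - \bar Y)^2}} \]
Computational (covariance-based) formula, in terms of raw sums over the n pairs:
\[ r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ n \sum X_i^2 - (\sum X_i)^2 \right] \left[ n \sum Y_i^2 - (\sum Y_i)^2 \right]}} \]
Note: these two forms are algebraically equivalent.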
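One way to see the equivalence of the deviation-score form and the raw-sum form is to expand the deviation sums. A brief sketch, writing n for the number of (X, Y) pairs and using \(\bar X = \sum X_i / n\), \(\bar Y = \sum Y_i / n\):

```latex
\begin{aligned}
\sum (X_i - \bar X)(Y_i - \bar Y) &= \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} \\
\sum (X_i - \bar X)^2 &= \sum X_i^2 - \frac{(\sum X_i)^2}{n} \\
\sum (Y_i - \bar Y)^2 &= \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}
\end{aligned}
```

Substituting these into the ratio of the cross-product sum to the square root of the product of the two squared-deviation sums, then multiplying numerator and denominator by n, leaves an expression that uses only n and the five raw sums.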
Key properties of r
Linearity vs nonlinearity: r measures linear association; non-linear patterns can yield low |r| even if a strong relationship exists.
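Invariance under linear scale changes: the magnitude of r is unchanged by linear re-scaling of X and/or Y; the sign reverses only if exactly one variable is multiplied by a negative constant.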
Outliers: magnitude of r is sensitive to outliers; can inflate or deflate the observed relationship.
Range restriction: narrowing the range of X or Y often reduces the observed r.
Levels of analysis: r depends on the unit of analysis (individuals vs groups) and can change with aggregation; inferring individual-level relationships from group-level correlations is the ecological fallacy.
Interpreting magnitude is context-dependent: small r can be meaningful in some contexts; very large r may still be insufficient for strong predictive utility depending on reliability and costs.
r^2 interpretation: proportion of variance in Y explained by X in the linear model.
In population terms: r^2 is the fraction of variance in Y accounted for by the linear relationship with X.
In sample terms: reflects fit of the sample regression line; use with caution.
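The variance-explained reading of r^2 can be checked against a least-squares line. The sketch below uses made-up data and a hypothetical helper name, and compares r^2 with 1 - SSE/SST from the fitted sample regression of Y on X.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SSE/SST) for the least-squares line of Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # SST: total variation in Y
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    # SSE: variation in Y left over after the linear fit
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - sse / syy

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r2, explained = r_squared_two_ways(x, y)
print(r2, explained)
```

Both values agree (about 0.64 here), which is the sample-level statement; as noted above, it describes the fit in this sample and should be generalized with caution.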
Examples and context
Example (air-traffic controller data): r ≈ 0.75 indicates a strong positive linear relation between initial test score (X) and post-training performance (Y).
Outliers can drastically change r (e.g., from 0.14 to 0.45 with one extreme point); consider analyses with and without outliers.
Range restriction example: selecting on X can reduce observed r between X and Y because of reduced X variance.
Levels of analysis example: correlations can be high between branch averages but near zero within branches; aggregation changes the observed r, so group-level results should not be read as individual-level results (ecological fallacy).
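The three effects in these examples (outliers, range restriction, aggregation) are easy to reproduce on toy data. The sketch below is illustrative only: all numbers are invented, they are not the data behind the figures quoted above, and the branch names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation scores (assumes neither variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 1. Outlier: one extreme point added to a weakly related set.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 2]
r_without = pearson_r(x, y)
r_with = pearson_r(x + [20], y + [20])

# 2. Range restriction: keep only cases with X >= 7.
x_full = list(range(1, 11))
y_full = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]
r_full = pearson_r(x_full, y_full)
kept = [(a, b) for a, b in zip(x_full, y_full) if a >= 7]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

# 3. Levels of analysis: within-branch r versus r between branch means.
branches = {
    "A": ([1, 2, 3], [1, 4, 1]),
    "B": ([4, 5, 6], [4, 7, 4]),
    "C": ([7, 8, 9], [7, 10, 7]),
}
r_within = {name: pearson_r(bx, by) for name, (bx, by) in branches.items()}
mean_x = [sum(bx) / len(bx) for bx, _ in branches.values()]
mean_y = [sum(by) / len(by) for _, by in branches.values()]
r_between = pearson_r(mean_x, mean_y)

print(r_without, r_with)
print(r_full, r_restricted)
print(r_within, r_between)
```

In this toy set the single added point raises r sharply, selecting on X lowers it, and the branch means correlate perfectly even though r is zero inside every branch.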
Interpreting the size of r
Context matters. A tiny r (e.g., r = 0.01) can be meaningful in some scenarios (e.g., survival or high-stakes decisions) and trivial in others.
Even large correlations (e.g., r ≈ 0.90) may be inadequate in some contexts, for example as a test-retest reliability, or as a predictive validity once costs and benefits are weighed.
Practical interpretation often involves r^2 or utility considerations rather than r alone.
r^2 and alternative interpretations
r^2: proportion of variance in Y associated with variance in X; commonly reported in regression contexts.
Alternative interpretation (direct, not squared): in utility analysis, utility is a function of r (not r^2) and other factors; small r can still yield meaningful utility depending on costs, base rates, and other parameters.
Other correlation coefficients (overview)
There are many related indices; Pearson r is the default, and several of the coefficients below are special cases or adaptations of it:
Spearman's rho (rank correlation)
Used when data are ordinal or when outliers distort Pearson r.
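Formula (based on ranks; exact when there are no tied ranks):
\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
\(d_i\) = difference between the ranks of X and Y for pair i; n = number of pairs.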
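A minimal sketch of the rank-difference computation, assuming no tied values (with ties, the usual practice is to assign average ranks and compute Pearson r on the ranks instead). The function names and data are illustrative.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); this sketch rejects ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: the extreme X value of 1000 contributes only its rank (5).
x = [10, 20, 30, 40, 1000]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(rho)
```

Here the rank differences are 0, -1, 1, -1, 1, so the sum of squared differences is 4 and rho is 1 - 24/120, about 0.8.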
Phi coefficient (2x2 contingency data)
For dichotomous variables arranged in a 2x2 table, phi is the Pearson correlation on dichotomous data.
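Formula, with the cell counts of the 2x2 table labeled a, b (first row) and c, d (second row):
\[ \phi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \]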
Maximal phi depends on marginals; cannot always reach 1.0 even with strong association.
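The dependence of the maximum phi on the marginals can be seen with two small tables. This is a sketch under an assumed cell layout (a, b in the first row; c, d in the second); the counts are invented.

```python
import math

def phi(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d); None if a marginal is zero."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None  # an empty row or column makes phi undefined
    return (a * d - b * c) / math.sqrt(denom)

# Equal marginals (50/50 on both variables).
phi_balanced = phi(40, 10, 10, 40)

# Unequal marginals: rows split 20/80, columns split 50/50.
# Cell b is empty, so the association is as strong as these marginals allow.
phi_capped = phi(20, 0, 30, 50)

print(phi_balanced, phi_capped)
```

The second table cannot be made any more strongly associated without changing its row or column totals, yet phi is only about 0.5.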
Point-biserial correlation (continuous X, dichotomous Y)
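When Y is 0/1 and X is continuous:
\[ r_{pb} = \frac{\bar X_1 - \bar X_0}{s_X} \sqrt{pq} \]
\(\bar X_1\) and \(\bar X_0\) are the X means of the Y = 1 and Y = 0 groups; \(s_X\) is the standard deviation of all X scores (computed with n in the denominator); p = n1/n, q = n0/n, where n1 and n0 are group sizes.
Testing \(r_{pb}\) against zero is equivalent to the independent-samples (pooled-variance) t-test of the difference between the two group means.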
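The link to the t-test can be illustrated numerically. The sketch below assumes the mean-difference form of the point-biserial with the n-denominator standard deviation of X, converts r to t with t = r·sqrt(n - 2)/sqrt(1 - r^2), and compares that with a pooled-variance t computed directly. Function names and data are hypothetical.

```python
import math

def split_groups(xs, ys):
    """Split X scores by the 0/1 code in Y."""
    g1 = [x for x, y in zip(xs, ys) if y == 1]
    g0 = [x for x, y in zip(xs, ys) if y == 0]
    return g1, g0

def point_biserial(xs, ys):
    """r_pb = (mean1 - mean0) / s_X * sqrt(p*q), with s_X using n in the denominator."""
    n = len(xs)
    g1, g0 = split_groups(xs, ys)
    p, q = len(g1) / n, len(g0) / n
    mean_all = sum(xs) / n
    s_x = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) / s_x * math.sqrt(p * q)

def pooled_t(xs, ys):
    """Independent-samples t statistic with pooled variance."""
    g1, g0 = split_groups(xs, ys)
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss0 = sum((x - m0) ** 2 for x in g0)
    pooled_var = (ss1 + ss0) / (len(g1) + len(g0) - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / len(g1) + 1 / len(g0)))

# Hypothetical scores: four cases coded 0, four coded 1.
x = [3, 4, 5, 6, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 1, 1, 1]

r_pb = point_biserial(x, y)
t_from_r = r_pb * math.sqrt(len(x) - 2) / math.sqrt(1 - r_pb ** 2)
t_direct = pooled_t(x, y)
print(r_pb, t_from_r, t_direct)
```

The two t values coincide, which is why a significance test on the point-biserial and a two-group t-test lead to the same decision.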
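Biserial correlation (continuous X, artificially dichotomized Y)
When the dichotomous variable Y is an artificial split of an underlying continuous variable, a biserial estimate can recover the underlying r under a normality assumption.
General idea: \(r_{bis} = \frac{\bar X_1 - \bar X_0}{s_X} \times \frac{pq}{\lambda}\), where \(\bar X_1\) and \(\bar X_0\) are the X means in the two groups, p and q are the group proportions, and λ is the height (ordinate) of the standard normal curve at the dichotomy threshold; the estimate therefore depends on the group proportions and on normality.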
Note: biserial r tends to be larger than the corresponding point-biserial r under the normality assumption; sensitive to normality and has larger standard error with unequal group sizes.
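A small sketch of the biserial estimate next to the point-biserial, assuming the standard form in which the standardized mean difference is multiplied by pq divided by the normal ordinate at the threshold. It uses `statistics.NormalDist` from the Python standard library; the function name and data are invented for illustration.

```python
import math
from statistics import NormalDist

def biserial_and_point_biserial(xs, ys):
    """Return (r_bis, r_pb) for continuous xs and a 0/1 split in ys."""
    n = len(xs)
    g1 = [x for x, y in zip(xs, ys) if y == 1]
    g0 = [x for x, y in zip(xs, ys) if y == 0]
    p = len(g1) / n
    q = 1 - p
    mean_all = sum(xs) / n
    s_x = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    std_diff = (sum(g1) / len(g1) - sum(g0) / len(g0)) / s_x
    # Threshold on the assumed latent normal variable: proportion q lies below it.
    z = NormalDist().inv_cdf(q)
    height = NormalDist().pdf(z)  # ordinate of the standard normal at z
    r_bis = std_diff * p * q / height
    r_pb = std_diff * math.sqrt(p * q)
    return r_bis, r_pb

# Hypothetical scores: four cases below the split (0), four above (1).
x = [2, 4, 5, 7, 4, 6, 7, 9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
r_bis, r_pb = biserial_and_point_biserial(x, y)
print(r_bis, r_pb)
```

With these numbers the biserial value (about 0.61) exceeds the point-biserial (about 0.49). Because the estimate leans on the normality assumption, it can even exceed 1 when the data depart strongly from it.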
Other notes
Dichotomization can reduce the observed correlation; when possible, analyze with the original continuous variables or consider corrections (e.g., tetrachoric for paired dichotomies).
Tetrachoric correlation (not detailed here) estimates the correlation between two underlying continuous variables from a 2x2 table but relies on normality and is sensitive to sample size and nonnormality.
Practical cautions and concepts
Causation vs correlation: r by itself cannot establish causation; observed relationships may be time-lagged, bidirectional, or due to a third variable.
Dynamic/feedback models: simple r may miss causal loops or time-dependent effects; advanced methods (e.g., two-stage least squares, LISREL) may be required for such structures.
Nonlinearity: r may understate the strength of a nonlinear relationship; consider scatterplots and nonlinear models when r is small but a clear pattern exists.
Scale transformations: r is invariant to positive linear rescaling; standardizing to z-scores does not change r.
Outliers and robustness: consider robust alternatives or with/without-outliers analyses to assess stability of the relationship.
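The nonlinearity caution can be made concrete with a relationship that is perfectly deterministic but not linear. In this sketch (invented data, hypothetical helper name), Y is exactly X squared over a range symmetric around zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation scores (assumes neither variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # Y is fully determined by X, but not linearly

r_linear = pearson_r(x, y)                       # linear association only
r_after_transform = pearson_r([v ** 2 for v in x], y)  # after modeling the curve

print(r_linear, r_after_transform)
```

The linear correlation is zero even though Y can be predicted from X without error; correlating Y with the squared term recovers the relationship, which is the kind of pattern a scatterplot would reveal.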
Worked example (brief)
Data example: X = test score, Y = performance; a large positive r (e.g., r ≈ 0.75) indicates good predictive potential for screening.
Calculation details (summary):
Compute sums: \(\sum X_i\), \(\sum Y_i\), \(\sum X_i^2\), \(\sum Y_i^2\), \(\sum X_i Y_i\).
Use the computational formula to obtain r from these sums.
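The two steps above can be sketched as follows. The five score pairs are invented for illustration and are not the chapter's data; the function name is hypothetical, and the raw-sum formula used is the standard one built from n and the five sums.

```python
import math

def pearson_r_from_sums(xs, ys):
    """Pearson r from n and the five raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical test scores (X) and performance ratings (Y).
x = [2, 4, 5, 7, 8]
y = [1, 3, 4, 4, 6]

# Sums for these data: 26, 18, 158, 78, 110.
r = pearson_r_from_sums(x, y)
print(round(r, 3))
```

With these sums the numerator is 5·110 - 26·18 = 82 and the denominator is sqrt(114·66), giving r of roughly 0.945.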
Quick references from the chapter
r is the Pearson product-moment correlation coefficient.
Other coefficients discussed include Spearman's rho, phi, point-biserial, and biserial correlations, with notes on when to use each.
Key cautions include: interpretation depends on context, level of analysis matters, and outliers/range restriction can substantially affect r.