Notes on Testing Correlations for Statistical Significance
Testing a Single Correlation Against Zero
Notation: We denote the population correlation coefficient as \rho (rho), the sample correlation coefficient as r, and the sample size as n.
Hypotheses: The null hypothesis (H_0) states that there is no linear relationship in the population, i.e., \rho = 0. The alternative hypothesis (H_1) typically states that there is a linear relationship (\rho \neq 0 for a two-sided test), or a specific direction (\rho > 0 or \rho < 0 for a one-sided test).
Test statistic: This test uses a t-distribution. The formula for the t-statistic is:
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \text{ with df } (n-2)
Decision Rule: The observed t-value is compared to critical values from Student's t-distribution. We reject the null hypothesis (H_0: \rho = 0) if the absolute value of the observed t, |t|, exceeds the critical t-value for the specified significance level (e.g., \alpha = 0.05) and degrees of freedom.
Degrees of freedom (df): The degrees of freedom for this test are n-2. This value reflects the number of independent pieces of information remaining after estimating the two parameters (the intercept and slope) involved in a simple linear regression model, which is implicitly connected to correlation.
Example (SAT validity): Consider a study with n = 122 students, where the sample correlation between SAT scores and first-year college GPA is r = 0.27. Using the formula:
t = \frac{0.27\sqrt{122-2}}{\sqrt{1-0.27^2}} = \frac{0.27\sqrt{120}}{\sqrt{1-0.0729}} = \frac{0.27 \times 10.954}{\sqrt{0.9271}} \approx \frac{2.957}{0.963} \approx 3.07
With df = 120, the critical t-value for a two-sided test at \alpha = 0.05 is approximately 1.980. Since |3.07| > 1.980, we reject H_0, indicating that the correlation is statistically significant (p < 0.05).
Insight: Testing a sample correlation r against a population correlation of zero is entirely analogous to testing the significance of the slope coefficient in a simple linear regression of one variable on the other.
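The t-test above can be sketched in Python with SciPy; the helper name `corr_t_test` is an illustrative assumption, not from these notes:

```python
import math
from scipy import stats

def corr_t_test(r, n):
    """Two-sided t-test of H0: rho = 0 for a sample correlation r from n pairs."""
    df = n - 2                                   # degrees of freedom = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r**2)  # t = r*sqrt(n-2)/sqrt(1-r^2)
    p = 2 * stats.t.sf(abs(t), df)               # two-sided p-value
    return t, df, p

# SAT validity example: r = 0.27, n = 122
t, df, p = corr_t_test(0.27, 122)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")    # t = 3.07, df = 120
```

Using unrounded intermediates reproduces the hand calculation (t ≈ 3.07) and also yields the exact p-value rather than a table comparison.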
Testing a Single Correlation Versus Any Specified Value (Fisher's z)
Purpose: This test compares an observed sample correlation (r) to a specific, non-zero population correlation (\rho_0), i.e., it tests H_0: \rho = \rho_0 where \rho_0 \neq 0 and -1 < \rho_0 < 1. The standard t-test for a correlation is suitable only for testing against \rho = 0.
Fisher’s z transformation: The sampling distribution of r is skewed, increasingly so as \rho approaches +1 or -1. Fisher's z transformation normalizes this skewed distribution, making it suitable for standard hypothesis testing procedures. The transformation is:
z_r = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r) \,, \quad z_{\rho_0} = \frac{1}{2} \ln\left(\frac{1+\rho_0}{1-\rho_0}\right) = \operatorname{artanh}(\rho_0)
where \ln is the natural logarithm and \operatorname{artanh} is the inverse hyperbolic tangent.
Sampling distribution: Under the null hypothesis (H_0: \rho = \rho_0), the sampling distribution of z_r is approximately normal with mean z_{\rho_0} and variance \frac{1}{n-3}. Thus, z_r \sim N\left(z_{\rho_0}, \frac{1}{n-3}\right).
Test statistic: The Z-statistic for comparing r to a specified \rho_0 is given by:
Z = \frac{z_r - z_{\rho_0}}{\sqrt{\frac{1}{n-3}}}
This Z-statistic follows approximately a standard normal distribution (N(0,1)).
Example (cost assessment center): Suppose we have a sample of n = 165 observations with an observed correlation r = 0.54. We want to test whether this correlation differs significantly from a hypothesized population correlation of \rho_0 = 0.40.
First, transform r and \rho_0 to their z equivalents:
z_r = \frac{1}{2} \ln\left(\frac{1+0.54}{1-0.54}\right) = \frac{1}{2} \ln\left(\frac{1.54}{0.46}\right) \approx 0.604
z_{0.40} = \frac{1}{2} \ln\left(\frac{1+0.40}{1-0.40}\right) = \frac{1}{2} \ln\left(\frac{1.40}{0.60}\right) \approx 0.424
Next, calculate the standard error of z_r:
\text{SE}_{z_r} = \sqrt{\frac{1}{n-3}} = \sqrt{\frac{1}{165-3}} = \sqrt{\frac{1}{162}} \approx 0.0786
Finally, calculate the Z-statistic:
Z = \frac{0.604 - 0.424}{0.0786} = \frac{0.180}{0.0786} \approx 2.29
For a one-sided test (e.g., H_1: \rho > 0.40) at \alpha = 0.05, the critical Z-value is 1.645. Since 2.29 > 1.645, the result is significant (one-sided p \approx 0.01). A two-sided test would compare against \pm 1.96, which would also be significant.
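The Fisher-z test against a specified \rho_0 can be sketched as follows; the function name `fisher_z_test` is an assumed label for illustration:

```python
import math
from scipy import stats

def fisher_z_test(r, rho0, n):
    """Z-test of H0: rho = rho0 via Fisher's z transformation."""
    z_r = math.atanh(r)             # Fisher's z of the sample correlation
    z_0 = math.atanh(rho0)          # Fisher's z of the hypothesized value
    se = 1 / math.sqrt(n - 3)       # standard error of z_r
    Z = (z_r - z_0) / se
    p_one_sided = stats.norm.sf(Z)  # one-sided p-value for H1: rho > rho0
    return Z, p_one_sided

# Cost assessment center example: r = 0.54, rho0 = 0.40, n = 165
Z, p = fisher_z_test(0.54, 0.40, 165)
print(f"Z = {Z:.2f}, one-sided p = {p:.3f}")
```

With unrounded intermediates this gives Z ≈ 2.30, matching the hand calculation up to rounding.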
Key facts: These are key characteristics of the Fisher's z transformation:
1. The sampling distribution of z_r is approximately normal, even for moderate sample sizes (n \ge 10 to 20 generally provides a good approximation).
2. The standard deviation of z_r (its standard error) is approximately \sqrt{1/(n-3)}, which highlights that the precision of the estimate increases with sample size.
Note on averaging correlations: The Fisher's z-transformation is crucial for averaging multiple correlation coefficients, as it transforms the skewed distribution of r values into a more symmetrical distribution, reducing bias in the average.
Averaging Correlations (using Fisher's z)
Rationale: When you need to combine correlation coefficients from several independent samples (e.g., in meta-analysis), directly averaging r values is inappropriate due to the skewness of their distribution. Transforming them to z values first, then averaging, and finally transforming back to r provides a more accurate estimate of the overall population correlation.
Procedure: Convert each individual correlation (r_i) to its corresponding Fisher's z value (z_i). Then calculate a weighted average of these z_i values.
Weighted average: The weights (w_i) are typically set to n_i - 3, corresponding to the inverse of the variance of each z_i. The formula for the weighted average \bar{Z} is:
\bar{Z} = \frac{\sum_i (n_i - 3) z_i}{\sum_i (n_i - 3)} \,, \quad \hat{r} = \tanh(\bar{Z})
where \tanh is the hyperbolic tangent function, the inverse of \operatorname{artanh}, used to transform the averaged \bar{Z} back to an r value.
Example (three correlations): Suppose we have three studies with their respective correlations and sample sizes:
Study 1: n_1 = 41, r_1 = 0.792 \Rightarrow z_1 = \operatorname{artanh}(0.792) \approx 1.077
Study 2: n_2 = 34, r_2 = 0.792 \Rightarrow z_2 = \operatorname{artanh}(0.792) \approx 1.077
Study 3: n_3 = 17, r_3 = 0.440 \Rightarrow z_3 = \operatorname{artanh}(0.440) \approx 0.472
Calculate the weighted average of the z values:
\bar{Z} = \frac{(41-3) \times 1.077 + (34-3) \times 1.077 + (17-3) \times 0.472}{(41-3) + (34-3) + (17-3)}
\bar{Z} = \frac{38 \times 1.077 + 31 \times 1.077 + 14 \times 0.472}{38 + 31 + 14} = \frac{40.93 + 33.39 + 6.61}{83} = \frac{80.93}{83} \approx 0.975
Finally, transform back to r:
\hat{r} = \tanh(0.975) \approx 0.751
This \hat{r} is the estimated average correlation across the three studies.
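The averaging procedure can be sketched in a few lines of Python; `average_correlations` is an assumed helper name:

```python
import math

def average_correlations(rs, ns):
    """Sample-size-weighted average of correlations via Fisher's z."""
    zs = [math.atanh(r) for r in rs]   # transform each r_i to z_i
    ws = [n - 3 for n in ns]           # weights w_i = n_i - 3
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)            # back-transform to the r scale

# Three-study example: r = 0.792, 0.792, 0.440 with n = 41, 34, 17
r_avg = average_correlations([0.792, 0.792, 0.440], [41, 34, 17])
print(f"average r = {r_avg:.3f}")      # average r = 0.751
```

Note that averaging on the z scale and back-transforming gives a different (and less biased) result than simply averaging the raw r values.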
Averaging Across Studies (between different samples)
Method: In meta-analysis or validity generalization studies, the goal is often to obtain a single, more precise estimate of a population correlation from multiple studies. This is achieved by combining correlations using Fisher's z transformation, weighting each transformed correlation by the inverse of its variance. Since the variance of z_i is approximately \frac{1}{n_i-3}, the weights used are typically w_i = n_i - 3. This effectively gives more weight to studies with larger sample sizes, which provide more precise estimates.
Result: The z-weighted average, when transformed back to r, provides a sample-size-weighted average correlation. This method is fundamental for reporting combined evidence for relationships across different research settings.
Testing the Equality of Two Independent Correlations
Scenario: This test is used when you have two correlation coefficients (r_1 and r_2) from two independent samples (e.g., correlations measured in two distinct groups of people).
Hypotheses: The null hypothesis (H_0) is that the true population correlations are equal (\rho_1 = \rho_2). The alternative hypothesis (H_1) is that they are not equal (\rho_1 \neq \rho_2 for a two-sided test) or that one is greater than the other (for a one-sided test).
Procedure: Transform both sample correlations (r_1, r_2) into Fisher's z values (z_1, z_2).
Test statistic: The Z-statistic for comparing two independent correlations is:
Z = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} \,, \quad z_i = \operatorname{artanh}(r_i)
where n_1 and n_2 are the sample sizes of the two groups. This Z-statistic is approximately standard normal under H_0.
Decision: Compare the calculated Z to critical values from the standard normal distribution (e.g., \pm 1.96 for a two-sided test at \alpha = 0.05). If |Z| exceeds the critical value, reject H_0. This implies a statistically significant difference between the two population correlations.
Example (two groups): Suppose r_1 = 0.05 and r_2 = 0.51, both with n \approx 65. Let's calculate:
z_1 = \operatorname{artanh}(0.05) \approx 0.0500 \,, \quad z_2 = \operatorname{artanh}(0.51) \approx 0.563
\sqrt{\frac{1}{65-3} + \frac{1}{65-3}} = \sqrt{\frac{2}{62}} \approx \sqrt{0.0322} \approx 0.179
Z = \frac{0.0500 - 0.563}{0.179} \approx \frac{-0.513}{0.179} \approx -2.866
Since |-2.866| > 1.96 (for two-sided \alpha = 0.05), the difference between these two correlations is highly significant.
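A sketch of the two-independent-correlations test; the helper name `compare_independent_correlations` is an illustrative assumption:

```python
import math
from scipy import stats

def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided Z-test of H0: rho1 = rho2 for two independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of z1 - z2
    Z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(Z))                # two-sided p-value
    return Z, p

# Two-group example: r1 = 0.05, r2 = 0.51, n1 = n2 = 65
Z, p = compare_independent_correlations(0.05, 65, 0.51, 65)
print(f"Z = {Z:.2f}, p = {p:.4f}")
```

With unrounded intermediates Z ≈ -2.85, agreeing with the hand calculation up to rounding, and the two-sided p-value is well below 0.01.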
Testing the Equality of Two Dependent Correlations (Hotelling–Williams Test)
Scenario: This test is used when two correlations are dependent. This occurs in two common situations:
Correlations sharing a common variable: For example, comparing the correlation between variable X and Y (r_{xy}) with the correlation between variable Z and Y (r_{zy}), where X, Y, and Z are all measured in the same sample.
Correlations from the same sample: For instance, comparing the correlation between Math score and English score (r_{ME}) with the correlation between Math score and Science score (r_{MS}) from the same group of students.
Null Hypothesis: H_0: \rho_{yx} = \rho_{yz} when variables X, Y, and Z are all measured within the same sample. This tests whether variable Y is equally correlated with X and Z.
Test framework: The Hotelling–Williams test is a specific t-test designed for this scenario. It takes into account the inter-correlation (r_{xz}) between the two predictor variables (X and Z) in addition to the two correlations being compared (r_{yx} and r_{yz}).
Formula for the t-statistic: The test yields a t-value with df = n-3. Writing |R| = 1 - r_{yx}^2 - r_{yz}^2 - r_{xz}^2 + 2 r_{yx} r_{yz} r_{xz} (the determinant of the 3 \times 3 correlation matrix) and \bar{r} = (r_{yx} + r_{yz})/2, the Williams form of the statistic is:
t = (r_{yx} - r_{yz}) \sqrt{\frac{(n-1)(1+r_{xz})}{2\,\frac{n-1}{n-3}\,|R| + \bar{r}^2 (1-r_{xz})^3}}
(Note: This is the form for comparing two correlations that share a variable, here r_{yx} and r_{yz}. Other formulas exist for other types of dependent correlations, e.g., comparing two correlations with no variable in common.)
Example: In a study by Fryxell and Gordon (1989), with n = 1518 participants, the observed correlations were r_{yx} = 0.60, r_{yz} = 0.55, and r_{xz} = 0.59.
The difference in correlations is r_{yx} - r_{yz} = 0.60 - 0.55 = 0.05. Although this difference appears small, with a large sample size it can be statistically significant.
The Hotelling–Williams test yields a t-statistic of 2.778 with df = 1518 - 3 = 1515.
Given this t-value and the large degrees of freedom, the result is statistically significant (p < .01). This means that r_{yx} is significantly different from r_{yz}.
Interpretation: This suggests that one variable (e.g., X) is a significantly better predictor or is more strongly related to Y than the other variable (Z), even when accounting for the relationship between X and Z. This test is crucial for understanding the relative importance of predictive variables within the same dataset.
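A sketch of the Williams form of the test, which reproduces the t = 2.778 reported for the Fryxell and Gordon example; the helper name `williams_t` is an assumption for illustration:

```python
import math
from scipy import stats

def williams_t(r_yx, r_yz, r_xz, n):
    """Hotelling–Williams t-test of H0: rho_yx = rho_yz (shared variable Y)."""
    # Determinant of the 3x3 correlation matrix of (Y, X, Z)
    detR = 1 - r_yx**2 - r_yz**2 - r_xz**2 + 2 * r_yx * r_yz * r_xz
    r_bar = (r_yx + r_yz) / 2
    num = (n - 1) * (1 + r_xz)
    den = 2 * ((n - 1) / (n - 3)) * detR + r_bar**2 * (1 - r_xz)**3
    t = (r_yx - r_yz) * math.sqrt(num / den)
    df = n - 3
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, df, p

# Fryxell and Gordon (1989) example
t, df, p = williams_t(0.60, 0.55, 0.59, 1518)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")  # t = 2.778, df = 1515
```

The determinant term `detR` captures how the three correlations jointly constrain each other, which is what distinguishes this test from the independent-samples case.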
Important notes:
The exact standard error for the difference between dependent correlations is more complex, incorporating the inter-correlations and potentially the determinant of the correlation matrix.
While the approach remains a t-test, the dependence among the correlations affects the power of the test relative to comparing independent correlations, as shared variance among the variables limits the amount of independent information each correlation contributes.