Notes on Testing Correlations for Statistical Significance

Testing a Single Correlation Against Zero

  • Notation: We denote the population correlation coefficient as \rho (rho), the sample correlation coefficient as r, and the sample size as n.

  • Hypotheses: The null hypothesis (H_0) states that there is no linear relationship in the population, i.e., \rho = 0. The alternative hypothesis (H_1) typically states that there is a linear relationship (\rho \neq 0 for a two-sided test) or a specific direction (\rho > 0 or \rho < 0 for a one-sided test).

  • Test statistic: This test uses a t-distribution. The formula for the t-statistic is:

    t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \quad \text{df} = n-2

  • Decision Rule: The observed t-value is compared to critical values from Student's t-distribution. We reject the null hypothesis (H_0: \rho = 0) if the absolute value of the observed t (|t|) exceeds the critical t-value for a specified significance level (e.g., \alpha = 0.05) and degrees of freedom.

  • Degrees of freedom (df): The degrees of freedom for this test are n-2. This value reflects the number of independent pieces of information remaining after estimating the two parameters (the intercept and slope) involved in a simple linear regression model, which is implicitly connected to correlation.

  • Example (SAT validity): Consider a study with n = 122 students, where the sample correlation between SAT scores and first-year college GPA is r = 0.27. Using the formula:
    t = \frac{0.27\sqrt{122-2}}{\sqrt{1-0.27^2}} = \frac{0.27\sqrt{120}}{\sqrt{1-0.0729}} = \frac{0.27 \times 10.954}{\sqrt{0.9271}} \approx \frac{2.957}{0.963} \approx 3.07
    With df = 120, the critical t-value for a two-sided test at \alpha = 0.05 is approximately 1.980. Since |3.07| > 1.980, we reject H_0, indicating that the correlation is statistically significant (p < 0.05).

  • Insight: Testing a sample correlation r against a population correlation of zero is entirely analogous to performing a significance test on the slope coefficient in a simple linear regression model where one variable predicts the other.
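The t-test above can be sketched in a few lines of Python (the helper name corr_t is illustrative, not from the source):

```python
import math

def corr_t(r, n):
    """t-statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# SAT validity example from the text: n = 122, r = 0.27
t = corr_t(0.27, 122)
print(f"t = {t:.2f}, df = {122 - 2}")  # compare |t| with the two-sided critical value 1.980
```

With these inputs the statistic comes out near 3.07, matching the hand calculation above.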

Testing a Single Correlation Versus Any Specified Value (Fisher's z)

  • Purpose: This test is used to compare an observed sample correlation (r) to a specific, non-zero population correlation (\rho_0), i.e., to test H_0: \rho = \rho_0 where \rho_0 \neq 0 and -1 < \rho_0 < 1. The standard t-test for correlation is only suitable for testing against \rho = 0.

  • Fisher’s z transformation: The distribution of r is skewed, especially as \rho approaches +1 or -1. Fisher's z transformation normalizes this skewed distribution, making it suitable for standard hypothesis testing procedures. The transformation is:

    z_r = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r) \,, \quad z_{\rho_0} = \frac{1}{2} \ln\left(\frac{1+\rho_0}{1-\rho_0}\right) = \operatorname{artanh}(\rho_0)

    where \ln is the natural logarithm and \operatorname{artanh} is the inverse hyperbolic tangent.

  • Sampling distribution: Under the null hypothesis (H_0: \rho = \rho_0), the sampling distribution of z_r is approximately normal with a mean of z_{\rho_0} and a variance of \frac{1}{n-3}. Thus, z_r \sim N\left(z_{\rho_0}, \frac{1}{n-3}\right).

  • Test statistic: The Z-statistic for comparing r to a specified \rho_0 is given by:

    Z = \frac{z_r - z_{\rho_0}}{\sqrt{\frac{1}{n-3}}}

    This Z-statistic follows approximately a standard normal distribution (N(0,1)).

  • Example (cost assessment center): Suppose we have a sample of n=165 observations, with an observed correlation r=0.54. We want to test if this correlation is significantly different from a hypothesized population correlation of \rho_0 = 0.40.

    • First, transform r and \rho_0 to their z equivalents: z_r = \frac{1}{2} \ln\left(\frac{1+0.54}{1-0.54}\right) = \frac{1}{2} \ln\left(\frac{1.54}{0.46}\right) \approx 0.604
      z_{0.40} = \frac{1}{2} \ln\left(\frac{1+0.40}{1-0.40}\right) = \frac{1}{2} \ln\left(\frac{1.40}{0.60}\right) \approx 0.424

    • Next, calculate the standard error of z_r: \text{SE}_{z_r} = \sqrt{\frac{1}{n-3}} = \sqrt{\frac{1}{165-3}} = \sqrt{\frac{1}{162}} \approx 0.0786

    • Finally, calculate the Z-statistic:
      Z = \frac{0.604 - 0.424}{0.0786} = \frac{0.180}{0.0786} \approx 2.29

    • For a one-sided test (e.g., p > 0.40) at \alpha = 0.05, the critical Z-value is 1.645. Since 2.29 > 1.645, the result is significant (p \approx 0.01 for a one-sided test). A two-sided test would involve comparing to \pm 1.96, which would also be significant.

  • AF facts: These are key characteristics of the Fisher's z transformation:

    • AF1: The sampling distribution of z_r is approximately normal, even for moderate sample sizes (n \ge 10 to 20 generally provides a good approximation).

    • AF2: The standard deviation of z_r (also known as the standard error) is approximately \sqrt{1/(n-3)}. This formula highlights that the precision of the estimate increases with sample size.

  • Note on averaging correlations: The Fisher's z-transformation is crucial for averaging multiple correlation coefficients, as it transforms the skewed distribution of r values into a more symmetrical distribution, reducing bias in the average.
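The Fisher z test above can be sketched in Python; the standard library's math.atanh is exactly the artanh transformation used here (the helper name fisher_z_test is illustrative):

```python
import math

def fisher_z_test(r, rho0, n):
    """Z-statistic for H0: rho = rho0 via Fisher's z transformation."""
    z_r = math.atanh(r)        # Fisher z of the sample correlation
    z_rho0 = math.atanh(rho0)  # Fisher z of the hypothesized value
    se = math.sqrt(1.0 / (n - 3))
    return (z_r - z_rho0) / se

# Assessment-center example from the text: n = 165, r = 0.54, rho0 = 0.40
Z = fisher_z_test(0.54, 0.40, 165)
print(f"Z = {Z:.2f}")  # compare with 1.645 (one-sided) or 1.96 (two-sided)
```

The result agrees with the worked example to rounding (about 2.3).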

Averaging Correlations (using Fisher's z)
  • Rationale: When you need to combine correlation coefficients from several independent samples (e.g., in meta-analysis), directly averaging r values is inappropriate due to the skewness of their distribution. Transforming them to z values first, then averaging, and finally transforming back to r provides a more accurate estimate of the overall population correlation.

  • Procedure: Convert each individual correlation (r_i) to its corresponding Fisher's z value (z_i). Then calculate a weighted average of these z_i values.

  • Weighted average: The weights (w_i) are typically set to n_i - 3, corresponding to the inverse of the variance of each z_i. The formula for the weighted average \bar{Z} is:

    \bar{Z} = \frac{\sum_i (n_i - 3) z_i}{\sum_i (n_i - 3)} \,, \quad \hat{r} = \tanh(\bar{Z})

    where \tanh is the hyperbolic tangent function, the inverse of \operatorname{artanh}, used to transform the averaged \bar{Z} back to an r value.

  • Example (three correlations): Suppose we have three studies with their respective correlations and sample sizes:

    • Study 1: n_1=41, r_1=0.792 \Rightarrow z_1 = \operatorname{artanh}(0.792) \approx 1.077

    • Study 2: n_2=34, r_2=0.792 \Rightarrow z_2 = \operatorname{artanh}(0.792) \approx 1.077

    • Study 3: n_3=17, r_3=0.440 \Rightarrow z_3 = \operatorname{artanh}(0.440) \approx 0.472
      Calculate the weighted average of the z values:
      \bar{Z} = \frac{(41-3) \times 1.077 + (34-3) \times 1.077 + (17-3) \times 0.472}{(41-3) + (34-3) + (17-3)}
      \bar{Z} = \frac{38 \times 1.077 + 31 \times 1.077 + 14 \times 0.472}{38 + 31 + 14} = \frac{40.93 + 33.39 + 6.61}{83} = \frac{80.93}{83} \approx 0.975
      Finally, transform back to r:
      \hat{r} = \tanh(0.975) \approx 0.751
      This \hat{r} is the estimated average correlation across the three studies.
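The averaging procedure can be sketched directly from the formula (the helper name average_correlations is illustrative):

```python
import math

def average_correlations(rs, ns):
    """Sample-size-weighted average of correlations via Fisher's z,
    using weights w_i = n_i - 3 (the inverse variance of each z_i)."""
    zs = [math.atanh(r) for r in rs]      # transform each r_i to z_i
    ws = [n - 3 for n in ns]              # inverse-variance weights
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)               # back-transform to the r scale

# Three-study example from the text
r_bar = average_correlations([0.792, 0.792, 0.440], [41, 34, 17])
print(f"average r = {r_bar:.3f}")
```

The result is about 0.75, matching the hand calculation.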

Averaging Across Studies (between different samples)
  • Method: In meta-analysis or validity generalization studies, the goal is often to obtain a single, more precise estimate of a population correlation from multiple studies. This is achieved by combining correlations using Fisher's z transformation, weighting each transformed correlation by the inverse of its variance. Since the variance of z_i is approximately \frac{1}{n_i-3}, the weights used are typically w_i = n_i - 3. This effectively gives more weight to studies with larger sample sizes, which provide more precise estimates.

  • Result: The z-weighted average, when transformed back to r, provides a sample-size-weighted average correlation. This method is fundamental for reporting combined evidence for relationships across different research settings.

Testing the Equality of Two Independent Correlations
  • Scenario: This test is used when you have two correlation coefficients (r_1 and r_2) from two independent samples (e.g., correlations measured in two distinct groups of people).

  • Hypotheses: The null hypothesis (H_0) is that the true population correlations are equal (\rho_1 = \rho_2). The alternative hypothesis (H_1) is that they are not equal (\rho_1 \neq \rho_2 for a two-sided test) or that one is greater than the other (for one-sided tests).

  • Procedure: Transform both sample correlations (r_1, r_2) into Fisher's z values (z_1, z_2).

  • Test statistic: The Z-statistic for comparing two independent correlations is:

    Z = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} \,, \quad z_i = \operatorname{artanh}(r_i)

    where n_1 and n_2 are the sample sizes of the two groups. This Z-statistic is approximately standard normal under H_0.

  • Decision: Compare the calculated Z to critical values from the standard normal distribution (e.g., \pm 1.96 for a two-sided test at \alpha = 0.05). If |Z| exceeds the critical value, reject H_0. This implies a statistically significant difference between the two population correlations.

  • Example (two groups): Suppose r_1 = 0.05 and r_2 = 0.51, each from an independent sample of n = 65. Then:
    z_1 = \operatorname{artanh}(0.05) \approx 0.0500, \quad z_2 = \operatorname{artanh}(0.51) \approx 0.563
    \sqrt{\frac{1}{65-3} + \frac{1}{65-3}} = \sqrt{\frac{2}{62}} \approx \sqrt{0.0323} \approx 0.180
    Z = \frac{0.0500 - 0.563}{0.180} \approx \frac{-0.513}{0.180} \approx -2.85
    Since |-2.85| > 1.96 (for two-sided \alpha = 0.05), the difference between the two correlations is statistically significant.
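The two-sample comparison can be sketched as follows (the helper name independent_corr_z is illustrative):

```python
import math

def independent_corr_z(r1, n1, r2, n2):
    """Z-statistic for H0: rho1 = rho2, for correlations from two
    independent samples, using Fisher's z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Two-group example from the text: r1 = 0.05, r2 = 0.51, n = 65 each
Z = independent_corr_z(0.05, 65, 0.51, 65)
print(f"Z = {Z:.2f}")  # compare |Z| with 1.96 for a two-sided test at alpha = .05
```

With unrounded intermediates the statistic is about -2.85, in line with the worked example.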

Testing the Equality of Two Dependent Correlations (Hotelling–Williams Test)
  • Scenario: This test is used when two correlations are dependent. This occurs in two common situations:

    1. Correlations sharing a common variable: For example, comparing the correlation between variables X and Y (r_{xy}) with the correlation between variables Z and Y (r_{zy}), where X, Y, and Z are all measured in the same sample.

    2. Correlations from the same sample: For instance, comparing the correlation between Math score and English score (r_{ME}) with the correlation between Math score and Science score (r_{MS}) from the same group of students.

  • Null Hypothesis: H_0: \rho_{yx} = \rho_{yz} when variables X, Y, and Z are all measured within the same sample. This tests whether variable Y is equally correlated with X and Z.

  • Test framework: The Hotelling–Williams test is a specific t-test designed for this scenario. It takes into account the inter-correlation (r_{xz}) between the two predictor variables (X and Z) in addition to the two correlations being compared (r_{yx} and r_{yz}).

  • Formula for the t-statistic: The test yields a t-value with df = n-3 and involves all three correlation coefficients (r_{yx}, r_{yz}, r_{xz}):

    t = (r_{yx} - r_{yz}) \sqrt{\frac{(n-1)(1+r_{xz})}{2\left(\frac{n-1}{n-3}\right)|R| + \bar{r}^2(1-r_{xz})^3}}

    where |R| = 1 - r_{yx}^2 - r_{yz}^2 - r_{xz}^2 + 2r_{yx}r_{yz}r_{xz} is the determinant of the 3 \times 3 correlation matrix and \bar{r} = (r_{yx} + r_{yz})/2.

    (Note: This form applies when comparing r_{yx} and r_{yz}, two correlations that share the variable Y. Other formulas exist for other patterns of dependence, e.g., comparing two correlations with no variable in common.)

  • Example: In a study by Fryxell and Gordon (1989), with n=1518 participants, the three observed correlations were r_{yx}=0.60, r_{yz}=0.55, and r_{xz}=0.59.

    • The difference in correlations is r_{yx} - r_{yz} = 0.60 - 0.55 = 0.05. Although this difference appears small, with a large sample size it can be statistically significant.

    • The Hotelling–Williams test yielded a t-statistic of 2.778 with df = 1518 - 3 = 1515.

    • Given this t-value and the large degrees of freedom, the result is statistically significant (p < .01). This means that r_{yx} is significantly different from r_{yz}.

  • Interpretation: This suggests that one variable (e.g., X) is a significantly better predictor or is more strongly related to Y than the other variable (Z), even when accounting for the relationship between X and Z. This test is crucial for understanding the relative importance of predictive variables within the same dataset.

  • Important notes:

    • The exact standard error for the difference between dependent correlations is more complex, incorporating the inter-correlations and potentially the determinant of the correlation matrix.

    • While the approach remains a t-test, the dependence among the correlations generally reduces the power of the test compared to comparing independent correlations, as shared variance among the variables limits the amount of independent information each correlation contributes.
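The Williams form of the test can be sketched in Python (the helper name williams_t is illustrative; the formula is the one commonly attributed to Williams, which reproduces the t-value reported for the Fryxell and Gordon example):

```python
import math

def williams_t(r_yx, r_yz, r_xz, n):
    """Hotelling-Williams t-test of H0: rho_yx = rho_yz for two dependent
    correlations sharing variable Y, all measured in one sample of size n."""
    # determinant of the 3x3 correlation matrix, |R|
    det = 1 - r_yx**2 - r_yz**2 - r_xz**2 + 2 * r_yx * r_yz * r_xz
    r_bar = (r_yx + r_yz) / 2
    num = (n - 1) * (1 + r_xz)
    den = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_xz) ** 3
    t = (r_yx - r_yz) * math.sqrt(num / den)
    return t, n - 3  # t-statistic and its degrees of freedom

# Fryxell & Gordon (1989): n = 1518, r_yx = .60, r_yz = .55, r_xz = .59
t, df = williams_t(0.60, 0.55, 0.59, 1518)
print(f"t = {t:.3f}, df = {df}")
```

Running this on the example values gives t close to 2.78 with df = 1515, matching the result cited above.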