Notes on Testing Correlations for Statistical Significance

Testing a Single Correlation Against Zero

  • Notation: We denote the population correlation coefficient as $\rho$ (rho), the sample correlation coefficient as $r$, and the sample size as $n$.

  • Hypotheses: The null hypothesis ($H_0$) states that there is no linear relationship in the population, i.e., $\rho = 0$. The alternative hypothesis ($H_1$) typically states that there is a linear relationship ($\rho \neq 0$ for a two-sided test) or a relationship in a specific direction ($\rho > 0$ or $\rho < 0$ for a one-sided test).

  • Test statistic: This test uses a $t$-distribution. The formula for the $t$-statistic is:

    $$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \text{with } df = n-2$$

  • Decision Rule: The observed $t$-value is compared to critical values from Student's $t$-distribution. We reject the null hypothesis ($H_0: \rho = 0$) if the absolute value of the observed $t$ ($|t|$) exceeds the critical $t$-value for the specified significance level (e.g., $\alpha = 0.05$) and degrees of freedom.

  • Degrees of freedom (df): The degrees of freedom for this test are $n-2$. This value reflects the number of independent pieces of information remaining after estimating the two parameters (the intercept and slope) of a simple linear regression model, which is implicitly connected to correlation.

  • Example (SAT validity): Consider a study with $n = 122$ students, where the sample correlation between SAT scores and first-year college GPA is $r = 0.27$. Using the formula:

    $$t = \frac{0.27\sqrt{122-2}}{\sqrt{1-0.27^2}} = \frac{0.27\sqrt{120}}{\sqrt{1-0.0729}} = \frac{0.27 \times 10.954}{\sqrt{0.9271}} \approx \frac{2.958}{0.963} \approx 3.07$$

    With $df = 120$, the critical $t$-value for a two-sided test at $\alpha = 0.05$ is approximately $1.980$. Since $|3.07| > 1.980$, we reject $H_0$: the correlation is statistically significant ($p < 0.05$).

  • Insight: Testing a sample correlation $r$ against a population correlation of zero is entirely analogous to performing a significance test on the slope coefficient in a simple linear regression model where one variable predicts the other.
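
The SAT example can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The function name `correlation_t` is illustrative, and the critical value `1.980` is the tabled figure quoted in the example, not something the code computes.

```python
import math

def correlation_t(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r and size n."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df

t, df = correlation_t(0.27, 122)
T_CRIT = 1.980  # two-sided critical t at alpha = 0.05, df = 120 (taken from a t table)
print(f"t = {t:.2f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```

A library routine that returns an exact p-value would normally replace the hard-coded critical value; it is kept here so the sketch mirrors the hand calculation.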

Testing a Single Correlation Versus Any Specified Value (Fisher's $z$)

  • Purpose: This test is used to compare an observed sample correlation ($r$) to a specific, non-zero population correlation ($\rho_0$), i.e., to test $H_0: \rho = \rho_0$ where $\rho_0 \neq 0$ and $-1 < \rho_0 < 1$. The standard $t$-test for correlation is only suitable for testing against $\rho = 0$.

  • Fisher's $z$ transformation: The distribution of $r$ is skewed, especially as $\rho$ approaches $+1$ or $-1$. Fisher's $z$ transformation normalizes this skewed distribution, making it suitable for standard hypothesis testing procedures. The transformation is:

    $$z_r = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r), \quad z_{\rho_0} = \frac{1}{2} \ln\left(\frac{1+\rho_0}{1-\rho_0}\right) = \operatorname{artanh}(\rho_0)$$

    where $\ln$ is the natural logarithm and $\operatorname{artanh}$ is the inverse hyperbolic tangent.

  • Sampling distribution: Under the null hypothesis ($H_0: \rho = \rho_0$), the sampling distribution of $z_r$ is approximately normal with a mean of $z_{\rho_0}$ and a variance of $\frac{1}{n-3}$. Thus, $z_r \sim N\left(z_{\rho_0}, \frac{1}{n-3}\right)$, approximately.

  • Test statistic: The $Z$-statistic for comparing $r$ to a specified $\rho_0$ is given by:

    $$Z = \frac{z_r - z_{\rho_0}}{\sqrt{\frac{1}{n-3}}}$$

    This $Z$-statistic follows approximately a standard normal distribution, $N(0,1)$.

  • Example (cost assessment center): Suppose we have a sample of $n = 165$ observations, with an observed correlation $r = 0.54$. We want to test whether this correlation is significantly different from a hypothesized population correlation of $\rho_0 = 0.40$.

    • First, transform $r$ and $\rho_0$ to their $z$ equivalents:
      $$z_r = \frac{1}{2} \ln\left(\frac{1+0.54}{1-0.54}\right) = \frac{1}{2} \ln\left(\frac{1.54}{0.46}\right) \approx 0.604$$
      $$z_{0.40} = \frac{1}{2} \ln\left(\frac{1+0.40}{1-0.40}\right) = \frac{1}{2} \ln\left(\frac{1.40}{0.60}\right) \approx 0.424$$

    • Next, calculate the standard error of $z_r$:
      $$\text{SE}_{z_r} = \sqrt{\frac{1}{n-3}} = \sqrt{\frac{1}{165-3}} = \sqrt{\frac{1}{162}} \approx 0.0786$$

    • Finally, calculate the $Z$-statistic:
      $$Z = \frac{0.604 - 0.424}{0.0786} = \frac{0.180}{0.0786} \approx 2.29$$

    • For a one-sided test (e.g., $H_1: \rho > 0.40$) at $\alpha = 0.05$, the critical $Z$-value is $1.645$. Since $2.29 > 1.645$, the result is significant ($p \approx 0.01$ for a one-sided test). A two-sided test would compare $|Z|$ to $1.96$, which would also be significant.

  • AF facts: These are key characteristics of Fisher's $z$ transformation:

    • AF1: The sampling distribution of $z_r$ is approximately normal, even for moderate sample sizes ($n \ge 10$ to $20$ generally provides a good approximation).

    • AF2: The standard deviation of $z_r$ (also known as the standard error) is approximately $\sqrt{1/(n-3)}$. This formula shows that the precision of the estimate increases with sample size.

  • Note on averaging correlations: Fisher's $z$ transformation is crucial for averaging multiple correlation coefficients, as it transforms the skewed distribution of $r$ values into a more symmetrical distribution, reducing bias in the average.
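
The Fisher $z$ test against a specified value can be sketched in Python as follows. This is an illustration under the normal approximation described in this section, using only the standard library (`math.atanh` for the transformation, `statistics.NormalDist` for tail areas). The function name `fisher_z_test` and its return layout are assumptions made for the example.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, rho0, n):
    """Z statistic plus upper-tail and two-sided p-values for H0: rho = rho0."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z_r = math.atanh(r)        # Fisher z of the sample correlation
    z_0 = math.atanh(rho0)     # Fisher z of the hypothesized value
    se = math.sqrt(1.0 / (n - 3))
    z_stat = (z_r - z_0) / se
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    upper_tail = 1.0 - std_normal.cdf(z_stat)             # for H1: rho > rho0
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z_stat)))
    return z_stat, upper_tail, two_sided

z_stat, p_one, p_two = fisher_z_test(0.54, 0.40, 165)
print(f"Z = {z_stat:.2f}, one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

Because the code does not round the intermediate $z$ values, its $Z$ comes out near 2.30 rather than the hand-calculated 2.29; the conclusion is unchanged.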

Averaging Correlations (using Fisher's $z$)
  • Rationale: When you need to combine correlation coefficients from several independent samples (e.g., in meta-analysis), directly averaging $r$ values is inappropriate due to the skewness of their distribution. Transforming them to $z$ values first, then averaging, and finally transforming back to $r$ provides a more accurate estimate of the overall population correlation.

  • Procedure: Convert each individual correlation ($r_i$) to its corresponding Fisher's $z$ value ($z_i$). Then calculate a weighted average of these $z_i$ values.

  • Weighted average: The weights ($w_i$) are typically set to $n_i - 3$, the inverse of the variance of each $z_i$. The formula for the weighted average $\bar{Z}$ is:

    $$\bar{Z} = \frac{\sum_i (n_i - 3)\, z_i}{\sum_i (n_i - 3)}, \quad \hat{r} = \tanh(\bar{Z})$$

    where $\tanh$ is the hyperbolic tangent function, the inverse of $\operatorname{artanh}$, used to transform the averaged $\bar{Z}$ back to an $r$ value.

  • Example (three correlations): Suppose we have three studies with their respective correlations and sample sizes:

    • Study 1: $n_1 = 41$, $r_1 = 0.792 \Rightarrow z_1 = \operatorname{artanh}(0.792) \approx 1.077$

    • Study 2: $n_2 = 34$, $r_2 = 0.792 \Rightarrow z_2 = \operatorname{artanh}(0.792) \approx 1.077$

    • Study 3: $n_3 = 17$, $r_3 = 0.440 \Rightarrow z_3 = \operatorname{artanh}(0.440) \approx 0.472$

    Calculate the weighted average of the $z$ values:

    $$\bar{Z} = \frac{(41-3) \times 1.077 + (34-3) \times 1.077 + (17-3) \times 0.472}{(41-3) + (34-3) + (17-3)}$$

    $$\bar{Z} = \frac{38 \times 1.077 + 31 \times 1.077 + 14 \times 0.472}{38 + 31 + 14} = \frac{40.93 + 33.39 + 6.61}{83} = \frac{80.93}{83} \approx 0.975$$

    (Note: the original note had $\bar{Z} \approx 0.953$; the value here is recalculated from the example values as given.)

    Finally, transform back to $r$:

    $$\hat{r} = \tanh(0.975) \approx 0.751$$

    This $\hat{r}$ is the estimated average correlation across the three studies.
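
The averaging procedure (transform, weight by $n_i - 3$, average, back-transform) can be sketched in Python. This is a minimal illustration using the three studies from the example; the function name `average_correlation` is an assumption, not part of the original notes.

```python
import math

def average_correlation(rs, ns):
    """Fisher-z weighted mean of independent correlations, with weights n_i - 3."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")
    weights = [n - 3 for n in ns]
    if any(w <= 0 for w in weights):
        raise ValueError("every sample size must exceed 3")
    # Weighted mean of the z-transformed correlations.
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    # Back-transform the mean z to the correlation scale.
    return z_bar, math.tanh(z_bar)

z_bar, r_hat = average_correlation([0.792, 0.792, 0.440], [41, 34, 17])
print(f"z_bar = {z_bar:.3f}, r_hat = {r_hat:.3f}")
```

The code carries full precision for each $\operatorname{artanh}$ value, so its output can differ in the third decimal from a hand calculation that rounds the $z_i$ first.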

Averaging Across Studies (between different samples)
  • Method: In meta-analysis or validity generalization studies, the goal is often to obtain a single, more precise estimate of a population correlation from multiple studies. This is achieved by combining correlations using Fisher's $z$ transformation, weighting each transformed correlation by the inverse of its variance. Since the variance of $z_i$ is approximately $\frac{1}{n_i-3}$, the weights used are typically $w_i = n_i - 3$. This gives more weight to studies with larger sample sizes, which provide more precise estimates.

  • Result: The $z$-weighted average, when transformed back to $r$, provides a sample-size-weighted average correlation. This method is fundamental for reporting combined evidence for relationships across different research settings.

Testing the Equality of Two Independent Correlations
  • Scenario: This test is used when you have two correlation coefficients ($r_1$ and $r_2$) from two independent samples (e.g., correlations measured in two distinct groups of people).

  • Hypotheses: The null hypothesis ($H_0$) is that the true population correlations are equal ($\rho_1 = \rho_2$). The alternative hypothesis ($H_1$) is that they are not equal ($\rho_1 \neq \rho_2$ for a two-sided test) or that one is greater than the other (for one-sided tests).

  • Procedure: Transform both sample correlations ($r_1, r_2$) into Fisher's $z$ values ($z_1, z_2$).

  • Test statistic: The $Z$-statistic for comparing two independent correlations is:

    $$Z = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}}, \quad z_i = \operatorname{artanh}(r_i)$$

    where $n_1$ and $n_2$ are the sample sizes of the two groups. This $Z$-statistic is approximately standard normal under $H_0$.

  • Decision: Compare the calculated $Z$ to critical values from the standard normal distribution (e.g., $\pm 1.96$ for a two-sided test at $\alpha = 0.05$). If $|Z|$ exceeds the critical value, reject $H_0$. This implies a statistically significant difference between the two population correlations.

  • Example (two groups): Suppose $r_1 = 0.05$ and $r_2 = 0.51$, both with $n \approx 65$. Then:

    $$z_1 = \operatorname{artanh}(0.05) \approx 0.050, \quad z_2 = \operatorname{artanh}(0.51) \approx 0.563$$

    $$\sqrt{\frac{1}{65-3} + \frac{1}{65-3}} = \sqrt{\frac{2}{62}} \approx \sqrt{0.0323} \approx 0.180$$

    $$Z = \frac{0.050 - 0.563}{0.180} = \frac{-0.513}{0.180} \approx -2.85$$

    Since $|-2.85| > 1.96$ (two-sided, $\alpha = 0.05$), the difference between these two correlations is statistically significant ($p < .01$).
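
The two-group comparison can be sketched in Python with the standard library alone. The function name `compare_independent_correlations` is illustrative, and the code assumes exactly $n = 65$ in each group, as the hand calculation does.

```python
import math
from statistics import NormalDist

def compare_independent_correlations(r1, n1, r2, n2):
    """Z statistic and two-sided p-value for H0: rho1 = rho2 (independent samples)."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("both sample sizes must exceed 3")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Variances of independent Fisher z values add.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_two_sided

z_stat, p_value = compare_independent_correlations(0.05, 65, 0.51, 65)
print(f"Z = {z_stat:.2f}, two-sided p = {p_value:.4f}")
```

The sign of $Z$ only reflects which correlation was entered first; for a two-sided test the decision depends on $|Z|$.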

Testing the Equality of Two Dependent Correlations (Hotelling–Williams Test)
  • Scenario: This test is used when two correlations are dependent. This occurs in two common situations:

    1. Correlations sharing a common variable: For example, comparing the correlation between variable X and Y ($r_{xy}$) with the correlation between variable Z and Y ($r_{zy}$), where X, Y, and Z are all measured in the same sample.

    2. Correlations from the same sample: For instance, comparing the correlation between Math score and English score ($r_{ME}$) with the correlation between Math score and Science score ($r_{MS}$) from the same group of students.

  • Null Hypothesis: $H_0: \rho_{yx} = \rho_{yz}$ when variables X, Y, and Z are all measured within the same sample. This tests whether variable Y is equally correlated with X and Z.

  • Test framework: The Hotelling–Williams test is a specific $t$-test designed for this scenario. It takes into account the inter-correlation ($r_{xz}$) between the two predictor variables (X and Z) in addition to the two correlations being compared ($r_{yx}$ and $r_{yz}$).

  • Formula for the $t$-statistic: The statistic has $df = n-3$ and uses all three correlation coefficients ($r_{yx}$, $r_{yz}$, $r_{xz}$):

    $$t = (r_{yx} - r_{yz}) \sqrt{\frac{(n-1)(1+r_{xz})}{2\left(\frac{n-1}{n-3}\right)|R| + \bar{r}^2 (1-r_{xz})^3}}$$

    where $|R| = 1 - r_{yx}^2 - r_{yz}^2 - r_{xz}^2 + 2 r_{yx} r_{yz} r_{xz}$ is the determinant of the $3 \times 3$ correlation matrix and $\bar{r} = (r_{yx} + r_{yz})/2$. Dropping the $\bar{r}^2 (1-r_{xz})^3$ term and the $\frac{n-1}{n-3}$ factor gives the simplified form $t \approx (r_{yx} - r_{yz}) \sqrt{(n-1)(1+r_{xz}) / (2|R|)}$, which slightly overstates $t$.

    (Note: This form compares $r_{xy}$ and $r_{zy}$, two correlations that share a variable. Other formulas exist for different types of dependent correlations, e.g., comparing $r_{xy}$ and $r_{xw}$.)

  • Example: In a study by Fryxell and Gordon (1989), with $n = 1518$ participants, they observed three correlations: $r_{yx} = 0.60$, $r_{yz} = 0.55$, and $r_{xz} = 0.59$.

    • The difference in correlations is $r_{yx} - r_{yz} = 0.60 - 0.55 = 0.05$. Although this difference appears small, with a large sample size it can be statistically significant.

    • The Hotelling–Williams test yielded a $t$-statistic of $2.778$ with $df = 1518 - 3 = 1515$.

    • Given this $t$-value and large degrees of freedom, the result is statistically significant ($p < .01$). This means that $r_{yx}$ is significantly different from $r_{yz}$.

  • Interpretation: This suggests that one variable (e.g., X) is a significantly better predictor or is more strongly related to Y than the other variable (Z), even when accounting for the relationship between X and Z. This test is crucial for understanding the relative importance of predictive variables within the same dataset.
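
The reported $t = 2.778$ can be reproduced with a short Python sketch. The code uses the commonly cited full form of Williams' statistic, with the determinant $|R|$ of the correlation matrix and the mean $\bar{r}$ of the two compared correlations in the denominator. Treat that exact form as an assumption to check against your textbook. The function name `williams_t` is illustrative.

```python
import math

def williams_t(r_yx, r_yz, r_xz, n):
    """Williams' t for H0: rho_yx = rho_yz, with X, Y, Z measured on the same n cases."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3 x 3 correlation matrix of X, Y, Z.
    det_r = 1.0 - r_yx**2 - r_yz**2 - r_xz**2 + 2.0 * r_yx * r_yz * r_xz
    # Mean of the two correlations being compared.
    r_bar = (r_yx + r_yz) / 2.0
    denom = 2.0 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1.0 - r_xz) ** 3
    t = (r_yx - r_yz) * math.sqrt((n - 1) * (1.0 + r_xz) / denom)
    return t, n - 3

t, df = williams_t(0.60, 0.55, 0.59, 1518)
print(f"t = {t:.3f}, df = {df}")
```

With the Fryxell and Gordon values this gives a $t$ of about 2.778 on 1515 degrees of freedom, matching the figure quoted in the example.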

  • Important notes:

    • The exact standard error for the difference between dependent correlations is more complex, incorporating the inter-correlations and potentially the determinant of the correlation matrix.

    • While the approach remains a $t$-test, the dependence among correlations generally reduces the power of the test compared to comparing independent correlations, as shared variance among variables limits the