Notes on Testing Correlations for Statistical Significance

Testing a Single Correlation Against Zero

  • Notation: We denote the population correlation coefficient as \rho (rho), the sample correlation coefficient as r, and the sample size as n.

  • Hypotheses: The null hypothesis (H_0) states that there is no linear relationship in the population, i.e., \rho = 0. The alternative hypothesis (H_1) typically states that there is a linear relationship (\rho \neq 0 for a two-sided test) or a specific direction (\rho > 0 or \rho < 0 for a one-sided test).

  • Test statistic: This test uses a t-distribution. The formula for the t-statistic is:

    t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \quad \text{df} = n-2

  • Decision Rule: The observed t-value is compared to critical values from Student's t-distribution. We reject the null hypothesis (H_0: \rho = 0) if the absolute value of the observed t (|t|) exceeds the critical t-value for a specified significance level (e.g., \alpha = 0.05) and degrees of freedom.

  • Degrees of freedom (df): The degrees of freedom for this test are n-2. This value reflects the number of independent pieces of information remaining after estimating the two parameters (the intercept and slope) involved in a simple linear regression model, which is implicitly connected to correlation.

  • Example (SAT validity): Consider a study with n = 122 students, where the sample correlation between SAT scores and first-year college GPA is r = 0.27. Using the formula:
    t = \frac{0.27\sqrt{122-2}}{\sqrt{1-0.27^2}} = \frac{0.27\sqrt{120}}{\sqrt{1-0.0729}} = \frac{0.27 \times 10.954}{\sqrt{0.9271}} \approx \frac{2.957}{0.963} \approx 3.07
    With df = 120, the critical t-value for a two-sided test at \alpha = 0.05 is approximately 1.980. Since |3.07| > 1.980, we reject H_0, indicating that the correlation is statistically significant (p < 0.05).

  • Insight: Testing a sample correlation r against a population correlation of zero is entirely analogous to performing a significance test on the slope coefficient in a simple linear regression model where one variable predicts the other.
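The t-test above can be sketched in a few lines of Python (the helper name corr_t is illustrative, not from the source):

```python
import math

def corr_t(r, n):
    """t-statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# SAT validity example from the text: n = 122, r = 0.27
t = corr_t(0.27, 122)
print(f"t = {t:.2f}, df = {122 - 2}")  # compare |t| with the two-sided critical value 1.980
```

With these inputs the statistic comes out near 3.07, matching the hand calculation above.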

Testing a Single Correlation Versus Any Specified Value (Fisher's z)

  • Purpose: This test is used to compare an observed sample correlation (r) to a specific, non-zero population correlation (\rho_0), i.e., to test H_0: \rho = \rho_0 where \rho_0 \neq 0 and -1 < \rho_0 < 1. The standard t-test for correlation is only suitable for testing against \rho = 0.

  • Fisher’s z transformation: The distribution of r is skewed, especially as \rho approaches +1 or -1. Fisher's z transformation normalizes this skewed distribution, making it suitable for standard hypothesis testing procedures. The transformation is:

    z_r = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r) \,, \quad z_{\rho_0} = \frac{1}{2} \ln\left(\frac{1+\rho_0}{1-\rho_0}\right) = \operatorname{artanh}(\rho_0)

    where \ln is the natural logarithm and \operatorname{artanh} is the inverse hyperbolic tangent.

  • Sampling distribution: Under the null hypothesis (H_0: \rho = \rho_0), the sampling distribution of z_r is approximately normal with a mean of z_{\rho_0} and a variance of \frac{1}{n-3}. Thus, z_r \sim N\left(z_{\rho_0}, \frac{1}{n-3}\right).

  • Test statistic: The Z-statistic for comparing r to a specified \rho_0 is given by:

    Z = \frac{z_r - z_{\rho_0}}{\sqrt{\frac{1}{n-3}}}

    This Z-statistic follows approximately a standard normal distribution (N(0,1)).

  • Example (cost assessment center): Suppose we have a sample of n=165 observations, with an observed correlation r=0.54. We want to test if this correlation is significantly different from a hypothesized population correlation of \rho_0 = 0.40.

    • First, transform r and \rho_0 to their z equivalents: z_r = \frac{1}{2} \ln\left(\frac{1+0.54}{1-0.54}\right) = \frac{1}{2} \ln\left(\frac{1.54}{0.46}\right) \approx 0.604
      z_{0.40} = \frac{1}{2} \ln\left(\frac{1+0.40}{1-0.40}\right) = \frac{1}{2} \ln\left(\frac{1.40}{0.60}\right) \approx 0.424

    • Next, calculate the standard error of z_r: \text{SE}_{z_r} = \sqrt{\frac{1}{n-3}} = \sqrt{\frac{1}{165-3}} = \sqrt{\frac{1}{162}} \approx 0.0786

    • Finally, calculate the Z-statistic:
      Z = \frac{0.604 - 0.424}{0.0786} = \frac{0.180}{0.0786} \approx 2.29

    • For a one-sided test (e.g., p > 0.40) at \alpha = 0.05, the critical Z-value is 1.645. Since 2.29 > 1.645, the result is significant (p \approx 0.01 for a one-sided test). A two-sided test would involve comparing to \pm 1.96, which would also be significant.

  • AF facts: These are key characteristics of the Fisher's z transformation:

    • AF1: The sampling distribution of z_r is approximately normal, even for moderate sample sizes (n \ge 10 to 20 generally provides a good approximation).

    • AF2: The standard deviation of z_r (also known as the standard error) is approximately \sqrt{1/(n-3)}. This formula highlights that the precision of the estimate increases with sample size.

  • Note on averaging correlations: The Fisher's z-transformation is crucial for averaging multiple correlation coefficients, as it transforms the skewed distribution of r values into a more symmetrical distribution, reducing bias in the average.
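The Fisher z test above can be sketched in Python; the standard library's math.atanh is exactly the artanh transformation used here (the helper name fisher_z_test is illustrative):

```python
import math

def fisher_z_test(r, rho0, n):
    """Z-statistic for H0: rho = rho0 via Fisher's z transformation."""
    z_r = math.atanh(r)        # Fisher z of the sample correlation
    z_rho0 = math.atanh(rho0)  # Fisher z of the hypothesized value
    se = math.sqrt(1.0 / (n - 3))
    return (z_r - z_rho0) / se

# Assessment-center example from the text: n = 165, r = 0.54, rho0 = 0.40
Z = fisher_z_test(0.54, 0.40, 165)
print(f"Z = {Z:.2f}")  # compare with 1.645 (one-sided) or 1.96 (two-sided)
```

The result agrees with the worked example to rounding (about 2.3).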

Averaging Correlations (using Fisher's z)
  • Rationale: When you need to combine correlation coefficients from several independent samples (e.g., in meta-analysis), directly averaging r values is inappropriate due to the skewness of their distribution. Transforming them to z values first, then averaging, and finally transforming back to r provides a more accurate estimate of the overall population correlation.

  • Procedure: Convert each individual correlation (r_i) to its corresponding Fisher's z value (z_i). Then calculate a weighted average of these z_i values.

  • Weighted average: The weights (w_i) are typically set to n_i - 3, corresponding to the inverse of the variance of each z_i. The formula for the weighted average \bar{Z} is:

    \bar{Z} = \frac{\sum_i (n_i - 3) z_i}{\sum_i (n_i - 3)} \,, \quad \hat{r} = \tanh(\bar{Z})

    where \tanh is the hyperbolic tangent function, the inverse of \operatorname{artanh}, used to transform the averaged \bar{Z} back to an r value.

  • Example (three correlations): Suppose we have three studies with their respective correlations and sample sizes:

    • Study 1: n_1=41, r_1=0.792 \Rightarrow z_1 = \operatorname{artanh}(0.792) \approx 1.077

    • Study 2: n_2=34, r_2=0.792 \Rightarrow z_2 = \operatorname{artanh}(0.792) \approx 1.077

    • Study 3: n_3=17, r_3=0.440 \Rightarrow z_3 = \operatorname{artanh}(0.440) \approx 0.472
      Calculate the weighted average of the z values:
      \bar{Z} = \frac{(41-3) \times 1.077 + (34-3) \times 1.077 + (17-3) \times 0.472}{(41-3) + (34-3) + (17-3)}
      \bar{Z} = \frac{38 \times 1.077 + 31 \times 1.077 + 14 \times 0.472}{38 + 31 + 14} = \frac{40.93 + 33.39 + 6.61}{83} = \frac{80.93}{83} \approx 0.975
      Finally, transform back to r:
      \hat{r} = \tanh(0.975) \approx 0.751
      This \hat{r} is the estimated average correlation across the three studies.
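The averaging procedure can be sketched directly from the formula (the helper name average_correlations is illustrative):

```python
import math

def average_correlations(rs, ns):
    """Sample-size-weighted average of correlations via Fisher's z,
    using weights w_i = n_i - 3 (the inverse variance of each z_i)."""
    zs = [math.atanh(r) for r in rs]      # transform each r_i to z_i
    ws = [n - 3 for n in ns]              # inverse-variance weights
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)               # back-transform to the r scale

# Three-study example from the text
r_bar = average_correlations([0.792, 0.792, 0.440], [41, 34, 17])
print(f"average r = {r_bar:.3f}")
```

The result is about 0.75, matching the hand calculation.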

Averaging Across Studies (between different samples)
  • Method: In meta-analysis or validity generalization studies, the goal is often to obtain a single, more precise estimate of a population correlation from multiple studies. This is achieved by combining correlations using Fisher's z transformation, weighting each transformed correlation by the inverse of its variance. Since the variance of z_i is approximately \frac{1}{n_i-3}, the weights used are typically w_i = n_i - 3. This effectively gives more weight to studies with larger sample sizes, which provide more precise estimates.

  • Result: The z-weighted average, when transformed back to r, provides a sample-size-weighted average correlation. This method is fundamental for reporting combined evidence for relationships across different research settings.

Testing the Equality of Two Independent Correlations
  • Scenario: This test is used when you have two correlation coefficients (r_1 and r_2) from two independent samples (e.g., correlations measured in two distinct groups of people).

  • Hypotheses: The null hypothesis (H_0) is that the true population correlations are equal (\rho_1 = \rho_2). The alternative hypothesis (H_1) is that they are not equal (\rho_1 \neq \rho_2 for a two-sided test) or that one is greater than the other (for one-sided tests).

  • Procedure: Transform both sample correlations (r_1, r_2) into Fisher's z values (z_1, z_2).

  • Test statistic: The Z-statistic for comparing two independent correlations is:

    Z = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} \,, \quad z_i = \operatorname{artanh}(r_i)

    where n_1 and n_2 are the sample sizes of the two groups. This Z-statistic is approximately standard normal under H_0.

  • Decision: Compare the calculated Z to critical values from the standard normal distribution (e.g., \pm 1.96 for a two-sided test at \alpha = 0.05). If |Z| exceeds the critical value, reject H_0. This implies a statistically significant difference between the two population correlations.

  • Example (two groups): Suppose r_1 = 0.05 and r_2 = 0.51, each from an independent sample of n = 65. Then:
    z_1 = \operatorname{artanh}(0.05) \approx 0.0500, \quad z_2 = \operatorname{artanh}(0.51) \approx 0.563
    \sqrt{\frac{1}{65-3} + \frac{1}{65-3}} = \sqrt{\frac{2}{62}} \approx \sqrt{0.0323} \approx 0.180
    Z = \frac{0.0500 - 0.563}{0.180} \approx \frac{-0.513}{0.180} \approx -2.85
    Since |-2.85| > 1.96 (for two-sided \alpha = 0.05), the difference between the two correlations is statistically significant.
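The two-sample comparison can be sketched as follows (the helper name independent_corr_z is illustrative):

```python
import math

def independent_corr_z(r1, n1, r2, n2):
    """Z-statistic for H0: rho1 = rho2, for correlations from two
    independent samples, using Fisher's z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Two-group example from the text: r1 = 0.05, r2 = 0.51, n = 65 each
Z = independent_corr_z(0.05, 65, 0.51, 65)
print(f"Z = {Z:.2f}")  # compare |Z| with 1.96 for a two-sided test at alpha = .05
```

With unrounded intermediates the statistic is about -2.85, in line with the worked example.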

Testing the Equality of Two Dependent Correlations (Hotelling–Williams Test)
  • Scenario: This test is used when two correlations are dependent. This occurs in two common situations:

    1. Correlations sharing a common variable: For example, comparing the correlation between variables X and Y (r_{xy}) with the correlation between variables Z and Y (r_{zy}), where X, Y, and Z are all measured in the same sample.

    2. Correlations from the same sample: For instance, comparing the correlation between Math score and English score (r_{ME}) with the correlation between Math score and Science score (r_{MS}) from the same group of students.

  • Null Hypothesis: H_0: \rho_{yx} = \rho_{yz} when variables X, Y, and Z are all measured within the same sample. This tests whether variable Y is equally correlated with X and Z.

  • Test framework: The Hotelling–Williams test is a specific t-test designed for this scenario. It takes into account the inter-correlation (r_{xz}) between the two predictor variables (X and Z) in addition to the two correlations being compared (r_{yx} and r_{yz}).

  • Formula for the t-statistic: The test yields a t-value with df = n-3 and involves all three correlation coefficients (r_{yx}, r_{yz}, r_{xz}):

    t = (r_{yx} - r_{yz}) \sqrt{\frac{(n-1)(1+r_{xz})}{2\left(\frac{n-1}{n-3}\right)|R| + \bar{r}^2(1-r_{xz})^3}}

    where |R| = 1 - r_{yx}^2 - r_{yz}^2 - r_{xz}^2 + 2r_{yx}r_{yz}r_{xz} is the determinant of the 3 \times 3 correlation matrix and \bar{r} = (r_{yx} + r_{yz})/2.

    (Note: This form applies when comparing r_{yx} and r_{yz}, two correlations that share the variable Y. Other formulas exist for other patterns of dependence, e.g., comparing two correlations with no variable in common.)

  • Example: In a study by Fryxell and Gordon (1989), with n=1518 participants, the three observed correlations were r_{yx}=0.60, r_{yz}=0.55, and r_{xz}=0.59.

    • The difference in correlations is r_{yx} - r_{yz} = 0.60 - 0.55 = 0.05. Although this difference appears small, with a large sample size it can be statistically significant.

    • The Hotelling–Williams test yielded a t-statistic of 2.778 with df = 1518 - 3 = 1515.

    • Given this t-value and the large degrees of freedom, the result is statistically significant (p < .01). This means that r_{yx} is significantly different from r_{yz}.

  • Interpretation: This suggests that one variable (e.g., X) is a significantly better predictor or is more strongly related to Y than the other variable (Z), even when accounting for the relationship between X and Z. This test is crucial for understanding the relative importance of predictive variables within the same dataset.

  • Important notes:

    • The exact standard error for the difference between dependent correlations is more complex, incorporating the inter-correlations and potentially the determinant of the correlation matrix.

    • While the approach remains a t-test, the dependence among the correlations generally reduces the power of the test compared to comparing independent correlations, as shared variance among the variables limits the amount of independent information each correlation contributes.
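The Williams form of the test can be sketched in Python (the helper name williams_t is illustrative; the formula is the one commonly attributed to Williams, which reproduces the t-value reported for the Fryxell and Gordon example):

```python
import math

def williams_t(r_yx, r_yz, r_xz, n):
    """Hotelling-Williams t-test of H0: rho_yx = rho_yz for two dependent
    correlations sharing variable Y, all measured in one sample of size n."""
    # determinant of the 3x3 correlation matrix, |R|
    det = 1 - r_yx**2 - r_yz**2 - r_xz**2 + 2 * r_yx * r_yz * r_xz
    r_bar = (r_yx + r_yz) / 2
    num = (n - 1) * (1 + r_xz)
    den = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_xz) ** 3
    t = (r_yx - r_yz) * math.sqrt(num / den)
    return t, n - 3  # t-statistic and its degrees of freedom

# Fryxell & Gordon (1989): n = 1518, r_yx = .60, r_yz = .55, r_xz = .59
t, df = williams_t(0.60, 0.55, 0.59, 1518)
print(f"t = {t:.3f}, df = {df}")
```

Running this on the example values gives t close to 2.78 with df = 1515, matching the result cited above.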