Notes on Testing Correlations for Statistical Significance
Testing a Single Correlation Against Zero
Notation: We denote the population correlation coefficient as $\rho$ (rho), the sample correlation coefficient as $r$, and the sample size as $n$.
Hypotheses: The null hypothesis ($H_0$) states that there is no linear relationship in the population, i.e., $\rho = 0$. The alternative hypothesis ($H_1$) typically states that there is a linear relationship ($\rho \neq 0$ for a two-sided test), or specifies a direction ($\rho > 0$ or $\rho < 0$ for a one-sided test).
Test statistic: This test uses a $t$-distribution. The formula for the $t$-statistic is:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
Decision Rule: The observed $t$-value is compared to critical values from Student's $t$-distribution. We reject the null hypothesis ($H_0$) if the absolute value of the observed statistic ($|t|$) exceeds the critical $t$-value for a specified significance level (e.g., $\alpha = 0.05$) and degrees of freedom.
Degrees of freedom (df): The degrees of freedom for this test are $df = n - 2$. This value reflects the number of independent pieces of information remaining after estimating the two parameters (the intercept and slope) of a simple linear regression model, which is implicitly connected to correlation.
Example (SAT validity): Consider a study with $n = 122$ students, where the sample correlation between SAT scores and first-year college GPA is $r = 0.27$. Using the formula:
$$t = \frac{0.27\sqrt{122-2}}{\sqrt{1-0.27^2}} \approx 3.07$$
With $df = 120$, the critical $t$-value for a two-sided test at $\alpha = 0.05$ is approximately $1.980$. Since $|3.07| > 1.980$, we reject $H_0$, indicating that the correlation is statistically significant ($p < 0.05$).
Insight: Testing a sample correlation against a population correlation of zero is equivalent to testing the significance of the slope coefficient in a simple linear regression model where one variable predicts the other.
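The computation can be sketched in a few lines of Python. This is a minimal illustration, not a full test routine: the function name `correlation_t` is made up for this sketch, and the inputs $n = 122$ and $r = 0.27$ are assumed values chosen to be consistent with the reported $t \approx 3.07$ and a critical value of 1.980 (df = 120). The critical value is taken from a $t$-table rather than computed.

```python
import math

def correlation_t(r: float, n: int) -> tuple[float, int]:
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df

# Assumed inputs, consistent with the SAT example (t of about 3.07, df = 120).
t, df = correlation_t(0.27, 122)

# Two-sided critical value at alpha = 0.05 for df = 120, read from a t-table.
T_CRIT = 1.980
reject_h0 = abs(t) > T_CRIT

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```
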
Testing a Single Correlation Versus Any Specified Value (Fisher's $z$)
Purpose: This test is used to compare an observed sample correlation ($r$) to a specific, non-zero population correlation ($\rho_0$), i.e., to test $H_0: \rho = \rho_0$ where $\rho_0 \neq 0$ and $-1 < \rho_0 < 1$. The standard $t$-test for correlation is only suitable for testing against $\rho = 0$.
Fisher's $z$ transformation: The sampling distribution of $r$ is skewed, especially as $\rho$ approaches $+1$ or $-1$. Fisher's $z$ transformation makes this distribution approximately normal, so that standard hypothesis testing procedures apply. The transformation is:
$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r)$$
where $\ln$ is the natural logarithm and $\operatorname{artanh}$ is the inverse hyperbolic tangent.
Sampling distribution: Under the null hypothesis ($\rho = \rho_0$), the sampling distribution of $z$ is approximately normal with a mean of $z_{\rho_0} = \operatorname{artanh}(\rho_0)$ and a variance of $1/(n-3)$. Thus, $SE_z = 1/\sqrt{n-3}$.
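Test statistic: The $Z$-statistic for comparing $r$ to a specified $\rho_0$ is given by:
$$Z = \frac{z_r - z_{\rho_0}}{1/\sqrt{n-3}} = (z_r - z_{\rho_0})\sqrt{n-3}$$
This $Z$-statistic follows approximately a standard normal distribution ($N(0, 1)$).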
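Example (cost assessment center): Suppose we have a sample of $n$ observations, with an observed correlation $r$. We want to test whether this correlation is significantly different from a hypothesized population correlation of $\rho_0 = 0.40$.
First, transform $r$ and $\rho_0$ to their $z$ equivalents:
$$z_r = \operatorname{artanh}(r), \qquad z_{\rho_0} = \operatorname{artanh}(0.40) \approx 0.424$$
Next, calculate the standard error of $z$:
$$SE_z = \frac{1}{\sqrt{n-3}}$$
Finally, calculate the $Z$-statistic:
$$Z = \frac{z_r - z_{\rho_0}}{SE_z} \approx 2.29$$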
For a one-sided test (e.g., $H_1: \rho > 0.40$) at $\alpha = 0.05$, the critical $Z$-value is $1.645$. Since $2.29 > 1.645$, the result is significant ($p \approx 0.011$ for a one-sided test). A two-sided test would compare $|Z|$ to $1.96$ and would also be significant.
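The three steps (transform, standard error, $Z$) can be sketched with the Python standard library. The function name `fisher_z_test` is illustrative, and the sample values $r = 0.60$ and $n = 80$ are hypothetical inputs chosen only to exercise the code; they are not the values behind the $Z = 2.29$ reported in the example. Only $\rho_0 = 0.40$ is taken from the text.

```python
import math
from statistics import NormalDist

def fisher_z_test(r: float, n: int, rho0: float) -> tuple[float, float, float]:
    """Test H0: rho = rho0 using Fisher's z.

    Returns (Z, upper-tail one-sided p, two-sided p).
    """
    z_r = math.atanh(r)          # Fisher transform of the sample correlation
    z_0 = math.atanh(rho0)       # Fisher transform of the hypothesized value
    se = 1.0 / math.sqrt(n - 3)  # approximate standard error of z
    z_stat = (z_r - z_0) / se
    std_normal = NormalDist()
    p_upper = 1.0 - std_normal.cdf(z_stat)
    p_two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z_stat)))
    return z_stat, p_upper, p_two_sided

# Hypothetical sample values (r, n); rho0 = 0.40 as in the example.
z_stat, p_upper, p_two_sided = fisher_z_test(r=0.60, n=80, rho0=0.40)
print(f"Z = {z_stat:.2f}, one-sided p = {p_upper:.4f}, two-sided p = {p_two_sided:.4f}")
```
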
AF facts: These are key characteristics of Fisher's $z$ transformation:
AF1: The sampling distribution of $z$ is approximately normal, even for moderate sample sizes ( to generally provides a good approximation).
AF2: The standard deviation of $z$ (also known as the standard error) is approximately $1/\sqrt{n-3}$. This formula shows that the precision of the estimate increases with sample size.
Note on averaging correlations: Fisher's $z$ transformation is crucial for averaging multiple correlation coefficients, as it transforms the skewed distribution of $r$ values into a more symmetrical distribution, reducing bias in the average.
Averaging Correlations (using Fisher's $z$)
Rationale: When you need to combine correlation coefficients from several independent samples (e.g., in meta-analysis), directly averaging $r$ values is inappropriate because their sampling distribution is skewed. Transforming them to $z$ values first, then averaging, and finally transforming back to $r$ provides a more accurate estimate of the overall population correlation.
Procedure: Convert each individual correlation ($r_i$) to its corresponding Fisher's $z$ value ($z_i$). Then calculate a weighted average of these $z_i$ values.
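Weighted average: The weights ($w_i$) are typically set to $n_i - 3$, corresponding to the inverse of the variance of each $z_i$. The formula for the weighted average is:
$$\bar{z} = \frac{\sum_i w_i z_i}{\sum_i w_i} = \frac{\sum_i (n_i - 3)\, z_i}{\sum_i (n_i - 3)}, \qquad \bar{r} = \tanh(\bar{z})$$
where $\tanh$ is the hyperbolic tangent function, which is the inverse of $\operatorname{artanh}$, used to transform the averaged $\bar{z}$ back to an $r$ value.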
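Example (three correlations): Suppose we have three studies with their respective correlations and sample sizes:
Study 1: correlation $r_1$, sample size $n_1$
Study 2: correlation $r_2$, sample size $n_2$
Study 3: correlation $r_3$, sample size $n_3$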
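Calculate the weighted average of the $z$ values:
$$\bar{z} = \frac{(n_1 - 3)z_1 + (n_2 - 3)z_2 + (n_3 - 3)z_3}{(n_1 - 3) + (n_2 - 3) + (n_3 - 3)}$$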
(Note: original note had , likely due to rounding or slight calculation difference. Using the example values provided, this is a recalculated value.)
Finally, transform $\bar{z}$ back to $r$:
$$\bar{r} = \tanh(\bar{z})$$
This is the estimated average correlation across the three studies.
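The averaging procedure can be sketched as follows. The function name `average_correlation` is illustrative, and the three $(r_i, n_i)$ pairs are hypothetical stand-ins, not the values from the three studies in the example.

```python
import math

def average_correlation(rs: list[float], ns: list[int]) -> float:
    """Average correlations via Fisher's z, weighting each z_i by n_i - 3."""
    zs = [math.atanh(r) for r in rs]   # r_i -> z_i
    ws = [n - 3 for n in ns]           # inverse-variance weights
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)            # back-transform to the r scale

# Hypothetical correlations and sample sizes for three independent studies.
rs = [0.30, 0.45, 0.60]
ns = [50, 100, 200]
r_bar = average_correlation(rs, ns)
print(f"weighted average r = {r_bar:.3f}")
```

Because the weights are $n_i - 3$, the result is pulled toward the correlation from the largest study.
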
Averaging Across Studies (between different samples)
Method: In meta-analysis or validity generalization studies, the goal is often to obtain a single, more precise estimate of a population correlation from multiple studies. This is achieved by combining correlations using Fisher's $z$ transformation, weighting each transformed correlation by the inverse of its variance. Since the variance of $z$ is approximately $1/(n-3)$, the weights used are typically $n - 3$. This gives more weight to studies with larger sample sizes, which provide more precise estimates.
Result: The $z$-weighted average, when transformed back to $r$, provides a sample-size-weighted average correlation. This method is fundamental for reporting combined evidence for relationships across different research settings.
Testing the Equality of Two Independent Correlations
Scenario: This test is used when you have two correlation coefficients ($r_1$ and $r_2$) from two independent samples (e.g., correlations measured in two distinct groups of people).
Hypotheses: The null hypothesis ($H_0$) is that the true population correlations are equal ($\rho_1 = \rho_2$). The alternative hypothesis ($H_1$) is that they are not equal ($\rho_1 \neq \rho_2$ for a two-sided test) or that one is greater than the other (for one-sided tests).
Procedure: Transform both sample correlations ($r_1, r_2$) into Fisher's $z$ values ($z_1, z_2$).
Test statistic: The $Z$-statistic for comparing two independent correlations is:
$$Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$
where $n_1$ and $n_2$ are the sample sizes of the two groups. This $Z$-statistic is approximately standard normal under $H_0$.
Decision: Compare the calculated $Z$ to critical values from the standard normal distribution (e.g., $\pm 1.96$ for a two-sided test at $\alpha = 0.05$). If $|Z|$ exceeds the critical value, reject $H_0$. This implies a statistically significant difference between the two population correlations.
Example (two groups): Suppose $r_1 = 0.05$ and $r_2 = 0.51$, both with $n = 65$. Then:
$$z_1 = \operatorname{artanh}(0.05) \approx 0.050, \qquad z_2 = \operatorname{artanh}(0.51) \approx 0.563$$
$$\sqrt{\frac{1}{65-3} + \frac{1}{65-3}} = \sqrt{\frac{2}{62}} \approx \sqrt{0.0323} \approx 0.180$$
$$Z = \frac{0.050 - 0.563}{0.180} \approx -2.85$$
Since $|-2.85| > 1.96$ (two-sided, $\alpha = 0.05$), the difference between these two correlations is statistically significant ($p \approx 0.004$).
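The two-group example can be checked with a short Python sketch using only the standard library. The function name `independent_correlations_z` is illustrative; the inputs ($r_1 = 0.05$, $r_2 = 0.51$, $n_1 = n_2 = 65$) are the ones from the example. Carrying full precision through the intermediate steps gives $Z \approx -2.85$; rounding the intermediate values to three decimals can shift the second decimal place slightly.

```python
import math
from statistics import NormalDist

def independent_correlations_z(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Compare two independent correlations; returns (Z, two-sided p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se_diff
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_two_sided

z_stat, p_two_sided = independent_correlations_z(0.05, 65, 0.51, 65)
print(f"Z = {z_stat:.2f}, two-sided p = {p_two_sided:.4f}")
```
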
Testing the Equality of Two Dependent Correlations (Hotelling–Williams Test)
Scenario: This test is used when two correlations are dependent. This occurs in two common situations:
Correlations sharing a common variable: For example, comparing the correlation between variable X and Y ($r_{XY}$) with the correlation between variable Z and Y ($r_{ZY}$), where X, Y, and Z are all measured in the same sample.
Correlations from the same sample: For instance, comparing the correlation between Math score and English score ($r_{ME}$) with the correlation between Math score and Science score ($r_{MS}$) from the same group of students.
Null Hypothesis: $H_0: \rho_{XY} = \rho_{ZY}$ when variables X, Y, and Z are all measured within the same sample. This tests whether variable Y is equally correlated with X and Z.
Test framework: The Hotelling–Williams test is a specific $t$-test designed for this scenario. It takes into account the inter-correlation ($r_{XZ}$) between the two predictor variables (X and Z) in addition to the two correlations being compared ($r_{XY}$ and $r_{ZY}$).
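Formula for the $t$-statistic: The statistic has $df = n - 3$ and involves the three correlation coefficients ($r_{XY}$, $r_{ZY}$, $r_{XZ}$). A commonly used form, Williams's modification of Hotelling's statistic, is:
$$t = (r_{XY} - r_{ZY})\sqrt{\frac{(n-1)(1 + r_{XZ})}{2\,\dfrac{n-1}{n-3}\,|R| + \bar{r}^2 (1 - r_{XZ})^3}}$$
where $|R| = 1 - r_{XY}^2 - r_{ZY}^2 - r_{XZ}^2 + 2\, r_{XY}\, r_{ZY}\, r_{XZ}$ is the determinant of the $3 \times 3$ correlation matrix and $\bar{r} = (r_{XY} + r_{ZY})/2$.
(Note: This form applies to comparing $r_{XY}$ and $r_{ZY}$, which share the variable Y. Other formulas exist for different types of dependent correlations, e.g., comparing $r_{XY}$ and $r_{ZW}$, which share no variable.)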
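As a rough sketch, one widely cited form of the statistic (Williams's modification of Hotelling's $t$, with $df = n - 3$) can be computed as below. This is an assumption about which form the notes intend, not a confirmed transcription of their formula. The function name `williams_t` and the input values are hypothetical and are not the Fryxell and Gordon (1989) figures.

```python
import math

def williams_t(r_xy: float, r_zy: float, r_xz: float, n: int) -> tuple[float, int]:
    """Williams's t for H0: rho_XY = rho_ZY (shared variable Y); returns (t, df)."""
    # Determinant of the 3x3 correlation matrix of X, Y, Z.
    det_r = 1.0 - r_xy**2 - r_zy**2 - r_xz**2 + 2.0 * r_xy * r_zy * r_xz
    r_bar = (r_xy + r_zy) / 2.0
    denom = 2.0 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1.0 - r_xz) ** 3
    t = (r_xy - r_zy) * math.sqrt((n - 1) * (1.0 + r_xz) / denom)
    return t, n - 3

# Hypothetical correlations and sample size, for illustration only.
t, df = williams_t(r_xy=0.50, r_zy=0.30, r_xz=0.40, n=103)
print(f"t = {t:.2f}, df = {df}")
```

The resulting $t$ is compared with critical values of the $t$-distribution on $n - 3$ degrees of freedom, read from a table or obtained from a statistics library.
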
Example: In a study by Fryxell and Gordon (1989), with participants, they observed three correlations: , , and .
The difference in correlations is . Although this difference appears small, with a large sample size, it can be statistically significant.
The Hotelling–Williams test yielded a -statistic of with .
Given this $t$-value and the large degrees of freedom, the result is statistically significant ($p < .01$). This means that $r_{XY}$ is significantly different from $r_{ZY}$.
Interpretation: This suggests that one variable (e.g., X) is a significantly better predictor or is more strongly related to Y than the other variable (Z), even when accounting for the relationship between X and Z. This test is crucial for understanding the relative importance of predictive variables within the same dataset.
Important notes:
The exact standard error for the difference between dependent correlations is more complex, incorporating the inter-correlations and potentially the determinant of the correlation matrix.
While the approach remains a $t$-test, the dependence among correlations generally reduces the power of the test compared to comparing independent correlations, as shared variance among variables limits the