Study Notes on Correlation
Chapter 15: Correlation
15.1 Introduction
Overview of Correlation
Correlation is a statistical technique used to measure and describe the relationship between two variables, identified as X and Y. This analysis typically observes the natural occurrence of both variables without manipulation, providing insights into how one variable may relate to another based on collected data.
Key Characteristics of Correlation:
Direction of the Relationship: The sign of the correlation (positive or negative) delineates how the two variables move in relation to each other.
Positive Correlation: Indicates that as X increases, Y also increases.
Negative Correlation: Indicates that as X increases, Y decreases, demonstrating an inverse relationship.
Form of the Relationship: Correlation primarily assesses linear relationships among data points. However, other forms can also be evaluated using specialized methods.
Strength of the Relationship: The correlation coefficient quantifies how consistently the data points fit a linear relationship. Values range from -1.00 (perfect negative correlation) to +1.00 (perfect positive correlation). A correlation of zero indicates no linear relationship.
Visual Representation
Data points can be visually represented in scatter plots, allowing for a quick assessment of patterns (Figure 15.2 shows a scatter plot demonstrating the relationship between family income (X) and student grades (Y)).
15.2 The Pearson Correlation
Definition
The Pearson correlation (r) specifically gauges the degree of the linear relationship between two variables. The formula used to compute the Pearson correlation is given as:
r = \frac{SP}{\sqrt{SS_X \cdot SS_Y}}
Where:
SP = Sum of products of deviations, calculated using:
Definitional Formula: SP = \sum (X - M_X)(Y - M_Y)
Computational Formula: SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}
SS_X and SS_Y are the sums of squared deviations for X and Y, measuring the variability of each respective variable.
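As a sketch, the definitional formulas above can be computed directly in Python; the data values here are invented purely for illustration.

```python
# Pearson r from the definitional formulas: r = SP / sqrt(SS_X * SS_Y).
# The sample data are made up for illustration only.

def pearson_r(xs, ys):
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    # SP: sum of products of deviations from the means
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    # SS_X and SS_Y: sums of squared deviations
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5

print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

For these data, SP = 6, SS_X = 10, and SS_Y = 6, so r = 6 / √60 ≈ 0.77 — the same value either formula for SP would produce.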
Characteristics of Pearson Correlation
A positive outcome signifies the same directional movement of both variables.
A negative outcome indicates inverse relationships.
The Pearson correlation can only yield results within the range of -1.00 to +1.00.
Adding a constant to X or Y, or multiplying either variable by a positive constant, does not alter the correlation value. However, multiplying either variable by a negative constant reverses the sign of the correlation coefficient.
15.3 Using and Interpreting the Pearson Correlation
Applications of Correlation
Correlations find utility across various domains, including:
Prediction: Assessing expected outcomes based on known relations, e.g., SAT scores predicting college performance.
Validity: Establishing the effectiveness of assessments through correlation with established measures (e.g., comparing new intelligence tests with standard IQ tests).
Reliability: Verifying consistency across measurements through correlation analysis between repeated tests.
Theory Verification: Testing hypothetical relationships posited by psychological theories through correlation studies.
Critical Considerations
Correlation Does Not Imply Causation: A fundamental caution in interpreting correlations is that a significant correlation does not demonstrate a cause-and-effect relationship between the two variables.
Restricted Range and Outliers: A correlation computed from a restricted range of scores may not describe the relationship across the full range of values, and extreme values (outliers) can drastically inflate or deflate the resulting correlation.
Coefficient of Determination (r²): This value is derived by squaring the correlation coefficient, indicating how much variation in one variable can be explained by its relationship with another.
For instance, a correlation of r = 0.80 leads to r² = 0.64, suggesting that 64% of the variance in Y can be explained by its relationship with X.
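The arithmetic of the example above, as a one-line check:

```python
# Coefficient of determination for the example above: r = .80.
r = 0.80
r_squared = r ** 2
print(round(r_squared, 2))  # 0.64 -> 64% of the variance in Y is accounted for by X
```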
15.4 Hypothesis Tests with the Pearson Correlation
Conducting Hypothesis Tests
When assessing correlations using sample data, hypothesis testing aids in understanding the significance of findings in the broader population. The standard hypotheses framed include:
Null Hypothesis (H0): There is no correlation in the population (ρ = 0).
Alternative Hypothesis (H1): There is a real, nonzero correlation in the population (ρ ≠ 0).
Computing t-Statistic
The t-statistic for correlation is calculated as follows:
t = \frac{r - 0}{\sqrt{\frac{1 - r^2}{n - 2}}}
Comparing this statistic, with df = n - 2, to the critical t value determines whether the correlation is significant or whether the observed correlation is plausibly the result of sampling error.
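The t statistic above translates directly into code; the values of r and n below are invented example inputs, and the critical value would still come from a t table.

```python
# t statistic for testing H0: rho = 0, with df = n - 2.
# The r and n values are made up for illustration.
import math

def correlation_t(r, n):
    """t = (r - 0) / sqrt((1 - r^2) / (n - 2))."""
    return (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))

t = correlation_t(0.50, 27)  # r = .50 from a sample of n = 27, so df = 25
print(round(t, 2))           # 2.89
```

The resulting t would then be compared with the two-tailed critical value for the chosen alpha level at df = 25.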
15.5 Alternatives to the Pearson Correlation
Spearman Correlation
The Spearman correlation (rS) measures the strength of a monotonic relationship between two variables when the data are ordinal (or have been converted to ranks). It is calculated by applying the Pearson formula to the ranked data, which reduces the influence of outliers and of departures from linearity.
Point-Biserial and Phi-Coefficient
Point-Biserial Correlation: Used for measuring relationships when one variable is dichotomous (containing two categories).
Phi-Coefficient: Applied in cases where both variables are dichotomous.
Practical Applications
Both methods expand the utility of correlation, allowing research across diverse data types and structures.
Summary
Correlation measures the relationship characterized by direction, form, and strength.
The Pearson correlation is commonly utilized for linear relationships.
The significance of correlations must be interpreted carefully, keeping in mind that correlation does not imply causation and that restricted ranges and outliers can distort the coefficient.
Extended forms of correlation (Spearman, point-biserial, phi) allow analysis across other types of data representations.
Key Terms
Correlation: A statistical measure describing the relationship between two variables.
Pearson Correlation: A method for measuring linear relationships.
Spearman Correlation: A method for assessing monotonic relationships using ranks.
Point-Biserial Correlation: For examining relations involving dichotomous variables.
Phi-Coefficient: Measures correlation between two dichotomous variables.