Computing Pearson's r
Introduction to Pearson's Correlation
Pearson's correlation coefficient measures the strength and direction of the linear relationship between two variables, typically denoted as x and y.
Several formulas exist for computing Pearson's correlation, with some offering conceptual clarity and others providing computational ease.
Calculating Pearson's Correlation Using Deviation Scores
Step 1: Compute Total and Mean for x
Total for x: The first step is to calculate the total for the variable x.
For example, if the total is 20 and the sample size is 5, the mean ($M_x$) can be calculated as follows:
$$M_x = \frac{\sum X}{N} = \frac{20}{5} = 4$$
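As a minimal sketch of Step 1, the snippet below computes the total and the mean in Python. The five x values are hypothetical (they do not appear in the text) and were chosen only so that they sum to 20 with a sample size of 5, matching the figures quoted above.

```python
# Hypothetical raw scores for x (an assumption, not data from the text);
# they sum to 20 with a sample size of 5.
x_raw = [1, 3, 5, 5, 6]

total_x = sum(x_raw)   # total for x
n = len(x_raw)         # sample size
mean_x = total_x / n   # mean of x

print(total_x, n, mean_x)  # 20 5 4.0
```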
Step 2: Create Deviation Scores for x
Once the mean is calculated, subtract this mean from each value of x to produce a new column, referred to as small x.
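Calculation of small x: For each value of x, subtract the mean, writing the raw score as $X$, its mean as $M_x$, and the deviation score as $x$:
$$x = X - M_x$$
Notably, the mean of small x will always equal zero, because the deviations above and below the mean cancel:
$$\frac{\sum x}{N} = \frac{\sum (X - M_x)}{N} = 0$$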
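The deviation-score step can be sketched in a few lines of Python. The raw scores are hypothetical values chosen to have a total of 20 and a sample size of 5; only the procedure comes from the text.

```python
# Hypothetical raw scores for x (assumed for illustration).
x_raw = [1, 3, 5, 5, 6]

mean_x = sum(x_raw) / len(x_raw)

# Subtract the mean from every raw score to get the "small x" column.
small_x = [value - mean_x for value in x_raw]

# The deviation scores sum to zero, so their mean is zero as well.
mean_small_x = sum(small_x) / len(small_x)

print(small_x)       # [-3.0, -1.0, 1.0, 1.0, 2.0]
print(mean_small_x)  # 0.0
```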
Step 3: Create Deviation Scores for y
Repeat the process for the variable y to derive the small y column.
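As with small x, each value of y is adjusted by subtracting the mean of y, writing the raw score as $Y$, its mean as $M_y$, and the deviation score as $y$:
$$y = Y - M_y$$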
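Because the same operation is applied to both variables, it can be wrapped in a small helper. The function name `deviation_scores` and the y values below are hypothetical and used only for illustration.

```python
import math


def deviation_scores(values):
    """Return each value minus the mean of all values."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical raw scores for y (assumed, not taken from the text).
y_raw = [4, 6, 10, 12, 13]
small_y = deviation_scores(y_raw)

print(small_y)  # [-5.0, -3.0, 1.0, 3.0, 4.0]

# With non-integer means, floating-point rounding can leave a tiny residue,
# so the zero-sum property is checked with a tolerance.
sums_to_zero = math.isclose(sum(small_y), 0.0, abs_tol=1e-9)
print(sums_to_zero)  # True
```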
Step 4: Understand Deviations
The values in small x and small y are considered deviation scores as they represent deviations from their respective means.
Step 5: Compute the Product of Deviation Scores
Multiply each value of small x by the corresponding value of small y to generate a new column, referred to as the x y column.
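A short sketch of the product column follows. The two lists of deviation scores are hypothetical; each one sums to zero, as deviation scores must.

```python
# Hypothetical deviation scores (assumed for illustration).
small_x = [-3, -1, 1, 1, 2]
small_y = [-5, -3, 1, 3, 4]

# Pair the scores row by row and multiply them to build the "x y" column.
xy = [dx * dy for dx, dy in zip(small_x, small_y)]
sum_xy = sum(xy)

print(xy)      # [15, 3, 1, 3, 8]
print(sum_xy)  # 30
```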
Step 6: Create Squared Deviation Columns
Create two additional columns:
small x squared: $x^2 = (X - M_x)^2$
small y squared: $y^2 = (Y - M_y)^2$
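The squared columns can be built the same way as the product column. The deviation scores below are hypothetical values used only to illustrate the step.

```python
# Hypothetical deviation scores (assumed for illustration).
small_x = [-3, -1, 1, 1, 2]
small_y = [-5, -3, 1, 3, 4]

# Square every deviation score; squaring removes the sign.
x_squared = [dx ** 2 for dx in small_x]
y_squared = [dy ** 2 for dy in small_y]

print(x_squared, sum(x_squared))  # [9, 1, 1, 1, 4] 16
print(y_squared, sum(y_squared))  # [25, 9, 1, 9, 16] 60
```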
Analyzing the x y Column
The mean of the x y column is crucial in understanding the relationship between x and y:
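If there is no relationship between x and y, a positive value of small x is as likely to be paired with a negative value of small y as with a positive one, so the positive and negative products tend to cancel and the x y column sums to approximately zero.
Conversely, if high values of x are associated with high values of y (and low values of x with low values of y), the paired deviation scores share the same sign and the values in the x y column will be predominantly positive.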
Example: High x values may correlate with high y values, leading to positive entries in x y.
If high x values were paired with low y values, we would see negative entries in the x y column.
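The three cases can be illustrated with a small sketch. All data sets and the helper name `sum_of_products` are hypothetical; the "no relationship" set was constructed so that the products cancel exactly, which real data would only do approximately.

```python
def sum_of_products(x_values, y_values):
    """Sum of the products of paired deviation scores."""
    mean_x = sum(x_values) / len(x_values)
    mean_y = sum(y_values) / len(y_values)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))


x = [1, 2, 3, 4, 5]           # hypothetical x values
y_pos = [2, 4, 6, 8, 10]      # high x paired with high y
y_neg = [10, 8, 6, 4, 2]      # high x paired with low y
y_none = [3, 1, 2, 1, 3]      # no linear relationship with x

print(sum_of_products(x, y_pos))   # 20.0
print(sum_of_products(x, y_neg))   # -20.0
print(sum_of_products(x, y_none))  # 0.0
```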
Final Computation of Pearson’s Correlation
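Pearson's correlation coefficient (r) is computed from the sums of the x y, small x squared, and small y squared columns, where $x$ and $y$ denote the deviation scores:
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$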
For the given example, the computed value of r is 0.968, which indicates a very strong positive correlation.
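Putting the steps together, a minimal sketch of the deviation-score computation might look like the following. The function name `pearson_r` and the raw scores are assumptions: the data are hypothetical values chosen to be consistent with the total of 20, the sample size of 5, and the r of 0.968 quoted in the text, and are not necessarily the original example.

```python
import math


def pearson_r(x_values, y_values):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(x_values) / len(x_values)
    mean_y = sum(y_values) / len(y_values)
    small_x = [x - mean_x for x in x_values]
    small_y = [y - mean_y for y in y_values]

    sum_xy = sum(dx * dy for dx, dy in zip(small_x, small_y))
    sum_x2 = sum(dx ** 2 for dx in small_x)
    sum_y2 = sum(dy ** 2 for dy in small_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Hypothetical raw scores (assumed for illustration).
x_raw = [1, 3, 5, 5, 6]
y_raw = [4, 6, 10, 12, 13]

r = pearson_r(x_raw, y_raw)
print(round(r, 3))  # 0.968
```

Here the column sums are 30, 16, and 60, so r = 30 / sqrt(16 × 60) = 30 / sqrt(960).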
Alternative Computational Formula
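An alternative formula computes Pearson's correlation directly from the raw scores, written here as $X$ and $Y$, with $N$ the sample size. One common form is:
$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$
This formula avoids computing the deviation scores and is particularly useful when working with a simple calculator.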
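A raw-score version can be sketched as below. It assumes one common computational form, in which only five running sums are needed; the function name `pearson_r_raw` and the data are hypothetical, and the text's own alternative formula may be arranged differently.

```python
import math


def pearson_r_raw(x_values, y_values):
    """Pearson's r from raw-score sums, without building deviation columns."""
    n = len(x_values)
    sum_x = sum(x_values)
    sum_y = sum(y_values)
    sum_xy = sum(x * y for x, y in zip(x_values, y_values))
    sum_x2 = sum(x ** 2 for x in x_values)
    sum_y2 = sum(y ** 2 for y in y_values)

    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Hypothetical raw scores (assumed for illustration).
x_raw = [1, 3, 5, 5, 6]
y_raw = [4, 6, 10, 12, 13]

r_raw = pearson_r_raw(x_raw, y_raw)
print(round(r_raw, 3))  # 0.968
```

For these values the numerator is 210 − (20 × 45) / 5 = 30 and the two bracketed terms are 16 and 60, the same quantities the deviation-score method produces.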
Conclusion
Understanding the relationship between two variables through Pearson's correlation can yield insights into their interplay and is foundational in statistical analysis.
Understanding deviation scores and how the product and squared terms are constructed is essential for calculating the correlation accurately.