Computing r

Introduction to Pearson's Correlation

  • Pearson's correlation coefficient measures the strength and direction of the linear relationship between two variables, typically denoted as x and y.

  • Several formulas exist for computing Pearson's correlation, with some offering conceptual clarity and others providing computational ease.

Calculating Pearson's Correlation Using Deviation Scores

Step 1: Compute Total and Mean for x

  • Total for x: The first step is to add up all the values of x.

  • For example, if the total is 20 and the sample size is 5, the mean of x is:

    • $\text{Mean of } x = \frac{\text{Total of } x}{\text{Sample size}} = \frac{20}{5} = 4$
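A minimal Python sketch of Step 1. The original data are not listed here, so the x values below are hypothetical, chosen only to match the stated total of 20 and sample size of 5:

```python
# Hypothetical x values (not from the original text), chosen so that
# the total is 20 and the sample size is 5.
x = [1, 3, 5, 5, 6]

total_x = sum(x)       # Step 1a: total of x
n = len(x)             # sample size
mean_x = total_x / n   # Step 1b: mean = total / sample size

print(total_x, n, mean_x)  # 20 5 4.0
```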

Step 2: Create Deviation Scores for x

  • Once the mean is calculated, subtract this mean from each value of x to produce a new column, referred to as small x.

  • Calculation of small x: For each value of x, perform the following:

    • $\text{small } x = x - \text{Mean of } x$

  • Notably, the mean of small x is always zero, because deviations from the mean sum to zero: $\text{Mean of small } x = 0$
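Step 2 as a short sketch, again on hypothetical x values (not the original data). The last line checks that the deviation scores average to zero:

```python
# Hypothetical x values (illustrative only).
x = [1, 3, 5, 5, 6]
mean_x = sum(x) / len(x)  # 4.0

# Step 2: deviation scores ("small x") = each value minus the mean of x
small_x = [xi - mean_x for xi in x]

print(small_x)                      # [-3.0, -1.0, 1.0, 1.0, 2.0]
print(sum(small_x) / len(small_x))  # 0.0, the mean of the deviations
```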

Step 3: Create Deviation Scores for y

  • Repeat the process for the variable y to derive the small y column.

  • As with small x, subtract the mean of y from each value of y:

    • $\text{small } y = y - \text{Mean of } y$
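Because Steps 2 and 3 are the same operation, one option is a small helper that works for either variable. The name `deviation_scores` is hypothetical, and the y values are illustrative rather than taken from the original example:

```python
def deviation_scores(values):
    """Hypothetical helper: return each value minus the mean of the values."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical y values (illustrative only); their mean is 9.0.
y = [4, 6, 10, 12, 13]
small_y = deviation_scores(y)

print(small_y)       # [-5.0, -3.0, 1.0, 3.0, 4.0]
print(sum(small_y))  # 0.0, since deviations from the mean sum to zero
```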

Step 4: Understand Deviations

  • The values in small x and small y are considered deviation scores as they represent deviations from their respective means.

Step 5: Compute the Product of Deviation Scores

  • Multiply each small x by the small y in the same row to create a new column, referred to as the x y column.
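A sketch of Step 5, pairing the deviation scores row by row with `zip`. The data are hypothetical illustrative values, restated so the snippet runs on its own:

```python
# Hypothetical data (illustrative only).
x = [1, 3, 5, 5, 6]
y = [4, 6, 10, 12, 13]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
small_x = [xi - mean_x for xi in x]
small_y = [yi - mean_y for yi in y]

# Step 5: the "x y" column is the row-by-row product of the deviations
xy = [a * b for a, b in zip(small_x, small_y)]

print(xy)       # [15.0, 3.0, 1.0, 3.0, 8.0]
print(sum(xy))  # 30.0
```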

Step 6: Create Squared Deviation Columns

  • Create two additional columns:

    • small x squared: $(\text{small } x)^2$

    • small y squared: $(\text{small } y)^2$
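Step 6 in the same style, squaring each deviation score. The data are hypothetical; the sums of these two columns are what enter the denominator of r:

```python
# Hypothetical data (illustrative only).
x = [1, 3, 5, 5, 6]
y = [4, 6, 10, 12, 13]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
small_x = [xi - mean_x for xi in x]
small_y = [yi - mean_y for yi in y]

# Step 6: squared deviation columns
small_x_sq = [a * a for a in small_x]
small_y_sq = [b * b for b in small_y]

print(small_x_sq, sum(small_x_sq))  # [9.0, 1.0, 1.0, 1.0, 4.0] 16.0
print(small_y_sq, sum(small_y_sq))  # [25.0, 9.0, 1.0, 9.0, 16.0] 60.0
```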

Analyzing the x y Column

  • The sum of the x y column (or its mean, which has the same sign) indicates the direction of the relationship between x and y:

    • If there is no relationship between x and y, a positive small x is as likely to be paired with a negative small y as with a positive one, so the positive and negative products cancel and the sum of the x y column is close to zero.

    • Conversely, if high values of x go with high values of y (and low values with low values), small x and small y tend to share the same sign, so the entries in the x y column are predominantly positive.

    • If high x values were instead paired with low y values, small x and small y would tend to have opposite signs, producing predominantly negative entries in the x y column.
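To make the sign argument concrete, the sketch below sums the x y column for three small hypothetical data sets: one where y rises with x, one where y falls as x rises, and one constructed so that the positive and negative products cancel exactly. The function name `sum_of_products` is illustrative:

```python
def sum_of_products(x, y):
    """Sum of the x y column: products of paired deviation scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]

# Hypothetical y values illustrating each case.
print(sum_of_products(x, [2, 4, 6, 8, 10]))  # 20.0  (high with high)
print(sum_of_products(x, [10, 8, 6, 4, 2]))  # -20.0 (high with low)
print(sum_of_products(x, [2, 4, 3, 4, 2]))   # 0.0   (products cancel)
```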

Final Computation of Pearson’s Correlation

  • Pearson's correlation coefficient (r) is then computed from the sums of these columns:

    • $r = \frac{\sum (\text{small } x \cdot \text{small } y)}{\sqrt{\sum (\text{small } x)^2 \cdot \sum (\text{small } y)^2}}$

  • For the given example, the computed value of r is 0.968, which indicates a very strong positive correlation.
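Putting Steps 1 to 6 and the formula together, a minimal sketch of the deviation-score computation might look like this. The function name and data are hypothetical; for these illustrative values it returns about 0.968. If either variable has no variation, r is undefined and this sketch raises `ZeroDivisionError`:

```python
import math

def pearson_r(x, y):
    """Pearson's r via deviation scores (hypothetical helper name)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    small_x = [xi - mean_x for xi in x]   # deviation scores for x
    small_y = [yi - mean_y for yi in y]   # deviation scores for y
    sum_xy = sum(a * b for a, b in zip(small_x, small_y))  # sum of x y column
    sum_x2 = sum(a * a for a in small_x)  # sum of small x squared
    sum_y2 = sum(b * b for b in small_y)  # sum of small y squared
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical illustrative data.
x = [1, 3, 5, 5, 6]
y = [4, 6, 10, 12, 13]
print(round(pearson_r(x, y), 3))  # 0.968
```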

Alternative Computational Formula

  • An alternative computational formula works directly with the raw values of x and y:

    • It avoids computing deviation scores altogether, which makes it convenient when working with a simple calculator.
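One standard raw-score form, which we assume is the formula meant here, is $r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{N}\right)\left(\sum y^2 - \frac{(\sum y)^2}{N}\right)}}$, where x and y denote the raw values (not the deviation scores) and N is the sample size. A sketch, with a hypothetical function name and illustrative data:

```python
import math

def pearson_r_raw(x, y):
    """Pearson's r from raw sums, with no deviation scores (hypothetical name)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))  # sum of raw products
    sum_x2 = sum(a * a for a in x)             # sum of raw x squared
    sum_y2 = sum(b * b for b in y)             # sum of raw y squared
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Hypothetical illustrative data.
x = [1, 3, 5, 5, 6]
y = [4, 6, 10, 12, 13]
print(round(pearson_r_raw(x, y), 3))  # 0.968
```

Algebraically this gives the same r as the deviation-score formula. In floating-point code, though, subtracting two large, nearly equal sums can lose precision, so the deviation-score version is usually the safer choice in software.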

Conclusion

  • Pearson's r summarizes the direction and strength of the linear relationship between two variables in a single number between -1 and 1.

  • Computing it by hand comes down to forming the deviation scores and then the product and squared-deviation columns whose sums enter the formula.