The Evolution of Pearson's Correlation Coefficient

Introduction to Correlation in Statistics

  • Definition of Correlation:

    • Correlation is a measure of the direction (positive or negative) and strength of the linear relationship between two quantitative variables.

  • Pearson's Correlation Coefficient (r):

    • Typically taught in high school and introductory college-level statistics.

    • The article explores an activity that aids in understanding its formula and interpretation through scatter plots.

Activity Overview

  • Introduces Quadrant Count Ratio (QCR) as an intermediate measure of association.

  • Discusses how the shortcomings of the QCR motivate Pearson's r.

  • Aligns with GAISE Report (Franklin et al., 2007):

    • Promotes statistical literacy and reasoning skills for high school graduates.

    • Describes the statistics education framework in three developmental levels (A, B, and C).

Developmental Levels
  1. Level A: Introduction to statistical concepts through simple activities.

  2. Level B: Building upon foundational concepts and introducing more complex ideas.

  3. Level C: Advanced statistical thinking and the ability to understand deeper statistical methods.

Understanding Association Between Variables

  • Definition:

    • Two variables are associated if the values of one variable tend to occur more frequently with certain values of the other variable (Moore and McCabe, 2003).

    • Important for making predictions about one variable based on another.

Example of Association
  • Anthropometric Question:

    • Is there a relationship between arm span and height?

    • Height is treated as the independent (x) variable and arm span as the dependent (y) variable.

Scatter Plots
  • Scatter plots are the primary graphical tool for exploring the association between two quantitative variables.

  • Example measurement data: Height and arm span of 25 students (in centimeters).

Features of a Relationship Identified in Scatter Plots
  1. Direction: Ascending or descending pattern.

  2. Strength: How closely the points follow the overall pattern (for a linear relationship, how tightly they cluster around a line).

  3. Form: Linear or non-linear trend.

Quadrant Count Ratio (QCR)

  • QCR Definition:

    • QCR = \frac{(Q_{I} + Q_{III}) - (Q_{II} + Q_{IV})}{n}

  • Components:

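    • Q_I, Q_II, Q_III, and Q_IV are the numbers of points in each quadrant, where the quadrants are formed by a vertical line at the mean of x and a horizontal line at the mean of y.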

    • n = total number of observations.
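The definition above can be sketched in Python: locate each point relative to the sample means, count the points per quadrant, and apply the formula. This is a minimal illustration under stated assumptions, not the article's own procedure. The function name quadrant_count_ratio and the six data pairs are hypothetical, and the sketch assumes that a point lying exactly on a mean line is counted in no quadrant but still contributes to n (conventions for such points vary).

```python
from statistics import mean


def quadrant_count_ratio(xs, ys):
    """QCR = ((Q_I + Q_III) - (Q_II + Q_IV)) / n, with quadrants split at the means.

    Assumption: a point lying exactly on either mean line is not counted
    in any quadrant, but it still contributes to n.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_bar, y_bar = mean(xs), mean(ys)
    q1 = q2 = q3 = q4 = 0
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        if dx > 0 and dy > 0:
            q1 += 1  # upper right
        elif dx < 0 and dy > 0:
            q2 += 1  # upper left
        elif dx < 0 and dy < 0:
            q3 += 1  # lower left
        elif dx > 0 and dy < 0:
            q4 += 1  # lower right
    return ((q1 + q3) - (q2 + q4)) / len(xs)


# Hypothetical heights (x) and arm spans (y) in centimeters, for illustration only.
heights = [152, 158, 165, 171, 178, 184]
spans = [150, 170, 163, 174, 166, 186]
print(quadrant_count_ratio(heights, spans))  # → 0.3333333333333333
```

Here two points fall in each of quadrants I and III and one in each of quadrants II and IV, so the QCR is (4 - 2) / 6.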

Example Calculation for Arm Span and Height
  • Quadrants I and III contain 8 and 15 points and quadrants II and IV contain 1 point each, so n = 25:
    QCR = \frac{(8 + 15) - (1 + 1)}{25} = \frac{21}{25} = 0.84

  • Interpretation: Indicates a strong positive association between arm span and height.
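The arithmetic in this example can be checked directly from the quadrant counts. The helper qcr_from_counts is hypothetical, and defaulting n to the sum of the four counts assumes no point lies on a mean line. Because the formula only uses the sums Q_I + Q_III and Q_II + Q_IV, it does not matter which of the 8 and 15 points belong to quadrant I and which to quadrant III.

```python
def qcr_from_counts(q1, q2, q3, q4, n=None):
    """QCR from quadrant counts; n defaults to the total of the four counts."""
    if n is None:
        n = q1 + q2 + q3 + q4  # assumes no points lie on a mean line
    return ((q1 + q3) - (q2 + q4)) / n


# Arm span and height example: 8 + 15 points in I and III, 1 + 1 in II and IV.
print(qcr_from_counts(8, 1, 15, 1))  # → 0.84
```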

Properties of the QCR

  • Range:

    • QCR always lies between -1 and 1, inclusive, because the absolute value of its numerator cannot exceed n.

  • Units Independence:

    • QCR does not depend on the units of measurement: expressing height and arm span in inches instead of centimeters leaves every point in the same quadrant, so the counts and the QCR are unchanged.

  • Explored through Scatter Plots:

    • Figures 2-7 demonstrate various properties of QCR.
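The unit-independence property can be checked numerically. This sketch uses an equivalent, more compact form of the QCR: the average sign of (x - x̄)(y - ȳ), which is +1 for a point in quadrant I or III, -1 for quadrant II or IV, and 0 for a point on a mean line. The function name qcr and the measurements are hypothetical; dividing by 2.54 converts centimeters to inches.

```python
from statistics import mean


def qcr(xs, ys):
    """Average sign of (x - x_bar)(y - y_bar), i.e. ((Q_I + Q_III) - (Q_II + Q_IV)) / n."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        total += (product > 0) - (product < 0)  # +1, -1, or 0
    return total / len(xs)


# Hypothetical measurements in centimeters.
heights_cm = [152, 158, 165, 171, 178, 184]
spans_cm = [150, 170, 163, 174, 166, 186]

# Re-express both variables in inches (1 in = 2.54 cm).
heights_in = [h / 2.54 for h in heights_cm]
spans_in = [s / 2.54 for s in spans_cm]

print(qcr(heights_cm, spans_cm), qcr(heights_in, spans_in))  # prints the same value twice
```

Any rescaling by a positive factor, with or without a shift, keeps each point on the same side of both mean lines, which is why the counts do not change.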

Questions for Exploring the Properties
  1. Is the general trend positive or negative?

  2. How are the points distributed across the quadrants?

  3. Does the relationship appear linear?

  4. What strength does the QCR suggest?

Specific Properties Explained

  • Property 1:

    • QCR is positive when most points lie in quadrants I and III, and negative when most lie in quadrants II and IV.

  • Property 2:

    • QCR is close to zero when the association is weak, because the points are spread roughly evenly across the four quadrants.

  • Property 3:

    • QCR equals 1 when all points lie in quadrants I and III, and -1 when all points lie in quadrants II and IV.
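The three properties can be illustrated with small, hypothetical data sets chosen so that the points fall entirely in quadrants I and III, entirely in quadrants II and IV, or evenly across all four. The sketch uses the average-sign form of the QCR; the function name and data are assumptions for illustration.

```python
from statistics import mean


def qcr(xs, ys):
    """QCR as the average sign of (x - x_bar)(y - y_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        total += (product > 0) - (product < 0)  # +1 in I/III, -1 in II/IV, 0 on a mean line
    return total / len(xs)


xs = [1, 2, 3, 4, 5, 6]
print(qcr(xs, [2, 4, 6, 8, 10, 12]))    # → 1.0  (all points in I and III)
print(qcr(xs, [12, 10, 8, 6, 4, 2]))    # → -1.0 (all points in II and IV)
print(qcr([1, 2, 3, 4], [3, 1, 4, 2]))  # → 0.0  (one point in each quadrant)
```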

Transitioning to Pearson's Correlation Coefficient
  • Distance Calculation:

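    • QCR counts points but ignores how far each point lies from the mean lines; using the signed distances from each point to the mean lines gives a more sensitive measure of the strength of association.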

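  • Calculate Pearson's r: r = \frac{1}{n - 1} \sum_{i=1}^{n} z_{x_i} z_{y_i}, where z_{x_i} = \frac{x_i - \bar{x}}{s_x}, z_{y_i} = \frac{y_i - \bar{y}}{s_y}, and s_x and s_y are the sample standard deviations.

    • Each product z_{x_i} z_{y_i} is positive for a point in quadrant I or III and negative for a point in quadrant II or IV, so r extends the QCR by weighting each point by its standardized distances from the mean lines.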
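The z-score formula for r translates almost line for line into code. This is a minimal sketch; the function name pearson_r and the data are hypothetical. It uses statistics.stdev, the sample standard deviation (which divides by n - 1), so that the 1/(n - 1) factor matches the formula; with population standard deviations the factor would be 1/n instead.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]  # standardized distance from the vertical mean line
    z_y = [(y - y_bar) / s_y for y in ys]  # standardized distance from the horizontal mean line
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Hypothetical heights (x) and arm spans (y) in centimeters.
heights = [152, 158, 165, 171, 178, 184]
spans = [150, 170, 163, 174, 166, 186]
print(round(pearson_r(heights, spans), 3))
```

On Python 3.10 or later, statistics.correlation(heights, spans) computes the same coefficient and can serve as a cross-check, up to floating-point rounding.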

Properties of Pearson's r

  1. Direction:

    • r is positive when the trend is ascending and negative when it is descending.

  2. Weak Correlation:

    • r is close to zero when the linear association is weak.

  3. Perfect Correlation:

    • r = 1 or r = -1 only when all points lie exactly on a straight line with positive or negative slope, respectively.
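These properties can be checked on small, hypothetical data sets: points exactly on an ascending line, points exactly on a descending line, and points with no linear trend. The sketch repeats a compact version of the z-score formula so that it runs on its own; the function name and data are assumptions, and floating-point rounding may make the printed values differ from 1, -1, and 0 in the last few digits.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)


xs = [1, 2, 3, 4, 5, 6]
print(pearson_r(xs, [2 * x + 3 for x in xs]))     # exact ascending line: 1 up to rounding
print(pearson_r(xs, [10 - 0.5 * x for x in xs]))  # exact descending line: -1 up to rounding
print(pearson_r(xs, [5, 3, 6, 2, 6, 4]))          # no linear trend: 0 up to rounding
```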

Comparisons Between QCR and Pearson's r

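  • Use of Scatter Plots:

    • A scatter plot is still needed because neither measure describes the form of a relationship; a strong curved pattern can produce a QCR or r near zero.

  • Direction and Strength:

    • Both measures indicate the direction of an association, but Pearson's r also reflects how closely the points follow a straight line, because it uses distances rather than counts.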
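Two hypothetical data sets illustrate how the measures differ. Points on the curve y = x³, with x values symmetric about zero, all fall in quadrants I and III, so the QCR is exactly 1 even though the points are not on a line, while r stays below 1. Points on the parabola y = x² give a QCR and r of zero despite a perfect curved relationship. The function names and data are illustrative assumptions.

```python
from statistics import mean, stdev


def qcr(xs, ys):
    """QCR as the average sign of (x - x_bar)(y - y_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        total += (product > 0) - (product < 0)  # counts only which quadrant, not distance
    return total / len(xs)


def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)


cubic_x = [-3, -2, -1, 1, 2, 3]
cubic_y = [x ** 3 for x in cubic_x]  # increasing but curved
print(qcr(cubic_x, cubic_y), round(pearson_r(cubic_x, cubic_y), 2))  # → 1.0 0.93

parab_x = [-2, -1, 0, 1, 2]
parab_y = [x ** 2 for x in parab_x]  # perfect U-shaped relationship
print(qcr(parab_x, parab_y), pearson_r(parab_x, parab_y))  # QCR is 0.0; r is 0 up to rounding
```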

Summary

  • The article emphasizes understanding how Pearson's correlation coefficient represents the direction and strength of linear relationships in quantitative data.

  • Implementation in classrooms includes practical data collection and exploration through scatter plots.

  • Suggested resources include online platforms that further enhance learning.