Inferential Statistics: The Pearson r Coefficient

Criminal Justice Inferential Statistics Lectures

CJ 3320 Criminal Justice Statistics Course Notes
Dr. Kevin Buckler

The Pearson r Coefficient

  • The Pearson r coefficient is a type of Degree of Association (DoA) measure.

    • Degree of Association (DoA) Definition: A statistical measure that quantifies the relationship between two variables.

  • Uses of the Pearson r:

    • Conditions for Use:

    1. When the independent variable (IV) is interval and the dependent variable (DV) is also interval.

    2. When the IV is a nominal dichotomous variable (coded as 0 and 1) and the DV is interval.

      • Nominal Dichotomous Variable Coding:

      • 1 indicates presence of the quality being measured.

      • 0 indicates absence of that quality.

Overview of the Pearson r

  • The Pearson r Correlation Coefficient provides a numeric expression of the strength and direction of the relationship between two variables: one independent and one dependent.

    • Correlation Coefficient Definition: A statistical index ranging between -1 and 1 that indicates the degree of association between two variables.

  • Values of the Pearson r:

    • Range: -1 to 1.

    • A value of 1 indicates a perfect positive correlation, and -1 indicates a perfect negative correlation.

Zero-Order Pearson r

  • Formula for Zero-Order Pearson r Correlation Coefficient:

    • $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$

    • Where:

    • $X$ = The raw score of the case for the Independent Variable

    • $\bar{X}$ = The mean of the Independent Variable

    • $Y$ = The raw score of the case for the Dependent Variable

    • $\bar{Y}$ = The mean of the Dependent Variable

  • Use of technology:

    • Software like Excel provides zero-order coefficients with ease.
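The zero-order formula can also be traced by hand in a few lines of code. The following is a minimal sketch in Python, not the course's required software; the function name `pearson_r` and the small data set are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Zero-order Pearson r computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two cases")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the cross-products of the deviations
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    # Note: r is undefined (division by zero) if either variable never varies
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for five cases (not from the course data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

Each line of the function maps onto one piece of the formula, which makes it easier to see what spreadsheet software is doing behind the scenes.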

Example of Zero-Order Pearson r

  • Variables studied:

    1. Sex of the driver of a vehicle stopped by the police (IV)

    2. Number of MPH over the speed limit (DV)

    • Hypothesis: Males drive faster than females due to biological or sociocultural factors.

    • Coding:

    • Male = 1

    • Female = 0

  • Zero-Order Pearson r Output:

    • MPH Over (Y) for Male (x):

    • Pearson's r: 0.217945
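With the IV coded Male = 1 and Female = 0, a positive r means the group coded 1 has the higher mean on the DV. The sketch below illustrates this with made-up traffic stops; the data and the helper `pearson_r` are hypothetical and are not the data behind the output above.

```python
from math import sqrt

def pearson_r(x, y):
    """Zero-order Pearson r from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical stops: sex coded Male = 1, Female = 0
sex = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical MPH over the speed limit for the same eight stops
mph_over = [15, 12, 18, 11, 10, 14, 9, 11]

# Group means on the DV, split by the 0/1 code
male_mean = sum(m for s, m in zip(sex, mph_over) if s == 1) / sex.count(1)
female_mean = sum(m for s, m in zip(sex, mph_over) if s == 0) / sex.count(0)

r = pearson_r(sex, mph_over)
print(male_mean, female_mean, round(r, 3))
```

If the coding were reversed (Female = 1, Male = 0), the size of r would stay the same and only its sign would flip.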

Statistics of Correlation

  • Types of correlation:

    1. Positive Correlation:

    • As the independent variable increases (x-axis), the dependent variable also increases (y-axis).

    • Represented by an upward slope on a scatterplot.

    2. Negative Correlation:

    • As the independent variable increases (x-axis), the dependent variable decreases (y-axis).

    • Represented by a downward slope.

    3. No Correlation:

    • No discernible relationship; the best-fit line through the scatterplot is roughly horizontal.

Statistical Control

  • Statistical Control Definition:

    • A method in statistics that involves understanding the relationship between two primary variables while controlling for the effects of one or more additional variables.

    • Provides a clearer insight into the independent variable’s effect on the dependent variable.

  • Importance of Statistical Control:

    • Allows researchers to isolate the effect of the independent variable (IV) after accounting for potential confounding factors.

Example of Statistical Control

  • Researcher examining whether male drivers speed more than female drivers must consider the potential influence of driver age on speed.

    • Variables of Interest:

    • DV: MPH over the speed limit

    • IV: Sex of the driver (coded as Male=1, Female=0)

    • Control Variable: Age of the driver

  • Assumed Relationship:

    • The relationship between sex and speeding may be spurious if it is actually driven by age, since younger drivers tend to speed more.

Language of Statistical Control

  • Multivariate Research:

    • Involves more than one causal or predictor variable.

  • Bivariate Relationship:

    • A direct relationship between only two variables.

  • Correlation Coefficient Types:

    1. Normal Correlation Coefficient: Strength without controlling variables.

    2. Partial Correlation Coefficient: Strength controlling for additional variables.

    3. Ordered Correlation: The order of a correlation is the number of control variables factored into the bivariate relationship.

    4. Zero-Order Correlation: No controls; a simple bivariate relationship.

    5. First-Order Correlation: One control variable included.

    6. Second-Order Correlation: Two control variables included.

    7. Third-Order Correlation: Three control variables included.

  • Variables Explained:

    • X Variable: Independent variable

    • Y Variable: Dependent variable

    • Z Variable: Control variable

Partial Correlation Coefficient

  • Definition: Allows analysis of the relationship between IV (X) and DV (Y) while controlling for a third variable (Z).

  • Particular Focus: Assessing the impact of sex of drivers on speed while subtracting the age variable’s influence.

  • Example Values:

    • IV: Sex of the driver (X)

    • DV: MPH over the speed limit (Y)

    • Control Variable: Age of the driver (Z)

Calculating Partial Correlation for Statistical Control

  • Provided formula: $r_{XY.Z} = \frac{r_{XY} - r_{XZ} r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$

    • Example output:

    • Partial Pearson r = 0.203

    • Indicates relationship strength of driver sex and speeding while controlling for age.
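The first-order partial formula needs only the three zero-order correlations among X, Y, and Z. A minimal Python sketch follows; the function name `partial_r` and all three input values are hypothetical, so the result will not match the 0.203 reported above.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    # Numerator: the X-Y correlation minus the part shared through Z
    numerator = r_xy - r_xz * r_yz
    # Denominator: variation in X and in Y left over after removing Z
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical zero-order correlations (assumed for illustration only)
r_xy = 0.30   # sex with MPH over the limit
r_xz = -0.20  # sex with age
r_yz = -0.40  # MPH over the limit with age

print(round(partial_r(r_xy, r_xz, r_yz), 3))
```

When Z is unrelated to both X and Y (both correlations equal 0), the partial correlation is simply the zero-order correlation.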

Multiple Control Variables in Partial Correlation

  • The analysis isn’t limited to one control variable; it can encompass multiple controls.

    • Examples of controls:

    1. Age 24 and under (Z1)

    2. Number of prior tickets in the past 12 months (Z2)

    3. Time of day (Z3)
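One common way to handle several controls is to apply the first-order formula repeatedly, removing one control variable at a time. The sketch below assumes this recursive approach; it is not necessarily how any particular software package computes the result. The function name `partial_r`, the variable ordering, and the correlation matrix are all hypothetical.

```python
from math import sqrt

def partial_r(r, i, j, controls):
    """Partial correlation between variables i and j, holding `controls` constant.

    r is a symmetric matrix (list of lists) of zero-order correlations;
    i, j, and the entries of controls are row/column positions in r.
    """
    if not controls:
        # No controls left: fall back to the zero-order correlation
        return r[i][j]
    z, rest = controls[0], controls[1:]
    # Each term is itself a partial correlation of one lower order
    r_ij = partial_r(r, i, j, rest)
    r_iz = partial_r(r, i, z, rest)
    r_jz = partial_r(r, j, z, rest)
    return (r_ij - r_iz * r_jz) / sqrt((1 - r_iz ** 2) * (1 - r_jz ** 2))

# Hypothetical zero-order correlations; order of variables: X, Y, Z1, Z2
R = [
    [1.0, 0.3, 0.2, 0.1],
    [0.3, 1.0, 0.4, 0.2],
    [0.2, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

zero_order = partial_r(R, 0, 1, [])        # no controls
first_order = partial_r(R, 0, 1, [2])      # controlling for Z1
second_order = partial_r(R, 0, 1, [2, 3])  # controlling for Z1 and Z2
print(zero_order, round(first_order, 3), round(second_order, 3))
```

The length of the `controls` list is the order of the correlation: an empty list gives the zero-order r, one entry gives a first-order r, and so on.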

Testing for Significance

  • To determine if the computed Pearson r from a sample accurately reflects the population, hypothesis testing must be performed.

  • Null Hypothesis (H0): Pearson r equals 0 (no correlation).

  • Alternative Hypothesis (H1): Pearson r does not equal 0 (some correlation).

Necessary Information for Decision Making

  • Five key elements are needed:

    1. Value of Pearson's r

    2. Sample size ($n$)

    3. T-Value

    4. Degrees of freedom

    5. P-Value

Sample Size and Degrees of Freedom Calculation

  • How these values enter the calculation:

    • Degrees of freedom (DF) = Sample Size - Number of Variables (for a zero-order Pearson r there are two variables, so DF = n - 2)

    • Formula:
      $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$

  • A significance test indicates whether a correlation observed in the sample is likely to reflect a real relationship in the population rather than sampling error.
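The t-value and degrees of freedom for a zero-order Pearson r can be computed directly from r and the sample size. The sketch below is a minimal Python illustration; the function name `t_for_pearson_r` is hypothetical, and the sample size of 100 is an assumption because the notes do not report n for the speeding example.

```python
from math import sqrt

def t_for_pearson_r(r, n):
    """Return (t, df) for testing H0: r = 0 with a zero-order Pearson r."""
    if n <= 2:
        raise ValueError("need more than two cases so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly -1 or 1")
    df = n - 2  # sample size minus the two variables
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df

r = 0.217945  # zero-order r from the speeding example
n = 100       # hypothetical sample size (not given in the notes)
t, df = t_for_pearson_r(r, n)
print(round(t, 3), df)
```

The p-value is then found by comparing t against the t distribution with df degrees of freedom, using either a t table or statistical software.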