Inferential Statistics: The Pearson r Coefficient
Criminal Justice Inferential Statistics Lectures
CJ 3320 Criminal Justice Statistics Course Notes
Dr. Kevin Buckler
The Pearson r Coefficient
The Pearson r coefficient is a type of Degree of Association (DoA) measure.
Degree of Association (DoA) Definition: A statistical measure that quantifies the relationship between two variables.
Uses of the Pearson r:
Conditions for Use:
When the independent variable (IV) is interval and the dependent variable (DV) is also interval.
When the IV is a nominal dichotomous variable (coded as 0 and 1) and the DV is interval.
Nominal Dichotomous Variable Coding:
1 indicates presence of the quality being measured.
0 indicates absence of that quality.
Overview of the Pearson r
The Pearson r Correlation Coefficient provides a numeric expression of the strength and direction of the relationship between two variables: one independent and one dependent.
Correlation Coefficient Definition: A statistical index ranging between -1 and 1 that indicates the degree of association between two variables.
Values of the Pearson r:
Range: -1 to 1.
A value of 1 indicates a perfect positive correlation, and -1 indicates a perfect negative correlation.
Zero-Order Pearson r
Formula for the Zero-Order Pearson r Correlation Coefficient:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
Where:
$X$ = The raw score of the case for the Independent Variable
$\bar{X}$ = The mean of the Independent Variable
$Y$ = The raw score of the case for the Dependent Variable
$\bar{Y}$ = The mean of the Dependent Variable
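The zero-order formula can be computed directly from its pieces. A minimal sketch in Python, using a small hypothetical dataset (the values are illustrative only, not lecture data):

```python
# Zero-order Pearson r computed from the deviation-score formula.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical IV raw scores (X)
y = [1.0, 3.0, 2.0, 5.0, 6.0]    # hypothetical DV raw scores (Y)

x_bar = sum(x) / len(x)          # mean of the IV (X-bar)
y_bar = sum(y) / len(y)          # mean of the DV (Y-bar)

# Numerator: sum of the cross-products of deviations from the means
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Denominator: square root of the product of the two sums of squared deviations
den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) *
                sum((yi - y_bar) ** 2 for yi in y))

r = num / den
print(round(r, 4))
```

For these toy values the cross-products and squared deviations give a strong positive r, consistent with the upward trend in the data.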
Use of technology:
Software such as Excel computes zero-order coefficients easily.
Example of Zero-Order Pearson r
Variables studied:
Sex of the driver of a vehicle stopped by the police (IV)
Number of MPH over the speed limit (DV)
Hypothesis: Males drive faster than females due to biological or sociocultural factors.
Coding:
Male = 1
Female = 0
Zero-Order Pearson r Output:
Correlation of MPH over the limit (Y) with driver sex (X):
Pearson's r: 0.217945
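The same formula applies when the IV is a dichotomy coded 0/1. A hedged sketch with made-up data (these values do not reproduce the 0.217945 figure from the lecture output):

```python
# Pearson r with a dichotomous IV: Male = 1, Female = 0.
import statistics as stats

sex      = [1, 1, 1, 0, 0, 0, 1, 0]        # IV: driver sex (hypothetical)
mph_over = [12, 9, 15, 7, 5, 10, 11, 6]    # DV: MPH over the limit (hypothetical)

def pearson_r(x, y):
    """Zero-order Pearson r via the deviation-score formula."""
    x_bar, y_bar = stats.mean(x), stats.mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = (sum((a - x_bar) ** 2 for a in x) *
           sum((b - y_bar) ** 2 for b in y)) ** 0.5
    return num / den

r = pearson_r(sex, mph_over)
print(round(r, 3))   # positive r: in this toy sample, males speed more
```

A positive r here means the cases coded 1 (males) tend to have higher DV scores, which is exactly how the lecture's coding scheme is read.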
Statistics of Correlation
Types of correlation:
Positive Correlation:
As the independent variable increases (x-axis), the dependent variable also increases (y-axis).
Represented by an upward slope on a scatterplot.
Negative Correlation:
As the independent variable increases (x-axis), the dependent variable decreases (y-axis).
Represented by a downward slope.
No Correlation:
No discernible relationship, illustrated by a horizontal line in scatterplots.
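The three patterns above can be illustrated with perfectly linear toy data, so the sign of r is unambiguous:

```python
# Signs of Pearson r for positive, negative, and zero correlation.
import statistics as stats

def pearson_r(x, y):
    x_bar, y_bar = stats.mean(x), stats.mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = (sum((a - x_bar) ** 2 for a in x) *
           sum((b - y_bar) ** 2 for b in y)) ** 0.5
    return num / den

x = [1, 2, 3, 4, 5]
r_pos  = pearson_r(x, [2, 4, 6, 8, 10])   # upward slope  -> r = 1.0
r_neg  = pearson_r(x, [10, 8, 6, 4, 2])   # downward slope -> r = -1.0
r_none = pearson_r(x, [5, 2, 4, 6, 3])    # no pattern     -> r = 0.0
print(r_pos, r_neg, r_none)
```

The x and y values are contrived so that the deviations line up (or cancel) exactly; real data rarely produce correlations of exactly 1, -1, or 0.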
Statistical Control
Statistical Control Definition:
A method in statistics that involves understanding the relationship between two primary variables while controlling for the effects of one or more additional variables.
Provides a clearer insight into the independent variable’s effect on the dependent variable.
Importance of Statistical Control:
Allows researchers to isolate the effect of the independent variable (IV) after accounting for potential confounding factors.
Example of Statistical Control
Researcher examining whether male drivers speed more than female drivers must consider the potential influence of driver age on speed.
Variables of Interest:
DV: MPH over the speed limit
IV: Sex of the driver (coded as Male=1, Female=0)
Control Variable: Age of the driver
Assumed Relationship:
The sex-speeding relationship may be spurious: younger drivers tend to speed more, and driver age may be related to both sex and speed.
Language of Statistical Control
Multivariate Research:
Involves more than one causal or predictor variable.
Bivariate Relationship:
A direct relationship between only two variables.
Correlation Coefficient Types:
Simple (zero-order) Correlation Coefficient: Strength of the relationship without controlling for other variables.
Partial Correlation Coefficient: Strength of the relationship while controlling for one or more additional variables.
The "order" of a correlation specifies how many control variables are factored into the bivariate relationship:
Zero-Order Correlation: No controls; a simple bivariate relationship.
First-Order Correlation: One control variable included.
Second-Order Correlation: Two control variables included.
Third-Order Correlation: Three control variables included.
Variables Explained:
X Variable: Independent variable
Y Variable: Dependent variable
Z Variable: Control variable
Partial Correlation Coefficient
Definition: Allows analysis of the relationship between IV (X) and DV (Y) while controlling for a third variable (Z).
Particular Focus: Assessing the impact of driver sex on speed while removing the influence of the age variable.
Example Values:
IV: Sex of the driver (X)
DV: MPH over the speed limit (Y)
Control Variable: Age of the driver (Z)
Calculating Partial Correlation for Statistical Control
Formula (first-order partial correlation):
$$r_{XY \cdot Z} = \frac{r_{XY} - (r_{XZ})(r_{YZ})}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$
Example output:
Partial Pearson r = 0.203
Indicates relationship strength of driver sex and speeding while controlling for age.
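A minimal sketch of the first-order partial correlation formula, assuming the standard form r_XY.Z = (r_XY - r_XZ*r_YZ) / sqrt((1 - r_XZ^2)(1 - r_YZ^2)). The three zero-order values below are hypothetical, so the result will not match the lecture's 0.203:

```python
# First-order partial correlation of X and Y, controlling for Z.
import math

def partial_r(r_xy, r_xz, r_yz):
    """Remove Z's influence from the X-Y correlation."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations:
r_xy = 0.218    # sex (X) with MPH over (Y)
r_xz = -0.10    # sex (X) with age (Z)
r_yz = -0.40    # MPH over (Y) with age (Z): younger drivers speed more

result = partial_r(r_xy, r_xz, r_yz)
print(round(result, 3))
```

Note that only the three zero-order coefficients are needed; the raw data do not have to be revisited to compute a first-order partial.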
Multiple Control Variables in Partial Correlation
The analysis isn’t limited to one control variable; it can encompass multiple controls.
Examples of controls:
Age 24 and under (Z1)
Number of prior tickets in the past 12 months (Z2)
Time of day (Z3)
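Higher-order partials can be built recursively: a second-order coefficient applies the first-order formula to three first-order coefficients. A sketch under that standard recursion, with all zero-order values hypothetical:

```python
# Second-order partial correlation built from first-order partials.
import math

def partial_r(r_xy, r_xz, r_yz):
    """One step of partialling: remove Z from the X-Y correlation."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations among X, Y, Z1, Z2:
r_xy, r_xz1, r_yz1 = 0.218, -0.10, -0.40
r_xz2, r_yz2, r_z1z2 = 0.05, 0.30, 0.15

# First-order partials, each controlling for Z1:
r_xy_z1  = partial_r(r_xy,  r_xz1, r_yz1)
r_xz2_z1 = partial_r(r_xz2, r_xz1, r_z1z2)
r_yz2_z1 = partial_r(r_yz2, r_yz1, r_z1z2)

# Second-order partial: X with Y, controlling for both Z1 and Z2
r_xy_z1z2 = partial_r(r_xy_z1, r_xz2_z1, r_yz2_z1)
print(round(r_xy_z1z2, 3))
```

Third-order and higher coefficients follow the same pattern, applying the formula to the next-lower-order partials one control variable at a time.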
Testing for Significance
To determine whether a Pearson r computed from a sample reflects a genuine correlation in the population, rather than sampling error, hypothesis testing must be performed.
Null Hypothesis (H0): Pearson r equals 0 (no correlation).
Alternative Hypothesis (H1): Pearson r does not equal 0 (some correlation).
Necessary Information for Decision Making
Five key elements are needed:
Value of Pearson's r
Sample size (n)
T-Value
Degrees of freedom
P-Value
Sample Size and Degrees of Freedom Calculation
Role in the calculations:
Degrees of freedom (DF) = Sample Size - Number of Variables
Formula (two variables, X and Y):
$$df = n - 2$$
Formally testing the correlation for significance characterizes the relationship more accurately than relying on the observed coefficient alone.
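The decision elements above can be tied together with the standard t-test for a correlation, t = r * sqrt((n - 2) / (1 - r^2)). A hedged sketch; the r, n, and critical value below are illustrative, not lecture data:

```python
# Significance test for a sample Pearson r using the t conversion.
import math

r = 0.218    # sample Pearson r (hypothetical)
n = 100      # sample size (hypothetical)
df = n - 2   # degrees of freedom: n minus the two variables

t = r * math.sqrt(df / (1 - r ** 2))   # t = r * sqrt((n-2)/(1-r^2))

# Approximate two-tailed critical value for alpha = .05 at df around 98
t_critical = 1.984
print(round(t, 3), t > t_critical)
```

When |t| exceeds the critical value, the p-value falls below .05 and the null hypothesis (population r = 0) is rejected; otherwise the sample correlation is consistent with no correlation in the population.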