Inferential Statistics: The Pearson r Coefficient
Criminal Justice Inferential Statistics Lectures
CJ 3320 Criminal Justice Statistics Course Notes
Dr. Kevin Buckler
The Pearson r Coefficient
The Pearson r coefficient is a type of Degree of Association (DoA) measure.
Degree of Association (DoA) Definition: A statistical measure that quantifies the relationship between two variables.
Uses of the Pearson r:
Conditions for Use:
When the independent variable (IV) is interval and the dependent variable (DV) is also interval.
When the IV is a nominal dichotomous variable (coded as 0 and 1) and the DV is interval.
Nominal Dichotomous Variable Coding:
1 indicates presence of the quality being measured.
0 indicates absence of that quality.
Overview of the Pearson r
The Pearson r Correlation Coefficient provides a numeric expression of the strength and direction of the relationship between two variables: one independent and one dependent.
Correlation Coefficient Definition: A statistical index ranging between -1 and 1 that indicates the degree of association between two variables.
Values of the Pearson r:
Range: -1 to 1.
A value of 1 indicates a perfect positive correlation, and -1 indicates a perfect negative correlation.
Zero-Order Pearson r
Formula for the Zero-Order Pearson r Correlation Coefficient:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
Where:
$X$ = The raw score of the case for the Independent Variable
$\bar{X}$ = The mean of the Independent Variable
$Y$ = The raw score of the case for the Dependent Variable
$\bar{Y}$ = The mean of the Dependent Variable
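The zero-order formula can be computed directly from its pieces. A minimal sketch in Python, using a small hypothetical dataset (the values are illustrative only, not lecture data):

```python
# Zero-order Pearson r computed from the deviation-score formula.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical IV raw scores (X)
y = [1.0, 3.0, 2.0, 5.0, 6.0]    # hypothetical DV raw scores (Y)

x_bar = sum(x) / len(x)          # mean of the IV (X-bar)
y_bar = sum(y) / len(y)          # mean of the DV (Y-bar)

# Numerator: sum of the cross-products of deviations from the means
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Denominator: square root of the product of the two sums of squared deviations
den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) *
                sum((yi - y_bar) ** 2 for yi in y))

r = num / den
print(round(r, 4))
```

For these toy values the cross-products and squared deviations give a strong positive r, consistent with the upward trend in the data.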
Use of technology:
Software such as Excel computes zero-order coefficients easily.
Example of Zero-Order Pearson r
Variables studied:
Sex of the driver of a vehicle stopped by the police (IV)
Number of MPH over the speed limit (DV)
Hypothesis: Males drive faster than females due to biological or sociocultural factors.
Coding:
Male = 1
Female = 0
Zero-Order Pearson r Output:
Correlation of MPH over the limit (Y) with driver sex (X):
Pearson's r: 0.217945
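The same formula applies when the IV is a dichotomy coded 0/1. A hedged sketch with made-up data (these values do not reproduce the 0.217945 figure from the lecture output):

```python
# Pearson r with a dichotomous IV: Male = 1, Female = 0.
import statistics as stats

sex      = [1, 1, 1, 0, 0, 0, 1, 0]        # IV: driver sex (hypothetical)
mph_over = [12, 9, 15, 7, 5, 10, 11, 6]    # DV: MPH over the limit (hypothetical)

def pearson_r(x, y):
    """Zero-order Pearson r via the deviation-score formula."""
    x_bar, y_bar = stats.mean(x), stats.mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = (sum((a - x_bar) ** 2 for a in x) *
           sum((b - y_bar) ** 2 for b in y)) ** 0.5
    return num / den

r = pearson_r(sex, mph_over)
print(round(r, 3))   # positive r: in this toy sample, males speed more
```

A positive r here means the cases coded 1 (males) tend to have higher DV scores, which is exactly how the lecture's coding scheme is read.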
Statistics of Correlation
Types of correlation:
Positive Correlation:
As the independent variable increases (x-axis), the dependent variable also increases (y-axis).
Represented by an upward slope on a scatterplot.
Negative Correlation:
As the independent variable increases (x-axis), the dependent variable decreases (y-axis).
Represented by a downward slope.
No Correlation:
No discernible relationship, illustrated by a horizontal line in scatterplots.
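The three patterns above can be illustrated with perfectly linear toy data, so the sign of r is unambiguous:

```python
# Signs of Pearson r for positive, negative, and zero correlation.
import statistics as stats

def pearson_r(x, y):
    x_bar, y_bar = stats.mean(x), stats.mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = (sum((a - x_bar) ** 2 for a in x) *
           sum((b - y_bar) ** 2 for b in y)) ** 0.5
    return num / den

x = [1, 2, 3, 4, 5]
r_pos  = pearson_r(x, [2, 4, 6, 8, 10])   # upward slope  -> r = 1.0
r_neg  = pearson_r(x, [10, 8, 6, 4, 2])   # downward slope -> r = -1.0
r_none = pearson_r(x, [5, 2, 4, 6, 3])    # no pattern     -> r = 0.0
print(r_pos, r_neg, r_none)
```

The x and y values are contrived so that the deviations line up (or cancel) exactly; real data rarely produce correlations of exactly 1, -1, or 0.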
Statistical Control
Statistical Control Definition:
A method in statistics that involves understanding the relationship between two primary variables while controlling for the effects of one or more additional variables.
Provides a clearer insight into the independent variable’s effect on the dependent variable.
Importance of Statistical Control:
Allows researchers to isolate the effect of the independent variable (IV) after accounting for potential confounding factors.
Example of Statistical Control
Researcher examining whether male drivers speed more than female drivers must consider the potential influence of driver age on speed.
Variables of Interest:
DV: MPH over the speed limit
IV: Sex of the driver (coded as Male=1, Female=0)
Control Variable: Age of the driver
Assumed Relationship:
The sex-speeding relationship may be spurious: younger drivers tend to speed more, and driver age may be related to both sex and speed.
Language of Statistical Control
Multivariate Research:
Involves more than one causal or predictor variable.
Bivariate Relationship:
A direct relationship between only two variables.
Correlation Coefficient Types:
Simple (zero-order) Correlation Coefficient: Strength of the relationship without controlling for other variables.
Partial Correlation Coefficient: Strength of the relationship while controlling for one or more additional variables.
The "order" of a correlation specifies how many control variables are factored into the bivariate relationship:
Zero-Order Correlation: No controls; a simple bivariate relationship.
First-Order Correlation: One control variable included.
Second-Order Correlation: Two control variables included.
Third-Order Correlation: Three control variables included.
Variables Explained:
X Variable: Independent variable
Y Variable: Dependent variable
Z Variable: Control variable
Partial Correlation Coefficient
Definition: Allows analysis of the relationship between IV (X) and DV (Y) while controlling for a third variable (Z).
Particular Focus: Assessing the impact of driver sex on speed while removing the influence of the age variable.
Example Values:
IV: Sex of the driver (X)
DV: MPH over the speed limit (Y)
Control Variable: Age of the driver (Z)
Calculating Partial Correlation for Statistical Control
Formula (first-order partial correlation):
$$r_{XY \cdot Z} = \frac{r_{XY} - (r_{XZ})(r_{YZ})}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$
Example output:
Partial Pearson r = 0.203
Indicates relationship strength of driver sex and speeding while controlling for age.
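A minimal sketch of the first-order partial correlation formula, assuming the standard form r_XY.Z = (r_XY - r_XZ*r_YZ) / sqrt((1 - r_XZ^2)(1 - r_YZ^2)). The three zero-order values below are hypothetical, so the result will not match the lecture's 0.203:

```python
# First-order partial correlation of X and Y, controlling for Z.
import math

def partial_r(r_xy, r_xz, r_yz):
    """Remove Z's influence from the X-Y correlation."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations:
r_xy = 0.218    # sex (X) with MPH over (Y)
r_xz = -0.10    # sex (X) with age (Z)
r_yz = -0.40    # MPH over (Y) with age (Z): younger drivers speed more

result = partial_r(r_xy, r_xz, r_yz)
print(round(result, 3))
```

Note that only the three zero-order coefficients are needed; the raw data do not have to be revisited to compute a first-order partial.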
Multiple Control Variables in Partial Correlation
The analysis isn’t limited to one control variable; it can encompass multiple controls.
Examples of controls:
Age 24 and under (Z1)
Number of prior tickets in the past 12 months (Z2)
Time of day (Z3)
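Higher-order partials can be built recursively: a second-order coefficient applies the first-order formula to three first-order coefficients. A sketch under that standard recursion, with all zero-order values hypothetical:

```python
# Second-order partial correlation built from first-order partials.
import math

def partial_r(r_xy, r_xz, r_yz):
    """One step of partialling: remove Z from the X-Y correlation."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations among X, Y, Z1, Z2:
r_xy, r_xz1, r_yz1 = 0.218, -0.10, -0.40
r_xz2, r_yz2, r_z1z2 = 0.05, 0.30, 0.15

# First-order partials, each controlling for Z1:
r_xy_z1  = partial_r(r_xy,  r_xz1, r_yz1)
r_xz2_z1 = partial_r(r_xz2, r_xz1, r_z1z2)
r_yz2_z1 = partial_r(r_yz2, r_yz1, r_z1z2)

# Second-order partial: X with Y, controlling for both Z1 and Z2
r_xy_z1z2 = partial_r(r_xy_z1, r_xz2_z1, r_yz2_z1)
print(round(r_xy_z1z2, 3))
```

Third-order and higher coefficients follow the same pattern, applying the formula to the next-lower-order partials one control variable at a time.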
Testing for Significance
To determine whether a Pearson r computed from a sample reflects a genuine correlation in the population, rather than sampling error, hypothesis testing must be performed.
Null Hypothesis (H0): Pearson r equals 0 (no correlation).
Alternative Hypothesis (H1): Pearson r does not equal 0 (some correlation).
Necessary Information for Decision Making
Five key elements are needed:
Value of Pearson's r
Sample size (n)
T-Value
Degrees of freedom
P-Value
Sample Size and Degrees of Freedom Calculation
Role in the calculations:
Degrees of freedom (DF) = Sample Size - Number of Variables
Formula (two variables, X and Y):
$$df = n - 2$$
Formally testing the correlation for significance characterizes the relationship more accurately than relying on the observed coefficient alone.
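The decision elements above can be tied together with the standard t-test for a correlation, t = r * sqrt((n - 2) / (1 - r^2)). A hedged sketch; the r, n, and critical value below are illustrative, not lecture data:

```python
# Significance test for a sample Pearson r using the t conversion.
import math

r = 0.218    # sample Pearson r (hypothetical)
n = 100      # sample size (hypothetical)
df = n - 2   # degrees of freedom: n minus the two variables

t = r * math.sqrt(df / (1 - r ** 2))   # t = r * sqrt((n-2)/(1-r^2))

# Approximate two-tailed critical value for alpha = .05 at df around 98
t_critical = 1.984
print(round(t, 3), t > t_critical)
```

When |t| exceeds the critical value, the p-value falls below .05 and the null hypothesis (population r = 0) is rejected; otherwise the sample correlation is consistent with no correlation in the population.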