FMPH 102 Study Notes: Correlation of Two Continuous Variables
FMPH 102: Biostatistics in Public Health
Course Overview
Focus on the analysis of correlation between two continuous variables in health statistics.
Topics Covered
Inference for Means (Continuous Variables):
One sample
Two paired samples
Two independent samples
Three or more independent samples
Correlation and Linear Regression
Inference for Proportions (Binary Variables):
One sample
Two paired samples
Two independent samples
Three or more independent samples
Understanding Correlation
Definition
Correlation assesses the relationship between two continuous variables.
Key Principle: "Everybody who went to the moon has eaten chicken" illustrates that a correlation can be spurious: an association between two variables does not mean one causes the other.
Correlation of Continuous Variables
Correlation examines how two continuous variables for the same sample of individuals vary together.
Examples of Correlation Investigations:
Is Body Mass Index (BMI) correlated with total exercise time per week?
Is LDL cholesterol associated with the total amount of cholesterol in the diet?
Is blood pressure related to total sodium intake?
Properties of Correlation
Form of Correlation
Investigates whether the relationship is linear, curved, or random.
Direction of Correlation
Positive Association: Upward trend indicating that as one variable increases, the other does too.
Negative Association: Downward trend indicating that as one variable increases, the other decreases.
Flat Trend: No correlation, indicating no association between the variables.
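The direction of an association can be checked numerically. A minimal NumPy sketch (the data are made up for illustration): a perfect upward trend gives a correlation of +1, a perfect downward trend gives -1.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
up = 2 * x + 1     # increases with x: positive association
down = 10 - 3 * x  # decreases with x: negative association

r_up = np.corrcoef(x, up)[0, 1]      # close to +1: perfect upward trend
r_down = np.corrcoef(x, down)[0, 1]  # close to -1: perfect downward trend
print(r_up, r_down)
```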
Strength of Correlation
Measured by how closely data points adhere to the trend line or curve in a scatterplot.
Example: each person in the sample contributes a pair of values, e.g., person A has one value of variable x and one value of variable y; the scatterplot plots these pairs.
Visualization and Example
Jackson Study Example
Utilized a scatterplot of percentage of body fat (PBF) and BMI with n = 655 participants.
Observed a positive association between PBF and BMI:
Individuals with high PBF tend to exhibit high BMI.
Relationship appears approximately linear, though it may follow a slightly quadratic form.
Strength of Correlation
Pearson’s Correlation Coefficient (r)
Definition:
The population correlation coefficient, denoted by ρ, is estimated by the sample correlation coefficient r.
Assumptions:
Both X and Y are normally distributed.
There must be a linear association between X and Y.
Range:
Values lie between -1 and 1:
If X and Y are independent: ρ = 0 (and r ≈ 0).
Positively correlated: ρ > 0 (or r > 0).
Negatively correlated: ρ < 0 (or r < 0).
Strong correlation: |ρ| (or |r|) close to 1.
Often calculate r² (the coefficient of determination): the closer r² is to 1, the stronger the correlation.
Visualization of Pearson's Correlation Coefficients
Various scatterplots visualize correlation coefficients where:
Coefficients shown range from r = -1 (perfect negative) through r = 0 (no correlation) to r = +1 (perfect positive).
Hypothesis Testing with Pearson’s Correlation Coefficient
Process
As with means and proportions, hypothesis tests can be conducted using the correlation coefficient.
Null Hypothesis (H0): No correlation between variables X and Y (ρ = 0).
Alternative Hypothesis (HA): There is a correlation between variables X and Y (ρ ≠ 0).
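A sketch of this test in Python with SciPy on simulated data (the sample size and effect size are arbitrary assumptions): `pearsonr` returns r together with the p-value for H0: ρ = 0, and the same test statistic can be computed by hand as t = r·√(n − 2)/√(1 − r²).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)  # constructed to be positively correlated

r, p = stats.pearsonr(x, y)  # tests H0: rho = 0

# the same test statistic by hand, with n - 2 degrees of freedom
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
print(f"r = {r:.3f}, t = {t:.2f}, p = {p:.2g}")
```

A small p-value leads to rejecting H0 and concluding the variables are correlated.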
Covariance
Definition:
Covariance, denoted Cov(X, Y) or σXY, measures how random variables X and Y relate to one another, capturing the degree to which the two variables change together.
Positive Covariance: Indicates direct relationship.
Negative Covariance: Indicates inverse relationship.
Formula
Cov(X, Y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1).
Covariance can range from negative to positive infinity.
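The sample covariance formula can be verified directly against NumPy's `np.cov` (toy numbers, chosen only for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])
n = len(x)

# Cov(X, Y) = sum((x_i - xbar)(y_i - ybar)) / (n - 1)
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
cov_np = np.cov(x, y, ddof=1)[0, 1]
print(cov_manual, cov_np)
```

A positive result here reflects the direct relationship: larger x-values pair with larger y-values.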
Covariance Characteristics
Scale Effect:
Covariance is sensitive to the scale of the variables—the unit is the product of the units of both variables.
Adjustment Example:
If the y-values are doubled, changing (xᵢ, yᵢ) to (xᵢ, 2yᵢ), the covariance also doubles.
Important conclusion: Covariance is not standardized, meaning it retains the scale of measurement.
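The scale effect can be demonstrated directly: doubling the y-values doubles the covariance, while the unit-less correlation is unchanged. A small sketch with made-up numbers:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

cov_y = np.cov(x, y, ddof=1)[0, 1]
cov_2y = np.cov(x, 2 * y, ddof=1)[0, 1]  # doubles along with y

r_y = np.corrcoef(x, y)[0, 1]
r_2y = np.corrcoef(x, 2 * y)[0, 1]       # unchanged: correlation is unit-less
print(cov_y, cov_2y, r_y, r_2y)
```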
Correlation vs. Covariance
Correlation is a scaled form of covariance, making it unit-less and allowing comparison across different datasets.
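This scaling relationship, r = Cov(X, Y) / (sX · sY), can be checked numerically (simulated data, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = x + rng.normal(size=30)

# correlation = covariance divided by both sample standard deviations
r_manual = np.cov(x, y, ddof=1)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
r_np = np.corrcoef(x, y)[0, 1]
print(r_manual, r_np)
```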
Assumptions of Pearson's Correlation Coefficient r
Assumptions
Normal distribution of both variables X and Y.
Linear relationship between X and Y must be true.
Cases When Assumptions Fail
If assumptions do not hold, use a nonparametric (rank-based) test such as Spearman's Correlation.
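One way to check the normality assumption before choosing between Pearson and Spearman is a Shapiro-Wilk test; a sketch with simulated data (the exponential sample is deliberately non-normal, and the paired y-values are constructed for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(size=100)                 # strongly right-skewed, not normal
y = x + rng.normal(scale=0.2, size=100)       # related to x, also non-normal

w, p = stats.shapiro(x)                        # small p: reject normality
print(f"Shapiro-Wilk p = {p:.2g}")

# normality rejected, so fall back to the rank-based Spearman coefficient
rho, p_rho = stats.spearmanr(x, y)
print(f"Spearman rho = {rho:.3f}")
```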
Spearman’s Correlation Coefficient
Definition:
Spearman's correlation coefficient considers the ranks of X and Y, rather than their raw values.
Use Cases:
For non-linear relationships.
When data are not normally distributed.
Robust in various conditions.
Hypothesis for rank correlation:
Null Hypothesis (H0): ρₛ = 0 (no rank correlation).
Test statistic follows the Student's t-distribution with n − 2 degrees of freedom.
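Because Spearman works on ranks, it captures any monotone relationship even when Pearson's linearity assumption fails. A sketch (cubic data chosen only for illustration):

```python
import numpy as np
from scipy import stats

x = np.arange(1.0, 21.0)
y = x ** 3  # monotone increasing but strongly non-linear

rho, p_rho = stats.spearmanr(x, y)  # ranks of x and y agree perfectly
r, p_r = stats.pearsonr(x, y)       # linearity is violated, so r < 1
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```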
Conducting Pearson & Spearman Correlations in SPSS
Step-by-Step Process
Access the dataset of interest (e.g., weight & energy intake from Dennison.sav).
Generate scatter plots:
SPSS: Graph > Legacy Dialogs > Scatter/Dot > Simple Scatter > Define
Set Y-Axis to weight and X-Axis to energy.
Statistical Correlation Analysis:
SPSS: Analyze > Correlate > Bivariate…
Input weight and energy, choose Pearson and Spearman correlations.
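For readers without SPSS, the same bivariate analysis can be sketched in Python; the weight and energy values below are hypothetical stand-ins for the Dennison.sav variables, not the actual dataset:

```python
import numpy as np
from scipy import stats

# hypothetical data standing in for the Dennison.sav weight/energy variables
weight = np.array([60.0, 72.0, 55.0, 80.0, 68.0, 90.0, 62.0, 75.0])               # kg
energy = np.array([2100.0, 2500.0, 1900.0, 2800.0, 2300.0, 3100.0, 2000.0, 2600.0])  # kcal

r, p_r = stats.pearsonr(weight, energy)       # Pearson, as in SPSS Bivariate
rho, p_rho = stats.spearmanr(weight, energy)  # Spearman, same dialog
print(f"Pearson r = {r:.3f} (p = {p_r:.3g}); Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
```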
Examples and Consistency of Results
Case of Inconsistent Results
When testing correlation between bilirubin and creatinine:
Pearson showed a significant correlation (p = 0.004), while Spearman suggested no correlation (p = 0.76).
Conclusion: verify that the normality assumption holds before trusting a Pearson result; otherwise, rely on Spearman's coefficient.
Summary of Key Concepts
Final Insights
Correlation of continuous variables can be assessed using:
Pearson correlation coefficient (for normally distributed data)
Spearman rank correlation (for non-normal data or non-linear relationships).
Statistical tests apply to both Spearman and Pearson correlations via:
Hypothesis Testing
Effective statistical test design requires stating the assumptions, choosing methods appropriate for the underlying data, validating the results, and remembering that correlation does not imply causation.