EBP Lec 13: Correlations and Comparisons

Opening Remarks

  • Good Morning Greeting: The speaker welcomes everyone to the lecture, expressing enthusiasm and encouragement for the day's content.

  • Structure of the Day: Mention of a four-hour lecture stint, indicating that future lectures will involve more interactive elements, including presentations and revisions.

  • Previous Topics Recap: Review of the chocolate experiment and fundamentals of inferential statistics including:

    • Taking a random sample from a population

    • Making inferences about the population based on sample data

    • Introduction to the Central Limit Theorem for statistical analysis.

Key Concept: Correlation

  • Definition of Correlation: A relationship between two variables.

  • Types of Correlation:

    • Positive Correlation:

    • As one variable increases, the other also increases; similarly, if one decreases, the other decreases.

    • Example: Higher chocolate consumption corresponds with a higher number of Nobel Prize laureates.

  • Example Comic Reference:

    • The comic emphasizes that "correlation does not imply causation."

Illustrative Examples of Correlation vs. Causation

  1. Ice Cream and Drowning Example:

    • An increase in ice cream sales correlates with an increase in drownings, but sunny weather is the true underlying cause.

    • The speaker emphasizes the common logical mistake of assuming causation from correlation.

  2. Married vs. Single Men:

    • Married men tend to live longer, but healthier men are more likely to get married in the first place.

    • This instance demonstrates how statistical relationships can be misinterpreted.

  3. Short-Sighted Children:

    • A study found a correlation between sleeping with lights on and increased short-sightedness; this was later explained by genetics rather than lighting (short-sighted parents were more likely to leave lights on).

  4. Self-Esteem and Academic Performance:

    • Initial conclusions that self-esteem causes good grades were reversed; in reality, good grades increase self-esteem.

Important Distinction: Statistical Significance

  • Understanding Correlation Mistakes: The speaker emphasizes the dangers of erroneously inferring causation from correlation.

  • Hard Limitations: Correlations should be interpreted with caution; they may hint at an underlying relationship, but additional analysis is required to establish it.

Types of Correlation

  • Positive Correlation:

    • Example provided: Scores in anatomy and physiology increasing together.

  • Negative Correlation:

    • The opposite pattern: an increase in one variable is accompanied by a decrease in the other. Example given: hearing threshold levels and speech discrimination scores.

  • No Correlation:

    • Random data with no direct relationship, exemplified using unrelated variables such as house numbers and age.

  • Curvilinear Relationships:

    • Explanation that some data may show relationships that are not linear (like health metrics against weight).

Measuring Correlation

  • Pearson’s Correlation Coefficient (r):

    • Represents the strength and direction of a linear correlation.

    • Interpretation of r:

      • $r = 1$: Perfect positive correlation

      • $r = -1$: Perfect negative correlation

      • $r = 0$: No linear correlation

  • Strength of Correlation

    • Value threshold interpretations:

    • 0.00-0.30: Weak

    • 0.30-0.70: Moderate

    • 0.70-1.00: Strong

Reporting Correlation Statistics

  • Range of required statistics includes direction, strength, and significance (p-value).

  • Example of Reporting: "There is a significant strong positive correlation between anatomy and physiology scores."

Apparent Counterpoints to Correlation

  • Examples of situations where correlation isn’t indicative of causation for clarity.

Regression Analysis Overview

  • Correlation establishes relationships; regression is used for predictions based on correlated data.

  • Regression Equation: The example provided: physiology score = 4.5 + (0.961 * anatomy score).

  • Important distinction between independent variable (anatomy score) and dependent variable (physiology score).

Assumptions for Linear Regression Analysis

  • Need to work with interval or ratio data.

  • Assumption of independence of observations.

  • Assumption of linear relationships and normal distribution.

  • Homoscedasticity: Equal variance among data points.

Conducting Regression Analysis using Software (Minitab)

  • Steps outlined for executing regression in Minitab.

  • Interpretation includes:

    • R² value and the proportion of variance it accounts for.

    • All reported outputs including significance of regression statistics.

Conclusion and Transition to T-tests

  • Transition into t-tests context:

    • The speaker explains the necessity for comparing two groups with categorical independent variables that can't be handled by correlation or regression methods.

  • Introduction to different types of t-tests: one-sample, two-sample, and paired t-tests, while discussing methods of executing them in practical scenarios.

Summary Guidance for a Research Assignment

  • Group assignments announced: exploration of a provided research question, consultation of literature, data analysis, and integration of findings, culminating in a report.

  • Reminder of tools and community resources available for student use.


Correlation – Concept and Purpose

Definition

  • Correlation describes the relationship between two variables — how one changes with respect to the other.

  • Can be:

    • Positive correlation: both increase or decrease together.

    • Negative correlation: one increases while the other decreases.

    • No correlation: no consistent relationship.

    • Curvilinear: relationship exists but not linear.

Key Reminder

“Correlation does not imply causation.”

  • Example (video):

    • Ice cream sales vs drownings — both increase in summer, but caused by weather, not each other.

    • Marriage and lifespan — healthier men more likely to marry, not vice versa.

    • Children’s night lights and myopia — short-sighted parents leave lights on; genetics, not lighting, causes short-sightedness.

    • Self-esteem and grades — good grades → high self-esteem, not the reverse.

Be cautious: correlations can be real but misleading if confounding variables exist.


3. Types of Correlation

| Type | Pattern | Example |
|---|---|---|
| Positive | As one increases, the other increases. | Anatomy vs Physiology scores: students who perform well in one tend to perform well in the other. |
| Negative | As one increases, the other decreases. | PTA thresholds vs Speech discrimination: higher thresholds (poorer hearing) → lower speech scores. |
| No Correlation | No pattern. | House number vs Age. |
| Curvilinear | Non-linear pattern. | Weight vs Health: best at optimal weight, declines if too low/high. |

In this course, focus on linear relationships (straight-line trends).


4. Pearson’s Correlation Coefficient (r)

Definition

  • Quantifies strength and direction of linear relationship.

  • Denoted by r (ranges from –1 to +1).

    • r = +1 → perfect positive linear relationship.

    • r = –1 → perfect negative linear relationship.

    • r = 0 → no linear relationship.

  • Works with interval or ratio data only.
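
To make the definition concrete, here is a minimal sketch of how r can be computed by hand from deviations about the means. The function name `pearson_r` and the scores are invented for illustration; they are not the lecture's dataset or Minitab's implementation.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of interval/ratio data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, invented for illustration only
anatomy = [55, 62, 70, 74, 81, 90]
physiology = [58, 60, 73, 71, 85, 88]

r_pos = pearson_r(anatomy, physiology)                 # scores rise together
r_perfect_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # points on a falling line
print(round(r_pos, 2), round(r_perfect_neg, 2))
```

The second call uses points that lie exactly on a downward straight line, so it returns the lower limit of the range.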

Comparison

  • Spearman’s correlation (ρ) → used for ordinal data or non-parametric cases.

Strength Guidelines

| r value range | Strength |
|---|---|
| 0.00–0.19 | Very weak |
| 0.20–0.39 | Weak |
| 0.40–0.59 | Moderate |
| 0.60–0.79 | Strong |
| ≥ 0.80 | Very strong |

Real-world data rarely exceeds r = 0.8.
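
The five-band guideline above can be written as a small lookup. This is only a sketch: the function names are hypothetical, and the cut-offs are the ones listed in this section (other sources, including the three-band summary earlier in these notes, draw the lines differently).

```python
def correlation_strength(r):
    """Label the size of r using the five-band guideline in this section."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; direction is reported separately
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

def correlation_direction(r):
    """Return the direction implied by the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

# Hypothetical coefficient, for illustration
r = -0.45
print(correlation_strength(r), correlation_direction(r))
```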


5. Three Aspects to Report

  1. Direction – Positive, Negative, or None.

  2. Strength – Weak/Moderate/Strong.

  3. Significance – p-value (< 0.05).

Important Note

  • A strong correlation is not necessarily significant if the sample size is too small.

    • e.g., r = 0.98 but n = 3 → p = 0.127 (not significant).

  • Always report r and p together.
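
The small-sample example can be checked numerically. The sketch below assumes the usual conversion of r to a t statistic with n – 2 degrees of freedom and approximates the t-distribution area with Simpson's rule, so that only the standard library is needed. The function names are illustrative, and statistical software may compute the p-value differently in detail.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p: 1 minus the area under the density on [-|t|, |t|].

    The area is approximated with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (total * h / 3)  # the density is symmetric about 0
    return max(0.0, 1.0 - central_area)

def correlation_p_value(r, n):
    """Convert r to t with n - 2 degrees of freedom, then to a two-tailed p."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return two_tailed_p(t, df)

p_small = correlation_p_value(0.98, 3)   # the n = 3 example above
p_large = correlation_p_value(0.98, 30)  # same r, hypothetical larger sample
print(round(p_small, 3), p_large < 0.001)
```

With n = 3 the result comes out at roughly 0.13, in line with the figure quoted above, while the same r with a larger (hypothetical) sample is clearly significant.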


6. Pearson’s Correlation in Minitab

Procedure

  1. Open dataset (e.g., Anatomy vs Physiology).

  2. Go to Stat → Basic Statistics → Correlation.

  3. Select both variables.

  4. In “Graphs” options, tick “Show correlation and P value”.

Output Example

  • r = 0.87, p < 0.001
    Significant strong positive correlation between anatomy and physiology scores.

APA Reporting Style

  • Omit leading zero (APA rule).

  • Report as:

    r = .87, p < .001


7. Exercise: Height and Shoe Size

  • Task: Run correlation for height and shoe size using class data.

  • Result example:

    r = .87, p < .001
    → Strong, positive, significant correlation between height and shoe size.

  • Report p < .001 (never p = 0).

  • Use 2 decimal places (3 if needed for clarity).
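
The reporting rules above (no leading zero, p < .001 rather than p = 0) can be captured in a small formatting helper. This is a sketch with hypothetical function names, and it assumes three decimals for p and two for r.

```python
def apa_number(value, decimals=2):
    """Format a statistic that cannot exceed 1 (r, R², p) without a leading zero."""
    text = f"{value:.{decimals}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text

def apa_p(p):
    """Report very small p-values as 'p < .001' instead of 'p = .000'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + apa_number(p, 3)

def apa_correlation(r, p):
    """Combine r and p into one APA-style string."""
    return f"r = {apa_number(r)}, {apa_p(p)}"

# Hypothetical values, for illustration
print(apa_correlation(0.87, 0.0000004))
print(apa_correlation(-0.42, 0.044))
```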


8. Regression Analysis

Definition

  • Explores prediction between two variables once correlation exists.

  • Regression line (line of best fit):

    Y = A + BX

    • Y = dependent (predicted variable).

    • X = independent (predictor variable).

    • A = intercept.

    • B = slope (gradient).

Example:
Physiology score = 4.5 + 0.961 × Anatomy score
→ For every 1-point increase in anatomy, physiology rises by 0.961.
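
For readers curious where A and B come from, the following sketch fits the line of best fit by ordinary least squares. The function name `fit_line` and the data are hypothetical; this is the textbook formula, offered on the assumption that it matches what the software does for a simple one-predictor model.

```python
def fit_line(x, y):
    """Least-squares intercept A and slope B for Y = A + B*X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                      # B: change in Y per 1-unit change in X
    intercept = mean_y - slope * mean_x    # A: predicted Y when X = 0
    return intercept, slope

# Hypothetical data lying exactly on Y = 2 + 3X, so the fit is easy to check
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]
intercept, slope = fit_line(x, y)
print(intercept, slope)
```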


Assumptions of Linear Regression

| Assumption | Meaning |
|---|---|
| Data level | Both X and Y are interval/ratio. |
| Independence | Observations are independent and randomly sampled. |
| Linearity | Relationship must be linear. |
| Normality | Data approximately normal. |
| Homoscedasticity | Equal variance of Y across all X values. |

Violations → interpret with caution.


9. Output Interpretation

Key Metrics

| Statistic | Interpretation |
|---|---|
| R² | % of variance in Y explained by X (r²). |
| p-value | Whether regression model is significant. |
| F-statistic | From ANOVA table (used for APA reporting). |

Example: R² = 0.75 → 75% of variance in Physiology explained by Anatomy.
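
As a rough illustration of how these metrics relate, the sketch below computes R² from the residual and total sums of squares, and then the F-statistic for a one-predictor model with 1 and n – 2 degrees of freedom. The function name and the data are hypothetical.

```python
def r_squared_and_f(x, y):
    """Return (R², F, df_model, df_error) for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation in Y, and the part left unexplained by the fitted line
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1 - ss_resid / ss_total
    df_model, df_error = 1, n - 2
    f_stat = (r2 / df_model) / ((1 - r2) / df_error)
    return r2, f_stat, df_model, df_error

# Hypothetical data, chosen so the arithmetic is easy to follow
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
r2, f_stat, df_model, df_error = r_squared_and_f(x, y)
print(f"R² = {r2:.2f}, F({df_model},{df_error}) = {f_stat:.2f}")
```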


Confidence vs Prediction Intervals

| Interval Type | Meaning |
|---|---|
| Confidence interval (green lines) | Range where true mean response likely lies (±2 SE). |
| Prediction interval (purple lines) | Range where 95% of all individual values likely fall (±2 SD). |

  • Prediction interval always wider than confidence interval.
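
A short sketch can show why the prediction interval is always the wider of the two. It uses the standard formulas for the standard error of the mean response and of a new individual observation, with the ±2 multiplier from the table above as an approximation (an exact interval would use a t critical value). The function name and the scores are hypothetical.

```python
import math

def interval_half_widths(x, y, x0, multiplier=2.0):
    """Approximate half-widths of the confidence and prediction intervals at x0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(ss_resid / (n - 2))  # residual standard deviation
    # Uncertainty about the fitted mean grows as x0 moves away from mean_x
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
    ci_half = multiplier * s * math.sqrt(leverage)      # mean response
    pi_half = multiplier * s * math.sqrt(1 + leverage)  # one new individual
    return ci_half, pi_half

# Hypothetical scores, invented for illustration only
anatomy = [55, 62, 70, 74, 81, 90]
physiology = [58, 60, 73, 71, 85, 88]
ci_half, pi_half = interval_half_widths(anatomy, physiology, x0=80)
print(round(ci_half, 2), round(pi_half, 2))
```

The extra `1 +` under the square root is the scatter of individual values around the line, which is why the prediction interval cannot be narrower than the confidence interval.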


10. Regression in Minitab

Steps

  1. Stat → Regression → Fitted Line Plot

  2. Set:

    • Response (Y): Dependent variable (e.g., Physiology)

    • Predictor (X): Independent variable (e.g., Anatomy)

  3. Under “Options,” tick:

    • Display confidence and prediction intervals.

Interpretation Example

  • R² = .75, F(1,18) = 53.16, p < .001
    → Anatomy scores significantly predict Physiology scores.

APA Reporting Format

Anatomy scores predicted physiology scores, R² = .75, F(1,18) = 53.16, p < .001.


Example: Prediction

  • Equation: Physiology = 4.46 + 0.9608 × Anatomy

  • For Anatomy = 80 → Predicted Physiology = 81.32
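
The prediction above is just the regression equation evaluated at one value. A minimal sketch, using the coefficients quoted in this example and a hypothetical function name:

```python
def predict_physiology(anatomy_score, intercept=4.46, slope=0.9608):
    """Predicted physiology score from the fitted line Y = A + B*X."""
    return intercept + slope * anatomy_score

predicted = predict_physiology(80)
print(f"{predicted:.2f}")  # 4.46 + 0.9608 * 80
```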


11. Practice Example: Shoe Size Predicting Height

  • Regression equation: Height = A + B × Shoe size

  • R² = .76, F(1,55) = 176.09, p < .001
    → Shoe size strongly predicts height.

Used as forensic example (estimate height from footprint size).


12. T-Tests Overview

Purpose

  • Compare means between two groups or a group vs known value.

  • Requires:

    • Categorical independent variable

    • Interval/ratio dependent variable

“T for Two” — T-tests compare two things.


13. Types of T-Tests

| Type | When to Use | Example |
|---|---|---|
| One-sample | Compare sample mean to known value. | Mean Mars Bar weight vs 16 g standard. |
| Independent (Two-sample) | Compare two independent groups. | Mars vs Cadbury chocolate weights. |
| Paired (Dependent) | Compare two measures from same group. | Student’s Anatomy vs Physiology scores, or Summer vs Winter hours outdoors. |


14. One-Sample T-Test

Example

  • Does the average Mars Bar weight differ from 16 g?

    • t-test result: t = 8.41, p < .001
      → Mean weight (17 g) significantly higher than standard.

Interpretation Steps:

  1. Check significance (p < .05).

  2. Compare means to determine direction.

  3. Report in APA:

    t(19) = 8.41, p < .001
    “Average Mars Bar (M = 17.0 g, SD = 1.2) heavier than recommended 16 g.”
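
The calculation behind a one-sample t-test can be sketched as follows: the difference between the sample mean and the test value, divided by the standard error of the mean. The weights below are invented for illustration and are not the class data, so the result will not match the figures above.

```python
import math
import statistics

def one_sample_t(sample, test_mean):
    """Return (t, df, mean, sd) for a one-sample t-test against test_mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (mean - test_mean) / se
    return t, n - 1, mean, sd

# Hypothetical chocolate weights in grams
weights = [16.8, 17.4, 15.9, 17.9, 16.5, 18.2, 17.1, 16.2]
t, df, mean, sd = one_sample_t(weights, 16)
print(f"t({df}) = {t:.2f}, M = {mean:.1f}, SD = {sd:.1f}")
```

A positive t here means the sample mean is above the test value; the p-value is then read from the t distribution with the returned degrees of freedom.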


15. Two-Sample (Independent) T-Test

Example

  • Do Cadbury chocolates differ in weight from Mars chocolates?

    • Cadbury (M = 14.4 g), Mars (M = 15.1 g)

    • t = –3.10, p = .002
      → Significant difference; Mars heavier.

APA Reporting

t(174) = –3.10, p = .002
“Mars chocolates (M = 15.14 g, SD = 1.2) were significantly heavier than Cadbury (M = 14.41 g, SD = 0.9).”

Notes:

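  • df = N – 2, where N is the total number of observations across both groups (two samples, equal variances assumed).

  • The sign of t only reflects which group mean was subtracted from which; it shows direction and does not change whether the difference is significant.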
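
A minimal sketch of the pooled (equal-variance) two-sample t-test, showing where df = N – 2 comes from and how swapping the groups flips the sign of t. The function name and the weights are hypothetical.

```python
import math
import statistics

def pooled_two_sample_t(group_a, group_b):
    """Return (t, df) for an independent t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2  # total N minus one estimated mean per group
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / se
    return t, df

# Hypothetical chocolate weights in grams
cadbury = [14.1, 14.6, 13.9, 14.8, 14.3]
mars = [15.0, 15.4, 14.7, 15.6, 15.1]
t_cm, df = pooled_two_sample_t(cadbury, mars)
t_mc, _ = pooled_two_sample_t(mars, cadbury)
print(f"t({df}) = {t_cm:.2f} (Cadbury minus Mars), {t_mc:.2f} (Mars minus Cadbury)")
```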


16. Paired (Dependent) T-Test

Concept

  • Compares two related measures:

    • Same group tested twice (e.g., before/after intervention).

    • Two comparable measures from same participants.

Example

  • Audiology students’ outdoor hours:

    • Summer (M = 5.3 h), Winter (M = 3.2 h)

    • p = .044 → significant difference.

“Students spend significantly more time outdoors in summer than winter.”

APA Reporting

t(9) = 2.26, p = .044

Notes

  • Removes inter-subject variability.

  • df = N – 1 (one sample measured twice).
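
A paired t-test is effectively a one-sample t-test on the within-person differences, which is why inter-subject variability drops out. The sketch below uses invented outdoor hours for ten hypothetical students, so it does not reproduce the result above.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired data must have the same length")
    diffs = [a - b for a, b in zip(first, second)]  # one difference per person
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1  # df is the number of pairs minus 1

# Hypothetical hours outdoors for the same 10 students
summer = [6.0, 4.5, 7.0, 5.0, 3.5, 6.5, 5.5, 4.0, 6.0, 5.0]
winter = [3.0, 4.0, 4.5, 2.5, 3.0, 3.5, 4.0, 2.0, 3.5, 2.0]
t, df = paired_t(summer, winter)
print(f"t({df}) = {t:.2f}")
```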


17. One-Tailed vs Two-Tailed Tests

| Test Type | Purpose | Example |
|---|---|---|
| Two-tailed | Tests for difference in either direction. | “Does Mars Bar weight differ from 16 g?” |
| One-tailed | Tests for difference in specific direction. | “Is Mars Bar weight less than 16 g?” |

  • Two-tailed = more conservative, standard approach.

  • One-tailed = easier to reach significance but less rigorous.

Use one-tailed only with clear directional hypothesis.
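
To see why a one-tailed test reaches significance more easily, the sketch below compares the two p-values for the same t statistic. The t value and degrees of freedom are hypothetical, and the tail area is approximated with Simpson's rule so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def upper_tail_p(t, df, steps=2000):
    """P(T > t): 0.5 minus the area under the density on [0, t] (steps even)."""
    if t < 0:
        return 1.0 - upper_tail_p(-t, df, steps)
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

# Hypothetical result, not taken from the lecture
t_value, df = 2.0, 19
p_one_tailed = upper_tail_p(t_value, df)            # H1: mean is greater than the test value
p_two_tailed = 2 * upper_tail_p(abs(t_value), df)   # H1: mean differs in either direction
print(p_one_tailed < 0.05, p_two_tailed < 0.05)
```

For this hypothetical result the one-tailed p falls below .05 while the two-tailed p does not, which is the sense in which the two-tailed test is more conservative.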


18. Degrees of Freedom Summary

| T-Test Type | df Formula |
|---|---|
| One-sample | N – 1 |
| Two-sample | N – 2 |
| Paired | N – 1 |


19. Minitab Procedures

| Test | Menu Path | Key Inputs | Tips |
|---|---|---|---|
| One-sample | Stat → Basic Statistics → 1-sample t | Enter test mean (e.g., 16) | Use “Perform hypothesis test.” |
| Two-sample | Stat → Basic Statistics → 2-sample t | Select dependent var (e.g., Weight) + grouping var (e.g., Brand) | Tick “Assume equal variances.” |
| Paired | Stat → Basic Statistics → Paired t | Select paired columns | Works only with paired data. |


20. Practical Activity Summary

  • One-sample: Sangas R Us rating vs city average (75).

  • Two-sample: Compare Sangas R Us vs Best Bagels ratings.

  • Paired: Sangas R Us ratings in January vs June (improvement check).

Each output should include:

  • Mean, SD

  • t-value, df, p-value

  • Interpretation of direction

  • APA-style report.


21. Final Key Points

Correlation = Relationship
Regression = Prediction
T-tests = Comparison

Always Report:

  • Direction (positive/negative or group mean)

  • Strength (r or R²)

  • Significance (p-value)

  • Interpretation (what it means in real context)

“Statistics are tools for meaning — numbers only matter when you can interpret them responsibly.”