wk4 IRM Pearson correlation and t-tests (transcript-based)

Homoscedasticity (homogeneity of variance)

Definition: homoscedasticity means the variance of the dependent variable is roughly the same across the range of the independent variable or along the correlation line.
Visual cues:
- Left plot: data appear evenly scattered around the correlation line → homoscedastic.
- Right plot: variance increases or decreases as you move along the line (data fan out) → heteroscedastic (unequal variance).
Why it matters: a common assumption of parametric tests (e.g., Pearson correlation, t-tests); violations can affect test accuracy.
Statistical testing: Levene's test assesses equality of variances.
- Output: an F statistic and a p-value (p-value used to decide if variances are significantly different).
- Interpretation:
- If p < 0.05 → violation of homogeneity of variance (heteroscedasticity).
- If p ≥ 0.05 → no violation (variance considered homogeneous).
Related test mentioned: the Shapiro–Wilk test (normality) works analogously, reporting its own statistic (W) and a p-value, just as Levene's test reports an F statistic and a p-value.
Remedies if violated:
- For t tests: use Welch's t-test (does not assume equal variances).
- For correlation: if obvious groupings appear, consider Spearman’s correlation (nonparametric) rather than Pearson.
Practical note: often assessed visually when exploring data; Levene's test provides a formal statistical check.
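
To make the F statistic less of a black box, here is a minimal sketch of how Levene's statistic can be computed by hand. It assumes the mean-centred variant; the function name `levene_statistic` and the two data sets are hypothetical, chosen only for illustration. In practice a library routine such as SciPy's `scipy.stats.levene` also returns the p-value (its default centres on the median rather than the mean).

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's statistic (mean-centred variant) for two or more groups.

    It is a one-way ANOVA F statistic computed on the absolute deviations
    of each observation from its own group mean.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its group mean.
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data: same centre, very different spread.
tight = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0]
wide = [2.0, 18.0, 5.0, 15.0, 1.0, 19.0]
print(round(levene_statistic(tight, wide), 2))
```

A large statistic, as with these two groups, corresponds to a small p-value and therefore to a violation of homogeneity of variance.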

Independence

Definition: observations are independent; no two observations affect each other.
Why it matters: fundamental assumption for most parametric tests. It is not tested by Levene’s or Shapiro–Wilk; it is a logic/experimental design check.
Examples from transcripts:
- Example 1 (not independent): gym data from Mondays and Wednesdays with 3 participants appearing in both groups → violation of independence.
- Example 2 (not independent): one instructor teaches multiple classes (old and new approaches) → potential information flow between groups, violating independence.
- Example 3 (independent): survey where each member is asked once about pool usage → independence holds.
Consequences: if independence is violated, you should adjust the data (e.g., exclude repeated participants) or use models that account for relatedness/clustering.
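
The gym example can be checked mechanically when participant identifiers are available. The sketch below uses made-up IDs (`P01` to `P07` are hypothetical) to show how the three repeated participants would be found and excluded.

```python
# Hypothetical participant IDs for the two gym sessions.
monday = {"P01", "P02", "P03", "P04", "P05"}
wednesday = {"P03", "P04", "P05", "P06", "P07"}

# Anyone present in both groups breaks independence between the groups.
overlap = monday & wednesday
print(sorted(overlap))  # ['P03', 'P04', 'P05']

# One possible adjustment: drop the repeated participants from both groups.
monday_only = monday - overlap
wednesday_only = wednesday - overlap
```

Excluding participants shrinks the sample, so this is only one of the adjustments mentioned above; a model that accounts for repeated measures is the other.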

Pearson correlation (r)

Purpose: assesses a linear (straight-line) relationship between two variables.
Assumptions:
- Linear relationship: scatter plot should show a straight-line pattern; non-linear patterns (curves, U-shapes, etc.) violate this assumption.
- Normality (parametric data) for the variables involved.
- Homogeneity of variance: equal variance across the range of the relationship.
- Independence of observations.
- Typically both variables are continuous. Pearson can still be computed with one continuous and one dichotomous variable (the point-biserial correlation), but the transcript notes that with a dichotomous predictor you would usually use a t-test instead.
Range and interpretation:
- Range: $r \in [-1, 1]$
- r = 0 indicates no linear relationship; r = 1 or r = -1 indicates a perfect linear relationship (positive or negative).
- Positive r indicates a direct relationship; negative r indicates an inverse relationship.
- The strength of r is an effect size for the linear relationship (larger |r| means a stronger relationship).
Visualization importance: always visualize with a scatter plot to assess linearity; four illustrated cases with the same r value (0.82) can differ in whether the relationship is linear.
Example of reporting correlation:
- Reporting format (APA-style guidance): include r, degrees of freedom, and p-value.
- Example from transcript: $r(38) = 0.34,\ p = 0.009.$ (statistically significant, small-to-moderate positive relationship)
- Another example: $r(196) = -0.75,\ p < 0.001.$ (statistically significant, strong negative relationship)
Interpretation caveat:
- Even with a high |r|, if the relationship is non-linear, Pearson correlation can be misleading. Visualization is essential.
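
The reverse problem is worth seeing too: a relationship can be perfectly predictable and still give r = 0 if it is not linear. The sketch below computes r from its definition on made-up data; `pearson_r` is a hypothetical helper name, not something from the transcript.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-deviation of x and y scaled by their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
line = [2 * x + 1 for x in xs]   # perfectly linear
u_shape = [x ** 2 for x in xs]   # perfectly predictable, but U-shaped

r_line = pearson_r(xs, line)
r_u = pearson_r(xs, u_shape)
print(r_line, r_u)
```

The straight line gives an r of 1 and the U-shape gives an r of 0, even though y is fully determined by x in both cases. Only a scatter plot reveals the difference.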
Effect size note:
- r is a measure of effect size for the linear association.
- Thresholds for small/medium/large are arbitrary and come from textbooks; different sources may use different cutoffs.
Reporting recommendations (APA): Always report degrees of freedom and p-value alongside r.
When to choose Spearman: if data show obvious groupings or non-normality, Spearman’s correlation (rank-based) may be more appropriate.

When to use Spearman’s correlation

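Used as a nonparametric alternative when Pearson’s assumptions (especially linearity and normality) are violated, or when the data show obvious non-linear patterns or outliers. Because it works on ranks, Spearman’s correlation measures monotonic (consistently increasing or decreasing) association, so it suits curved trends but not U-shaped ones.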
The transcript suggests: if you see obvious groupings in the data, you probably should consider Spearman’s correlation instead of Pearson.
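
Spearman's correlation is Pearson's r computed on the ranks of the data rather than the raw values. The sketch below illustrates this on made-up data; the helper names `pearson_r`, `average_ranks`, and `spearman_rho` are hypothetical. A library routine such as SciPy's `scipy.stats.spearmanr` would also supply a p-value.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # always increasing, but curved

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, round(r, 3))
```

For this curved but always-increasing pattern, Spearman's rho is 1 while Pearson's r is lower, because only the ordering of the values matters to the rank-based measure.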

Between-subjects (independent samples) t-test

Purpose: compare the means of two independent groups on a continuous dependent variable.
Data configuration:
- Independent variable is dichotomous (two groups).
- Dependent variable is continuous.
Assumptions:
- Dependent variable is parametric (approximately normally distributed within each group).
- Independent variable is dichotomous with two groups.
- Independence of observations (no participant belongs to both groups).
- Homogeneity of variances across the two groups (often tested with Levene’s test).
Alternatives: if variances are unequal, use Welch’s t-test; if the dependent variable is not parametric, consider nonparametric tests (not covered in detail in transcript).
Within-subjects vs between-subjects:
- Between-subjects (independent samples) t-test: different participants in each group.
- Within-subjects (paired/dependent) t-test: same participants measured at two time points or under two conditions.
Note from transcript: focus for this course is on between-subjects t-tests.
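
As a rough illustration of what the between-subjects t-test computes under the equal-variance assumption, the sketch below derives the t statistic and its degrees of freedom (n1 + n2 - 2) by hand. The function name `student_t` and the fatigue scores are hypothetical, not data from the transcript.

```python
import math
from statistics import mean, variance

def student_t(group1, group2):
    """Equal-variance (pooled) independent-samples t statistic and its df."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical fatigue scores for two separate groups of participants.
heavy_tv = [7.0, 8.0, 6.0, 9.0, 7.0, 8.0]
low_tv = [5.0, 4.0, 6.0, 5.0, 3.0, 4.0]

t, df = student_t(heavy_tv, low_tv)
print(round(t, 2), df)
```

Swapping the order of the two groups flips the sign of t but leaves its magnitude and the df unchanged.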

Effect size and reporting for t-tests

Effect size metric: Cohen’s d.
- Definition: $d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}$ where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is the pooled standard deviation.
- Interpretation: small, medium, or large effects (thresholds vary by source; transcript notes these are textbook cutoffs and are arbitrary).
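
The definition above translates directly into code. This is a minimal sketch with made-up scores; `cohens_d` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s_p = math.sqrt(((n1 - 1) * variance(group1)
                     + (n2 - 1) * variance(group2)) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / s_p

# Hypothetical scores: means of 16 and 14, both with sample variance 10.
group_a = [12.0, 14.0, 16.0, 18.0, 20.0]
group_b = [10.0, 12.0, 14.0, 16.0, 18.0]

d = cohens_d(group_a, group_b)
print(round(d, 2))
```

Here the groups differ by 2 points and the pooled standard deviation is the square root of 10, so d is about 0.63: the means are roughly two thirds of a standard deviation apart.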
Test statistic and p-value reporting:
- Standard format (APA-like): $t(\text{df}) = \text{value},\ p = \text{value},\ d = \text{value}.$
- Example 1 from transcript: $t(24) = 5.23,\ p = 0.002,\ d = 1.09.$
- Interpretation: those in the heavy TV-watching group were statistically significantly more fatigued than those in the low TV-watching group, with a large effect size.
- Example 2 from transcript: $t(187) = -2.31,\ p = 0.017,\ d = 0.04.$
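- Interpretation: the heavy TV-watching group had statistically significantly less sleep than the low TV-watching group, which the transcript describes as a small effect. Note that d = 0.04 sits well below Cohen's conventional cutoff of 0.2 for a small effect, so the d value may have been mis-transcribed.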
Signs of t-statistic:
- Positive t indicates group 1 mean > group 2 mean (depending on labeling).
- Negative t indicates group 1 mean < group 2 mean.
Note on Welch’s t-test:
- Used when Levene’s test indicates unequal variances; t-statistic and df adjust accordingly (Welch-Satterthwaite df).
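
To show how the statistic and df adjust, the sketch below computes Welch's t and the Welch-Satterthwaite degrees of freedom by hand. The function name `welch_t` and the data are hypothetical. In SciPy the same test is available through `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance

def welch_t(group1, group2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Squared standard error of each group mean; the variances are not pooled.
    v1, v2 = variance(group1) / n1, variance(group2) / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical data with clearly unequal spread.
narrow = [10.0, 11.0, 9.0, 10.0, 10.0]
spread = [5.0, 9.0, 13.0, 17.0, 21.0]

t, df = welch_t(narrow, spread)
print(round(t, 2), round(df, 2))
```

With these two groups the pooled test would use 8 degrees of freedom, while Welch's adjustment gives roughly 4.1. This is why Welch df values are often non-integers in reported results.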

Practical workflow and reporting guidelines (based on transcript)

Data exploration:
- Plot scatter plots for correlations to assess linearity and homoscedasticity.
- Check normality (e.g., with the Shapiro–Wilk test) before running parametric tests.
- Use Levene’s test to formally test equality of variances for t-tests.
Choice of test based on data characteristics:
- If both variables are continuous, the relationship is linear, and the assumptions are met, use Pearson correlation.
- If the relationship is non-linear or the variances are not homogeneous, consider Spearman’s correlation.
- For comparing two independent groups on a continuous outcome, use the independent-samples t-test (Welch’s variant if variances differ).
- If there are repeated measures or the same individuals across conditions: consider paired-sample t-test (not the focus in this transcript).
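
The choices in this list can be summarised as a small decision function. This is only a sketch of the notes' own rules: the function name `suggest_test` and its parameters are hypothetical, and it is no substitute for looking at the data.

```python
def suggest_test(two_continuous, linear=True, assumptions_met=True,
                 equal_variances=True, repeated_measures=False):
    """Return the test these notes name for the given data characteristics."""
    if two_continuous:
        # Two continuous variables: a correlation question.
        if linear and assumptions_met:
            return "Pearson correlation"
        return "Spearman's correlation"
    # Otherwise: a two-group (dichotomous) predictor and a continuous outcome.
    if repeated_measures:
        return "paired-samples t-test"
    if equal_variances:
        return "independent-samples t-test"
    return "Welch's t-test"

print(suggest_test(two_continuous=True, linear=False))
print(suggest_test(two_continuous=False, equal_variances=False))
```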
Reporting conventions (APA style hints):
- Correlation: r(df) = value, p = value; include context (what variables were correlated).
- t-tests: t(df) = value, p = value, d = value; explicitly state which group is which if needed for interpretation.
- Always report effect sizes along with p-values to convey practical significance.
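
A small formatting helper can keep reported results consistent with these conventions. The sketch below is hypothetical (`format_p`, `report_correlation`, and `report_t_test` are illustrative names) and follows the number style used in these notes. It assumes the usual df = n - 2 for a Pearson correlation, so the r(38) example corresponds to 40 pairs of observations.

```python
def format_p(p):
    """Report very small p-values as an inequality, as in the examples."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"

def report_correlation(r, n_pairs, p):
    df = n_pairs - 2  # Pearson correlation: df = number of pairs minus 2
    return f"r({df}) = {r:.2f}, {format_p(p)}"

def report_t_test(t, df, p, d):
    return f"t({df}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"

print(report_correlation(0.34, 40, 0.009))   # r(38) = 0.34, p = 0.009
print(report_t_test(5.23, 24, 0.002, 1.09))  # t(24) = 5.23, p = 0.002, d = 1.09
```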
Ethical/practical implications:
- Accurate reporting of p-values and effect sizes helps avoid misinterpretation of results.
- Awareness of assumptions reduces risk of drawing invalid conclusions from parametric tests.

Quick recap of key concepts and relationships

Homoscedasticity, normality, and independence are core assumptions for many parametric tests. Homoscedasticity is checked visually and with Levene’s test, normality with the Shapiro–Wilk test, and independence by reasoning about the study design; remedies include Welch’s t-test or Spearman’s correlation when appropriate.
Pearson correlation measures linear association between two variables and is sensitive to nonlinearity. It also serves as an effect size, reported as r with degrees of freedom and a p-value. Spearman’s correlation is a nonparametric alternative for non-linear or non-normally distributed data.
Between-subjects t-tests compare means of two independent groups on a continuous outcome, requiring normality, independence, and equal variances (Levene’s test). When variances differ, Welch’s t-test is preferred.
Cohen’s d provides a standardized measure of effect size for t-tests; reporting includes t, df, p, and d.
In all cases, visual inspection (scatter plots, distribution plots) complements statistical tests to ensure appropriate model choice and interpretation.

Examples to study (from transcript)

Independence violations:
- Monday vs Wednesday gym data with 3 overlapping participants → not independent; would need to remove overlaps or adjust design.
- One instructor teaching multiple classes across methods → potential dependence between groups; design may confound methods with instructor effects.
Independent samples t-test examples:
- Example 1: t(24) = 5.23, p = 0.002, d = 1.09 → heavy TV-watching group more fatigued than low TV group (large effect).
- Example 2: t(187) = -2.31, p = 0.017, d = 0.04 → heavy TV-watching group slept less than low TV group (small effect).
Correlation reporting examples:
- Example 1: r(38) = 0.34, p = 0.009 → significant, small-to-moderate positive relationship (hours of TV vs fatigue).
- Example 2: r(196) = -0.75, p < 0.001 → significant, strong negative relationship (hours of TV vs sleep duration).

Note: The above notes reflect the concepts and examples described in the transcript, including the emphasis on visual data checks, the role of Levene’s and Shapiro–Wilk tests, and the reported formats for correlation and t-tests using the APA style conventions referenced in the material.