Stats

Confidence Intervals (CIs)

What They Measure:

  • Provide a range of plausible values for a population parameter (e.g., mean, proportion).

  • Example: A 95% CI means that, under repeated sampling, 95% of intervals constructed this way would contain the true population mean.

Factors That Affect Them:

  • Sample Size (n): Larger n narrows the CI (increased precision).

  • Confidence Level: Higher confidence (e.g., 99%) widens the CI due to more certainty needed.

  • Variability (s): Higher variability increases the width of the CI.

Interpretation Pitfalls (Common Mistakes):

  • Misinterpreting the CI as containing the true mean with certainty (it’s probabilistic, not absolute).

  • Assuming a narrow CI always means accuracy—it depends on the data quality.
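The points above can be sketched in code. A minimal example, assuming SciPy is available and using made-up sample data, computing a 95% CI for a mean via the t-distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 25 measurements
rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=25)

n = len(sample)
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean: s / sqrt(n)

# 95% CI using the t-distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# A 99% interval is wider than a 95% one, as noted above
ci99 = stats.t.interval(0.99, df=n - 1, loc=mean, scale=sem)
print(f"99% CI = ({ci99[0]:.2f}, {ci99[1]:.2f})")
```

Increasing `n` or lowering the confidence level narrows the interval; increasing the sample standard deviation widens it.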


Hypothesis Testing

Purpose:

  • Test if observed data support a specific hypothesis about a population (e.g., mean difference, association).

Key Components:

  1. Null Hypothesis (H0): Assumes no effect or no difference (e.g., H0: μ1 = μ2).

  2. Alternative Hypothesis (HA): Contradicts H0 (e.g., HA: μ1 ≠ μ2).

  3. P-value: Measures the probability of observing the data (or more extreme results) if H0 is true. A small P-value (<0.05) suggests evidence against H0.

Types of Tests:

  1. t-tests:

  • One-sample: Tests if a population mean equals a fixed value.

  • Unpaired (Two-sample): Compares means of two independent groups.

  • Paired: Analyses mean difference in related groups (e.g., pre- and post-treatment).

  2. Chi-squared (χ2): Tests independence in categorical data.

Common Mistakes:

  • Interpreting P > 0.05 as proof that H0 is true; it only indicates a lack of evidence against H0, not evidence for it.

  • Ignoring assumptions (e.g., normality for t-tests, independence for χ2).
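A minimal sketch of the workflow above, assuming SciPy and using hypothetical data, testing whether a sample mean differs from a fixed value of 100:

```python
from scipy import stats

# Hypothetical measurements; H0: population mean = 100, HA: mean != 100
sample = [102, 105, 99, 103, 107, 101, 104, 98, 106, 102]

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")

# Note: P > 0.05 would NOT prove H0; it would only mean
# insufficient evidence against it at the 5% threshold.
if p_value < 0.05:
    print("Evidence against H0 at the 5% level")
```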


t-tests

What They Measure:

  • Test if means of groups are significantly different.

Assumptions:

  • Data are normally distributed.

  • For unpaired tests: Groups have similar variances.

  • Paired tests: The differences (not the individual measurements) should be normally distributed.

Values & Interpretations:

  • t-statistic: Larger absolute value indicates stronger evidence against H0.

  • P-value: Small values (<0.05) suggest significant mean differences.

Common Mistakes:

  • Using unpaired tests for related data.

  • Ignoring normality of differences in paired tests.
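The paired-vs-unpaired distinction can be seen directly in code. A sketch with hypothetical pre- and post-treatment scores for the same eight patients, assuming SciPy:

```python
from scipy import stats

# Hypothetical pre- and post-treatment scores for the same 8 patients
pre  = [140, 152, 138, 145, 160, 149, 155, 142]
post = [135, 147, 136, 140, 151, 144, 150, 138]

# Paired test: analyses the per-patient differences
t_paired, p_paired = stats.ttest_rel(pre, post)

# Using an unpaired test here would be a mistake: it ignores
# the pairing, so the large between-patient variability swamps
# the consistent within-patient drop
t_unpaired, p_unpaired = stats.ttest_ind(pre, post)

print(f"paired:   t = {t_paired:.2f}, P = {p_paired:.4f}")
print(f"unpaired: t = {t_unpaired:.2f}, P = {p_unpaired:.4f}")
```

Here every patient drops by a few points, so the paired test detects the effect while the unpaired test does not.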


Regression

What It Measures:

  • Relationship between an outcome (dependent variable) and predictors (independent variables).

Key Metrics:

  • Slope (β): Change in outcome for one unit change in predictor.

  • Residuals: Differences between observed and predicted values—used to assess model fit.

Assumptions:

  • Linear relationship between variables.

  • Homoscedasticity (constant variance of residuals).

  • Normally distributed residuals.

Common Mistakes:

  • Not plotting data before modelling.

  • Confusing prediction intervals (for individual values) with confidence intervals (for mean).
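A minimal simple-linear-regression sketch, assuming SciPy and hypothetical dose-response data, showing the slope, fit quality, and residuals discussed above:

```python
import numpy as np
from scipy import stats

# Hypothetical data: dose (predictor) vs response (outcome)
dose     = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
response = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8])

fit = stats.linregress(dose, response)
print(f"slope = {fit.slope:.2f} per unit dose, intercept = {fit.intercept:.2f}")
print(f"R^2 = {fit.rvalue**2:.3f}, P = {fit.pvalue:.3g}")

# Residuals: observed minus predicted. Plot these against the
# predictor to check linearity and homoscedasticity.
predicted = fit.intercept + fit.slope * dose
residuals = response - predicted
print("residuals:", np.round(residuals, 2))
```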


Chi-squared Test

What It Measures:

  • Tests if observed frequencies differ from expected frequencies under H0.

Assumptions:

  • Observations are independent.

  • Expected cell counts ≥5; otherwise, use Fisher’s Exact Test.

Values & Interpretations:

  • Large χ2: Greater discrepancy between observed and expected frequencies.

  • Small P-value: Evidence of association or difference.

Common Mistakes:

  • Applying the test to percentages instead of raw counts.

  • Ignoring the independence assumption.
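A sketch of the test on a hypothetical 2x2 table of raw counts, assuming SciPy; it also checks the expected-count rule noted above:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table of raw counts (never percentages):
# rows = exposed/unexposed, columns = outcome yes/no
table = np.array([[30, 70],
                  [15, 85]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}, dof = {dof}")
print("expected counts:\n", expected)

# If any expected count were < 5, Fisher's Exact Test would be
# the appropriate fallback for a 2x2 table
if (expected < 5).any():
    odds, p = fisher_exact(table)
```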


Logistic Regression

What It Measures:

  • Relationship between binary outcomes and predictors.

Key Metrics:

  • Odds Ratio (OR):

    • OR>1: Event more likely in exposed group.

    • OR<1: Event less likely in exposed group.

  • Logit: Natural log of the odds; modelled as a linear function of the predictors.

Assumptions:

  • Logit-linear relationship between predictor and outcome.

  • Independent observations.

Common Mistakes:

  • Misinterpreting ORs (e.g., OR=2 means twice the odds, not probability).
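The OR interpretation above can be made concrete with a hypothetical 2x2 table. A minimal sketch, using invented counts:

```python
# Hypothetical 2x2 table:
#              event   no event
exposed   = [40, 60]
unexposed = [25, 75]

odds_exposed   = exposed[0] / exposed[1]      # 40/60
odds_unexposed = unexposed[0] / unexposed[1]  # 25/75
odds_ratio = odds_exposed / odds_unexposed
print(f"OR = {odds_ratio:.2f}")  # OR > 1: event more likely when exposed

# In logistic regression, the OR for a predictor is exp(beta).
# An OR of 2 means twice the ODDS, not twice the probability;
# compare with the risk ratio:
p_exposed, p_unexposed = 40 / 100, 25 / 100
risk_ratio = p_exposed / p_unexposed
print(f"risk ratio = {risk_ratio:.2f}  (not the same as the OR)")
```

Here the OR is 2.0 but the risk ratio is 1.6; the two diverge further as the event becomes more common.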


Common Factors That Affect All Tests

  1. Sample Size (n):

    • Larger n increases power (ability to detect effects).

    • Small n can lead to wide CIs and low test power.

  2. Variability:

    • Higher variability reduces precision and increases uncertainty.

  3. Violations of Assumptions:

    • Non-normal data or unequal variances affect the validity of t-tests.

    • Non-independence in χ2 invalidates conclusions.
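The sample-size effect can be sketched numerically: for a fixed standard deviation, the 95% CI half-width shrinks roughly as 1/sqrt(n). A small illustration assuming SciPy and an arbitrary s of 10:

```python
import numpy as np
from scipy import stats

s = 10.0  # assumed sample standard deviation
widths = []
for n in (10, 40, 160):
    sem = s / np.sqrt(n)
    half_width = stats.t.ppf(0.975, df=n - 1) * sem
    widths.append(half_width)
    print(f"n = {n:4d}: 95% CI half-width ~ {half_width:.2f}")
```

Each quadrupling of n roughly halves the interval width, which is why small samples give wide CIs and low power.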