Module 3: Statistics, Part 3

Analysis of Continuous Data: Comparing Two Means

Overview of Comparing Two Means

  • Continuous data analysis often involves comparing the means of two groups.

  • The independent-samples t-test (also called the two-sample t-test) is commonly used for this purpose.

  • In this analysis:

    • The independent variable (categorical) indicates group membership (e.g., treatment vs. placebo).

    • The outcome (response variable) is continuous (e.g., systolic blood pressure).

Example of Comparing Means

  • Systolic Blood Pressure Analysis:

    • Example variable: Systolic blood pressure (in mmHg).

    • Compare means across two groups:

      • Group 1: Patients receiving a placebo.

      • Group 2: Patients receiving a new drug (e.g., Drug A).

    • Dependent variable (outcome): Systolic blood pressure; the analysis compares its mean between the two groups.

  • Mental Health Score Analysis:

    • Example variable: Mental health score (range 0 to 100).

    • Compare means across two groups:

      • Physically active vs. inactive participants (activity status is the categorical independent variable).

Hypotheses Formulation

  • Null Hypothesis (H0): The population mean systolic blood pressure for patients on placebo equals that for patients on Drug A (μ1 = μ2).

  • Alternative Hypothesis (H1): The population mean systolic blood pressure differs between the two groups (μ1 ≠ μ2, a two-sided alternative).
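
To make the hypotheses concrete, the following is a minimal sketch of how the equal-variance (pooled) t-statistic is formed from two samples. The blood pressure readings and the helper name pooled_t_statistic are invented for illustration and do not come from the course material.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
placebo = [142, 138, 150, 145, 139, 148, 144, 141]
drug_a = [132, 136, 129, 140, 134, 131, 137, 133]

def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # Standard error of the difference in sample means.
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

t_stat, df = pooled_t_statistic(placebo, drug_a)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

Under H0 the statistic follows a t distribution with n1 + n2 - 2 degrees of freedom, so a value far from zero counts as evidence against equal means.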

Assumptions of the Two-Sample T-Test

  1. Normality of Distribution:

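    • Responses within each group should be approximately normally distributed; this matters most for small samples.

    • With larger samples (a common rule of thumb is n > 30 per group), the t-test is robust to moderate departures from normality because of the central limit theorem.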

  2. Independence of Samples:

    • Observations must be independent, both within each group and between the two groups.

    • No repeated measures on the same participant and no related (e.g., matched) participants across groups.

  3. Homogeneity of Variances:

    • Variances between the two groups should be approximately equal.

    • This assumption can be inspected visually using box plots (boxes of similar height suggest similar spread).

    • Levene's test can formally assess variance equality.

Box Plots and Variance Comparison

  • Box Plot Analysis:

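    • A visual check of each group's symmetry and spread; a box plot can suggest, but not confirm, normality and similar variances.

    • The median is drawn as a line within each box; for roughly symmetric data it sits near the center of the box.

    • A median that is clearly off-center, or whiskers of very different lengths, suggests skewness and a potential violation of normality.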

  • Example:

    • Box plot comparison for systolic blood pressure in placebo vs. Drug A:

      • Assess variability and check if assumptions are satisfied.
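
The quantities a box plot displays can also be computed directly. The sketch below uses hypothetical readings and a hypothetical helper, box_summary, to report each group's quartiles, interquartile range (the box height), and where the median sits inside the box.

```python
from statistics import quantiles

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
placebo = [142, 138, 150, 145, 139, 148, 144, 141]
drug_a = [132, 136, 129, 140, 134, 131, 137, 133]

def box_summary(values):
    """Return the quartiles, IQR, and relative position of the median in the box."""
    q1, med, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    # 0.5 means the median is exactly centered; values near 0 or 1 hint at skewness.
    median_position = (med - q1) / iqr
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "median_position": median_position}

for name, group in (("placebo", placebo), ("drug A", drug_a)):
    print(name, box_summary(group))
```

Similar IQRs across the groups are consistent with similar variances, and a median position near 0.5 is consistent with symmetry. Neither is a formal test.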

Levene's Test

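  • Formally tests the null hypothesis that the two groups have equal variances.

  • If the test indicates unequal variances (a small p-value, e.g., p < 0.05), use a modified t-test that does not assume equal variances, such as Welch's t-test.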
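
As a rough sketch of this workflow, the code below runs Levene's test with SciPy and then chooses between the pooled t-test and Welch's t-test (a t-test that does not assume equal variances). The data and the 0.05 cutoff are assumptions made for illustration, not values from the course material.

```python
from scipy import stats

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
placebo = [142, 138, 150, 145, 139, 148, 144, 141]
drug_a = [132, 136, 129, 140, 134, 131, 137, 133]

# Levene's test: H0 is that the groups have equal variances.
# center="median" is SciPy's default (the Brown-Forsythe variant).
levene_stat, levene_p = stats.levene(placebo, drug_a, center="median")

# Assumed decision rule for this sketch: treat variances as equal unless p < 0.05.
equal_var = bool(levene_p >= 0.05)

# equal_var=False makes ttest_ind perform Welch's t-test instead of the pooled test.
t_stat, p_value = stats.ttest_ind(placebo, drug_a, equal_var=equal_var)

print(f"Levene: W = {levene_stat:.2f}, p = {levene_p:.3f}")
print(f"t-test (equal_var={equal_var}): t = {t_stat:.2f}, p = {p_value:.3g}")
```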

Reporting Results of T-Test

  • Results should summarize the findings succinctly:

    • E.g., "There was significant evidence of a difference in mean mental health scores between the active and inactive groups."

    • Report the evidence for statistical significance: the t-statistic, its degrees of freedom, and the p-value.

    • A confidence interval for the difference in means gives a range of plausible values for the true difference between the population means.
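
The reported quantities can be assembled as in the following sketch, which assumes equal variances and uses invented mental health scores. The 95% confidence interval is computed by hand from the pooled standard error and the critical t value.

```python
from math import sqrt
from statistics import mean, variance
from scipy import stats

# Hypothetical mental health scores (0 to 100), invented for illustration.
active = [72, 68, 80, 75, 77, 70, 83, 74, 79, 71]
inactive = [65, 60, 72, 58, 69, 63, 70, 61, 66, 64]

# Pooled (equal-variance) two-sample t-test.
t_stat, p_value = stats.ttest_ind(active, inactive, equal_var=True)

n1, n2 = len(active), len(inactive)
df = n1 + n2 - 2
diff = mean(active) - mean(inactive)

# Pooled standard error of the difference in means.
sp2 = ((n1 - 1) * variance(active) + (n2 - 1) * variance(inactive)) / df
se = sqrt(sp2 * (1 / n1 + 1 / n2))

# 95% CI: difference plus or minus the critical t value times the standard error.
t_crit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"Mean difference = {diff:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f}), "
      f"t({df}) = {t_stat:.2f}, p = {p_value:.3g}")
```

A 95% interval that excludes zero corresponds to a two-sided p-value below 0.05 for the same test.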

Non-Parametric Alternatives: Mann-Whitney U Test

  • If assumptions for the t-test are violated, a non-parametric test can be used:

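    • Mann-Whitney U Test:

      • Useful when the data are clearly not normally distributed, particularly with small samples where the t-test is not robust.

      • Compares the two groups using the ranks of the data rather than the means; it is often interpreted as a comparison of medians when the two distributions have similar shapes.

  • Non-parametric tests make fewer distributional assumptions but typically have lower power than parametric tests when the parametric assumptions hold.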
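
A minimal sketch of the Mann-Whitney U test follows, using invented scores. The U statistic is first counted by hand over all pairs of observations, to show that only the ordering of the values matters, and then SciPy's mannwhitneyu supplies the p-value.

```python
from scipy import stats

# Hypothetical, skewed mental health scores (0 to 100), invented for illustration.
active = [88, 92, 95, 79, 97, 99, 85, 100]
inactive = [60, 72, 55, 91, 68, 74, 63, 80]

# U for the first group: the number of (active, inactive) pairs in which the
# active score is higher, counting ties as one half.
u_manual = sum(
    1.0 if a > b else 0.5 if a == b else 0.0
    for a in active
    for b in inactive
)

# Two-sided test of whether one group tends to have larger values than the other.
u_stat, p_value = stats.mannwhitneyu(active, inactive, alternative="two-sided")

print(f"U (by hand) = {u_manual}, U (SciPy) = {u_stat}, p = {p_value:.3g}")
```

Recent SciPy versions report U for the first sample, so the two U values should agree; older versions may report the complementary value n1 * n2 - U.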

Historical Context

  • The t-test was developed by William Sealy Gosset, who worked at the Guinness Brewery and published under the pseudonym "Student".

Summary of Assumptions for Two-Sample T-Test

  1. Normal Distribution: Check for normality, crucial for small samples.

  2. Independent Samples: Ensure that groups are independent.

  3. Equal Variances: Check that the group variances are similar, e.g., with box plots or Levene's test.

Looking Ahead

  • The next topic covers the analysis of paired data, which builds on the concepts used here for comparing two means.