Two-Sample Inference Study Notes

Two-Sample Inference

1. Introduction to Two-Sample Inference

  • Objectives: Conduct hypothesis tests for differences in independent model proportions and means.
    - 4.1: Hypothesis tests for a difference in independent model proportions ($\theta_1 - \theta_2$) by hand or in RStudio.
    - 4.2: Hypothesis tests for a difference in independent model means ($\bar{y}_1 - \bar{y}_2$) by hand or in RStudio.
    - 4.3: Construct confidence intervals for a difference of model proportions or means.
      - Interpretation: Interpret a confidence interval as a range of parameter values compatible with the collected data, and recognize common misinterpretations.

2. Normal Procedures for Difference of Proportions

  • Objective: Draw inferences about the difference in population proportions ($\theta_1 - \theta_2$) using the normal model, akin to previous examples.

  • Sampling Distribution: The sampling distribution of the difference in two sample proportions, $\bar{p}_1 - \bar{p}_2$.

2.1 Key Idea
  • Normal Model Conditions:
    - Condition 1: Success-failure assumption.
    - Condition 2: Further condition defined for the almost normal distribution condition.

  • If these conditions are met:
    - The mean of the normal distribution: to be filled in based on set conditions.
    - The standard error of this distribution: $SE(\bar{p}_1 - \bar{p}_2) = $ (to be filled in).

3. Summary of Two-Proportion Formulas

  • Hypothesis Test & Confidence Interval Formulas: There are two different formulas for computing $SE(\bar{p}_1 - \bar{p}_2)$, depending on the procedure used.

  • Conditions for Approximation:
    1. $n_1 p_1$ and $n_1(1 - p_1)$ are both at least 10.
    2. $n_2 p_2$ and $n_2(1 - p_2)$ are both at least 10.

  • Implication: If any of these quantities fall below 10, the normal approximation to the sampling distribution becomes unreliable.
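
For reference, the two standard-error formulas are usually written as below. This is the standard textbook form rather than something stated in these notes. The success counts $x_1, x_2$ and the pooled proportion $\bar{p}_{pool}$ are notation introduced here for illustration.

```latex
% Confidence interval for the difference (unpooled standard error):
SE(\bar{p}_1 - \bar{p}_2)
  = \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}

% Hypothesis test of H_0: no difference in proportions (pooled standard error):
\bar{p}_{pool} = \frac{x_1 + x_2}{n_1 + n_2},
\qquad
SE(\bar{p}_1 - \bar{p}_2)
  = \sqrt{\bar{p}_{pool}(1 - \bar{p}_{pool})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
```

The pooled version is used for tests because, under the null hypothesis, both groups share one common proportion, so the two samples are combined to estimate it.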

4. Confidence Intervals for Differences in Proportions

  • Example: Alcohol use and heart health.
    - Study Overview: 410 men were observed to examine the relationship between moderate alcohol intake and heart disease risk.
    - Groups: 209 ‘abstainers’ and 201 ‘moderate drinkers’, followed over 10 years, with cardiac arrests recorded.
    - Data Summary:
      - Abstainers experiencing cardiac arrest: 12
      - Moderate drinkers experiencing cardiac arrest: 9
    - Questions:
      - (a) Point estimate of the true difference: $p_{abstainers} - p_{drinkers}$.
      - (b) Compute a 95% confidence interval.
      - (c) Interpret the confidence level of this interval; misinterpretations discussed.
      - (d) State the conditions for validity of the interval.
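
Parts (a), (b), and (d) can be sketched in a few lines of Python using the counts given above. This is a minimal illustration that assumes the usual unpooled normal-approximation interval. The function names `two_prop_ci` and `success_failure_ok` are hypothetical helpers, not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, level=0.95):
    """Unpooled normal-approximation interval for p1 - p2 (assumed method)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2  # point estimate of the difference
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return diff, (diff - z_star * se, diff + z_star * se)


def success_failure_ok(x1, n1, x2, n2, minimum=10):
    """True when every observed success and failure count is at least `minimum`."""
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


# Counts from the example: 12 of 209 abstainers, 9 of 201 moderate drinkers.
diff, (lower, upper) = two_prop_ci(12, 209, 9, 201)
conditions_ok = success_failure_ok(12, 209, 9, 201)

print(f"point estimate: {diff:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"success-failure condition met: {conditions_ok}")
```

With these counts the interval runs from roughly -0.030 to 0.055, so it includes 0. Only 9 moderate drinkers had a cardiac arrest, which is below the usual threshold of 10, so the normal approximation deserves some caution in part (d).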

5. Conducting Hypothesis Tests

  • Example 1: Cancer rates in dogs related to herbicide exposure.
    - Study Overview: 1994 study investigating the risk of cancer in dogs exposed to the herbicide 2,4-D.
    - Sample Size: 491 dogs with cancer; 945 dogs in the control group.
    - Expected cancer cases based on exposure: Statistical evidence computed.
    - Steps:
      - Step 1: Establish hypotheses:
        - $H_0$: No increased cancer risk.
        - $H_a$: Increased cancer risk in 2,4-D dogs.
      - Step 2: Summarize data and check conditions:
        - Success-failure check: $n_1 p_{null}$ and $n_1(1 - p_{null})$ must both be at least 10.
      - Step 3: Calculate test statistic, p-value, effect size:
        - Observed test statistic: formula required here.
      - Step 4: Interpret p-value & report conclusions in context.

6. Sample Data Example: Vaccine for Diarrhea

  • Study Overview: 2010 study on vaccine effectiveness against rotavirus gastroenteritis in children.
    - Vaccine Group Outcome: 63 out of 3298 children contracted the virus.
    - Placebo Outcome: 80 out of 1641 children contracted the virus.
    - Steps:
      - (a) Compute sample percentages, interpret effectiveness.
      - (b) Conduct hypothesis test for vaccine effectiveness, all steps shown.
      - (c) Clinical testing of a newer vaccine with results from 1100 children.
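
Step (a) is plain arithmetic on the counts above, sketched here in Python. The relative reduction shown is one common way to summarize effectiveness. Using it here is an assumption, since the notes do not say which summary is intended.

```python
# Counts from the study: cases / group size.
vaccine_cases, vaccine_n = 63, 3298
placebo_cases, placebo_n = 80, 1641

p_vaccine = vaccine_cases / vaccine_n
p_placebo = placebo_cases / placebo_n

# Absolute difference in infection rates (placebo minus vaccine).
risk_difference = p_placebo - p_vaccine

# Relative reduction in infection rate compared with placebo
# (an assumed choice of effectiveness summary).
relative_reduction = 1 - p_vaccine / p_placebo

print(f"vaccine group infected: {100 * p_vaccine:.2f}%")
print(f"placebo group infected: {100 * p_placebo:.2f}%")
print(f"difference: {100 * risk_difference:.2f} percentage points")
print(f"relative reduction: {100 * relative_reduction:.1f}%")
```

About 1.9% of vaccinated children and 4.9% of placebo children contracted the virus, which corresponds to a relative reduction of roughly 61%.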

7. Two-Proportion z-Test

  • Definition: A hypothesis test to compare the proportions $p_1$ and $p_2$ of two independent groups.

  • Test Statistic Formula: $Z = \frac{\bar{p}_1 - \bar{p}_2}{SE(\bar{p}_1 - \bar{p}_2)}$

  • Application: Used to compare two proportions from different populations, given independent random samples of sufficient size.
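
As a rough sketch, the test statistic can be computed in Python for the rotavirus vaccine counts from section 6 (63 of 3298 vaccinated, 80 of 1641 placebo). The code assumes the pooled standard error that is conventional under a null hypothesis of equal proportions. The function name `two_prop_z_test` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (assumed form of the SE under H0)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_lower = NormalDist().cdf(z)                # H_a: p1 < p2
    p_two_sided = 2 * NormalDist().cdf(-abs(z))  # H_a: p1 != p2
    return z, p_lower, p_two_sided


# Group 1 = vaccine, group 2 = placebo.
z, p_lower, p_two_sided = two_prop_z_test(63, 3298, 80, 1641)
print(f"z = {z:.2f}")
print(f"one-sided p-value (vaccine rate lower): {p_lower:.2e}")
```

The statistic comes out near -5.85, far into the lower tail, so the one-sided p-value is extremely small.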

8. Differences Between Independent Means

  • Sampling Distribution: $\bar{y}_1 - \bar{y}_2$, the difference of two independent sample means.

  • Condition 1: Conditions required here.

  • Condition 2: Conditions required here.

  • Resulting Distribution: If conditions are met, distribution is nearly normal.
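
The notes leave the conditions and formulas for this section blank, so the following is only a sketch of one common approach: the unpooled (Welch) standard error and test statistic for a difference in means. The choice of the Welch procedure, the function name `welch_summary`, and the two small data lists are all assumptions made for illustration. The numbers are invented and do not come from any study in these notes.

```python
from math import sqrt
from statistics import mean, variance


def welch_summary(sample1, sample2):
    """Return (SE, t statistic, approximate df) for mean1 - mean2 (Welch form)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # s1^2 / n1, using the sample variance
    v2 = variance(sample2) / n2  # s2^2 / n2
    se = sqrt(v1 + v2)
    t = (mean(sample1) - mean(sample2)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df


# Hypothetical data, invented purely to exercise the function.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

se, t, df = welch_summary(group_a, group_b)
print(f"SE = {se:.3f}, t = {t:.3f}, df = {df:.2f}")
```

A p-value would then come from a t distribution with the computed degrees of freedom, for example via `pt()` in R.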

9. Calculating Confidence Intervals for Differences in Means

  • Example 1: Blubber effectiveness in dolphins; provide relevant calculations.

  • Hypothesis Testing: Assess diltiazem medication outcomes via statistical comparisons.

10. Next Steps in Two-Sample Procedures

  • RStudio Codes and Calculations: Follow-up questions posed for practical application in sessions.

  • Confidence Interval Validations: Review how to construct intervals from sample data.

11. Chi-Squared Procedures Overview

  • Objectives: Conduct Chi-squared goodness-of-fit and independence tests. Interpret test statistics and p-values, and replicate the analyses.

  • Test Statistic Construction: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, summed over all categories (cells).

  • Chi-Squared Distribution: Right-skewed and takes only non-negative values, so p-values for these tests come from the upper tail.
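
The construction of the statistic can be sketched in Python for a goodness-of-fit setting. The observed counts and null proportions below are hypothetical, chosen only so the arithmetic is easy to follow. The function name `chi_squared_statistic` is likewise an illustrative helper.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 observations in three categories.
observed = [50, 30, 20]
null_proportions = [0.4, 0.4, 0.2]  # hypothetical null model

total = sum(observed)
expected = [total * p for p in null_proportions]  # 40, 40, 20

chi2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1  # goodness-of-fit degrees of freedom

print(f"chi-squared = {chi2:.2f} on {df} degrees of freedom")
```

Each category contributes its own squared, scaled discrepancy: here 100/40 + 100/40 + 0/20 = 5. Because every term is non-negative, only large values of the statistic count as evidence against the null model.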

12. Data Comparison Summary and Additional Examples

  • Conclusions drawn from multiple datasets, ensuring clarity on findings and related implications.

  • Discussion Points: Ethical considerations and practical applications of the studies, and factors that influence their results.

Methods of Influence in Statistical Testing

  • Cohen’s d for Effect Size Determination: Interpretable metrics to evaluate study findings.

  • Example Studies: Review various tests of independence and their results among other real-world applications like dietary assessments.
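
For Cohen's d, a minimal Python sketch is given below. It assumes the common pooled-standard-deviation definition. The notes do not specify which variant is used, and the data are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample1, sample2):
    """Cohen's d using the pooled standard deviation (assumed definition)."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = (
        (n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)
    ) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / sqrt(pooled_var)


# Hypothetical data, invented purely to exercise the function.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

Cohen's d expresses the difference in means in standard-deviation units, so it does not grow with sample size the way a test statistic does.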