STAT 101 Lecture Notes

Sample Size Determination

  • When conducting research, determining the appropriate sample size is crucial.
  • The required sample size depends on:
    • How accurate the estimation needs to be.
    • How confident you need to be in the estimate.

Margin of Error and Sample Size

  • Margin of Error: Calculated by multiplying a $z$-value by the standard error.
    • Formula: $\text{Margin of Error} = z \times \text{Standard Error}$
  • Rearranging the margin of error formula allows us to solve for $n$ (sample size).
    • Formula: $n = \left(\frac{z}{\text{Margin of Error}}\right)^2 \hat{p}(1 - \hat{p})$
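The rearrangement can be made explicit by writing out the standard error of a proportion, $\sqrt{\hat{p}(1 - \hat{p})/n}$ (the same expression used for the confidence interval later in these notes), and solving for $n$. A sketch of the algebra, where $ME$ is shorthand for the margin of error (an abbreviation introduced here, not in the original notes):

```latex
\begin{aligned}
ME &= z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\
ME^2 &= z^2 \, \frac{\hat{p}(1 - \hat{p})}{n} \\
n &= \left(\frac{z}{ME}\right)^2 \hat{p}(1 - \hat{p})
\end{aligned}
```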

Factors Affecting Sample Size

  • Confidence Level:
    • A higher confidence level requires a larger $z^*$ value, which in turn requires a larger sample size.
  • Accuracy:
    • Determined by the margin of error; a smaller margin of error requires a larger sample size.

Issue: Estimating $\hat{p}$ Before Sampling

  • Problem: The formula for $n$ includes $\hat{p}$, the sample proportion, which is unknown until the sample is taken.
  • $\hat{p} = \frac{\text{Number of cases in the category of interest}}{\text{Sample size}}$

Solutions for Estimating $\hat{p}$

  1. Previous Research: Use $\hat{p}$ from a similar study.
  2. Pilot Study: Run a small pilot study (e.g., 10-30 cases) to estimate $\hat{p}$.
  3. Maximize $\hat{p}(1 - \hat{p})$:
    • The maximum value of $\hat{p}(1 - \hat{p})$ occurs when $\hat{p} = 0.5$, so using 0.5 gives the largest (most conservative) required sample size.
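A short calculus check shows why 0.5 is the maximizing value. This is a sketch of the standard argument and is not part of the original notes; $f$ and $n_{\max}$ are names introduced here for illustration:

```latex
\begin{aligned}
f(\hat{p}) &= \hat{p}(1 - \hat{p}) = \hat{p} - \hat{p}^2 \\
f'(\hat{p}) &= 1 - 2\hat{p} = 0 \;\Rightarrow\; \hat{p} = 0.5 \\
f''(\hat{p}) &= -2 < 0 \quad \text{(so this is a maximum)} \\
f(0.5) &= 0.25 \\
n_{\max} &= \left(\frac{z}{\text{Margin of Error}}\right)^2 \times 0.25
\end{aligned}
```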

Determining Sample Size

  1. Determine the desired confidence level (which affects $z^*$).
  2. Determine the desired accuracy (margin of error).
  3. Estimate $\hat{p}$ using one of the methods above.
  4. Calculate the required sample size $n$ using the formula.
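The four steps above can be sketched in Python. The function name `required_sample_size` and its parameters are hypothetical, chosen for illustration. Rounding up to the next whole case is an assumption (the notes do not state a rounding rule), and the default of 0.5 follows the "maximize" option when no better estimate is available.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence, margin_of_error, p_hat=0.5):
    """Sample size needed to estimate a proportion (illustrative sketch).

    confidence      -- desired confidence level, e.g. 0.95 (step 1)
    margin_of_error -- desired accuracy, e.g. 0.03 (step 2)
    p_hat           -- estimate of the proportion (step 3); 0.5 is the
                       conservative choice when nothing better is known
    """
    # Two-sided critical value z*: about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 4: n = (z* / margin of error)^2 * p_hat * (1 - p_hat)
    n = (z_star / margin_of_error) ** 2 * p_hat * (1 - p_hat)
    # Assumption: round up, since a fraction of a case cannot be sampled.
    return math.ceil(n)


# Example: 95% confidence, margin of error of 3 percentage points,
# no prior estimate of the proportion.
print(required_sample_size(0.95, 0.03))
```

Passing a prior estimate, for example `required_sample_size(0.95, 0.05, p_hat=0.8)`, gives a smaller answer than the conservative default because $\hat{p}(1 - \hat{p})$ is then below 0.25.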

Tongue Rolling Data

  • Data collected: 96 people can roll their tongues, 23 cannot.
  • Total sample size: 119.
  • Sample proportion: $\hat{p} = \frac{96}{119}$.

Data Visualization and Summary

  • Data type: Categorical (Yes/No).
  • Appropriate plot: Bar chart.
  • Y-axis: Frequency.

Point Estimate

  • Point estimate: The sample proportion, $\hat{p} = \frac{96}{119}$.

Confidence Interval

  • Confidence level: 95% (commonly used).
  • $z^*$ value for 95% confidence: 1.96.
  • Formula for Confidence Interval: $\hat{p} \pm z^* \times SE$
    • Where $SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Calculating Confidence Interval in Excel

  • Sample proportion: 96/119 ≈ 0.806723
  • Standard error: $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \approx 0.036$
  • Margin of error: $1.96 \times 0.036 \approx 0.071$
  • Lower limit: $0.806723 - 0.071 \approx 0.74$
  • Upper limit: $0.806723 + 0.071 \approx 0.88$
  • Confidence interval: (0.74, 0.88)
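The same calculation can be reproduced outside Excel. The following is a minimal Python sketch, not the spreadsheet used in the lecture; the function name `proportion_confidence_interval` is hypothetical, and $z^* = 1.96$ is taken from the notes.

```python
import math


def proportion_confidence_interval(successes, n, z_star=1.96):
    """Confidence interval for a proportion: p_hat +/- z* * SE."""
    p_hat = successes / n                          # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error
    margin = z_star * se                           # margin of error
    return p_hat - margin, p_hat + margin


# Tongue-rolling data: 96 of 119 people can roll their tongues.
lower, upper = proportion_confidence_interval(96, 119)
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimals, the limits agree with the interval (0.74, 0.88) computed above.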

Interpretation

  • We are 95% confident that the true proportion of STAT101 students who can roll their tongue is between 74% and 88%.

Connection to Dominant Traits

  • Tongue rolling is said to be a dominant trait, theoretically present in 75% of people.
  • Since 0.75 falls within the calculated confidence interval (0.74, 0.88), our data does not provide evidence against this theory.

Home Field Advantage

  • Investigating whether there is a home-field advantage in sports (e.g., baseball).
  • Advantage: More likely to win at home.
Determining Advantage
  • Look at the proportion of games won at home.
  • Null hypothesis (no advantage): proportion of home wins = 50% (0.5).
  • Baseball data: 2,430 games, home team won in 54.9% of games.
    • Sample size $n = 2430$
    • Sample proportion of wins $\hat{p} = 0.549$

Hypothesis Testing

  • We use a hypothesis test to determine if there is enough evidence for a home-field advantage.
  • A hypothesis test is used to find evidence against a null hypothesis in support of an alternative one.
General Setup for Hypothesis Test for a Proportion
  • Null hypothesis ($H_0$): $p = p_0$, where $p_0$ is the hypothesized proportion.
  • Alternative hypothesis:
    • One-tailed test: $p > p_0$ or $p < p_0$
    • Two-tailed test: $p \neq p_0$
Standard Error in Hypothesis Testing
  • Assumption: The null hypothesis is true.
  • Use $p_0$ (hypothesized proportion) instead of $\hat{p}$ in the standard error calculation.
  • Formula: $SE = \sqrt{\frac{p_0(1 - p_0)}{n}}$
Test Statistic (Z-score)
  • Standardize the sample statistic ($\hat{p}$) to calculate a z-score.
  • Formula: $z = \frac{\hat{p} - p_0}{SE}$
Checking Assumptions
  • Verify $n p_0 \geq 10$ and $n(1 - p_0) \geq 10$ to ensure the sampling distribution of $\hat{p}$ is approximately normal.
  • Note: this check was initially skipped for the tongue-rolling confidence interval and was carried out afterwards.
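The condition can be checked with a few lines of Python. This is a sketch with a hypothetical helper name, `normal_approximation_ok`. For the tongue-rolling confidence interval there is no hypothesized $p_0$, so the example substitutes $\hat{p}$, which is a common convention assumed here rather than something stated in the notes.

```python
def normal_approximation_ok(n, p, minimum_count=10):
    """Return True when n*p and n*(1-p) are both at least minimum_count."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= minimum_count and expected_failures >= minimum_count


# Home-field test: n = 2430 games, hypothesized proportion p0 = 0.5.
print(normal_approximation_ok(2430, 0.5))

# Tongue-rolling interval: n = 119, using p_hat = 96/119 in place of p0
# (assumption: sample counts stand in for the expected counts).
print(normal_approximation_ok(119, 96 / 119))
```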

Home Field Advantage Hypothesis Test (Continued)

Hypotheses
  • Null Hypothesis ($H_0$): The proportion of home wins is 0.5 ($p = 0.5$).
  • Alternative Hypothesis ($H_a$): There is a home-field advantage, so the proportion of home wins is greater than 0.5 ($p > 0.5$). This is a one-tailed test.
  • Significance Level: $\alpha = 0.05$ (5%).
Test Statistic
  • Calculate the z-score (standardized test statistic).
  • Formula: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
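As an illustration, the formula can be applied to the baseball data given earlier ($n = 2430$, $\hat{p} = 0.549$, $p_0 = 0.5$). This is a minimal sketch with a hypothetical function name, `one_proportion_z_test`. The upper-tail p-value goes one step beyond the formula shown in these notes and assumes the one-tailed alternative $p > p_0$.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n):
    """z-score for a sample proportion, with an upper-tail p-value."""
    # Standard error uses p0 because the null hypothesis is assumed true.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Assumption: one-tailed alternative p > p0, so use the upper tail.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = one_proportion_z_test(p_hat=0.549, p0=0.5, n=2430)
print(round(z, 2))       # z-score, roughly 4.83
print(p_value < 0.05)    # compare with the significance level alpha = 0.05
```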