STAT 101 Lecture Notes

Sample Size Determination

  • When conducting research, determining the appropriate sample size is crucial.
  • The required sample size depends on:
    • How accurate the estimation needs to be.
    • How confident you need to be in the estimate.

Margin of Error and Sample Size

  • Margin of Error: Calculated by multiplying a z-value by the standard error.
    • Formula: \text{Margin of Error} = z * \text{Standard Error}
  • Rearranging the margin of error formula allows us to solve for n (sample size).
    • Formula: n = \left(\frac{z}{\text{Margin of Error}}\right)^2 * \hat{p} * (1 - \hat{p})
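
The rearrangement can be written out step by step. This is a sketch that assumes the standard error of a sample proportion, SE = \sqrt{\hat{p}(1 - \hat{p})/n}, which appears later in these notes. ME is shorthand introduced here for the margin of error, and z is the critical value (written z^* elsewhere in the notes).

```latex
\begin{align*}
ME   &= z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\
ME^2 &= z^2 \, \frac{\hat{p}(1 - \hat{p})}{n} \\
n    &= \left(\frac{z}{ME}\right)^2 \hat{p}(1 - \hat{p})
\end{align*}
```

Squaring both sides removes the square root, and multiplying through by n / ME^2 isolates n.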

Factors Affecting Sample Size

  • Confidence Level:
    • A higher confidence level requires a larger z^* value, which in turn requires a larger sample size for the same margin of error.
  • Accuracy:
    • Determined by the margin of error; a smaller margin of error requires a larger sample size.
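
To see how the confidence level drives z^*, the critical value can be looked up from the standard normal distribution. The sketch below uses Python's standard library; the function name z_star is a hypothetical helper introduced for illustration, and it assumes a two-sided interval.

```python
from statistics import NormalDist


def z_star(confidence_level: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.95."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Split the leftover probability equally between the two tails.
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z* = {z_star(level):.3f}")
```

The printed values grow as the confidence level grows, which is why more confidence costs more observations.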

Issue: Estimating \hat{p} Before Sampling

  • Problem: The formula for n includes \hat{p}, the sample proportion, which is unknown until the sample is taken.
  • \hat{p} = \frac{\text{Number of cases in the category of interest}}{\text{Sample size}}

Solutions for Estimating \hat{p}

  1. Previous Research: Use \hat{p} from a similar study.
  2. Pilot Study: Run a small pilot study (e.g., 10-30 cases) to estimate \hat{p}.
  3. Maximize \hat{p} * (1 - \hat{p}):
    • The maximum value of \hat{p} * (1 - \hat{p}) occurs when \hat{p} = 0.5.
    • Using \hat{p} = 0.5 therefore gives the largest (most conservative) required sample size.
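
A quick numerical check of the third option: the sketch below evaluates the product on a coarse grid of candidate proportions. The grid spacing of 0.1 and the variable names are choices made here for illustration only.

```python
# Evaluate p * (1 - p) on a grid of candidate proportions 0.0, 0.1, ..., 1.0.
candidates = [i / 10 for i in range(11)]
products = {p: p * (1 - p) for p in candidates}

# Pick the candidate with the largest product.
best_p = max(products, key=products.get)

for p, value in products.items():
    print(f"p = {p:.1f} -> p(1 - p) = {value:.2f}")
print("largest product at p =", best_p)
```

The product rises toward 0.25 at p = 0.5 and falls off symmetrically on either side, so plugging in 0.5 can never understate the sample size needed.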

Determining Sample Size

  1. Determine the desired confidence level (which affects z^*).
  2. Determine the desired accuracy (margin of error).
  3. Estimate \hat{p} using one of the methods above.
  4. Calculate the required sample size n using the formula.
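
The four steps above can be sketched as a small function. The name required_sample_size, the example margin of error of 0.03, and the decision to round up to the next whole case are assumptions made for this illustration and are not taken from the lecture.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence_level, margin_of_error, p_hat=0.5):
    """Return n = (z*/ME)^2 * p_hat * (1 - p_hat), rounded up to a whole case."""
    # Step 1: the confidence level fixes the two-sided critical value z*.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Steps 2-4: plug the margin of error and the p_hat estimate into the formula.
    n = (z / margin_of_error) ** 2 * p_hat * (1 - p_hat)
    # Assumption: round up so the margin of error is not exceeded.
    return math.ceil(n)


# Conservative choice p_hat = 0.5 versus a prior estimate of 0.8 (illustrative values).
print(required_sample_size(0.95, 0.03))
print(required_sample_size(0.95, 0.03, p_hat=0.8))
```

Comparing the two calls shows how much a prior estimate away from 0.5 can reduce the required sample size.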

Tongue Rolling Data

  • Data collected: 96 people can roll their tongues, 23 cannot.
  • Total sample size: 119.
  • Sample proportion: \hat{p} = \frac{96}{119}.

Data Visualization and Summary

  • Data type: Categorical (Yes/No).
  • Appropriate plot: Bar chart.
  • Y-axis: Frequency.

Point Estimate

  • Point estimate: The sample proportion, \hat{p} = \frac{96}{119}.

Confidence Interval

  • Confidence level: 95% (commonly used).
  • z^* value for 95% confidence: 1.96.
  • Formula for Confidence Interval: \hat{p} \pm z^* * SE
    • Where SE = \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}}

Calculating Confidence Interval in Excel

  • Sample proportion: 96/119 ≈ 0.806723
  • Standard error: \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}} = \sqrt{\frac{0.806723 * 0.193277}{119}} ≈ 0.036
  • Margin of error: 1.96 * 0.036 ≈ 0.071
  • Lower limit: 0.806723 - 0.071 ≈ 0.74
  • Upper limit: 0.806723 + 0.071 ≈ 0.88
  • Confidence interval: (0.74, 0.88)
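
The same calculation can be reproduced outside Excel. The sketch below uses Python rather than the spreadsheet used in class, with the tongue-rolling counts from these notes; the variable names are chosen here for illustration.

```python
import math

# Tongue-rolling data from the notes: 96 of 119 people can roll their tongues.
successes, n = 96, 119
z_star = 1.96  # critical value for 95% confidence

p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
margin = z_star * se                     # margin of error
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, margin = {margin:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Carrying full precision through each step and rounding only at the end avoids compounding rounding error in the limits.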

Interpretation

  • We are 95% confident that the true proportion of STAT 101 students who can roll their tongue is between 74% and 88%.

Connection to Dominant Traits

  • Tongue rolling is said to be a dominant trait, theoretically present in 75% of people.
  • Since 0.75 falls within the calculated confidence interval (0.74, 0.88), our data does not provide evidence against this theory.

Home Field Advantage

  • Investigating whether there is a home-field advantage in sports (e.g., baseball).
  • Advantage: More likely to win at home.

Determining Advantage

  • Look at the proportion of games won at home.
  • Null hypothesis (no advantage): proportion of home wins = 50% (0.5).
  • Baseball data: 2,430 games, home team won in 54.9% of games.
    • Sample size n = 2430
    • Sample proportion of wins \hat{p} = 0.549

Hypothesis Testing

  • We use a hypothesis test to determine if there is enough evidence for a home-field advantage.
  • A hypothesis test is used to find evidence against a null hypothesis in support of an alternative one.

General Setup for Hypothesis Test for a Proportion

  • Null hypothesis (H_0): p = p_0, where p_0 is the hypothesized proportion.
  • Alternative hypothesis (H_a):
    • One-tailed test: p > p_0 or p < p_0
    • Two-tailed test: p \neq p_0

Standard Error in Hypothesis Testing

  • Assumption: The null hypothesis is true.
  • Use p_0 (hypothesized proportion) instead of \hat{p} in the standard error calculation.
  • Formula: SE = \sqrt{\frac{p_0 * (1 - p_0)}{n}}

Test Statistic (Z-score)

  • Standardize the sample statistic (\hat{p}) to calculate a z-score.
  • Formula: z = \frac{\hat{p} - p_0}{SE}

Checking Assumptions

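  • Verify n * p_0 \geq 10 and n * (1 - p_0) \geq 10 to ensure the sampling distribution of \hat{p} is approximately normal.
  • This check was initially skipped for the tongue-rolling confidence interval and was done afterwards. For a confidence interval, \hat{p} takes the place of p_0, so the counts to check are 96 and 23, both of which are at least 10.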
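
The condition is easy to check mechanically. In the sketch below, normal_approx_ok is a hypothetical helper name, and the third call uses made-up numbers purely to show a failing case.

```python
def normal_approx_ok(n, p, threshold=10):
    """Check that n*p and n*(1 - p) both reach the threshold (10 in these notes)."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# Home-field data: n = 2430 with hypothesized proportion p_0 = 0.5.
print(normal_approx_ok(2430, 0.5))
# Tongue-rolling interval: the sample proportion 96/119 stands in for p_0.
print(normal_approx_ok(119, 96 / 119))
# Illustrative failing case (not from the lecture): only 2 expected successes.
print(normal_approx_ok(20, 0.1))
```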

Home Field Advantage Hypothesis Test (Continued)

Hypotheses

  • Null Hypothesis (H_0): The proportion of home wins is 0.5 (p = 0.5).
  • Alternative Hypothesis (H_a): There is a home-field advantage, so the proportion of home wins is greater than 0.5 (p > 0.5). This is a one-tailed test.
  • Significance Level: \alpha = 0.05 (5%).

Test Statistic

  • Calculate the z-score (standardized test statistic).
  • Formula: z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 * (1 - p_0)}{n}}}
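
Plugging the baseball numbers into the test statistic can be sketched as follows. The comparison against a one-tailed critical value is an assumption about how the test is concluded, since the notes stop at the formula; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

# Values from the notes: 2,430 games, home team won 54.9%, H_0: p = 0.5.
n, p_hat, p_0, alpha = 2430, 0.549, 0.5, 0.05

# Standard error under the null hypothesis uses p_0, not p_hat.
se = math.sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / se

# Assumption: decide by comparing z with the upper-tail critical value for alpha.
z_critical = NormalDist().inv_cdf(1 - alpha)

print(f"z = {z:.2f}, one-tailed critical value = {z_critical:.3f}")
print("reject H_0" if z > z_critical else "fail to reject H_0")
```

Because the alternative is p > 0.5, only large positive z-scores count as evidence against H_0, so the whole 5% sits in the upper tail.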