STAT 101 Lecture Notes
Sample Size Determination
- When conducting research, determining the appropriate sample size is crucial.
- The required sample size depends on:
- How accurate the estimation needs to be.
- How confident you need to be in the estimate.
Margin of Error and Sample Size
- Margin of Error: The critical value z^* multiplied by the standard error.
- Formula: Margin\,of\,Error = z^* * Standard\,Error
- For a proportion, Standard\,Error = \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}}, so rearranging the margin of error formula allows us to solve for n (sample size).
- Formula: n = \left(\frac{z^*}{Margin\,of\,Error}\right)^2 * \hat{p} * (1 - \hat{p})
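The rearrangement can be spelled out step by step. This is a sketch that abbreviates the margin of error as ME (a shorthand introduced here, not used elsewhere in these notes) and substitutes the standard error of a proportion, \sqrt{\hat{p}(1 - \hat{p})/n}:

```latex
\begin{align*}
ME &= z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\
ME^2 &= (z^*)^2 \, \frac{\hat{p}(1 - \hat{p})}{n} \\
n &= \left(\frac{z^*}{ME}\right)^2 \hat{p}(1 - \hat{p})
\end{align*}
```

Squaring both sides removes the square root, after which n can be isolated by multiplying through by n and dividing by ME^2.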
Factors Affecting Sample Size
- Confidence Level:
- A higher confidence level requires a larger z^* value, which in turn requires a larger sample size.
- Accuracy:
- Determined by the margin of error; a smaller margin of error requires a larger sample size.
Issue: Estimating \hat{p} Before Sampling
- Problem: The formula for n includes \hat{p}, the sample proportion, which is unknown until the sample is taken.
- \hat{p} = \frac{Number\,of\,cases\,in\,the\,category\,of\,interest}{Sample\,size}
Solutions for Estimating \hat{p}
- Previous Research: Use \hat{p} from a similar study.
- Pilot Study: Run a small pilot study (e.g., 10-30 cases) to estimate \hat{p}.
- Maximize \hat{p} * (1 - \hat{p}):
- The maximum value of \hat{p} * (1 - \hat{p}) occurs when \hat{p} = 0.5, so using 0.5 gives the largest (most conservative) required sample size.
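A quick numerical check of the last point, as a minimal sketch. The grid of candidate proportions (steps of 0.01) and the function name `variance_term` are choices made for this illustration only.

```python
# Scan candidate proportions 0.00, 0.01, ..., 1.00 (grid chosen for illustration).
candidates = [i / 100 for i in range(101)]


def variance_term(p):
    """Return p * (1 - p), the term that appears in the sample size formula."""
    return p * (1 - p)


# Pick the candidate proportion that makes p * (1 - p) largest.
best_p = max(candidates, key=variance_term)
best_value = variance_term(best_p)

print(best_p, best_value)  # 0.5 0.25
```

Any other proportion gives a smaller product, so plugging in 0.5 can only overestimate the sample size that is needed, never underestimate it.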
Determining Sample Size
- Determine the desired confidence level (which affects z^*).
- Determine the desired accuracy (margin of error).
- Estimate \hat{p} using one of the methods above.
- Calculate the required sample size n using the formula.
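The four steps above can be sketched in Python. This is an illustration under stated assumptions, not a method given in the lecture: the function names are hypothetical, the inputs (95% confidence, margin of error 0.03) are made-up example values, and the result is rounded up because a sample must contain a whole number of cases.

```python
import math
from statistics import NormalDist


def z_star(confidence_level):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def required_sample_size(confidence_level, margin_of_error, p_hat=0.5):
    """n = (z* / ME)^2 * p_hat * (1 - p_hat), rounded up to a whole case.

    p_hat defaults to 0.5, the conservative choice when no estimate exists.
    """
    z = z_star(confidence_level)
    n = (z / margin_of_error) ** 2 * p_hat * (1 - p_hat)
    return math.ceil(n)


# Hypothetical inputs: 95% confidence, margin of error 0.03, p_hat = 0.5.
n_needed = required_sample_size(0.95, 0.03)
print(n_needed)  # 1068
```

Passing a prior or pilot estimate as `p_hat` (for example 0.2) yields a smaller n than the conservative default.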
Tongue Rolling Data
- Data collected: 96 people can roll their tongues, 23 cannot.
- Total sample size: 119.
- Sample proportion: \hat{p} = \frac{96}{119}.
Data Visualization and Summary
- Data type: Categorical (Yes/No).
- Appropriate plot: Bar chart.
- Y-axis: Frequency.
Point Estimate
- Point estimate: The sample proportion, \hat{p} = \frac{96}{119}.
Confidence Interval
- Confidence level: 95% (commonly used).
- z^* value for 95% confidence: 1.96.
- Formula for Confidence Interval: \hat{p} \pm z^* * SE
- Where SE = \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}}
Calculating Confidence Interval in Excel
- Sample proportion: 96/119 ≈ 0.806723
- Standard error: \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}} ≈ 0.036
- Margin of error: 1.96 * 0.036 ≈ 0.071
- Lower limit: 0.806723 - 0.071 ≈ 0.74
- Upper limit: 0.806723 + 0.071 ≈ 0.88
- Confidence interval: (0.74, 0.88)
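The lecture carries out this calculation in Excel; the same arithmetic can be sketched in Python as a cross-check. The function name `proportion_confidence_interval` is hypothetical, and z^* = 1.96 is taken from the notes rather than computed.

```python
import math


def proportion_confidence_interval(successes, n, z_star=1.96):
    """Return (lower, upper) limits of p_hat +/- z* * SE for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    margin = z_star * se                     # margin of error
    return p_hat - margin, p_hat + margin


# Tongue-rolling data: 96 of 119 people can roll their tongues.
lower, upper = proportion_confidence_interval(96, 119)
print(round(lower, 2), round(upper, 2))  # 0.74 0.88
```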
Interpretation
- We are 95% confident that the true proportion of STAT 101 students who can roll their tongue is between 74% and 88%.
Connection to Dominant Traits
- Tongue rolling is said to be a dominant trait, theoretically present in 75% of people.
- Since 0.75 falls within the calculated confidence interval (0.74, 0.88), our data does not provide evidence against this theory.
Home Field Advantage
- Investigating whether there is a home-field advantage in sports (e.g., baseball).
- Advantage: More likely to win at home.
Determining Advantage
- Look at the proportion of games won at home.
- Null hypothesis (no advantage): proportion of home wins = 50% (0.5).
- Baseball data: 2,430 games; the home team won 54.9% of them.
- Sample size n = 2430
- Sample proportion of wins \hat{p} = 0.549
Hypothesis Testing
- We use a hypothesis test to determine if there is enough evidence for a home-field advantage.
- A hypothesis test is used to find evidence against a null hypothesis in support of an alternative one.
General Setup for Hypothesis Test for a Proportion
- Null hypothesis (H_0): p = p_0, where p_0 is the hypothesized proportion.
- Alternative hypothesis:
- One-tailed test: p > p_0 or p < p_0
- Two-tailed test: p \neq p_0
Standard Error in Hypothesis Testing
- Assumption: The null hypothesis is true.
- Use p_0 (hypothesized proportion) instead of \hat{p} in the standard error calculation.
- Formula: SE = \sqrt{\frac{p_0 * (1 - p_0)}{n}}
Test Statistic (Z-score)
- Standardize the sample statistic (\hat{p}) to calculate a z-score.
- Formula: z = \frac{\hat{p} - p_0}{SE}
Checking Assumptions
- Verify n * p_0 \geq 10 and n * (1 - p_0) \geq 10 to ensure the sampling distribution of \hat{p} is approximately normal.
- This check was initially skipped for the tongue-rolling confidence interval and corrected afterwards; the observed counts there (96 and 23) are both at least 10.
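The condition can be checked with a few lines of Python. This is a minimal sketch: the function name `normal_approximation_ok` is hypothetical, and using the sample proportion 96/119 in place of p_0 for the confidence interval is the usual convention but an assumption here, since the notes do not spell it out.

```python
def normal_approximation_ok(n, p, threshold=10):
    """True when both expected counts n*p and n*(1-p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Home-field test: n = 2430 games, hypothesized proportion p_0 = 0.5.
home_field_ok = normal_approximation_ok(2430, 0.5)

# Tongue-rolling interval: n = 119, using the sample proportion 96/119
# (assumed stand-in for p_0, since no hypothesized value applies there).
tongue_rolling_ok = normal_approximation_ok(119, 96 / 119)

print(home_field_ok, tongue_rolling_ok)  # True True
```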
Home Field Advantage Hypothesis Test (Continued)
Hypotheses
- Null Hypothesis (H_0): The proportion of home wins is 0.5 (p = 0.5).
- Alternative Hypothesis (H_a): There is a home-field advantage, so the proportion of home wins is greater than 0.5 (p > 0.5). This is a one-tailed test.
- Significance Level: \alpha = 0.05 (5%).
Test Statistic
- Calculate the z-score (standardized test statistic).
- Formula: z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}
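Plugging the baseball figures into this formula can be sketched as follows. The function name `one_proportion_z` is hypothetical. The comparison against a one-tailed critical value is one common way to finish such a test and is an assumption here, since the notes stop at the test statistic.

```python
import math
from statistics import NormalDist


def one_proportion_z(p_hat, p_0, n):
    """z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n), with SE computed under H_0."""
    se = math.sqrt(p_0 * (1 - p_0) / n)
    return (p_hat - p_0) / se


# Baseball data from the notes: n = 2430 games, p_hat = 0.549, p_0 = 0.5.
z = one_proportion_z(0.549, 0.5, 2430)

# Assumed decision rule: one-tailed critical value at alpha = 0.05.
critical_value = NormalDist().inv_cdf(1 - 0.05)
exceeds_critical = z > critical_value

print(round(z, 2))  # 4.83
```

Because the standard error uses p_0 rather than \hat{p}, the z-score measures how far the observed proportion lies from 0.5 if the null hypothesis were true.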