Lecture Note 09 Inference for Two Population Means Part I

Simple Random Sample: Every member of the population has an equal chance of being selected.
Stratified Random Sample: Population is divided into subgroups (strata) and samples are taken from each.
Cluster Random Sample: Entire groups (clusters) are chosen randomly instead of individuals.
Multistage Random Sample: Combines several sampling methods, usually involves clusters followed by individual sampling.
Convenience Sample: Population members are selected based on accessibility.
Volunteer Sample: Members self-select to participate in a study.
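
As a rough illustration of how the first three sampling designs differ, here is a minimal Python sketch. The population of 12 student IDs, the section labels "A" and "B", and all function names are hypothetical and are not part of the lecture material.

```python
import random

# Hypothetical population: 12 student IDs, each tagged with a section label.
population = [(student_id, "A" if student_id <= 6 else "B") for student_id in range(1, 13)]

def simple_random_sample(units, k, rng):
    """Draw k units so that every unit has the same chance of selection."""
    return rng.sample(units, k)

def stratified_random_sample(units, k_per_stratum, rng):
    """Split units by label, then draw a simple random sample inside each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[1], []).append(unit)
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], k_per_stratum))
    return sample

def cluster_random_sample(units, n_clusters, rng):
    """Pick whole groups at random and keep every unit in the chosen groups."""
    labels = sorted({unit[1] for unit in units})
    chosen = set(rng.sample(labels, n_clusters))
    return [unit for unit in units if unit[1] in chosen]

rng = random.Random(0)  # fixed seed only so the sketch is reproducible
srs = simple_random_sample(population, 4, rng)
stratified = stratified_random_sample(population, 2, rng)
cluster = cluster_random_sample(population, 1, rng)
```

In this sketch the stratified sample always contains two students from each section, while the cluster sample contains every student from one randomly chosen section.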

Observational Study: The researcher observes but does not interfere with the population.
Experimental Study: The researcher actively manipulates variables to observe the effects.

Understanding: Population proportion, P, and its inference methods.
Types of inference:
- Parametric Inference: Assumes the data follow a specific distribution (for example, a normal distribution).
- Nonparametric Inference: Makes no assumption about the specific form of the data's distribution.

Two different exam versions were given to determine if there is a significant difference in difficulty.
Randomization was used to distribute the exam versions to students.
Goal: Evaluate if Version B is statistically harder than Version A based on student performance.

Point Estimate of the difference:
- x̄1 − x̄2 = 75.1 − 72 = 3.1
Because it is computed from samples, the point estimate may not equal the true mean difference 𝜇1 − 𝜇2.

Independence: Each sample must be randomly selected, and the two samples must be independent of each other.
Normality: Both groups are assumed to come from normally distributed populations.
- This assumption can be relaxed if the samples are large enough and contain no extreme outliers.

If assumptions are met, use:
- Point estimate of the difference: x̄1 − x̄2, which is approximately normally distributed.
- Confidence Interval formula:
  - (x̄1 − x̄2) ± z* × √(σ1²/n1 + σ2²/n2)
The population variances (σ1², σ2²) are usually unknown and are estimated by the sample variances (s1², s2²).
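
The z-based interval can be sketched in Python as follows, for the idealized case where both population standard deviations are known. The sample means 75.1 and 72 come from the notes; the standard deviations (10 and 12) and sample sizes (30 and 27) are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    # Critical value z* leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    difference = xbar1 - xbar2
    margin = z_star * standard_error
    return difference - margin, difference + margin

# Sample means from the notes; sigmas and sample sizes are hypothetical.
lower, upper = two_sample_z_interval(75.1, 72.0, 10.0, 12.0, 30, 27, 0.99)
print(f"99% z-interval for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```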

Degrees of freedom: The exact formula is complex and is usually computed with software; for manual calculations, use the smaller of n1 − 1 and n2 − 1.
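
The two options for the degrees of freedom can be compared with a short Python sketch. The notes do not name the software formula; the sketch assumes it is the Welch-Satterthwaite approximation, which is what statistical software commonly reports for this problem. The function names and the input values are hypothetical.

```python
def conservative_df(n1, n2):
    """Manual rule: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, s2, n1, n2):
    """Welch-Satterthwaite approximation (assumed here to be the 'software' formula)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical sample standard deviations and sample sizes.
df_manual = conservative_df(30, 27)
df_software = welch_df(10.0, 12.0, 30, 27)
```

The manual rule never gives more degrees of freedom than the approximation, so it leads to a slightly larger t* and a slightly wider, more cautious interval.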
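Use the t-distribution for confidence intervals in parametric inference:
- Formulae include:
  - t* value from the t-table, using the chosen confidence level and degrees of freedom
  - Standard error of the point estimate: SE = √(s1²/n1 + s2²/n2)
  - Margin of error: ME = t* × SE
  - Confidence interval: (x̄1 − x̄2) ± ME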
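
A minimal Python sketch of the t-based interval is given below. It takes t* as an input, mirroring the manual workflow of reading the value from a t-table. The sample means 75.1 and 72 are from the notes; the sample standard deviations (10 and 12) and sample sizes (30 and 27) are hypothetical, and t* = 2.779 is the table value for 99% confidence with df = min(30 − 1, 27 − 1) = 26 under those assumed sample sizes.

```python
from math import sqrt

def two_sample_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_star):
    """Confidence interval for mu1 - mu2 using sample standard deviations.

    t_star is the critical value looked up in a t-table for the chosen
    confidence level and degrees of freedom.
    """
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    margin_of_error = t_star * standard_error
    difference = xbar1 - xbar2
    return (difference - margin_of_error,
            difference + margin_of_error,
            standard_error,
            margin_of_error)

# Means from the notes; s1, s2, n1, n2 are hypothetical illustration values.
lower, upper, se, me = two_sample_t_interval(75.1, 72.0, 10.0, 12.0, 30, 27, 2.779)
print(f"SE = {se:.3f}, ME = {me:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```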

Question: Can we apply parametric inference for comparing population means?
Actions:
- Compute 99% confidence interval for the difference between means, 𝜇1 − 𝜇2.
- Interpret this interval in practical context.

t-table: Provides critical t-values for different degrees of freedom (df) and confidence levels (e.g., 80%, 90%, 95%), which are needed for the inference calculations.

Question: Does the calculated 99% confidence interval indicate that Version B was indeed harder?
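
One common way to read such an interval is to check whether it contains 0. The sketch below encodes that rule in Python; the function name and the wording of the returned messages are hypothetical, and which version corresponds to group 1 must be taken from the actual exam data, since the notes do not state it.

```python
def interpret_difference_interval(lower, upper):
    """Read a confidence interval for mu1 - mu2 by checking whether it contains 0."""
    if lower > 0:
        return "entire interval above 0: evidence that mu1 is greater than mu2"
    if upper < 0:
        return "entire interval below 0: evidence that mu1 is less than mu2"
    return "interval contains 0: no clear evidence of a difference in means"

# Hypothetical interval endpoints, for illustration only.
print(interpret_difference_interval(-5.08, 11.28))
```

If the interval contains 0, the data do not support the claim that one version was harder at the chosen confidence level.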