Lecture Notes on Population Estimation and Hypothesis Testing
Analytics Using Sample Data
- Statistical inferences about a population are often made using sample data.
- Two key types of inferences:
- Estimation of population means and proportions
- Hypothesis testing
Population Parameters vs Sample Statistics
- Population Parameters: Constant, unknown values that describe population attributes.
- Mean: μ (mu)
- Proportion: π (pi)
- Standard Deviation: σ (sigma)
- Variance: σ^2
- Sample Statistics: Variable values that depend on the selected sample.
- Mean: x̄ (x-bar)
- Proportion: p
- Standard Deviation: s
Estimating Population Parameters
- Estimation Process:
- Involves assessing unknown parameters using sample data.
- An unbiased estimator: expected value equals true population parameter.
- Point Estimate: a single number derived from sample data.
- Confidence Interval Construction:
- Determine point estimate.
- Calculate margin of error (dependent on sample size, standard deviation, confidence level).
- Determine interval bounds (point estimate ± margin of error).
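The three construction steps above can be sketched in code. This is a minimal illustration, assuming the population standard deviation is known so that a z critical value applies; the function name and the input values are hypothetical and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a population mean when sigma is known."""
    # Step 1: the point estimate is the sample mean itself.
    point_estimate = sample_mean
    # Step 2: margin of error = critical value * standard error.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    # Step 3: interval bounds are point estimate -/+ margin of error.
    return point_estimate - margin, point_estimate + margin


# Hypothetical inputs, chosen only for illustration.
lower, upper = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With these inputs the standard error is 10/√100 = 1, so the margin of error is roughly 1.96 and the interval is roughly 50 ± 1.96.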
Margin of Error
- Smaller margins imply narrower, more precise estimates.
- Three main factors affecting margin of error:
- Sample Size: Larger samples reduce the margin of error (it shrinks in proportion to 1/√n).
- Variability in Data: Higher variation increases the margin of error, so the estimate is less precise.
- Level of Confidence: Higher confidence levels increase margin of error.
- Common Confidence Levels: 90%, 95%, 99%.
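The effect of each factor can be checked numerically. The sketch below assumes a z-based margin of error with known standard deviation; the helper name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """z-based margin of error for a mean, assuming sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * sigma / sqrt(n)


# Hypothetical baseline: sigma = 10, n = 100, 95% confidence.
baseline = margin_of_error(sigma=10.0, n=100, confidence=0.95)
# Change one factor at a time and compare against the baseline.
larger_sample = margin_of_error(sigma=10.0, n=400, confidence=0.95)
more_variable = margin_of_error(sigma=20.0, n=100, confidence=0.95)
by_level = {c: margin_of_error(10.0, 100, c) for c in (0.90, 0.95, 0.99)}

print(f"baseline:            {baseline:.3f}")
print(f"4x the sample size:  {larger_sample:.3f}")
print(f"2x the variability:  {more_variable:.3f}")
for level, moe in by_level.items():
    print(f"confidence {level:.0%}:      {moe:.3f}")
```

Quadrupling the sample size halves the margin, doubling the standard deviation doubles it, and moving from 90% to 99% confidence widens it.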
Example: IQ Scores of Students
- QUT wants to estimate average IQ based on a sample of 25 students with a sample mean of 115.
- Determine:
- Is it a mean or proportion estimation?
- What is the point estimate?
- Is the population standard deviation known?
- Confidence level chosen?
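One way to work the example through, under assumptions the notes do not state: this is a mean estimation, the point estimate is the sample mean of 115, and, purely for illustration, the population standard deviation is taken as known with σ = 15 and the confidence level as 95% (z ≈ 1.96). Both of those last two values are assumptions, not given data.

```latex
\bar{x} \pm z_{0.975}\,\frac{\sigma}{\sqrt{n}}
  = 115 \pm 1.96 \times \frac{15}{\sqrt{25}}
  = 115 \pm 1.96 \times 3
  = 115 \pm 5.88
  \quad\Rightarrow\quad (109.12,\ 120.88)
```

If σ were unknown, the sample standard deviation s and a t critical value with 24 degrees of freedom would replace σ and z.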
Confidence Interval Example
- For a confidence level of 95%, if 20 samples were drawn and an interval computed from each, about 19 of the 20 intervals would be expected to contain the true population mean.
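The "19 out of 20" reading can be illustrated with a small simulation. This is a sketch under stated assumptions: a normal population with hypothetical mean 100 and known standard deviation 15, samples of size 25, and a fixed random seed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and design, chosen only for illustration.
TRUE_MEAN, SIGMA, N, TRIALS = 100.0, 15.0, 25, 2000

z_crit = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
margin = z_crit * SIGMA / sqrt(N)     # same margin for every sample (sigma known)

covered = 0
for _ in range(TRIALS):
    # Draw one sample and build its 95% interval around the sample mean.
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        covered += 1

coverage = covered / TRIALS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95, which is what "95% confidence" describes: the long-run success rate of the procedure, not a property of any single interval.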
Interpretation of Confidence Intervals
- A 90% CI of (155.51, 184.49) for mean height means we are 90% confident that the true population mean height lies between 155.51 and 184.49.
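- Common misinterpretation: a 90% CI does not mean there is a 90% probability that the true mean lies in this particular interval. The true mean is fixed, so a computed interval either contains it or does not; the 90% describes how often the procedure captures the true mean across repeated samples.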
Use of z vs t-distributions
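- z-distribution: Used when the population standard deviation (σ) is known; also a common approximation for large samples (n ≥ 30 is a frequent rule of thumb).
- t-distribution: Used when the population standard deviation is unknown and is estimated by the sample standard deviation (s), especially for small samples; it has thicker tails than the z-distribution, reflecting the extra uncertainty from estimating σ.
- Degrees of freedom (df) determine the shape of the t-distribution; for a one-sample mean, df = n - 1. As df grows, the t-distribution approaches the z-distribution.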
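To see how the t critical value depends on degrees of freedom, the sketch below computes it with the standard library only: it integrates the t density numerically (Simpson's rule) and inverts the result by bisection. The helper names `t_cdf` and `t_critical` are hypothetical; in practice a statistics package or a printed t-table would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the t density with Simpson's rule."""
    # Normalising constant of the t density; lgamma avoids overflow for large df.
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3  # 0.5 is the area to the left of zero


def t_critical(confidence, df):
    """Two-sided critical value, found by bisection on t_cdf."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 200.0  # assumed search range; wide enough for the usual levels
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
for n in (5, 10, 25, 100):
    df = n - 1
    print(f"n={n:>3}  df={df:>2}  t={t_critical(0.95, df):.3f}  z={z_crit:.3f}")
```

The t critical value is always larger than the z value at the same confidence level, and the gap narrows as the sample size grows.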
Confirmation of Confidence Interval Calculations
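- Worked calculations should state the standard error (σ/√n when the population standard deviation is known, s/√n when it is estimated from the sample) and use the matching critical value: z when σ is known, t with n - 1 degrees of freedom otherwise.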
Prediction Intervals vs Confidence Intervals
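- Prediction Interval: Gives a range for a single future observation from the population; it is wider than the confidence interval for the mean because it also accounts for the variability of individual observations.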
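The difference in width can be made concrete. The sketch below assumes a normal population with known standard deviation, in which case the margin for the mean is z·σ/√n and the margin for one new observation is z·σ·√(1 + 1/n). All input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs, chosen only for illustration.
x_bar, sigma, n, confidence = 50.0, 10.0, 100, 0.95

z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Confidence interval: uncertainty about the population mean only.
ci_margin = z_crit * sigma / sqrt(n)
# Prediction interval: uncertainty about the mean plus the spread
# of a single new observation around that mean.
pi_margin = z_crit * sigma * sqrt(1 + 1 / n)

print(f"95% CI for the mean:        {x_bar - ci_margin:.2f} to {x_bar + ci_margin:.2f}")
print(f"95% PI for one observation: {x_bar - pi_margin:.2f} to {x_bar + pi_margin:.2f}")
```

Increasing n shrinks the confidence interval toward zero width, while the prediction interval can never be narrower than about ±z·σ.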
Hypothesis Testing Overview
- Definition: A hypothesis is a proposed explanation testable by scientific methods.
- Categories of hypothesis testing in business:
- Service quality assessments (waiting times)
- Defective rates in production
- Operational cost evaluations.
Empirical Evidence Through Hypothesis Testing
- Hypothesis testing can determine:
- Presence of specific conditions (e.g., health conditions).
- Effects (e.g., treatments, marketing strategies).
- Differences or relationships (e.g., demographic impacts on purchasing behaviors).