Lecture Notes on Population Estimation and Hypothesis Testing

Analytics Using Sample Data

  • Statistical inferences about a population are often made using sample data.
  • Two key types of inferences:
    • Estimation of population means and proportions
    • Hypothesis testing

Population Parameters vs Sample Statistics

  • Population Parameters: Constant, unknown values that describe population attributes.
    • Mean: μ (mu)
    • Proportion: π (pi)
    • Standard Deviation: σ (sigma)
    • Variance: σ^2
  • Sample Statistics: Variable values that depend on the selected sample.
    • Mean: x̄ (x-bar)
    • Proportion: p
    • Standard Deviation: s
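
The distinction above can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the population of 10,000 values, its mean of 170 and standard deviation of 10, and the sample size of 30 are all hypothetical. The population mean is computed once and stays fixed, while the sample mean and sample standard deviation change with each sample drawn.

```python
import random
from statistics import fmean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population; in practice its parameters would be unknown.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = fmean(population)  # population parameter: one fixed value

# Draw three separate samples of 30 and compute the statistics for each.
sample_means = []
for i in range(3):
    sample = rng.sample(population, 30)
    x_bar = fmean(sample)   # sample mean
    s = stdev(sample)       # sample standard deviation (n - 1 divisor)
    sample_means.append(x_bar)
    print(f"sample {i + 1}: x-bar = {x_bar:.2f}, s = {s:.2f}")

print(f"population mean = {mu:.2f}")
```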

Estimating Population Parameters

  • Estimation Process:
    • Involves assessing unknown parameters using sample data.
    • An unbiased estimator: expected value equals true population parameter.
    • Point Estimate: a single number derived from sample data.
  • Confidence Interval Construction:
    1. Determine point estimate.
    2. Calculate margin of error (dependent on sample size, standard deviation, confidence level).
    3. Determine interval bounds (point estimate ± margin of error).
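
The three construction steps can be sketched as follows. This is a minimal illustration under stated assumptions: the population standard deviation is treated as known (so a z critical value applies), and the function name `z_confidence_interval` and the input values are hypothetical, chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a mean, assuming the population sd is known."""
    # Step 1: the point estimate is the sample mean.
    point_estimate = sample_mean
    # Step 2: margin of error = critical z value * standard error.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    # Step 3: interval bounds = point estimate +/- margin of error.
    return point_estimate - margin, point_estimate + margin

# Hypothetical inputs: sample mean 50, known sd 10, sample size 100.
lower, upper = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints 95% CI: (48.04, 51.96)
```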

Margin of Error

  • Smaller margins imply narrower, more precise estimates.
  • Three main factors affecting margin of error:
    • Sample Size: Larger samples reduce the margin of error.
    • Variability in Data: Higher variation increases the margin of error.
    • Level of Confidence: Higher confidence levels increase the margin of error.
  • Common Confidence Levels: 90%, 95%, 99%.
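
The effect of each factor can be checked numerically. The sketch below assumes a known population standard deviation and the usual formula margin = z × σ / √n; the helper name `margin_of_error` and the baseline values (σ = 15, n = 25, 95%) are hypothetical. Each comparison changes one factor at a time.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Margin of error for a mean, assuming a known population sd."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * sigma / sqrt(n)

baseline = margin_of_error(sigma=15, n=25, confidence=0.95)
larger_sample = margin_of_error(sigma=15, n=100, confidence=0.95)      # n x 4
more_variable = margin_of_error(sigma=30, n=25, confidence=0.95)       # sd x 2
higher_confidence = margin_of_error(sigma=15, n=25, confidence=0.99)   # 99%

print(f"baseline:          {baseline:.2f}")
print(f"larger sample:     {larger_sample:.2f}")
print(f"more variability:  {more_variable:.2f}")
print(f"higher confidence: {higher_confidence:.2f}")
```

Under these assumptions, quadrupling the sample size halves the margin, doubling the standard deviation doubles it, and moving from 95% to 99% confidence widens it.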

Example: IQ Scores of Students

  • QUT wants to estimate average IQ based on a sample of 25 students with a sample mean of 115.
  • Determine:
    • Is it a mean or proportion estimation?
    • What is the point estimate?
    • Is the population standard deviation known?
    • Confidence level chosen?

Confidence Interval Example

  • For a confidence level of 95%, if 20 independent samples were drawn and an interval were built from each, about 19 of the 20 intervals would be expected to contain the true population mean.
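
This long-run reading of the confidence level can be checked by simulation. The following is a rough sketch only: the normal population with mean 100 and standard deviation 15, the sample size of 25, and the 2,000 trials are hypothetical choices, and the population standard deviation is treated as known so that a z-interval applies.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(mu=100.0, sigma=15.0, n=25, confidence=0.95,
                  trials=2000, seed=0):
    """Share of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample and build one interval around its mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should land close to 0.95, which corresponds to roughly 19 intervals out of every 20.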

Interpretation of Confidence Intervals

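  • 90% CI for a mean height estimate: (155.51, 184.49) indicates 90% confidence that the true population mean lies within that range, in the sense that the method used to build the interval captures the true mean in about 90% of repeated samples.
  • Misinterpretation example: A given CI does not guarantee that the true mean lies within it. The confidence level describes how often the procedure succeeds over many samples, not a certainty about the one interval computed from the data at hand.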

Use of z vs t-distributions

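  • z-distribution: Used when the population standard deviation is known; also a common approximation for large samples.
  • t-distribution: Used when the population standard deviation is unknown and must be estimated by the sample standard deviation s, which matters most for small samples; its thicker tails reflect the extra uncertainty from that estimate, and it approaches the z-distribution as the sample size grows.
  • Degrees of freedom (df) determine the shape of the t-distribution; for a one-sample estimate of a mean, df = n - 1.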
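
The difference between the two distributions shows up in their critical values. The sketch below uses only the standard library, so the t quantile is approximated numerically (Simpson's rule for the CDF, then bisection); in practice a statistics library such as SciPy (`scipy.stats.t.ppf`) would normally be used instead. The helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, integrating the density on [0, x] by Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
print(f"z (95%): {z_crit:.3f}")
for n in (5, 25, 100):
    # df = n - 1 for a one-sample estimate of a mean
    print(f"t (95%), n = {n:>3}, df = {n - 1:>2}: {t_critical(0.95, n - 1):.3f}")
```

The t critical value is always larger than the z value at the same confidence level, and the gap shrinks as n grows.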

Confirmation of Confidence Interval Calculations

  • Example calculations should report the standard error and use the appropriate critical value (z or t), depending on whether the population standard deviation is known.

Prediction Intervals vs Confidence Intervals

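  • Prediction Interval: Gives a range for a single future observation from the population rather than for the population mean; wider than the corresponding confidence interval because it includes the variability of individual observations as well as the uncertainty in the estimated mean.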
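
The width difference can be seen in a simplified case. This sketch assumes a normally distributed population with a known standard deviation, so the confidence interval uses σ/√n while the prediction interval for one new observation uses σ√(1 + 1/n). The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_and_prediction_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (confidence interval, prediction interval), known-sd case."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # CI: uncertainty about the population mean only.
    ci_margin = z_crit * sigma / sqrt(n)
    # PI: uncertainty about the mean plus the spread of a single new value.
    pred_margin = z_crit * sigma * sqrt(1 + 1 / n)
    ci = (sample_mean - ci_margin, sample_mean + ci_margin)
    pred = (sample_mean - pred_margin, sample_mean + pred_margin)
    return ci, pred

# Hypothetical inputs: sample mean 170, known sd 10, sample size 25.
ci, pred = mean_ci_and_prediction_interval(170.0, 10.0, 25)
print(f"95% confidence interval: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% prediction interval: ({pred[0]:.2f}, {pred[1]:.2f})")
```

As n grows, the confidence interval keeps shrinking, but the prediction interval cannot become narrower than about ±zσ, because individual observations still vary.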

Hypothesis Testing Overview

  • Definition: A hypothesis is a proposed explanation testable by scientific methods.
  • Categories of hypothesis testing in business:
    • Service quality assessments (waiting times)
    • Defective rates in production
    • Operational cost evaluations.
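
As an illustration of the defective-rate category, a test of this kind might look like the sketch below. It is not taken from the notes: it assumes a one-sample z-test for a proportion (normal approximation), and the claimed defect rate of 5%, the sample of 200 items, and the 16 defects found are all hypothetical numbers.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical scenario: the claimed defect rate is 5% (null hypothesis);
# a sample of 200 items contains 16 defects.
p0 = 0.05
n = 200
defects = 16

p_hat = defects / n                     # sample proportion
std_error = sqrt(p0 * (1 - p0) / n)     # standard error under the null
z = (p_hat - p0) / std_error            # test statistic

# One-sided p-value: chance of a sample proportion at least this high
# if the true defect rate really were p0.
p_value = 1 - NormalDist().cdf(z)

print(f"sample proportion = {p_hat:.3f}")
print(f"z statistic       = {z:.3f}")
print(f"one-sided p-value = {p_value:.4f}")
```

A small p-value suggests the sample is hard to reconcile with the claimed rate; the cutoff used to decide (for example 0.05) is a choice made before looking at the data.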

Empirical Evidence Through Hypothesis Testing

  • Hypothesis testing can determine:
    • Presence of specific conditions (e.g., health conditions).
    • Effects (e.g., treatments, marketing strategies).
    • Differences or relationships (e.g., demographic impacts on purchasing behaviors).