Sampling Distributions and Estimation

Sampling Variation

  • Sample statistic: A random variable whose value depends on which population items are included in the random sample.
  • Sampling distribution: A probability distribution of all possible values of a sample statistic for a given sample size selected from a population.
  • The sample statistic's ability to represent the population accurately depends on the sample size.
  • Sampling variation is illustrated using eight random samples of size $n = 5$ from a large population of GMAT scores.
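
Sampling variation can be sketched with a short simulation. The GMAT population itself is not reproduced in these notes, so the sketch below assumes a synthetic stand-in: normally distributed scores with a hypothetical mean of 520 and standard deviation of 86, clipped to the 200 to 800 scale. Only the design (eight samples of size 5) comes from the notes.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the GMAT population (assumed values, not from the notes).
population = [min(800, max(200, round(random.gauss(520, 86)))) for _ in range(100_000)]

def sample_means(pop, n, num_samples):
    """Draw num_samples random samples of size n and return the mean of each."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(num_samples)]

means = sample_means(population, n=5, num_samples=8)
for i, m in enumerate(means, start=1):
    print(f"Sample {i}: mean = {m:.1f}")
print(f"Population mean = {statistics.mean(population):.1f}")
```

Each of the eight sample means differs from the others and from the population mean, which is the variation that the sampling distribution describes.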

Estimators and Sampling Distributions

  • Estimator: A statistic derived from a sample to infer the value of a population parameter.
  • Estimate: The value of the estimator in a particular sample.
  • Population parameters are represented by Greek letters, while corresponding statistics are represented by Roman letters.
    • Sample mean ($\bar{x}$) is the estimator for the population mean ($\mu$).
    • Sample proportion ($p$) is the estimator for the population proportion ($\pi$).
    • Sample standard deviation ($s$) is the estimator for the population standard deviation ($\sigma$).
  • Sampling error: The difference between an estimate and the corresponding population parameter. For example:
    • $\text{Sampling error} = \bar{x} - \mu$
  • Bias: The difference between the expected value of the estimator and the true parameter. For the sample mean:
    • $\text{Bias} = E(\bar{X}) - \mu$
  • An estimator is unbiased if its expected value is the parameter being estimated.
    • The sample mean is an unbiased estimator of the population mean since $E(\bar{X}) = \mu$. On average, an unbiased estimator neither overstates nor understates the true parameter.
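
The difference between sampling error (one sample) and bias (the long-run average) can be sketched with a simulation. The population parameters below (mean 50, standard deviation 10, sample size 5) are hypothetical values chosen only for illustration; the population is assumed to be normal.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 50.0, 10.0, 5  # hypothetical population parameters and sample size

def draw_sample_mean():
    """Mean of one random sample of size N from a normal(MU, SIGMA) population."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))

# One sample gives one estimate, with its own sampling error.
estimate = draw_sample_mean()
sampling_error = estimate - MU

# Averaging many sample means approximates E(X-bar); its gap from MU approximates the bias.
many_means = [draw_sample_mean() for _ in range(20_000)]
approx_bias = statistics.fmean(many_means) - MU

print(f"one estimate = {estimate:.2f}, sampling error = {sampling_error:+.2f}")
print(f"approximate bias over 20,000 samples = {approx_bias:+.3f}")
```

A single estimate can miss the parameter by a noticeable amount, while the average over many samples stays very close to it, which is what unbiasedness means.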

Sample Mean Sampling Distribution: Standard Error of the Mean

  • Different samples of the same size from the same population will yield different sample means.
  • Standard Error of the Mean: A measure of how much the sample mean varies from sample to sample, that is, the standard deviation of the theoretical distribution of all possible sample means for samples of size $n$.
    • $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, assuming sampling with replacement, or sampling without replacement from an infinite population.
    • The standard error of the mean decreases as the sample size increases.
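
The effect of sample size on the standard error can be checked numerically. The helper name `standard_error_of_mean` and the value $\sigma = 15$ are illustrative assumptions in this minimal sketch.

```python
import math

def standard_error_of_mean(sigma, n):
    """Return sigma / sqrt(n), the standard error of the mean for samples of size n."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)

sigma = 15  # illustrative population standard deviation
for n in (4, 25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error_of_mean(sigma, n):.2f}")
```

Because $n$ appears under a square root, quadrupling the sample size only halves the standard error.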

Sample Mean Sampling Distribution: If the Population is Normal

  • If a population is normal with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{x}$ is also normally distributed with:
    • $\mu_{\bar{x}} = \mu$
    • $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  • Z-value for Sampling Distribution of the Mean:
    • $Z = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$
    • Where:
      • $\bar{X}$ = sample mean
      • $\mu$ = population mean
      • $\sigma$ = population standard deviation
      • $n$ = sample size
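
The Z-value formula translates directly into code. In this sketch the function name `z_for_sample_mean` and the input values are hypothetical, and the probability step assumes a normal population so that the standard normal distribution applies.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """Standardize a sample mean: Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Hypothetical inputs chosen only for illustration.
z = z_for_sample_mean(x_bar=365, mu=368, sigma=15, n=25)
prob_below = NormalDist().cdf(z)  # P(X-bar < 365) when the population is normal

print(f"Z = {z:.2f}, P(X-bar < 365) = {prob_below:.4f}")
```

Here the standard error is 15 / 5 = 3, so a sample mean of 365 lies exactly one standard error below the population mean.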

Determining An Interval Including A Fixed Proportion of the Sample Means

  • To find a symmetrically distributed interval around $\mu$ that will include 95% of the sample means when $\mu = 368$, $\sigma = 15$, and $n = 25$:
    • Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
    • Since the interval is symmetric, 2.5% will be above the upper limit, and 2.5% will be below the lower limit.
    • From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96, and the Z score with 2.5% (0.0250) above it is 1.96.
    • Calculating the lower limit of the interval:
      • $\bar{X}_L = \mu + Z\left(\frac{\sigma}{\sqrt{n}}\right) = 368 + (-1.96)\left(\frac{15}{\sqrt{25}}\right) = 362.12$
    • Calculating the upper limit of the interval:
      • $\bar{X}_U = \mu + Z\left(\frac{\sigma}{\sqrt{n}}\right) = 368 + (1.96)\left(\frac{15}{\sqrt{25}}\right) = 373.88$
    • Based on samples of size 25, the sample means in 95% of all samples are between 362.12 and 373.88.
  • Generalized Equation for the interval that contains some defined percentage of all sample means:
    • $\mu \pm Z\left(\frac{\sigma}{\sqrt{n}}\right)$
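
The interval calculation above can be reproduced in a few lines. This is a minimal sketch; the function name `sample_mean_interval` is an assumption, and the Z score is taken from Python's `statistics.NormalDist` rather than from a printed table, so it is not rounded to 1.96.

```python
import math
from statistics import NormalDist

def sample_mean_interval(mu, sigma, n, proportion):
    """Symmetric interval around mu that contains `proportion` of all sample means."""
    tail = (1 - proportion) / 2             # area left outside on each side
    z = NormalDist().inv_cdf(1 - tail)      # about 1.96 when proportion = 0.95
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

lower, upper = sample_mean_interval(mu=368, sigma=15, n=25, proportion=0.95)
print(f"{lower:.2f} to {upper:.2f}")
```

With the values from the example, the limits agree with 362.12 and 373.88 to two decimal places. Changing `proportion` gives the interval for any other percentage of sample means.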

Population Proportions

  • $\pi$ = the proportion of the population having some characteristic.
  • $\sigma^2$ of a proportion is defined as $\pi(1 - \pi)$, so:
  • $\sigma$ of a proportion is defined as $\sqrt{\pi(1 - \pi)}$
  • Sample proportion ($p$) provides an estimate of $\pi$:
    • $p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
    • $0 \leq p \leq 1$
    • $p$ is approximately normally distributed when $n$ is large (assuming sampling with replacement from a finite population or without replacement from an infinite population).
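
As a small illustration of these definitions, the sketch below computes a sample proportion and the standard deviation of a proportion. The function names and the ten-item sample are hypothetical, chosen only to make the arithmetic visible.

```python
import math

def sample_proportion(outcomes):
    """p = X / n, where X counts the sampled items having the characteristic."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("sample must not be empty")
    return sum(1 for item in outcomes if item) / n

def proportion_std_dev(pi):
    """Standard deviation of a proportion: sqrt(pi * (1 - pi))."""
    return math.sqrt(pi * (1 - pi))

# Hypothetical sample: True marks an item with the characteristic of interest.
sample = [True, False, False, True, False, True, False, False, False, True]
p = sample_proportion(sample)

print(f"p = {p:.2f}")
print(f"sigma when pi = 0.4: {proportion_std_dev(0.4):.4f}")
```

Four of the ten items have the characteristic, so $p = 0.40$, which lies between 0 and 1 as required.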

Sampling Distribution of p

  • Sampling Distribution of p is approximated by a normal distribution if:
    • $n\pi > 5$ and $n(1 - \pi) > 5$
    • Where: $\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
    • Where: $\pi$ is the population proportion.
  • Z-Value for Proportions
    • Standardize p to a Z value with the formula:
      • $Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$
  • Example: If the true proportion of voters who support Proposition A is $\pi = 0.4$, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
    • That is, if $\pi = 0.4$ and $n = 200$, what is $P(0.40 \leq p \leq 0.45)$?
      • Find $\sigma_p$:
        • $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.40(1-0.40)}{200}} = 0.03464$
      • $P(0.40 \leq p \leq 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \leq Z \leq \frac{0.45 - 0.40}{0.03464}\right) = P(0 \leq Z \leq 1.44)$
      • Utilize the cumulative normal table:
        • $P(0 \leq Z \leq 1.44) = 0.9251 - 0.5000 = 0.4251$
        • Rounded to two significant digits, the probability is 0.43, or 43%.
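
The Proposition A example can be checked with a short sketch. The function name `proportion_probability` is an assumption, and the cumulative probabilities come from Python's `statistics.NormalDist` instead of a printed table.

```python
import math
from statistics import NormalDist

def proportion_probability(pi, n, low, high):
    """P(low <= p <= high) using the normal approximation to the distribution of p."""
    # Condition from the notes for using the normal approximation.
    if n * pi <= 5 or n * (1 - pi) <= 5:
        raise ValueError("normal approximation condition not met")
    sigma_p = math.sqrt(pi * (1 - pi) / n)  # standard error of the proportion
    z_low = (low - pi) / sigma_p
    z_high = (high - pi) / sigma_p
    std_normal = NormalDist()
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

prob = proportion_probability(pi=0.4, n=200, low=0.40, high=0.45)
print(f"P(0.40 <= p <= 0.45) = {prob:.4f}")
```

The computed value may differ slightly from the table result of 0.4251, because the table lookup rounds Z to 1.44 while the code keeps full precision. Both round to 0.43.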