Sampling Distributions

Population vs. Sample

  • Population: All elements/members (individuals, items, objects) whose characteristics are studied.
  • Sample: A portion/subset of the population selected for study.

Reasons for Sampling

  • Gathering information on the entire population is too expensive.
  • It is often impossible to gather information on the entire population.

Representative Sample

  • A sample representing the population's characteristics as closely as possible.
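  • Usually obtained by random sampling, so it is often called a random sample, although a random sample is not guaranteed to be representative.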

Types of Samples

  • Random/Representative Samples
    • Simple Random Sample
    • Systematic Sample
    • Stratified Sample
    • Cluster Sample
  • Non-random/Non-representative
    • Voluntary Response Sample / Convenience Sample

Sampling Methods

  • Simple Random Sample: Every possible sample of the same size has the same chance of being selected, so each element has the same chance of being included.
  • Systematic Sample: Select a random starting point, then select every kth element (e.g., every 10th or 50th).
  • Cluster Sample: Divide the population into groups (clusters), randomly select clusters, and then include all members from selected clusters.
  • Stratified Sample: Subdivide the population into subgroups (strata) with shared characteristics (e.g., gender, age), then draw a sample from each stratum.
  • Voluntary Response/Convenience Sample: Respondents decide whether to be included.
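
The four random sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the population of 100 ID numbers, the two strata, the ten clusters, and all function names are made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely to be chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random start among the first k positions, then every k-th element.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a stratum label to its members; sample within each stratum.
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Pick whole clusters at random and keep every member of each one.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(42)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population of 100 IDs
strata = {"low": population[:50], "high": population[50:]}  # assumed strata
clusters = [population[i:i + 10] for i in range(0, 100, 10)]  # assumed clusters

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```

Note the difference between the last two: the stratified sample takes some members from every group, while the cluster sample takes all members from some groups.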

Inferential Statistics

  • Making statements about a population by examining sample results.
  • Using sample statistics to infer population parameters.
  • Inference: Estimating unknown population parameters from sample evidence.
  • Estimation: Estimating population mean weight using the sample mean weight.
  • Hypothesis Testing: Using sample evidence to test a claim about a population parameter (e.g., population mean weight is 120 pounds).
  • Drawing conclusions and/or making decisions about a population based on sample results.

Population and Sampling Distribution

  • Population Distribution: Probability distribution of the population data.
  • Sampling Distribution: Probability distribution of a sample statistic.
  • Sample information helps infer population parameters (\mu, p, and \sigma) based on sample statistics (\bar{x}, \hat{p}, and S).

Sampling Distributions

  • Sampling Distributions of Sample Means
  • Sampling Distributions of Sample Proportions

Sampling Distribution of the Mean (\sigma known)

  • Central Limit Theorem:
    • Let X_1, X_2, \ldots, X_n be a sequence of independent and identically distributed random variables, each having mean \mu and standard deviation \sigma.
    • For any n, the sample mean satisfies E(\bar{X}) = \mu_{\bar{x}} = \mu. For large n, the Central Limit Theorem asserts that \bar{X} is approximately normally distributed.
    • Standard error of the mean: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
    • For samples from finite populations: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
    • Finite population correction factor: \sqrt{\frac{N-n}{N-1}}
    • If \frac{n}{N} \leq 0.05, then the correction factor can be ignored.
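
The standard error and the finite population correction can be combined into one small helper. The sketch below is illustrative only: the function name `standard_error` and the example values for N are assumptions, and the 0.05 cutoff is the rule stated above.

```python
import math

def standard_error(sigma, n, N=None):
    # sigma / sqrt(n) for an infinite (or very large) population.
    se = sigma / math.sqrt(n)
    # Apply the finite population correction only when n/N exceeds 0.05.
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(152, 16))          # no population size given
print(standard_error(152, 16, N=10000)) # n/N = 0.0016, correction ignored
print(standard_error(10, 100, N=500))   # n/N = 0.2, correction applied
```

In the last call the correction factor is \sqrt{400/499}, about 0.895, so the corrected standard error is noticeably smaller than \sigma/\sqrt{n} = 1.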

Central Limit Theorem Visualization

  • The sampling distribution of \bar{X} approaches a normal distribution as n increases, regardless of the shape of the population distribution (shown with uniform and exponential populations).
  • The larger the sample size n, the closer the approximation; the more skewed the population, the larger the n needed.
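
A small simulation can stand in for the visualization. The sketch below draws repeated samples from an exponential population with mean 1 and standard deviation 1 (an assumed choice of parameters) and compares the spread of the sample means with \sigma/\sqrt{n}. It checks only the center and spread of the sampling distribution; plotting a histogram of `means` would be needed to see the shape become bell-like.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    # Draw `reps` samples of size n and return the mean of each one.
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
draw_exponential = lambda r: r.expovariate(1.0)  # population mean 1, sd 1

for n in (2, 10, 30):
    means = sample_means(draw_exponential, n, 2000, rng)
    # Columns: n, mean of sample means, sd of sample means, sigma/sqrt(n)
    print(n,
          round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3),
          round(1 / n ** 0.5, 3))
```

The second column should stay near the population mean of 1, and the third column should track the fourth as n grows.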

Standardized Sample Mean

  • Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
  • Z is approximately standard normal when n is large and the population is either infinite or finite with \frac{n}{N} \leq 0.05.
  • If random samples are from a normal distribution, the sampling distribution of the mean is normal regardless of sample size.
  • In practice, the normal distribution provides a good approximation to the sampling distribution of the mean for n as small as 30, provided the population is not heavily skewed.

Sampling Distribution of the Mean (\sigma unknown)

  • If \bar{X} is the mean of a random sample of size n from a normal population with mean \mu and standard deviation \sigma:
    • t = \frac{\bar{X} - \mu}{\frac{S}{\sqrt{n}}}, where \frac{S}{\sqrt{n}} = S_{\bar{x}} is the estimated standard error of the mean.
    • t is a random variable having a t-distribution with df = n-1 (degrees of freedom).
    • S is the sample standard deviation: S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}
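
As a rough illustration of the formulas above, the following sketch computes S, the estimated standard error, and the t statistic for a small sample. The data values and the hypothesized mean of 4 are made up for the example, and `t_statistic` is a hypothetical helper name.

```python
import math
import statistics

def t_statistic(sample, mu):
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 divisor)
    se = s / math.sqrt(n)          # estimated standard error of the mean
    return (x_bar - mu) / se, n - 1  # t value and its degrees of freedom

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
t, df = t_statistic(sample, mu=4)
print(round(t, 4), df)
```

Here \bar{X} = 5 and S = \sqrt{32/7}, so t = 1 / \sqrt{4/7}, roughly 1.32, with df = 7.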

Student’s t-Distribution

  • t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal distribution.
  • Standard Normal distribution is equivalent to a t-distribution with df = ∞.
  • t approaches Z as n increases.
  • The t-distribution is similar to the standard normal distribution and both are bell shaped and symmetrical about the mean.
  • The standard deviation of the t-distribution, \sqrt{\frac{df}{df-2}} for df > 2, exceeds 1 but approaches 1 as n approaches ∞.
  • The standard normal distribution provides a good approximation to the t-distribution for a sample size of 30 or more.
  • The assumption that the sample must come from a normal distribution is not as severe a restriction as it may seem. Studies have shown that the distribution of the t statistic is fairly close to a t-distribution even for samples from certain non-normal populations. In practice, it is enough to check that the population from which the sample is drawn is approximately bell-shaped and not too skewed.
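
The claim that the standard deviation of the t-distribution exceeds 1 and shrinks toward 1 can be checked numerically. The sketch assumes the standard result that a t-distribution with df > 2 has variance df / (df - 2); the helper name `t_std_dev` is made up for the example.

```python
import math

def t_std_dev(df):
    # Standard deviation of a t-distribution; finite only for df > 2.
    if df <= 2:
        raise ValueError("standard deviation is not finite for df <= 2")
    return math.sqrt(df / (df - 2))

for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_std_dev(df), 4))
```

By df = 30 the value is already within about 4% of 1, which is consistent with the rule of thumb that the standard normal is a good approximation for samples of 30 or more.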

Problems and Solutions: Sample Mean

  • Problem 1:

    • SAT test mean score: 904, standard deviation: 152. Assume normal distribution.
    • Calculate mean and standard deviation of \bar{X} for:
      • a) n = 16
        • E(\bar{X}) = \mu_{\bar{x}} = \mu = 904
        • \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{152}{\sqrt{16}} = 38
      • b) n = 100
        • E(\bar{X}) = \mu_{\bar{x}} = \mu = 904
        • \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{152}{\sqrt{100}} = 15.2
  • Problem 2:

    • Cookie weight mean: 8 ounces, standard deviation: 3 ounces.
    • Find P(7.8 ≤ \bar{X} ≤ 8.2) for n = 36.
    • Solution:
      • \mu_{\bar{x}} = 8
      • \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5
      • P(7.8 \leq \bar{X} \leq 8.2) = P(\frac{7.8 - 8}{0.5} \leq Z \leq \frac{8.2 - 8}{0.5}) = P(-0.4 \leq Z \leq 0.4) = 0.3108
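
The worked answers above can be cross-checked with Python's standard library. This is only a verification sketch: `prob_mean_between` is a hypothetical helper, and it assumes the sampling distribution of \bar{X} is normal (exactly so in Problem 1, approximately so by the Central Limit Theorem in Problem 2).

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    # Sampling distribution of the mean: normal with sd = sigma / sqrt(n).
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)

# Problem 1: standard errors for n = 16 and n = 100
print(152 / sqrt(16), 152 / sqrt(100))

# Problem 2: P(7.8 <= X-bar <= 8.2) with mu = 8, sigma = 3, n = 36
print(round(prob_mean_between(7.8, 8.2, 8, 3, 36), 4))
```

The computed probability agrees with the table-based value of 0.3108.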

Sampling Distribution of the Proportion

  • The information that is usually available for the estimation of a proportion is the number of times, X, that an appropriate event occurs in n trials, occasions, or observations.
  • The point estimator of the population proportion, itself, is usually the sample proportion: \hat{p} = \frac{X}{n}
    • \hat{p} - proportion of the time that the event actually occurs or sample proportion.
  • If the n trials satisfy the assumptions underlying the binomial distribution, we know the mean and standard deviation of the number of successes: \mu_X = np and \sigma_X = \sqrt{npq}.
  • If we divide both of these quantities by n, we will find the mean and the standard deviation of the proportions of successes:
    • \mu_{\hat{p}} = p
    • \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}
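
The division by n can be written out step by step. The derivation below assumes X is binomial with parameters n and p, so that E(X) = np and \sigma_X = \sqrt{npq} with q = 1 - p; the symbol \sigma_X is introduced here only for the intermediate step.

```latex
E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p

\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}
```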

Standardized Sample Proportion

  • Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}
  • When n is large, we can construct a Z-value for a binomial parameter p, by using the normal approximation to the binomial distribution.
  • Usually, the normal distribution is used as an approximation to the binomial distribution when np ≥ 5 and nq ≥ 5
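
The rule of thumb and the standardization can be sketched together. The function names below are hypothetical, and the threshold of 5 is simply the one quoted above; some texts use a stricter cutoff.

```python
from math import sqrt

def normal_approx_ok(n, p, threshold=5):
    # Rule of thumb: both expected successes and expected failures >= threshold.
    return n * p >= threshold and n * (1 - p) >= threshold

def standardize_proportion(p_hat, p, n):
    # Z = (p_hat - p) / sqrt(p * q / n), with q = 1 - p.
    return (p_hat - p) / sqrt(p * (1 - p) / n)

print(normal_approx_ok(100, 0.44))   # np = 44, nq = 56
print(normal_approx_ok(20, 0.10))    # np = 2, too small
print(round(standardize_proportion(0.45, 0.40, 200), 2))
```

It is worth running the check before computing any Z-value, since the normal approximation can be poor when np or nq is small.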

Proportion Problems and Solutions

  • Problem 1:
    • 44% of teenage boys are bad drivers. A sample of 100 teenage male drivers is taken.
    • Find mean and standard deviation of \hat{p}.
    • \mu_{\hat{p}} = p = 0.44
    • \sigma_{\hat{p}} = \sqrt{\frac{(0.44)(0.56)}{100}} = 0.0496
  • Problem 2:
    • True proportion of voters supporting Proposition A is 0.4. Sample size 200.
    • Find P(0.40 ≤ \hat{p} ≤ 0.45).
    • \sigma_{\hat{p}} = \sqrt{\frac{0.4(1-0.4)}{200}} = 0.03464
    • P(0.40 \leq \hat{p} \leq 0.45) = P(\frac{0.40 - 0.40}{0.03464} \leq Z \leq \frac{0.45 - 0.40}{0.03464}) = P(0 \leq Z \leq 1.44) = 0.4251
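
Both proportion problems can be checked the same way. The sketch below uses a hypothetical helper, `prob_proportion_between`, and relies on the normal approximation to the sampling distribution of \hat{p}.

```python
from math import sqrt
from statistics import NormalDist

def proportion_sd(p, n):
    # Standard deviation of the sample proportion: sqrt(p * q / n).
    return sqrt(p * (1 - p) / n)

def prob_proportion_between(lo, hi, p, n):
    # Normal approximation to the sampling distribution of p-hat.
    dist = NormalDist(p, proportion_sd(p, n))
    return dist.cdf(hi) - dist.cdf(lo)

# Problem 1: sd of p-hat for p = 0.44, n = 100
print(round(proportion_sd(0.44, 100), 4))

# Problem 2: P(0.40 <= p-hat <= 0.45) for p = 0.4, n = 200
print(round(prob_proportion_between(0.40, 0.45, 0.4, 200), 4))
```

The second result can differ from 0.4251 in the last decimal places, because the table lookup rounds Z to 1.44 while the code keeps the unrounded value.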

Additional Problems

  • Problem 1:
    • Maureen Webster claims that 53% of voters in a large city favor her. A sample of 400 voters is taken. Assuming the claim is true, find the probability that less than 49% of the sample favor her.
    • P(\hat{p} < 0.49) = P(Z < \frac{0.49 - 0.53}{\sqrt{\frac{(0.53)(0.47)}{400}}}) = P(Z < -1.6) = 0.0548
  • Problem 2:
    • A machine produces 5% defective parts. A sample of 400 parts is inspected every week, and the process is stopped and readjusted if 8% or more are defective.
    • Find the probability that the process will be stopped.
    • P(\hat{p} \geq 0.08) = P(Z \geq \frac{0.08 - 0.05}{\sqrt{\frac{(0.05)(0.95)}{400}}}) = P(Z \geq 2.75) = 0.003
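
The two tail probabilities above can be verified with a short script. The helper names are made up for the example, and the calculation assumes the normal approximation to \hat{p} (np and nq are well above 5 in both problems).

```python
from math import sqrt
from statistics import NormalDist

def p_hat_dist(p, n):
    # Approximate sampling distribution of p-hat: mean p, sd sqrt(p * q / n).
    return NormalDist(p, sqrt(p * (1 - p) / n))

def prob_below(x, p, n):
    return p_hat_dist(p, n).cdf(x)

def prob_at_least(x, p, n):
    return 1 - p_hat_dist(p, n).cdf(x)

# Problem 1: P(p-hat < 0.49) when p = 0.53, n = 400
print(round(prob_below(0.49, 0.53, 400), 4))

# Problem 2: P(p-hat >= 0.08) when p = 0.05, n = 400
print(round(prob_at_least(0.08, 0.05, 400), 4))
```

Both results agree with the table-based answers to about three decimal places; small differences come from rounding Z before the table lookup.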