Sampling Distribution

Module 5: Sampling Distribution

Overview

  • Presenter: Rosana Fok

  • Focus: Sampling Distribution of a Sample Proportion & Central Limit Theorem (CLT)

Key Definitions

Population Parameter
  • Definition: A numerical measure such as the mean, median, mode, range, variance, or standard deviation calculated for a population data set.

  • Notation: Typically written with Greek letters (e.g., µ for mean, σ for standard deviation).

  • Characteristics:

    • Usually unknown and constant.

Sample Statistic
  • Definition: A summary measure calculated for a sample data set, expressed using Latin letters (e.g., 𝑦̅ for sample mean, s for sample standard deviation).

  • Characteristics:

    • Regarded as random before the sample is selected.

    • Observed after the sample is selected.

    • The value varies from sample to sample, a phenomenon known as sampling variability.

  • Sampling Distribution: The distribution of all possible values of a statistic over repeated samples of the same size from the same population.

Sampling Distribution

Example for Sample Proportion
  • Introduction: A simple random sample (SRS) of size n from a large population lets us estimate the population proportion p with the sample proportion, defined as:
    \text{Sample Proportion } (p̂) = \frac{\text{Number of Successes in the Sample}}{\text{Sample Size (n)}}

  • Examples:

    • Flipping n coins and recording how many show 'Tail'.

    • Surveying n random individuals to determine how many possess an IQ above 120.

    • Surveying n random students to find how many have more than two siblings.
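The coin-flip example can be sketched in a few lines of Python. This is only an illustration: the sample size of 50, the fair-coin probability of 0.5, and the helper name `sample_proportion` are assumptions made here, not values from the module.

```python
import random

def sample_proportion(outcomes):
    """Return the fraction of successes (True values) in a sample."""
    return sum(outcomes) / len(outcomes)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # hypothetical sample size
# Each flip is a success (True) when it shows 'Tail'; a fair coin is assumed.
flips = [rng.random() < 0.5 for _ in range(n)]

p_hat = sample_proportion(flips)
print(f"sample proportion of tails: {p_hat:.2f}")
```

Changing the seed gives a different value of the sample proportion, which is the sampling variability described earlier.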

Population and Sample Proportions
  • Population Proportion (p): Defined by the ratio of the number of successes in a population to the total number of elements in that population, given by:
    p = \frac{\text{Number of Successes in the Population}}{\text{Population Size (N)}}

  • Examples:

    • Determining the number of non-resident students from N students checked.

    • Counting how many out of N adults in a village consume snacks.

Sampling Example

  • Scenario: A greeting card company produces 10,000 cards, of which 7,000 are birthday cards. A random sample of 200 cards shows that 128 are birthday cards.

    • Tasks:

      • Calculate the proportion of birthday cards in the population and the sample.

      • Find the sampling error, assuming no non-sampling error has occurred.
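The two tasks can be worked through with the numbers given in the scenario. The sketch below assumes the usual convention that the sampling error is the sample statistic minus the population parameter (p̂ − p); the variable names are illustrative.

```python
# Figures taken from the greeting card scenario.
population_size = 10_000
birthday_in_population = 7_000
sample_size = 200
birthday_in_sample = 128

p = birthday_in_population / population_size   # population proportion
p_hat = birthday_in_sample / sample_size       # sample proportion

# Assumed convention: sampling error = sample statistic - population parameter.
sampling_error = p_hat - p

print(f"p = {p:.2f}, p_hat = {p_hat:.2f}, sampling error = {sampling_error:.2f}")
```

With these figures the population proportion is 0.70, the sample proportion is 0.64, and the sampling error is -0.06, so this particular sample underestimates p by six percentage points.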

Sampling Distribution of a Sample Proportion (p̂)

  • Concept: When taking different samples to estimate a population characteristic, outcomes will likely differ. This is known as sampling variability.

  • Histogram Representation: If we could draw every possible sample of size n and plot a histogram of the resulting sample proportions, that histogram would show the sampling distribution of the proportion.

  • Expectation: The histogram is expected to center around the true population proportion p.
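We cannot enumerate every possible sample in practice, but a simulation gives a rough picture of the idea. The sketch below borrows p = 0.7 and n = 200 from the greeting card scenario; the number of repeated samples (2,000), the seed, and the function name are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n and return each p_hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

p_hats = simulate_sample_proportions(p=0.7, n=200, num_samples=2000)
center = statistics.mean(p_hats)

print(f"mean of simulated sample proportions: {center:.3f}")
print(f"smallest: {min(p_hats):.3f}, largest: {max(p_hats):.3f}")
```

The individual sample proportions vary from sample to sample, while their average should land close to the true proportion 0.7.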

Rules of Sampling Distribution

Rule 1: Mean of the Sampling Distribution
  • \text{Mean } (μ_{p̂}) = p

  • Notation: μ_{p̂} denotes the mean of the sampling distribution of p̂.

Rule 2: Standard Deviation of the Sampling Distribution
  • \text{Standard Deviation } (σ_{p̂}) = \sqrt{\frac{p(1 - p)}{n}}

  • Notation: This standard deviation is called the standard error of p̂ and is also written SD_{p̂}.
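Rule 2 is easy to evaluate directly. The following sketch uses p = 0.7 from the greeting card scenario and a few sample sizes picked for illustration; the function name is hypothetical.

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of p_hat under Rule 2: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes; each is four times the previous one.
for n in (50, 200, 800):
    print(f"n = {n:4d}  SD = {sd_of_sample_proportion(0.7, n):.4f}")
```

Because n sits under the square root, quadrupling the sample size halves the standard deviation.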

Rule 3: Shape of Sampling Distribution
  • Distribution Characteristics:

    • Unimodal, symmetric, and centered at p.

    • For large n, the sample proportion p̂ is approximately normally distributed (Central Limit Theorem).

  • Rule of Thumb:

    • Sample size is sufficiently large if:
      np ≥ 10 \text{ and } n(1 - p) ≥ 10

Sampling Distribution Model

  • Model Application: A sampling distribution model quantifies variation in sample proportions and calculates the likelihood of observing a sample proportion within a specific range.

  • Probability Model: For sufficiently large n, p̂ is modeled as N(p, σ_{p̂}^{2}), where σ_{p̂}^{2} = \frac{p(1 - p)}{n}.
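As a rough illustration of using the normal model to find the chance that a sample proportion falls within a range, the sketch below relies on `statistics.NormalDist` from the Python standard library. The values p = 0.7 and n = 200 come from the greeting card scenario; the interval from 0.65 to 0.75 and the function name are assumptions chosen for the example.

```python
import math
from statistics import NormalDist

def prob_p_hat_between(low, high, p, n):
    """Approximate P(low <= p_hat <= high) using the normal model for p_hat."""
    sd = math.sqrt(p * (1 - p) / n)     # standard deviation of p_hat
    model = NormalDist(mu=p, sigma=sd)  # NormalDist takes the SD, not the variance
    return model.cdf(high) - model.cdf(low)

prob = prob_p_hat_between(0.65, 0.75, p=0.7, n=200)
print(f"P(0.65 <= p_hat <= 0.75) is roughly {prob:.3f}")
```

The approximation is only as good as the large-sample requirement: with small n the normal curve can be a poor fit to the true distribution of p̂.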

Specific Example: Coffee Brand Preference

  • Brands: S and T; assume equal preference.

  • Sample Size: n = 3 tasters.

  • Objective: Determine the sampling distribution for the sample proportion, including mean and standard deviation.
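With only three tasters the sampling distribution can be listed exactly. The sketch below reads "equal preference" as p = 0.5 for preferring brand S and assumes the tasters choose independently, so the number who prefer S follows a binomial distribution; both readings are assumptions made for this illustration.

```python
import math
from fractions import Fraction

n = 3                 # number of tasters
p = Fraction(1, 2)    # assumed probability that a taster prefers brand S

# Map each possible value of p_hat = k/n to its binomial probability.
distribution = {}
for k in range(n + 1):
    probability = math.comb(n, k) * p**k * (1 - p)**(n - k)
    distribution[Fraction(k, n)] = probability

mean = sum(value * prob for value, prob in distribution.items())
variance = sum((value - mean) ** 2 * prob for value, prob in distribution.items())
sd = math.sqrt(variance)

for value, prob in distribution.items():
    print(f"p_hat = {value}  probability = {prob}")
print(f"mean = {mean}, variance = {variance}, sd = {sd:.4f}")
```

The possible values 0, 1/3, 2/3, and 1 occur with probabilities 1/8, 3/8, 3/8, and 1/8. The mean is 1/2 and the variance is 1/12, matching p and p(1 - p)/n from Rules 1 and 2. Since np = 1.5 is well below 10, the normal approximation is not appropriate here, which is why the exact distribution is used.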

Assumptions Required

Independence Assumption
  • Sampled values must be independent of each other.

Sample Size Assumption
  • Sample size n must be sufficiently large.

Acknowledgment of Challenges
  • Assumptions can be difficult or sometimes impossible to check; thus, we assume them but must verify their reasonableness through related conditions.

Conditions before Normal Model Application

  • Randomization Condition: The sample should be a simple random sample of the population.

  • 10% Condition: If sampling without replacement, the sample size (n) should not exceed 10% of the population size.

  • Success/Failure Condition: The sample size must be large enough that both np and nq are at least 10, where q = 1 - p.
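The two numerical conditions can be checked mechanically; the randomization condition depends on how the data were collected and cannot be verified from the numbers alone. A minimal sketch, with a hypothetical function name and the greeting card figures as input:

```python
def check_conditions(n, p, population_size):
    """Check the 10% and success/failure conditions for the normal model."""
    q = 1 - p
    return {
        # Sample should be no more than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Expected successes and failures should both be at least 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Greeting card scenario: 200 cards sampled from 10,000, with p = 0.7.
print(check_conditions(n=200, p=0.7, population_size=10_000))
```

For the greeting card sample both checks pass: 200 is 2% of 10,000, and the expected counts are 140 successes and 60 failures.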

Probability and Z-score Conversion

  • To standardize a sample proportion, the z-value is defined as:
    z = \frac{p̂ - μ_{p̂}}{σ_{p̂}}
    where p̂ is the sample proportion.
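As a worked illustration, the z-value for the greeting card sample (p̂ = 0.64, p = 0.7, n = 200) can be computed by substituting Rule 1 and Rule 2 into the formula. The function name is hypothetical.

```python
import math

def z_score(p_hat, p, n):
    """Standardize p_hat using mean p and standard deviation sqrt(p(1 - p) / n)."""
    sd = math.sqrt(p * (1 - p) / n)
    return (p_hat - p) / sd

z = z_score(p_hat=0.64, p=0.7, n=200)
print(f"z = {z:.2f}")
```

The result is about -1.85, meaning the observed sample proportion sits a little under two standard deviations below the population proportion.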