Sampling Distribution
Module 5: Sampling Distribution
Overview
Presenter: Rosana Fok
Focus: Sampling Distribution of a Sample Proportion & Central Limit Theorem (CLT)
Key Definitions
Population Parameter
Definition: A numerical measure such as the mean, median, mode, range, variance, or standard deviation calculated for a population data set.
Notation: Typically written with Greek letters (e.g., µ for mean, σ for standard deviation).
Characteristics:
Usually unknown and constant.
Sample Statistic
Definition: A summary measure calculated for a sample data set, expressed using Latin letters (e.g., 𝑦̅ for sample mean, s for sample standard deviation).
Characteristics:
Regarded as random before the sample is selected.
Observed after the sample is selected.
The value varies from sample to sample, a phenomenon known as sampling variability.
Sampling Distribution: The distribution of all possible values of a statistic from repeated samples.
Sampling Distribution
Example for Sample Proportion
Introduction: A simple random sample (SRS) of size n from a large population lets us estimate the population proportion p with the sample proportion p̂, defined as:
\hat{p} = \frac{\text{Number of Successes in the Sample}}{\text{Sample Size } (n)}
Examples:
Flipping n coins and recording how many show 'Tail'.
Surveying n random individuals to determine how many possess an IQ above 120.
Surveying n random students to find how many have more than two siblings.
Population and Sample Proportions
Population Proportion (p): Defined by the ratio of the number of successes in a population to the total number of elements in that population, given by:
p = \frac{\text{Number of Successes in the Population}}{\text{Population Size } (N)}
Examples:
Determining the number of non-resident students from N students checked.
Counting how many out of N adults in a village consume snacks.
Sampling Example
Scenario: A greeting card company produces 10,000 cards, of which 7,000 are birthday cards. A random sample of 200 cards shows that 128 are birthday cards.
Tasks:
Calculate the proportion of birthday cards in the population and the sample.
Find the sampling error, assuming no non-sampling error has occurred.
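The two tasks above can be worked through in a few lines. This is a minimal sketch that assumes the common convention sampling error = sample statistic minus population parameter (here p̂ - p); the variable names are illustrative only.

```python
# Figures taken from the greeting card scenario above.
population_size = 10_000
population_birthday = 7_000
sample_size = 200
sample_birthday = 128

p = population_birthday / population_size   # population proportion
p_hat = sample_birthday / sample_size       # sample proportion

# Assumed convention: sampling error = p_hat - p (no non-sampling error).
sampling_error = p_hat - p

print(f"p = {p:.2f}, p_hat = {p_hat:.2f}, sampling error = {sampling_error:.2f}")
# prints: p = 0.70, p_hat = 0.64, sampling error = -0.06
```

The negative sign indicates that this particular sample underestimates the population proportion.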
Sampling Distribution of a Sample Proportion (p̂)
Concept: When taking different samples to estimate a population characteristic, outcomes will likely differ. This is known as sampling variability.
Histogram Representation: If we could draw every possible sample of size n, the histogram of the resulting sample proportions would show the sampling distribution of the proportion.
Expectation: The histogram is expected to center around the true population proportion p.
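A small simulation can illustrate this idea. The sketch below reuses the greeting card population (7,000 birthday cards out of 10,000) and draws repeated simple random samples of 200 cards; the number of repetitions (2,000) and the seed are arbitrary choices made for illustration, not values from the notes.

```python
import random
import statistics

# Population from the greeting card scenario: 1 = birthday card, 0 = other.
population = [1] * 7_000 + [0] * 3_000
p = sum(population) / len(population)   # true population proportion

n = 200               # sample size used in the scenario
num_samples = 2_000   # assumed number of repeated samples (illustrative)
rng = random.Random(42)  # fixed seed so the run is reproducible

# Draw repeated simple random samples (without replacement) and record p_hat.
p_hats = [sum(rng.sample(population, n)) / n for _ in range(num_samples)]

mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.stdev(p_hats)

print(f"true p                   = {p:.3f}")
print(f"mean of simulated p_hats = {mean_p_hat:.3f}")
print(f"SD of simulated p_hats   = {sd_p_hat:.3f}")
```

Individual values in p_hats differ from sample to sample (sampling variability), while their average should land close to p.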
Rules of Sampling Distribution
Rule 1: Mean of the Sampling Distribution
\mu_{\hat{p}} = p
Notation: The mean of the sampling distribution of p̂ is denoted μ_{p̂}.
Rule 2: Standard Deviation of the Sampling Distribution
\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}
Notation: This standard deviation, also written SD_{p̂}, is often called the standard error of p̂. Some texts reserve "standard error" for the estimate obtained by substituting p̂ for p.
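As a worked instance of Rule 2, the values from the greeting card scenario (p = 0.70, n = 200) give:

```latex
\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}
                 = \sqrt{\frac{0.70 \times 0.30}{200}}
                 = \sqrt{0.00105}
                 \approx 0.0324
```

So sample proportions from samples of 200 cards typically fall about 0.03 away from p = 0.70.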
Rule 3: Shape of Sampling Distribution
Distribution Characteristics:
Unimodal, symmetric, and centered at p.
For large n, the sample proportion p̂ is approximately normally distributed (Central Limit Theorem).
Rule of Thumb:
Sample size is sufficiently large if:
np \geq 10 \text{ and } n(1 - p) \geq 10
Sampling Distribution Model
Model Application: A sampling distribution model quantifies how much sample proportions vary and lets us calculate the probability of observing a sample proportion within a specific range.
Probability Model: For sufficiently large n, p̂ is modeled as approximately N(p, σ_{p̂}^{2}), where σ_{p̂}^{2} = p(1 - p)/n.
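One way to turn the normal model into a probability for a range is sketched below. The helper names (normal_cdf, prob_p_hat_between) and the interval 0.65 to 0.75 are hypothetical choices for illustration; p = 0.70 and n = 200 come from the greeting card scenario. The normal CDF is built from math.erf so that no external library is needed.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a Normal(mean, sd) variable at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_p_hat_between(low, high, p, n):
    """Approximate P(low <= p_hat <= high) under the normal model."""
    sd = math.sqrt(p * (1.0 - p) / n)   # standard deviation of p_hat
    return normal_cdf(high, p, sd) - normal_cdf(low, p, sd)

# Illustrative question: how likely is a sample proportion within 0.05 of p?
prob = prob_p_hat_between(0.65, 0.75, p=0.70, n=200)
print(f"P(0.65 <= p_hat <= 0.75) is approximately {prob:.3f}")
```

The result is only as trustworthy as the normal approximation itself, so it should be used when the sample size is sufficiently large.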
Specific Example: Coffee Brand Preference
Brands: S and T; assume equal preference.
Sample Size: n = 3 tasters.
Objective: Determine the sampling distribution for the sample proportion, including mean and standard deviation.
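Because n = 3 is tiny, the sampling distribution can be listed exactly rather than approximated. The sketch below assumes that a "success" means a taster prefers brand S (an arbitrary labeling), that equal preference means p = 1/2, and that tasters choose independently, so the number of successes follows a binomial model.

```python
from fractions import Fraction
from math import comb, sqrt

n = 3                # number of tasters
p = Fraction(1, 2)   # equal preference: P(a taster prefers S) = 1/2

# p_hat = k/n, where k (tasters preferring S) follows a Binomial(n, p) model.
distribution = {
    Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
}

mean = sum(value * prob for value, prob in distribution.items())
variance = sum((value - mean) ** 2 * prob for value, prob in distribution.items())

for value, prob in distribution.items():
    print(f"P(p_hat = {value}) = {prob}")
print(f"mean = {mean}, SD = {sqrt(variance):.4f}")
```

The possible values 0, 1/3, 2/3, and 1 occur with probabilities 1/8, 3/8, 3/8, and 1/8. The mean is 1/2 and the standard deviation is about 0.2887, which agrees with Rules 1 and 2. Note that np = 1.5 here, so the rule of thumb for a normal approximation is not met.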
Assumptions Required
Independence Assumption
Sampled values must be independent of each other.
Sample Size Assumption
Sample size n must be sufficiently large.
Acknowledgment of Challenges
Assumptions can be difficult or sometimes impossible to check; thus, we assume them but must verify their reasonableness through related conditions.
Conditions before Normal Model Application
Randomization Condition: The sample should be a simple random sample of the population.
10% Condition: If sampling without replacement, the sample size (n) should not exceed 10% of the population size.
Success/Failure Condition: The sample size must be large enough that both np and nq are at least 10, where q = 1 - p.
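The two numeric conditions can be checked mechanically. The following is a rough sketch with a hypothetical helper, check_conditions; the Randomization Condition depends on how the data were collected, so it cannot be verified from n and p and is left out.

```python
def check_conditions(n, p, population_size=None, threshold=10):
    """Report which numeric conditions hold for a normal model of p_hat.

    The Randomization Condition is about the sampling procedure and is
    deliberately not checked here.
    """
    q = 1 - p
    results = {"success_failure": n * p >= threshold and n * q >= threshold}
    if population_size is not None:
        # 10% Condition: relevant when sampling without replacement.
        results["ten_percent"] = n <= 0.10 * population_size
    return results

# Greeting card scenario: n = 200, p = 0.70, N = 10,000.
cards = check_conditions(n=200, p=0.70, population_size=10_000)
# Coffee scenario: n = 3, p = 0.50, population size not given.
coffee = check_conditions(n=3, p=0.50)

print(cards)   # {'success_failure': True, 'ten_percent': True}
print(coffee)  # {'success_failure': False}
```

In practice p is usually unknown, so a check like this would be run with a hypothesized or estimated value of p.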
Probability and Z-score Conversion
To standardize a sample proportion, the z-score is defined as:
z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}
where p̂ is the sample proportion, μ_{p̂} = p, and σ_{p̂} = \sqrt{p(1 - p)/n}.
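As an illustration, the conversion can be applied to the greeting card sample (128 birthday cards out of 200, with p = 0.70). The function name z_score is a hypothetical helper, not part of the notes.

```python
import math

def z_score(p_hat, p, n):
    """Standardize a sample proportion using the sampling distribution of p_hat."""
    sd = math.sqrt(p * (1 - p) / n)   # standard deviation of p_hat (Rule 2)
    return (p_hat - p) / sd           # the mean of p_hat is p (Rule 1)

z = z_score(p_hat=128 / 200, p=0.70, n=200)
print(f"z = {z:.2f}")
# prints: z = -1.85
```

A z-score of about -1.85 means the observed proportion lies roughly 1.85 standard deviations below the population proportion.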