Sampling Distribution

Presenter: Rosana Fok
Focus: Sampling Distribution of a Sample Proportion & Central Limit Theorem (CLT)

Definition: A parameter is a numerical measure, such as the mean, median, mode, range, variance, or standard deviation, calculated for a population data set.
Notation: Typically written with Greek letters (e.g., μ for mean, σ for standard deviation).
Characteristics:
- Usually unknown and fixed (constant).

Definition: A statistic is a summary measure calculated for a sample data set, usually written with Latin letters (e.g., ȳ for sample mean, s for sample standard deviation).
Characteristics:
- Regarded as random before the sample is selected.
- Observed after the sample is selected.
- The value varies from sample to sample, a phenomenon known as sampling variability.
Sampling Distribution: The distribution of the values of a statistic over all possible samples of the same size n drawn from the population.

Introduction: A simple random sample (SRS) of size n from a large population lets us estimate the population proportion p (the probability of a success) by calculating the sample proportion, defined as:
\hat{p} = \frac{\text{Number of Successes in the Sample}}{\text{Sample Size } (n)}
Examples:
- Flipping n coins and recording how many show 'Tail'.
- Surveying n random individuals to determine how many possess an IQ above 120.
- Surveying n random students to find how many have more than two siblings.

Population Proportion (p): The ratio of the number of successes in a population to the total number of elements in that population:
p = \frac{\text{Number of Successes in the Population}}{\text{Population Size (N)}}
Examples:
- Finding the proportion of non-resident students among the N students checked.
- Finding the proportion of the N adults in a village who consume snacks.

Scenario: A greeting card company produces 10,000 cards, of which 7,000 are birthday cards. A random sample of 200 cards shows that 128 are birthday cards.
Tasks:
  - Calculate the proportion of birthday cards in the population and in the sample.
  - Find the sampling error (p̂ − p), assuming no non-sampling error has occurred.
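A quick check of the arithmetic for the tasks above. This is a sketch, not the presenter's worked solution; it takes the sampling error to be p̂ − p, the usual convention.

```python
# Greeting card scenario: N = 10,000 cards (7,000 birthday cards),
# SRS of n = 200 cards (128 birthday cards).
N, population_successes = 10_000, 7_000
n, sample_successes = 200, 128

p = population_successes / N      # population proportion
p_hat = sample_successes / n      # sample proportion
sampling_error = p_hat - p        # sampling error = p_hat - p

print(f"p = {p:.2f}, p_hat = {p_hat:.2f}, sampling error = {sampling_error:.2f}")
# prints: p = 0.70, p_hat = 0.64, sampling error = -0.06
```

The negative sign means this particular sample underestimates the population proportion by 6 percentage points.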

Concept: When taking different samples to estimate a population characteristic, outcomes will likely differ. This is known as sampling variability.
Histogram Representation: If we could draw every possible sample of size n and plot a histogram of their sample proportions, that histogram would show the sampling distribution of p̂.
Expectation: The histogram is expected to center around the true population proportion p.
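Drawing every possible sample is impractical, but simulation approximates the idea. The sketch below assumes each sampled unit is independently a success with probability p (reasonable when the population is large relative to n); `simulate_p_hats` is a hypothetical helper, and 5,000 simulated samples stand in for "all" samples.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n, where each unit is a success
    with probability p, and return the list of sample proportions."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

# Greeting card setting: p = 0.70, n = 200
p_hats = simulate_p_hats(p=0.70, n=200, num_samples=5_000)
print(round(statistics.mean(p_hats), 3))   # close to p = 0.70
print(round(statistics.stdev(p_hats), 4))  # close to sqrt(0.7 * 0.3 / 200), about 0.0324
```

A histogram of `p_hats` (for example with any plotting library) would be centered near 0.70, as the expectation above states.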

Mean and standard deviation of p̂:
\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}
Notation: This standard deviation is also written SD(p̂) and is often called the standard error of p̂.
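The standard deviation formula can be evaluated directly. A minimal sketch using the greeting card numbers (p = 0.70, n = 200); `sd_p_hat` is a hypothetical helper name.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sampling distribution of p_hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Greeting card example: p = 0.70, n = 200
print(round(sd_p_hat(0.70, 200), 4))  # prints 0.0324
```

Because n sits under the square root, quadrupling the sample size only halves the standard deviation.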

Distribution Characteristics:
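- Centered at p (the mean of p̂ equals p).
- For large n, the sample proportion p̂ is approximately normally distributed (Central Limit Theorem), so its distribution is unimodal and roughly symmetric; for small n with p far from 0.5 it can be noticeably skewed.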
Rule of Thumb:
- Sample size is sufficiently large if:
  np \geq 10 \text{ and } n(1 - p) \geq 10
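The rule of thumb is easy to check numerically. This sketch (with a hypothetical helper `success_failure_ok`) compares the greeting card sample with the n = 3 taste test that follows, showing why the normal model suits the first but not the second.

```python
def success_failure_ok(n, p, threshold=10):
    """Rule of thumb: the normal model for p_hat is reasonable if np >= 10 and n(1 - p) >= 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(success_failure_ok(200, 0.70))  # True: np = 140, n(1 - p) = 60
print(success_failure_ok(3, 0.50))    # False: np = n(1 - p) = 1.5
```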

Model Application: A sampling distribution model describes how sample proportions vary from sample to sample and lets us calculate the probability of observing a sample proportion within a specific range.
Probability Model: For sufficiently large n, p̂ is approximately N(p, σ_{p̂}^{2}), where σ_{p̂}^{2} = p(1 − p)/n.
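As an illustration of using the model to find the probability of a range, the sketch below approximates P(0.65 ≤ p̂ ≤ 0.75) for the greeting card setting (p = 0.70, n = 200). It assumes the normal model applies (np = 140 and n(1 − p) = 60 are both at least 10) and builds the standard normal CDF from `math.erf`; the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_p_hat_between(a, b, p, n):
    """Approximate P(a <= p_hat <= b) using the normal model N(p, p(1 - p)/n)."""
    sd = math.sqrt(p * (1 - p) / n)
    return normal_cdf((b - p) / sd) - normal_cdf((a - p) / sd)

# Chance that p_hat lands within 0.05 of p = 0.70 when n = 200
print(round(prob_p_hat_between(0.65, 0.75, p=0.70, n=200), 3))  # about 0.877
```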

Brands: S and T; assume tasters have no preference between them (p = 0.5).
Sample Size: n = 3 tasters.
Objective: Determine the sampling distribution of the sample proportion, including its mean and standard deviation.
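With only n = 3 tasters, the sampling distribution can be found exactly by listing all outcomes. This sketch assumes each taster chooses independently and treats "chooses brand S" as a success (a labelling chosen for illustration); exact fractions keep the arithmetic clean.

```python
from fractions import Fraction
from itertools import product
import math

n = 3
p = Fraction(1, 2)  # equal preference: P(a taster chooses S) = 1/2

dist = {}  # maps each possible p_hat to its probability
for choices in product("ST", repeat=n):          # all 2**3 = 8 outcomes
    k = choices.count("S")                       # number of successes
    prob = p**k * (1 - p) ** (n - k)             # probability of this outcome
    p_hat = Fraction(k, n)
    dist[p_hat] = dist.get(p_hat, 0) + prob

mean = sum(x * prob for x, prob in dist.items())
variance = sum((x - mean) ** 2 * prob for x, prob in dist.items())

for x in sorted(dist):
    print(f"p_hat = {x}: P = {dist[x]}")
print("mean =", mean)                            # 1/2, equal to p
print("SD =", round(math.sqrt(variance), 4))     # 0.2887, equal to sqrt(p(1 - p)/n)
```

The distribution is 1/8, 3/8, 3/8, 1/8 on p̂ = 0, 1/3, 2/3, 1: centered at p, but with only four possible values it is far from a normal curve, consistent with np = 1.5 failing the rule of thumb.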

Assumptions can be difficult or sometimes impossible to check directly; we therefore state them as assumptions and check related conditions that indicate whether they are reasonable.

Randomization Condition: The sample should be a simple random sample of the population.
10% Condition: If sampling without replacement, the sample size (n) should not exceed 10% of the population size.
Success/Failure Condition: The sample size must be large enough that both np and nq are at least 10, where q = 1 − p.
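Two of the three conditions can be checked with arithmetic; the Randomization Condition depends on how the data were collected and must be judged from the study design. A minimal sketch with a hypothetical `check_conditions` helper, applied to the greeting card example (n = 200, N = 10,000, p = 0.70):

```python
def check_conditions(n, N, p):
    """Check the numerically checkable conditions for the normal model of p_hat.
    The Randomization Condition cannot be computed and is not checked here."""
    q = 1 - p
    return {
        "10% condition": n <= 0.10 * N,                   # n at most 10% of N
        "success/failure": n * p >= 10 and n * q >= 10,   # np and nq both >= 10
    }

print(check_conditions(n=200, N=10_000, p=0.70))
# {'10% condition': True, 'success/failure': True}
```

With a hypothetical population of N = 1,500, the same sample of 200 would fail the 10% condition (200 > 150).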

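To standardize a sample proportion, the z-value is defined as:
z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}
where p̂ is the sample proportion, μ_{p̂} = p is its mean, and σ_{p̂} is its standard deviation.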
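Applying the z-value to the greeting card sample (p̂ = 128/200 = 0.64, p = 0.70, n = 200) gives a concrete number. A sketch with a hypothetical helper `z_score`:

```python
import math

def z_score(p_hat, p, n):
    """Standardize a sample proportion: z = (p_hat - p) / sqrt(p(1 - p) / n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

z = z_score(128 / 200, 0.70, 200)
print(round(z, 2))  # -1.85
```

The observed sample proportion lies about 1.85 standard deviations below the population proportion.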