Sampling Distributions

Population vs. Sample

Population: All elements/members (individuals, items, objects) whose characteristics are studied.
Sample: A portion/subset of the population selected for study.

Reasons for Sampling

Gathering information on the entire population is too expensive.
It is often impossible to gather information on the entire population.

Representative Sample

A sample representing the population's characteristics as closely as possible.
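Random sampling is the usual way to obtain one, so the two terms are often used interchangeably, although a random sample is not guaranteed to be representative.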

Types of Samples

Random/Representative Samples
- Simple Random Sample
- Systematic Sample
- Stratified Sample
- Cluster Sample
Non-random/Non-representative
- Voluntary Response Sample / Convenience Sample

Sampling Methods

Simple Random Sample: Each element has the same chance of being included.
Systematic Sample: Select a random starting point, then select every kth element (e.g., every 10th or 50th).
Cluster Sample: Divide the population into groups (clusters), randomly select clusters, and then include all members from selected clusters.
Stratified Sample: Subdivide the population into subgroups (strata) with shared characteristics (e.g., gender, age), then draw a sample from each stratum.
Voluntary Response/Convenience Sample: Respondents decide whether to be included.
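The four random sampling methods above can be sketched with Python's standard `random` module. This is only an illustration: the population of 100 numbered elements, the strata and cluster groupings, and all function names are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, rng):
    # Every element has the same chance of being included.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random starting point among the first k elements, then every kth element.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a label to its members; draw a random sample from each stratum.
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters, then keep every member of the chosen ones.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]

rng = random.Random(0)                      # fixed seed for repeatability
population = list(range(1, 101))            # hypothetical population of 100 elements
srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}   # assumed strata
strat = stratified_sample(strata, 5, rng)
clusters = {i: population[i * 10:(i + 1) * 10] for i in range(10)}  # assumed clusters
clus = cluster_sample(clusters, 2, rng)
```

Note that the stratified sample draws from every stratum, while the cluster sample keeps all members of only a few randomly chosen groups.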

Inferential Statistics

Making statements about a population by examining sample results.
Using sample statistics to infer population parameters.
Inference: Estimating unknown population parameters from sample evidence.
Estimation: Estimating population mean weight using the sample mean weight.
Hypothesis Testing: Using sample evidence to test a claim about a population parameter (e.g., population mean weight is 120 pounds).
Drawing conclusions and/or making decisions about a population based on sample results.

Population and Sampling Distribution

Population Distribution: Probability distribution of the population data.
Sampling Distribution: Probability distribution of a sample statistic.
Sample information helps infer population parameters (\mu, p, and \sigma) based on sample statistics (\bar{x}, \hat{p}, and S).

Sampling Distributions

Sampling Distributions of Sample Means
Sampling Distributions of Sample Proportions

Sampling Distribution of the Mean (\sigma known)

Central Limit Theorem:
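- Let X_1, X_2, \ldots, X_n be a sequence of independent and identically distributed random variables, each having mean \mu and standard deviation \sigma.
- For any sample size n, the mean of the sampling distribution of \bar{X} equals the population mean: E(\bar{X}) = \mu_{\bar{x}} = \mu
- For large n, the Central Limit Theorem asserts that the distribution of \bar{X} is approximately normal.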
- Standard error of the mean: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
- For samples from finite populations: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
- Finite population correction factor: \sqrt{\frac{N-n}{N-1}}
- If \frac{n}{N} \leq 0.05, then the correction factor can be ignored.
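As a small sketch of the standard error formulas above, the function below applies the finite population correction only when n/N exceeds 0.05. The function name and the example values of N are assumptions chosen for illustration.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    N is the population size; leave it as None for an infinite population.
    The finite population correction is applied only when n/N > 0.05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_infinite = standard_error(152, 16)          # no correction
se_small_pop = standard_error(152, 16, N=200)  # n/N = 0.08, correction applied
se_large_pop = standard_error(152, 16, N=10000)  # n/N = 0.0016, correction ignored
```

The correction factor is always less than 1, so it can only shrink the standard error.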

Central Limit Theorem Visualization

The sampling distribution of \bar{X} becomes approximately normal as the sample size n increases, regardless of the shape of the population distribution (shown with uniform and exponential distributions).
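In place of a plot, a short simulation can illustrate the same idea. The sketch below repeatedly draws samples from an exponential population (rate 1, so \mu = \sigma = 1) and checks that the sample means centre on \mu with spread close to \sigma/\sqrt{n}. The sample size, number of repetitions, and seed are arbitrary choices made for this example.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 36                    # sample size (assumed for illustration)
reps = 5000               # number of repeated samples (assumed)
mu = sigma = 1.0          # exponential population with rate 1: mean 1, sd 1

# Draw `reps` samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

# For a normal distribution, about 68% of values lie within one
# standard deviation of the mean.
within_one_se = sum(abs(m - mu) <= theoretical_se for m in sample_means) / reps

print(mean_of_means, sd_of_means, theoretical_se, within_one_se)
```

Even though the exponential population is strongly skewed, the share of sample means within one standard error of \mu should come out near the normal value of about 0.68.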

Standardized Sample Mean

Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
Z is approximately standard normal when n is large and the population is either infinite or finite with \frac{n}{N} \leq 0.05.
If random samples are from a normal distribution, the sampling distribution of the mean is normal regardless of sample size.
In practice, the normal distribution usually provides a good approximation to the sampling distribution of the mean for n of about 30 or more, provided the population is not heavily skewed.

Sampling Distribution of the Mean (\sigma unknown)

If \bar{X} is the mean of a random sample of size n from a normal population with mean \mu and standard deviation \sigma:
- t = \frac{\bar{X} - \mu}{\frac{S}{\sqrt{n}}}, where \frac{S}{\sqrt{n}} = S_{\bar{x}} is the estimated standard error of the mean.
- t is a random variable having a t-distribution with df = n-1 (degrees of freedom).
- S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}
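The t statistic can be computed directly from a sample, as in the sketch below. The five weights and the hypothesized mean of 120 are made-up values for illustration; `statistics.stdev` uses the n-1 denominator, matching the definition of S.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, df) for a sample and a hypothesized population mean mu."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation, n-1 denominator
    std_err = s / math.sqrt(n)            # estimated standard error of the mean
    return (x_bar - mu) / std_err, n - 1

weights = [118, 122, 125, 119, 121]       # hypothetical sample of weights (pounds)
t_value, df = t_statistic(weights, mu=120)
print(t_value, df)
```

Here the sample mean is 121 and S^2 = 30/4 = 7.5, so t = 1/\sqrt{7.5/5} with df = 4.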

Student’s t-Distribution

t-distributions are bell-shaped and symmetric about 0, like the standard normal distribution, but have ‘fatter’ tails.
The standard normal distribution is equivalent to a t-distribution with df = ∞.
t approaches Z as n increases.
The standard deviation of the t-distribution (defined for df > 2) exceeds 1, but approaches 1 as n approaches ∞.
The standard normal distribution provides a good approximation to the t-distribution for a sample size of 30 or more.
The assumption that the sample must come from a normal distribution is not as severe a restriction as it may seem. Studies have shown that the distribution of the random variable t is fairly close to a t-distribution even for samples from certain non-normal populations. In practice, it is necessary to make sure that the population from which we draw the sample is approximately bell-shaped and not too skewed.
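The claim that the standard deviation of the t-distribution exceeds 1 but approaches 1 can be checked numerically. The sketch below relies on the standard closed form \sqrt{df/(df-2)} for df > 2, which is a known result but is not stated in these notes; the function name is an assumption.

```python
import math

def t_std_dev(df):
    """Standard deviation of a t-distribution with df degrees of freedom (df > 2)."""
    if df <= 2:
        raise ValueError("the standard deviation is only defined for df > 2")
    return math.sqrt(df / (df - 2))

# The value shrinks toward 1 as the degrees of freedom grow.
for df in (3, 5, 30, 1000):
    print(df, round(t_std_dev(df), 4))
```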

Problems and Solutions: Sample Mean

Problem 1:
- SAT test mean score: 904, standard deviation: 152. Assume normal distribution.
- Calculate mean and standard deviation of \bar{X} for:
  - a) n = 16
    - E(\bar{X}) = \mu_{\bar{x}} = \mu = 904
    - \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{152}{\sqrt{16}} = 38
  - b) n = 100
    - E(\bar{X}) = \mu_{\bar{x}} = \mu = 904
    - \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{152}{\sqrt{100}} = 15.2
Problem 2:
- Cookie weight mean: 8 ounces, standard deviation: 3 ounces.
- Find P(7.8 ≤ \bar{X} ≤ 8.2) for n = 36.
- Solution:
  - \mu_{\bar{x}} = 8
  - \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5
  - P(7.8 \leq \bar{X} \leq 8.2) = P(\frac{7.8 - 8}{0.5} \leq Z \leq \frac{8.2 - 8}{0.5}) = P(-0.4 \leq Z \leq 0.4) = 0.3108
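Both problems can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch that treats the sampling distribution of \bar{X} as normal (exact for Problem 1 under the stated normality assumption, approximate via the Central Limit Theorem for Problem 2); the helper name is an assumption.

```python
import math
from statistics import NormalDist

def sampling_dist_of_mean(mu, sigma, n):
    # Normal model for the sample mean: mean mu, standard error sigma / sqrt(n).
    return NormalDist(mu, sigma / math.sqrt(n))

# Problem 1: SAT scores
sat_16 = sampling_dist_of_mean(904, 152, 16)
sat_100 = sampling_dist_of_mean(904, 152, 100)
print(sat_16.mean, sat_16.stdev)
print(sat_100.mean, sat_100.stdev)

# Problem 2: cookie weights, P(7.8 <= X-bar <= 8.2)
cookies = sampling_dist_of_mean(8, 3, 36)
prob_cookies = cookies.cdf(8.2) - cookies.cdf(7.8)
print(round(prob_cookies, 4))
```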

Sampling Distribution of the Proportion

The information that is usually available for the estimation of a proportion is the number of times, X, that an appropriate event occurs in n trials, occasions, or observations.
The point estimator of the population proportion is usually the sample proportion: \hat{p} = \frac{X}{n}
- \hat{p}: the sample proportion, i.e., the proportion of the time that the event actually occurs.
If the n trials satisfy the assumptions underlying the binomial distribution, the number of successes X has mean np and standard deviation \sqrt{npq}, where q = 1 - p.
Dividing both of these quantities by n gives the mean and the standard deviation of the proportion of successes:
- \mu_{\hat{p}} = p
- \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}
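A quick simulation can be used to check these two formulas. The sketch below simulates many samples of n Bernoulli trials and compares the mean and standard deviation of the resulting sample proportions with p and \sqrt{pq/n}. The values p = 0.44, n = 100, the number of repetitions, and the seed are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(7)     # fixed seed so the run is repeatable
p, n, reps = 0.44, 100, 5000

# Each sample proportion is (number of successes in n trials) / n.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

simulated_mean = statistics.fmean(p_hats)
simulated_sd = statistics.stdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(simulated_mean, simulated_sd, theoretical_sd)
```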

Standardized Sample Proportion

Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}
When n is large, we can construct a Z-value for a binomial parameter p, by using the normal approximation to the binomial distribution.
Usually, the normal distribution is used as an approximation to the binomial distribution when np ≥ 5 and nq ≥ 5
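The rule of thumb and the Z formula above can be wrapped in two small helpers. This is a minimal sketch; the function names and the example inputs are assumptions for illustration.

```python
import math

def normal_approx_ok(n, p):
    # Rule of thumb from the notes: np >= 5 and nq >= 5, with q = 1 - p.
    return n * p >= 5 and n * (1 - p) >= 5

def z_for_proportion(p_hat, p, n):
    # Standardize a sample proportion using sqrt(pq / n).
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

ok_large = normal_approx_ok(100, 0.44)   # np = 44, nq = 56
ok_small = normal_approx_ok(20, 0.10)    # np = 2, too small
z_example = z_for_proportion(0.45, 0.40, 200)
print(ok_large, ok_small, round(z_example, 2))
```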

Proportion Problems and Solutions

Problem 1:
- Suppose 44% of teenage boys are bad drivers. A sample of 100 male teenage drivers is taken.
- Find mean and standard deviation of \hat{p}.
- \mu_{\hat{p}} = p = 0.44
- \sigma_{\hat{p}} = \sqrt{\frac{(0.44)(0.56)}{100}} = 0.0496
Problem 2:
- True proportion of voters supporting Proposition A is 0.4. Sample size 200.
- Find P(0.40 ≤ \hat{p} ≤ 0.45).
- \sigma_{\hat{p}} = \sqrt{\frac{0.4(1-0.4)}{200}} = 0.03464
- P(0.40 \leq \hat{p} \leq 0.45) = P(\frac{0.40 - 0.40}{0.03464} \leq Z \leq \frac{0.45 - 0.40}{0.03464}) = P(0 \leq Z \leq 1.44) = 0.4251
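These two answers can be reproduced with `statistics.NormalDist`, using the normal approximation to the sampling distribution of \hat{p}. The helper name is an assumption. Because the code does not round Z to two decimals, the Problem 2 probability comes out slightly above the table-based value of 0.4251.

```python
import math
from statistics import NormalDist

def p_hat_distribution(p, n):
    # Normal approximation: mean p, standard deviation sqrt(pq / n).
    return NormalDist(p, math.sqrt(p * (1 - p) / n))

# Problem 1: mean and standard deviation of p-hat
drivers = p_hat_distribution(0.44, 100)
print(drivers.mean, round(drivers.stdev, 4))

# Problem 2: P(0.40 <= p-hat <= 0.45)
voters = p_hat_distribution(0.40, 200)
prob_voters = voters.cdf(0.45) - voters.cdf(0.40)
print(round(voters.stdev, 5), prob_voters)
```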

Additional Problems

Problem 1:
- Maureen Webster claims that 53% of the voters in a large city favor her. A sample of 400 voters is taken. Find the probability that fewer than 49% of the sample favor her.
- P(\hat{p} < 0.49) = P(Z < \frac{0.49 - 0.53}{\sqrt{\frac{(0.53)(0.47)}{400}}}) = P(Z < -1.6) = 0.0548
Problem 2:
- Machine produces 5% defective parts. Sample of 400 parts every week. Stop and readjust if 8% or more defective.
- Find probability process will be stopped.
- P(\hat{p} \geq 0.08) = P(Z \geq \frac{0.08 - 0.05}{\sqrt{\frac{(0.05)(0.95)}{400}}}) = P(Z \geq 2.75) = 0.003
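The same normal approximation can be used to check these two answers in code. This is a sketch with assumed variable names; since Z is not rounded to two decimals here, the results differ slightly from the table-based values 0.0548 and 0.003.

```python
import math
from statistics import NormalDist

def p_hat_distribution(p, n):
    # Normal approximation: mean p, standard deviation sqrt(pq / n).
    return NormalDist(p, math.sqrt(p * (1 - p) / n))

# Problem 1: probability that fewer than 49% of 400 sampled voters favor her
webster = p_hat_distribution(0.53, 400)
prob_webster = webster.cdf(0.49)

# Problem 2: probability that 8% or more of 400 sampled parts are defective
defects = p_hat_distribution(0.05, 400)
prob_stop = 1 - defects.cdf(0.08)

print(prob_webster, prob_stop)
```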