Sampling Distributions
Population vs. Sample
- Population: All elements/members (individuals, items, objects) whose characteristics are studied.
- Sample: A portion/subset of the population selected for study.
Reasons for Sampling
- Gathering information on the entire population is too expensive.
- It is often impossible to gather information on the entire population.
Representative Sample
- A sample representing the population's characteristics as closely as possible.
- Random sampling is the usual way to obtain one, so a representative sample is often loosely called a random sample.
Types of Samples
- Random/Representative Samples
  - Simple Random Sample
  - Systematic Sample
  - Stratified Sample
  - Cluster Sample
- Non-random/Non-representative Samples
  - Voluntary Response Sample / Convenience Sample
Sampling Methods
- Simple Random Sample: Each element has the same chance of being included.
- Systematic Sample: Select a starting point, then select every nth element (e.g., 10th or 50th).
- Cluster Sample: Divide the population into groups (clusters), randomly select clusters, and then include all members from selected clusters.
- Stratified Sample: Subdivide the population into subgroups (strata) with shared characteristics (e.g., gender, age), then draw a sample from each stratum.
- Voluntary Response/Convenience Sample: Respondents decide whether to be included.
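The four random sampling methods above can be sketched in a few lines of Python. This is only an illustration using the standard `random` module; the population of 100 numbered IDs, the two strata, the ten clusters, and all function names are assumptions made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every element has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th element."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum."""
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep all of their members."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)                    # fixed seed, for reproducibility only
population = list(range(1, 101))          # hypothetical population of 100 IDs
strata = {"low": population[:50], "high": population[50:]}      # hypothetical strata
clusters = [population[i:i + 10] for i in range(0, 100, 10)]    # hypothetical clusters

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_sample, strat, clus, sep="\n")
```

A voluntary response sample has no counterpart here because the respondents, not the researcher, decide who is included.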
Inferential Statistics
- Making statements about a population by examining sample results.
- Using sample statistics to infer population parameters.
- Inference: Estimating unknown population parameters from sample evidence.
- Estimation: Estimating population mean weight using the sample mean weight.
- Hypothesis Testing: Using sample evidence to test a claim about a population parameter (e.g., population mean weight is 120 pounds).
- Drawing conclusions and/or making decisions about a population based on sample results.
Population and Sampling Distribution
- Population Distribution: Probability distribution of the population data.
- Sampling Distribution: Probability distribution of a sample statistic.
- Sample information helps infer population parameters (\mu, p, and \sigma) based on sample statistics (\bar{x}, \hat{p}, and S).
Sampling Distributions
- Sampling Distributions of Sample Means
- Sampling Distributions of Sample Proportions
Sampling Distribution of the Mean (\sigma known)
- Central Limit Theorem:
- Let X1, X2, …, X_n be a sequence of independent and identically distributed random variables each having a mean \mu and a standard deviation \sigma.
- For large n, the Central Limit Theorem asserts that the sampling distribution of \bar{X} is approximately normal, with mean E(\bar{X}) = \mu_{\bar{x}} = \mu
- Standard error of the mean: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
- For samples from finite populations: \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}
- Finite population correction factor: \sqrt{\frac{N-n}{N-1}}
- If \frac{n}{N} \leq 0.05, then the correction factor can be ignored.
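The standard error formulas above, including the 5% rule for the finite population correction, might be coded as follows. The function name and the numbers in the usage lines are illustrative assumptions, not values from the notes.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size (None means an infinite population)
    """
    se = sigma / math.sqrt(n)
    # Apply the finite population correction only when n/N exceeds 0.05.
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical values: sigma = 10, n = 25
se_infinite = standard_error(10, 25)            # 10 / 5 = 2.0
se_large_N = standard_error(10, 25, N=10_000)   # n/N = 0.0025, correction ignored
se_small_N = standard_error(10, 25, N=100)      # n/N = 0.25, correction applied
print(se_infinite, se_large_N, se_small_N)
```

The correction factor is always below 1 when n > 1, so the corrected standard error is smaller than \frac{\sigma}{\sqrt{n}}.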
Central Limit Theorem Visualization
- The sampling distribution of \bar{X} approaches a normal distribution as the sample size n increases, regardless of the shape of the population distribution (illustrated with uniform and exponential populations).
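A small simulation can stand in for the visualization. The sketch below draws repeated samples from an exponential population with mean 1 and standard deviation 1 (an assumed example population), and reports the mean, standard deviation, and skewness of the resulting sample means. The seed, the number of repetitions, and the helper names are arbitrary choices for illustration.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Simple moment-based skewness; about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(42)                       # fixed seed, for reproducibility only
draw_exponential = lambda r: r.expovariate(1.0)   # population: mu = 1, sigma = 1

results = {}
for n in (2, 30):
    means = sample_means(draw_exponential, n, 5000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"n={n:2d}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={1 / math.sqrt(n):.3f}  skewness={results[n][2]:.2f}")
```

The mean of the sample means should stay close to \mu, their standard deviation close to \frac{\sigma}{\sqrt{n}}, and the skewness should shrink toward 0 as n grows, which is the pattern the Central Limit Theorem describes.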
Standardized Sample Mean
- Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
- This standardization applies when n is large and the population is either infinite, or finite with \frac{n}{N} \leq 0.05.
- If random samples are from a normal distribution, the sampling distribution of the mean is normal regardless of sample size.
- In practice, the normal distribution provides an excellent approximation to the sampling distribution of the mean for n as small as 30.
Sampling Distribution of the Mean (\sigma unknown)
- If \bar{X} is the mean of a random sample of size n from a normal population with mean \mu and standard deviation \sigma:
- t = \frac{\bar{X} - \mu}{\frac{S}{\sqrt{n}}}, where \frac{S}{\sqrt{n}} = S_{\bar{x}} is the estimated standard error of the mean.
- t is a random variable having a t-distribution with df = n-1 (degrees of freedom).
- S is the sample standard deviation: S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}
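As a minimal sketch, the t statistic above can be computed from raw data with the standard library. The five weights and the hypothesized mean of 120 are made-up illustration values; `statistics.stdev` uses the n-1 divisor, matching the definition of S.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, df) for a sample and a hypothesized population mean mu."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation, divisor n-1
    standard_error = s / math.sqrt(n)     # estimated standard error of the mean
    return (x_bar - mu) / standard_error, n - 1

weights = [118, 122, 125, 119, 121]       # hypothetical sample, in pounds
t, df = t_statistic(weights, mu=120)
print(f"t = {t:.4f}, df = {df}")          # t = 0.8165, df = 4
```

Here \bar{X} = 121, S^2 = 30/4 = 7.5, and the standard error is \sqrt{7.5/5} = \sqrt{1.5}, so t = 1/\sqrt{1.5}.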
Student’s t-Distribution
- t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal distribution.
- Standard Normal distribution is equivalent to a t-distribution with df = ∞.
- t approaches Z as n increases.
- The t-distribution is similar to the standard normal distribution and both are bell shaped and symmetrical about the mean.
- The standard deviation of the t-distribution exceeds 1, but approaches 1 as n approaches ∞.
- The standard normal distribution provides a good approximation to the t-distribution for a sample size of 30 or more.
- The assumption that the sample must come from a normal distribution is not as severe a restriction as it may seem. Studies have shown that the distribution of the t statistic is fairly close to a t-distribution even for samples from certain non-normal populations. In practice, it is necessary to make sure that the population from which the sample is drawn is approximately bell-shaped and not too skewed.
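The claim that the standard deviation of the t-distribution exceeds 1 but approaches 1 can be checked numerically. The sketch below relies on the standard result that the variance of a t-distribution is \frac{df}{df-2} for df > 2; that formula is not stated in these notes and is brought in here as an assumption of the example.

```python
import math

def t_std_dev(df):
    """Standard deviation of a t-distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("the variance of the t-distribution is not finite for df <= 2")
    return math.sqrt(df / (df - 2))

dfs = (3, 5, 10, 30, 100, 1000)
sds = [t_std_dev(df) for df in dfs]
for df, sd in zip(dfs, sds):
    print(f"df = {df:4d}  sd = {sd:.4f}")
```

The values decrease steadily toward 1, which matches the 'fatter tails' that thin out as the degrees of freedom grow.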
Problems and Solutions: Sample Mean
Problem 1:
- SAT test mean score: 904, standard deviation: 152. Assume normal distribution.
- Calculate mean and standard deviation of \bar{X} for:
- a) n = 16
- E(\bar{X}) = \mu_{\bar{x}} = \mu = 904
- \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{152}{\sqrt{16}} = 38
- b) n = 100
- E(\bar{X}) = \mu_{\bar{x}} = \mu = 904
- \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{152}{\sqrt{100}} = 15.2
Problem 2:
- Cookie weight mean: 8 ounces, standard deviation: 3 ounces.
- Find P(7.8 ≤ \bar{X} ≤ 8.2) for n = 36.
- Solution:
- \mu_{\bar{x}} = 8
- \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5
- P(7.8 \leq \bar{X} \leq 8.2) = P(\frac{7.8 - 8}{0.5} \leq Z \leq \frac{8.2 - 8}{0.5}) = P(-0.4 \leq Z \leq 0.4) = 0.3108
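The table lookup in Problem 2 can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch under the same assumption as the solution, namely that \bar{X} is approximately normal with mean \mu and standard deviation \frac{\sigma}{\sqrt{n}}; the function name is illustrative.

```python
import math
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low, high):
    """P(low <= X-bar <= high) under the normal model for the sample mean."""
    standard_error = sigma / math.sqrt(n)
    x_bar_dist = NormalDist(mu=mu, sigma=standard_error)
    return x_bar_dist.cdf(high) - x_bar_dist.cdf(low)

# Cookie weights: mu = 8 oz, sigma = 3 oz, n = 36
p_cookies = prob_mean_between(8, 3, 36, 7.8, 8.2)
print(round(p_cookies, 4))   # 0.3108
```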
Sampling Distribution of the Proportion
- The information that is usually available for the estimation of a proportion is the number of times, X, that an appropriate event occurs in n trials, occasions, or observations.
- The point estimator of the population proportion, itself, is usually the sample proportion: \hat{p} = \frac{X}{n}
- \hat{p}: the sample proportion, i.e., the proportion of the n trials in which the event actually occurs.
- If the n trials satisfy the assumptions underlying the binomial distribution, we know the mean and standard deviation of the number of successes.
- If we divide both of these quantities by n, we will find the mean and the standard deviation of the proportions of successes:
- \mu_{\hat{p}} = p
- \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}
Standardized Sample Proportion
- Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}
- When n is large, we can construct a Z-value for a binomial parameter p, by using the normal approximation to the binomial distribution.
- Usually, the normal distribution is used as an approximation to the binomial distribution when np ≥ 5 and nq ≥ 5
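The rule of thumb and the Z formula above translate directly into code. The sketch below is illustrative only: the function names and the example inputs are assumptions, and the threshold of 5 follows the convention stated in these notes.

```python
import math

def normal_approx_ok(n, p, threshold=5):
    """Check the rule of thumb np >= 5 and nq >= 5, with q = 1 - p."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

def z_for_proportion(p_hat, p, n):
    """Standardize a sample proportion: Z = (p_hat - p) / sqrt(pq / n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

# Hypothetical inputs
ok_large = normal_approx_ok(200, 0.4)    # np = 80, nq = 120
ok_small = normal_approx_ok(20, 0.1)     # np = 2, too small
z = z_for_proportion(0.45, 0.40, 200)
print(ok_large, ok_small, round(z, 2))
```

When the check fails, the normal approximation is not recommended by this rule and the binomial distribution itself is the safer choice.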
Proportion Problems and Solutions
- Problem 1:
- 44% of teenage boys are bad drivers. A sample of 100 teenage male drivers is taken.
- Find mean and standard deviation of \hat{p}.
- \mu_{\hat{p}} = p = 0.44
- \sigma_{\hat{p}} = \sqrt{\frac{(0.44)(0.56)}{100}} = 0.0496
- Problem 2:
- True proportion of voters supporting Proposition A is 0.4. Sample size 200.
- Find P(0.40 ≤ \hat{p} ≤ 0.45).
- \sigma_{\hat{p}} = \sqrt{\frac{0.4(1-0.4)}{200}} = 0.03464
- P(0.40 \leq \hat{p} \leq 0.45) = P(\frac{0.40 - 0.40}{0.03464} \leq Z \leq \frac{0.45 - 0.40}{0.03464}) = P(0 \leq Z \leq 1.44) = 0.4251
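Both proportion problems can be checked with `statistics.NormalDist`. This is a sketch that assumes the normal approximation for \hat{p}; the helper names are illustrative. Because the code does not round Z to two decimals, the Problem 2 probability comes out marginally above the table value of 0.4251.

```python
import math
from statistics import NormalDist

def proportion_se(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def prob_phat_between(p, n, low, high):
    """P(low <= p-hat <= high) under the normal approximation."""
    p_hat_dist = NormalDist(mu=p, sigma=proportion_se(p, n))
    return p_hat_dist.cdf(high) - p_hat_dist.cdf(low)

se_drivers = proportion_se(0.44, 100)                  # Problem 1
p_voters = prob_phat_between(0.40, 200, 0.40, 0.45)    # Problem 2
print(round(se_drivers, 4))   # 0.0496
print(round(p_voters, 4))
```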
Additional Problems
- Problem 1:
- Maureen Webster claims that 53% of voters in a large city favor her. In a random sample of 400 voters, find the probability that fewer than 49% favor her, assuming her claim is true.
- P(\hat{p} < 0.49) = P(Z < \frac{0.49 - 0.53}{\sqrt{\frac{(0.53)(0.47)}{400}}}) = P(Z < -1.6) = 0.0548
- Problem 2:
- Machine produces 5% defective parts. Sample of 400 parts every week. Stop and readjust if 8% or more defective.
- Find probability process will be stopped.
- P(\hat{p} \geq 0.08) = P(Z \geq \frac{0.08 - 0.05}{\sqrt{\frac{(0.05)(0.95)}{400}}}) = P(Z \geq 2.75) = 0.0030
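The two tail probabilities above can be recomputed without a Z table, and the machine problem can also be cross-checked against the exact binomial tail. The exact calculation is an addition for comparison only and is not part of the original solutions; the function names are illustrative, and the cutoff of 32 defective parts is simply 8% of 400.

```python
import math
from statistics import NormalDist

def normal_tail(p, n, cutoff, upper):
    """Tail probability for p-hat under the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)
    z = (cutoff - p) / se
    cdf = NormalDist().cdf(z)
    return 1 - cdf if upper else cdf

def binomial_upper_tail(n, p, k_min):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

p_webster = normal_tail(0.53, 400, 0.49, upper=False)   # Problem 1
p_stop = normal_tail(0.05, 400, 0.08, upper=True)       # Problem 2
p_stop_exact = binomial_upper_tail(400, 0.05, 32)       # 32 of 400 is 8%
print(round(p_webster, 4), round(p_stop, 4), round(p_stop_exact, 4))
```

Small differences from the table answers come from not rounding Z to two decimals. The exact binomial value need not match the normal approximation closely this far out in the tail, since the approximation above uses no continuity correction.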