Week 8: Sampling Distributions

Overview

Title: Sampling Distributions (Chapter 10)
Course: MATH 10006 Business Statistics
Authors: Sharpe, De Veaux, Velleman, Wright
Edition: Fourth Canadian Edition

Importance of Sampling

Why Sample?
- Time-consuming to contact the entire population.
- Costs may be prohibitive.
- It's physically impossible to gather data from the whole population.
- Some tests are destructive to the population unit.
- Results from a proper sample are often ‘good enough’ for making inferences about the population.
Inferential Statistics:
- This is the process of drawing conclusions about a population based on information gathered from a sample through descriptive statistics.

Key Terminology

Population vs. Sample Statistics:
- Population: The entirety of the group we are interested in studying.
- Sample: A subset of the population.
- Parameters: Descriptive measures of a population (e.g., mean $\mu$, standard deviation $\sigma$).
- Statistics: Descriptive measures computed from a sample (e.g., sample mean $\bar{y}$, sample standard deviation $s$).
- Size Notation: Population size is denoted by $N$, while sample size is denoted by $n$.

Variability in Statistics

Statistics as Variables:
- The value of a statistic changes from sample to sample, so a statistic is itself a variable.
- Even when samples are representative, different samples yield different estimates. This is known as sampling variability.
- Sampling Error: Despite the name, this is not a mistake. It is the variability expected from one sample to the next.
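Sampling variability can be seen in a small simulation. The sketch below is illustrative only: the population (10,000 values drawn from a normal model with mean 50 and standard deviation 10), the sample size of 40, and the seed are all made-up assumptions, not values from the chapter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 10,000 values (made up for illustration)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # population parameter

# Draw five random samples of size n = 40 and compute each sample mean
sample_means = [
    statistics.fmean(random.sample(population, 40)) for _ in range(5)
]

print(f"population mean: {mu:.2f}")
for i, y_bar in enumerate(sample_means, start=1):
    # "Sampling error" here is just the gap between statistic and parameter
    print(f"sample {i}: mean = {y_bar:.2f}, sampling error = {y_bar - mu:+.2f}")
```

Each sample gives a slightly different mean even though every sample comes from the same population.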

Sampling Distribution Characteristics

Shape of Sampling Distribution:
- The distribution of sample proportions is unimodal, symmetric, and bell-shaped (approximately normal) when the conditions listed in the next section hold.
- Center: The average of all possible sample proportions.
- Spread: The standard deviation ($\text{SD}$) of all possible sample proportions:
$\text{SD}(\bar{p}) = \sqrt{\dfrac{pq}{n}}$
where $p$ is the population proportion and $q = 1 - p$.
Mean of the Sampling Distribution:
- The mean of the sampling distribution of the sample proportion $\bar{p}$ is equal to the population proportion $p$.
- The normal approximation is most accurate when $p$ is close to 0.5. The further $p$ is from 0.5, the larger the sample size $n$ needed to keep the approximation accurate.
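The center and spread described above can be checked by simulation. The following is a minimal sketch, assuming $p = 0.30$ and $n = 100$ and an arbitrary 5,000 simulated samples. It compares the simulated mean and standard deviation of the sample proportions with $p$ and $\sqrt{pq/n}$. The function name `sample_proportion` is hypothetical.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

p, n = 0.30, 100      # assumed population proportion and sample size
num_samples = 5_000   # number of simulated samples (arbitrary choice)

def sample_proportion(p, n):
    """Proportion of successes in one simulated sample of n independent trials."""
    return sum(random.random() < p for _ in range(n)) / n

p_bars = [sample_proportion(p, n) for _ in range(num_samples)]

sim_mean = statistics.fmean(p_bars)       # should be close to p
sim_sd = statistics.pstdev(p_bars)        # should be close to sqrt(pq/n)
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean = {sim_mean:.4f}  (p = {p})")
print(f"simulated SD   = {sim_sd:.4f}  (sqrt(pq/n) = {theory_sd:.4f})")
```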

Basic Conditions for Sampling Distribution

Independence Assumption: Outcomes should be independent.
Randomization Condition: The sample must be drawn at random so that it is representative of the population.
10% Condition: When sampling without replacement, the sample size $n$ must be no more than 10% of the population size $N$.
Success/Failure Condition: The expected number of “successes” ($np$) and the expected number of “failures” ($nq$) must each be at least 10 for the normal approximation to be reasonable.
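The two numerical conditions can be checked mechanically. The helper below is a hypothetical sketch (the name `check_proportion_conditions` and its return format are not from the textbook). Independence and randomization depend on how the data were collected, so they cannot be verified from $n$, $p$, and $N$ alone and are left out.

```python
def check_proportion_conditions(n, p, N=None):
    """Check the 10% and Success/Failure conditions for a sample proportion.

    n: sample size, p: population proportion,
    N: population size (optional; the 10% check is skipped if unknown).
    """
    q = 1 - p
    results = {
        # Expected successes and failures must each be at least 10
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if N is not None:
        # Sample must be no more than 10% of the population
        results["ten_percent"] = n <= 0.10 * N
    return results

# Example with made-up numbers: n = 100, p = 0.30, population of 50,000
print(check_proportion_conditions(100, 0.30, N=50_000))
# A small sample fails Success/Failure: np = 20 * 0.30 = 6
print(check_proportion_conditions(20, 0.30))
```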

Standardization of Sampling Distribution of the Proportion

The z-score of a sample proportion is found with the standardization formula:
$z = \dfrac{\bar{p} - p}{\text{SD}(\bar{p})}$ where:
$\text{SD}(\bar{p}) = \sqrt{\dfrac{pq}{n}}$

Example of Sampling Distribution of Proportions

Assume $p = 0.30$ (population proportion) and a sample size of $n = 100$. Then,
$\text{SD}(\bar{p}) = \sqrt{\dfrac{(0.30)(0.70)}{100}} = 0.0458$

Calculating the z-score for a sample proportion $\bar{p} = 0.33$:
$z = \dfrac{0.33 - 0.30}{0.0458} \rightarrow z = 0.655$
About 25.62% of the area under the standard normal curve lies above $z = 0.655$, so the probability of a sample proportion greater than 0.33 is approximately 0.2562.
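The arithmetic in this example can be reproduced with Python's standard library. This is a sketch under the example's assumptions ($p = 0.30$, $n = 100$, sample proportion 0.33), using `statistics.NormalDist` for the standard normal model. Because the code does not round $z$ before looking up the area, its result can differ from the table-based value in the last digit or two.

```python
import math
from statistics import NormalDist

p, n = 0.30, 100   # population proportion and sample size from the example
p_bar = 0.33       # observed sample proportion

sd = math.sqrt(p * (1 - p) / n)        # SD of the sample proportion
z = (p_bar - p) / sd                   # standardized value
upper_tail = 1 - NormalDist().cdf(z)   # area above z under the standard normal

print(f"SD = {sd:.4f}, z = {z:.3f}, P(sample proportion > 0.33) = {upper_tail:.4f}")
```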

Sampling Distribution of the Mean

Visualizing Sample Means: For any population, if one could examine all possible samples of size $n$, the distribution (or histogram) of the sample means $\bar{y}$ would be approximately normal for sufficiently large $n$, as described by the Central Limit Theorem (CLT).

Shape: This distribution is also unimodal, symmetric, and bell-shaped, provided the sample size is large enough.
Center: The mean of all sample means equals the population mean $\mu$.
Spread: The standard deviation of the sample means (termed the standard error):
$\text{SE}(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
where $\sigma$ is the population standard deviation and $n$ is the sample size.

Central Limit Theorem (CLT)

The CLT states that the sampling distribution of the sample mean will approximate normality for sufficiently large sample sizes ($n$) regardless of the population distribution shape, with convergence improving with larger sample sizes.
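The CLT can be illustrated with a strongly right-skewed population. The sketch below is an assumption-laden illustration, not an example from the chapter: it uses an exponential population with mean 1 and standard deviation 1, and compares sample means for $n = 2$ and $n = 30$. The helper names `skewness` and `simulate_sample_means` are hypothetical.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Skewness coefficient: near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, num_samples=4_000):
    """Means of num_samples samples of size n from an exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

results = {n: simulate_sample_means(n) for n in (2, 30)}

for n, means in results.items():
    print(f"n = {n:>2}: mean = {statistics.fmean(means):.3f}, "
          f"SD = {statistics.pstdev(means):.3f} "
          f"(sigma/sqrt(n) = {1 / math.sqrt(n):.3f}), "
          f"skewness = {skewness(means):.2f}")
```

With the larger sample size the skewness of the sample means moves toward 0 and their spread tracks $\sigma/\sqrt{n}$, which is the behaviour the CLT describes.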

Practical Applications and Examples

Medical Study Scenario:
- For young adults born prematurely with very low birth weights (below 1500 grams), the mean systolic blood pressure was found to be $\mu = 120.7$ mm Hg with a standard deviation of $\sigma = 13.8$ mm Hg.
- Identifying the population: All young adults born prematurely with very low birth weights. The values $\mu$ and $\sigma$ are its parameters.
- For samples of size 30, the mean and standard deviation of the sample means are:
  - Mean = 120.7 mm Hg
  - SE = $\dfrac{13.8}{\sqrt{30}} = 2.52$ mm Hg.
- Repeating for a sample size of 90:
  - Mean = 120.7 mm Hg
  - SE = $\dfrac{13.8}{\sqrt{90}} = 1.45$ mm Hg.
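The two standard errors above can be recomputed in a few lines. This is a small sketch using the blood pressure values from the scenario. The function name `standard_error` is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 13.8  # population standard deviation, mm Hg

se_30 = standard_error(sigma, 30)
se_90 = standard_error(sigma, 90)

print(f"n = 30: SE = {se_30:.2f} mm Hg")
print(f"n = 90: SE = {se_90:.2f} mm Hg")
# Tripling the sample size divides the SE by sqrt(3), not by 3
print(f"ratio = {se_30 / se_90:.3f}")
```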
Heart Rate Assessment Study:
- Mean resting heart rate $\mu = 72$ bpm, with standard deviation $\sigma = 11.2$ bpm; the sample size is $n = 29$.
- Assessing the probability that the sample mean exceeds 77.3 bpm, and considering which assumptions are needed for a normal model in this population.
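One way to set up this calculation is sketched below. It treats 72 bpm and 11.2 bpm as population values and assumes the sampling distribution of the mean is approximately normal. With $n = 29$, that assumption is most defensible if resting heart rates are themselves roughly normal, which the notes do not confirm.

```python
import math
from statistics import NormalDist

mu, sigma, n = 72.0, 11.2, 29   # values from the scenario
threshold = 77.3                # sample mean of interest, bpm

se = sigma / math.sqrt(n)                    # standard error of the mean
sampling_dist = NormalDist(mu=mu, sigma=se)  # assumed normal model for the sample mean

z = (threshold - mu) / se
prob_exceeds = 1 - sampling_dist.cdf(threshold)

print(f"SE = {se:.2f} bpm, z = {z:.2f}, P(sample mean > 77.3) = {prob_exceeds:.4f}")
```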
Quality Control Study at Cola Inc.:
- The mean fill amount is 1 L, with a population standard deviation of $\sigma = 12.8$ mL. A random sample of 16 bottles showed a mean of 1.006 L.
- Calculating probability and z-scores to gauge whether the process is likely overfilling.
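The cola calculation can be sketched as follows, working in millilitres so the units match (1 L = 1000 mL, 1.006 L = 1006 mL). It assumes a normal model for the sample mean, which with only 16 bottles relies on fill amounts being roughly normal. That assumption is not stated in the notes. Whether the resulting probability signals overfilling depends on a cutoff the notes do not specify.

```python
import math
from statistics import NormalDist

mu_ml = 1000.0      # target mean fill, mL (1 L)
sigma_ml = 12.8     # population standard deviation, mL
n = 16              # bottles sampled
y_bar_ml = 1006.0   # observed sample mean, mL (1.006 L)

se = sigma_ml / math.sqrt(n)            # 12.8 / 4 = 3.2 mL
z = (y_bar_ml - mu_ml) / se
prob_at_least = 1 - NormalDist().cdf(z)  # chance of a sample mean this high or higher

print(f"SE = {se:.1f} mL, z = {z:.3f}, P(sample mean >= 1006 mL) = {prob_at_least:.4f}")
```

A small probability means a sample mean this high would be unusual if the process were really centred at 1 L.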

Homework/Practice Assignment

Assignments cover Chapter 10 concepts and rely on finding normal model areas and probabilities, as practiced in the preceding weeks.