Inferences from Samples to Populations: Understanding Sampling Distributions

Introduction to Sampling Distributions

  • Learning Goal: The primary objective is to understand the fundamental ideas underlying sampling distributions. This allows for the analysis of both a distribution of sample means ($\bar{x}$) and a distribution of sample proportions ($\hat{p}$).

  • Sampling Distribution Definition: The distribution of any sample statistic (such as a mean or proportion) calculated from all possible samples of a particular size is called a sampling distribution.

  • Technical Distinction: A distribution specifically showing the means of all possible samples is technically called a sampling distribution of sample means.

Fundamental Concepts and the Basic Idea of Sample Means

  • Small Population Example: Consider a population consisting of only three children with ages $4$, $5$, and $9$.

    • Population Mean ($\mu$): The mean age is calculated as $\mu = \frac{4 + 5 + 9}{3} = 6.0\, \text{years}$.

  • Sample Size $n = 1$:

    • Possible samples: $\{4\}$, $\{5\}$, and $\{9\}$.

    • Sample means ($\bar{x}$): $4.0$, $5.0$, and $9.0$.

    • Mean of the sample means: $\frac{4.0 + 5.0 + 9.0}{3} = 6.0\, \text{years}$.

    • The mean of the sample means is exactly equal to the population mean ($\mu$).

  • Sample Size $n = 2$ (Sampling with Replacement):

    • Sampling with Replacement: A method where one member is chosen at random, recorded, and then put back into the pool before the next member is chosen. This allows the same individual to be selected multiple times in a single sample.

    • Total possible samples for $n = 2$ from a population of $3$: $3 \times 3 = 9$ possible samples.

    • Possible Samples and Means:

      1. Sample $\{4, 4\}$, Mean = $4.0$

      2. Sample $\{4, 5\}$, Mean = $4.5$

      3. Sample $\{4, 9\}$, Mean = $6.5$

      4. Sample $\{5, 4\}$, Mean = $4.5$

      5. Sample $\{5, 5\}$, Mean = $5.0$

      6. Sample $\{5, 9\}$, Mean = $7.0$

      7. Sample $\{9, 4\}$, Mean = $6.5$

      8. Sample $\{9, 5\}$, Mean = $7.0$

      9. Sample $\{9, 9\}$, Mean = $9.0$

  • Frequency of Sample Means ($n = 2$):

    • $4.0$: Frequency $1$

    • $4.5$: Frequency $2$

    • $5.0$: Frequency $1$

    • $6.5$: Frequency $2$

    • $7.0$: Frequency $2$

    • $9.0$: Frequency $1$

  • Observations for $n = 2$:

    • The mean of these nine sample means is $\frac{4.0 + 4.5 + 6.5 + 4.5 + 5.0 + 7.0 + 6.5 + 7.0 + 9.0}{9} = 6.0\, \text{years}$. This remains equal to the population mean.

    • The distribution starts to show clustering near the population mean of $6.0$, appearing "more normal" than the distribution for $n = 1$.
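
The enumeration above can be reproduced with a short script. The following is a minimal sketch in Python using only the standard library; the variable names (`population`, `samples`, `sample_means`, `frequencies`) are illustrative choices and do not come from the source.

```python
from collections import Counter
from itertools import product

# Ages of the three children in the small population.
population = [4, 5, 9]
n = 2  # sample size

# All ordered samples of size n drawn with replacement (3 x 3 = 9 of them).
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency table of the sample means.
frequencies = Counter(sample_means)
for value in sorted(frequencies):
    print(f"mean {value:.1f}: frequency {frequencies[value]}")

# Compare the population mean with the mean of all sample means.
population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(population_mean, mean_of_sample_means)
```

Changing `n` to `1` reproduces the first case, where each sample is a single child.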

The Impact of Sample Size and the Central Limit Theorem

  • Influence of Larger Samples: As sample size increases (e.g., $n = 10$), the distribution of sample means looks increasingly like a normal distribution.

  • Central Limit Theorem: This phenomenon, where the distribution of sample means approaches a normal distribution as the sample size increases regardless of the population's distribution shape, is a consequence of the Central Limit Theorem.

  • Unrealistic Scenarios: Drawing a sample of size $n = 10$ from a population of only $3$ means the same individuals must be selected more than once. It still serves as a conceptual model for how larger samples make the distribution of sample means narrower and closer to normal.
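
One way to see this clustering without drawing any graphs is to count, for several sample sizes, what share of all possible samples have a mean within one year of $\mu = 6.0$. The sketch below assumes sampling with replacement from the three-child population; the function name `fraction_within` and the one-year tolerance are illustrative choices, not part of the source.

```python
from itertools import product

population = [4, 5, 9]
mu = sum(population) / len(population)  # population mean, 6.0

def fraction_within(n, tolerance=1.0):
    """Share of all size-n samples (drawn with replacement) whose mean
    lies within `tolerance` years of the population mean."""
    total = 0
    close = 0
    for sample in product(population, repeat=n):
        total += 1
        if abs(sum(sample) / n - mu) <= tolerance:
            close += 1
    return close / total

# Larger samples put a larger share of sample means close to mu.
for n in (1, 2, 4, 8):
    print(n, round(fraction_within(n), 3))
```

The sample sizes are kept small so that every possible sample can be listed exhaustively; for $n = 8$ there are already $3^8 = 6561$ of them.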

Sampling with Larger Populations

  • Practical Constraints: In real-world statistics, populations are often too large to survey entirely, making the true population mean ($\mu$) unknown and necessitating the use of sample means ($\bar{x}$) as estimates.

  • Sampling Error: This is the inherent error introduced by using a random sample to estimate a population parameter rather than the entire population.

    • Exclusions: Sampling error does not include errors from biased sampling, poorly worded survey questions, or recording mistakes.

  • Example: Web Research Hours (Data Set 8.1):

    • Population: $400$ students.

    • Population Mean ($\mu$): $3.88\, \text{hours}$.

    • Population Standard Deviation ($\sigma$): $2.40\, \text{hours}$.

    • Sample Statistics: A random sample of $n = 32$ students might yield a sample mean of $\bar{x} = 4.38\, \text{hours}$.

    • Multiple samples of the same size will result in different sample means, and typically, no single sample mean exactly matches the true population mean.
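
The variation from sample to sample can be illustrated with a small simulation. Data Set 8.1 is not reproduced in these notes, so the sketch below builds a hypothetical stand-in population of 400 made-up values; its mean will not be exactly 3.88, and the printed numbers are only illustrative. The seed and all variable names are assumptions.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# HYPOTHETICAL stand-in for Data Set 8.1: 400 made-up hour values,
# clipped at zero. This is not the real data set.
population = [max(0.0, rng.gauss(3.88, 2.40)) for _ in range(400)]
mu = mean(population)

n = 32
sample_means = []
for _ in range(5):
    sample = rng.sample(population, n)  # simple random sample, no replacement
    sample_means.append(mean(sample))

# Each sample mean differs from mu; the difference is the sampling error.
for x_bar in sample_means:
    print(f"x_bar = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```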

Notation and Characteristics of Sampling Distributions

Notation for Means

| Entity     | Size | Mean        | Standard Deviation |
|------------|------|-------------|--------------------|
| Population | $N$  | $\mu$       | $\sigma$           |
| Sample     | $n$  | $\bar{x}$   | $s$                |

Characteristics of the Distribution of Sample Means
  1. Normality: The larger the sample size, the more closely the distribution approximates a normal distribution.

  2. Mean: The mean of the distribution of sample means is equal to the population mean ($\mu_{\bar{x}} = \mu$).

  3. Standard Deviation (Standard Error): The standard deviation of the distribution of sample means is the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

  4. Rule of Thumb: A common guideline assumes the distribution of sample means is close to normal if the sample size is greater than $30$ ($n > 30$).
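
The standard error formula in item 3 can be checked directly on the three-child population, where every possible sample can be listed. This is a minimal sketch assuming sampling with replacement (the case in which $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ holds exactly); the helper name `sd_of_sample_means` is illustrative.

```python
from itertools import product
from math import isclose, sqrt
from statistics import pstdev

population = [4, 5, 9]
sigma = pstdev(population)  # population standard deviation

def sd_of_sample_means(n):
    """Standard deviation of the means of all size-n samples
    drawn with replacement from `population`."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    return pstdev(means)

# Compare the enumerated value with sigma / sqrt(n) for small n.
for n in (1, 2, 3, 4):
    enumerated = sd_of_sample_means(n)
    formula = sigma / sqrt(n)
    print(n, round(enumerated, 4), round(formula, 4), isclose(enumerated, formula))
```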

Probability and Standard Scores in Sample Means

  • Standard Score ($z$) for Sample Means: Used to determine how extreme a sample mean is within the sampling distribution.

    • Formula: $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$

  • Example Application (Web Research):

    • Given: $n = 32$, $\mu = 3.88$, $\sigma = 2.40$.

    • Standard Error: $\sigma_{\bar{x}} = \frac{2.40}{\sqrt{32}} \approx 0.42$.

    • For a sample mean $\bar{x} = 5.01$: $z = \frac{5.01 - 3.88}{0.42} \approx 2.69$.

    • Result: A $z$-score of $2.69$ corresponds to the $99.64$th percentile. The probability of selecting a sample with a mean greater than $5.01$ is $1 - 0.9964 = 0.0036$ (or $0.36\%$).
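
The same calculation can be sketched in Python with the standard library's `statistics.NormalDist`, assuming the distribution of sample means is approximately normal. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.88, 2.40, 32   # population mean, population SD, sample size
x_bar = 5.01                    # observed sample mean

standard_error = sigma / sqrt(n)
z = (x_bar - mu) / standard_error

# Upper-tail area of the standard normal curve beyond z.
p_greater = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > {x_bar}) = {p_greater:.4f}")
```

Because the script does not round the standard error to $0.42$ before dividing, it gives a $z$-score near $2.66$ rather than $2.69$, and a tail probability of roughly $0.004$. The conclusion is the same: such a sample mean is very unlikely.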

Case Study: Sampling Texas Farms

  • Context: Texas has approximately $225{,}000$ farms.

  • Known Parameters: Population mean $\mu = 582\, \text{acres}$, population standard deviation $\sigma = 150\, \text{acres}$.

  • Problem: Find the probability of a random sample of $n = 100$ farms having a mean size greater than $600\, \text{acres}$.

  1. Standard Error Calculation: $\sigma_{\bar{x}} = \frac{150}{\sqrt{100}} = 15\, \text{acres}$.

  2. Standard Score Calculation: $z = \frac{600 - 582}{15} = \frac{18}{15} = 1.2$.

  3. Probability Determination: A $z$-score of $1.2$ is in the $88.49$th percentile (probability $0.8849$ of being less than $600$). Thus, the probability of the mean being greater than $600$ is $1 - 0.8849 = 0.1151$.
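
The three steps above can be wrapped in a small reusable function. This is a sketch under the assumption that the sampling distribution is normal; the function name `prob_sample_mean_above` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_above(mu, sigma, n, threshold):
    """Probability that the mean of a random sample of size n exceeds
    `threshold`, assuming a normal distribution of sample means."""
    standard_error = sigma / sqrt(n)          # step 1
    z = (threshold - mu) / standard_error     # step 2
    return 1 - NormalDist().cdf(z)            # step 3: upper-tail area

# Texas farms: mu = 582 acres, sigma = 150 acres, n = 100 farms.
p = prob_sample_mean_above(mu=582, sigma=150, n=100, threshold=600)
print(f"P(sample mean > 600 acres) = {p:.4f}")
```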

Sample Proportions

  • Population Proportion ($p$): A population parameter representing the exact proportion of a population possessing a specific trait (e.g., car ownership).

    • Example: $240$ out of $400$ students own cars, so $p = \frac{240}{400} = 0.6$.

  • Sample Proportion ($\hat{p}$): The proportion observed within a sample.

    • Example: A sample of $n = 32$ might yield $\hat{p} = 0.75$.

  • Distribution of Sample Proportions: Results from calculating $\hat{p}$ for all possible samples of a given size.

Notation for Proportions

| Entity     | Size | Proportion  |
|------------|------|-------------|
| Population | $N$  | $p$         |
| Sample     | $n$  | $\hat{p}$   |

Characteristics of the Distribution of Sample Proportions
  1. Normality: Becomes more normal as sample size $n$ increases.

  2. Mean: Equal to the population proportion ($\mu_{\hat{p}} = p$).

  3. Standard Deviation: Given by the formula $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$.

Case Study: Analyzing Sample Proportion (Car Ownership)

  • Scenario: Population proportion $p = 0.6$; Sample size $n = 32$.

  • Selected Sample: Yields a sample proportion $\hat{p} = 0.75$.

  • Standard Deviation of Proportions: $\sigma_{\hat{p}} = \sqrt{\frac{0.6(1 - 0.6)}{32}} \approx 0.09$.

  • Standard Score Calculation: $z = \frac{0.75 - 0.6}{0.09} \approx 1.67$.

  • Probability: A $z$-score of $1.67$ corresponds to the $95.25$th percentile.

    • Probability of $\text{proportion} < 0.75$ is $0.9525$.

    • Probability of $\text{proportion} > 0.75$ is $1 - 0.9525 = 0.0475$.

  • Interpretation: If $100$ random samples were taken, only about $5$ of them would be expected to have a proportion of car owners higher than $0.75$.
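
The car-ownership calculation can be sketched the same way, again assuming a normal approximation for the distribution of sample proportions. The function name `prob_proportion_above` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_above(p, n, p_hat):
    """Return (z, upper-tail probability) for observing a sample
    proportion above p_hat, using the normal approximation."""
    standard_error = sqrt(p * (1 - p) / n)
    z = (p_hat - p) / standard_error
    return z, 1 - NormalDist().cdf(z)

# Car ownership: p = 0.6, n = 32, observed sample proportion 0.75.
z, p_above = prob_proportion_above(p=0.6, n=32, p_hat=0.75)
print(f"z = {z:.2f}, P(proportion > 0.75) = {p_above:.4f}")
print(f"expected count in 100 samples: about {100 * p_above:.0f}")
```

Without rounding the standard deviation to $0.09$, the $z$-score comes out near $1.73$ and the tail probability near $0.042$, slightly below the $0.0475$ obtained from the rounded values. Either way, only a handful of samples in $100$ would be expected to exceed $0.75$.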

Questions & Discussion

  • Think About It: Suppose you choose only one sample of size $n = 32$. According to Figure 8.4, are you more likely to choose a sample with a mean less than $2.5$ or a sample with a mean less than $3.5$? Explain.

    • Answer: The distribution of sample means is approximately normal and centered at $3.88$, and $3.5$ is closer to that center than $2.5$. A larger area of the normal curve therefore lies to the left of $3.5$ than to the left of $2.5$, so a sample mean less than $3.5$ is much more likely.
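
The two areas can be compared numerically. This sketch assumes the distribution in Figure 8.4 is the normal distribution of sample means for the web research example ($\mu = 3.88$, $\sigma = 2.40$, $n = 32$); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.88, 2.40, 32

# Normal model for the distribution of sample means.
sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_below_25 = sampling_dist.cdf(2.5)  # area to the left of 2.5
p_below_35 = sampling_dist.cdf(3.5)  # area to the left of 3.5

print(f"P(sample mean < 2.5) = {p_below_25:.4f}")
print(f"P(sample mean < 3.5) = {p_below_35:.4f}")
```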