Chapter 18: Sampling Distribution Models - Day 1

Introduction to Sampling Distribution Models

  • Focus on modeling the distribution of sample proportions rather than using real repeated samples.

  • Conceptualize what happens when drawing many samples and examining their proportions.

Characteristics of Sample Proportions

  • The histogram of sample proportions drawn repeatedly:

    • Unimodal: One clear peak, at the most common value of the sample proportion.

    • Symmetric: Balanced on each side of the center.

    • Centered: Around the true population proportion (p).
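The three characteristics above can be checked with a small simulation. The following is a minimal sketch, not part of the lesson itself: the population proportion (0.4), sample size (100), and number of samples (2000) are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a success with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Hypothetical values, chosen only for illustration.
p, n = 0.4, 100
props = simulate_sample_proportions(p, n, num_samples=2000)

center = statistics.mean(props)          # should land close to p
spread = statistics.pstdev(props)        # should land close to sqrt(p*q/n)
theoretical = math.sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {center:.3f}")
print(f"SD of sample proportions:   {spread:.4f}")
print(f"sqrt(pq/n):                 {theoretical:.4f}")
```

A histogram of `props` should show a single, roughly symmetric peak near 0.4.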

Normal Model for Sample Proportions

  • The amazing fact:

    • The distribution of sample proportions can be modeled by a normal distribution.

  • Key parameters of the normal model:

    • Mean (center): The true population proportion (p).

    • Standard Deviation: Calculated using the formula:

      • \[ \sigma = \sqrt{\frac{pq}{n}} \]

      • Where \( q = 1 - p \) is the complement of \( p \) and \( n \) is the sample size.

      • The distribution is centered on \( p \); the standard deviation sets the spread, which shrinks as \( n \) grows.

  • Normal curve representation:

    • 68% of sample proportions fall within 1 standard deviation of the mean (p).

    • 95% fall within 2 standard deviations.

    • 99.7% within 3 standard deviations.
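The 68%, 95%, and 99.7% figures come from the area under a normal curve within 1, 2, and 3 standard deviations of its mean. As a rough sketch, they can be recovered with Python's standard-library `statistics.NormalDist`; the helper names and the values p = 0.5, n = 100 are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def sampling_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p*q/n) with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

def coverage_within(k, p, n):
    """Probability, under the normal model, that a sample proportion
    falls within k standard deviations of p."""
    sd = sampling_sd(p, n)
    model = NormalDist(mu=p, sigma=sd)
    return model.cdf(p + k * sd) - model.cdf(p - k * sd)

# Hypothetical values, chosen only for illustration.
for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k, p=0.5, n=100):.4f}")
```

The coverage does not depend on the particular p and n used, because the interval is always measured in standard deviations.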

Validating the Normal Model

  • The normal model improves with larger sample sizes.

  • Two assumptions and conditions must be met:

    1. Independence of Sampled Values: Sampled values must be independent of each other.

    2. Large Enough Sample Size (n): Sufficiently large to apply the normal model.

Checking Assumptions and Conditions

  • Independence Condition:

    • Use the 10% condition: When sampling without replacement, the sample size \( n \) must be no larger than 10% of the population.

  • Large Enough Condition:

    • Success-Failure Condition: Ensure \( n \cdot p \) and \( n \cdot q \) are both at least 10.
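Both checks are simple arithmetic, so they can be sketched as a small helper. This is an illustration under stated assumptions: the function name and return format are hypothetical, the threshold is treated as "at least 10" so that a count of exactly 10 passes, and an unknown population size is assumed to be large enough.

```python
def check_conditions(p, n, population_size=None, threshold=10):
    """Check the 10% condition and the success-failure condition.

    population_size=None is treated as 'population assumed large enough'.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    eps = 1e-9  # tolerance so floating-point rounding does not fail an exact 10
    return {
        # n is at most 10% of the population, written as 10*n <= N to stay in integers
        "ten_percent": population_size is None or 10 * n <= population_size,
        "success_failure": (expected_successes >= threshold - eps
                            and expected_failures >= threshold - eps),
    }

# Hypothetical usage: 100 sampled from 10,000 with p = 0.5.
print(check_conditions(p=0.5, n=100, population_size=10_000))
# A sample of 15 with p = 0.5 expects only 7.5 successes, so the model is not justified.
print(check_conditions(p=0.5, n=15))
```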

Understanding Sampling Distribution Models

  • A distribution of sample proportions (\( \hat{p} \)) emerges and is modeled as a normal distribution:

    • Mean of \( \hat{p} \): True population proportion (p).

    • Standard Deviation of \( \hat{p} \): \( \sqrt{\frac{pq}{n}} \).

  • Although a sampling distribution is imagined (we never actually draw every possible sample), the model is crucial for drawing conclusions about populations from data samples.

Example 1: Proportion of Speeders

  • Scenario: 80% of cars exceed the speed limit.

  • Normal model for 50 cars:

    • Centered at 80%

    • Standard deviation approximately 5.7% (calculated from \( \sqrt{0.8 \times 0.2 / 50} \approx 0.057 \)).

    • Conditions validated:

      • 50 is less than 10% of a large population.

      • Success-failure condition: \( 50 \cdot 0.8 = 40 \) (greater than 10) and \( 50 \cdot 0.2 = 10 \) (exactly 10, which just meets the threshold of at least 10).

  • Expectation ranges based on standard deviations:

    • About 68% of samples have a sample proportion between 74.3% and 85.7%.

    • About 95% fall between 68.7% and 91.3%.

    • About 99.7% fall between 63.0% and 97.0%.
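The ranges for the speeders example follow directly from p = 0.8 and n = 50. A minimal sketch of the arithmetic (the variable names are illustrative only):

```python
import math

p, n = 0.80, 50                   # values from the speeders example
sd = math.sqrt(p * (1 - p) / n)   # sqrt(0.8 * 0.2 / 50)

# Intervals of 1, 2, and 3 standard deviations around p.
intervals = {k: (p - k * sd, p + k * sd) for k in (1, 2, 3)}

print(f"standard deviation: {sd:.1%}")
for k, (low, high) in intervals.items():
    print(f"within {k} SD: {low:.1%} to {high:.1%}")
```

Rounded to one decimal place, the printed ranges should agree with the ones listed above.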

Example 2: Groovy M&Ms

  • Scenario: 30% of candies in a bag are Groovy M&Ms.

  • Analysis for a bag of 250 M&Ms:

    • Normal model centered at 0.3 with standard deviation \( \sqrt{0.3 \times 0.7 / 250} \approx 0.02898 \).

  • Probability calculation for at least 25% Groovy M&Ms:

    • Validating model:

      • A bag of 250 is less than 10% of all M&Ms produced.

      • Success-failure condition satisfied: \( n \cdot p = 75 \) and \( n \cdot q = 175 \), both at least 10.

    • Probability methods:

      • Normal CDF from 0.25 to 0.3 gives 0.4578; adding 0.5 for the upper half of the symmetric curve gives 0.9578.

      • Alternatively, taking the normal CDF from 0.25 up to a very large upper bound gives the same result.
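Both probability methods can be reproduced with Python's standard-library `statistics.NormalDist` in place of a calculator's normal CDF function. This is a sketch of the calculation, with illustrative variable names, using the p = 0.3 and n = 250 from the example.

```python
import math
from statistics import NormalDist

p, n = 0.30, 250                    # values from the Groovy M&Ms example
sd = math.sqrt(p * (1 - p) / n)     # about 0.02898
model = NormalDist(mu=p, sigma=sd)

# Method 1: area from 0.25 up to the mean, plus the upper half of the curve.
prob_piecewise = (model.cdf(p) - model.cdf(0.25)) + 0.5

# Method 2: everything above 0.25 in one step.
prob_direct = 1 - model.cdf(0.25)

print(f"piecewise: {prob_piecewise:.4f}")
print(f"direct:    {prob_direct:.4f}")
```

The two values should agree, at roughly 0.958, since both measure the same area above 0.25.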

Conclusion

  • Importance of sampling distribution models: They allow conclusions about a population based on finite data, acting as a bridge from empirical data to statistical analysis.