Lesson ch. 18 (1)

Focus on modeling the distribution of sample proportions rather than using real repeated samples.
Conceptualize what happens when drawing many samples and examining their proportions.

The histogram of sample proportions drawn repeatedly:
- Unimodal: One clear peak, located near the true population proportion.
- Symmetric: Balanced on each side of the center.
- Centered: Around the true population proportion (p).
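The shape described above can be explored with a small simulation. This is only a sketch: the values p = 0.4, n = 100, and 2000 repeated samples are hypothetical choices made for illustration, and the helper name simulate_sample_proportions is not from the lesson.

```python
import random
import statistics
from collections import Counter

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

# Hypothetical values chosen only for illustration.
props = simulate_sample_proportions(p=0.4, n=100, num_samples=2000)
print("mean of sample proportions:", round(statistics.mean(props), 3))
print("std dev of sample proportions:", round(statistics.stdev(props), 3))

# Crude text histogram: one '#' per 10 samples at each observed proportion.
counts = Counter(round(x, 2) for x in props)
for value in sorted(counts):
    print(f"{value:.2f} {'#' * (counts[value] // 10)}")
```

The printed histogram should show a single peak near 0.4 with roughly balanced tails, though any one run will vary slightly.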

The amazing fact:
- The distribution of sample proportions can be modeled by a normal distribution.
Key parameters of the normal model:
- Mean (center): The true population proportion (p).
- Standard Deviation: Calculated using the formula:
  - [ \sigma = \sqrt{\frac{pq}{n}} ]
  - Where ( q = 1 - p ) is the complement of ( p ) and ( n ) is the sample size.
  - The distribution is centered at ( p ); the standard deviation determines its spread.
Normal curve representation:
- About 68% of sample proportions fall within 1 standard deviation of the mean (p).
- About 95% fall within 2 standard deviations.
- About 99.7% fall within 3 standard deviations.
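The standard deviation formula and the 68-95-99.7 ranges can be sketched as two short functions. The function names and the example values p = 0.5, n = 100 are assumptions made for illustration, not part of the lesson.

```python
import math

def proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)

def empirical_rule_intervals(p, n):
    """Return the ranges within 1, 2, and 3 standard deviations of p."""
    sd = proportion_sd(p, n)
    return {
        "68%": (p - 1 * sd, p + 1 * sd),
        "95%": (p - 2 * sd, p + 2 * sd),
        "99.7%": (p - 3 * sd, p + 3 * sd),
    }

# Hypothetical example: p = 0.5 and n = 100.
intervals = empirical_rule_intervals(0.5, 100)
for label, (low, high) in intervals.items():
    print(f"{label}: {low:.3f} to {high:.3f}")
```

Because n sits in the denominator, quadrupling the sample size halves the standard deviation and narrows every range.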

The normal model improves with larger sample sizes.
Two assumptions and conditions must be met:
1. Independence of Sampled Values: Sampled values must be independent of each other.
2. Large Enough Sample Size (n): Sufficiently large to apply the normal model.

Independence Condition:
- Use the 10% condition: The sample size ( n ) must be no larger than 10% of the population if sampling without replacement.
Large Enough Condition:
- Success-Failure Condition: Ensure ( n \cdot p ) and ( n \cdot q ) are both at least 10.
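The two checks can be written as a small helper. This sketch assumes the Success-Failure threshold means "at least 10" (the usual textbook form), and it treats the population size as optional, since it is often unknown but clearly large. The function name check_conditions is hypothetical.

```python
def check_conditions(n, p, population_size=None):
    """Check the 10% condition and the Success-Failure condition for a sample proportion."""
    # Round the expected counts to avoid floating-point noise
    # (for example, 50 * (1 - 0.8) evaluates to slightly less than 10).
    expected_successes = round(n * p, 9)
    expected_failures = round(n * (1 - p), 9)

    # Assumption: an unknown population is taken to be large enough.
    ten_percent_ok = population_size is None or n <= 0.10 * population_size
    # Assumption: the threshold is "at least 10".
    success_failure_ok = expected_successes >= 10 and expected_failures >= 10

    return {
        "ten_percent": ten_percent_ok,
        "success_failure": success_failure_ok,
        "normal_model_ok": ten_percent_ok and success_failure_ok,
    }

print(check_conditions(n=50, p=0.8))                       # population assumed large
print(check_conditions(n=50, p=0.8, population_size=300))  # sample is over 10% of 300
print(check_conditions(n=20, p=0.8))                       # only 4 expected failures
```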

The sampling distribution of the sample proportion ( \hat{p} ) is modeled as a normal distribution:
- Mean of ( \hat{p} ): True population proportion (p).
- Standard Deviation of ( \hat{p} ): ( \sqrt{\frac{pq}{n}} ).
Although sampling distribution models are imagined rather than observed, they are crucial for drawing conclusions about populations from data samples.

Scenario: 80% of cars exceed the speed limit.
Normal model for 50 cars:
- Centered at 80%
- Standard deviation approximately 5.7% (calculated from ( \sqrt{ 0.8 \times 0.2/50} )).
- Conditions validated:
  - 50 is less than 10% of a large population.
  - Success-Failure Condition: ( 50 \cdot 0.8 = 40 ) and ( 50 \cdot 0.2 = 10 ) are both at least 10; the second only just meets the threshold.
Expectation ranges based on standard deviations:
- 68% probability of sample proportion between 74.3% and 85.7%.
- 95% probability between 68.7% and 91.3%.
- 99.7% probability between 63% and 97%.
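The arithmetic for the speeding-cars scenario can be reproduced with a few lines. This is a sketch using the values stated above (p = 0.8, n = 50); the variable names are illustrative.

```python
import math

p = 0.8   # proportion of cars exceeding the speed limit
n = 50    # cars in each sample

sd = math.sqrt(p * (1 - p) / n)
print(f"standard deviation: {100 * sd:.1f}%")

# Ranges within 1, 2, and 3 standard deviations of the center, as percentages.
ranges = {}
for k, label in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low = round(100 * (p - k * sd), 1)
    high = round(100 * (p + k * sd), 1)
    ranges[label] = (low, high)
    print(f"{label} of samples: between {low}% and {high}%")
```

The ranges are computed from the unrounded standard deviation, which is why the 95% range comes out as 68.7% to 91.3% rather than 80% plus or minus 2 times 5.7%.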

Scenario: 30% of candies in a bag are Groovy M&Ms.
Analysis for a bag of 250 M&Ms:
- Normal model centered at 0.3 with standard deviation approximately 0.02898, from ( \sqrt{0.3 \times 0.7 / 250} ).
Probability calculation for at least 25% Groovy M&Ms:
- Validating model:
  - 250 is less than 10% of all M&Ms produced.
  - Success-Failure Condition satisfied: ( n \cdot p = 75 ) and ( n \cdot q = 175 ) are both at least 10.
- Probability methods:
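  - The normal CDF from 0.25 to 0.3 gives about 0.458; adding 0.5 for the upper half of the symmetric distribution gives about 0.958.
  - Alternatively, computing the normal CDF directly from 0.25 up to a very large upper bound gives the same result.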
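Both probability methods can be checked with Python's standard-library NormalDist class. This is a sketch under the normal model described above; it stands in for a calculator's normal CDF function and is not necessarily the tool used in the lesson.

```python
import math
from statistics import NormalDist

p = 0.3    # proportion of Groovy M&Ms
n = 250    # candies in the bag

sd = math.sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd)

# Method 1: area from 0.25 to 0.3, plus 0.5 for the upper half of the curve.
by_symmetry = (model.cdf(0.30) - model.cdf(0.25)) + 0.5

# Method 2: everything above 0.25, computed as one minus the lower tail.
direct = 1 - model.cdf(0.25)

print(f"standard deviation: {sd:.5f}")
print(f"P(at least 25% Groovy), symmetry method: {by_symmetry:.4f}")
print(f"P(at least 25% Groovy), direct method:   {direct:.4f}")
```

The two methods agree because the model is symmetric about 0.3, so exactly half of its area lies above the center.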

Importance of sampling distribution models: They allow conclusions about a population based on finite data, acting as a bridge from empirical data to statistical analysis.