Lesson 2.4: The Empirical Rule and Assessing Normality

Focus: Using the empirical rule and assessing whether a distribution is approximately normal.
Goals:
- Use the empirical rule to estimate the proportion of values in a given interval of a normal distribution.
- Determine the value that corresponds to a specific percentile in a normal distribution.
- Use graphical and numerical evidence to assess whether a distribution is approximately normal.

Normal distributions are important in statistics for three reasons:
- Real Data Representation: They describe some distributions of real data well. Examples include:
  - Test scores, such as SAT and IQ tests.
  - Repeated measurements of the same quantity (e.g., diameter of a tennis ball).
  - Biological characteristics (e.g., sizes of crickets, corn yields).
- Chance Outcome Approximation: Normal distributions are good approximations for many chance outcomes, such as the proportion of heads in a large number of coin tosses.
- Foundational in Statistical Methods: Several statistical inference methods, particularly those in Chapters 8–11 of the referenced textbook, rely on normal distributions.

Definition: The empirical rule states that in any normal distribution defined by the mean (μ) and standard deviation (σ):
- Approximately 68% of values lie within 1σ of the mean μ.
- Around 95% of values fall within 2σ of the mean μ.
- About 99.7% of values are within 3σ of the mean μ.
Common Terminology: This rule is also called the 68–95–99.7 rule. It allows quick estimates of the proportion of values in intervals whose endpoints are a whole number of standard deviations from the mean.
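The three percentages in the rule are rounded values of exact areas under a normal curve. As a minimal sketch (the function name `proportion_within` is an illustrative choice, not part of the lesson), the exact proportion within k standard deviations of the mean can be computed from the error function in Python's standard library:

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    # The result does not depend on mu or sigma.
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {proportion_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

The exact values round to 68%, 95%, and 99.7%, which is why the rule only gives approximate answers.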

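Example with ITBS Vocabulary Scores:
- Distribution modeled as normal with mean μ = 6.84 and standard deviation σ = 1.55.
- Key Questions:
  - What percent of students score below 3.74?
  - What proportion of students score between 5.29 and 9.94?
- Results, using the empirical rule:
  - 3.74 = μ − 2σ. About 95% of scores lie within 2σ of the mean, so the remaining 5% is split evenly between the two tails, and about 2.5% of students score below 3.74.
  - 5.29 = μ − 1σ and 9.94 = μ + 2σ. About 68% of scores lie within 1σ of the mean, and about 13.5% (half of 95% − 68%) lie between μ + 1σ and μ + 2σ, so about 81.5% of students score between 5.29 and 9.94.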
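The arithmetic behind these two answers can be sketched in code. The band areas below (2.35%, 13.5%, 34%) follow from splitting the 68%, 95%, and 99.7% figures symmetrically about the mean. The helper name `rule_percent_between` is an assumption made for this illustration, and the exact areas from `statistics.NormalDist` are shown only to indicate how close the rule's estimates are.

```python
from statistics import NormalDist

MU, SIGMA = 6.84, 1.55          # values from the ITBS example
itbs = NormalDist(MU, SIGMA)

# Empirical-rule percent in each band between consecutive whole standard
# deviations, left to right from (mu - 3 sigma) to (mu + 3 sigma).
BANDS = [2.35, 13.5, 34.0, 34.0, 13.5, 2.35]
TAIL = 0.15                     # percent beyond 3 sigma on each side

def rule_percent_between(z_low: int, z_high: int) -> float:
    """Empirical-rule percent between mu + z_low*sigma and mu + z_high*sigma.

    Both arguments must be whole numbers from -3 to 3.
    """
    return sum(BANDS[z_low + 3 : z_high + 3])

# Below 3.74 = mu - 2 sigma: the far tail plus the band from -3 to -2.
below_rule = TAIL + rule_percent_between(-3, -2)
# Between 5.29 = mu - 1 sigma and 9.94 = mu + 2 sigma.
between_rule = rule_percent_between(-1, 2)

# Exact normal areas for comparison.
below_exact = 100 * itbs.cdf(3.74)
between_exact = 100 * (itbs.cdf(9.94) - itbs.cdf(5.29))

print(f"below 3.74:   rule {below_rule:.1f}%, exact {below_exact:.1f}%")
print(f"5.29 to 9.94: rule {between_rule:.1f}%, exact {between_exact:.1f}%")
```

The rule's estimates (2.5% and 81.5%) differ from the exact areas by well under one percentage point, which is the level of accuracy the empirical rule is meant to give.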
Standard Normal Distribution: Setting a statistics applet to mean 0 and standard deviation 1 displays the standard normal distribution, which shows the same percentages:
- Mean = 0, Standard Deviation = 1.
- Percentages of data within each number of standard deviations of the mean:
  - Inside 1σ: ~68%
  - Inside 2σ: ~95%
  - Inside 3σ: ~99.7%

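Steps for Assessing Normality:
- Begin with a graph:
  - Check whether it is clearly skewed.
  - Check for multiple peaks or other departures from a bell shape.
- Support the graph with numerical summaries: compare the percent of values within 1σ, 2σ, and 3σ of the mean with 68%, 95%, and 99.7%.
- Example with Calories in Breakfast Cereal:
  - Graph shows a symmetric, single-peaked shape.
  - Summary statistics give the percent of values within each distance of the mean:
    - Within 1σ: 81.8%
    - Within 2σ: 92.2%
    - Within 3σ: 100.0%
  - The 81.8% within 1σ is well above the 68% the empirical rule predicts, so the numerical evidence does not support approximate normality even though the graph looks symmetric and single-peaked.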
Sampling and Generalization: For Double Stuf Oreo cookies, the empirical rule check supports approximate normality.
- The data have mean 6.742 and standard deviation 0.184.
- Percentages from the histogram are close to the empirical rule values (71.1% within 1 SD versus 68%, and 95.6% within 2 SD versus 95%).
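The numerical check used in both examples can be sketched as follows. The raw cereal and Oreo measurements are not given here, so the code does two things: it compares the percentages reported above with the empirical rule values, and it runs the same check on simulated data. The function names, the simulated mean of 50 and standard deviation of 10, and the random seed are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

RULE = {1: 68.0, 2: 95.0, 3: 99.7}   # empirical rule percentages

def percent_within(data, k):
    """Percent of data values within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = sum(1 for x in data if m - k * s <= x <= m + k * s)
    return 100 * inside / len(data)

def gaps_from_rule(percents):
    """Observed minus expected percent for k = 1, 2, ... in that order."""
    return [round(p - RULE[k], 1) for k, p in enumerate(percents, start=1)]

# Percentages reported in the two examples above.
print("cereal gaps:", gaps_from_rule([81.8, 92.2, 100.0]))   # [13.8, -2.8, 0.3]
print("Oreo gaps:  ", gaps_from_rule([71.1, 95.6]))          # [3.1, 0.6]

# Simulated (not real) data drawn from a normal distribution.
rng = random.Random(0)
simulated = [rng.gauss(50, 10) for _ in range(1000)]
for k, expected in RULE.items():
    print(f"within {k} sd: {percent_within(simulated, k):.1f}% (rule: {expected}%)")
```

The cereal data miss the 68% target by almost 14 percentage points, while the Oreo data stay within about 3 points of both targets. How large a gap is acceptable is a judgment call, so the sketch reports the gaps rather than applying a fixed cutoff.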

Assumption Risks: Assuming normality without checking can be misleading. Graph the data and calculate the percent of values within 1, 2, and 3 standard deviations of the mean before treating a distribution as approximately normal.
- Don’t rely solely on symmetry and a bell shape; the percentages within 1σ, 2σ, and 3σ must also be close to 68%, 95%, and 99.7%.
Right-Skewed Distributions: Many quantitative variables are right-skewed rather than normal, such as:
- Single-family house prices, survival times after cancer treatment, number of siblings among students in a statistics class, etc.
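To see how a right-skewed distribution fails the numerical check, consider one simple right-skewed model that is not part of the lesson and is used here purely as an assumption for illustration: an exponential distribution with mean 1 and standard deviation 1. Its areas can be computed exactly:

```python
import math

def exponential_within(k: float) -> float:
    """Proportion of an exponential distribution (mean 1, sd 1) within k sd of its mean."""
    # The distribution has no values below 0, so the lower endpoint is cut off at 0.
    low = max(0.0, 1 - k)
    high = 1 + k
    # P(low < X < high) = exp(-low) - exp(-high) for this distribution.
    return math.exp(-low) - math.exp(-high)

for k, rule in ((1, 68.0), (2, 95.0), (3, 99.7)):
    print(f"within {k} sd: {100 * exponential_within(k):.1f}% (rule: {rule}%)")
# within 1 sd: 86.5% (rule: 68.0%)
# within 2 sd: 95.0% (rule: 95.0%)
# within 3 sd: 98.2% (rule: 99.7%)
```

The percent within 2 standard deviations happens to match the rule, but the percent within 1 standard deviation is far too high. This is why all three percentages should be checked, alongside a graph.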

The empirical rule applies only to normal distributions, so normality should be checked before the rule is used: examine a graph, then compare the percent of values within 1, 2, and 3 standard deviations of the mean with 68%, 95%, and 99.7%.
The lesson applies both skills, using the empirical rule and assessing normality, to real-world examples, and these skills support the inference methods that rely on normal distributions.