Normal Distribution

Statistical Reasoning: Normal Distribution Study Notes

Density Curve

  • Density Curve Overview:

    • Density curves are used in exploratory data analysis (EDA).

    • They provide a smoothed approximation of histograms, which are discrete graphs representing ranges of continuous values.

    • Density curves make calculations easier than discrete bins do: a proportion is found by integrating the curve over an interval instead of summing bin counts.

  • Purpose of Density Curves:

    • Density curves serve functions similar to those of histograms:

    • Show overall patterns (shape, center, variability).

    • Identify striking deviations such as outliers.

    • Because a density curve is a smoothed histogram, the area under the curve over an interval represents the probability (proportion of observations) for that interval.
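
The link between area and proportion can be checked numerically. The following is a minimal sketch, assuming the standard normal curve as the example density and a simple trapezoidal rule for the integration; the function names `standard_normal_density` and `area_under_curve` are illustrative and not part of the notes.

```python
import math

def standard_normal_density(x):
    """Height of the standard normal density curve at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def area_under_curve(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Assumption: [-10, 10] stands in for the whole axis; the tails beyond it are negligible.
total_area = area_under_curve(standard_normal_density, -10.0, 10.0)
left_half = area_under_curve(standard_normal_density, -10.0, 0.0)

print(f"total area: {total_area:.4f}")
print(f"area left of the center: {left_half:.4f}")
```

The two printed values should be close to 1 and 0.5: the whole curve accounts for all observations, and half of them lie to the left of the center of a symmetric curve.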

Center and Variability of a Density Curve

  • Areas under a Density Curve:

    • The area under a density curve represents proportions of the total observations.

    • Median:

    • The median is the equal-areas point: half of the observations lie on either side, so half of the area under the curve lies to its left and half to its right.

    • For symmetric density curves, the median is located at the center.

    • Mean:

    • The mean is the arithmetic average of all observations; on a density curve it is the balance point of the curve.

    • For normal distributions, the mean and the median are equal due to symmetry.
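
As a quick check of the equal-areas idea, the sketch below uses `statistics.NormalDist` from the Python standard library. The mean of 10 and standard deviation of 2 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution, chosen only for illustration
dist = NormalDist(mu=10.0, sigma=2.0)

median = dist.inv_cdf(0.5)       # value with half of the area to its left
area_left = dist.cdf(median)     # area under the curve to the left of the median

print(f"mean = {dist.mean}, median = {median:.4f}")
print(f"area left of the median = {area_left:.2f}")
```

For this symmetric curve the median should coincide with the mean, with an area of 0.5 to its left.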

Normal Distribution

  • Characteristics of Normal Distribution:

    • A normal distribution is symmetric, single-peaked, and bell-shaped.

    • A specific normal distribution is fully characterized by its mean and standard deviation.

    • Changing the mean affects the location along the axis but does not alter the shape.

    • Changing the standard deviation alters the shape of the curve:

      • A larger standard deviation results in a wider and flatter curve, indicating greater variability in data.
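
The effect of each parameter can be seen by comparing peak heights. This is a small sketch using `statistics.NormalDist`; the three parameter choices are assumptions made for the comparison, not values from the notes.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)      # same center, larger standard deviation
shifted = NormalDist(mu=5.0, sigma=1.0)   # same standard deviation, different mean

# The peak of a normal curve sits at its mean.
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)
peak_shifted = shifted.pdf(shifted.mean)

print(f"peak height, SD 1: {peak_narrow:.4f}")
print(f"peak height, SD 2: {peak_wide:.4f}")
print(f"peak height, SD 1 with mean 5: {peak_shifted:.4f}")
```

The curve with the larger standard deviation has the lower peak (it is flatter and wider), while shifting the mean moves the peak without changing its height.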

The 68–95–99.7% Rule

  • In any normal distribution, the observations follow this approximate pattern:

    • Approximately 68% of the observations fall within one standard deviation of the mean, \( \text{Mean} \pm 1 \times \text{SD} \).

    • Approximately 95% of the observations fall within two standard deviations of the mean, \( \text{Mean} \pm 2 \times \text{SD} \).

    • Approximately 99.7% of the observations fall within three standard deviations of the mean, \( \text{Mean} \pm 3 \times \text{SD} \).

    • The total area under any probability distribution curve sums to 100%.
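
The three percentages in the rule are rounded values. One way to check them, sketched here with the error function `math.erf` from the Python standard library, relies on the identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / √2). The helper name `share_within` is illustrative.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {100 * share:.2f}%")
```

The results should be roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68%, 95%, and 99.7%.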

Case Study: Heights of Young Women

  • The height distribution for women aged 18 to 24 is approximately normal with:

    • Mean = 63.7 inches

    • Standard deviation = 2.5 inches

  • Application of the 68–95–99.7% Rule:

    • 68% of data: \( 63.7 \pm 2.5 = [61.2, 66.2] \)

    • 95% of data: \( 63.7 \pm (2 \times 2.5) = 63.7 \pm 5 = [58.7, 68.7] \)

    • 99.7% of data: \( 63.7 \pm (3 \times 2.5) = 63.7 \pm 7.5 = [56.2, 71.2] \)

  • Conclusions from Data Analysis:

    • 50% of all young women are taller than 63.7 inches, which is the mean.

    • About 34% of young women are between 63.7 and 66.2 inches tall. This interval, [63.7, 63.7 + 2.5], runs from the mean to one standard deviation above it, so by symmetry it holds half of the central 68%.
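
The case-study arithmetic can be reproduced in a few lines. This is a minimal sketch using the mean and standard deviation given above; `rule_interval` is an illustrative helper, and the share between the mean and one standard deviation above it is computed from the exact normal curve with `statistics.NormalDist` rather than from the rounded rule.

```python
from statistics import NormalDist

MEAN, SD = 63.7, 2.5  # inches, from the case study

def rule_interval(mean, sd, k):
    """Interval mean ± k·sd, rounded to one decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

intervals = {k: rule_interval(MEAN, SD, k) for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"within {k} SD: [{low}, {high}]")

# Share of heights between the mean and one standard deviation above it
heights = NormalDist(mu=MEAN, sigma=SD)
share_mean_to_plus_1sd = heights.cdf(MEAN + SD) - heights.cdf(MEAN)
print(f"share between {MEAN} and {MEAN + SD}: {share_mean_to_plus_1sd:.3f}")
```

The computed share is about 0.341, consistent with the 34% obtained by halving 68%.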

Standardized Normal Distribution: Z-Score

  • Z-Score Definition:

    • A z-score standardizes values of a variable for comparison, converting the normal distribution to a standard normal distribution with mean = 0 and standard deviation = 1.

    • The formula for calculating a z-score is:

    • \( z = \frac{\text{Observation} - \text{Mean}}{\text{Standard Deviation}} \)

    • A positive z-score indicates an observation above the mean, while a negative z-score indicates an observation below the mean.
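
The formula translates directly into code. The sketch below is one possible implementation; the function name `z_score` and the two example heights (taken from the intervals in the heights case study) are illustrative choices.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / sd

# Heights of young women: mean 63.7 inches, standard deviation 2.5 inches
z_tall = z_score(68.7, 63.7, 2.5)    # above the mean, so positive
z_short = z_score(61.2, 63.7, 2.5)   # below the mean, so negative

print(f"z for 68.7 inches: {z_tall:.2f}")
print(f"z for 61.2 inches: {z_short:.2f}")
```

A height of 68.7 inches sits two standard deviations above the mean, and 61.2 inches sits one standard deviation below it.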

Case Study: ACT versus SAT Scores

  • Performance Comparison:

    • Madison scored 600 on the SAT Mathematics exam.

    • Gabriel scored 21 on the ACT test.

    • SAT scores are normally distributed with:

    • Mean = 500

    • Standard deviation = 100

    • ACT scores are normally distributed with:

    • Mean = 18

    • Standard deviation = 6

  • Z-Score Calculation:

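    • Madison's z-score: \( z_{\text{Madison}} = \frac{600 - 500}{100} = 1 \)

    • Gabriel's z-score: \( z_{\text{Gabriel}} = \frac{21 - 18}{6} = 0.5 \)

    • Madison's score is 1 standard deviation above the SAT mean, while Gabriel's is only 0.5 standard deviations above the ACT mean, so Madison performed better relative to other test takers.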
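
The comparison can be sketched in code as follows. The means and standard deviations are those stated in the case study; the helper `z_score` and the variable names are illustrative.

```python
def z_score(observation, mean, sd):
    """Standardize an observation: (observation - mean) / sd."""
    return (observation - mean) / sd

z_madison = z_score(600, mean=500, sd=100)  # SAT Mathematics
z_gabriel = z_score(21, mean=18, sd=6)      # ACT

# The larger z-score marks the stronger result relative to each test's own scale.
better = "Madison" if z_madison > z_gabriel else "Gabriel"

print(f"Madison: z = {z_madison}, Gabriel: z = {z_gabriel}")
print(f"Higher relative standing: {better}")
```

Because z-scores are unit-free, scores from tests with different scales can be compared directly.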

Percentiles of Normal Distributions

  • Percentiles Defined:

    • The median is the 50th percentile.

    • The first and third quartiles are the 25th and 75th percentiles respectively.

  • Finding Percentiles:

    • Percentiles for a specific z-score can be found using statistical tables or software (e.g., R, Python, Excel).

    • The cth percentile is the value with c percent of the observations below it and the remaining (100 − c) percent above it.
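
In Python, one option is `statistics.NormalDist` from the standard library: its `cdf` method gives the proportion of observations below a value, and `inv_cdf` goes the other way. The sketch below is illustrative; `percentile_of` is a hypothetical helper name, and the example reuses the height distribution from the earlier case study.

```python
from statistics import NormalDist

def percentile_of(value, mean, sd):
    """Percent of observations below value in a normal distribution."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(value)

# A height one standard deviation above the mean (63.7 + 2.5 inches)
height_percentile = percentile_of(66.2, mean=63.7, sd=2.5)
print(f"66.2 inches is at about the {height_percentile:.0f}th percentile")

# z-scores of the quartiles and the median on the standard normal curve
standard_normal = NormalDist()
q1, med, q3 = (standard_normal.inv_cdf(p) for p in (0.25, 0.50, 0.75))
print(f"Q1: z = {q1:.3f}, median: z = {med:.3f}, Q3: z = {q3:.3f}")
```

The height example lands near the 84th percentile, which matches adding the 50% below the mean to the 34% between the mean and one standard deviation above it.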

Z-Score Table Examples

  • Example 1: A z-score of 1.5 is at approximately the 93rd percentile (area to the left ≈ 0.9332).

  • Example 2: The z-score at the 42nd percentile is approximately -0.2.
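
Both table lookups can be reproduced in software. This is a minimal sketch using the standard normal distribution from Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Example 1: z-score -> percentile (area to the left of z = 1.5)
percentile_at_z = 100 * standard_normal.cdf(1.5)

# Example 2: percentile -> z-score (value with 42% of the area to its left)
z_at_percentile = standard_normal.inv_cdf(0.42)

print(f"z = 1.5 is at the {percentile_at_z:.1f}th percentile")
print(f"the 42nd percentile is at z = {z_at_percentile:.2f}")
```

The results, about 93.3 and -0.20, agree with the rounded table values above.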