Chapter 6.2: Applying the Normal Distribution

Overview of Normal Distribution

The normal distribution is a foundational concept in statistics and is used to solve a wide range of practical problems in the natural sciences, social sciences, and engineering.
It is characterized by its bell-shaped and symmetrical curve, where the mean, median, and mode are all equal and located at the center of the distribution.
The total area under the curve is equal to 1, representing 100% of the data.
Requirement: For these methods to be applicable, the variable under consideration must be normally distributed or approximately normally distributed. This is often checked through graphical methods (histograms, Q-Q plots) or statistical tests.
Empirical Rule (68-95-99.7 Rule): For normally distributed data, approximately 68% of the data falls within one standard deviation of the mean (\mu \pm \sigma), 95% within two standard deviations (\mu \pm 2\sigma), and 99.7% within three standard deviations (\mu \pm 3\sigma).
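As a quick check of the Empirical Rule, the sketch below computes the area within k standard deviations of the mean. It uses Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of a z-table; the helper name `area_within` is hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {area_within(k):.4f}")
```

The printed areas (about 0.6827, 0.9545, and 0.9973) are what the rule rounds to 68%, 95%, and 99.7%.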

Transformation to Standard Normal Distribution

To effectively analyze and compare data from any normal distribution, it must first be standardized and transformed into a standard normal distribution. This transformation allows us to use a single table (the z-table) to find probabilities and percentiles, regardless of the original mean and standard deviation of the data.
The transformation converts original data values (x) into z-values (also known as z-scores or standard scores) which represent the number of standard deviations an element is from the mean.
- Formula:
  
  z = \frac{x - \mu}{\sigma}
  where:
- z = z-value (standard score), indicating how many standard deviations 'x' is above or below the mean.
- x = value from the original distribution, the raw data point being transformed.
- \mu = mean of the original distribution, representing the central tendency.
- \sigma = standard deviation of the original distribution, measuring the spread of the data.
A positive z-score indicates the value is above the mean, while a negative z-score indicates it is below the mean. A z-score of 0 means the value is exactly at the mean.
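The z-score formula translates directly into a one-line function. This is a minimal sketch; the name `z_score` and the illustrative values (mean 50, standard deviation 10) are assumptions, not part of the examples in this chapter.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Illustrative values: mean 50, standard deviation 10.
print(z_score(65, 50, 10))  # above the mean -> positive z
print(z_score(50, 50, 10))  # at the mean -> z = 0
print(z_score(30, 50, 10))  # below the mean -> negative z
```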

Example: Standardized Test Scores

Given: Scores for a standardized test are normally distributed with:
- Mean (\mu) = 100
- Standard deviation (\sigma) = 15
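When each score is transformed to a z-value, the test-score distribution lines up with the standard normal distribution (for example, a score of 115 becomes z = (115 - 100)/15 = 1, a score of 100 becomes z = 0, and a score of 85 becomes z = -1):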
- The z-distribution (standard normal distribution) always has a mean of 0 and a standard deviation of 1. This standardization allows for universal comparison.
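To see the alignment numerically, the sketch below (again assuming Python's `statistics.NormalDist`) compares the area to the left of a few test scores in the original distribution with the area to the left of their z-scores in the standard normal distribution. The two areas agree, which is why a single z-table is enough.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)  # original test-score distribution
Z = NormalDist()                       # standard normal: mean 0, sd 1

for x in (85, 100, 115, 130):
    z = (x - scores.mean) / scores.stdev
    # Area left of x in the original distribution equals area left of z in Z.
    print(f"x = {x}, z = {z:+.1f}, "
          f"P(X < x) = {scores.cdf(x):.4f}, P(Z < z) = {Z.cdf(z):.4f}")
```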

Finding Areas Under the Curve

Finding the area under the normal curve is equivalent to finding the probability that a randomly selected observation falls within a specified range, or the percentage of data points within that range.
To find the area under any normal curve:
1. Draw the curve and shade the desired area. This visual step helps in understanding the problem and interpreting the z-table results correctly, especially for areas to the right or between two values.
2. Convert x-values to z-values using the transformation formula: z = \frac{x - \mu}{\sigma}. Every x-value must be converted to its corresponding z-score.
3. Find the corresponding area under the standard normal distribution curve. This is typically done using a z-table, which gives the cumulative area to the left of a specific z-score. For an area to the right, subtract the cumulative area from 1; for an area between two z-scores, subtract the smaller cumulative area from the larger one.
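The three steps can be sketched as small helper functions, with `statistics.NormalDist().cdf` standing in for the z-table lookup. The function names (`to_z`, `area_left`, `area_right`, `area_between`) are hypothetical and chosen for illustration; step 1 (drawing the curve) remains a pencil-and-paper step.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal; Z.cdf(z) plays the role of the z-table

def to_z(x, mu, sigma):
    """Step 2: convert a raw value x to a z-score."""
    return (x - mu) / sigma

def area_left(x, mu, sigma):
    """Step 3: cumulative area to the left of x, i.e. P(X < x)."""
    return Z.cdf(to_z(x, mu, sigma))

def area_right(x, mu, sigma):
    """Area to the right of x: subtract the cumulative area from 1."""
    return 1 - area_left(x, mu, sigma)

def area_between(a, b, mu, sigma):
    """Area between a and b: larger cumulative area minus the smaller one."""
    lo, hi = min(a, b), max(a, b)
    return area_left(hi, mu, sigma) - area_left(lo, mu, sigma)
```

For instance, `area_right(32, 28, 2)` corresponds to Part B of the newspaper-waste example, and `area_between(27, 31, 28, 2)` to Part A.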

Example: Liters of Blood

Given Data: For a specific population, the amount of blood in adults is normally distributed with:
- Mean (\mu) = 5.2 liters
- Standard deviation (\sigma) = 0.3 liters
- Problem: Find the percentage of people with less than 4.4 liters of blood.
Process:
- Draw the curve and shade the area to the left of 4.4 liters.
- Conversion to z-value:
  
  z = \frac{4.4 - 5.2}{0.3} = \frac{-0.8}{0.3} \approx -2.67
- Use the z-table to find the area corresponding to z = -2.67:
  - The area to the left of z = -2.67 is approximately 0.0038.
- Result: This implies that only about 0.38% of people have less than 4.4 liters of blood, which is a very small percentage.
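The same calculation can be reproduced in a few lines, assuming Python's `statistics.NormalDist` as a stand-in for the z-table. Because the code keeps the unrounded z-score (about -2.6667), its area may differ from the table value in the fifth decimal place, but it agrees with 0.0038 to four decimal places.

```python
from statistics import NormalDist

mu, sigma = 5.2, 0.3   # liters
x = 4.4

z = (x - mu) / sigma               # standardize
p = NormalDist().cdf(z)            # area to the left of z
print(f"z = {z:.2f}, P(X < 4.4) = {p:.4f} ({p:.2%})")
```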

Example: Newspaper Waste

Given Data: The amount of newspaper waste generated by households per month is normally distributed with:
- Mean (\mu) = 28 pounds
- Standard deviation (\sigma) = 2 pounds

Part A: Between 27 and 31 Pounds

Goal: Find the probability of a household generating between 27 and 31 pounds of newspaper waste.
Process:
- Convert both 27 and 31 pounds to their respective z-values:
  - For 27 pounds:

    z_1 = \frac{27 - 28}{2} = \frac{-1}{2} = -0.5
  - For 31 pounds:

    z_2 = \frac{31 - 28}{2} = \frac{3}{2} = 1.5
- Find the areas to the left of each z-value using the z-table:
  - Area left of z = 1.5: 0.9332 (from z-table)
  - Area left of z = -0.5: 0.3085 (from z-table)
- To find the area between these two values, subtract the smaller cumulative area from the larger one:
  
  Area = 0.9332 - 0.3085 = 0.6247
- Result: The probability of a household generating between 27 and 31 pounds of newspaper waste is 0.6247, or 62.47%.
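A short sketch of Part A, assuming Python's `statistics.NormalDist` in place of the z-table: both values are standardized, and the smaller cumulative area is subtracted from the larger.

```python
from statistics import NormalDist

mu, sigma = 28, 2          # pounds per month
Z = NormalDist()           # standard normal

z1 = (27 - mu) / sigma     # -0.5
z2 = (31 - mu) / sigma     # 1.5
p_between = Z.cdf(z2) - Z.cdf(z1)
print(f"P(27 < X < 31) = {p_between:.4f}")
```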

Part B: More Than 32 Pounds

Goal: Find the probability of a household generating more than 32 pounds of newspaper waste.
Process:
- Convert 32 pounds to its z-value:

  z = \frac{32 - 28}{2} = \frac{4}{2} = 2
- Find the area left of z = 2 from the z-table (approximately 0.9772).
- Since the problem asks for the area more than 32 pounds (i.e., to the right of z = 2), subtract the cumulative area from 1:
  
  P(X > 32) = 1 - P(X < 32) = 1 - \text{Area left of z=2} = 1 - 0.9772 = 0.0228
- Result: There is approximately a 0.0228 or 2.28% probability that a household generates more than 32 pounds of newspaper waste per month.
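Part B follows the "area to the right" rule: one minus the cumulative area. A minimal sketch, again assuming `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma = 28, 2                      # pounds per month
z = (32 - mu) / sigma                  # 2.0
p_more = 1 - NormalDist().cdf(z)       # area to the right of z
print(f"P(X > 32) = {p_more:.4f}")
```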

Example: Desktop PC Electricity Usage

Given Data: The electricity usage of a certain brand of desktop PCs is normally distributed with:
- Mean (\mu) = 120 watts
- Standard deviation (\sigma) = 6 watts
- Problem: How many out of 500 randomly selected PCs will use less than 106 watts?
Process:
1. Convert 106 watts to its z-value:
  
  z = \frac{106 - 120}{6} = \frac{-14}{6} \approx -2.33
2. Look up the area to the left of z = -2.33 in the z-table (approximately 0.0099). This represents the proportion of PCs using less than 106 watts.
3. Multiply this proportion by the total number of PCs (500) to find the expected number:
  
  \text{Number of PCs} = 0.0099 \times 500 = 4.95
- Result: Rounding to the nearest whole number (as you can't have a fraction of a PC), approximately 5 PCs out of 500 will use less than 106 watts of electricity.
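The sketch below repeats the calculation with `statistics.NormalDist` (an assumption; the notes use a z-table). Since the exact z-score (-2.333...) is not rounded to -2.33, the proportion comes out slightly smaller, about 0.0098, and the expected count about 4.9 rather than 4.95; both round to 5 PCs.

```python
from statistics import NormalDist

mu, sigma, n = 120, 6, 500      # watts, watts, number of PCs
z = (106 - mu) / sigma
proportion = NormalDist().cdf(z)  # proportion using less than 106 watts
expected = proportion * n         # expected number of PCs out of n
print(f"z = {z:.2f}, proportion = {proportion:.4f}, "
      f"expected = {expected:.2f} -> about {round(expected)} PCs")
```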

Finding Data Values Given a Probability

Instead of finding areas (probabilities) from x-values, we can reverse the process to find specific x-values (data points) corresponding to given probabilities or percentiles. This is often used for setting thresholds or cut-off points.
The formula to find 'x' based on a known z-score, mean, and standard deviation is derived by rearranging the z-score formula:

x = z \times \sigma + \mu
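Rearranging the formula gives an inverse of the z-score conversion. In this minimal sketch the function names `x_from_z` and `z_score` are hypothetical; the check uses the newspaper-waste parameters (mean 28, standard deviation 2), where z = 1.5 and z = -0.5 should map back to 31 and 27 pounds.

```python
def z_score(x, mu, sigma):
    """Standardize: how many standard deviations x is from the mean."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Invert z = (x - mu) / sigma to recover the raw value x."""
    return z * sigma + mu

# Round trip with the newspaper-waste parameters.
for x in (27, 31):
    z = z_score(x, 28, 2)
    print(x, "->", z, "->", x_from_z(z, 28, 2))
```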

Example: Police Academy Qualification

Given Data: Scores on a police academy entrance exam are normally distributed with:
- Mean (\mu) = 200
- Standard deviation (\sigma) = 20
- Requirement: Only the top 10% of applicants qualify.
Process:
1. To find the top 10%, we need to identify the z-score that has 10% of the area to its right. This is equivalent to finding the z-score that has 90% of the area to its left (cumulative area = 0.90).
2. Consult the z-table to find the z-value whose cumulative area is closest to 0.90. An area of 0.8997 corresponds to z = 1.28. (The next table entry, 0.9015, corresponds to z = 1.29; since 0.8997 is closer to 0.90, we use z = 1.28. Interpolating between the two entries is also acceptable.)
3. Apply the formula to find the corresponding x-value (the minimum score needed):
  
  x = 1.28 \times 20 + 200 = 25.6 + 200 = 225.6
- Result: Since scores are typically whole numbers, a score of 226 or higher is needed to qualify for the police academy, ensuring placement within the top 10%.
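With software, the inverse cumulative function replaces the reverse table lookup. This sketch assumes Python's `statistics.NormalDist.inv_cdf` and assumes scores are whole numbers, so the cutoff is rounded up with `math.ceil`. The exact z-value is about 1.2816 rather than the table's 1.28, which moves the cutoff to about 225.63; the minimum qualifying whole-number score is still 226.

```python
import math
from statistics import NormalDist

mu, sigma = 200, 20
z_cut = NormalDist().inv_cdf(0.90)   # z with 90% of the area to its left
cutoff = z_cut * sigma + mu          # x = z * sigma + mu
print(f"z = {z_cut:.4f}, cutoff = {cutoff:.2f}, "
      f"minimum whole score = {math.ceil(cutoff)}")
```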

Example: Blood Pressure Study

Given: Blood pressure readings for a population are normally distributed with:
- Mean (\mu) = 120 mm Hg
- Standard deviation (\sigma) = 8 mm Hg
- Requirement: Find the range of blood pressure readings that constitutes the middle 60% of the population.
Process:
1. If the middle 60% is desired, then 100% - 60% = 40% of the population falls into the two tails (20% in the lower tail and 20% in the upper tail).
2. For the upper threshold: We need the z-value for a cumulative area (from the far left) of 0.20 + 0.60 = 0.80. From the z-table, the z-score corresponding to a cumulative area of 0.80 is approximately z = 0.84.
  - Use the formula for x_1 (upper limit):
    x_1 = 0.84 \times 8 + 120 = 6.72 + 120 = 126.72
3. For the lower threshold: We need the z-value for a cumulative area of 0.20 (the lower 20%). From the z-table, the z-score corresponding to a cumulative area of 0.20 is approximately z = -0.84. (Due to symmetry, if z = 0.84 corresponds to area 0.80, then -z = -0.84 corresponds to area 1 - 0.80 = 0.20).
  - Use the formula for x_2 (lower limit):
    x_2 = -0.84 \times 8 + 120 = -6.72 + 120 = 113.28
- Result: The middle 60% of blood pressure readings for this population falls between approximately 113.28 mm Hg and 126.72 mm Hg.
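A sketch of the middle-60% calculation, assuming `statistics.NormalDist.inv_cdf`: the remaining 40% is split evenly between the two tails, and each limit is the inverse cumulative value at 0.20 and 0.80. Because the exact z-values (about -0.8416 and 0.8416) are not rounded to two decimals, the limits come out near 113.27 and 126.73 mm Hg, within about 0.01 of the table-based answer.

```python
from statistics import NormalDist

bp = NormalDist(mu=120, sigma=8)   # blood pressure, mm Hg
middle = 0.60
tail = (1 - middle) / 2            # area in each tail (0.20)

lower = bp.inv_cdf(tail)           # cumulative area 0.20
upper = bp.inv_cdf(1 - tail)       # cumulative area 0.80
print(f"middle 60%: {lower:.2f} to {upper:.2f} mm Hg")
```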

Classwork

Further practice exercises will be provided to reinforce these concepts and skills.
Questions and clarifications can be directed to the instructional support staff or discussed in class.