Chapter 6.2: Applying the Normal Distribution

Overview of Normal Distribution
  • The normal distribution is a foundational concept in statistics, used to model measurements and solve practical problems in fields such as science, the social sciences, and engineering.

  • It is characterized by a symmetric, bell-shaped curve, where the mean, median, and mode are all equal and located at the center of the distribution.

  • The total area under the curve is equal to 1, representing 100% of the data.

  • Requirement: For these methods to be applicable, the variable under consideration must be normally distributed or approximately normally distributed. This is often checked through graphical methods (histograms, Q-Q plots) or statistical tests.

  • Empirical Rule (68-95-99.7 Rule): For normally distributed data, approximately 68% of the data falls within one standard deviation of the mean (\mu \pm \sigma), 95% within two standard deviations (\mu \pm 2\sigma), and 99.7% within three standard deviations (\mu \pm 3\sigma).
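
The Empirical Rule percentages can be checked numerically. The following is a minimal sketch using `NormalDist` from Python's standard library; the helper name `share_within` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Proportion of a normal population within k standard deviations of the mean."""
    # Area to the left of +k minus area to the left of -k.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The three proportions come out near 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.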

Transformation to Standard Normal Distribution
  • To analyze and compare data from any normal distribution, the data are first standardized, that is, transformed so that they follow the standard normal distribution. This transformation allows us to use a single table (the z-table) to find probabilities and percentiles, regardless of the original mean and standard deviation of the data.

  • The transformation converts original data values (x) into z-values (also known as z-scores or standard scores) which represent the number of standard deviations an element is from the mean.

    • Formula:

      z = \frac{x - \mu}{\sigma}
      where:

    • z = z-value (standard score), indicating how many standard deviations 'x' is above or below the mean.

    • x = value from the original distribution, the raw data point being transformed.

    • \mu = mean of the original distribution, representing the central tendency.

    • \sigma = standard deviation of the original distribution, measuring the spread of the data.

  • A positive z-score indicates the value is above the mean, while a negative z-score indicates it is below the mean. A z-score of 0 means the value is exactly at the mean.

Example: Standardized Test Scores

  • Given: Scores for a standardized test are normally distributed with:

    • Mean (\mu) = 100

    • Standard deviation (\sigma) = 15

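  • When the scores are transformed to z-values, their distribution lines up with the standard normal distribution. For instance, a score of 115 gives z = \frac{115 - 100}{15} = 1, which is one standard deviation above the mean.

    • The z-distribution (standard normal distribution) always has a mean of 0 and a standard deviation of 1. This standardization allows for universal comparison.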
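
As an illustration of the transformation, the sketch below applies z = (x - \mu) / \sigma to a handful of test scores. The function name `z_score` and the list of raw scores are assumptions made for this example; only the mean of 100 and standard deviation of 15 come from the notes.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


MU = 100     # mean of the standardized test scores
SIGMA = 15   # standard deviation of the standardized test scores

# Hypothetical raw scores, chosen to sit at whole standard deviations.
scores = [70, 85, 100, 115, 130]
z_values = [z_score(x, MU, SIGMA) for x in scores]

for x, z in zip(scores, z_values):
    print(f"score {x:>3} -> z = {z:+.1f}")
```

A score equal to the mean maps to z = 0, and each additional 15 points moves the z-value by 1.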

Finding Areas Under the Curve
  • Finding the area under the normal curve is equivalent to finding the probability that a randomly selected observation falls within a specified range, or the percentage of data points within that range.

  • To find the area under any normal curve:

    1. Draw the curve and shade the desired area. This visual step helps in understanding the problem and interpreting the z-table results correctly, especially for areas to the right or between two values.

    2. Convert x-values to z-values using the transformation formula: z = \frac{x - \mu}{\sigma}. Every x-value must be converted to its corresponding z-score.

    3. Find the corresponding area under the standard normal distribution curve. This is typically done using a z-table which provides the cumulative area to the left of a specific z-score. For areas to the right, subtract from 1; for areas between two z-scores, subtract the smaller cumulative area from the larger one.
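
The three kinds of area (left, right, between) can be sketched as small helpers. This is one possible arrangement, not a prescribed method: the function names are hypothetical, and `NormalDist.cdf` from the standard library stands in for the z-table lookup. Because `NormalDist` is given the mean and standard deviation directly, the conversion to z happens inside the call.

```python
from statistics import NormalDist


def area_left(x: float, mu: float, sigma: float) -> float:
    """Cumulative area to the left of x, i.e. P(X < x)."""
    return NormalDist(mu, sigma).cdf(x)


def area_right(x: float, mu: float, sigma: float) -> float:
    """Area to the right of x: subtract the cumulative area from 1."""
    return 1.0 - area_left(x, mu, sigma)


def area_between(a: float, b: float, mu: float, sigma: float) -> float:
    """Area between a and b: larger cumulative area minus the smaller one."""
    low, high = sorted((a, b))
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)


# Quick check on the standard normal curve (mean 0, standard deviation 1).
print(area_left(0, 0, 1))         # half of the area lies left of the mean
print(area_between(-1, 1, 0, 1))  # about 0.68, matching the Empirical Rule
```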

Example: Liters of Blood

  • Given Data: For a specific population, the amount of blood in adults is normally distributed with:

    • Mean (\mu) = 5.2 liters

    • Standard deviation (\sigma) = 0.3 liters

    • Problem: Find the percentage of people with less than 4.4 liters of blood.

  • Process:

    • Draw the curve, shade the area to the left of 4.4 liters.

    • Conversion to z-value:

      z = \frac{4.4 - 5.2}{0.3} = \frac{-0.8}{0.3} \approx -2.67

    • Use z-table to find the area corresponding to z = -2.67:

    • The area to the left of z = -2.67 is approximately 0.0038.

    • Result: This implies that only about 0.38% of people have less than 4.4 liters of blood, which is a very small percentage.
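
The blood-volume result can be reproduced in code. The sketch below computes the area two ways: once with z rounded to two decimals, as a z-table requires, and once from the unrounded value. Variable names are illustrative only.

```python
from statistics import NormalDist

MU = 5.2      # mean blood volume in liters
SIGMA = 0.3   # standard deviation in liters
x = 4.4       # threshold in liters

# Table-style: round z to two decimals, then look up the left-tail area.
z = round((x - MU) / SIGMA, 2)
p_table_style = NormalDist().cdf(z)   # NormalDist() is the standard normal

# Direct: let NormalDist handle the standardization without rounding z.
p_exact = NormalDist(MU, SIGMA).cdf(x)

print(f"z = {z}")
print(f"area with rounded z: {p_table_style:.4f}")
print(f"area without rounding: {p_exact:.4f}")
```

Both values round to 0.0038; small differences in later decimals come only from rounding z before the lookup.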

Example: Newspaper Waste
  • Given Data: The amount of newspaper waste generated by households per month is normally distributed with:

    • Mean (\mu) = 28 pounds

    • Standard deviation (\sigma) = 2 pounds

Part A: Between 27 and 31 Pounds

  • Find the probability of a household generating between 27 and 31 pounds of newspaper waste.

  • Process:

    • Convert both 27 and 31 pounds to their respective z-values:

    • For 27 pounds:

      z_1 = \frac{27 - 28}{2} = \frac{-1}{2} = -0.5

    • For 31 pounds:

      z_2 = \frac{31 - 28}{2} = \frac{3}{2} = 1.5

    • Find the areas to the left of each z-value using the z-table:

    • Area left of z = 1.5: 0.9332 (from z-table)

    • Area left of z = -0.5: 0.3085 (from z-table)

    • To find the area between these two values, subtract the smaller cumulative area from the larger one:

      Area = 0.9332 - 0.3085 = 0.6247

    • Result: The probability of a household generating between 27 and 31 pounds of newspaper waste is 0.6247, or 62.47%.
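
Part A can be checked with a short sketch. It assumes the stated mean of 28 pounds and standard deviation of 2 pounds, and uses `NormalDist.cdf` in place of the z-table.

```python
from statistics import NormalDist

# Monthly newspaper waste per household, in pounds.
waste = NormalDist(mu=28, sigma=2)

area_left_of_31 = waste.cdf(31)   # same as the area left of z = 1.5
area_left_of_27 = waste.cdf(27)   # same as the area left of z = -0.5

# Area between the two values: larger cumulative area minus the smaller.
p_between = area_left_of_31 - area_left_of_27

print(f"P(27 < X < 31) = {p_between:.4f}")
```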

Part B: More Than 32 Pounds

  • Goal: Find the probability of a household generating more than 32 pounds of newspaper waste.

  • Process:

    • Convert 32 pounds to its z-value:

      z = \frac{32 - 28}{2} = \frac{4}{2} = 2

    • Find the area left of z = 2 from the z-table (approximately 0.9772).

    • Since the problem asks for the probability of more than 32 pounds (i.e., the area to the right of z = 2), subtract the cumulative area from 1:

      P(X > 32) = 1 - P(X < 32) = 1 - \text{Area left of z=2} = 1 - 0.9772 = 0.0228

    • Result: There is approximately a 0.0228 or 2.28% probability that a household generates more than 32 pounds of newspaper waste per month.
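
A right-tail area like the one in Part B can be computed as a complement. The sketch below also shows, as a cross-check, that symmetry gives the same answer: the area to the right of z = 2 equals the area to the left of z = -2. Variable names are illustrative.

```python
from statistics import NormalDist

waste = NormalDist(mu=28, sigma=2)   # pounds of newspaper waste per month

# Complement rule: P(X > 32) = 1 - P(X < 32).
p_more_than_32 = 1.0 - waste.cdf(32)

# Symmetry cross-check on the standard normal curve.
z = (32 - 28) / 2
p_by_symmetry = NormalDist().cdf(-z)

print(f"P(X > 32) = {p_more_than_32:.4f}")
print(f"area left of z = {-z} is {p_by_symmetry:.4f}")
```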

Example: Desktop PC Electricity Usage
  • Given Data: The electricity usage of a certain brand of desktop PCs is normally distributed with:

    • Mean (\mu) = 120 watts

    • Standard deviation (\sigma) = 6 watts

    • Problem: How many out of 500 randomly selected PCs will use less than 106 watts?

  • Process:

    1. Convert 106 watts to its z-value:

      z = \frac{106 - 120}{6} = \frac{-14}{6} \approx -2.33

    2. Look up the area to the left of z = -2.33 in the z-table (approximately 0.0099). This represents the proportion of PCs using less than 106 watts.

    3. Multiply this proportion by the total number of PCs (500) to find the expected number:

      \text{Number of PCs} = 0.0099 \times 500 = 4.95

    • Result: Rounding to the nearest whole number (as you can't have a fraction of a PC), about 5 of the 500 PCs are expected to use less than 106 watts of electricity.
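
The expected count is simply the proportion times the sample size. A minimal sketch, with illustrative variable names, follows; it does not round z first, so the proportion differs slightly from the table value in the later decimals.

```python
from statistics import NormalDist

N_PCS = 500                          # number of PCs sampled
usage = NormalDist(mu=120, sigma=6)  # electricity usage in watts

# Proportion of PCs expected to use less than 106 watts.
p_below_106 = usage.cdf(106)

# Expected number of such PCs in the sample, then rounded to a whole PC.
expected_count = p_below_106 * N_PCS
rounded_count = round(expected_count)

print(f"proportion below 106 W: {p_below_106:.4f}")
print(f"expected count: {expected_count:.2f} -> about {rounded_count} PCs")
```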

Finding Data Values Given a Probability
  • Instead of finding areas (probabilities) from x-values, we can reverse the process to find specific x-values (data points) corresponding to given probabilities or percentiles. This is often used for setting thresholds or cut-off points.

  • The formula to find 'x' based on a known z-score, mean, and standard deviation is derived by rearranging the z-score formula:

    x = z \times \sigma + \mu
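
Because this formula is just the z-score formula solved for x, converting a value to z and back should return the original value. The sketch below checks that round trip; the function names and the numbers (mean 50, standard deviation 4) are hypothetical and chosen only for the demonstration.

```python
import math


def z_from_x(x: float, mu: float, sigma: float) -> float:
    """Standardize: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Reverse the standardization: x = z * sigma + mu."""
    return z * sigma + mu


# Hypothetical distribution used only to demonstrate the round trip.
mu, sigma = 50.0, 4.0

x = x_from_z(1.5, mu, sigma)        # 1.5 standard deviations above the mean
z_back = z_from_x(x, mu, sigma)     # should recover 1.5

print(x, z_back)
assert math.isclose(z_back, 1.5)
```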

Example: Police Academy Qualification

  • Given Data: Scores on a police academy entrance exam are normally distributed with:

    • Mean (\mu) = 200

    • Standard deviation (\sigma) = 20

    • Requirement: Only the top 10% of applicants qualify.

  • Process:

    1. To find the top 10%, we need to identify the z-score that has 10% of the area to its right. This is equivalent to finding the z-score that has 90% of the area to its left (cumulative area = 0.90).

    2. Consult the z-table to find the z-value closest to a cumulative area of 0.90. An area of 0.8997 corresponds to z = 1.28. (Note: the next table entry, 0.9015, corresponds to z = 1.29. Since 0.8997 is closer to 0.90, we use z = 1.28; interpolating between the two entries is also an option.)

    3. Apply the formula to find the corresponding x-value (the minimum score needed):

      x = 1.28 \times 20 + 200 = 25.6 + 200 = 225.6

    • Result: Since scores are typically whole numbers, a score of 226 or higher is needed to qualify for the police academy, ensuring placement within the top 10%.
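
The cutoff can also be found with an inverse lookup. The sketch below uses `NormalDist.inv_cdf`, which returns the value with a given cumulative area to its left, in place of reading the z-table backwards. Rounding up with `math.ceil` reflects the assumption that scores are whole numbers.

```python
import math
from statistics import NormalDist

exam = NormalDist(mu=200, sigma=20)   # police academy exam scores

# Top 10% means 90% of the area lies to the left of the cutoff.
cutoff = exam.inv_cdf(0.90)

# Round up: a whole-number score below the cutoff would not be in the top 10%.
min_whole_score = math.ceil(cutoff)

print(f"cutoff score: {cutoff:.2f}")
print(f"minimum whole-number score: {min_whole_score}")
```

The unrounded cutoff is slightly above the 225.6 obtained from the table, because the table value z = 1.28 is itself rounded; the minimum whole-number score is 226 either way.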

Example: Blood Pressure Study

  • Given: Blood pressure readings for a population are normally distributed with:

    • Mean (\mu) = 120 mm Hg

    • Standard deviation (\sigma) = 8 mm Hg

    • Requirement: Find the range of blood pressure readings that constitutes the middle 60% of the population.

  • Process:

    1. If the middle 60% is desired, then 100% - 60% = 40% of the population falls into the two tails (20% in the lower tail and 20% in the upper tail).

    2. For the upper threshold: We need the z-value for a cumulative area (from the far left) of 0.20 + 0.60 = 0.80. From the z-table, the z-score corresponding to a cumulative area of 0.80 is approximately z = 0.84.

      • Use the formula for x_1 (upper limit):
        x_1 = 0.84 \times 8 + 120 = 6.72 + 120 = 126.72

    3. For the lower threshold: We need the z-value for a cumulative area of 0.20 (the lower 20%). From the z-table, the z-score corresponding to a cumulative area of 0.20 is approximately z = -0.84. (Due to symmetry, if z = 0.84 corresponds to area 0.80, then -z = -0.84 corresponds to area 1 - 0.80 = 0.20).

      • Use the formula for x_2 (lower limit):
        x_2 = (-0.84) \times 8 + 120 = -6.72 + 120 = 113.28

    • Result: The middle 60% of blood pressure readings for this population falls between approximately 113.28 mm Hg and 126.72 mm Hg.
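
A middle-percentage range can be sketched as a small helper that splits the leftover area evenly between the two tails. The name `central_interval` is hypothetical, and `NormalDist.inv_cdf` replaces the backwards z-table lookup.

```python
from statistics import NormalDist


def central_interval(dist: NormalDist, coverage: float) -> tuple[float, float]:
    """Lower and upper limits enclosing the middle `coverage` share of dist."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    tail = (1.0 - coverage) / 2.0          # area left in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


blood_pressure = NormalDist(mu=120, sigma=8)   # readings in mm Hg
lower, upper = central_interval(blood_pressure, 0.60)

print(f"middle 60%: {lower:.2f} to {upper:.2f} mm Hg")
```

The limits land within about 0.01 mm Hg of the table-based 113.28 and 126.72; the gap comes from the table rounding z to 0.84.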

Classwork
  • Further practice exercises will be provided to reinforce these concepts and skills.

  • Questions and clarifications can be directed to the instructional support staff or discussed in class.