7.2 - The Normal Distribution and Z-Scores
Terms
Normal Distribution: A probability distribution around a central value, forming a bell-like shape.
- The "tails" extend infinitely in both directions, but in practice, values far from the center have near-zero probability.
- Sample data clusters around the central peak, creating a bell curve.
- Probabilities for continuous distributions are calculated by finding the area under a graph for a range of values, while discrete distributions use counting techniques.
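To make the "area under the graph" idea concrete, here is a minimal Python sketch. It assumes a standard normal curve (mean 0, standard deviation 1) as the continuous distribution and uses the standard library's `statistics.NormalDist` for the curve; `area_trapezoid` is a hypothetical helper implementing the plain trapezoidal rule. The numerically integrated area between -1 and 1 should match the exact probability from the cumulative distribution function, and an interval of zero width has zero area.

```python
from statistics import NormalDist

def area_trapezoid(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Illustrative standard normal curve (assumed: mean 0, standard deviation 1).
dist = NormalDist(mu=0, sigma=1)

# P(-1 <= X <= 1) is the area under the curve on [-1, 1].
numeric = area_trapezoid(dist.pdf, -1, 1)
exact = dist.cdf(1) - dist.cdf(-1)
print(round(numeric, 4), round(exact, 4))  # both about 0.6827

# A single value is an interval of width 0, so its area (probability) is 0.
print(area_trapezoid(dist.pdf, 0.5, 0.5))  # 0.0
```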
Normal Distribution (definition 2): A common continuous probability distribution in which data are distributed symmetrically about the mean and the curve has a single peak (unimodal).
- Example: Student grade averages in a class.
Positively Skewed:
- Not symmetric, unimodal.
- Tail pulled to the right.
- More values are below the mean.
- Example: Number of children in Canadian families.
Negatively Skewed:
- Not symmetric, unimodal.
- Tail pulled to the left.
- More values are above the mean.
- Example: Speed of a car before it loses control.
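The link between skew and the mean can be checked on a small data set. This Python sketch uses invented values (not real survey data) and a hypothetical `summarize` helper; mirroring the right-skewed list produces a left-skewed one.

```python
from statistics import mean, median

def summarize(data):
    """Return the mean, the median, and how many values fall below/above the mean."""
    m = mean(data)
    below = sum(1 for x in data if x < m)
    above = sum(1 for x in data if x > m)
    return {"mean": m, "median": median(data), "below": below, "above": above}

# Hypothetical data, made up for illustration (e.g., children per family).
right_skewed = [0, 1, 1, 2, 2, 2, 3, 4, 6, 9]   # long tail to the right
left_skewed = [10 - x for x in right_skewed]     # mirror image: tail to the left

print("positive skew:", summarize(right_skewed))
print("negative skew:", summarize(left_skewed))
```

For the right-skewed list the mean (3) exceeds the median (2) and six of the ten values sit below the mean; the mirrored list reverses both relationships.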
Modes:
- A distribution with two "humps" is called bimodal.
- This often occurs when a population consists of two groups with different attributes.
- The mean falls between the two modes.
- Example: Adult shoe sizes (men and women).
Example 1: Use a Frequency Distribution to Estimate Probabilities
- FruityFizz Soft Drinks bottles its products in containers marked 500 mL. A sample of 200 bottles was tested.
| Volume (mL) | Frequency, f | Relative Frequency, rf |
|---|---|---|
| 490-492 | 0 | 0/200 = 0 |
| 492-494 | 0 | 0/200 = 0 |
| 494-496 | 2 | 2/200 = 0.010 |
| 496-498 | 11 | 11/200 = 0.055 |
| 498-500 | 43 | 43/200 = 0.215 |
| 500-502 | 81 | 81/200 = 0.405 |
| 502-504 | 48 | 48/200 = 0.240 |
| 504-506 | 14 | 14/200 = 0.070 |
| 506-508 | 1 | 1/200 = 0.005 |
| 508-510 | 0 | 0/200 = 0 |
| Total | 200 | 1.000 |
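The relative frequency column can be reproduced with a short Python sketch that divides each frequency by the total number of bottles. The data are copied from the table; the variable names are illustrative only.

```python
# Frequency table from the FruityFizz example: (class interval in mL, frequency).
freq_table = [
    ("490-492", 0), ("492-494", 0), ("494-496", 2), ("496-498", 11),
    ("498-500", 43), ("500-502", 81), ("502-504", 48), ("504-506", 14),
    ("506-508", 1), ("508-510", 0),
]

total = sum(f for _, f in freq_table)  # 200 bottles
rel_freq = {interval: f / total for interval, f in freq_table}

for interval, rf in rel_freq.items():
    print(f"{interval}: {rf:.3f}")

# Relative frequencies must sum to 1 (allowing for floating-point rounding).
assert abs(sum(rel_freq.values()) - 1) < 1e-12
```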
- a. Add a relative frequency column to the table.
- Relative frequency is calculated by dividing each frequency by the total number of bottles tested.
- rf = \frac{frequency}{Total}
- b. Use the table to determine the probability that a given bottle will contain less than 500 mL of soft drink.
- Add the relative frequencies for volumes less than 500 mL:
- P(less \ than \ 500 \ mL) = 0 + 0 + 0.01 + 0.055 + 0.215 = 0.280
- c. Use the table to determine the probability that a given bottle will contain between 498 mL and 502 mL of soft drink.
- P(498 \ mL ≤ x ≤ 502 \ mL) = 0.215 + 0.405 = 0.620
- d. Is it possible to determine the probability that a given bottle will contain exactly 500 mL of soft drink using the table? Explain.
- No. The table only gives probabilities for intervals of volume. For a continuous variable, a single exact value is an interval of width 0, so its probability (area) is 0.
- e. Two graphs accompany this example:
- i. The first represents the frequency data.
- ii. The second represents the relative frequency data.
- How does the shape of the graph of the Frequency Data compare to the shape of the graph of the Relative Frequency Data?
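- Both graphs have the same shape (it resembles a "bell curve"); only the scale on the vertical axis differs, since each relative frequency is the frequency divided by 200.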
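- f. Can you use the area under the Relative Frequency graph to answer parts b and c? Explain.
- No. The bar heights (relative frequencies) sum to 1, but each bar is 2 mL wide, so the total area under the relative frequency graph is 2 × 1 = 2, not 1.
- A probability density graph is needed instead: dividing each relative frequency by the class width (2 mL) makes the total area under the graph equal 1, so areas can be read directly as probabilities.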
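Parts b, c, and f can be checked together in one Python sketch. It assumes every class is 2 mL wide (as in the table) and keys each class by its lower bound; the names `WIDTH`, `rf`, and `density` are illustrative. The sketch sums relative frequencies for b and c, then compares the area of the relative frequency histogram with the area of the density histogram.

```python
freq = {  # lower bound of each 2 mL class -> frequency (from the table)
    490: 0, 492: 0, 494: 2, 496: 11, 498: 43,
    500: 81, 502: 48, 504: 14, 506: 1, 508: 0,
}
WIDTH = 2  # every class interval is 2 mL wide
total = sum(freq.values())
rf = {lo: f / total for lo, f in freq.items()}

# b. P(volume < 500 mL): add relative frequencies of classes ending at or below 500.
p_less_500 = sum(r for lo, r in rf.items() if lo + WIDTH <= 500)
# c. P(498 <= volume <= 502): the classes 498-500 and 500-502.
p_498_502 = sum(r for lo, r in rf.items() if lo >= 498 and lo + WIDTH <= 502)

# f. Area under the relative frequency histogram = sum of (height * width).
area_rf = sum(r * WIDTH for r in rf.values())
# Dividing each height by the width gives a density histogram whose area is 1.
density = {lo: r / WIDTH for lo, r in rf.items()}
area_density = sum(d * WIDTH for d in density.values())

print(round(p_less_500, 3), round(p_498_502, 3), round(area_rf, 3), round(area_density, 3))
```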
Note: If a variable is expected to follow a normal distribution, a representative sample can be taken. The mean and standard deviation of the sample can approximate the mean and standard deviation of the underlying normal distribution. The approximation becomes more accurate with more data.
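To illustrate the note, the following Python sketch draws samples of increasing size from a hypothetical normal population (mean 500 mL, standard deviation 2 mL; these values are assumed for illustration and are not taken from the FruityFizz data) using the standard library's `statistics.NormalDist.samples`. With a fixed seed the run is repeatable. The sample mean and standard deviation typically move closer to 500 and 2 as n grows, though any single small sample can land noticeably off.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical population, chosen only for illustration.
population = NormalDist(mu=500, sigma=2)

results = {}
for n in (10, 100, 10_000):
    sample = population.samples(n, seed=42)  # seeded so reruns give the same sample
    results[n] = (mean(sample), stdev(sample))
    print(f"n = {n:>6}: sample mean = {results[n][0]:.3f}, sample sd = {results[n][1]:.3f}")
```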
Normal Distribution
The bell curve is a common example of a continuous distribution. It appears so often in the physical, social, and psychological sciences that it became known as the "normal distribution."
In normal distributions, data is symmetrically distributed about the mean.
A population following a normal distribution is described by its mean, \mu, and standard deviation, \sigma.
The mean, median, and mode are all equal.
A smaller standard deviation results in a narrower graph.
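A quick Python check of symmetry and spread, using `statistics.NormalDist` with two assumed curves that share a mean of 0 but have standard deviations 1 and 2. Because the total area under each curve is 1, the narrower curve must also be taller at the mean.

```python
from math import isclose
from statistics import NormalDist

# Two illustrative curves with the same mean but different standard deviations.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# Symmetry about the mean: equal heights at equal distances on either side,
# and the median coincides with the mean.
assert isclose(narrow.pdf(-1.5), narrow.pdf(1.5))
assert narrow.median == narrow.mean

# Smaller standard deviation -> taller peak at the mean ...
print("peak heights:", round(narrow.pdf(0), 4), round(wide.pdf(0), 4))
# ... and more of the area packed within 1 unit of the center.
print("area within +/-1:",
      round(narrow.cdf(1) - narrow.cdf(-1), 4),
      round(wide.cdf(1) - wide.cdf(-1), 4))
```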
Only about 3 out of every 1000 data points fall more than 3 standard deviations from the mean (the center line).
The image shows the approximate percentage of data within each number of standard deviations of the mean:
- About 68% within 1 standard deviation.
- About 95% within 2 standard deviations.
- About 99.7% within 3 standard deviations.
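The 68-95-99.7 percentages can be recovered from the standard normal cumulative distribution function. This Python sketch uses `statistics.NormalDist`; `share_within` is a hypothetical helper name. The exact shares are about 68.27%, 95.45%, and 99.73%, so roughly 2.7 in 1000 values lie beyond 3 standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    inside = share_within(k)
    print(f"within {k} sd: {inside:.4f}   outside: {1 - inside:.4f}")
```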
Example 2: Spot Landing Contest (Determining Mean, Standard Deviation, and Z-Scores for Continuous Data)
Forty aircraft (n=40) participated in a spot landing contest. The landing zone was 30 m long, with the target line at the 15 m mark. The touchdown positions are expected to follow a normal distribution.
a. Determine the mean and standard deviation of the spot landing data.
- \Sigma x = 585.2
- \Sigma x^2 = 8975.68
- Mean: \bar{x} = \frac{\Sigma x}{n} = \frac{585.2}{40} = 14.63 m
Standard deviation:
- s = \sqrt{\frac{\Sigma x^2 - n \cdot (\bar{x})^2}{n-1}} = \sqrt{\frac{8975.68 - 40 \cdot (14.63)^2}{40-1}} = 3.259 m
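The summary-sum formula in part a translates directly into code. This Python sketch uses only the values given above (n, Σx, Σx²) and applies the sample standard deviation formula with n - 1 in the denominator.

```python
from math import sqrt

n = 40
sum_x = 585.2      # Σx from the spot landing data
sum_x2 = 8975.68   # Σx² from the spot landing data

x_bar = sum_x / n
# Sample standard deviation from summary sums (n - 1 in the denominator).
s = sqrt((sum_x2 - n * x_bar**2) / (n - 1))
print(round(x_bar, 2), round(s, 3))
```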
b. What is the z-score for a pilot who lands her plane at a position of 18.3 m?
- z = \frac{x - \bar{x}}{s} = \frac{18.3 - 14.63}{3.259} \approx 1.126 \approx 1.13
c. What is the probability that a pilot lands at a position of 18.3 m or less? What is the probability that a pilot lands at a position of more than 18.3 m?
- P(X \leq 18.3) = P(Z \leq 1.13) = 0.8708 (from table)
- P(X > 18.3) = 1 - P(X \leq 18.3) = 1 - 0.8708 = 0.1292
d. What is the probability that a pilot lands at a position between 12.2 m and 18.3 m?
- P(12.2 \leq X \leq 18.3) = P(X \leq 18.3) - P(X \leq 12.2)
- Find the z-score for 12.2:
- z = \frac{12.2 - 14.63}{3.259} \approx -0.75
- P(X \leq 12.2) = P(Z \leq -0.75) = 0.2266 (from table)
- P(12.2 \leq X \leq 18.3) = 0.8708 - 0.2266 = 0.6442
- The probability that a pilot lands at a position between 12.2 m and 18.3 m is about 0.644.
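Parts b to d can be cross-checked in Python with `statistics.NormalDist`, assuming the landings follow a normal distribution whose mean and standard deviation equal the sample estimates 14.63 m and 3.259 m. Because the code uses unrounded z-scores rather than a two-decimal table lookup, its probabilities may differ from the table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

x_bar, s = 14.63, 3.259  # sample estimates from part a
landings = NormalDist(mu=x_bar, sigma=s)

def z_score(x):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - x_bar) / s

z_high = z_score(18.3)
z_low = z_score(12.2)

p_le_18_3 = landings.cdf(18.3)                          # c. P(X <= 18.3)
p_gt_18_3 = 1 - p_le_18_3                               # c. P(X > 18.3)
p_between = landings.cdf(18.3) - landings.cdf(12.2)     # d. P(12.2 <= X <= 18.3)

print(round(z_high, 3), round(z_low, 3))
print(round(p_le_18_3, 4), round(p_gt_18_3, 4), round(p_between, 4))
```

Standardizing first gives the same answer: `NormalDist().cdf(z_high)` equals `landings.cdf(18.3)`, which is exactly what looking up a z-score in the standard normal table does.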
Standard Normal Distribution Table
- Table values represent the area to the LEFT of the z-score.
Table provided: z-scores with their cumulative probabilities P(Z \leq z).
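Entries of a standard normal table are values of the cumulative distribution function Φ(z). As a sketch, Python's `statistics.NormalDist().cdf` can regenerate any row; the `table_row` helper and its layout (rows by the first decimal of z, columns by the second decimal) are assumptions about how such tables are usually arranged.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the LEFT of z under the standard normal curve

def table_row(row):
    """Return the ten entries of one table row, e.g. row 1.1 covers z = 1.10 ... 1.19.
    For a negative row such as -0.7, the columns run -0.70 ... -0.79."""
    sign = 1 if row >= 0 else -1
    return [round(phi(row + sign * j / 100), 4) for j in range(10)]

for row in (-0.7, 1.1):
    print(f"{row:+.1f}", table_row(row))
```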