Continuous Distributions – Normal & Uniform Study Notes

Introduction: Comparing Athletes (Example from Slide 2)

Thiam: gold in Olympics, long jump = $6.58\,\text{m}$, which is $0.41\,\text{m}$ above the women's heptathlon average of $6.17\,\text{m}$.
Johnson-Thompson: 200 m time = $23.26\,\text{s}$, which is $1.32\,\text{s}$ faster than (below) the average of $24.58\,\text{s}$.
Question: Whose performance is more impressive? (Motivation: compare performances using standard deviations.)

How Many Standard Deviations Above? (Slide 3)

Given:

Long jump: mean $\mu_{\text{long}} = 6.17\,\text{m}$, standard deviation $\sigma_{\text{long}} = 0.247\,\text{m}$, value $X = 6.58\,\text{m}$
200 m run: mean $\mu_{\text{run}} = 24.58\,\text{s}$, standard deviation $\sigma_{\text{run}} = 0.654\,\text{s}$, value $X = 23.26\,\text{s}$
1 SD above the mean for long jump: $\mu_{\text{long}} + \sigma_{\text{long}} = 6.17 + 0.247 = 6.417\,\text{m}$
2 SD above the mean for long jump: $\mu_{\text{long}} + 2\sigma_{\text{long}} = 6.17 + 2\times 0.247 = 6.664\,\text{m}$
- Interpretation: 6.58 m lies between 1 SD (6.417 m) and 2 SD (6.664 m) above the mean for the long jump, closer to 2 SD.
1 SD below the mean for 200 m run: $\mu_{\text{run}} - \sigma_{\text{run}} = 24.58 - 0.654 = 23.926\,\text{s}$
2 SD below the mean for 200 m run: $\mu_{\text{run}} - 2\sigma_{\text{run}} = 24.58 - 2\times 0.654 = 23.272\,\text{s}$
- Interpretation: 23.26 s is just over 2 SD below the mean.
To find how far a value is from the mean in standard deviations (z-score):
- Formula: $z = \frac{X - \mu}{\sigma}$
Calculations (z-scores):
- Long jump: $z_{\text{long}} = \frac{6.58 - 6.17}{0.247} \approx 1.66$
- 200 m run: $z_{\text{run}} = \frac{23.26 - 24.58}{0.654} \approx -2.02$
Conclusion: Magnitude comparison shows the 200 m run value is about $|z| \approx 2.02$ standard deviations from its mean (below), while the long jump value is about $|z| \approx 1.66$ standard deviations from its mean (above). In SD terms, Johnson-Thompson’s performance is farther from the mean in magnitude, suggesting a larger deviation from her mean relative to her distribution.
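The z-score arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `z_score` is an illustrative choice, and the inputs are the means and standard deviations given above.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


# Values from the notes: long jump (metres) and 200 m run (seconds).
z_long = z_score(6.58, 6.17, 0.247)
z_run = z_score(23.26, 24.58, 0.654)

print(f"long jump z = {z_long:.2f}")   # long jump z = 1.66
print(f"200 m run z = {z_run:.2f}")    # 200 m run z = -2.02

# For a timed event a lower value is better, so compare magnitudes.
print(abs(z_run) > abs(z_long))        # True
```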

The z-score (Slide 4)

The z-score measures the distance of a value from the mean in standard deviations.
Interpretations:
- Positive z-score: value is above the mean.
- Negative z-score: value is below the mean.
- Small z-score: value is close to the mean relative to the data set.
- Large z-score: value is far from the mean relative to the data set.
Key formula: $z = \frac{X - \mu}{\sigma}$

The Normal Distribution N and The Standard Normal Distribution Z (Slide 5)

The normal distribution is denoted as $X \sim N(\mu, \sigma^2)$ (bell-shaped, symmetric).
The standard normal distribution is $Z \sim N(0, 1)$ .
Properties:
- Mean = median = mode = $\mu$ for X; location determined by $\mu$ .
- Spread determined by $\sigma$ for X; standard deviation is $\sigma$ .
- The normal distribution is unimodal and symmetric around the mean.

Finding Normal Probabilities (Slide 6)

Example: If $X \sim N(\mu=18, \sigma=5)$ , find P(X < 19.5).
Method:
- Standardize: $z = \frac{19.5 - 18}{5} = 0.3$ .
- So, P(X < 19.5) = P(Z < 0.3).
Software tip: In MINITAB, Graph > Probability Distribution Plot can be used to obtain this probability.

Using Z Score N(0, 1) to get the same probability (Slide 7)

Convert the problem to the standard normal: P(X < 19.5) = P(Z < 0.3) where $Z \sim N(0,1)$ .
Steps in software:
- Graph > Probability Distribution Plot; then View Probability to read off the value.
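As an alternative to the MINITAB plot, the same probability can be sketched in Python with the standard library's `statistics.NormalDist`. The variable names are illustrative; the point is that the direct calculation on $X$ and the standardized calculation on $Z$ agree.

```python
from statistics import NormalDist

X = NormalDist(mu=18, sigma=5)   # the distribution from the example
Z = NormalDist()                 # standard normal: mu=0, sigma=1

z = (19.5 - 18) / 5              # standardize the cutoff
p_direct = X.cdf(19.5)           # P(X < 19.5) on the original scale
p_via_z = Z.cdf(z)               # P(Z < 0.3) on the standard scale

print(round(p_direct, 4))        # 0.6179
print(round(p_via_z, 4))         # 0.6179
```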

Finding Normal Upper Tail Probabilities (Slide 8)

To find $P(X > 19.5)$ when $X \sim N(\mu=18, \sigma=5)$:
- Use the upper tail: $P(X > 19.5) = 1 - P(X \le 19.5) = 1 - P(Z \le 0.3)$ with $Z \sim N(0,1)$.
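The complement step can be sketched the same way, assuming $\mu = 18$ and $\sigma = 5$ as in the previous example (the variable names are illustrative).

```python
from statistics import NormalDist

X = NormalDist(mu=18, sigma=5)

p_lower = X.cdf(19.5)        # P(X <= 19.5)
p_upper = 1 - p_lower        # P(X > 19.5), the upper tail

print(round(p_upper, 4))     # 0.3821
```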

Finding Normal Value given Probability (Slide 9)

Given $X \sim N(\mu=18, \sigma=5)$ , solve for $x$ such that P(X > x) = 0.7.
Equivalent to: $P(X \le x) = 0.3$ .
In standard form: find $z_{0.3}$ where $P(Z \le z_{0.3}) = 0.3$, then set
- $x = \mu + \sigma z_{0.3} = 18 + 5 z_{0.3}$.
Note: z-values for common probabilities can be looked up in z-tables or computed numerically.
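A numerical version of this lookup, sketched with `NormalDist.inv_cdf` from Python's standard library (the names `z_03` and `x` are illustrative):

```python
from statistics import NormalDist

mu, sigma = 18, 5
Z = NormalDist()                     # standard normal

z_03 = Z.inv_cdf(0.3)                # z with P(Z <= z) = 0.3
x = mu + sigma * z_03                # convert back to the X scale

print(round(z_03, 4))                # -0.5244
print(round(x, 2))                   # 15.38

# Sanity check: the upper tail beyond x should be 0.7.
print(round(1 - NormalDist(mu, sigma).cdf(x), 4))   # 0.7
```

The negative z-value makes sense here: 70% of the distribution lies above $x$, so $x$ must sit below the mean.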

The 68-95-99.7 (Empirical) Rule (Slide 10)

In a unimodal, symmetric (Normal) distribution, approximately:
- 68% of values lie within one standard deviation of the mean: $P(\mu - \sigma < X < \mu + \sigma) \approx 0.68$
- 95% lie within two standard deviations: $P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$
- 99.7% lie within three standard deviations: $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997$
Example: If $X \sim N(\mu=60, \sigma=4)$ , then:
- $P(56 < X < 64) = P(60-\sigma < X < 60+\sigma) \approx 0.68$
- $P(52 < X < 68) = P(60-2\sigma < X < 60+2\sigma) \approx 0.95$
- $P(48 < X < 72) = P(60-3\sigma < X < 60+3\sigma) \approx 0.997$
In Z terms: $Z \sim N(0,1)$ , so
- $P(-1 < Z < 1) \approx 0.68$
- $P(-2 < Z < 2) \approx 0.95$
- $P(-3 < Z < 3) \approx 0.997$
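The rounded figures in the rule can be compared with the exact standard normal probabilities. A minimal sketch (the dictionary name `coverage` is illustrative):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(-k < Z < k) for k = 1, 2, 3
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The exact values show that 68, 95, and 99.7 are convenient roundings rather than exact probabilities.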

The Uniform Distribution (Slide 11)

The uniform distribution assigns equal probability density to every value in an interval [a, b], so sub-intervals of equal length are equally likely.
PDF (probability density function):
- $f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
Also called the rectangular distribution.
Example: $X \sim \text{Uniform}(2, 6)$ with parameters: a = 2, b = 6.
Mean and standard deviation:
- $\mu = \dfrac{a+b}{2} = \dfrac{2+6}{2} = 4$
- $\sigma = \sqrt{\dfrac{(b-a)^2}{12}} = \sqrt{\dfrac{(6-2)^2}{12}} = \sqrt{\dfrac{16}{12}} = \sqrt{\dfrac{4}{3}} \approx 1.1547$
Probability for a sub-interval: $P(3 \le X \le 5) = \dfrac{5-3}{6-2} = \dfrac{2}{4} = \dfrac{1}{2}$
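The Uniform(2, 6) calculations above can be reproduced in a short Python sketch. The helper `uniform_prob` is a hypothetical name; it clips the requested range to [a, b] so that ranges extending outside the interval are still handled.

```python
import math

a, b = 2, 6

mean = (a + b) / 2                    # midpoint of the interval
sd = math.sqrt((b - a) ** 2 / 12)     # standard deviation


def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b), clipping [c, d] to [a, b]."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0) / (b - a)


print(mean)                           # 4.0
print(round(sd, 4))                   # 1.1547
print(uniform_prob(3, 5, a, b))       # 0.5
```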

Uniform Distribution Using MINITAB (Slide 12)

Steps to find P(3 ≤ X ≤ 5) for X ~ Uniform(2, 6):
- Graph > Probability Distribution Plot > View Probability (MINITAB).

Comparing Performances: Why We Need Standardization

The Big Idea & Purpose: When comparing different types of athletic performances, like a long jump distance and a 200m sprint time, a direct comparison of raw numbers is misleading. We need a way to put them on a common scale to truly assess which performance is relatively more impressive or unusual.
The Challenge: Thiam's long jump is $6.58\text{ m}$, $0.41\text{ m}$ above the average of $6.17\text{ m}$. Johnson-Thompson's 200m time is $23.26\text{ s}$, $1.32\text{ s}$ faster than the average of $24.58\text{ s}$. How do we objectively compare them?
The Solution: We use standard deviations to measure how far each performance deviates from its average, relative to the typical spread of scores in that event. This allows us to compare their "unusualness" on a standardized scale.

The Z-Score: Your Tool for Standardized Comparison

What it is (The Function): The z-score quantifies how many standard deviations an individual data point ( $X$ ) is from the mean ( $\mu$ ) of its distribution. It converts any data point into a standardized unit.
Why it's Useful (The Rationale): It allows us to:
- Compare data points from different distributions (e.g., long jump vs. 200m run).
- Understand exactly how "typical" or "extreme" a single observation is within its own dataset.
How to Calculate (The Formula): $z = \frac{X - \mu}{\sigma}$
- Where:
  - $X$ is the individual data point.
  - $\mu$ is the population mean.
  - $\sigma$ is the population standard deviation.
How to Interpret (The Thinking Framework):
- Sign:
  - Positive z-score: The value is above the mean.
  - Negative z-score: The value is below the mean.
- Magnitude (Absolute Value $|z|$ ):
  - Small $|z|$ (e.g., $|z| < 1$, meaning $z$ is between -1 and 1): The value is close to the mean, relatively common or typical.
  - Large $|z|$ (e.g., $|z| > 2$, meaning $z$ is greater than 2 or less than -2): The value is far from the mean, considered unusual or extreme.
Example Applications:
- For Thiam's long jump ( $X = 6.58\text{ m}$ , $\mu = 6.17\text{ m}$ , $\sigma = 0.247\text{ m}$ ): $z_{\text{long}} = \frac{6.58 - 6.17}{0.247} \approx 1.66$
  - Interpretation: Her jump is $1.66$ standard deviations above the average long jump.
- For Johnson-Thompson's 200m run ( $X = 23.26\text{ s}$ , $\mu = 24.58\text{ s}$ , $\sigma = 0.654\text{ s}$ ): $z_{\text{run}} = \frac{23.26 - 24.58}{0.654} \approx -2.02$
  - Interpretation: Her run is $2.02$ standard deviations below the average run time (faster).
Conclusion from Comparison: The magnitude of Johnson-Thompson's z-score ( $2.02$ ) is greater than Thiam's ( $1.66$ ). This suggests Johnson-Thompson's 200m performance is relatively more unusual or farther from the mean in standard deviation terms, making it arguably more impressive within its own context.

The Normal Distribution: The "Bell Curve" of Data

The Big Idea & Purpose: The normal distribution (often called the "bell curve") is a fundamental and widely observed probability distribution. It models many natural phenomena (e.g., heights, IQ scores, measurement errors) and is crucial for statistical inference because of its predictable shape and properties.
How to Identify & Key Characteristics:
- Shape: Always bell-shaped and perfectly symmetric.
- Unimodal: Has a single peak.
- Central Tendency: The mean ( $\mu$ ), median, and mode are all equal and located at the center of the distribution.
- Parameters: It's completely defined by just two values:
  - Its mean ( $\mu$ ), which determines its location.
  - Its standard deviation ( $\sigma$ ), which determines its spread (how wide or narrow the bell is).
- Notation: $X \sim N(\mu, \sigma^2)$
The Standard Normal Distribution ( $Z\sim N(0,1)$ ):
- Purpose: This is a special, universal normal distribution with a mean of $0$ and a standard deviation of $1$ . It serves as a reference point for all other normal distributions.
- How it Connects: Any normal distribution ( $X$ ) can be transformed into the standard normal distribution ( $Z$ ) using the z-score formula. This allows us to use standard tables or software to calculate probabilities for any normally distributed variable.

Solving Problems with the Normal Distribution: Probabilities and Values

Finding Normal Probabilities (The "How to Figure Out" for Likelihood)

Goal: To determine the probability that a randomly selected observation from a normal distribution falls within a certain range (e.g., P(X < x), P(X > x), or P(x1 < X < x2)).
The Approach (Applying the Z-score):
1. Standardize: Convert the given $X$ value(s) into z-score(s) using $z = \frac{X - \mu}{\sigma}$ . This effectively translates the problem from your specific normal distribution ( $N(\mu, \sigma^2)$ ) to the standard normal distribution ( $N(0,1)$ ).
2. Look Up/Calculate: Use a z-table, statistical calculator, or software (like MINITAB's Probability Distribution Plot) to find the probability associated with that z-score.
Interpretation: The result, a probability between 0 and 1, tells you the proportion of values in the distribution that are expected to be less than (or greater than, or between) your specified value(s).
Example (Upper Tail Probability): To find $P(X > 19.5)$ if $X \sim N(\mu=18, \sigma=5)$, first standardize: $z = \frac{19.5 - 18}{5} = 0.3$. Then $P(X > 19.5) = P(Z > 0.3) = 1 - P(Z \le 0.3) \approx 1 - 0.6179 = 0.3821$.
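The goal above also mentions probabilities between two values. A minimal sketch of that case follows; the helper name `normal_prob_between` and the lower bound of 15 are assumptions chosen for illustration, not values from the slides.

```python
from statistics import NormalDist


def normal_prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for X ~ N(mu, sigma), as a difference of two CDF values."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(x2) - dist.cdf(x1)


# Hypothetical range: 15 to 19.5 under N(mu=18, sigma=5),
# which corresponds to z between -0.6 and 0.3.
p_between = normal_prob_between(15, 19.5, 18, 5)
print(round(p_between, 4))   # 0.3437
```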

Finding a Normal Value Given a Probability (The "How to Work Backwards")

Goal: To find the specific data value ( $x$ ) that corresponds to a given probability (e.g., finding the cutoff for the top 10% or the 75th percentile).
The Approach:
1. Find the Z-score: Use a z-table or software to find the z-score ( $z_{\text{prob}}$ ) that corresponds to the given cumulative probability. (e.g., if you want the value such that P(X > x) = 0.7, this means $P(X \le x) = 0.3$ . Find the z-score associated with a cumulative probability of $0.3$ ).
2. Unstandardize: Convert this z-score back to the original scale of your distribution using the formula: $X = \mu + z_{\text{prob}}\sigma$ .
Interpretation: The calculated $X$ value is the specific point in your distribution that divides the data according to the given probability.
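The two steps can be sketched for the percentile examples mentioned in the goal. Reusing $N(\mu=18, \sigma=5)$ here is an assumption made only for illustration, and the helper name `normal_value_for` is hypothetical.

```python
from statistics import NormalDist


def normal_value_for(cum_prob, mu, sigma):
    """Return x with P(X <= x) = cum_prob for X ~ N(mu, sigma)."""
    z = NormalDist().inv_cdf(cum_prob)   # step 1: find the z-score
    return mu + z * sigma                # step 2: unstandardize


# Cutoff for the top 10% means 90% of values lie below it.
top_10_cutoff = normal_value_for(0.90, 18, 5)
# 75th percentile: 75% of values lie below it.
p75 = normal_value_for(0.75, 18, 5)

print(round(top_10_cutoff, 2))   # 24.41
print(round(p75, 2))             # 21.37
```

Note that an upper-tail statement such as "top 10%" has to be converted to a cumulative probability (0.90) before the lookup.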

The 68-95-99.7 (Empirical) Rule: A Quick Estimator for Normal Distributions

The Big Idea & Purpose: This rule gives a quick way to understand the spread of data without detailed calculations. The percentages are close to exact for a normal distribution and only rough guides for other unimodal, symmetric datasets. It offers a fast "back-of-the-envelope" estimation.
How to Identify & Apply: Use this rule when you have a dataset that appears roughly bell-shaped and symmetric, and you need a quick estimate of proportions within certain distances from the mean.
The Rule (How to Interpret/Think About It):
- 68% Rule: Approximately $68\%$ of the data falls within one standard deviation ( $\sigma$ ) of the mean ( $\mu$ ), i.e., between $\mu - \sigma$ and $\mu + \sigma$.
- 95% Rule: Approximately $95\%$ of the data falls within two standard deviations ( $2\sigma$ ) of the mean ( $\mu$ ), i.e., between $\mu - 2\sigma$ and $\mu + 2\sigma$.
- 99.7% Rule: Approximately $99.7\%$ of the data falls within three standard deviations ( $3\sigma$ ) of the mean ( $\mu$ ), i.e., between $\mu - 3\sigma$ and $\mu + 3\sigma$.
Connection to Z-scores: This rule directly maps to z-scores of $\pm 1$ , $\pm 2$ , and $\pm 3$ in the standard normal distribution.

The Uniform Distribution: When All Outcomes Are Equally Likely

The Big Idea & Purpose: Unlike the normal distribution where values cluster around the mean, the uniform distribution models situations where every value within a specific interval has the exact same chance of occurring. It's often used when there's no inherent bias towards any particular outcome in a given range.
How to Identify & Key Characteristics:
- Shape: Its probability density function (PDF) is perfectly flat (rectangular) over a specified interval [a, b]. There are no peaks or valleys.
- Equal Probability: Every single value between 'a' and 'b' has the same probability density.
- Parameters: Defined by its minimum value (a) and maximum value (b).
- Notation: $X \sim \text{Uniform}(a, b)$
How to Work With It (The Formulas):
- Probability Density Function (PDF):
  $f(x) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
- Mean: $\mu = \frac{a+b}{2}$ (simply the midpoint of the interval).
- Standard Deviation: $\sigma = \sqrt{\frac{(b-a)^2}{12}}$ (quantifies the spread).
- Probability for a Sub-interval ( $P(c \le X \le d)$ ): For any sub-interval $[c, d]$ within $[a, b]$ , the probability is the ratio of the sub-interval's length to the total interval's length:
  $P(c \le X \le d) = \frac{d-c}{b-a}$
Usefulness: Commonly used to model random number generation, waiting times when a service runs at fixed, regular intervals and you arrive at a random moment (e.g., a bus every 10 minutes), or scenarios where there's no a priori reason to favor one outcome over another within a given range.
Example: For $X \sim \text{Uniform}(2, 6)$ , the mean is $4$ , and the probability of $P(3 \le X \le 5)$ is $\frac{5-3}{6-2} = \frac{2}{4} = \frac{1}{2}$ .
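Since the uniform distribution is often used for random number generation, the formulas can also be checked by simulation. This is a rough sketch using Python's `random.uniform`; the sample size and seed are arbitrary choices, and the simulated values will only be close to the theoretical ones, not equal to them.

```python
import random

random.seed(0)          # fixed seed so the run is repeatable
a, b = 2, 6
n = 100_000

samples = [random.uniform(a, b) for _ in range(n)]

sample_mean = sum(samples) / n
sample_sd = (sum((x - sample_mean) ** 2 for x in samples) / n) ** 0.5
frac_3_to_5 = sum(3 <= x <= 5 for x in samples) / n

# Theory predicts mean 4, SD about 1.1547, and P(3 <= X <= 5) = 0.5.
print(sample_mean, sample_sd, frac_3_to_5)
```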