Biostatistics: Continuous Probability Distributions

Introduction to Probability Distributions

Probability distributions are categorized into two main types:
- Discrete Probability Distributions (Discrete PD): Includes the Binomial, Poisson, and Hypergeometric distributions.
- Continuous Probability Distributions (Continuous PD): Includes the Normal (Gaussian), Uniform, and Exponential distributions.
Discrete Variable Example: Adult male shoe sizes are discrete values (whole and half sizes). In a probability histogram of these values, rescaling the vertical axis (for example, from frequencies to probabilities) changes only its labels; the shape and the horizontal scale remain the same.
Probability of Discrete Intervals: The probability that a shoe size falls in a given interval is found by summing the areas of the rectangles over that interval. For example, $P(X < 9)$ is smaller than $P(X \leq 9)$ because the strict inequality excludes the rectangle at exactly 9.

Continuous Random Variables

Definition: Continuous random variables can take any value within an interval. Because they are measured rather than counted, their possible values cannot all be listed.
Common Examples: Measurements such as height, weight, and temperature.
Probability Calculation:
- Because the possible values cannot be listed, the discrete approach of assigning a probability $P(X = x)$ to each value does not apply to continuous variables.
- For continuous variables, $P(X = x) = 0$ for every single value $x$, so probabilities are assigned to intervals such as $P(a < X < b)$ instead.
- As the class width of a histogram shrinks, the rectangles become narrower and their tops approach a smooth curve; probabilities are then found as areas under that curve rather than by adding rectangles.
Probability Density Function (PDF): The curve representing the distribution of a continuous random variable is known as the PDF.
Key Properties:
- The total area under the probability density curve is equal to $1$.
- The probability of an outcome within a range is the area under the PDF above the interval of interest.
- Example: For foot length (a continuous random variable), the probability that a randomly chosen male has a foot length less than 9 inches, $P(X < 9)$, is the area under the curve to the left of 9. Because $P(X = 9) = 0$, $P(X < 9) = P(X \leq 9)$ for continuous distributions.
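
To make the area-under-the-curve idea concrete, here is a minimal Python sketch that approximates areas under a density with the trapezoidal rule. It assumes, for illustration only, that foot length is normal with mean 11 and standard deviation 1.5 inches (the parameters of the case study later in these notes); `trapezoid_area` is a hypothetical helper, and the standard library's `statistics.NormalDist` supplies the density plus an exact CDF for comparison.

```python
from statistics import NormalDist


def trapezoid_area(f, a, b, n=10_000):
    """Approximate the area under f between a and b using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Assumed model for illustration: foot length ~ Normal(mean 11, sd 1.5) inches.
foot = NormalDist(mu=11, sigma=1.5)
lo, hi = 11 - 10 * 1.5, 11 + 10 * 1.5  # +/- 10 sd covers essentially all the area

total_area = trapezoid_area(foot.pdf, lo, hi)   # should be very close to 1
p_below_9 = trapezoid_area(foot.pdf, lo, 9)     # P(X < 9): area left of 9
p_exactly_9 = trapezoid_area(foot.pdf, 9, 9)    # zero-width interval: area 0

print(f"total area ~ {total_area:.6f}")
print(f"P(X < 9)   ~ {p_below_9:.4f} (exact CDF: {foot.cdf(9):.4f})")
print(f"P(X = 9)   = {p_exactly_9}")
```

Because a single point is a zero-width interval with zero area, including or excluding the endpoint 9 does not change the probability.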

Uniform Probability Distribution

Definition: A distribution in which all values in the interval $(a, b)$ are equally likely, so the probability of any subinterval is proportional to its length.
Density Curve Shape: Rectangle.
Probability Density Function:
- $f(x) = \frac{1}{b - a}$ for $a < x < b$
- $f(x) = 0$ elsewhere.
- $a$ : The smallest value the variable can assume.
- $b$ : The largest value the variable can assume.
Expected Value (Mean) of X:
- $E(X) = \frac{a + b}{2}$
Variance of X:
- $\text{Var}(X) = \frac{(b - a)^2}{12}$
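
The uniform formulas translate directly into code. This is a minimal Python sketch with hypothetical helper names; the example assumes, purely for illustration, a waiting time that is uniform between 0 and 10 minutes. The interval helper uses the fact that the probability of an interval is its overlap with $(a, b)$ divided by $b - a$.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on (a, b); 0 elsewhere."""
    return 1 / (b - a) if a < x < b else 0.0


def uniform_prob(c, d, a, b):
    """P(c < X < d): length of the overlap of (c, d) with (a, b), over (b - a)."""
    overlap = max(0.0, min(d, b) - max(c, a))
    return overlap / (b - a)


def uniform_mean(a, b):
    return (a + b) / 2


def uniform_var(a, b):
    return (b - a) ** 2 / 12


# Hypothetical example: waiting time uniform between 0 and 10 minutes.
a, b = 0, 10
print(uniform_pdf(3, a, b))       # 0.1
print(uniform_prob(2, 5, a, b))   # 0.3
print(uniform_mean(a, b))         # 5.0
print(uniform_var(a, b))          # about 8.33
```

`uniform_pdf` returns 0 at the endpoints to match the strict inequality $a < x < b$ in the definition; for a continuous variable this choice does not affect any probability.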

Exponential Distribution

Definition: Widely used in engineering and science, this distribution models the time or distance until a specific event occurs when events follow a Poisson point process.
Typical Applications: The time until the next earthquake occurs or how long a car battery lasts.
Probability Density Function (with parameter $\lambda$ ):
- $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$
- $f(x; \lambda) = 0$ for $x < 0$
- The rate parameter $\lambda > 0$ is the average number of events per unit of time (or distance).
Alternative Form (using the mean $\mu$):
- $f(x) = \frac{1}{\mu} e^{-\frac{x}{\mu}}$
- Here $x \geq 0$ and $\mu > 0$, where $\mu = 1/\lambda$ is the mean (expected value).
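
A minimal Python sketch of both parameterizations, assuming $\mu = 1/\lambda$ so that the rate form and the mean form describe the same density. The cumulative formula $P(X \leq x) = 1 - e^{-\lambda x}$ is a standard result not stated above, and the battery-life numbers are hypothetical.

```python
import math


def expon_pdf(x, lam):
    """Exponential density in rate form: lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def expon_pdf_mean(x, mu):
    """Same density in mean form: (1 / mu) * exp(-x / mu) for x >= 0."""
    return (1 / mu) * math.exp(-x / mu) if x >= 0 else 0.0


def expon_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


# Hypothetical example: battery life with a mean of 4 years, so lam = 1/4 per year.
mu = 4.0
lam = 1 / mu
print(expon_pdf(2, lam), expon_pdf_mean(2, mu))  # the two forms agree
print(expon_cdf(mu, lam))                         # 1 - e^(-1), about 0.632
```

For any exponential distribution, $P(X \leq \mu) = 1 - e^{-1} \approx 0.632$, so in this hypothetical example about 63% of batteries would fail before reaching the mean lifetime, a sign of the distribution's right skew.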

Normal Probability Distribution (Gaussian Distribution)

Overview: The most widely used distribution in statistics, named after the mathematician Carl Friedrich Gauss.
Determination of Shape and Spread:
- The location (center) is determined by the mean ($\mu$).
- The spread (width) is determined by the standard deviation ($\sigma$).
- Comparison of curves:
  - Curves with the same mean but different standard deviations: The curve with the larger $\sigma$ is wider, flatter, and shows more spread.
  - Curves with the same $\sigma$ but different means: The curves have the same width/shape but are shifted horizontally to different centers.
Key Characteristics:
- Symmetry: The distribution is symmetric about the mean; the measure of skewness is zero.
- Central Tendency: The highest point is at the mean, which is also equal to the median and the mode ($\mu = \text{median} = \text{mode}$).
- Mean Values: The mean can be any numerical value: negative, zero, or positive.
- Total Area: The total area under the curve is $1$; $0.5$ lies to the left of the mean and $0.5$ lies to the right.
- Inflection Points: The curve changes concavity (the slope switches between increasing and decreasing) at $x = \mu - \sigma$ and $x = \mu + \sigma$. The slope increases to the left of $\mu - \sigma$, decreases between $\mu - \sigma$ and $\mu + \sigma$, and increases again to the right of $\mu + \sigma$.
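
The characteristics listed above can be checked numerically. This minimal Python sketch uses the standard library's `statistics.NormalDist` with a standard normal curve (any $\mu$ and $\sigma > 0$ behave the same way) and a hypothetical finite-difference helper to estimate the second derivative, whose sign tells whether the curve is concave up or concave down.

```python
from statistics import NormalDist

mu, sigma = 0.0, 1.0  # illustrative choice; the checks hold for any mu and sigma > 0
dist = NormalDist(mu, sigma)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


# Symmetry: equal heights at equal distances on either side of the mean.
symmetric = abs(dist.pdf(mu - 1.3) - dist.pdf(mu + 1.3)) < 1e-12

# Half of the total area lies to the left of the mean.
left_half = dist.cdf(mu)

# Concavity flips at mu - sigma and mu + sigma (the inflection points).
concavity = {
    x: "up" if second_derivative(dist.pdf, x) > 0 else "down"
    for x in (mu - 1.5 * sigma, mu - 0.5 * sigma, mu + 0.5 * sigma, mu + 1.5 * sigma)
}

print(symmetric, left_half)
print(concavity)
```

The curve is concave up outside $\mu \pm \sigma$ and concave down between those points, which is what the inflection-point description says.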

The Empirical Rule (68-95-99.7 Rule)

This rule gives the percentage of values that lie within 1, 2, and 3 standard deviations of the mean in a normal distribution:
- About $68.27\%$ of values lie within $\pm 1$ standard deviation: $P(\mu - \sigma < X < \mu + \sigma) \approx 0.68$
- About $95.45\%$ of values lie within $\pm 2$ standard deviations: $P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$
- About $99.73\%$ of values lie within $\pm 3$ standard deviations: $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997$
Tail Probabilities:
- Probability further than $1\sigma$ from the mean: $\frac{1 - 0.68}{2} = 0.16$ in each tail.
- Probability further than $2\sigma$ from the mean: $\frac{1 - 0.95}{2} = 0.025$ in each tail.
- Probability further than $3\sigma$ from the mean: $\frac{1 - 0.997}{2} = 0.0015$ in each tail.
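
The rule's percentages can be reproduced from the standard normal CDF. This minimal Python sketch uses `statistics.NormalDist` from the standard library; because the percentages depend only on the number of standard deviations, a standard normal curve is enough. The exact tails are slightly smaller than the rounded ones above (for example, the exact one-sided tail beyond $1\sigma$ is about $0.1587$ rather than $0.16$).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1


def empirical(k):
    """Return (P(-k < Z < k), one-tail probability P(Z > k)) for k standard deviations."""
    within = Z.cdf(k) - Z.cdf(-k)
    each_tail = (1 - within) / 2
    return within, each_tail


for k in (1, 2, 3):
    within, tail = empirical(k)
    print(f"within {k} sd: {within:.4f}   each tail: {tail:.4f}")
```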

Standard Normal Probability Distribution

Definition: A normal distribution with a mean of $0$ and a standard deviation of $1$.
The Z-Score: The letter $z$ designates the standard normal random variable. The $z$-score measures how many standard deviations a value $x$ lies from the mean.
Conversion Formula:
- $Z = \frac{x - \mu}{\sigma}$
Z-Score Properties:
- $Z$ is negative for values below the mean.
- $Z$ is positive for values above the mean.
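
Standardizing with the conversion formula does not change probabilities: $P(X < x) = P(Z < z)$. A minimal Python sketch checks this with `statistics.NormalDist`; the values $\mu = 50$, $\sigma = 10$, and $x = 65$ are hypothetical, and `z_score` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 50.0, 10.0            # hypothetical population parameters
x = 65.0                          # hypothetical observed value

z = z_score(x, mu, sigma)         # 1.5: above the mean, so positive
z_low = z_score(40.0, mu, sigma)  # -1.0: below the mean, so negative

# Same probability whether computed on the original scale or on the z scale.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)
print(z, z_low, math.isclose(p_original, p_standard))
```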

Practical Examples and Calculations

Foot Length Case Study

Parameters: Adult male foot length is normally distributed with $\mu = 11$ inches and $\sigma = 1.5$ inches.
Example A: Probability of foot length between 8 and 14 inches.
- Since $2\sigma = 3$, 8 is at $\mu - 2\sigma$ ($11 - 3 = 8$).
- 14 is at $\mu + 2\sigma$ ($11 + 3 = 14$).
- Answer: By the empirical rule, approximately $0.95$, or $95\%$.
Example B: The interval containing the middle $0.997$ of foot lengths.
- $\mu \pm 3\sigma \rightarrow 11 \pm 3(1.5) \rightarrow 11 \pm 4.5$.
- Answer: Between $6.5$ and $15.5$ inches.
Example C: Foot length for which only $2.5\%$ of males are larger.
- By the empirical rule, about $2.5\%$ of values lie more than $2\sigma$ above the mean.
- $11 + 2(1.5) = 14$.
- Answer: $14$ inches.
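
The empirical rule gives quick, approximate answers. This minimal Python sketch uses the exact CDF and inverse CDF from `statistics.NormalDist` to show how close they are; it assumes the same normal model with $\mu = 11$ and $\sigma = 1.5$ inches stated above.

```python
from statistics import NormalDist

foot = NormalDist(mu=11, sigma=1.5)  # adult male foot length, inches

# Example A: P(8 < X < 14); the empirical rule gives about 0.95.
p_a = foot.cdf(14) - foot.cdf(8)

# Example B: central interval holding 99.7% (0.15% in each tail).
low_b, high_b = foot.inv_cdf(0.0015), foot.inv_cdf(0.9985)

# Example C: length exceeded by only 2.5% of males (the 97.5th percentile).
c = foot.inv_cdf(0.975)

print(round(p_a, 4), round(low_b, 2), round(high_b, 2), round(c, 2))
```

The exact answers (about $0.9545$, roughly $6.55$ to $15.45$ inches, and about $13.94$ inches) are close to the rule-of-thumb values of $0.95$, $6.5$ to $15.5$, and $14$.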
Example D (Relative Z-scores):
- Men: $\mu = 11$, $\sigma = 1.5$. Ross's foot = $13.25$ inches.
  - Ross's $Z = \frac{13.25 - 11}{1.5} = 1.5$
- Women: $\mu = 9.5$, $\sigma = 1.2$. Candace's foot = $11.6$ inches.
  - Candace's $Z = \frac{11.6 - 9.5}{1.2} = 1.75$
- Conclusion: Candace's foot is longer relative to her own group (women) because her $z$-score ($1.75$) is higher than Ross's ($1.5$).
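
Example D can be reproduced in a few lines. This minimal Python sketch (the helper name `z_score` is illustrative) also converts each $z$-score into the fraction of the person's own group with a shorter foot, assuming both groups are normally distributed with the parameters given.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standard deviations above (+) or below (-) the group mean."""
    return (x - mu) / sigma


ross = z_score(13.25, mu=11.0, sigma=1.5)    # men: mu = 11, sigma = 1.5
candace = z_score(11.6, mu=9.5, sigma=1.2)   # women: mu = 9.5, sigma = 1.2

# Fraction of each person's own group with a shorter foot, P(Z < z).
ross_pct = NormalDist().cdf(ross)
candace_pct = NormalDist().cdf(candace)

print(f"Ross:    z = {ross:.2f}, longer than {ross_pct:.1%} of men")
print(f"Candace: z = {candace:.2f}, longer than {candace_pct:.1%} of women")
```

Candace's foot is longer than about 96% of women's feet, while Ross's is longer than about 93% of men's, which matches the conclusion drawn from the $z$-scores.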

Hypertension and Tree Diameter Examples

Mild Hypertension: Defined as diastolic blood pressure (DBP) between $90$ and $100$ mm Hg. DBP is normally distributed with $\mu = 80$ and $\sigma = 12$ mm Hg, written $X \sim N(80, 12^2)$.
- $P(90 < X < 100) = P(\frac{90-80}{12} < Z < \frac{100-80}{12}) = P(0.83 < Z < 1.67)$.
- Using tables: $P(Z < 1.67) - P(Z < 0.83) = 0.952 - 0.797 = 0.155$.
- Result: About $15.5\%$ of the population has mild hypertension.
Tree Diameter: Assume diameter is normally distributed with mean $8$ inches and SD $2$ inches. Find $P(X > 12)$.
- $P(X > 12) = 1 - P(X < 12) = 1 - P(Z < \frac{12 - 8}{2}) = 1 - P(Z < 2)$.
- Using tables: $1 - 0.977 = 0.023$.
- Result: About $2.3\%$ of trees have an unusually large diameter (more than $12$ inches).
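
Both calculations can be checked without a printed table. In this minimal Python sketch, `statistics.NormalDist` works directly on the original scale, so no manual standardization is needed; normality of tree diameter is assumed, as in the calculation above. Because the table method rounds $z$ to two decimals, the exact results (about $0.1545$ and $0.0228$) differ slightly from $0.155$ and $0.023$.

```python
from statistics import NormalDist

dbp = NormalDist(mu=80, sigma=12)   # diastolic blood pressure, mm Hg
tree = NormalDist(mu=8, sigma=2)    # tree diameter, inches (normality assumed)

# Mild hypertension: P(90 < X < 100)
p_mild = dbp.cdf(100) - dbp.cdf(90)

# Unusually large trees: P(X > 12) = 1 - P(X < 12)
p_large = 1 - tree.cdf(12)

print(f"P(90 < DBP < 100) ~ {p_mild:.4f}")
print(f"P(diameter > 12)  ~ {p_large:.4f}")
```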

Percentiles

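Formula: $x = \mu + Z_p \sigma$, where $Z_p$ is the standard normal value that cuts off the desired percentage (negative for cutoffs below the mean).
DBP Example ($\mu = 80$, $\sigma = 12$):
- Upper $5\%$ cutoff ($X_{.05}$, the 95th percentile, exceeded by only $5\%$ of people): $80 + 1.64(12) = 99.68$.
- Lower $5\%$ cutoff ($X_{.95}$, the 5th percentile, exceeded by $95\%$ of people): $80 - 1.64(12) = 60.32$.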
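
A minimal Python sketch of the percentile formula, alongside the exact inverse CDF from `statistics.NormalDist`. The helper `percentile_value` is a hypothetical name. The comparison shows that the table value $1.64$ is a rounded version of $z \approx 1.645$, so the exact cutoffs are about $99.74$ and $60.26$ mm Hg.

```python
from statistics import NormalDist


def percentile_value(mu, sigma, z_p):
    """x = mu + z_p * sigma; z_p is negative for cutoffs below the mean."""
    return mu + z_p * sigma


# Using the rounded table value z = 1.64, as in the DBP example.
upper_table = percentile_value(80, 12, 1.64)    # about 99.68
lower_table = percentile_value(80, 12, -1.64)   # about 60.32

# Exact cutoffs from the inverse CDF of N(80, 12^2).
dbp = NormalDist(mu=80, sigma=12)
upper_exact = dbp.inv_cdf(0.95)   # value exceeded by 5% of people
lower_exact = dbp.inv_cdf(0.05)   # value exceeded by 95% of people

print(round(upper_table, 2), round(lower_table, 2))
print(round(upper_exact, 2), round(lower_exact, 2))
```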

Excel Functions for Standard Normal Distribution

NORM.S.DIST(z, cumulative): With cumulative set to TRUE, computes the cumulative probability $P(Z \leq z)$ for a given $z$-value; with FALSE, it returns the density instead.
- Example: =NORM.S.DIST(1, TRUE) yields approximately $0.8413$.
- Interval probability: =NORM.S.DIST(1, TRUE) - NORM.S.DIST(0, TRUE) calculates $P(0.00 \leq z \leq 1.00)$.
- Upper tail probability: =1 - NORM.S.DIST(1.58, TRUE) calculates $P(z > 1.58)$.
NORM.S.INV(probability): Computes the $z$-value given a cumulative probability.
- Example: =NORM.S.INV(0.9) finds the $z$-value with $0.10$ in the upper tail ($1.28$).
- Example: =NORM.S.INV(0.975) finds the $z$-value with $0.025$ in the upper tail ($1.96$).
- Example: =NORM.S.INV(0.025) finds the $z$-value with $0.025$ in the lower tail ($-1.96$).
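
For readers working outside Excel, Python's standard library offers rough equivalents through `statistics.NormalDist`: its `cdf` method plays the role of `NORM.S.DIST(z, TRUE)` and `inv_cdf` plays the role of `NORM.S.INV`. This is a minimal sketch, not an Excel feature, and the values in the comments are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

p_below_1 = Z.cdf(1)              # like =NORM.S.DIST(1, TRUE), about 0.8413
p_0_to_1 = Z.cdf(1) - Z.cdf(0)    # P(0 <= z <= 1), about 0.3413
p_above_158 = 1 - Z.cdf(1.58)     # like =1 - NORM.S.DIST(1.58, TRUE), about 0.0571
z_90 = Z.inv_cdf(0.9)             # like =NORM.S.INV(0.9), about 1.28
z_975 = Z.inv_cdf(0.975)          # like =NORM.S.INV(0.975), about 1.96
z_025 = Z.inv_cdf(0.025)          # like =NORM.S.INV(0.025), about -1.96

print(round(p_below_1, 4), round(p_0_to_1, 4), round(p_above_158, 4))
print(round(z_90, 2), round(z_975, 2), round(z_025, 2))
```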