Biostatistics: Continuous Probability Distributions

Introduction to Probability Distributions

  • Probability distributions are categorized into two main types:
    • Discrete Probability Distributions (Discrete PD): Includes the Binomial, Poisson, and Hypergeometric distributions.
    • Continuous Probability Distributions (Continuous PD): Includes the Normal (Gaussian), Uniform, and Exponential distributions.
  • Discrete Variable Example: Adult male shoe sizes represent discrete values (e.g., whole numbers and half-sizes). In a diagram of discrete values, changing the vertical scale affects the appearance, but the shape and horizontal scale remain the same.
  • Probability of Discrete Intervals: The probability that a shoe size falls in a given interval is found by summing the areas of the rectangles over that interval. For example, $P(X \leq 9)$ includes the rectangle at size 9, so it is larger than $P(X < 9)$, which excludes that rectangle.
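
The difference between the two discrete probabilities can be sketched in a few lines of Python. The probabilities in `shoe_pmf` are hypothetical values chosen only for illustration (they are not from the notes or from real data), and `prob_at_most` is an illustrative helper name.

```python
# Hypothetical probabilities for a few shoe sizes (illustrative values only,
# not real data); they sum to 1.
shoe_pmf = {8.0: 0.10, 8.5: 0.15, 9.0: 0.25, 9.5: 0.20, 10.0: 0.20, 10.5: 0.10}


def prob_at_most(pmf, limit, inclusive=True):
    """Sum the probabilities of all sizes below the limit (and at it, if inclusive)."""
    return sum(
        p for size, p in pmf.items()
        if size < limit or (inclusive and size == limit)
    )


p_le_9 = prob_at_most(shoe_pmf, 9.0)                   # P(X <= 9)
p_lt_9 = prob_at_most(shoe_pmf, 9.0, inclusive=False)  # P(X < 9)
print(p_le_9, p_lt_9)
```

With these made-up values, the strict version drops the rectangle at size 9, so it comes out smaller than the inclusive one.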

Continuous Random Variables

  • Definition: Continuous random variables are variables that can take any value within a specific interval. Because they are measured rather than counted, all their possible values cannot be listed.
  • Common Examples: Measurements such as height, weight, and temperature.
  • Probability Calculation:
    • Because possible values cannot be listed, the discrete probability method $P(X = x)$ is not applicable for continuous variables.
    • For continuous variables, $P(X = x) = 0$, so probabilities are computed over intervals, such as $P(a < X < b)$, instead of single points.
    • As the intervals become narrower, the rectangles become thinner, and in the limit the probability is the area under a smooth curve rather than a sum of rectangles.
  • Probability Density Function (PDF): The curve representing the distribution of a continuous random variable is known as the PDF.
  • Key Properties:
    • The total area under the probability density curve is equal to $1$.
    • The likelihood of outcomes within a range is found by calculating the area under the PDF above the interval of interest.
    • Example: For foot length (a continuous random variable), the probability that a randomly chosen male has a foot length less than 9 inches, $P(X < 9)$, is the area under the curve to the left of 9. Note that $P(X < 9) = P(X \leq 9)$ in continuous distributions.
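
The idea of "area under the PDF" can be checked numerically. The sketch below assumes foot length is normal with mean 11 and standard deviation 1.5 (the parameters used in the case study later in these notes) and approximates the area with thin rectangles. The helper name `area_under_pdf` is illustrative.

```python
from statistics import NormalDist

# Assumed model: normal with mean 11 and SD 1.5 (the case-study parameters).
foot = NormalDist(mu=11, sigma=1.5)


def area_under_pdf(dist, a, b, n=10_000):
    """Midpoint Riemann sum of dist.pdf over [a, b] using n thin rectangles."""
    width = (b - a) / n
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(n)) * width


# P(X < 9): area to the left of 9. Starting 10 SDs below the mean is a
# practical stand-in for minus infinity, since the area beyond it is negligible.
approx = area_under_pdf(foot, 11 - 10 * 1.5, 9)
exact = foot.cdf(9)

# A single point has zero width, so the area over a tiny interval around 9
# is close to zero.
tiny = area_under_pdf(foot, 9 - 1e-6, 9 + 1e-6, n=10)
print(approx, exact, tiny)
```

The rectangle sum should agree closely with the cumulative probability from `cdf`, and the near-zero area over the tiny interval is why $P(X < 9)$ and $P(X \leq 9)$ coincide.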

Uniform Probability Distribution

  • Definition: A distribution where all outcomes are equally likely; the probability is proportional to the interval's length.
  • Density Curve Shape: Rectangle.
  • Probability Density Function:
    • $f(x) = \frac{1}{b - a}$ for $a < x < b$
    • $f(x) = 0$ elsewhere.
    • $a$: The smallest value the variable can assume.
    • $b$: The largest value the variable can assume.
  • Expected Value (Mean) of X:
    • $E(X) = \frac{a + b}{2}$
  • Variance of X:
    • $\text{Var}(X) = \frac{(b - a)^2}{12}$
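
The uniform formulas translate directly into code. This is a minimal sketch; the function names and the example interval from 0 to 10 are assumptions made for illustration.

```python
def uniform_pdf(x, a, b):
    """Height of the rectangle: 1/(b - a) inside (a, b), zero elsewhere."""
    return 1 / (b - a) if a < x < b else 0.0


def uniform_mean(a, b):
    """E(X) = (a + b) / 2."""
    return (a + b) / 2


def uniform_var(a, b):
    """Var(X) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12


def uniform_prob(c, d, a, b):
    """P(c < X < d): length of the overlap with (a, b) times the height."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0) / (b - a)


# Hypothetical example: X is uniform between a = 0 and b = 10.
a, b = 0, 10
print(uniform_mean(a, b), uniform_var(a, b), uniform_prob(2, 5, a, b))
```

Because the density is flat, the probability of an interval depends only on its length: here the interval from 2 to 5 covers 3 of the 10 units.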

Exponential Distribution

  • Definition: Widely used in engineering and science, this distribution describes the time or distance until a specific event happens (based on a Poisson point process).
  • Typical Applications: The amount of time until the next earthquake occurs or the duration a car battery lasts.
  • Probability Density Function (with parameter $\lambda$):
    • $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$
    • $f(x; \lambda) = 0$ for $x < 0$
    • Parameter $\lambda > 0$.
  • Alternative Form (using mean $\mu$):
    • $f(x) = \frac{1}{\mu} e^{-\frac{x}{\mu}}$
    • Where $x \geq 0$ and $\mu > 0$. Here, $\mu$ is the mean or expected value.
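
The two forms of the density are linked by $\lambda = 1/\mu$. The sketch below codes the density and adds the cumulative probability $P(X \leq x) = 1 - e^{-\lambda x}$, a standard result that the notes do not state. The mean battery life of 4 years is a hypothetical value chosen for illustration.

```python
import math


def exp_pdf(x, lam):
    """Exponential density: lam * exp(-lam * x) for x >= 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0 (standard result, not in the notes)."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


mean_life = 4.0        # hypothetical mean battery life, in years
lam = 1 / mean_life    # rate parameter, lambda = 1 / mu

p_within_2 = exp_cdf(2, lam)        # P(battery fails within 2 years)
p_beyond_4 = 1 - exp_cdf(4, lam)    # P(battery lasts longer than its mean)
print(p_within_2, p_beyond_4)
```

Under these assumptions the chance of lasting longer than the mean is $e^{-1}$, well below one half, which reflects the right skew of the exponential curve.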

Normal Probability Distribution (Gaussian Distribution)

  • Overview: The most widely used distribution in statistics, named after the mathematician Carl Friedrich Gauss.
  • Determination of Center and Spread:
    • The location (center) is determined by the mean ($\mu$).
    • The spread (width) is determined by the standard deviation ($\sigma$).
    • Comparison of curves:
      • Curves with the same mean but different standard deviations: The curve with the larger $\sigma$ is wider, flatter, and shows more spread.
      • Curves with the same $\sigma$ but different means: The curves have the same width and shape but are shifted horizontally to different centers.
  • Key Characteristics:
    • Symmetry: The distribution is symmetric about the mean; the measure of skewness is zero.
    • Central Tendency: The highest point is at the mean, which is also equal to the median and the mode ($\mu = \text{median} = \text{mode}$).
    • Mean Values: The mean can be any numerical value: negative, zero, or positive.
    • Total Area: Total area under the curve is $1$. Specifically, $0.5$ lies to the left of the mean and $0.5$ lies to the right.
    • Inflection Points: These are the points $\mu - \sigma$ and $\mu + \sigma$, where the curvature of the curve changes. The slope increases to the left of $\mu - \sigma$, decreases between $\mu - \sigma$ and $\mu + \sigma$, and increases again to the right of $\mu + \sigma$.

The Empirical Rule (68-95-99.7 Rule)

  • This rule defines the percentage of values within standard deviations of the mean for a normal distribution:
    • $68.27\%$ of values are within $\pm 1$ standard deviation: $P(\mu - \sigma < X < \mu + \sigma) \approx 0.68$
    • $95.45\%$ of values are within $\pm 2$ standard deviations: $P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$
    • $99.73\%$ of values are within $\pm 3$ standard deviations: $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997$
  • Tail Probabilities:
    • Probability further than $1\sigma$ from the mean: $\frac{1 - 0.68}{2} = 0.16$ in each tail.
    • Probability further than $2\sigma$ from the mean: $\frac{1 - 0.95}{2} = 0.025$ in each tail.
    • Probability further than $3\sigma$ from the mean: $\frac{1 - 0.997}{2} = 0.0015$ in each tail.
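
The rule's rounded figures can be compared with values computed from the standard normal curve. This is a small sketch using `NormalDist` from Python's standard `statistics` module; the dictionary names `within` and `tail` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

within = {}  # P(mu - k*sigma < X < mu + k*sigma) for k = 1, 2, 3
tail = {}    # probability in each single tail beyond k standard deviations

for k in (1, 2, 3):
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    tail[k] = (1 - within[k]) / 2
    print(f"k={k}: within={within[k]:.4f}, each tail={tail[k]:.4f}")
```

The computed tails are slightly smaller than the rounded 0.16, 0.025, and 0.0015 because the rule rounds the central areas down to 0.68, 0.95, and 0.997.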

Standard Normal Probability Distribution

  • Definition: A normal distribution with a mean of $0$ and a standard deviation of $1$.
  • The Z-Score: The letter $z$ is used to designate the standard normal random variable. The $z$-score measures the number of standard deviations a value $x$ is from the mean.
  • Conversion Formula:
    • $Z = \frac{x - \mu}{\sigma}$
  • Z-Score Properties:
    • $Z$ is negative for values below the mean.
    • $Z$ is positive for values above the mean.
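
The conversion formula is a one-line function. In this sketch the name `z_score` and the example values (mean 100, standard deviation 15) are hypothetical, chosen only to show the sign behaviour described above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population with mean 100 and standard deviation 15.
z_above = z_score(130, mu=100, sigma=15)  # value above the mean
z_below = z_score(85, mu=100, sigma=15)   # value below the mean
z_at = z_score(100, mu=100, sigma=15)     # value exactly at the mean
print(z_above, z_below, z_at)
```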

Practical Examples and Calculations

Foot Length Case Study
  • Parameters: Adult male foot length is normally distributed with $\mu = 11$ and $\sigma = 1.5$.
  • Example A: Probability of foot length between 8 and 14 inches.
    • 8 is at $\mu - 2\sigma$ ($11 - 3 = 8$).
    • 14 is at $\mu + 2\sigma$ ($11 + 3 = 14$).
    • Answer: $0.95$ or $95\%$.
  • Example B: The interval for a $0.997$ probability.
    • $\mu \pm 3\sigma \rightarrow 11 \pm 3(1.5) \rightarrow 11 \pm 4.5$.
    • Answer: Between $6.5$ and $15.5$ inches.
  • Example C: Foot length for which only $2.5\%$ of males are larger.
    • By the empirical rule, $2.5\%$ of values lie in the upper tail beyond $\mu + 2\sigma$.
    • $11 + 2(1.5) = 14$.
    • Answer: $14$ inches.
  • Example D (Relative Z-scores):
    • Men: $\mu = 11$, $\sigma = 1.5$. Ross's foot = $13.25$ inches.
      • Ross's $Z = \frac{13.25 - 11}{1.5} = 1.5$
    • Women: $\mu = 9.5$, $\sigma = 1.2$. Candace's foot = $11.6$ inches.
      • Candace's $Z = \frac{11.6 - 9.5}{1.2} = 1.75$
    • Conclusion: Candace's foot is longer relative to her gender group because her $z$-score ($1.75$) is higher than Ross's ($1.5$).
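
The four foot-length examples can be reproduced with `NormalDist` from Python's standard library. This is a sketch rather than the method used in the notes, which rely on the empirical rule. The variable names are illustrative, and the computed values differ slightly from the rule-of-thumb answers.

```python
from statistics import NormalDist

men = NormalDist(mu=11, sigma=1.5)     # adult male foot length, inches
women = NormalDist(mu=9.5, sigma=1.2)  # adult female foot length, inches

# Example A: P(8 < X < 14), i.e. within 2 standard deviations of the mean.
p_a = men.cdf(14) - men.cdf(8)

# Example B: the interval mu +/- 3 sigma.
low_b = men.mean - 3 * men.stdev
high_b = men.mean + 3 * men.stdev

# Example C: the length exceeded by only 2.5% of males. The empirical rule
# gives 14; the cutoff from the inverse cumulative distribution is slightly lower.
cutoff_c = men.inv_cdf(0.975)

# Example D: compare each person with their own group using z-scores.
z_ross = (13.25 - men.mean) / men.stdev
z_candace = (11.6 - women.mean) / women.stdev

print(p_a, (low_b, high_b), cutoff_c, z_ross, z_candace)
```

The computed cutoff in Example C uses $z \approx 1.96$ rather than the rounded value of 2, which is why it falls a little short of 14 inches.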
Hypertension and Tree Diameter Examples
  • Mild Hypertension: Defined as Diastolic Blood Pressure (DBP) between $90$ and $100$ mm Hg. Given $X \sim N(80, 12)$, where 80 is the mean and 12 is the standard deviation.
    • $P(90 < X < 100) = P(\frac{90-80}{12} < Z < \frac{100-80}{12}) = P(0.83 < Z < 1.67)$.
    • Using tables: $0.952 - 0.797 = 0.155$.
    • Result: About $15.5\%$ of the population is mildly hypertensive.
  • Tree Diameter: Mean $8$ inches, SD $2$ inches. Find $P(X > 12)$.
    • $P(X > 12) = 1 - P(X < 12) = 1 - P(Z < \frac{12 - 8}{2}) = 1 - P(Z < 2)$.
    • Using tables: $1 - 0.977 = 0.023$.
    • Result: $2.3\%$ of trees have an unusually large diameter.
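
Both table lookups can be cross-checked in code. The sketch below assumes, as in the calculations above, that 12 is the standard deviation of DBP. It uses `NormalDist` from the standard library in place of a printed table, so the results differ slightly from the table values, which round $z$ to two decimals.

```python
from statistics import NormalDist

# Diastolic blood pressure: mean 80, standard deviation 12 (mm Hg).
dbp = NormalDist(mu=80, sigma=12)
p_mild = dbp.cdf(100) - dbp.cdf(90)  # P(90 < X < 100)

# Tree diameter: mean 8 inches, standard deviation 2 inches.
tree = NormalDist(mu=8, sigma=2)
p_large = 1 - tree.cdf(12)           # P(X > 12), upper-tail area

print(round(p_mild, 4), round(p_large, 4))
```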
Percentiles
  • Formula: $x = \mu + Z_p \sigma$, where $Z_p$ is the standard normal value for the desired percentile.
  • DBP Example ($\mu = 80$, $\sigma = 12$):
    • Upper 5% point ($X_{.05}$, the value exceeded by 5% of the population, i.e., the 95th percentile): $80 + 1.64(12) = 99.68$.
    • Lower 5% point ($X_{.95}$, the value exceeded by 95% of the population, i.e., the 5th percentile): $80 - 1.64(12) = 60.32$.
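
The percentile formula can be sketched with the inverse cumulative distribution taking the place of a table lookup. The variable names are illustrative. Because the code uses the unrounded $z \approx 1.645$ rather than 1.64, its results differ from the hand calculation in the second decimal place.

```python
from statistics import NormalDist

mu, sigma = 80, 12  # DBP parameters from the example

# z-value with 95% of the standard normal area to its left.
z_95 = NormalDist().inv_cdf(0.95)

upper_5pct = mu + z_95 * sigma  # exceeded by 5% of the population
lower_5pct = mu - z_95 * sigma  # exceeded by 95% of the population

# The same cutoffs taken directly from the DBP distribution.
dbp = NormalDist(mu=mu, sigma=sigma)
upper_direct = dbp.inv_cdf(0.95)
lower_direct = dbp.inv_cdf(0.05)

print(round(z_95, 3), round(upper_5pct, 2), round(lower_5pct, 2))
```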

Excel Functions for Standard Normal Distribution

  • NORM.S.DIST(z, cumulative): Computes the cumulative probability for a given $z$-value when cumulative is TRUE.
    • Example: =NORM.S.DIST(1, TRUE) yields approximately $0.8413$.
    • Interval probability: =NORM.S.DIST(1, TRUE) - NORM.S.DIST(0, TRUE) calculates $P(0.00 \leq z \leq 1.00)$.
    • Upper tail probability: =1 - NORM.S.DIST(1.58, TRUE) calculates $P(z > 1.58)$.
  • NORM.S.INV(probability): Computes the $z$-value given a cumulative probability.
    • Example: =NORM.S.INV(0.9) finds the $z$-value with $0.10$ in the upper tail ($1.28$).
    • Example: =NORM.S.INV(0.975) finds the $z$-value with $0.025$ in the upper tail ($1.96$).
    • Example: =NORM.S.INV(0.025) finds the $z$-value with $0.025$ in the lower tail ($-1.96$).
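
For readers working outside Excel, the same calculations can be sketched in Python. Here `NormalDist().cdf` plays the role of the cumulative NORM.S.DIST and `NormalDist().inv_cdf` the role of NORM.S.INV. The pairing is an analogy drawn for this example, not something stated in the notes.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Counterparts of the cumulative NORM.S.DIST examples.
p_cumulative = std_normal.cdf(1)                   # P(z <= 1)
p_interval = std_normal.cdf(1) - std_normal.cdf(0)  # P(0 <= z <= 1)
p_upper = 1 - std_normal.cdf(1.58)                 # P(z > 1.58)

# Counterparts of the NORM.S.INV examples.
z_upper_10 = std_normal.inv_cdf(0.9)     # 0.10 in the upper tail
z_upper_025 = std_normal.inv_cdf(0.975)  # 0.025 in the upper tail
z_lower_025 = std_normal.inv_cdf(0.025)  # 0.025 in the lower tail

print(round(p_cumulative, 4), round(p_interval, 4), round(p_upper, 4))
print(round(z_upper_10, 2), round(z_upper_025, 2), round(z_lower_025, 2))
```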