Biostatistics: Continuous Probability Distributions
Introduction to Probability Distributions
- Probability distributions are categorized into two main types:
- Discrete Probability Distributions (Discrete PD): Includes the Binomial, Poisson, and Hypergeometric distributions.
- Continuous Probability Distributions (Continuous PD): Includes the Normal (Gaussian), Uniform, and Exponential distributions.
- Discrete Variable Example: Adult male shoe sizes represent discrete values (e.g., whole numbers and half-sizes). In a diagram of discrete values, changing the vertical scale affects the appearance, but the shape and horizontal scale remain the same.
- Probability of Discrete Intervals: The probability of a shoe size taking a value in a specific interval is found by summing the areas of the rectangles over that interval. For example, P(X≤9) includes the rectangle at size 9, so P(X<9), which excludes it, is smaller.
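To make the rectangle-summing idea concrete, here is a minimal Python sketch. The shoe-size probabilities are hypothetical values invented for illustration (they are not taken from these notes or from real data); only the summing logic matters.

```python
# Hypothetical probability for each shoe size (illustrative only; sums to 1).
shoe_pmf = {8.0: 0.10, 8.5: 0.15, 9.0: 0.25, 9.5: 0.20,
            10.0: 0.15, 10.5: 0.10, 11.0: 0.05}

def prob_at_most(pmf, x):
    """P(X <= x): add the rectangle for every size up to and including x."""
    return sum(p for size, p in pmf.items() if size <= x)

def prob_less_than(pmf, x):
    """P(X < x): the same sum, but without the rectangle at x itself."""
    return sum(p for size, p in pmf.items() if size < x)

p_le_9 = prob_at_most(shoe_pmf, 9.0)    # 0.10 + 0.15 + 0.25
p_lt_9 = prob_less_than(shoe_pmf, 9.0)  # 0.10 + 0.15
print(round(p_le_9, 2), round(p_lt_9, 2))
```

For a discrete variable the two results differ by exactly the probability sitting on the value 9.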
Continuous Random Variables
- Definition: Continuous random variables are variables that can take any value within a specific interval. Because they are measured rather than counted, all their possible values cannot be listed.
- Common Examples: Measurements such as height, weight, and temperature.
- Probability Calculation:
- Because possible values cannot be listed, the discrete probability method P(X=x) is not applicable for continuous variables.
- For continuous variables, P(X=x)=0, so probabilities are assigned to intervals, such as P(a<X<b), rather than to single points.
- As the intervals become narrower, the rectangles become thinner and their tops approach a smooth curve. The probability is then the area under that curve rather than a sum of rectangle areas.
- Probability Density Function (PDF): The curve representing the distribution of a continuous random variable is known as the PDF.
- Key Properties:
- The total area under the probability density curve is equal to 1.
- The likelihood of outcomes within a range is found by calculating the area under the PDF above the interval of interest.
- Example: For foot length (a continuous random variable), the probability of a randomly chosen male having a foot length less than 9 inches (P(X<9)) is the area under the curve to the left of the value 9. Note that P(X<9)=P(X≤9) in continuous distributions.
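The limiting idea (ever thinner rectangles approaching the area under a curve) can be sketched numerically. This is an illustration only: it assumes the normal model for male foot length with μ=11 and σ=1.5 that these notes use in the practical examples, and `riemann_area` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumed model: foot length ~ Normal(mu=11, sigma=1.5), in inches.
foot = NormalDist(mu=11, sigma=1.5)

def riemann_area(pdf, lo, hi, n):
    """Approximate the area under pdf on [lo, hi] with n midpoint rectangles."""
    width = (hi - lo) / n
    return sum(pdf(lo + (i + 0.5) * width) for i in range(n)) * width

# Start 8 standard deviations below the mean; the area further left is negligible.
lo = foot.mean - 8 * foot.stdev
for n in (10, 100, 1000):
    print(n, riemann_area(foot.pdf, lo, 9, n))

exact = foot.cdf(9)  # area under the curve to the left of 9, i.e. P(X < 9)
print("cdf:", exact)
```

As n grows, the rectangle sum settles on the same value that the cumulative distribution function returns, which is why P(X<9) and P(X≤9) coincide: a single point contributes a rectangle of zero width.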
Uniform Distribution
- Definition: A continuous distribution in which all values between a and b are equally likely, so the probability of any sub-interval is proportional to its length.
- Density Curve Shape: Rectangle.
- Probability Density Function:
- f(x)=1/(b−a) for a<x<b
- f(x)=0 elsewhere.
- a: The smallest value the variable can assume.
- b: The largest value the variable can assume.
- Expected Value (Mean) of X:
- E(X)=(a+b)/2
- Variance of X:
- Var(X)=(b−a)²/12
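A minimal Python sketch of the uniform formulas follows. The function names and the example interval (a=0, b=10) are hypothetical, chosen only to illustrate the density, mean, variance, and the "probability proportional to length" property.

```python
def uniform_pdf(x, a, b):
    """Density of a uniform variable on (a, b): constant height 1/(b - a)."""
    return 1.0 / (b - a) if a < x < b else 0.0

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_var(a, b):
    return (b - a) ** 2 / 12

def uniform_prob(c, d, a, b):
    """P(c < X < d): length of the overlap with (a, b), divided by (b - a)."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)

a, b = 0.0, 10.0                      # hypothetical interval
print(uniform_pdf(3.0, a, b))         # height of the rectangle
print(uniform_mean(a, b), uniform_var(a, b))
print(uniform_prob(2.0, 5.0, a, b))   # interval of length 3 out of 10
```

Because the density curve is a rectangle, the area over any sub-interval is just width times height, so no integration is needed.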
Exponential Distribution
- Definition: Widely used in engineering and science, this distribution describes the time or distance until a specific event happens (based on a Poisson point process).
- Typical Applications: The amount of time until the next earthquake occurs or the duration a car battery lasts.
- Probability Density Function (with parameter λ):
- f(x;λ)=λe^(−λx) for x≥0
- f(x;λ)=0 for x<0
- Parameter λ>0.
- Alternative Form (using mean μ):
- f(x)=(1/μ)e^(−x/μ)
- Where x≥0 and μ>0. Here, μ is the mean or expected value, so μ=1/λ.
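The two parameterizations can be tied together in a short Python sketch. The cumulative form F(x)=1−e^(−λx) is the standard result of integrating the density and is not stated in these notes; the 4-year mean battery life is a hypothetical value used only for illustration.

```python
import math

def exp_pdf(x, lam):
    """Exponential density with rate lam (lam > 0)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """P(X <= x), obtained by integrating the density from 0 to x."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

mean_life = 4.0          # hypothetical mean battery life in years (mu)
lam = 1.0 / mean_life    # rate parameter, lambda = 1 / mu

p_fail_by_mean = exp_cdf(4.0, lam)        # P(X <= 4) = 1 - e^(-1)
p_last_over_5 = 1.0 - exp_cdf(5.0, lam)   # P(X > 5)  = e^(-1.25)
print(p_fail_by_mean, p_last_over_5)
```

Note that more than half of the probability lies below the mean, which reflects the right-skewed shape of the exponential density.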
Normal Probability Distribution (Gaussian Distribution)
- Overview: The most widely used distribution in statistics, named after mathematician Carl Friedrich Gauss.
- Determination of Location and Spread:
- The location (center) is determined by the mean (μ).
- The spread (width) is determined by the standard deviation (σ).
- Comparison of curves:
- Curves with the same mean but different standard deviations: The curve with the larger σ is wider, flatter, and shows more spread.
- Curves with the same σ but different means: The curves have the same width/shape but are shifted horizontally to different centers.
- Key Characteristics:
- Symmetry: The distribution is symmetric about the mean; the measure of skewness is zero.
- Central Tendency: The highest point is at the mean, which is also equal to the median and the mode (μ=median=mode).
- Mean Values: The mean can be any numerical value: negative, zero, or positive.
- Total Area: Total area under the curve is 1. Specifically, 0.5 lies to the left of the mean and 0.5 lies to the right.
- Inflection Points: The curve changes concavity at μ−σ and μ+σ. It is concave up (slope increasing) to the left of μ−σ, concave down (slope decreasing) between μ−σ and μ+σ, and concave up again to the right of μ+σ.
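Several of these characteristics can be checked numerically. The sketch below uses Python's `statistics.NormalDist`, whose `pdf` method evaluates the standard normal density formula (the formula itself is not written out in these notes). The values μ=11 and σ=1.5 are example inputs, and `curvature` is a hypothetical helper that estimates the second derivative by finite differences.

```python
from statistics import NormalDist

mu, sigma = 11.0, 1.5            # example values; any mu and any sigma > 0 work
dist = NormalDist(mu, sigma)

def curvature(x, h=1e-3):
    """Central-difference estimate of the second derivative of the PDF at x."""
    return (dist.pdf(x + h) - 2 * dist.pdf(x) + dist.pdf(x - h)) / h**2

# Half of the total area lies to the left of the mean.
print(dist.cdf(mu))

# Symmetry: equal heights at equal distances on either side of the mean.
print(dist.pdf(mu - 2), dist.pdf(mu + 2))

# Concavity: positive outside mu +/- sigma, negative between, near zero at the edges.
for x in (mu - 1.5 * sigma, mu - sigma, mu, mu + sigma, mu + 1.5 * sigma):
    print(x, curvature(x))
```

The sign of the estimated curvature flips at μ−σ and μ+σ, which is what identifies those two points as the inflection points.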
The Empirical Rule (68-95-99.7 Rule)
- This rule defines the percentage of values within standard deviations of the mean for a normal distribution:
- About 68.27% of values are within ±1 standard deviation: P(μ−σ<X<μ+σ)≈0.68
- About 95.45% of values are within ±2 standard deviations: P(μ−2σ<X<μ+2σ)≈0.95
- About 99.73% of values are within ±3 standard deviations: P(μ−3σ<X<μ+3σ)≈0.997
- Tail Probabilities:
- Probability further than 1σ from the mean: (1−0.68)/2=0.16 in each tail.
- Probability further than 2σ from the mean: (1−0.95)/2=0.025 in each tail.
- Probability further than 3σ from the mean: (1−0.997)/2=0.0015 in each tail.
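The rule and its tail probabilities can be reproduced with the standard normal cumulative distribution. This is a sketch using Python's `statistics.NormalDist`; the helper names `within` and `each_tail` are hypothetical. The computed tail areas differ slightly from the rounded figures above because those start from 0.68, 0.95, and 0.997.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def each_tail(k):
    """Area in one tail beyond k standard deviations, by symmetry."""
    return (1 - within(k)) / 2

for k in (1, 2, 3):
    print(k, round(within(k), 4), round(each_tail(k), 4))
```

Working in standard deviation units is enough here because the proportion within k standard deviations does not depend on the particular values of μ and σ.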
Standard Normal Probability Distribution
- Definition: A normal distribution with a mean of 0 and a standard deviation of 1.
- The Z-Score: The letter z is used to designate the standard normal random variable. The z-score measures the number of standard deviations a value x is from the mean.
- Conversion Formula:
- Z=(x−μ)/σ
- Z-Score Properties:
- Z is negative for values below the mean.
- Z is positive for values above the mean.
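The point of the conversion is that a probability for any normal variable equals the matching probability for Z. A minimal sketch, assuming μ=11 and σ=1.5 as example parameters and `z_score` as a hypothetical helper name:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 11.0, 1.5        # example parameters
z_above = z_score(14.0, mu, sigma)   # value above the mean -> positive z
z_below = z_score(8.0, mu, sigma)    # value below the mean -> negative z

# The same probability, computed on the original scale and on the z scale.
p_original = NormalDist(mu, sigma).cdf(14.0)
p_standard = NormalDist().cdf(z_above)
print(z_above, z_below, p_original, p_standard)
```

Both probabilities agree, which is why a single standard normal table serves every normal distribution.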
Practical Examples and Calculations
- Parameters: Adult male foot length is normally distributed with μ=11 and σ=1.5.
- Example A: Probability of foot length between 8 and 14 inches.
- 8 is at μ−2σ (11−3=8).
- 14 is at μ+2σ (11+3=14).
- Answer: 0.95 or 95%.
- Example B: The interval for a 0.997 probability.
- μ±3σ→11±3(1.5)→11±4.5.
- Answer: Between 6.5 and 15.5 inches.
- Example C: Foot length for which only 2.5% of males are larger.
- About 2.5% of values lie in the upper tail beyond μ+2σ.
- 11+2(1.5)=14.
- Answer: 14 inches.
- Example D (Relative Z-scores):
- Men: μ=11, σ=1.5. Ross's foot = 13.25 inches.
- Ross's Z=(13.25−11)/1.5=1.5
- Women: μ=9.5, σ=1.2. Candace's foot = 11.6 inches.
- Candace's Z=(11.6−9.5)/1.2=1.75
- Conclusion: Candace's foot is longer relative to her gender (group) because her z-score (1.75) is higher than Ross's (1.5).
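Examples A through D can be cross-checked in a few lines of Python. This sketch uses `statistics.NormalDist` with the parameters given above; the variable names are hypothetical. The results for A and C differ slightly from the rule-of-thumb answers because the code uses unrounded normal probabilities.

```python
from statistics import NormalDist

men = NormalDist(mu=11, sigma=1.5)   # adult male foot length, inches

# Example A: P(8 < X < 14), i.e. within 2 standard deviations of the mean.
p_a = men.cdf(14) - men.cdf(8)

# Example B: interval covering about 0.997 of values, mu +/- 3*sigma.
interval_b = (men.mean - 3 * men.stdev, men.mean + 3 * men.stdev)

# Example C: foot length exceeded by only 2.5% of males (97.5th percentile).
cutoff_c = men.inv_cdf(0.975)

# Example D: relative standing via z-scores.
z_ross = (13.25 - 11) / 1.5
z_candace = (11.6 - 9.5) / 1.2

print(round(p_a, 4), interval_b, round(cutoff_c, 2))
print(round(z_ross, 2), round(z_candace, 2))
```

The cutoff in Example C comes out a little below 14 because the exact z-value for a 2.5% upper tail is about 1.96 rather than 2.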
Hypertension and Tree Diameter Examples
- Mild Hypertension: Defined as diastolic blood pressure (DBP) between 90 and 100 mmHg. Assume X∼N(μ=80, σ=12), where 12 is the standard deviation.
- P(90<X<100)=P((90−80)/12<Z<(100−80)/12)=P(0.83<Z<1.67).
- Using tables: P(Z<1.67)−P(Z<0.83)=0.952−0.797=0.155.
- Result: About 15.5% of the population is mildly hypertensive.
- Tree Diameter: Mean 8 inches, SD 2 inches. Find P(X>12).
- P(X>12)=1−P(X<12)=1−P(Z<(12−8)/2)=1−P(Z<2).
- Using tables: 1−0.977=0.023.
- Result: About 2.3% of trees have a diameter greater than 12 inches.
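Both table lookups can be checked without a table. The sketch below uses Python's `statistics.NormalDist` with the stated means and standard deviations; variable names are hypothetical. Small differences from the table answers are expected because the table uses z-values rounded to 0.83 and 1.67.

```python
from statistics import NormalDist

# Diastolic blood pressure: mean 80 mmHg, standard deviation 12 mmHg.
dbp = NormalDist(mu=80, sigma=12)
p_mild = dbp.cdf(100) - dbp.cdf(90)      # P(90 < X < 100)

# Tree diameter: mean 8 inches, standard deviation 2 inches.
tree = NormalDist(mu=8, sigma=2)
p_large = 1 - tree.cdf(12)               # P(X > 12) = 1 - P(X < 12)

print(round(p_mild, 4), round(p_large, 4))
```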
Percentiles
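- Formula: x=μ+z_p·σ, where z_p is the standard normal value that cuts off the required tail area.
- DBP Example (μ=80, σ=12), using z=1.64 for a 5% tail:
- Upper 5% cutoff (x.05, the 95th percentile): 80+1.64(12)=99.68.
- Lower 5% cutoff (x.95, the 5th percentile): 80−1.64(12)=60.32.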
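The percentile formula can be sketched with an inverse cumulative lookup in place of a table. `normal_percentile` is a hypothetical helper; it uses the unrounded z-value (about 1.645), so its results differ from the worked example, which uses 1.64, in the first decimal place.

```python
from statistics import NormalDist

def normal_percentile(p, mu, sigma):
    """Value x with P(X <= x) = p, computed as x = mu + z_p * sigma."""
    z_p = NormalDist().inv_cdf(p)   # z-value with cumulative probability p
    return mu + z_p * sigma

upper_5 = normal_percentile(0.95, 80, 12)   # only 5% of values lie above
lower_5 = normal_percentile(0.05, 80, 12)   # only 5% of values lie below
print(round(upper_5, 2), round(lower_5, 2))
```

The two cutoffs sit at equal distances from the mean of 80, as symmetry requires.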
Excel Functions for Standard Normal Distribution
- NORM.S.DIST(z, cumulative): Computes the cumulative probability P(Z≤z) for a given z-value when cumulative is TRUE.
- Example: =NORM.S.DIST(1, TRUE) yields approximately 0.8413.
- Interval probability: =NORM.S.DIST(1, TRUE) - NORM.S.DIST(0, TRUE) calculates P(0.00≤z≤1.00).
- Upper tail probability: =1 - NORM.S.DIST(1.58, TRUE) calculates P(z>1.58).
- NORM.S.INV(probability): Computes the z-value given a cumulative probability.
- Example: =NORM.S.INV(0.9) finds the z-value with 0.10 in the upper tail (1.28).
- Example: =NORM.S.INV(0.975) finds the z-value with 0.025 in the upper tail (1.96).
- Example: =NORM.S.INV(0.025) finds the z-value with 0.025 in the lower tail (−1.96).
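For readers working outside Excel, the same lookups can be sketched in Python. The wrapper names `norm_s_dist` and `norm_s_inv` are hypothetical, chosen only to mirror the Excel functions; they delegate to `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal: mean 0, standard deviation 1

def norm_s_dist(z):
    """Cumulative probability P(Z <= z), like Excel's cumulative NORM.S.DIST."""
    return _std.cdf(z)

def norm_s_inv(probability):
    """z-value with the given cumulative probability, like NORM.S.INV."""
    return _std.inv_cdf(probability)

print(round(norm_s_dist(1), 4))                    # cumulative probability at z = 1
print(round(norm_s_dist(1) - norm_s_dist(0), 4))   # P(0 <= z <= 1)
print(round(1 - norm_s_dist(1.58), 4))             # P(z > 1.58)
print(round(norm_s_inv(0.9), 2))                   # 0.10 in the upper tail
print(round(norm_s_inv(0.975), 2), round(norm_s_inv(0.025), 2))
```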