Week 3 (biostats) notes
The Normal Distribution (Gaussian)
- Appropriate for continuous data; symmetric, bell-shaped
- Parameters: mean μ (central value), standard deviation σ (spread)
- Notation: X ~ N(μ, σ^2)
Population vs. Sample; Notation conventions
- Greek letters for population parameters: μ (mean), σ (SD), π (proportion)
- Latin letters for sample statistics:
- sample mean: x̄
- sample SD: s
- sample proportion: p̂
- Example mappings: population mean μ, sample mean x̄; population SD σ, sample SD s; population proportion π, sample proportion p̂
Summary Statistics (recap)
- Central tendency: mean, median, mode
- Dispersion: SD, IQR, range
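The summary statistics above can be computed with Python's standard library. This is a minimal sketch: the data values are made up for illustration, and the quartiles use the default method of statistics.quantiles (other software may give slightly different IQRs).

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
values = [4.1, 4.8, 5.0, 5.0, 5.3, 5.9, 6.4]

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Dispersion
sd = statistics.stdev(values)                  # sample SD (n - 1 denominator)
q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range
value_range = max(values) - min(values)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"SD={sd:.2f} IQR={iqr:.2f} range={value_range:.2f}")
```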
Checking for Normality
- Use a histogram or box plot to assess symmetry
- Roughly symmetric data with mean ≈ median ≈ mode are consistent with normality
- Skew indicators:
- Left-skew (negatively skewed): mean ≤ median ≤ mode
- Right-skew (positively skewed): mode ≤ median ≤ mean
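The mean-versus-median comparison above can be turned into a rough numerical check. This is only a sketch: skew_direction and its tolerance value are hypothetical choices for illustration, the sample lists are made up, and a histogram or box plot should still be inspected.

```python
import statistics

def skew_direction(values, tolerance=0.05):
    """Rough skew check from mean vs. median.

    tolerance is a hypothetical cut-off: the (mean - median) gap,
    as a fraction of the sample SD, below which the data are
    treated as roughly symmetric.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return "roughly symmetric"  # all values identical
    gap = (mean - median) / sd
    if gap > tolerance:
        return "right-skewed"       # mean pulled above the median
    if gap < -tolerance:
        return "left-skewed"        # mean pulled below the median
    return "roughly symmetric"

# Made-up samples for illustration.
print(skew_direction([1, 2, 3, 4, 5, 6, 7]))         # roughly symmetric
print(skew_direction([1, 1, 2, 2, 3, 4, 15]))        # right-skewed
print(skew_direction([1, 12, 13, 14, 14, 15, 15]))   # left-skewed
```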
Reference Ranges for the Normal Distribution
- 68% within ±1 SD: [μ − σ, μ + σ]
- 95% within ±2 SD: [μ − 2σ, μ + 2σ]
- 99.7% within ±3 SD: [μ − 3σ, μ + 3σ]
68–95–99.7% Rule (summary)
- 68% of observations lie between μ − σ and μ + σ
- 95% lie between μ − 2σ and μ + 2σ
- 99.7% lie between μ − 3σ and μ + 3σ
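The rule can be checked numerically from the standard normal CDF. A minimal sketch using statistics.NormalDist from the standard library (Python 3.8+); coverage_within is a helper name introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def coverage_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```

Because every normal distribution standardizes to Z, the same percentages hold for any μ and σ.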
Standard Normal Distribution and Z-score
- Standard normal: Z ~ N(0,1)
- Z-score: Z = (X − μ) / σ, the number of SDs by which X lies above (positive) or below (negative) the mean
- Use when reference range is not suitable or when comparing across different scales
- If μ and σ are unknown, estimate them with the sample mean x̄ and sample SD s, provided the sample size is large
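To illustrate comparing values across different scales, here is a small sketch. The z_score helper and all the numbers (blood pressure and cholesterol means and SDs) are hypothetical and chosen only to show the arithmetic.

```python
def z_score(x, mean, sd):
    """Number of SDs by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical patient values and reference means/SDs (illustration only).
z_bp = z_score(140, mean=120, sd=15)      # systolic BP in mmHg
z_chol = z_score(6.2, mean=5.0, sd=1.0)   # cholesterol in mmol/L

print(f"BP z = {z_bp:.2f}, cholesterol z = {z_chol:.2f}")
# The larger |z| is the more unusual value relative to its own distribution.
```

Here the blood pressure value is about 1.33 SDs above its mean and the cholesterol value about 1.2 SDs above its mean, so the two can be compared even though the units differ.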
Probability calculations with the normal distribution
- For any X ~ N(μ, σ^2):
- Probability between a and b: P(a ≤ X ≤ b) = Φ((b − μ) / σ) − Φ((a − μ) / σ)
- Φ(z) is the CDF of the standard normal distribution
- Using the Z-table: to find P(Z > z), compute 1 − Φ(z)
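The same calculations can be done in code instead of with a printed table. A minimal sketch, assuming statistics.NormalDist for Φ; the function names and the example values (μ = 100, σ = 15) are hypothetical.

```python
from statistics import NormalDist

def phi(z):
    """CDF of the standard normal distribution, Φ(z)."""
    return NormalDist().cdf(z)

def prob_between(a, b, mu, sigma):
    """P(a ≤ X ≤ b) for X ~ N(mu, sigma^2), via standardization."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

def prob_above(x, mu, sigma):
    """P(X > x) = 1 − Φ(z), the upper-tail area."""
    return 1 - phi((x - mu) / sigma)

# Hypothetical variable with mu = 100 and sigma = 15.
print(f"{prob_between(85, 115, 100, 15):.4f}")  # within ±1 SD
print(f"{prob_above(130, 100, 15):.4f}")        # more than 2 SDs above the mean
```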
Example: Pre-operative creatinine data (illustrative of distribution comparisons)
- Normal-like vs skewed groups can be compared using mean/median and IQR
- No dialysis (roughly symmetric): mean ≈ median
- Yes dialysis (right-skewed): mean > median; report using median and IQR for skewed data
Worked example: Intracranial pressure (ICP) data
- Given: ICP is normally distributed with μ = 20 mmHg and σ = 5 mmHg, i.e. ICP ~ N(20, 5^2)
- a) Median ICP? → 20 mmHg (Normally distributed: mean = median)
- b) P(15 ≤ ICP ≤ 25)? → P(−1 ≤ Z ≤ 1) ≈ 0.68
- c) Approximate 95% limits? → [μ − 2σ, μ + 2σ] = [10, 30] mmHg
- d) P(ICP > 27.5)?
- Z for 27.5 = (27.5 − 20) / 5 = 1.5
- P(Z > 1.5) ≈ 0.0668 → about 6.68%
- Stepwise approach: diagram → compute Z → use standard normal table
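The four answers can be cross-checked in code. A sketch using statistics.NormalDist with the μ and σ given in the example; the variable names are illustrative.

```python
from statistics import NormalDist

icp = NormalDist(mu=20, sigma=5)  # ICP in mmHg, parameters as given

# a) Median equals the mean for a normal distribution.
median_icp = icp.median

# b) P(15 ≤ ICP ≤ 25) from the CDF.
p_15_25 = icp.cdf(25) - icp.cdf(15)

# c) Approximate 95% limits: mean ± 2 SD.
limits_95 = (icp.mean - 2 * icp.stdev, icp.mean + 2 * icp.stdev)

# d) P(ICP > 27.5): standardize, then take the upper tail.
z = (27.5 - icp.mean) / icp.stdev
p_above = 1 - icp.cdf(27.5)

print(median_icp)          # 20.0
print(round(p_15_25, 4))   # 0.6827
print(limits_95)           # (10.0, 30.0)
print(z, round(p_above, 4))  # 1.5 0.0668
```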
Practical notes on using the standard normal table
- Always start with a diagram to visualize regions
- Z-table often provides area to the left; convert to the area of interest accordingly
- For Z = 1.5, the table gives the left area Φ(1.5) ≈ 0.9332, so P(Z > 1.5) = 1 − 0.9332 ≈ 0.0668
- If needed, compute probabilities by transforming to Z and using Φ or the table
Quick reference formulas
- Normal: X ~ N(μ, σ^2)
- Z-score: Z = (X − μ) / σ
- P(a ≤ X ≤ b) = Φ((b − μ) / σ) − Φ((a − μ) / σ)
- 68–95–99.7% rule: within ±σ, ±2σ, ±3σ respectively
- 95% limits: μ ± 2σ; 99.7% limits: μ ± 3σ
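The multipliers 1, 2 and 3 in these formulas are rounded. The sketch below recovers the exact multiplier for a given central coverage from the inverse CDF; central_multiplier is a hypothetical helper name. For 95% coverage it gives about 1.96, which is why μ ± 2σ is a convenient approximation.

```python
from statistics import NormalDist

def central_multiplier(coverage):
    """z such that P(−z ≤ Z ≤ z) = coverage for the standard normal."""
    tail = (1 - coverage) / 2            # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for coverage in (0.68, 0.95, 0.997):
    print(f"{coverage:.1%} central coverage: ±{central_multiplier(coverage):.2f} SD")
```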
Tools and resources mentioned
- GraphPad Prism: useful for creating graphs (histograms, box plots) and performing simple statistical tests
Takeaway for quick recall
- Normal distribution is symmetric with μ and σ as core parameters
- Use Z-scores to compare values across different scales
- When data are skewed, report median and IQR; use mean/SD for roughly normal data
- For ICP-style data, you can answer: median, probability within a range, and 95% limits using μ and σ