Chapter 5: The Normal Distribution and Other Continuous Distributions

Chapter 5 Learning Objectives

  • Compute probabilities from the normal distribution and understand its characteristics.

  • Utilize the normal distribution to solve complex business problems.

  • Determine if a dataset is approximately normally distributed using normal probability plots.

  • Compute probabilities from the uniform distribution.

  • Compute probabilities from the exponential distribution.

Continuous Probability Distributions

  • Definition: A continuous random variable can assume any value within a defined interval or on a continuum (uncountable number of values).

  • Real-World Examples:
    * Thickness of a manufactured item.
    * Time required to complete a specific task.
    * Temperature of a chemical solution.
    * Height measured in inches.

  • These variables can take any value in the interval; the values actually recorded are limited only by how precisely and accurately they can be measured.

  • Cumulative Distribution Function (CDF): Let $F(x)$ be the CDF for a continuous random variable $X$. It expresses the probability that $X$ does not exceed a value $x$:
    * $F(x) = P(X \leq x)$
    * For two possible values $a$ and $b$ where $a < b$, the probability that $X$ lies between them is $P(a < X < b) = F(b) - F(a)$.

  • Probability Density Function (PDF): The density function $f(x)$ for random variable $X$ has the following core properties:
    1. $f(x) \geq 0$ for all values of $x$.
    2. The total area under $f(x)$ over all possible values of $X$ equals $1.0$.
    3. The probability that $X$ lies between two values is the area under the density graph between those values.
    4. The cumulative distribution function $F(x_0)$ is the area under the PDF from the minimum value up to $x_0$:
        * $F(x_0) = \int_{x_m}^{x_0} f(x)\,dx$
        * where $x_m$ is the minimum value of $X$.
    5. The probability of any individual point is always zero: $P(X = a) = 0$. Therefore $P(a \leq X \leq b) = P(a < X < b)$.
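
The area properties above can be checked numerically. The sketch below is only an illustration: it assumes a standard normal density as the example PDF (any valid density would do), uses a hand-written trapezoid rule named trapezoid_area (a hypothetical helper, not part of the notes), and treats the range from -8 to 8 as a stand-in for "all possible values".

```python
from statistics import NormalDist


def trapezoid_area(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi with the trapezoid rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


# Assumption: a standard normal density stands in for a generic PDF f(x).
dist = NormalDist(mu=0.0, sigma=1.0)

# Property 2: the total area is 1 (the tails beyond +/-8 are negligible).
total_area = trapezoid_area(dist.pdf, -8.0, 8.0)

# Property 3 and the CDF rule: the area between a and b equals F(b) - F(a).
a, b = -1.0, 2.0
area_ab = trapezoid_area(dist.pdf, a, b)
cdf_diff = dist.cdf(b) - dist.cdf(a)

print(total_area, area_ab, cdf_diff)
```

The two printed values for the interval should agree to many decimal places, which is the relationship $P(a < X < b) = F(b) - F(a)$ expressed as an area.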

The Normal Distribution

  • Characteristics:
    * Bell-shaped and perfectly symmetrical around the mean.
    * The mean, median, and mode are all equal.
    * Infinite theoretical range, from $-\infty$ to $+\infty$.

  • Parameters:
    * Mean ($\mu$): Determines the location (center) of the distribution. Shifting $\mu$ moves the distribution left or right.
    * Standard Deviation ($\sigma$): Determines the spread (width). Increasing $\sigma$ increases the spread and flattens the curve.

  • Normal Density Function Formula:
    * $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
    * $e \approx 2.71828$ (mathematical constant).
    * $\pi \approx 3.14159$ (mathematical constant).
    * $x$ = any value of the continuous variable.

  • Standardized Normal Distribution (Z):
    * Any normal distribution can be transformed into the standardized normal distribution.
    * Mean $\mu = 0$.
    * Standard deviation $\sigma = 1$.
    * Translation formula (Z-score): $Z = \frac{X - \mu}{\sigma}$.
    * Z-values specify the number of standard deviations a value is from the mean. Positive Z-values are above the mean, and negative Z-values are below the mean.

  • Z-score Example: If $X \sim N(100, 50^2)$, then for $X = 200$:
    * $Z = \frac{200 - 100}{50} = 2.0$
    * This indicates the value is $2$ standard deviations (increments of $50$) above the mean of $100$.
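
As a rough illustration of the density formula and the Z-score transformation, the following sketch codes both by hand and compares the density with Python's statistics.NormalDist. The function names normal_pdf and z_score are made up for this example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Values from the Z-score example: X ~ N(100, 50^2), X = 200.
mu, sigma = 100.0, 50.0
z = z_score(200.0, mu, sigma)

# The hand-coded density should agree with the library implementation.
density_hand = normal_pdf(200.0, mu, sigma)
density_lib = NormalDist(mu, sigma).pdf(200.0)

print(z, density_hand, density_lib)
```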

Probability Calculations Using the Z-Table

  • Cumulative Table: The Standardized Normal Table (e.g., Appendix E.2) provides probabilities for the area to the left of a specific Z-score (from $-\infty$ to $Z$).
    * Example: $P(Z < 2.00) = 0.9772$.

  • General Procedure:
    1. Draw the normal curve for the problem in terms of $X$.
    2. Translate $X$-values to $Z$-values.
    3. Use the Z-table to find the required area.

  • Download Time Example:
    * Scenario: Mean download time $\mu = 18.0\,\text{s}$, $\sigma = 5.0\,\text{s}$.
    * Problem: Find $P(X < 18.6)$.
    * Step: $Z = \frac{18.6 - 18.0}{5.0} = 0.12$.
    * Result: $P(Z < 0.12) = 0.5478$.

  • Upper Tail Probabilities: To find $P(X > 18.6)$, use the complement rule:
    * $P(Z > 0.12) = 1.0 - P(Z \leq 0.12) = 1.0 - 0.5478 = 0.4522$.

  • Probabilities Between Two Values: To find $P(18 < X < 18.6)$, calculate Z-scores for both points:
    * $Z_1 = \frac{18 - 18}{5} = 0$; $Z_2 = \frac{18.6 - 18}{5} = 0.12$.
    * $P(0 < Z < 0.12) = P(Z < 0.12) - P(Z < 0) = 0.5478 - 0.5000 = 0.0478$.

  • Finding X for a Known Probability: Given a probability, find the corresponding $X$ value.
    1. Find the Z-value for the known probability in the table.
    2. Convert to units of $X$ using the formula $X = \mu + Z\sigma$.
    3. Example: Find $X$ such that $20\%$ of download times are less than $X$.
        * $P(Z < ?) = 0.20 \rightarrow Z = -0.84$.
        * $X = 18.0 + (-0.84)(5.0) = 18.0 - 4.2 = 13.8\,\text{s}$.
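
The download-time calculations in this section can be reproduced without a printed table. This is a minimal sketch using Python's statistics.NormalDist in place of the Z-table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Download times: mean 18.0 s, standard deviation 5.0 s.
download = NormalDist(mu=18.0, sigma=5.0)

# Lower tail: P(X < 18.6).
p_less = download.cdf(18.6)

# Upper tail by the complement rule: P(X > 18.6).
p_more = 1.0 - p_less

# Between two values: P(18 < X < 18.6) = F(18.6) - F(18).
p_between = download.cdf(18.6) - download.cdf(18.0)

# Inverse lookup: the X below which 20% of download times fall.
x_20 = download.inv_cdf(0.20)

print(round(p_less, 4), round(p_more, 4), round(p_between, 4), round(x_20, 2))
```

The inverse lookup comes out slightly below the table-based answer of 13.8 s because the table rounds Z to two decimals (-0.84), while inv_cdf works with the unrounded quantile.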

Evaluating Normality

  • Theoretical Properties: Normal data should be bell-shaped (symmetrical), follow the empirical rule, have an interquartile range (IQR) $\approx 1.33\sigma$, and have a range $\approx 6\sigma$.

  • Visual Assessment:
    * Small data: Use stem-and-leaf displays or boxplots to check symmetry.
    * Large data: Check histograms or polygons for bell shape.

  • Descriptive Measures: Compare mean, median, and mode for similarity.

  • Normal Probability Plot (Q-Q Plot):
    * Data is arranged into an ordered array.
    * Standardized normal quantile values ($Z$) are calculated.
    * Observed data ($X$) is plotted on the vertical axis against quantile values ($Z$) on the horizontal axis.
    * Linearity indicates a normal distribution. Nonlinear or curved plots indicate deviations such as left-skewed, right-skewed, or rectangular distributions.
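
The plot coordinates described above might be computed as follows. This is a sketch under stated assumptions: the sample values are made up, the helper name normal_plot_points is hypothetical, and the quantile for the i-th ordered value is taken at cumulative area i / (n + 1), which is one common convention among several.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (z, x) pairs: standard normal quantile vs. ordered observation."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: cumulative area i / (n + 1) for the i-th smallest value.
    return [
        (std_normal.inv_cdf(i / (n + 1)), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Made-up sample, used only to show the mechanics.
sample = [12.1, 9.8, 10.4, 11.3, 10.0, 9.1, 10.9, 11.7, 8.6]
points = normal_plot_points(sample)

for z, x in points:
    print(f"{z:7.3f}  {x:5.1f}")
```

Plotting x (vertical) against z (horizontal) with any charting tool then gives the normal probability plot; points that fall close to a straight line suggest approximate normality.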

The Uniform Distribution

  • Definition: Outcomes are equally likely over a given range (rectangular distribution).

  • Density Function:
    * $f(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$

  • Summary Measures:
    * Mean: $\mu = \frac{a+b}{2}$
    * Standard Deviation: $\sigma = \sqrt{\frac{(b-a)^2}{12}}$

  • Example (Range 2 to 6):
    * $f(x) = \frac{1}{6-2} = 0.25$
    * $\mu = \frac{2+6}{2} = 4$
    * $\sigma = \sqrt{\frac{16}{12}} \approx 1.1547$
    * Probability calculation: $P(3 \leq X \leq 5) = (\text{Base}) \times (\text{Height}) = (5-3) \times 0.25 = 0.5$.
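
The uniform formulas are simple enough to code directly. The sketch below uses two hypothetical helpers, uniform_summary and uniform_prob, and reproduces the range 2 to 6 example.

```python
import math


def uniform_summary(a, b):
    """Return (density height, mean, standard deviation) for a uniform on [a, b]."""
    height = 1.0 / (b - a)
    mean = (a + b) / 2.0
    sd = math.sqrt((b - a) ** 2 / 12.0)
    return height, mean, sd


def uniform_prob(a, b, lo, hi):
    """P(lo <= X <= hi) as base times height, with the interval clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)


height, mean, sd = uniform_summary(2, 6)
p_3_to_5 = uniform_prob(2, 6, 3, 5)

print(height, mean, round(sd, 4), p_3_to_5)
```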

The Exponential Distribution

  • Definition: Often used to model time between occurrences/arrivals.

  • Probability Density Function:
    * $f(x) = \lambda e^{-\lambda x}$ for $x > 0$
    * $\lambda$ = mean number of arrivals per unit time.

  • Summary Measures:
    * Mean time between arrivals: $\mu = \frac{1}{\lambda}$.
    * Standard deviation: $\sigma = \frac{1}{\lambda}$.

  • Cumulative Probability: Probability that the arrival time is less than a specified time $x$:
    * $P(\text{arrival time} < x) = 1 - e^{-\lambda x}$.

  • Example: Customers arrive at a rate of $15$ per hour ($\lambda = 15$). The probability that the time between consecutive customers is less than $3$ minutes ($0.05$ hours) is:
    * $P(X < 0.05) = 1 - e^{-(15)(0.05)} = 1 - e^{-0.75} \approx 0.5276$.
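
The arrival example can be checked with a few lines of code. This sketch uses a hypothetical helper, exponential_cdf, and keeps time in hours so that it matches the units of $\lambda$.

```python
import math


def exponential_cdf(x, lam):
    """P(time between arrivals < x) = 1 - exp(-lam * x) for x > 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


lam = 15.0                 # mean arrivals per hour
three_minutes = 3 / 60     # 0.05 hours, same time unit as lam

p_within_3_min = exponential_cdf(three_minutes, lam)
mean_gap_minutes = (1.0 / lam) * 60   # mean time between arrivals, in minutes

print(round(p_within_3_min, 4), mean_gap_minutes)
```

The most common slip in this kind of problem is mixing units: $x$ must be expressed in the same time unit used to define $\lambda$.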

Normal Approximation of the Binomial Distribution

  • Rationale: Binomial calculations become tedious as $n$ grows large. A normal distribution with the same mean and standard deviation can approximate it.

  • Requirements: The approximation is generally considered adequate if $np \geq 5$ and $n(1-p) \geq 5$, where $p$ is the probability of success (sometimes written $\pi$).

  • Parameters:
    * $\mu = np$
    * $\sigma = \sqrt{np(1-p)}$

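  • Continuity Adjustment: Since the binomial is discrete and the normal is continuous, each integer point is replaced by an interval of width 1. With $X$ the binomial variable and $W$ the approximating normal variable:
    * $P(X = k)$ becomes $P(k - 0.5 < W < k + 0.5)$.
    * $P(X \geq 25)$ becomes $P(W > 24.5)$.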

  • Example: Tire production ($n = 1600$, defect rate $p = 0.08$). Probability of $150$ or fewer defects:
    * $\mu = 1600 \times 0.08 = 128$
    * $\sigma = \sqrt{1600 \times 0.08 \times 0.92} \approx 10.85$
    * With the continuity adjustment, $P(X \leq 150)$ becomes $P(W \leq 150.5) \rightarrow Z = \frac{150.5 - 128}{10.85} \approx 2.07$
    * $P(Z < 2.07) = 0.9808$.
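
To see how close the approximation is for the tire example, the sketch below compares it with an exact binomial sum. The helper binomial_cdf is hypothetical and sums the binomial terms in log space (via math.lgamma) so the large factorials do not overflow; this is one possible way to compute the exact value, not a method taken from the notes.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p)
        )
        total += math.exp(log_term)
    return total


n, p, k = 1600, 0.08, 150

# Check the rule of thumb before approximating.
conditions_ok = n * p >= 5 and n * (1 - p) >= 5

mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))

# Continuity adjustment: P(X <= 150) is approximated by P(W <= 150.5).
approx = NormalDist(mu, sigma).cdf(k + 0.5)
exact = binomial_cdf(k, n, p)

print(conditions_ok, round(sigma, 2), round(approx, 4), round(exact, 4))
```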

Joint Distributions and Sums of Variables

  • Joint CDF: Defines the probability that variables $X_1, X_2, \dots, X_k$ are simultaneously less than specified values: $F(x_1, \dots, x_k) = P(X_1 < x_1 \cap \dots \cap X_k < x_k)$.

  • Independence: Random variables are independent if and only if the joint CDF equals the product of the marginal CDFs: $F(x_1, \dots, x_k) = F(x_1)F(x_2)\dots F(x_k)$.

  • Covariance and Correlation:
    * $Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x\mu_y$.
    * If $X$ and $Y$ are independent, $Cov(X, Y) = 0$.
    * $\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_x\sigma_y}$.

  • Sums and Differences:
    * $E(X \pm Y) = \mu_x \pm \mu_y$.
    * $Var(X \pm Y) = \sigma_x^2 + \sigma_y^2 \pm 2Cov(X, Y)$.

  • Linear Combinations: For $W = aX + bY$:
    * $\mu_w = a\mu_x + b\mu_y$
    * $\sigma_w^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,Cov(X, Y)$.
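
The sum, difference, and linear-combination rules all follow from one formula, since $X + Y$ and $X - Y$ are the cases $a = 1, b = \pm 1$. The sketch below makes that concrete. The helper names and every number in it (means, standard deviations, covariance) are invented for illustration and do not come from the notes.

```python
def linear_combination(a, b, mu_x, mu_y, sd_x, sd_y, cov_xy=0.0):
    """Mean and variance of W = a*X + b*Y."""
    mean_w = a * mu_x + b * mu_y
    var_w = a**2 * sd_x**2 + b**2 * sd_y**2 + 2 * a * b * cov_xy
    return mean_w, var_w


def correlation(cov_xy, sd_x, sd_y):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    return cov_xy / (sd_x * sd_y)


# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
mu_x, mu_y = 10.0, 20.0
sd_x, sd_y = 3.0, 4.0
cov_xy = 6.0

sum_mean, sum_var = linear_combination(1, 1, mu_x, mu_y, sd_x, sd_y, cov_xy)
diff_mean, diff_var = linear_combination(1, -1, mu_x, mu_y, sd_x, sd_y, cov_xy)
rho = correlation(cov_xy, sd_x, sd_y)

print(sum_mean, sum_var, diff_mean, diff_var, rho)
```

With a positive covariance the variance of the sum (37) exceeds the variance of the difference (13); with independent variables the covariance term drops out and the two variances are equal.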

Applied Problems and Exercises

  • Coffee Shop Staffing: $\mu = 150$, $\sigma = 20$.
    1. Quiet Day ($X < 130$): $Z = -1.0 \rightarrow P \approx 0.1587$.
    2. Busy Day ($X > 180$): $Z = 1.5 \rightarrow P(Z > 1.5) = 1 - 0.9332 = 0.0668$.
    3. Average Day ($150 < X < 170$): $Z_1 = 0$, $Z_2 = 1.0 \rightarrow 0.8413 - 0.5000 = 0.3413$.

  • Mathematics Test: $\mu = 65$, $\sigma = 12$.
    * Percentage below score $83$: $Z = 1.5 \rightarrow 93.32\%$.
    * Top $10\%$ distinction: $Z = 1.28 \rightarrow X = 65 + 1.28(12) = 80.36$.

  • Cholesterol Classification: $\mu = 190$, $\sigma = 35$.
    * High cholesterol ($X > 240$): $Z = 1.43 \rightarrow P = 1 - 0.9236 = 0.0764$.
    * Conditional: because every value of at least $240$ is also at least $200$, $P(X \geq 240 \mid X \geq 200) = \frac{P(X \geq 240)}{P(X \geq 200)} = \frac{0.0764}{1 - 0.6126} \approx 0.197$.

  • Battery Lifetime: $\mu = 500$, $\sigma = 40$ cycles.
    * Warranty life such that fewer than $2\%$ of batteries fail before it: $P(Z < ?) = 0.02 \rightarrow Z \approx -2.05 \rightarrow X = 500 + (-2.05)(40) \approx 418$ cycles.
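
The four exercises above can be cross-checked in software. The following is a sketch using Python's statistics.NormalDist rather than the printed table; variable names are illustrative, and small differences from the hand answers are expected because the table method rounds Z to two decimals.

```python
from statistics import NormalDist

# Coffee shop staffing: mean 150, standard deviation 20.
coffee = NormalDist(150, 20)
quiet = coffee.cdf(130)                       # P(X < 130)
busy = 1.0 - coffee.cdf(180)                  # P(X > 180)
average = coffee.cdf(170) - coffee.cdf(150)   # P(150 < X < 170)

# Mathematics test: mean 65, standard deviation 12.
exam = NormalDist(65, 12)
below_83 = exam.cdf(83)                       # share scoring below 83
distinction_cutoff = exam.inv_cdf(0.90)       # score exceeded by the top 10%

# Cholesterol: mean 190, standard deviation 35.
chol = NormalDist(190, 35)
high = 1.0 - chol.cdf(240)                    # P(X > 240)
# Conditional probability: P(X >= 240 | X >= 200) = P(X >= 240) / P(X >= 200).
high_given_200 = high / (1.0 - chol.cdf(200))

# Battery lifetime: mean 500 cycles, standard deviation 40 cycles.
battery = NormalDist(500, 40)
warranty_cycles = battery.inv_cdf(0.02)       # only 2% fail before this point

print(round(quiet, 4), round(busy, 4), round(average, 4))
print(round(below_83, 4), round(distinction_cutoff, 2))
print(round(high, 4), round(high_given_200, 3))
print(round(warranty_cycles, 1))
```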