Density Curves and Normal Distribution

Module 3 - Section 2: Density Curves and Normal Distribution

Instructor: Rosana Fok

1. Introduction to Density Curves and the Normal Model

  • The distribution of a continuous variable's observed values can be displayed with a histogram.

  • Histograms break down the measurement scale into class intervals.

  • The area of each interval's rectangle is proportional to the relative frequency of observations in that interval.

  • A smooth curve fitted to the outline of the histogram is called a density curve.

Example: Histogram of Walking Shoe Prices
  • Prices are grouped into class intervals from 0 to 100, and the histogram shows the frequency of each interval.
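To make the link between bar area and relative frequency concrete, here is a minimal Python sketch. The bin edges and counts are hypothetical placeholders, not the shoe-price frequencies from the example: dividing each relative frequency by its interval width gives bar heights whose areas sum to 1, the same property a density curve has.

```python
# Hypothetical class intervals (price bins) and counts, for illustration only.
edges = [0, 20, 40, 60, 80, 100]
counts = [3, 8, 12, 5, 2]

def density_heights(edges, counts):
    """Return bar heights so that each bar's area equals its relative frequency."""
    n = sum(counts)
    heights = []
    for left, right, c in zip(edges, edges[1:], counts):
        width = right - left
        heights.append((c / n) / width)  # relative frequency per unit on the x-axis
    return heights

heights = density_heights(edges, counts)
# Area of each bar = height x width, which recovers the relative frequency.
areas = [h * (right - left) for h, left, right in zip(heights, edges, edges[1:])]
print([round(a, 3) for a in areas])
print(round(sum(areas), 10))  # total area of all bars
```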


2. Properties of Density Curves

  • Conditions for Density Curves:

    • Must always remain on or above the horizontal axis.

    • Total area under the curve equals 1.

    • The area under the curve over an interval equals the proportion of all observations that fall in that interval.

    • Probability between two values:
      P(a < x < b) equals the area under the curve between a and b.

    • Any single exact value has probability 0:
      P(x = a) = 0 (this is not true for discrete random variables).

    • Consequently, including or excluding the endpoints does not change the probability:
      P(a < x < b) = P(a ≤ x ≤ b) = P(a < x ≤ b) = P(a ≤ x < b)
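The conditions above can be checked numerically. Below is a minimal Python sketch that approximates areas with a midpoint Riemann sum; the triangular density f(x) = x/2 on [0, 2] is a hypothetical example chosen because its areas are easy to verify by hand, not a curve from these notes.

```python
def f(x):
    """Hypothetical triangular density: f(x) = x/2 on [0, 2], zero elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0

def area(pdf, a, b, n=10_000):
    """Approximate the area under pdf between a and b with the midpoint rule."""
    if b <= a:
        return 0.0  # an interval of zero width has zero area
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

# Condition 1: the curve never goes below the horizontal axis (checked on a grid).
assert all(f(x / 100) >= 0 for x in range(-100, 301))

# Condition 2: total area is 1 (the density is zero outside [0, 2]).
total = area(f, 0, 2)

# P(0.5 < x < 1.5) is the area between 0.5 and 1.5; by hand, (1.5**2 - 0.5**2) / 4 = 0.5.
p = area(f, 0.5, 1.5)

# A single value has zero width, so P(x = 1) = 0 and endpoints do not matter.
p_point = area(f, 1.0, 1.0)
print(round(total, 6), round(p, 6), p_point)
```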


3. Example of a Density Curve

  • Uniform Distribution Example:

    • Consider a uniform density curve of constant height 0.2 over the interval [0, 5] (and 0 elsewhere).

    • Verify that the total area under the curve is 1:
      Area = base × height = 5 × 0.2 = 1

    • Probability calculations using this density curve:

      • P(X ≤ 3) = area under the curve from −∞ to 3, which is the rectangle over [0, 3] = 3 × 0.2 = 0.6

      • P(1 ≤ X ≤ 2) = area under the curve from 1 to 2 = 1 × 0.2 = 0.2

      • P(X > 3) = area under the curve from 3 to ∞, which is the rectangle over [3, 5] = 2 × 0.2 = 0.4

    • Alternatively, use the complement rule:
      P(X > 3) = 1 - P(X ≤ 3) = 1 - 0.6 = 0.4
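Because this density is a flat rectangle, every probability above is just width × height. The Python sketch below (the helper uniform_prob is a hypothetical name, not from the notes) clips the requested interval to [0, 5] before multiplying, which is why the −∞ and ∞ limits reduce to widths of 3 and 2.

```python
def uniform_prob(a, b, low=0.0, high=5.0):
    """P(a <= X <= b) for a uniform density of height 1/(high - low) on [low, high].

    The area is a rectangle: (overlap of [a, b] with [low, high]) x height.
    Endpoints do not matter, since single points have probability 0.
    """
    height = 1.0 / (high - low)          # 0.2 for the interval [0, 5]
    left, right = max(a, low), min(b, high)
    width = max(0.0, right - left)       # no overlap means zero area
    return width * height

inf = float("inf")
p_le_3 = uniform_prob(-inf, 3)           # P(X <= 3)
p_1_2 = uniform_prob(1, 2)               # P(1 <= X <= 2)
p_gt_3 = uniform_prob(3, inf)            # P(X > 3)
print(round(p_le_3, 10), round(p_1_2, 10), round(p_gt_3, 10))
print(round(1 - p_le_3, 10))             # complement rule: same value as p_gt_3
```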


4. Understanding Normal Distribution

  • Many numerical variables, such as heights, weights, and lifetimes of light bulbs, have bell-shaped histograms.

  • The normal distribution serves as an effective model for such types of data.

  • It is the most important and most widely used probability distribution.


5. Properties of Normal Distributions

  • Normal distribution curves have specific characteristics:

    • Symmetric, unimodal, and bell-shaped.

    • Each curve is determined by its mean (µ) and standard deviation (σ).

    • The mean (µ) marks the center of the distribution, where the density curve reaches its peak.

    • The standard deviation (σ) controls the spread of the curve: a larger σ gives a lower, wider curve, and a smaller σ gives a taller, narrower one.

    • Notation for normal models:
      N(µ, σ) denotes a Normal distribution with mean µ and standard deviation σ.
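The Normal density curve has the formula f(x) = exp(−(x − µ)² / (2σ²)) / (σ√(2π)). Python's standard library (3.8 and later) provides statistics.NormalDist, which evaluates this curve with its pdf method. The short sketch below uses it to check the properties listed above; the models N(0, 1) and N(0, 2) are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)   # N(0, 1)
wide = NormalDist(mu=0, sigma=2)     # N(0, 2): same center, twice the spread

# Symmetric about the mean: equal heights at the same distance on either side.
symmetric = math.isclose(narrow.pdf(-1.5), narrow.pdf(1.5))

# Unimodal with the peak at the mean: the height at mu beats points just beside it.
peak_at_mean = narrow.pdf(0) > narrow.pdf(0.1) and narrow.pdf(0) > narrow.pdf(-0.1)

# A larger sigma spreads the same total area (1) more widely, so the peak is lower.
peak_narrow, peak_wide = narrow.pdf(0), wide.pdf(0)
print(symmetric, peak_at_mean, round(peak_narrow, 4), round(peak_wide, 4))
```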


6. Z-scores and Standard Normal Distribution

  • Standardized values are called z-scores.

  • A z-score measures how many standard deviations an observation x lies above (positive) or below (negative) the mean:
    z = (x − µ) / σ

  • If x follows N(µ, σ), its z-score follows the standard Normal distribution:
    z ~ N(0, 1).
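A small Python sketch of standardization follows. The model N(100, 15) and the observation 130 are hypothetical values chosen so that z comes out to exactly 2; statistics.NormalDist from the standard library is used to confirm that a probability computed on the original scale matches the one computed on the z scale.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical Normal model N(100, 15), chosen only for illustration.
mu, sigma = 100, 15
x = 130
z = z_score(x, mu, sigma)   # (130 - 100) / 15 = 2.0

# Standardizing does not change probabilities: P(X <= x) under N(mu, sigma)
# equals P(Z <= z) under the standard Normal N(0, 1).
p_x = NormalDist(mu, sigma).cdf(x)
p_z = NormalDist(0, 1).cdf(z)
print(z, round(p_x, 4), round(p_z, 4))
```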


7. Assessing Normality Assumption

  • Applying a normal model rests on the assumption that the data distribution is indeed normal.

  • Because this assumption cannot be verified directly, check the following instead:

    • Nearly Normal Condition: The distribution should appear unimodal and symmetric.

    • Ways to check this condition include:

      • Making a histogram.

      • Making a Normal probability plot.

      • Drawing a Q-Q plot.


8. Normal Probability Plot

  • A specialized graph used to judge whether a Normal model is appropriate for the data.

  • If the data are nearly Normal, the points fall close to a straight diagonal line.

  • Systematic departures from this line, such as curvature, indicate that the data are not Normal.
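The points of a Normal probability plot can be computed without any plotting library: sort the data and pair each value with the standard Normal quantile expected at its rank. The Python sketch below does this with statistics.NormalDist.inv_cdf and plotting positions (i − 0.5)/n, which is one common convention (other software may use slightly different positions). Summarizing straightness with the correlation of the plotted points is an extra assumption of this sketch rather than something the notes prescribe, and both samples are simulated, not real data.

```python
import random
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each sorted observation with the standard Normal quantile expected at its rank.

    Plotting positions (i - 0.5) / n are an assumed convention; packages differ slightly.
    """
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))

def straightness(points):
    """Pearson correlation of the plotted points; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)  # fixed seed so the simulated samples are reproducible
normal_sample = [random.gauss(50, 10) for _ in range(200)]        # simulated Normal data
skewed_sample = [random.expovariate(1 / 10) for _ in range(200)]  # simulated right-skewed data

r_normal = straightness(normal_plot_points(normal_sample))
r_skewed = straightness(normal_plot_points(skewed_sample))
print(round(r_normal, 3), round(r_skewed, 3))
```

With a plotting library you would draw these (quantile, value) pairs as a scatter plot: the Normal sample's points hug a straight line, while the skewed sample's points bend away from it, which shows up here as a lower correlation.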


9. Visual Representations of Normal Probability Plots

  • Histograms and Normal probability plots compared:

    • Nearly Normal data: the histogram is unimodal and roughly symmetric, and the Normal probability plot is close to a straight line.

    • Skewed data: the histogram is asymmetric, and the Normal probability plot curves away from a straight line.


10. The Empirical Rule (68-95-99.7 Rule)

  • The Empirical Rule comes from patterns observed in data and reflects how well Normal curves model many real variables.

  • The rule applies specifically to Normal distributions and states that:

    • Approximately 68% of observations lie within 1 standard deviation (σ) of the mean (µ), i.e., in µ ± σ.

    • Approximately 95% lie within 2 standard deviations, i.e., in µ ± 2σ.

    • Approximately 99.7% lie within 3 standard deviations, i.e., in µ ± 3σ.

  • The rule is usually illustrated with a bell curve on which these three areas under the curve are shaded.
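The 68-95-99.7 percentages can be checked from the standard Normal curve, since P(µ − kσ < X < µ + kσ) = P(−k < Z < k) for any Normal model. A minimal Python sketch using the standard library's statistics.NormalDist:

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)

# Area under the standard Normal curve between -k and k, for k = 1, 2, 3.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")
```

It prints 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.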


11. Example of Applying the Empirical Rule

  • Case Study:

    • The heights of 112 children follow a Normal distribution with:

      • Mean µ = 104.5

      • Standard deviation σ = 16.3

    • Using the Empirical Rule, fill in the interval µ ± kσ for each value of k:

      • k = 1: µ ± 1σ

      • k = 2: µ ± 2σ

      • k = 3: µ ± 3σ
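The notes leave the intervals as an exercise; the arithmetic can be sketched in Python as follows. Units are not given in the notes, so the output is in whatever units the heights were recorded in.

```python
mu, sigma = 104.5, 16.3  # mean and standard deviation of the children's heights (from the notes)
rule = {1: 0.68, 2: 0.95, 3: 0.997}  # approximate Empirical Rule proportions

intervals = {}
for k, share in rule.items():
    low, high = mu - k * sigma, mu + k * sigma  # the interval mu ± k*sigma
    intervals[k] = (low, high)
    print(f"k = {k}: {low:.1f} to {high:.1f} (about {share:.1%} of the children)")
```

The three intervals come out to 88.2 to 120.8, 71.9 to 137.1, and 55.6 to 153.4.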


12. Closing

  • Thank you for working through this module on density curves and the Normal distribution!