Normal Distribution
Introduction to Normal Distribution
This module covers the normal distribution, a fundamental concept in statistics. Topics include how transformations affect its parameters, the properties of a density curve, the empirical rule, z-scores, the use of the standard normal table for probability calculations, and graphical assessment using Q-Q plots.
Shifting vs. Scaling
These are transformations applied to data that affect its statistical properties:
Shifting (Center and Position): Adding or subtracting a constant value to each data point.
Effect on Mean: The mean changes by the amount of the constant added or subtracted.
Effect on Standard Deviation: The standard deviation (and thus spread) remains unchanged.
Example (Bonus Marks): Let X be the original mark. If a bonus of 5% (interpreted as 5 percentage points) is given, the new mark is S = X + 5. The new mean score is \mu_S = \mu_X + 5, while the standard deviation remains the same: \sigma_S = \sigma_X.
Scaling (Center and Spread): Multiplying or dividing each data point by a nonzero constant value. Scaling changes the center and the spread, but not the overall shape of the distribution.
Effect on Mean: The mean is multiplied or divided by the same constant.
Effect on Standard Deviation: The standard deviation is also multiplied or divided by the absolute value of the same constant.
Effect on Variance: The variance is multiplied or divided by the square of the constant.
Formulas for the standard deviation and variance when scaling by a factor d:
New standard deviation: s_{new} = s_{original} \times |d|
New variance: s_{new}^2 = s_{original}^2 \times d^2
Example (Scaling Marks): Let X be the original mark. If each mark is scaled up by 5%, the new mark is S = 1.05 \times X. The new mean score is \mu_S = 1.05 \times \mu_X, and the new standard deviation is \sigma_S = 1.05 \times \sigma_X. For example, a student who got 60% would have a new mark of 60 \times 1.05 = 63%.
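The shifting and scaling rules above can be checked numerically. The following is a minimal sketch using Python's standard library; the list of marks is hypothetical and invented purely for illustration, and the population formulas (pstdev, pvariance) are an arbitrary choice, since the same rules hold for the sample versions.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical marks (in %), invented for illustration only.
marks = [52.0, 60.0, 68.0, 75.0, 85.0]

shifted = [x + 5 for x in marks]     # S = X + 5 (bonus of 5 percentage points)
scaled = [1.05 * x for x in marks]   # S = 1.05 * X (scaled up by 5%)

summary = {
    "original": (mean(marks), pstdev(marks), pvariance(marks)),
    "shifted": (mean(shifted), pstdev(shifted), pvariance(shifted)),
    "scaled": (mean(scaled), pstdev(scaled), pvariance(scaled)),
}

for name, (m, sd, var) in summary.items():
    print(f"{name:>8}: mean={m:.2f}  sd={sd:.2f}  var={var:.2f}")
```

In the printed summary, the shifted mean should be 5 higher than the original with an identical standard deviation, while the scaled mean and standard deviation should both be 1.05 times the original and the variance 1.05^2 times the original.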
Density Curve
A density curve is a graphical representation of the distribution of a continuous variable. It must satisfy specific conditions:
Non-negativity: A density curve is always on or above the horizontal axis (i.e., its values are never negative, f(x) \ge 0).
Total Area: The total area under the density curve and above the horizontal axis must equal 1 (or 100%), representing the total probability or proportion of observations.
Probability as Area: The probability of an observation falling within an interval (a, b), denoted as P(a < x < b), is equal to the area under the curve between a and b. This also represents the percentage of observations that will fall in that interval.
Continuous Variables Properties: For continuous variables, certain probability statements hold:
The probability of a single exact value is zero: P(x = a) = 0.
The inclusion or exclusion of endpoints does not change the probability for an interval: P(a < x < b) = P(a \le x \le b) = P(a < x \le b) = P(a \le x < b).
Example (Density Curve from Page 5):
Verification of Area = 1 (by geometry): Assume a uniform distribution from 0 to 2 with a height (density) of 0.5. The area is Base \times Height = (2 - 0) \times 0.5 = 2 \times 0.5 = 1.
Probability that X is less than 1: P(X < 1) is the area from 0 to 1. Area = (1 - 0) \times 0.5 = 1 \times 0.5 = 0.5.
Probability that X is greater than 1.5: P(X > 1.5) is the area from 1.5 to 2. Area = (2 - 1.5) \times 0.5 = 0.5 \times 0.5 = 0.25.
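The rectangle arithmetic in this example can be reproduced with a short sketch. The helper name uniform_area and its parameters are hypothetical, chosen only for this illustration; it assumes the same uniform density on the interval from 0 to 2.

```python
def uniform_area(a, b, low=0.0, high=2.0):
    """Area under a uniform density on [low, high] between a and b."""
    height = 1.0 / (high - low)       # 0.5 when the interval is [0, 2]
    left = max(a, low)                # clip the interval to the support
    right = min(b, high)
    return max(right - left, 0.0) * height  # base * height, never negative

total_area = uniform_area(0, 2)            # whole curve
p_less_than_1 = uniform_area(0, 1)         # P(X < 1)
p_greater_than_1_5 = uniform_area(1.5, 2)  # P(X > 1.5)

print(total_area, p_less_than_1, p_greater_than_1_5)
```

Because the probability of a single exact value is zero for a continuous variable, the helper does not need to distinguish between strict and non-strict inequalities at the endpoints.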
Empirical Rule (68-95-99.7% Rule)
This rule applies specifically to data that follows a normal distribution, describing the approximate percentage of observations that fall within a certain number of standard deviations from the mean:
Approximately 68% of observations fall within 1 standard deviation (\pm 1\sigma) of the mean (\mu).
Approximately 95% of observations fall within 2 standard deviations (\pm 2\sigma) of the mean (\mu).
Approximately 99.7% of observations fall within 3 standard deviations (\pm 3\sigma) of the mean (\mu).
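The three percentages in the rule are rounded values of areas under the standard normal curve. As a rough check, the sketch below computes those areas with statistics.NormalDist from the Python standard library (available from Python 3.8); the helper name within is an assumption made for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {within(k):.4f}")
```

The computed areas are close to, but not exactly, 0.68, 0.95, and 0.997, which is why the rule is stated with "approximately".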
Example (Standardized Exam Completion Time):
Given: Mean (\mu = 70 minutes), Standard Deviation (\sigma = 10 minutes).
Percentage of students completing the exam in under an hour (60 minutes):
60 minutes is one standard deviation below the mean (70 - 10 = 60).
According to the empirical rule, approximately 68% of students complete the exam in between 60 and 80 minutes (\mu \pm 1\sigma).
The remaining 100\% - 68\% = 32\% are outside this range. Because the normal curve is symmetric, they are split equally below 60 and above 80, so approximately 32\% / 2 = 16\% complete the exam in under 60 minutes.
Percentage of students completing the exam between 60 and 70 minutes:
This interval is from one standard deviation below the mean to the mean (\mu - \sigma to \mu).
By symmetry of the normal curve, this is half of the 68% interval: approximately 68\% / 2 = 34\%.
Time interval for the central 95% of students:
The central 95% of observations fall within approximately 2 standard deviations of the mean (\mu \pm 2\sigma).
The interval is 70 \pm (2 \times 10) = 70 \pm 20.
Therefore, the central 95% of students complete the exam in approximately 50 to 90 minutes.
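For comparison with the empirical-rule answers in this example, the sketch below computes the corresponding values from the normal model itself, using statistics.NormalDist from the Python standard library (Python 3.8 or later). It assumes completion times are exactly normal with the given mean of 70 and standard deviation of 10; the variable names are illustrative only.

```python
from statistics import NormalDist

# Exam completion time model from the example: mu = 70, sigma = 10 (minutes).
exam = NormalDist(mu=70, sigma=10)

p_under_60 = exam.cdf(60)                  # P(X < 60)
p_60_to_70 = exam.cdf(70) - exam.cdf(60)   # P(60 < X < 70)

# Central 95%: cut off 2.5% in each tail.
low = exam.inv_cdf(0.025)
high = exam.inv_cdf(0.975)

print(f"P(X < 60)      = {p_under_60:.4f}")
print(f"P(60 < X < 70) = {p_60_to_70:.4f}")
print(f"central 95%    = {low:.1f} to {high:.1f} minutes")
```

The results should land close to the empirical-rule answers of 16%, 34%, and 50 to 90 minutes. The small differences arise because the rule rounds the multiplier for the central 95% to 2 standard deviations, where the normal model gives about 1.96.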
Standardized Score / z-score
The z-score is a measure of relative standing that indicates how many standard deviations an observation is away from the mean.
Relative Standing: It positions a data point relative to the rest of the distribution.
No Units: Z-scores are unitless, allowing comparison across different distributions.
Interpretation: A z-score tells