Normality and Z-Scores in the Distribution of Sample Means

Conditions for a Normal Distribution of Sample Means

Determining whether the distribution of sample means ($M$) will be normal depends on two specific criteria. According to the rule of thumb, the distribution of means reaches normality if either of the following conditions is true:

  • Condition 1: The Parent Population is Normal. If the original population from which samples are drawn follows a normal bell-curve distribution, the distribution of sample means will also be normal, regardless of sample size.
  • Condition 2: Sufficient Sample Size. If the parent population is not normal, the distribution of sample means will still be approximately normal if the sample size ($n$) is large enough. The threshold for this is generally considered to be $n \ge 25 \text{ to } 30$.

It is important to note that you only need one of these conditions to be true for the distribution of means to be normal. It is also acceptable if both are true. The distribution will only fail to be normal if both conditions are false (i.e., the parent population is non-normal and the sample size is less than $25$ to $30$).

Analysis of Parent Population Shapes

  • Bimodal or Right-Skewed: These shapes are definitively "not normal." If the parent population is bimodal or right-skewed, we must rely entirely on the sample size ($n$) to determine if the distribution of sample means is normal.
  • Bell Curve Shaped: This is synonymous with a "normal" distribution. If the parent population is described this way, the distribution of sample means is normal regardless of sample size.

Scenarios for Determining Normality

  1. Non-normal Parent Population + Small Sample Size ($n < 25$): The distribution of sample means is not normal because neither condition is met.
  2. Non-normal Parent Population + Large Sample Size ($n \ge 25$): The distribution of sample means is normal based on the second condition (sample size).
  3. Normal Parent Population + Small Sample Size: The distribution of sample means is normal based on the first condition (parent population).
  4. Normal Parent Population + Large Sample Size: The distribution of sample means is normal because both conditions are satisfied.
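
The four scenarios above reduce to a single "either condition" check. The following is a minimal sketch of that rule of thumb; the function name `sample_means_shape` and the `threshold` parameter are illustrative choices, and the default of 25 simply mirrors the cutoff used in the scenarios (the notes give the threshold as roughly 25 to 30).

```python
def sample_means_shape(parent_is_normal: bool, n: int, threshold: int = 25) -> str:
    """Classify the distribution of sample means by the rule of thumb.

    The shape is "normal" if EITHER the parent population is normal
    OR the sample size meets the threshold; otherwise "not normal".
    The threshold default (25) is an assumption taken from the scenarios.
    """
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    if parent_is_normal or n >= threshold:
        return "normal"
    return "not normal"


# The four scenarios, in order:
print(sample_means_shape(parent_is_normal=False, n=10))  # neither condition met
print(sample_means_shape(parent_is_normal=False, n=40))  # sample size condition met
print(sample_means_shape(parent_is_normal=True, n=10))   # parent population condition met
print(sample_means_shape(parent_is_normal=True, n=40))   # both conditions met
```

Passing `threshold=30` applies the more conservative end of the range.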

Descriptive Statistics for the Distribution of Sample Means

When asked to describe the central tendency, variability, and shape of a distribution of sample means, the process is straightforward based on the Central Limit Theorem:

  • Central Tendency (Expected Value): The expected value of the mean ($E[M]$) is always equal to the population mean ($\mu$). If a problem identifies $\mu = 100$, then the expected value for the distribution of sample means is simply $100$.
  • Variability (Standard Error): This is the "width" of the distribution. It is calculated by dividing the population standard deviation ($\sigma$) by the square root of the sample size ($n$). The formula for standard error ($\sigma_M$) is: $\sigma_M = \frac{\sigma}{\sqrt{n}}$.
  • Shape: The shape is described as either "normal" or "not normal" based on the two conditions discussed previously.
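
As a rough illustration, the three descriptions can be computed together. This is only a sketch: the helper name `describe_sample_means` is hypothetical, the cutoff of 25 is an assumed value from the rule of thumb, and the example values $\sigma = 15$ and $n = 9$ are made up for demonstration (only $\mu = 100$ comes from the text).

```python
import math


def describe_sample_means(mu, sigma, n, parent_is_normal, threshold=25):
    """Return (expected value, standard error, shape) for the distribution of sample means."""
    expected_value = mu                    # E[M] always equals the population mean
    standard_error = sigma / math.sqrt(n)  # sigma_M = sigma / sqrt(n)
    # Shape follows the "either condition" rule of thumb (threshold is an assumption).
    shape = "normal" if (parent_is_normal or n >= threshold) else "not normal"
    return expected_value, standard_error, shape


# Hypothetical example: mu = 100 (from the text), sigma = 15 and n = 9 (assumed).
expected_value, standard_error, shape = describe_sample_means(
    mu=100, sigma=15, n=9, parent_is_normal=True
)
print(expected_value, standard_error, shape)
```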

The Z-Score Formula for Sample Means

Previously in statistics (Chapter 6), the focus was on the distribution of individual scores ($X$). In Chapter 7, the focus shifts to the distribution of sample means ($M$). This requires an adjustment to the z-score formula.

Comparison of Formulas

Z-score for Individual Scores: $z = \frac{X - \mu}{\sigma}$

  • $X$: The individual raw score.
  • $\mu$: The mean of the scores.
  • $\sigma$: The standard deviation of the scores.

Z-score for Sample Means: $z = \frac{M - \mu}{\sigma_M}$

  • $M$: The mean of a specific sample.
  • $\mu$: The mean of the distribution of means (which equals the population mean $\mu$, as established by the Central Limit Theorem).
  • $\sigma_M$: The standard error (the standard deviation of the distribution of means).

In this formula, $\sigma_M$ measures how spread out the sample means are, just as $\sigma$ measures how spread out individual scores are.
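
The two formulas differ only in the denominator, which a short sketch can make concrete. The function names below are illustrative, and the numbers ($\mu = 50$, $\sigma = 10$, $n = 25$, and a value of $54$) are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math


def z_for_score(x, mu, sigma):
    """z-score for an individual score X: (X - mu) / sigma."""
    return (x - mu) / sigma


def z_for_sample_mean(m, mu, sigma, n):
    """z-score for a sample mean M: (M - mu) / sigma_M, where sigma_M = sigma / sqrt(n)."""
    sigma_m = sigma / math.sqrt(n)  # standard error
    return (m - mu) / sigma_m


# Same value (54), same population, two different questions.
print(z_for_score(54, mu=50, sigma=10))              # individual score
print(z_for_sample_mean(54, mu=50, sigma=10, n=25))  # mean of a sample of 25
```

Because the standard error is smaller than $\sigma$ whenever $n > 1$, the same raw distance from $\mu$ produces a larger z-score for a sample mean than for an individual score.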

Comparative Example: Individual Score vs. Sample Mean

Data Context: SAT scores form a normal distribution with $\mu = 500$ and $\sigma = 100$.

Part A: Probability for an Individual Student

Question: What is the probability of randomly selecting a student who has an SAT score greater than $525$?

  1. Identify Equation: Use the standard z-score for scores: $z = \frac{X - \mu}{\sigma}$.
  2. Calculate Z-score: $z = \frac{525 - 500}{100} = \frac{25}{100} = +0.25$
  3. Find Proportion: In the unit normal table, look up the tail proportion for $z = 0.25$.
    • Tail Proportion ($p$) = $0.4013$ or $40.13\%$

Part B: Probability for a Sample of Students

Question: What is the probability of randomly selecting a sample of $16$ students with a mean SAT score greater than $525$?

  1. Identify Equation: Use the z-score for sample means: $z = \frac{M - \mu}{\sigma_M}$.
  2. Calculate Standard Error ($\sigma_M$): $\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{16}} = \frac{100}{4} = 25$
  3. Calculate Z-score: $z = \frac{525 - 500}{25} = \frac{25}{25} = +1.00$
  4. Find Proportion: In the unit normal table, look up the tail proportion for $z = 1.00$.
    • Tail Proportion ($p$) = $0.1587$ or $15.87\%$
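
Parts A and B can be checked without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the unit normal table; the variable names are illustrative, and the numbers are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 500, 100   # SAT population parameters from the example
cutoff = 525           # score (Part A) or sample mean (Part B) to exceed
n = 16                 # sample size in Part B

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Part A: individual score, denominator is sigma.
z_individual = (cutoff - mu) / sigma
p_individual = 1 - std_normal.cdf(z_individual)  # upper-tail proportion

# Part B: sample mean, denominator is the standard error sigma_M.
sigma_m = sigma / sqrt(n)
z_sample = (cutoff - mu) / sigma_m
p_sample = 1 - std_normal.cdf(z_sample)          # upper-tail proportion

print(f"Part A: z = {z_individual:+.2f}, p = {p_individual:.4f}")
print(f"Part B: z = {z_sample:+.2f}, p = {p_sample:.4f}")
```

Subtracting the cumulative probability from 1 gives the area to the right of $z$, which is the tail proportion the table lookup returns for a positive z-score.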

Probability and Distribution Shape Analysis

Comparing the results of the SAT example highlights the difference between individual scores and sample averages:

  • Probability Distribution: It is much more likely to find a single individual with a score above $525$ ($40.13\%$ chance) than it is to find a sample of $16$ people whose mean exceeds that same score ($15.87\%$ chance).
  • Standard Error vs. Standard Deviation: Note how the denominator of the z-score changed from $100$ to $25$ when moving from an individual to a sample. The distribution of sample means is much tighter (less spread out) than the distribution of individual scores.
  • Visualizing the Z-Table: When finding probabilities "greater than" a positive z-score, we are always looking for the area in the tail (shading to the right on a standard normal curve).
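
The comparison can also be checked empirically by simulation. The following is a rough sketch, not part of the original example: the function name `simulate_tail`, the number of trials, and the seed are arbitrary choices, and the estimates will vary slightly from run to run if the seed is changed.

```python
import random
from statistics import fmean


def simulate_tail(mu, sigma, n, cutoff, trials, seed=0):
    """Estimate P(sample mean > cutoff) by drawing many samples of size n
    from a normal population with mean mu and standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean > cutoff:
            hits += 1
    return hits / trials


# n = 1 corresponds to selecting a single student; n = 16 to the sample in Part B.
p_individual = simulate_tail(500, 100, n=1, cutoff=525, trials=20000)
p_sample = simulate_tail(500, 100, n=16, cutoff=525, trials=20000)

print(f"individual: {p_individual:.3f}   sample of 16: {p_sample:.3f}")
```

With enough trials, the two estimates should land close to the table values of $0.4013$ and $0.1587$, reflecting how much tighter the distribution of sample means is.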