Understanding Standard Deviations, Empirical Rule, and Z-Scores
Understanding Position Relative to the Mean
- Measuring Position: Our position within a dataset is often described by how many standard deviations we are away from the center (the average or mean).
- 0 standard deviations away: Perfectly at the center, exactly average.
- 1 standard deviation below average: A specific distance below the mean.
- 3 standard deviations below average: Further down from the mean.
- 1 standard deviation above average: A specific distance above the mean.
- Statistical vs. Everyday Vocabulary: There's an overlap between statistical terminology and everyday language, which can be a happy coincidence for understanding.
- Central 95%: Values that fall in the central 95% of a distribution are considered "typical," "usual," "common," "expected," or "not significant."
Probability and Shaded Areas
- Visualizing Probability: We start to understand probability in terms of shaded areas under a curve, particularly a normal distribution curve.
- Examples: This concept applies to a wide range of data points, such as 95% of books, 95% of health scores, or 95% of heights.
The Empirical Rule (68-95-99.7 Rule)
- Core Principle: This rule provides a quick estimate for the proportion of data that falls within one, two, or three standard deviations from the mean in a normal distribution.
- One Standard Deviation: If you go one standard deviation up and one standard deviation down from the center, you capture the central 68% of the population.
- This is true regardless of the specific units (e.g., 50 feet, 3 inches, 80 dollars).
- Two Standard Deviations: If you go two standard deviations up and two standard deviations down from the center, you capture the central 95% of the population.
- Three Standard Deviations: If you go three standard deviations up and three standard deviations down from the center, you capture the central 99.7% of the population.
- Importance of Sketching: Even in a technologically advanced era, sketching these curves on paper is highly beneficial for understanding, as it's often simpler than complex notation.
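As a rough check on the three percentages above, the sketch below computes the central area of a normal curve within k standard deviations, using the standard identity P(|Z| ≤ k) = erf(k/√2). The function name `central_proportion` is a hypothetical helper, not something from the notes.

```python
import math


def central_proportion(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| <= k) equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints values close to 0.68, 0.95, and 0.997 respectively.
    print(f"within {k} SD of the mean: {central_proportion(k):.4f}")
```

The Empirical Rule rounds these values, so the rule is a quick estimate rather than an exact figure.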
Applying the Empirical Rule: Example Calculation
- Scenario: Determining the proportion of the curve between specific standard deviation points, such as from +1 to +2 standard deviations.
- Methodology:
- The central 95% covers from −2 to +2 standard deviations.
- The central 68% covers from −1 to +1 standard deviations.
- To find the area between −2 and −1 (or +1 and +2) standard deviations, subtract the central 68% from the central 95% to get the total area of the two outer segments: (95%−68%)=27%.
- Since the normal curve is symmetrical, divide this difference by 2 to get the area of one segment: 27%/2=13.5%. Therefore, 13.5% of the data falls between +1 and +2 standard deviations (and similarly between −2 and −1 standard deviations).
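The subtraction-and-halving steps above can be sketched in a few lines and compared with the area computed directly from the normal curve. This is a minimal sketch; `normal_cdf` is a hypothetical helper built on `math.erf`, and the variable names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


# Empirical Rule estimate: (central 95% - central 68%) split over two symmetric segments.
rule_estimate = (0.95 - 0.68) / 2

# Area between +1 and +2 standard deviations taken directly from the curve.
exact = normal_cdf(2) - normal_cdf(1)

print(f"Empirical Rule estimate: {rule_estimate:.3f}")
print(f"Area from the curve:     {exact:.4f}")
```

The two results agree to about three decimal places, which is why the 13.5% figure is a reasonable estimate.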
Relative vs. Absolute Comparisons: The Boy and Girl Height Example
- The Problem: Comparing the heights of an 8-year-old boy who is 49 inches tall and a 6-year-old girl who is 47 inches tall.
- Absolute Terms: In absolute terms, the boy is taller (49 inches is greater than 47 inches). However, this comparison is incomplete without context.
- Need for Context: To make a meaningful comparison, contextual factors like age, gender, and individual growth patterns (e.g., growth spurts) must be considered.
- Analogy to Fractions: Just as you need a common denominator to compare fractions (e.g., 110/123 vs. 57/71), you need a common frame of reference to compare values from different distributions.
- Analogy to Money Value: Comparing 5 dollars today to 2 dollars in 1980 is difficult without adjusting for inflation, as the units aren't directly comparable.
- Information Required for Relative Comparison: To compare quantities on an equal footing, you need the mean (average) and the standard deviation for each group being compared.
Boy's Height Analysis (8-year-olds)
- Population Mean Height (μ): 50 inches.
- Population Standard Deviation (σ): 2.2 inches.
- Normal Range (using Empirical Rule):
- 68% Range: Between 47.8 inches and 52.2 inches (i.e., μ±1σ=50±2.2).
- 95% Range: Between 45.6 inches and 54.4 inches (i.e., μ±2σ=50±2(2.2)).
- Boy's Height: 49 inches.
- This height is below average for an 8-year-old boy (49 < 50).
Girl's Height Analysis (6-year-olds)
- Population Mean Height (μ): 46 inches.
- Population Standard Deviation (σ): 2.1 inches.
- Girl's Height: 47 inches.
- This height is above average for a 6-year-old girl (47 > 46).
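The ranges for both children can be worked out the same way, as mean ± k standard deviations. The sketch below uses the means and standard deviations given above; the helper `sd_range` and the dictionary layout are assumptions made for illustration.

```python
def sd_range(mean: float, sd: float, k: int) -> tuple[float, float]:
    """Lower and upper bounds of the interval mean ± k standard deviations."""
    return (mean - k * sd, mean + k * sd)


# Heights, means, and standard deviations (in inches) as stated in the notes.
children = {
    "boy (8-year-olds)": {"height": 49.0, "mean": 50.0, "sd": 2.2},
    "girl (6-year-olds)": {"height": 47.0, "mean": 46.0, "sd": 2.1},
}

for label, c in children.items():
    for k in (1, 2):
        low, high = sd_range(c["mean"], c["sd"], k)
        inside = low <= c["height"] <= high
        print(f"{label}: ±{k} SD range {low:.1f} to {high:.1f} inches, "
              f"height {c['height']:.0f} inside: {inside}")
```

Under these numbers, each child's height lies inside the ±1 standard deviation range for their own group.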
The Z-Score: Standardizing Comparisons
- Definition: The z-score (or standard score) represents the number of standard deviations an individual data point is from the mean of its distribution. It standardizes values, allowing for direct comparison across different datasets.
- Formula: z=(x−μ)/σ
- x is the individual data point.
- μ is the population mean.
- σ is the population standard deviation.
- Boy's Z-score:
- He is 49 inches tall, mean is 50, standard deviation is 2.2.
- z=(49−50)/2.2≈−0.45
- Interpretation: The boy is approximately 0.45 standard deviations below the average height for an 8-year-old boy.
- Girl's Z-score:
- She is 47 inches tall, mean is 46, standard deviation is 2.1.
- z=(47−46)/2.1≈0.48
- Interpretation: The girl is approximately 0.48 standard deviations above the average height for a 6-year-old girl.
- Relative Conclusion: Even though the boy is taller in absolute terms, the girl is taller relative to her age group. Her z-score is positive and his is negative, and its magnitude is slightly larger than his (meaning she is a little further above her mean than he is below his).
- Practical Implications for Parents: Both children are considered "pretty average" for their age groups. Both fall within the central 68% range and are quite close to their respective means, indicating nothing unusual or special about their heights compared to their peers.
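The z-score formula z=(x−μ)/σ translates directly into code. The sketch below applies it to both children using the values from the notes; the function name `z_score` and the guard against a non-positive standard deviation are illustrative choices, not part of the original material.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


boy_z = z_score(49, 50, 2.2)   # 8-year-old boy
girl_z = z_score(47, 46, 2.1)  # 6-year-old girl

print(f"boy:  z = {boy_z:.3f}")
print(f"girl: z = {girl_z:.3f}")

# A larger z-score means a taller height relative to one's own group.
print("girl is relatively taller:", girl_z > boy_z)
```

Because both z-scores are well inside the range from −1 to +1, the code is consistent with describing both children as close to average.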
Requirement for Normal Distribution
- Application Scope: The Empirical Rule percentages (68%, 95%, 99.7%) hold only when the data follows an approximately bell-shaped (normal) distribution. A z-score can be computed for any data that has a mean and standard deviation, but reading it through those percentages assumes normality.
- Counter-example: Lottery Tickets: Lottery numbers follow a uniform distribution (every number is equally likely), so they do not fit this model. The values do not cluster around the mean, so the Empirical Rule percentages and the usual interpretation of z-scores do not apply.
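To illustrate why the Empirical Rule percentages fail for a uniform distribution, the sketch below assumes a hypothetical lottery in which the numbers 1 through 49 are equally likely (the range 1 to 49 is an assumption, not from the notes). It counts what share of the numbers falls within one and two standard deviations of the mean.

```python
import statistics

# Hypothetical lottery: every number from 1 to 49 is equally likely.
numbers = list(range(1, 50))
mean = statistics.mean(numbers)
sd = statistics.pstdev(numbers)  # population standard deviation


def share_within(k: float) -> float:
    """Share of the lottery numbers within k standard deviations of the mean."""
    inside = [n for n in numbers if abs(n - mean) <= k * sd]
    return len(inside) / len(numbers)


print(f"mean = {mean}, sd = {sd:.2f}")
print(f"within 1 SD: {share_within(1):.3f}  (Empirical Rule would say about 0.68)")
print(f"within 2 SD: {share_within(2):.3f}  (Empirical Rule would say about 0.95)")
```

Under this assumption, noticeably fewer than 68% of the numbers fall within one standard deviation, and every number falls within two, so neither figure matches the rule.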
Connecting Standard Deviations to Scenario-Specific Units
- Bridging the Gap: It's crucial to connect the abstract concept of standard deviations to the concrete units of the scenario (e.g., inches, heart rate).
- Example: Saying a child is one standard deviation below average is equivalent to saying that an 8-year-old boy is 47.8 inches tall (50 − 2.2). Similarly, a heart rate that sits far above the mean can be described concretely: it is concerning because only about 2% of people have one that fast.
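Going from standard deviations back to scenario units is the z-score formula rearranged: x = μ + zσ. The sketch below is a minimal illustration using the height figures from the notes; `value_from_z` is a hypothetical helper name.

```python
def value_from_z(z: float, mean: float, sd: float) -> float:
    """Convert a z-score back into the original units: x = mean + z * sd."""
    return mean + z * sd


# One standard deviation below average for 8-year-old boys (mean 50, SD 2.2 inches).
boy_minus_one_sd = value_from_z(-1, 50, 2.2)

# One standard deviation above average for 6-year-old girls (mean 46, SD 2.1 inches).
girl_plus_one_sd = value_from_z(1, 46, 2.1)

print(f"8-year-old boy at z = -1: {boy_minus_one_sd:.1f} inches")
print(f"6-year-old girl at z = +1: {girl_plus_one_sd:.1f} inches")
```

Stating the result in inches rather than in standard deviations is what makes the comparison meaningful to someone outside a statistics class.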