Normal Distribution Notes: Venus Williams First Serve Speeds (Ch 2.2)
Setup and Notation
In these problems, we model Venus Williams’ first-serve speeds as a normal distribution. The given parameters are:
- Mean (μ) = 115 mph
- Standard deviation (σ) = 6 mph
Let X denote the first-serve speed. Then X ∼ N(μ, σ²).
We distinguish between solving for probabilities (areas under the curve) and solving for data values (specific speeds corresponding to a given percentile). The calculator functionality discussed uses two main tools:
- Normal CDF: used to find probabilities (areas under the curve) for a given interval or tail.
- Inverse Normal (Inverse N): used to find data values corresponding to a given percentile (area to the left).
A few practical notes from the session:
- When solving for a probability like P(X > a), you can use normalCDF with lower bound a and a very large upper bound (e.g., 10,000) to approximate infinity, or compute 1 − Φ((a − μ)/σ).
- When solving for a percentile (a data value at a given left-tail probability p), use invNorm(p, μ, σ). If your calculator only supports left-tail input and you need the right tail (e.g., the top 10%), use the complement: a right-tail area of 0.10 corresponds to a left-tail area of 0.90, so enter invNorm(0.90, μ, σ).
- In all TI calculator inputs, be mindful of proper input order and delimiters (e.g., normalCDF(lower, upper, μ, σ)).
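For readers working away from the calculator, the two tools can be approximated in Python with the standard library's statistics.NormalDist. This is only a sketch: the helper names normal_cdf and inv_norm are hypothetical stand-ins chosen to mirror the calculator's argument order, not TI functions, and the standard normal is used just to keep the numbers simple.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mu, sigma):
    """Area under N(mu, sigma^2) between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(p_left, mu, sigma):
    """Data value with area p_left to its left (hypothetical helper)."""
    return NormalDist(mu, sigma).inv_cdf(p_left)

# A very large upper bound stands in for infinity...
tail_big_bound = normal_cdf(1, 10_000, 0, 1)
# ...and agrees with the complement 1 - Φ(1).
tail_complement = 1 - NormalDist().cdf(1)

# Right-tail area 0.10 is entered as left-tail area 1 - 0.10 = 0.90.
p_right = 0.10
z_cut = inv_norm(1 - p_right, 0, 1)

print(f"tail via large bound: {tail_big_bound:.4f}")
print(f"tail via complement:  {tail_complement:.4f}")
print(f"z cutting off the top 10%: {z_cut:.4f}")
```

Keeping the argument order identical to normalCDF(lower, upper, μ, σ) makes it easier to check calculator entries against the script.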
Part A: Proportion exceeding 120 mph
Setup and goal:
- We want P(X > 120) for X ∼ N(μ = 115, σ = 6).
- We can either use a z-score to transform to the standard normal or use the calculator directly with a normal CDF.
- Z-score form: z = (120 − μ)/σ = (120 − 115)/6 = 5/6 ≈ 0.8333. Then P(X > 120) = 1 − Φ(0.8333).
- Φ(0.8333) ≈ 0.7977, so P(X > 120) ≈ 1 − 0.7977 = 0.2023.
Calculator route (as demonstrated):
- Use Normal CDF with lower = 120, upper = 10,000 (or another large number), μ = 115, σ = 6:
P(X > 120) = normalCDF(120, 10000, 115, 6) ≈ 0.2023.
- Note the input formatting: some calculators require commas between inputs (e.g., 120, 10000, 115, 6).
- Result: approximately 0.2023.
Interpretation:
- About 20.23% of Venus’ serves are expected to exceed 120 mph under the provided normal model.
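As a cross-check on Part A, the z-score route and the direct route can be compared in a few lines of Python. This is a sketch using statistics.NormalDist rather than the calculator; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 115, 6   # parameters given in the notes
speed = 120

# Route 1: standardize, then use the standard normal CDF Φ.
z = (speed - mu) / sigma
p_z_route = 1 - NormalDist().cdf(z)

# Route 2: work directly with N(115, 6^2).
p_direct = 1 - NormalDist(mu, sigma).cdf(speed)

print(f"z = {z:.4f}")
print(f"P(X > 120) via z-score: {p_z_route:.4f}")
print(f"P(X > 120) directly:    {p_direct:.4f}")
```

Both routes should agree up to floating-point rounding, since standardizing only rescales the horizontal axis.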
Part B: Proportion between 100 mph and 110 mph
Goal:
- Compute P(100 < X < 110).
- Use either the CDF difference or the standard-normal transformation:
P(100 < X < 110) = Φ((110 − μ)/σ) − Φ((100 − μ)/σ) = Φ(−0.8333) − Φ(−2.5).
- Values: Φ(−0.8333) ≈ 0.2023 and Φ(−2.5) ≈ 0.0062, giving
P(100 < X < 110) ≈ 0.2023 − 0.0062 = 0.1961.
Calculator route:
- Using normalCDF on [100, 110] with μ = 115, σ = 6:
P(100 < X < 110) = normalCDF(100, 110, 115, 6).
- Result (as shown): approximately 0.1961.
Interpretation:
- About 19.6% of serves fall between 100 mph and 110 mph under this model.
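The CDF-difference idea in Part B can be sketched the same way. The code below assumes Python's statistics.NormalDist as a stand-in for the calculator's Normal CDF and shows each Φ term separately so the subtraction is visible.

```python
from statistics import NormalDist

mu, sigma = 115, 6
std = NormalDist()            # standard normal, for the Φ terms

z_low = (100 - mu) / sigma    # -2.5
z_high = (110 - mu) / sigma   # about -0.8333

phi_low = std.cdf(z_low)      # area to the left of 100 mph
phi_high = std.cdf(z_high)    # area to the left of 110 mph

# Area between the two speeds = left area at 110 minus left area at 100.
p_between = phi_high - phi_low

print(f"Φ({z_high:.4f}) = {phi_high:.4f}")
print(f"Φ({z_low:.4f}) = {phi_low:.4f}")
print(f"P(100 < X < 110) ≈ {p_between:.4f}")
```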
Part C: The fastest 10% (the 90th percentile) and the inverse norm
Goal:
- Find the speed x such that P(X ≤ x) = 0.90, i.e., the 90th percentile. This gives the speed above which the top 10% of serves fall.
Method 1 (inverse norm with left-tail probability):
- Since invNorm typically uses the left-tail area, use area p = 0.90 with μ and σ: x = invNorm(0.90, μ, σ).
- With μ = 115 and σ = 6, we get x = invNorm(0.90, 115, 6) = 115 + (1.2816)(6).
- Resulting value: approximately 122.69 mph.
Notes on calculator tails:
- Some calculators let you specify the tail (left/center/right). For the speed that cuts off the top 10%, enter area 0.10 with the right-tail option, or equivalently area 0.90 with the left-tail option. On devices without a tail option, invNorm takes the left-tail area, so use 0.90 to obtain the 90th percentile.
Interpretation:
- The speed that cuts off the fastest 10% of serves is about 122.69 mph.
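A short Python sketch of the Part C calculation, assuming statistics.NormalDist.inv_cdf as the counterpart of invNorm with a left-tail area. It also plugs the answer back into the CDF to confirm that 10% of the model lies above it.

```python
from statistics import NormalDist

mu, sigma = 115, 6
serve = NormalDist(mu, sigma)

# Left-tail area 0.90 gives the 90th percentile directly.
x_90 = serve.inv_cdf(0.90)

# Same value from the standard normal: x = mu + z * sigma.
z_90 = NormalDist().inv_cdf(0.90)
x_from_z = mu + z_90 * sigma

# Round trip: the area to the right of x_90 should be 0.10.
top_share = 1 - serve.cdf(x_90)

print(f"z for the 90th percentile: {z_90:.4f}")
print(f"90th percentile speed: {x_90:.2f} mph")
print(f"share of serves above it: {top_share:.4f}")
```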
IQR for Venus Williams’ first serve speeds
Goal:
- Find Q1 (25th percentile) and Q3 (75th percentile) and compute the interquartile range (IQR).
Formulas:
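- Q1 = μ + z_{0.25}·σ and Q3 = μ + z_{0.75}·σ.
- With μ = 115, σ = 6, and z-values: z_{0.25} ≈ −0.67449, z_{0.75} ≈ 0.67449.
- Therefore: Q1 ≈ 115 − (0.67449)(6) ≈ 110.95 mph and Q3 ≈ 115 + (0.67449)(6) ≈ 119.05 mph.
- Interquartile range: IQR = Q3 − Q1 = 2(0.67449)(6) ≈ 8.09 mph.
Observed result in the session:
- IQR ≈ 8.09 mph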
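The quartile arithmetic can be checked with a brief Python sketch, again assuming statistics.NormalDist in place of the calculator's inverse normal. It also illustrates that, for any normal model, the IQR is about 1.349σ.

```python
from statistics import NormalDist

mu, sigma = 115, 6
serve = NormalDist(mu, sigma)

q1 = serve.inv_cdf(0.25)   # 25th percentile
q3 = serve.inv_cdf(0.75)   # 75th percentile
iqr = q3 - q1

# For a normal model the IQR is a fixed multiple of sigma.
z_75 = NormalDist().inv_cdf(0.75)
iqr_from_z = 2 * z_75 * sigma

print(f"Q1 ≈ {q1:.2f} mph, Q3 ≈ {q3:.2f} mph")
print(f"IQR ≈ {iqr:.2f} mph (2 · {z_75:.5f} · σ)")
```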
Assessing normality with the 68-95-99.7 rule
Idea:
- The rule states that for a normal distribution:
- About 68% of data lie within one standard deviation of the mean.
- About 95% lie within two standard deviations.
- About 99.7% lie within three standard deviations.
Examples discussed:
- Calories per serving data (mean ≈ 106.883):
- Within one SD: 81.8% (counted from the data), which is notably higher than 68%; this is evidence against normality for this data set.
- Within two SD: 99.2% (somewhat above 95%), and within three SD: 100% (consistent with the 99.7% benchmark). Overall, this distribution did not closely fit the 68-95-99.7 rule, so it is not approximately normal.
- Chapter 1 test scores (sample size 32):
- Within one SD: ≈ 65.63% (close to 68%), within two SD: ≈ 93.75% (close to 95%), within three SD: 100% (as expected).
- This data set was judged to be approximately normal since the proportions were sufficiently close to the rule, unlike the cereal example.
How to justify a normal-approximation claim:
- Compare the observed proportions within 1, 2, and 3 SDs to 68%, 95%, and 99.7%. If they are reasonably close, you may justify using a normal model. If not, the distribution is not approximately normal.
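- The benchmarks themselves come from the normal CDF: on the standard normal, normalCDF(−1, 1, 0, 1) ≈ 0.6827, normalCDF(−2, 2, 0, 1) ≈ 0.9545, and normalCDF(−3, 3, 0, 1) ≈ 0.9973. The observed proportions come from counting the data values that fall in each interval.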
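The comparison described above can be sketched in Python. The function name proportion_within is hypothetical, and the scores list is invented for illustration; it is not the session's data. The sketch also assumes the sample standard deviation (statistics.stdev); the notes do not say which version the session used.

```python
from statistics import NormalDist, mean, stdev

def proportion_within(data, k):
    """Fraction of data values within k sample SDs of the mean (hypothetical helper)."""
    m, s = mean(data), stdev(data)
    inside = [x for x in data if m - k * s <= x <= m + k * s]
    return len(inside) / len(data)

# Benchmarks from the standard normal CDF: about 0.6827, 0.9545, 0.9973.
std = NormalDist()
benchmarks = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}

# Hypothetical test scores, invented for illustration only.
scores = [62, 68, 71, 74, 75, 77, 78, 80, 81, 83, 84, 86, 88, 91, 97]

for k in (1, 2, 3):
    observed = proportion_within(scores, k)
    print(f"within {k} SD: observed {observed:.1%}, normal model {benchmarks[k]:.1%}")
```

How close is "reasonably close" remains a judgment call, as in the session's examples; the code only produces the numbers to compare.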
Normal probability plots
Concept:
- A normal probability plot (also called a Q-Q plot against the standard normal) compares the data against a theoretical normal distribution by plotting the data (or their order statistics) versus the corresponding z-scores (or normal quantiles).
What to look for:
- A linear pattern in the plot indicates that the data are approximately normal.
- Deviations from linearity suggest departures from normality.
Examples from the session:
- IQ scores: The normal probability plot showed a roughly linear pattern, supporting approximate normality.
- Number of siblings: The plot was clearly curved/nonlinear, indicating non-normality (and the underlying distribution is skewed).
- Calories per serving: Despite a symmetric, bell-shaped dot plot, the normal probability plot was not linear, suggesting the data are not approximately normal.
Takeaway:
- Do not rely solely on a symmetric-looking dot plot; check the normal probability plot to assess normality.
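To make the construction of a normal probability plot concrete, here is a sketch that computes the plotted points without drawing them. Several choices are assumptions not taken from the session: the plotting positions (i − 0.5)/n are one common convention among several, the correlation between data and z-scores is used as a rough numeric stand-in for judging linearity by eye, and the siblings list is invented data.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each ordered data value with a standard normal quantile.

    Uses plotting positions (i - 0.5)/n, an assumed convention.
    """
    xs = sorted(data)
    n = len(xs)
    std = NormalDist()
    zs = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

def linearity(points):
    """Pearson correlation of (z, x) pairs; values near 1 suggest a linear plot."""
    zs, xs = zip(*points)
    mz, mx = mean(zs), mean(xs)
    sxy = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sqrt(szz * sxx)

# Hypothetical right-skewed counts, invented for illustration only.
siblings = [0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 5, 8]
r_skewed = linearity(normal_plot_points(siblings))
print(f"correlation for the skewed example: {r_skewed:.3f}")
```

The correlation is only a summary; a curved pattern in the plotted points is still best spotted by looking at the plot itself.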
Practical calculator tips and conceptual takeaways
- Distinguish when you need a probability (normal CDF) versus a data value (inverse normal):
- Use Normal CDF for P(a < X < b) or P(X > a).
- Use Inverse Normal to find a data value x corresponding to a given percentile (e.g., 0.90, 0.75).
- When your problem specifies a right tail but your calculator only takes a left-tail input, convert using complements: if you want the right-tail area p, use left-tail input 1 − p.
- For unbounded upper limits in normal CDF, use a large upper bound (e.g., 10,000) to approximate infinity.
- Remember the relationships for percentiles in a normal model (x = μ + zσ):
- 90th percentile: x = μ + 1.2816σ
- 75th percentile: x = μ + 0.6745σ
- 25th percentile: x = μ − 0.6745σ
Connections to broader concepts
- These exercises reinforce the core idea that many real-world measurements (like speeds, test scores, biological measurements) can be modeled as approximately normal when each measurement reflects the sum of many small, independent effects. The standard normal framework (z-scores, Φ, invΦ) is fundamental for converting raw data to standardized units, enabling straightforward probability and percentile calculations.
- The 68-95-99.7 rule is a quick heuristic for checking normality and planning sample sizes or confidence intervals in introductory statistics. When data deviate meaningfully from these benchmarks, consider nonparametric methods or transformations.
Quick recap of numerical results (Venus Williams example)
- P(X > 120) ≈ 0.2023
- P(100 < X < 110) ≈ 0.1961
- x_{0.90} ≈ 122.69 mph
- Q1 ≈ 110.95 mph, Q3 ≈ 119.05 mph
- IQR ≈ 8.09 mph
- Normality assessment (example datasets) shows how the 68-95-99.7 rule can support or challenge the assumption of normality depending on how close the observed proportions are to the rule.
Practice and next steps
- Remember to practice both Normal CDF and Inverse Normal with a variety of μ and σ, and to translate between tail areas and left-tail inputs when needed.
- Explore the Canvas 2.2 practice questions and the upcoming quick quizzes as mentioned in the session, to reinforce these concepts before the next assessment.