Chapter 2 Describing Distributions with Numbers
Measuring Center
Mean:
- The arithmetic average of a set of observations.
- Formula: ( \bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} ) where ( n ) is the number of observations.
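As a minimal sketch, the mean formula can be written directly in Python. The function name `mean` and the sample values are illustrative assumptions, not data from the chapter.

```python
def mean(observations):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not observations:
        raise ValueError("mean requires at least one observation")
    return sum(observations) / len(observations)


# Made-up illustrative values (not from the chapter).
sample = [2, 4, 4, 5, 10]
print(mean(sample))  # 5.0
```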
Median:
- The midpoint of a distribution, dividing observations into two equal halves.
- To find the median:
- Arrange observations in ascending order.
- If ( n ) is odd: Median = middle observation.
- If ( n ) is even: Median = average of two middle observations.
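The steps for finding the median can be sketched as a short Python function. The name `median` and the input values are hypothetical, chosen only to show the odd and even cases.

```python
def median(observations):
    """Median: middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(observations)  # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation.
        return ordered[mid]
    # Even n: average of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```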
Comparison between Mean and Median:
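- Comparing the two gives a quick indication of the shape of a distribution.
- In symmetric distributions, the mean and median are close together; in an exactly symmetric distribution they are equal.
- In skewed distributions, the mean is pulled toward the long tail, while the median stays near the center of the data.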
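A small experiment illustrates the comparison. This sketch uses `mean` and `median` from Python's standard `statistics` module; both data sets are made up for illustration and differ only in their largest value.

```python
from statistics import mean, median

# Made-up illustrative data sets (not from the chapter).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]  # one large value creates a long right tail

# Symmetric data: mean and median agree (both are 3).
print(mean(symmetric), median(symmetric))

# Skewed data: the mean rises to 12, while the median stays at 3.
print(mean(right_skewed), median(right_skewed))
```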
Measuring Variability
Variability describes how spread out observations are.
Quartiles:
- Divide data into four parts:
- Q1: Median of the lower half of data.
- Q3: Median of the upper half of data.
- Interquartile Range (IQR): ( IQR = Q3 - Q1 )
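Software packages use slightly different quartile rules, so results may not match this hand method exactly. The sketch below assumes the "median of each half" rule described above and, as an additional assumption, leaves the overall median out of both halves when ( n ) is odd. The function names and data are hypothetical.

```python
from statistics import median


def quartiles(observations):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles require at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the overall median belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


def interquartile_range(observations):
    """IQR = Q3 - Q1."""
    q1, q3 = quartiles(observations)
    return q3 - q1


# Made-up illustrative values (not from the chapter).
data = [1, 3, 5, 7, 9, 11, 13]
print(quartiles(data))            # (3, 11)
print(interquartile_range(data))  # 8
```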
Five-Number Summary:
- Minimum, Q1, Median, Q3, Maximum.
- Combines center and spread into one summary.
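One way to collect the five numbers is sketched below. The `FiveNumberSummary` tuple and the function name are illustrative assumptions, and the quartiles follow the "median of each half" rule with the overall median excluded from both halves when ( n ) is odd.

```python
from collections import namedtuple
from statistics import median

FiveNumberSummary = namedtuple(
    "FiveNumberSummary", "minimum q1 median q3 maximum"
)


def five_number_summary(observations):
    """Return minimum, Q1, median, Q3, and maximum of the data."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the overall median belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return FiveNumberSummary(
        ordered[0], median(lower), median(ordered), median(upper), ordered[-1]
    )


# Made-up illustrative values (not from the chapter).
summary = five_number_summary([13, 1, 9, 5, 3, 11, 7])
print(summary.minimum, summary.q1, summary.median, summary.q3, summary.maximum)
# 1 3 7 11 13
```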
Boxplots:
- Visual representation of the five-number summary.
- Box spans Q1 to Q3 with a line at the median; "whiskers" extend to min and max values.
Spotting Suspected Outliers
Outliers can skew interpretations of data.
Use IQR to identify outliers:
- A point is a suspected outlier if it lies more than 1.5 * IQR above Q3 or more than 1.5 * IQR below Q1.
Example for New York data:
- Q1=13.5, Q3=50, IQR=36.5 → 1.5 * IQR = 54.75
- Q1 - 1.5 * IQR = -41.25; Q3 + 1.5 * IQR = 104.75
- No observations fall outside these bounds, hence there are no suspected outliers.
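The 1.5 * IQR rule can be sketched in Python as follows. The quartiles Q1 = 13.5 and Q3 = 50 come from the example above; the function names and the list of observations are hypothetical, since the underlying New York data are not reproduced here.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(observations, q1, q3):
    """Return the observations that fall outside the 1.5 * IQR fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in observations if x < low or x > high]


# Quartiles taken from the example above.
low, high = outlier_fences(13.5, 50)
print(low, high)  # -41.25 104.75

# Hypothetical observations (NOT the New York data), to show the rule flagging a point.
print(suspected_outliers([5, 20, 60, 120], 13.5, 50))  # [120]
```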
Measuring Variability: Standard Deviation
The standard deviation measures how far observations typically lie from the mean.
- Calculation Steps:
- Calculate the mean.
- Compute each deviation from the mean.
- Square each deviation.
- Sum the squared deviations and divide by ( n - 1 ) to get the variance: ( s^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} )
- Take the square root to get the standard deviation: ( s = \sqrt{s^2} ).
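The calculation steps above can be sketched in Python, one line per step. The function name and the sample values are illustrative assumptions, not data from the chapter.

```python
import math


def sample_standard_deviation(observations):
    """Sample standard deviation s, using the n - 1 divisor."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(observations) / n                       # calculate the mean
    deviations = [x - mean for x in observations]      # deviation of each observation
    squared = [d ** 2 for d in deviations]             # square each deviation
    variance = sum(squared) / (n - 1)                  # variance s^2
    return math.sqrt(variance)                         # standard deviation s


# Made-up illustrative values: mean 5, squared deviations 9, 1, 1, 0, 25.
sample = [2, 4, 4, 5, 10]
print(sample_standard_deviation(sample))  # 3.0
```

Here the squared deviations sum to 36, so the variance is 36 / 4 = 9 and the standard deviation is 3.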
Properties of Standard Deviation:
- Always non-negative; zero only when all observations are equal.
- Sensitive to outliers: because deviations are squared, extreme observations contribute heavily.
- Units match the original data, unlike variance which is in squared units.
Choosing Measures of Center and Variability
- Mean and Standard Deviation: Appropriate for symmetric, outlier-free distributions.
- Median and IQR: Better for skewed distributions or those with outliers.
Examples of Technology in Statistics
- Tools such as graphing calculators, JMP, and Excel can perform statistical operations.
- Familiarity with outputs is crucial for interpreting results regardless of software.
Organizing a Statistical Problem
- Follow a four-step process:
- State: Define the practical question.
- Plan: Identify necessary statistical operations.
- Solve: Conduct calculations and create graphs.
- Conclude: Provide a practical conclusion relevant to the problem context.