Chapter 2 Describing Distributions with Numbers

Measuring Center

  • Mean:

    • The arithmetic average of a set of observations.
    • Formula: ( \bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} ) where ( n ) is the number of observations.
  • Median:

    • The midpoint of a distribution, dividing observations into two equal halves.
    • To find the median:
    • Arrange observations in ascending order.
    • If ( n ) is odd: Median = middle observation.
    • If ( n ) is even: Median = average of two middle observations.
  • Comparison between Mean and Median:

    • Useful for understanding data distributions.
    • In symmetric distributions, they are close or equal.
    • In skewed distributions, the mean is pulled toward the long tail, while the median stays closer to the center of the data.
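
A minimal Python sketch of the two measures of center follows. The helper name median_by_hand and both small data sets are made up for illustration; they do not come from the chapter. The second data set replaces the largest value with an extreme one to show how the mean moves while the median stays put.

```python
from statistics import mean, median


def median_by_hand(values):
    """Median following the steps above: sort, then take the middle."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle observation
        return ordered[mid]
    # even n: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, chosen only to illustrate the comparison.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 4, 6, 8, 100]      # same data with one extreme high value

sym_mean, sym_median = mean(symmetric), median_by_hand(symmetric)
skew_mean, skew_median = mean(right_skewed), median_by_hand(right_skewed)

print("symmetric:   ", sym_mean, sym_median)    # mean and median agree
print("right-skewed:", skew_mean, skew_median)  # mean pulled toward the tail
```

In the skewed set the mean rises to 24 while the median is still 6, which matches the comparison in the notes.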

Measuring Variability

  • Variability describes how spread out observations are.

  • Quartiles:

    • Divide data into four parts:
    • Q1: Median of the lower half of data.
    • Q3: Median of the upper half of data.
    • Interquartile Range (IQR): ( IQR = Q3 - Q1 )
  • Five-Number Summary:

    • Minimum, Q1, Median, Q3, Maximum.
    • Combines center and spread into one summary.
  • Boxplots:

    • Visual representation of the five-number summary.
    • Box spans Q1 to Q3 with a line at the median; "whiskers" extend to min and max values.
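
The quartile rule above (Q1 and Q3 as medians of the lower and upper halves) can be sketched in Python as shown below. This is one possible reading: it assumes the overall median is left out of both halves when ( n ) is odd, and software packages often use slightly different quartile rules. The function name five_number_summary and the data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations. When n is odd, the middle
    observation is excluded from both halves (an assumption of this sketch).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical data, not from the chapter.
data = [1, 3, 4, 7, 8, 10, 12, 15, 20]
summary = five_number_summary(data)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1

print(summary)   # (1, 3.5, 8, 13.5, 20)
print(iqr)       # 10.0
```

These five values are exactly what a boxplot draws: the box runs from 3.5 to 13.5 with a line at 8, and the whiskers reach 1 and 20.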

Spotting Suspected Outliers

  • Outliers can distort numerical summaries and the conclusions drawn from them.

  • Use IQR to identify outliers:

    • A point is a suspected outlier if it lies more than 1.5 * IQR above Q3 or more than 1.5 * IQR below Q1.
  • Example for New York data:

    • Q1=13.5, Q3=50, IQR=36.5 → 1.5 * IQR = 54.75
    • Q1 - 1.5 * IQR = -41.25; Q3 + 1.5 * IQR = 104.75
    • No observations fall outside these bounds, so there are no suspected outliers.
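
The 1.5 * IQR rule can be checked with a short Python sketch. The quartiles Q1 = 13.5 and Q3 = 50 are taken from the example above; the function names and the three test values passed to suspected_outliers are hypothetical, since the New York observations themselves are not listed in these notes.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 * IQR fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


# Quartiles from the example in the notes.
low, high = outlier_fences(13.5, 50)
print(low, high)   # -41.25 104.75

# Hypothetical observations, used only to exercise the rule.
flagged = suspected_outliers([5, 60, 110], q1=13.5, q3=50)
print(flagged)     # [110]
```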

Measuring Variability: Standard Deviation

  • Measures the typical distance of the observations from their mean.

    • Calculation Steps:
    1. Calculate the mean.
    2. Compute each deviation from the mean.
    3. Square each deviation.
    4. Calculate the variance by summing the squared deviations and dividing by ( n - 1 ): ( s^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} )
    5. Take the square root to get the standard deviation: ( s = \sqrt{s^2} ).
  • Properties of Standard Deviation:

    • Always non-negative; zero only when all observations are equal.
    • Sensitive to outliers.
    • Units match the original data, unlike variance which is in squared units.
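
The five calculation steps can be followed line by line in the Python sketch below. The function names and the data set are hypothetical. The result is compared with statistics.stdev from the standard library, which also divides by ( n - 1 ).

```python
from math import sqrt
from statistics import stdev


def sample_variance(values):
    """Sample variance s^2, following steps 1 to 4 (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n                      # step 1: the mean
    deviations = [x - mean for x in values]     # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]      # step 3: squared deviations
    return sum(squared) / (n - 1)               # step 4: divide the sum by n - 1


def sample_std_dev(values):
    """Sample standard deviation s (step 5: square root of the variance)."""
    return sqrt(sample_variance(values))


# Hypothetical data: mean 3, squared deviations 4 + 1 + 0 + 1 + 4 = 10.
data = [1, 2, 3, 4, 5]
variance = sample_variance(data)
s = sample_std_dev(data)

print(variance)            # 2.5
print(s, stdev(data))      # the two values should agree
```

When every observation is equal, each deviation is zero, so the function returns 0, in line with the first property listed above.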

Choosing Measures of Center and Variability

  • Mean and Standard Deviation: Appropriate for symmetric, outlier-free distributions.
  • Median and IQR: Better for skewed distributions or those with outliers.

Examples of Technology in Statistics

  • Tools such as graphing calculators, JMP, and Excel can perform statistical operations.
  • Familiarity with outputs is crucial for interpreting results regardless of software.

Organizing a Statistical Problem

  • Follow a four-step process:
    1. State: Define the practical question.
    2. Plan: Identify necessary statistical operations.
    3. Solve: Conduct calculations and create graphs.
    4. Conclude: Provide a practical conclusion relevant to the problem context.