Chapter 3 Part 3 Numerically Summarizing Data

3.4 Measures of Position and Outliers

  • Learning Objectives:

    • Determine and interpret z-scores

    • Interpret percentiles

    • Determine and interpret quartiles

    • Determine and interpret the interquartile range

    • Check a set of data for outliers

Objective 1: Determine and Interpret Z-Scores

  • Z-Score Definition:

    • Represents the distance of a data value from the mean in terms of standard deviations.

    • Calculated as:

      • Z = (Data value - Mean) / Standard Deviation

    • Two types of z-scores:

      • Population z-score: z = (x - μ) / σ

      • Sample z-score: z = (x - x̄) / s

    • Characteristics:

      • Unitless

      • The z-scores of an entire data set have mean 0

      • The z-scores of an entire data set have standard deviation 1
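The characteristics listed above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values are hypothetical and do not come from the chapter) and using the population standard deviation from Python's standard `statistics` module.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the population z-score of every value in the list."""
    mu = mean(values)        # population mean
    sigma = pstdev(values)   # population standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)

print(z)                     # each value expressed in standard deviations from the mean
print(round(mean(z), 10))    # mean of the z-scores
print(round(pstdev(z), 10))  # standard deviation of the z-scores
```

For sample data, `statistics.stdev` (which divides by n - 1) would take the place of `pstdev`.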

Example: Comparing Z-Scores (Imene vs. Akito)

  • Imene:

    • Score: 88

    • Mean (μ) = 73.2, Standard Deviation (σ) = 8.5

  • Akito:

    • Score: 91

    • Mean (μ) = 75.8, Standard Deviation (σ) = 9.2

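  • Computed z-scores:

    • Imene: (88 - 73.2) / 8.5 ≈ 1.74

    • Akito: (91 - 75.8) / 9.2 ≈ 1.65

  • Conclusion: Both scores are above their respective means, but Imene's score lies more standard deviations above its mean (1.74 vs. 1.65). Imene therefore performed better in relative terms, even though Akito's raw score is higher.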
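The comparison above can be reproduced with a few lines of Python. This is a small sketch; the helper name `z_score` is chosen here for illustration, and the numbers are those given in the example.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

imene = z_score(88, 73.2, 8.5)
akito = z_score(91, 75.8, 9.2)

print(round(imene, 2), round(akito, 2))  # 1.74 1.65
print(imene > akito)                     # True: Imene's relative standing is higher
```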

Objective 2: Interpret Percentiles

  • Definition:

    • The kth percentile (Pk) is a value where k percent of observations are less than or equal to that value.

  • Example:

    • Antonia's SAT Mathematics score of 600 is at the 74th percentile, meaning about 74% of scores are at or below 600 and about 26% are above it.
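To make the interpretation concrete, the sketch below computes the percentage of observations at or below a given value. The list of scores is hypothetical and is not the SAT distribution from the example; the function name `percent_at_or_below` is likewise an illustrative choice.

```python
def percent_at_or_below(values, x):
    """Percentage of observations that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

# Hypothetical scores, for illustration only.
scores = [450, 480, 510, 530, 560, 580, 600, 640, 690, 720]

print(percent_at_or_below(scores, 600))  # 70.0
```

In this made-up list, a score of 600 would sit at roughly the 70th percentile: 70% of the scores are at or below it and 30% are above it.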

Objective 3: Determine and Interpret Quartiles

  • Definition:

    • Quartiles split data into four equal parts:

      • Q1: 25th percentile

      • Q2: 50th percentile (median)

      • Q3: 75th percentile

  • Finding Quartiles:

    1. Arrange data in ascending order.

    2. Determine Q2 (median).

    3. Split the data into a lower half and an upper half at the median; Q1 is the median of the lower half and Q3 is the median of the upper half.
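The steps above can be sketched in Python as follows. The sketch assumes one common textbook convention: when the number of observations is odd, the median itself is left out of both halves. Statistical software often uses other interpolation rules, so its quartiles may differ slightly. The sample values are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)                 # step 1: ascending order
    n = len(data)
    half = n // 2
    q2 = median(data)                     # step 2: the median
    lower = data[:half]                   # step 3: lower half
    # For odd n, skip the middle observation (assumed convention).
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), q2, median(upper)

sample = [12, 1, 9, 3, 15, 4, 8, 6]       # hypothetical data
print(quartiles(sample))                  # (3.5, 7.0, 10.5)
```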

  • Example:

    • Using Chicago ride-share data:

      • Q1 = 2.2 miles

      • Q2 (Median) = 4.85 miles

      • Q3 = 9.4 miles

Objective 4: Determine and Interpret the Interquartile Range (IQR)

  • Definition:

    • IQR measures the range of the middle 50% of data: IQR = Q3 - Q1.

    • For the ride-share data, IQR = 9.4 - 2.2 = 7.2 miles.

Objective 5: Check a Set of Data for Outliers

  • Definition:

    • Outliers are extreme observations that can skew analysis and may result from errors.

  • Steps to Identify Outliers:

    1. Calculate Q1 and Q3.

    2. Compute IQR.

    3. Determine fences:

      • Lower Fence = Q1 - 1.5(IQR)

      • Upper Fence = Q3 + 1.5(IQR)

    4. Identify outliers as values below the lower fence or above the upper fence.
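The procedure above might be implemented along these lines. This is a sketch under stated assumptions: quartiles are found by the median-of-halves method (excluding the median when the count is odd), and the sample data are hypothetical.

```python
from statistics import median

def find_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])              # median of the lower half
    q3 = median(data[half + n % 2:])      # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

sample = [1, 3, 4, 6, 8, 9, 12, 40]       # hypothetical data
print(find_outliers(sample))              # (-7.0, 21.0, [40])
```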

Example: Checking for Outliers

  • Calculated Q1 = 2.2, Q3 = 9.4.

  • IQR found to be 7.2.

  • Fences:

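    • Lower Fence = 2.2 - 1.5(7.2) = 2.2 - 10.8 = -8.6 miles (no ride distance falls below this fence, so there are no low outliers)

    • Upper Fence = 9.4 + 1.5(7.2) = 9.4 + 10.8 = 20.2 miles (the rides of 21.3, 21.6, 23.2, and 42.3 miles exceed this fence and are identified as outliers)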
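The arithmetic in this example can be verified directly. The sketch below uses only the quartiles and the four flagged distances stated in the example, since the full ride-share data set is not reproduced in these notes.

```python
q1, q3 = 2.2, 9.4                      # quartiles from the example (miles)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = [21.3, 21.6, 23.2, 42.3]     # rides reported as outliers

# Round to one decimal place to hide floating-point noise.
print(round(iqr, 1), round(lower_fence, 1), round(upper_fence, 1))  # 7.2 -8.6 20.2
print(all(x > upper_fence for x in flagged))                         # True
```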

3.5 The Five-Number Summary and Boxplots

  • Learning Objectives:

    • Compute the five-number summary

    • Draw and interpret boxplots

Objective 1: Compute the Five-Number Summary

  • Five-Number Summary Includes:

    • Minimum

    • Q1

    • Median (Q2)

    • Q3

    • Maximum

  • Example:

    • For the Chicago ride-share data:

      • Summary: MIN = 0.4, Q1 = 2.2, M = 4.85, Q3 = 9.4, MAX = 42.3
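A five-number summary can be computed with a short helper such as the one below. It is a sketch that assumes the median-of-halves rule for quartiles and uses hypothetical data, not the Chicago ride-share observations.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])              # median of the lower half
    q3 = median(data[half + n % 2:])      # median of the upper half
    return data[0], q1, median(data), q3, data[-1]

sample = [1, 3, 4, 6, 8, 9, 12, 40]       # hypothetical data
print(five_number_summary(sample))        # (1, 3.5, 7.0, 10.5, 40)
```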

Objective 2: Draw and Interpret Boxplots

  • Steps to Draw a Boxplot:

    1. Calculate lower and upper fences.

    2. Create a number line, marking Q1, Q2, Q3.

    3. Form the box from Q1 to Q3 and draw "whiskers" out to the smallest and largest data values that lie within the fences.

    4. Mark outliers with an asterisk.
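The quantities needed to draw a boxplot by hand can be gathered with a helper like the following. This sketch only computes the numbers and does not plot anything; the function name `boxplot_stats`, the dictionary keys, and the sample data are illustrative assumptions, and quartiles again follow the median-of-halves rule.

```python
from statistics import median

def boxplot_stats(values):
    """Return the box edges, whisker ends, and outliers for a boxplot."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q2 = median(data)
    q3 = median(data[half + n % 2:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr          # step 1: fences
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, q2, q3),                    # step 2: marks on the number line
        "whiskers": (inside[0], inside[-1]),    # step 3: extreme values within the fences
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],  # step 4
    }

sample = [1, 3, 4, 6, 8, 9, 12, 40]       # hypothetical data
stats = boxplot_stats(sample)
print(stats["box"])        # (3.5, 7.0, 10.5)
print(stats["whiskers"])   # (1, 12)
print(stats["outliers"])   # [40]
```

Note that the upper whisker stops at 12, the largest value inside the fences, rather than at the upper fence itself.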

  • Example:

    • Analysis of shared vs. non-shared ride distances indicated differences in median distance and spread, suggesting a preference for sharing rides for shorter distances.
