Statistics - Z-Scores, Boxplots, Quartiles, Percentiles & Outliers

Z Scores

  • A z-score indicates how many standard deviations a value lies from its population mean.

  • z = 1: value is one standard deviation above the mean.

  • z = -2: value is two standard deviations below the mean.

Computing a Z Score

  • Let x be a value from a population with mean \mu and standard deviation \sigma.
  • The z-score for x is calculated as: z = \frac{x - \mu}{\sigma}.
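The formula translates directly into a one-line function. This is a minimal Python sketch. The function name z_score and the population values (mean 100, standard deviation 15) are hypothetical, chosen only to reproduce the z = 1 and z = -2 cases described above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical population: mu = 100, sigma = 15.
print(z_score(115, 100, 15))  # 1.0  -> one standard deviation above the mean
print(z_score(70, 100, 15))   # -2.0 -> two standard deviations below the mean
```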

Example

  • Heights of adult men in the US: mean \mu = 69.4 inches, standard deviation \sigma = 3.1 inches.

  • Heights of adult women in the US: mean \mu = 63.8 inches, standard deviation \sigma = 2.8 inches.

  • Man's height: 73 inches. Woman's height: 68 inches.

  • Z-score for the man's height:

    • z = \frac{73 - 69.4}{3.1} = 1.16.
  • Z-score for the woman's height:

    • z = \frac{68 - 63.8}{2.8} = 1.5.
  • The woman is taller relative to her population: her z-score (1.5) is higher than the man's (1.16).
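The same comparison as a short Python check, using only the means, standard deviations, and heights given above. Each height is standardized against its own population before the two are compared.

```python
z_man = (73 - 69.4) / 3.1     # man's height vs. adult men in the US
z_woman = (68 - 63.8) / 2.8   # woman's height vs. adult women in the US

print(round(z_man, 2), round(z_woman, 2))  # 1.16 1.5
print("woman" if z_woman > z_man else "man", "is taller relative to their population")
```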

Empirical Rule and Z Scores

  • For bell-shaped populations:
    • Approximately 68% of data have z scores between -1 and 1.
    • Approximately 95% of data have z scores between -2 and 2.
    • Almost all data have z scores between -3 and 3.
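For an exactly normal population, the shares behind the empirical rule can be computed from the standard normal distribution. This sketch uses Python's statistics.NormalDist. Real bell-shaped data will only approximately match these values, which is why the rule says "approximately".

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
# Share of a normal population with z between -k and k.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"between -{k} and {k}: {share:.2%}")  # 68.27%, 95.45%, 99.73%
```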

Boxplots

  • Boxplot: a graph that presents the five-number summary along with some additional information about a data set.
  • Modified boxplot: the kind of boxplot constructed below, in which outliers are plotted as individual points.

Constructing a Box Plot

  • Data: Number of students absent in a middle school in Northwestern Montana during January.
  • Step 1: Compute quartiles using technology (e.g., TI-84 Plus).
    • Q_1 = 45
    • Q_2 \text{ (median)} = 51
    • Q_3 = 59
  • Step 2: Draw vertical lines at Q_1, Q_2, and Q_3, then connect the tops and bottoms of the Q_1 and Q_3 lines with horizontal lines to complete the box.
  • Step 3: Calculate the interquartile range (IQR).
    • IQR = Q_3 - Q_1 = 59 - 45 = 14.
  • Compute outlier boundaries:
    • Lower outlier boundary: Q_1 - 1.5 \times IQR = 45 - 1.5 \times 14 = 24.
    • Upper outlier boundary: Q_3 + 1.5 \times IQR = 59 + 1.5 \times 14 = 80.
  • Step 4: Find the largest data value less than the upper outlier boundary (here, 77) and draw a horizontal line (the upper whisker) from Q_3 to it.
  • Step 5: Find the smallest data value greater than the lower outlier boundary (here, 41) and draw a horizontal line (the lower whisker) from Q_1 to it.
  • Step 6: Plot each outlier (here, the value 100) as a separate point.
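Steps 4 to 6 amount to splitting the data at the outlier boundaries. The Python sketch below takes the boundaries 24 and 80 from Step 3. The function name is hypothetical, and the short list of values is made up for illustration; it is not the actual attendance data. A value exactly equal to a boundary is treated as a non-outlier here.

```python
def whiskers_and_outliers(values, lower_boundary, upper_boundary):
    """Return (lower whisker end, upper whisker end, outliers) for a modified boxplot."""
    inside = [v for v in values if lower_boundary <= v <= upper_boundary]
    outliers = sorted(v for v in values if v < lower_boundary or v > upper_boundary)
    # The whiskers run from the box to the most extreme values that are not outliers.
    return min(inside), max(inside), outliers

# Hypothetical values; the boundaries 24 and 80 come from Step 3.
print(whiskers_and_outliers([41, 45, 48, 51, 55, 59, 77, 100], 24, 80))  # (41, 77, [100])
```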

Skewness and Boxplots

  • Right Skew:
    • Median closer to Q1 than Q3.
    • Upper whisker longer than lower whisker.
  • Left Skew:
    • Median closer to Q3 than Q1.
    • Lower whisker longer than upper whisker.
  • Symmetric:
    • Median approximately halfway between Q1 and Q3.
    • Whiskers approximately equal in length.
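The rules above can be encoded as a rough heuristic. In this Python sketch, the function name and the tolerance parameter tol (how close two lengths must be to count as "approximately equal") are hypothetical choices. The function returns "unclear" when the median rule and the whisker rule point in different directions. The five-number inputs in the usage lines are made up.

```python
def boxplot_shape(lower_end, q1, q2, q3, upper_end, tol=0.1):
    """Rough shape guess from the median position and the whisker lengths.

    lower_end/upper_end are the whisker ends; tol is a hypothetical relative
    tolerance for what counts as "approximately equal".
    """
    left_box, right_box = q2 - q1, q3 - q2                   # median's distance to Q1 and Q3
    left_whisker, right_whisker = q1 - lower_end, upper_end - q3

    def about_equal(a, b):
        return abs(a - b) <= tol * max(a, b, 1e-12)

    if about_equal(left_box, right_box) and about_equal(left_whisker, right_whisker):
        return "symmetric"
    if left_box < right_box and right_whisker > left_whisker:
        return "right skew"   # median nearer Q1, longer upper whisker
    if left_box > right_box and left_whisker > right_whisker:
        return "left skew"    # median nearer Q3, longer lower whisker
    return "unclear"          # the two rules disagree

print(boxplot_shape(0, 10, 20, 30, 40))  # symmetric
print(boxplot_shape(0, 2, 4, 10, 30))    # right skew
```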

Quartiles

  • Quartiles divide a dataset into four equal parts.
  • Every dataset has three quartiles: Q_1, Q_2, and Q_3.
    • Q_1: separates the lowest 25% from the highest 75%.
    • Q_2: (median) separates the lowest 50% from the highest 50%.
    • Q_3: separates the lowest 75% from the highest 25%.

Calculating Quartiles

  1. Arrange data in increasing order.
  2. Let n = number of values.
    • For Q_1: L = 0.25 \times n.
    • For Q_3: L = 0.75 \times n.
  3. If L is a whole number, the quartile is the average of the values in positions L and L+1.
  4. If L is not a whole number, round up to the next whole number, and the quartile is the value in that position.
  5. Q_2 is the median.
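The procedure above can be sketched in Python as follows. The function name quartile is hypothetical. The position L = k n / 4 is computed with integer division so that the "is L a whole number?" test is exact. The demonstration data are simply the numbers 1 to 45, so each value equals its position, which makes the chosen positions easy to read off.

```python
from statistics import median

def quartile(data, k):
    """Return Q_k (k = 1, 2, or 3) using the position rule described above."""
    values = sorted(data)                  # step 1: increasing order
    n = len(values)                        # step 2
    if k == 2:
        return median(values)              # step 5: Q_2 is the median
    if k not in (1, 3):
        raise ValueError("k must be 1, 2, or 3")
    # L = k * n / 4, computed with integers to avoid floating-point error.
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # Step 3: L is whole, so average positions L and L + 1 (1-based).
        return (values[whole - 1] + values[whole]) / 2
    # Step 4: round L up to whole + 1; that 1-based position is index `whole`.
    return values[whole]

data = list(range(1, 46))  # hypothetical: 45 values where the k-th smallest is k
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 12 23 34
```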

Example

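  • February rainfall in Los Angeles, in inches, over 45 years (values already sorted in increasing order).
    • For Q_1: L = 0.25 \times 45 = 11.25, which is not a whole number, so round up to 12. Q_1 is the 12th value, 0.92.
    • For Q_3: L = 0.75 \times 45 = 33.75, which rounds up to 34. Q_3 is the 34th value, 4.89.
    • Median: with n = 45, the median is the 23rd value, so Q_2 = 3.21.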

Five-Number Summary

  • Consists of: minimum, Q1, median, Q3, maximum.
  • Rainfall data summary: 0.14, 0.92, 3.21, 4.89, 13.68.

Using Technology for Quartiles

  • Different technologies may use different procedures for finding quartiles.
  • For example, a TI-84 Plus calculator may report quartiles that differ slightly from those found with the procedure above.
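Python's standard library is another piece of technology with its own convention. statistics.quantiles defaults to an "exclusive" method that interpolates between data positions, so its quartiles need not match the hand procedure. The data here are the hypothetical values 1 to 45, for which the hand rule picks the 12th and 34th values.

```python
from statistics import quantiles

values = list(range(1, 46))    # hypothetical 45 values; the k-th smallest is k
cuts = quantiles(values, n=4)  # default method="exclusive" interpolates between positions
print(cuts)                    # [11.5, 23.0, 34.5]
# The hand rule above instead picks the 12th and 34th values: Q_1 = 12, Q_3 = 34.
```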

Detecting Outliers

  • Outlier: A value much larger or smaller than the other values in a data set.
  • Outliers can result from errors or reflect extreme values in the population.

IQR Method

  1. Find Q1 and Q3.
  2. Compute IQR = Q3 - Q1.
  3. Compute outlier boundaries:
    • Lower boundary: Q_1 - 1.5 \times IQR.
    • Upper boundary: Q_3 + 1.5 \times IQR.
  4. Any data value below the lower boundary or above the upper boundary is an outlier.
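The IQR method as a small Python sketch. The function names are hypothetical. The quartiles are passed in rather than computed, since different tools compute them differently. The usage line reproduces the boundaries from the absent-students data.

```python
def outlier_boundaries(q1, q3):
    """Lower and upper outlier boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values strictly below the lower boundary or strictly above the upper one."""
    lower, upper = outlier_boundaries(q1, q3)
    return [v for v in values if v < lower or v > upper]

print(outlier_boundaries(45, 59))  # (24.0, 80.0)
```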

Example

  • Student absence data from the boxplot example: Q_1 = 45, Q_3 = 59.
    • IQR = 59 - 45 = 14.
    • Lower boundary: 45 - 1.5 \times 14 = 24.
    • Upper boundary: 59 + 1.5 \times 14 = 80.
    • The value 100 is greater than the upper boundary and is an outlier.

Interpreting Quartiles

  • The median divides the dataset into two parts.
  • Quartiles divide a dataset into four parts.
    • Q_1: separates the lowest 25% from the highest 75%.
    • Q_2: (median) separates the lowest 50% from the highest 50%.
    • Q_3: separates the lowest 75% from the highest 25%.

Examples

  • Kayla's grade is a high B, so it is more likely to be at the third quartile than at the first.
  • Zorida's average sleep duration is at Q_1, Joanne's at the median, and Phoebe's at Q_3, so Zorida had the shortest average sleep duration.

Percentiles

  • Percentiles divide a dataset into hundredths.
  • The p^{th} percentile separates the lowest p% of the data from the highest (100 - p)%.
  • Example: The 1st percentile separates the lowest 1% from the highest 99%.

Calculating Percentiles

  1. Arrange data in increasing order.
  2. Let n = the number of values in the dataset.
  3. For the p^{th} percentile, calculate L = \frac{p}{100} \times n.
  4. If L is a whole number, the percentile is the average of the numbers in positions L and L+1.
  5. If L is not a whole number, round it up to the next higher whole number, and the percentile is the number in this position.
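The percentile rule is the quartile rule with p/100 in place of 1/4. This Python sketch assumes an integer p strictly between 0 and 100, so that L = p n / 100 can be tested for wholeness exactly with integer arithmetic. The function name and the data (the values 1 to 45) are hypothetical.

```python
def percentile(data, p):
    """Return the p-th percentile (integer 0 < p < 100) using the rule above."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100, exclusive")
    values = sorted(data)                   # step 1: increasing order
    n = len(values)                         # step 2
    whole, remainder = divmod(p * n, 100)   # step 3: L = p * n / 100, exact integer math
    if remainder == 0:
        # Step 4: L is whole, so average positions L and L + 1 (1-based).
        return (values[whole - 1] + values[whole]) / 2
    # Step 5: round L up to whole + 1; that 1-based position is index `whole`.
    return values[whole]

data = list(range(1, 46))    # hypothetical 45 values; the k-th smallest is k
print(percentile(data, 60))  # 27.5, the average of the 27th and 28th values
```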

Example

  • Rainfall data in Los Angeles (45 values, already sorted); find the 60th percentile.
    • L = \frac{60}{100} \times 45 = 27, a whole number, so the 60th percentile is the average of the 27th and 28th values.
    • 60th percentile = \frac{3.58 + 3.71}{2} = 3.645.

Finding Percentile for a Given Value

  1. Arrange data in increasing order.
  2. Let x be the value whose percentile is to be computed.
  3. Percentile = 100 \times \frac{(\text{number of values less than } x) + 0.5}{\text{number of data values}}.
  4. Round the result to the nearest whole number.
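A Python sketch of this procedure follows. The function name is hypothetical. Counting the values below x does not require sorting, so the sketch skips step 1. It rounds half up with math.floor(raw + 0.5) because Python's built-in round() rounds halves to the nearest even number. The data are the hypothetical values 1 to 45, chosen so that exactly 17 values are less than 18, matching the count in the 1989 rainfall example.

```python
import math

def percentile_of(data, x):
    """Percentile of x: 100 * (count below x + 0.5) / n, rounded to a whole number."""
    values = list(data)
    below = sum(1 for v in values if v < x)   # values strictly less than x
    raw = 100 * (below + 0.5) / len(values)
    return math.floor(raw + 0.5)              # round half up to the nearest whole number

data = list(range(1, 46))       # hypothetical 45 values; exactly 17 are less than 18
print(percentile_of(data, 18))  # 39, the same arithmetic as the rainfall example
```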

Example

  • In 1989, rainfall was 1.9 inches; what percentile does this correspond to?
    • 17 values are less than 1.9.
    • Percentile = 100 \times \frac{17 + 0.5}{45} = 38.9 \approx 39. The value 1.9 corresponds to the 39th percentile.