Statistics - Z-Scores, Boxplots, Quartiles, Percentiles & Outliers

Z Scores

  • A z-score indicates how many standard deviations a value lies from its population mean.

  • $z = 1$: value is one standard deviation above the mean.

  • $z = -2$: value is two standard deviations below the mean.

Computing a Z Score

  • Let $x$ be a value from a population with mean $\mu$ and standard deviation $\sigma$.
  • The z-score for $x$ is calculated as: $z = \frac{x - \mu}{\sigma}$.
Example
  • Mean height for adult men in the US: $\mu = 69.4$ inches, $\sigma = 3.1$ inches.

  • Mean height for adult women in the US: $\mu = 63.8$ inches, $\sigma = 2.8$ inches.

  • Man's height: 73 inches. Woman's height: 68 inches.

  • Z-score for the man's height:

    • $z = \frac{73 - 69.4}{3.1} = 1.16$.
  • Z-score for the woman's height:

    • $z = \frac{68 - 63.8}{2.8} = 1.5$.
  • The woman is taller relative to her population than the man is relative to his, because her z-score is higher (1.5 versus 1.16).
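The height comparison above can be reproduced with a short Python sketch. The function name `z_score` is an illustrative choice, not something from the notes; the means, standard deviations, and heights are the ones given in the example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Values taken from the example: US adult men and women.
z_man = z_score(73, 69.4, 3.1)
z_woman = z_score(68, 63.8, 2.8)

print(round(z_man, 2), round(z_woman, 2))
print("woman is taller relative to her population:", z_woman > z_man)
```

Because each height is measured against its own population, the two z-scores can be compared directly even though the raw heights come from different distributions.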

Empirical Rule and Z Scores

  • For bell-shaped populations:
    • Approximately 68% of data have z scores between -1 and 1.
    • Approximately 95% of data have z scores between -2 and 2.
    • Almost all data have z scores between -3 and 3.
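As a rough check of these percentages, the sketch below assumes the population is exactly normal, which is stronger than the "bell-shaped" condition in the notes. Under that assumption, the fraction of values with z-scores between $-k$ and $k$ is $\operatorname{erf}(k/\sqrt{2})$. The function name is hypothetical.

```python
import math


def normal_fraction_within(k):
    """Fraction of an exactly normal population with z-scores in [-k, k]."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3.
    print(k, round(normal_fraction_within(k), 4))
```

For real data that are only approximately bell-shaped, the observed fractions will differ somewhat from these values.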

Boxplots

  • Boxplot: A graph presenting the five-number summary and additional data information.
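  • Modified boxplot: a boxplot that plots outliers as separate points and ends each whisker at the most extreme value that is not an outlier.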

Constructing a Box Plot

  • Data: Number of students absent in a middle school in Northwestern Montana during January.
  • Step 1: Compute quartiles using technology (e.g., TI-84 Plus).
    • $Q_1 = 45$
    • $Q_2$ (median) $= 51$
    • $Q_3 = 59$
  • Step 2: Draw vertical lines at Q1, Q2, and Q3; complete the box with horizontal lines.
  • Step 3: Calculate the interquartile range (IQR).
    • $IQR = Q_3 - Q_1 = 59 - 45 = 14$.
  • Compute outlier boundaries:
    • Lower outlier boundary: $Q_1 - 1.5 \times IQR = 45 - 1.5 \times 14 = 24$.
    • Upper outlier boundary: $Q_3 + 1.5 \times IQR = 59 + 1.5 \times 14 = 80$.
  • Step 4: Find the largest data value less than the upper boundary (77) and draw a horizontal line from $Q_3$ to it.
  • Step 5: Find the smallest data value greater than the lower boundary (41) and draw a horizontal line from $Q_1$ to it.
  • Step 6: Identify outliers (e.g., 100) and plot them separately.
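Steps 3 through 6 can be sketched in Python as follows. The notes do not list the raw absence counts, so the `absences` list is a made-up illustrative sample chosen only to contain the values mentioned (41, 77, 100). The quartiles 45 and 59 are the ones from Step 1, and the function name is hypothetical.

```python
def whiskers_and_outliers(data, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers)."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Values on or inside the boundaries are not outliers.
    inside = [v for v in data if lower <= v <= upper]
    outliers = [v for v in data if v < lower or v > upper]
    return min(inside), max(inside), outliers


# Hypothetical sample, NOT the actual school data.
absences = [41, 45, 48, 51, 55, 59, 77, 100]
low_end, high_end, outliers = whiskers_and_outliers(absences, 45, 59)
print(low_end, high_end, outliers)
```

With this sample the whiskers end at 41 and 77, and 100 is plotted separately, matching the values described in the steps.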

Skewness and Boxplots

  • Right Skew:
    • Median closer to $Q_1$ than $Q_3$.
    • Upper whisker longer than lower whisker.
  • Left Skew:
    • Median closer to $Q_3$ than $Q_1$.
    • Lower whisker longer than upper whisker.
  • Symmetric:
    • Median approximately halfway between $Q_1$ and $Q_3$.
    • Whiskers approximately equal in length.

Quartiles

  • Quartiles divide a dataset into four equal parts.
  • Every dataset has three quartiles: $Q_1$, $Q_2$, and $Q_3$.
    • $Q_1$: separates the lowest 25% from the highest 75%.
    • $Q_2$: (median) separates the lowest 50% from the highest 50%.
    • $Q_3$: separates the lowest 75% from the highest 25%.

Calculating Quartiles

  1. Arrange data in increasing order.
  2. Let $n$ = number of values.
    • For $Q_1$: $L = 0.25 \times n$.
    • For $Q_3$: $L = 0.75 \times n$.
  3. If $L$ is a whole number, the quartile is the average of the values in positions $L$ and $L + 1$.
  4. If $L$ is not a whole number, round up to the next whole number, and the quartile is the value in that position.
  5. $Q_2$ is the median.
Example
  • Annual rainfall in Los Angeles during February over several years (45 values, already sorted).
    • For $Q_1$: $L = 0.25 \times 45 = 11.25$, which rounds up to 12, so $Q_1$ is the value in position 12: $Q_1 = 0.92$.
    • For $Q_3$: $L = 0.75 \times 45 = 33.75$, which rounds up to 34, so $Q_3$ is the value in position 34: $Q_3 = 4.89$.
    • Median: $Q_2 = 3.21$.
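The procedure above can be sketched in Python. The 45 rainfall values are not reproduced in the notes, so the example uses a small made-up list. The function name `quartile` and its parameter `k` (1 for $Q_1$, 3 for $Q_3$) are illustrative assumptions. Integer arithmetic is used to decide whether $L$ is a whole number, which avoids floating-point rounding.

```python
def quartile(sorted_data, k):
    """Return Q_k (k = 1 or 3) of already-sorted data using L = k*n/4."""
    if k not in (1, 3):
        raise ValueError("k must be 1 or 3")
    n = len(sorted_data)
    if n == 0:
        raise ValueError("data must not be empty")
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # L is whole: average the values in 1-based positions L and L + 1.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # L is not whole: round up to 1-based position whole + 1.
    return sorted_data[whole]


values = [2, 4, 5, 7, 8, 10, 11, 13]  # hypothetical data, n = 8
print(quartile(values, 1), quartile(values, 3))
```

For $n = 45$ the same function selects positions 12 and 34, as in the rainfall example. Other tools, including calculators, may use different quartile conventions and return slightly different values.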

Five-Number Summary

  • Consists of: minimum, $Q_1$, median, $Q_3$, maximum.
  • Rainfall data summary: 0.14, 0.92, 3.21, 4.89, 13.68.

Using Technology for Quartiles

  • Different technologies may use different procedures for finding quartiles.
  • Example using TI-84 Plus calculator.

Detecting Outliers

  • Outlier: A value much larger or smaller than the other values in a data set.
  • Outliers can result from errors or reflect extreme values in the population.

IQR Method

  1. Find $Q_1$ and $Q_3$.
  2. Compute $IQR = Q_3 - Q_1$.
  3. Compute outlier boundaries:
    • Lower boundary: $Q_1 - 1.5 \times IQR$.
    • Upper boundary: $Q_3 + 1.5 \times IQR$.
  4. Any data value below the lower boundary or above the upper boundary is an outlier.
Example
  • Absent students data: $Q_1 = 45$, $Q_3 = 59$.
    • $IQR = 59 - 45 = 14$.
    • Lower boundary: $45 - 1.5 \times 14 = 24$.
    • Upper boundary: $59 + 1.5 \times 14 = 80$.
    • The value 100 is greater than the upper boundary and is an outlier.
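A minimal Python sketch of the IQR method, using the quartiles from the absent-students example. The function names are illustrative, not from the notes.

```python
def outlier_boundaries(q1, q3):
    """Return (lower boundary, upper boundary) for the IQR method."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if value lies strictly outside the outlier boundaries."""
    lower, upper = outlier_boundaries(q1, q3)
    return value < lower or value > upper


print(outlier_boundaries(45, 59))
print(is_outlier(100, 45, 59), is_outlier(77, 45, 59))
```

The value 100 is flagged because it exceeds the upper boundary of 80, while 77 is not.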

Interpreting Quartiles

  • The median divides the dataset into two parts.
  • Quartiles divide a dataset into four parts.
    • $Q_1$: separates the lowest 25% from the highest 75%.
    • $Q_2$: (median) separates the lowest 50% from the highest 50%.
    • $Q_3$: separates the lowest 75% from the highest 25%.
Examples
  • Kayla's high B grade is most likely to fall near the third quartile.
  • Zorida (Q1), Phoebe (Q3), and Joanne (median); Zorida had the shortest average sleep duration.

Percentiles

  • Percentiles divide a dataset into hundredths.
  • The $p^{\text{th}}$ percentile separates the lowest $p\%$ of the data from the highest $(100 - p)\%$.
  • Example: The 1st percentile separates the lowest 1% from the highest 99%.

Calculating Percentiles

  1. Arrange data in increasing order.
  2. Let $n$ = the number of values in the dataset.
  3. For the $p^{\text{th}}$ percentile, calculate $L = \frac{p}{100} \times n$.
  4. If $L$ is a whole number, the percentile is the average of the numbers in positions $L$ and $L + 1$.
  5. If $L$ is not a whole number, round it up to the next higher whole number, and the percentile is the number in this position.
Example
  • Rainfall data in Los Angeles (45 values, already sorted); find the 60th percentile.
    • $L = \frac{60}{100} \times 45 = 27$.
    • 60th percentile $= \frac{3.58 + 3.71}{2} = 3.645$.
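The percentile steps can be sketched in Python. Since the rainfall values themselves are not listed in the notes, the check below uses the stand-in list 1 to 45, so the result shows which positions are used rather than real rainfall amounts. The function name and the restriction to whole-number $p$ from 1 to 99 are assumptions of this sketch.

```python
def percentile(sorted_data, p):
    """Return the p-th percentile (integer p, 1..99) of already-sorted data."""
    if not 1 <= p <= 99:
        raise ValueError("p must be an integer from 1 to 99")
    n = len(sorted_data)
    if n == 0:
        raise ValueError("data must not be empty")
    # L = p * n / 100, kept as quotient and remainder to avoid float error.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # L is whole: average the values in 1-based positions L and L + 1.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # L is not whole: round up to 1-based position whole + 1.
    return sorted_data[whole]


positions = list(range(1, 46))  # stand-in for 45 sorted values
print(percentile(positions, 60))
```

With 45 values and $p = 60$, $L = 27$ is whole, so the result is the average of the 27th and 28th values, as in the example.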

Finding Percentile for a Given Value

  1. Arrange data in increasing order.
  2. Let $x$ be the value whose percentile is to be computed.
  3. Percentile $= 100 \times \frac{(\text{number of values less than } x) + 0.5}{\text{number of data values}}$.
  4. Round the result to the nearest whole number.
Example
  • In 1989, rainfall was 1.9 inches; what percentile does this correspond to?
    • 17 values are less than 1.9.
    • Percentile $= 100 \times \frac{17 + 0.5}{45} = 38.9 \approx 39$. The value 1.9 corresponds to the 39th percentile.
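A short Python sketch of this calculation follows. The actual rainfall list is not given in the notes, so `rainfall_like` is a made-up list built only to have 45 values with 17 of them below 1.9. The function name is hypothetical, and halves are rounded up, which is one reasonable reading of "nearest whole number".

```python
import math


def percentile_of_value(sorted_data, x):
    """Return the percentile of x: 100 * (count below x + 0.5) / n, rounded."""
    n = len(sorted_data)
    if n == 0:
        raise ValueError("data must not be empty")
    below = sum(1 for v in sorted_data if v < x)
    raw = 100 * (below + 0.5) / n
    return math.floor(raw + 0.5)  # round to nearest, halves rounded up


# Hypothetical data: 17 values below 1.9, the value 1.9 itself, 27 above.
rainfall_like = [1.0] * 17 + [1.9] + [5.0] * 27
print(percentile_of_value(rainfall_like, 1.9))
```

The stand-in data reproduce the counts in the example (17 of 45 values below 1.9), so the result is the 39th percentile.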