1.3 Describing Quantitative Data With Numbers

FINDING THE CENTER — MEDIAN

  • the median is the middle value of the ordered data

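    • put the data in order and calculate (n+1)/2, which gives the position of the median in the ordered list; when n is even, this position falls halfway between two values, so the median is their average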

  • represents the 50th percentile of the data, meaning that 50% of all measurements are less than or equal to the median, and 50% are greater than or equal to the median
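The position rule above can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_position` and the sample scores are made up for the example and are not part of the notes.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)          # step 1: put the data in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # (n + 1) / 2 is a whole number: take the value at that position
        return ordered[mid]
    # (n + 1) / 2 ends in .5: average the two values on either side
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [7, 3, 9, 1, 5]        # n = 5, position (5 + 1) / 2 = 3
even_scores = [7, 3, 9, 1, 5, 11]   # n = 6, position (6 + 1) / 2 = 3.5

print(median_by_position(odd_scores))   # 5
print(median_by_position(even_scores))  # 6.0
```

Python's built-in `statistics.median` gives the same results and is the usual choice in practice.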

FINDING THE SPREAD — IQR

  • the simplest way of describing the spread of a distribution is by using the range

    • the range is the difference between the maximum and the minimum value

    • the range is a single value, not an interval

  • a measure of spread that is less affected by extreme values is the interquartile range (IQR)

    • the range of the middle 50% of the data, calculated as IQR = Q3 − Q1
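As a rough sketch, the range and IQR can be computed as follows. The quartile rule used here (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd) is an assumption; it is one common textbook convention, and software packages often use slightly different rules. The helper name `quartiles` and the data are illustrative only.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. When n is odd, the overall median
    is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)


data = [2, 4, 4, 5, 7, 8, 10, 12, 15]

data_range = max(data) - min(data)   # a single number, not an interval
q1, q3 = quartiles(data)
iqr = q3 - q1                        # spread of the middle 50% of the data

print(data_range)  # 13
print(q1, q3)      # 4.0 11.0
print(iqr)         # 7.0
```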

THE FIVE-NUMBER SUMMARY

  • the five-number summary of a data set is the minimum, Q1, median, Q3, maximum

  • a boxplot is a graphical display that uses the five-number summary to picture a distribution

    • length of the box is the interquartile range

    • length of the entire plot is the range

  • if a distribution has outliers, they are usually marked separately

  • boxplots that mark the outliers separately are called modified boxplots
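The sketch below computes a five-number summary and flags possible outliers. Two assumptions are worth stating: the quartiles use the "median of each half" convention, and outliers are flagged with the common 1.5 × IQR rule, which these notes do not state explicitly. The function names and data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum). Assumes n >= 2."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [2, 4, 4, 5, 7, 8, 10, 12, 40]

print(five_number_summary(data))  # (2, 4.0, 7, 11.0, 40)
print(flag_outliers(data))        # [40]
```

In a modified boxplot, the flagged value (40 here) would be drawn as a separate point rather than as part of the whisker.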

FINDING THE CENTER AND SPREAD OF SYMMETRIC DISTRIBUTIONS

  • the median is one way of describing the center of a distribution, especially when the distribution is skewed; for a symmetric distribution, the usual measure of center is the mean

  • the population mean is the average of all values in the entire population

  • the sample mean is the average of all the values in a sample from a population

  • the best method of measuring spread of a symmetric distribution is the standard deviation

    • can be thought of as roughly the average distance that each observation lies away from the mean

      • the symbol “s” is used to represent the standard deviation of a distribution

  • the variance of a distribution is s², the square of the standard deviation
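A small worked example may make the mean, variance, and standard deviation concrete. It assumes the usual sample formulas, dividing the sum of squared deviations by n − 1; the notes do not spell out the divisor, so treat that as an assumption. The data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Sample mean: the sum of the values divided by how many there are.
mean = sum(data) / n

# Squared distance of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance s^2 (divisor n - 1) and standard deviation s.
variance = sum(squared_deviations) / (n - 1)
std_dev = math.sqrt(variance)

print(mean)                # 5.0
print(round(variance, 3))  # 4.571
print(round(std_dev, 3))   # 2.138

# The standard library uses the same sample formulas.
print(math.isclose(variance, statistics.variance(data)))  # True
print(math.isclose(std_dev, statistics.stdev(data)))      # True
```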

GENERAL RULES — MEASURE OF CENTER AND SPREAD

  • with fairly symmetric distributions, the mean and median will be close together

  • with skewed distributions, the mean is pulled farther toward the long tail of the distribution than the median

  • the standard deviation should be used when the mean is used as the measure of center

  • the median and IQR are not strongly affected by outliers, and are referred to as resistant measures of center and spread

  • mean and standard deviation are strongly influenced by outliers, and are referred to as nonresistant
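To see resistance in action, the sketch below adds one extreme value to a small made-up data set and compares how much each summary moves. The data and the helper name `summarize` are illustrative assumptions.

```python
from statistics import mean, median, stdev

typical = [10, 12, 13, 15, 16]
with_outlier = typical + [95]   # same data plus one extreme value


def summarize(values):
    """Collect the summaries being compared."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),   # sample standard deviation
    }


before = summarize(typical)
after = summarize(with_outlier)

for name in ("mean", "median", "stdev"):
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

The median shifts by only one unit (13 to 14), while the mean and standard deviation change far more, which is why the mean and standard deviation are called nonresistant.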