1.3 Describing Quantitative Data With Numbers
FINDING THE CENTER — MEDIAN
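the median is the middle value of the ordered data
put the data in order and calculate (n+1)/2, which gives the position of the median in the ordered list; when n is even, the median is the average of the two middle values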
represents the 50th percentile of the data, meaning that 50% of all measurements are less than or equal to the median, and 50% are greater than or equal to the median
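The position rule above can be sketched in Python. This is a minimal illustration only; the function name `median_by_position` and the sample scores are made up for the example.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)          # step 1: put the data in order
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position of the median
    if n % 2 == 1:
        # odd n: the position is a whole number, so take that single value
        return ordered[int(position) - 1]
    # even n: the position ends in .5, so average the two values around it
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


scores = [7, 3, 9, 5, 1]                   # hypothetical data, n = 5
print(median_by_position(scores))          # position (5 + 1) / 2 = 3, prints 5
print(median_by_position([4, 1, 3, 2]))    # position 2.5, prints 2.5
```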
FINDING THE SPREAD — IQR
the simplest way of describing the spread of a distribution is by using the range
the range is the difference between the maximum and the minimum value
the range is a single value, not an interval
another measure of spread is the interquartile range (IQR)
the IQR is the range of the middle 50% of the data, IQR = Q3 - Q1
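A short sketch of both measures of spread follows. It assumes one common textbook convention for quartiles: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half, with the overall median left out of both halves when n is odd. Software packages often use other quartile rules and can give slightly different values. The helper names and data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


def data_range(values):
    """Range: maximum minus minimum, a single number."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range: Q3 - Q1, the range of the middle 50%."""
    q1, q3 = quartiles(values)
    return q3 - q1


data = [1, 3, 5, 7, 9, 11, 13]     # hypothetical data
print(data_range(data))            # 13 - 1, prints 12
print(iqr(data))                   # Q3 = 11, Q1 = 3, prints 8
```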
THE FIVE-NUMBER SUMMARY
the five-number summary of a data set is the minimum, Q1, median, Q3, maximum
a boxplot is a graphical display that uses the five-number summary to picture a distribution
length of the box is the interquartile range
length of the entire plot is the range
if a distribution has outliers, they are usually marked separately
boxplots that mark the outliers separately are called modified boxplots
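The five-number summary, and the two boxplot lengths that come from it, can be sketched as below. The quartile rule (medians of the lower and upper halves, excluding the overall median when n is odd) is an assumption, as are the function name and the data. The sketch only computes the numbers and does not draw the plot or flag outliers.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median excluded from both halves when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


data = [13, 1, 9, 5, 11, 3, 7]     # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)   # prints 1 3 7 11 13
print(q3 - q1)                         # length of the box (IQR), prints 8
print(maximum - minimum)               # length of the entire plot (range), prints 12
```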
FINDING THE CENTER AND SPREAD OF SYMMETRIC DISTRIBUTIONS
the median is one way of describing the center of a distribution, especially when the distribution is skewed
another method for describing the center of a symmetric distribution is the mean
the population mean is the average of all values in the entire population
the sample mean is the average of all the values in a sample from a population
the best method of measuring spread of a symmetric distribution is the standard deviation
can be thought of as the average distance that each observation lies away from the mean
the symbol “s” is used to represent the standard deviation of a distribution
variance of a distribution is s²
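A minimal sketch of the sample mean, the variance s², and the standard deviation s follows. It assumes the usual sample convention of dividing the sum of squared deviations by n - 1 rather than n, which the notes above do not state explicitly. The function names and data are hypothetical.

```python
from math import sqrt


def sample_mean(values):
    """Average of all the values in the sample."""
    return sum(values) / len(values)


def sample_variance(values):
    """Variance s^2: sum of squared deviations from the mean, divided by n - 1.

    Assumption: the n - 1 divisor used for samples.
    """
    m = sample_mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def sample_std(values):
    """Standard deviation s: the square root of the variance."""
    return sqrt(sample_variance(values))


data = [1, 2, 3, 4, 5]             # hypothetical sample
print(sample_mean(data))           # 15 / 5, prints 3.0
print(sample_variance(data))       # (4 + 1 + 0 + 1 + 4) / 4, prints 2.5
print(round(sample_std(data), 3))  # square root of 2.5, prints 1.581
```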
GENERAL RULES — MEASURE OF CENTER AND SPREAD
with fairly symmetric distributions, the mean and median will be close together
with skewed distributions, the mean will be pulled farther toward the tail of the distribution than the median
standard deviation should be used when the mean is used for the measure of the center
the median and IQR are not strongly affected by outliers, and are referred to as resistant measures of center and spread
mean and standard deviation are strongly influenced by outliers, and are referred to as nonresistant
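One way to see resistance is to change a single value into an outlier and compare the four summaries before and after. The data sets are made up, and the IQR helper assumes the "medians of the two halves" quartile rule, so this is an illustration rather than a general proof.

```python
from statistics import mean, median, stdev


def iqr(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves.

    Assumption: the overall median is excluded from both halves when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(upper) - median(lower)


def summarize(values):
    """Collect center and spread under both approaches."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),    # sample standard deviation (n - 1 divisor)
        "median": median(values),
        "iqr": iqr(values),
    }


clean = [10, 12, 13, 15, 16, 18, 20]          # hypothetical, roughly symmetric
with_outlier = [10, 12, 13, 15, 16, 18, 200]  # same data, largest value replaced

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(name, summarize(data))
```

In this example the median (15) and IQR (6) are identical for both lists, while the mean and standard deviation grow sharply once the outlier is present.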