Describing Distributions

PART OF 3D DCMP ASSIGNMENT: DESCRIBING DISTRIBUTIONS

The features used to describe the distribution of a quantitative variable are the shape, center, spread, and presence of outliers.

  • Shape: The overall pattern (left skewed, right skewed, symmetric) and the number of peaks (unimodal, bimodal, multimodal, uniform).

  • Center: A number that describes where the middle of the distribution is, that is, a typical value. One way to think about the center is as the point in the distribution where about half of the observations fall below it and about half fall above it.

  • Spread: A measure of how far apart the data are. In this lesson, the range is used to measure spread. The range is the difference between the maximum value and minimum value.

  • Outliers: Unusual observations that are outside the general pattern of the distribution.

    The description of shape has two parts: (1) the overall pattern and (2) the number of peaks.

    The overall pattern can be described as one of the following:

  • Symmetric: The left and right sides of the distribution (closely) mirror each other. If you drew a vertical line down the center of the distribution and folded the distribution in half, the left and right sides would closely match one another.

  • Left skewed: The distribution has a longer tail to the left.

  • Right skewed: The distribution has a longer tail to the right.
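The three patterns above can be roughly checked by comparing how far the smallest and largest values sit from the middle of the data. The sketch below is only an illustration: the function name, the 0.2 tolerance, and the data are made up, and the median is used as a stand-in for the middle. It cannot tell a long tail from a single outlier, so it does not replace looking at the histogram.

```python
from statistics import median


def tail_direction(values, tolerance=0.2):
    """Roughly classify the overall pattern by comparing tail lengths.

    Heuristic for illustration only: `tolerance` is an arbitrary cutoff.
    """
    middle = median(values)
    left_tail = middle - min(values)    # distance from the middle to the smallest value
    right_tail = max(values) - middle   # distance from the middle to the largest value
    longer = max(left_tail, right_tail)
    if longer == 0 or abs(left_tail - right_tail) <= tolerance * longer:
        return "roughly symmetric"
    return "left skewed" if left_tail > right_tail else "right skewed"


# Made-up data sets, one per pattern.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
left = [40, 62, 75, 84, 88, 92, 95, 96, 98, 99, 100]
right = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]

for name, data in [("symmetric", symmetric), ("left", left), ("right", right)]:
    print(name, "->", tail_direction(data))
```

In the left-skewed example the smallest value is much farther from the middle than the largest value, which matches the idea of a longer tail to the left.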

    The second part of the description of shape is the number of peaks, also known as the modality. The modality can be described as one of the following:

  • Unimodal: There is one prominent peak.

  • Bimodal: There are two prominent peaks.

  • Multimodal: There are three or more prominent peaks.

  • Uniform: There are no prominent peaks; the observations are spread roughly evenly across the values.
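Counting peaks is easiest with a histogram in front of you. As a minimal sketch, the code below groups made-up values into bins of equal width and prints a text histogram. The function names, the bin width, and the data are assumptions chosen for illustration, and `start` is assumed to be no larger than the smallest value.

```python
def bin_counts(values, bin_width, start):
    """Count the values in each bin [start + k*width, start + (k+1)*width)."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts


def text_histogram(values, bin_width, start):
    """Return one line per bin, with one '#' per observation."""
    lines = []
    for k, count in enumerate(bin_counts(values, bin_width, start)):
        low = start + k * bin_width
        lines.append(f"{low:>4}-{low + bin_width:<4} {'#' * count}")
    return "\n".join(lines)


# Made-up data with two clusters of values.
values = [12, 15, 17, 18, 21, 23, 24, 26, 27, 28, 33,
          41, 44, 52, 55, 56, 57, 58, 61, 63, 66, 72]

print(text_histogram(values, bin_width=10, start=10))
```

For this data the bins starting at 20 and at 50 stand out above their neighbors, so the distribution would be described as bimodal. Whether a peak counts as prominent is still a judgment call, and it can change with the bin width.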

The next feature is the center. For now, we can use the histogram to get an approximate value of the center. (In a later activity, you will learn statistics used to describe the center more precisely.)
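One way to read an approximate center from a histogram is to add up the bar heights from left to right and stop at the bin where about half of the observations have been counted. The sketch below assumes that approach; the function name, the bin counts, and the bin edges are made up for illustration.

```python
def center_bin(counts, bin_width, start):
    """Return (low, high) for the bin where the running count first reaches half the total."""
    half = sum(counts) / 2
    running = 0
    for k, count in enumerate(counts):
        running += count
        if running >= half:
            low = start + k * bin_width
            return low, low + bin_width
    return None  # no observations


# Made-up bar heights for bins 50-60, 60-70, ..., 100-110.
counts = [1, 3, 6, 9, 5, 2]

print(center_bin(counts, bin_width=10, start=50))
```

Here there are 26 observations, and the running count passes 13 in the fourth bin, so the center is somewhere between 80 and 90.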

When describing the spread of a distribution that is left skewed, right skewed, or has outliers, it can be misleading to rely only on the range, because the range is determined entirely by the two most extreme values. In these cases, the range may make the spread appear larger than it is for the vast majority of the data.

In these cases, report the range and also describe the spread of most of the data. This gives the reader a more accurate and complete picture of the spread. For example, in addition to reporting the range for the distribution of cls_perc_eval, we can note that most of the data are between about 50% and 100%, a span of about 50 percentage points. (In later activities, you will learn additional statistics to describe the typical spread of the data.)
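To see how the range and the spread of most of the data can differ, the sketch below computes both for a small made-up data set with one unusually low value. The values are not the actual cls_perc_eval data, and setting aside one observation from each end (`trim=1`) is just one assumed way to describe "most of the data".

```python
def range_and_bulk(values, trim=1):
    """Return the full range and the (low, high) interval left after
    setting aside `trim` observations from each end."""
    ordered = sorted(values)
    full_range = ordered[-1] - ordered[0]
    bulk = ordered[trim:len(ordered) - trim]
    return full_range, (bulk[0], bulk[-1])


# Made-up percentages; the value 10 is far below the rest.
percents = [10, 52, 55, 60, 62, 65, 68, 70, 72, 75,
            78, 80, 82, 85, 88, 90, 92, 95, 98, 100]

full_range, (low, high) = range_and_bulk(percents)
print("range:", full_range)
print("most of the data:", low, "to", high, "(a span of", high - low, "points)")
```

The range is 90, but all except the two most extreme observations lie between 52 and 98, a span of 46. Reporting both numbers tells the reader that the large range comes from a single low value.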

The last feature in the description is the presence of outliers. Outliers are observations in the data that are unusual and outside the general pattern of the rest of the observations in the distribution. When working with a univariate distribution for a quantitative variable, an outlier is an observation that has an unusually high or unusually low value. It is good practice to make note of outliers, as these observations can sometimes influence the statistical results (e.g., the range).
