Symmetry and Skewness Notes

Introduction to Symmetry and Skewness

Symmetry and skewness describe the shape of a distribution. We have previously used histograms, stem plots, and boxplots to display a distribution, and these tools help us talk about symmetry and skewness. A distribution is symmetrical if it can be divided into two halves that are mirror images of each other. A distribution that is not symmetrical is classified as skewed. Skewness refers to asymmetry and can occur in two directions: left skew (a long tail toward the left) or right skew (a long tail toward the right). The direction of skewness is named for the tail: the data cluster on one side, and the skew is toward the side where the tail extends away from that cluster.

A distribution is said to be skewed to the left if it has a long tail that trails toward the left. Conversely, a distribution is said to be skewed to the right if it has a long tail that trails toward the right side. This same idea applies to stem plots: a stem plot can be skewed to the right, for example. A good way to determine the skewness of a stem plot is by flipping it onto its side. When you view the stem plot this way, you must position the stems like a regular number line, where the lower numbers start on the left and increase toward the right. If there is a long tail toward the right, the distribution is skewed to the right.
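The stem plot reading described above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the video: the function name `stem_plot` and the data are made up, and it assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

def stem_plot(values):
    """Build stem plot rows for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem} | {''.join(str(leaf) for leaf in leaves[stem])}")
    return rows

# Made-up data for illustration only.
data = [11, 12, 13, 14, 15, 16, 21, 22, 24, 25, 33, 36, 47, 58]
for row in stem_plot(data):
    print(row)
```

Read from top to bottom, the rows get shorter as the stems increase. Turning the plot onto its side so that stem 1 is on the left and stem 5 is on the right puts that thinning tail on the right, so this made-up data set would be read as skewed to the right.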

With boxplots, outliers can affect how we read skewness. For example, a regular (unmodified) boxplot, whose whiskers extend all the way to the minimum and maximum, might lead you to think a distribution is skewed to the left, but a modified boxplot, which plots outliers as separate points, might reveal that the dataset is actually skewed to the right. Because of this, there is a strategy for determining skew in boxplots.
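To see how a single outlier can flip the impression a boxplot gives, the sketch below compares whisker lengths for a regular and a modified boxplot. Several things here are assumptions rather than details from the notes: the data are made up, the helper `whisker_lengths` is hypothetical, outliers are flagged with the common 1.5 × IQR rule, and quartiles come from Python's `statistics.quantiles` with `method="inclusive"` (other quartile conventions can give slightly different values).

```python
import statistics

def whisker_lengths(values, modified):
    """Return (left whisker length, right whisker length) of a boxplot."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    if modified:
        # Assumed outlier rule: points beyond 1.5 * IQR from the box are outliers.
        iqr = q3 - q1
        low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        inside = [v for v in data if low_fence <= v <= high_fence]
    else:
        inside = data  # regular boxplot: whiskers reach the minimum and maximum
    return q1 - inside[0], inside[-1] - q3

# Made-up data with one very low value.
data = [1, 20, 21, 22, 23, 24, 25, 27, 30, 34, 38]
print("regular: ", whisker_lengths(data, modified=False))
print("modified:", whisker_lengths(data, modified=True))
```

In the regular boxplot the left whisker is much longer, because it stretches out to the value 1. Once that value is treated as an outlier, the right whisker is the longer one.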

Identifying Skewness in Histograms, Stem Plots, and Boxplots

In a histogram, remember that the heights of the bars correspond to frequencies. Skew is named for the tail, not for the side where most of the data sit. If most of the data values lie to the right of a central point and the bars trail off toward the left, the distribution is skewed to the left; if most values lie to the left and the bars trail off toward the right, it is skewed to the right. For example, take a center value such as 12. In the example discussed, the count of data values to the right of 12 exceeds the count to the left, so the distribution is not balanced around 12 and is read as skewed; the direction is given by the side on which the bars thin out into a tail.
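A rough way to check a histogram reading is to count the values in each bin and on each side of the chosen center. The sketch below uses made-up data (not the data set from the video), a hypothetical helper `histogram_counts`, and an assumed bin width of 2, with 12 as the reference center.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Map each bin's left edge to its frequency, keeping empty bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {edge: counts[edge] for edge in range(first, last + bin_width, bin_width)}

# Made-up data for illustration only.
data = [3, 7, 9, 11, 13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 19, 19, 20, 21]
center = 12  # reference point for comparing the two sides

counts = histogram_counts(data, bin_width=2)
for left_edge, count in counts.items():
    # Each bin covers left_edge up to, but not including, left_edge + 2.
    print(f"{left_edge:>2} to {left_edge + 2:<2} {'#' * count}")

left_of_center = sum(v < center for v in data)
right_of_center = sum(v > center for v in data)
print("left of 12:", left_of_center, " right of 12:", right_of_center)
```

Here most of the values sit to the right of 12 and the bars thin out toward the left, which gives a long left tail.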

A stem plot’s skewness is determined in the same way, by how the leaves are distributed along the stems. Once the plot is flipped onto its side so that the stems read like a regular number line from left to right, a tail toward the right means the distribution is skewed to the right, and a tail toward the left means it is skewed to the left. Flipping the plot is simply a way of making the direction of the tail clear.

For boxplots, the skew direction depends on the relative sizes of the two parts of the box (first quartile to median, and median to third quartile) and on the whiskers. If the two parts of the box are unequal in size, the side with the larger part indicates the skew direction; for example, if the left part of the box is larger, the distribution is skewed to the left. If the two parts are equal in size, look at the whiskers: the longer whisker determines the skew. If the whiskers are also of equal length, the distribution is symmetrical. Outliers can affect this reading, so consider both the regular boxplot and any modified version when assessing skewness.
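The decision rule above (boxes first, then whiskers) can be written as a short function. This is a sketch of the rule as stated in these notes; the function name `boxplot_skew` and the five-number summaries are made up for illustration, and the exact equality checks assume the summary values are read off a plot as round numbers.

```python
def boxplot_skew(minimum, q1, median, q3, maximum):
    """Classify skew from a five-number summary using the box-then-whisker rule."""
    left_box, right_box = median - q1, q3 - median
    if left_box != right_box:
        # The larger part of the box points in the direction of the skew.
        return "left" if left_box > right_box else "right"
    left_whisker, right_whisker = q1 - minimum, maximum - q3
    if left_whisker != right_whisker:
        # Equal box parts: the longer whisker decides.
        return "left" if left_whisker > right_whisker else "right"
    return "symmetric"

# Made-up five-number summaries.
print(boxplot_skew(2, 10, 16, 18, 20))  # left part of the box is larger
print(boxplot_skew(0, 4, 6, 8, 15))     # equal box parts, longer right whisker
print(boxplot_skew(0, 4, 6, 8, 12))     # equal box parts and equal whiskers
```

For a modified boxplot, the same function can be used by passing the ends of the whiskers in place of the minimum and maximum.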

Center and Measures of Center: Mean and Median

When we consider symmetry and skewness, the relationship to the median and mean becomes clear. In a symmetrical distribution, the line of symmetry aligns with the median because the median is the middle value of the data. Since the mean is the balance point of a distribution, in a symmetric distribution the mean equals the median. In the example discussed, both the mean and the median are equal to 12.
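The claim that the mean and median coincide in a symmetric distribution can be checked numerically. The data below are made up so that they mirror around 12 (they are not the values from the video), and `is_mirror_symmetric` is a hypothetical helper written for this sketch.

```python
import statistics

def is_mirror_symmetric(values):
    """True if the sorted values pair up evenly around a single center."""
    data = sorted(values)
    target = data[0] + data[-1]  # twice the candidate center
    # The i-th smallest and i-th largest values must always sum to the target.
    return all(data[i] + data[-1 - i] == target for i in range(len(data)))

# Made-up data that mirror around 12.
data = [8, 10, 10, 12, 12, 12, 14, 14, 16]
print(is_mirror_symmetric(data))
print(statistics.mean(data), statistics.median(data))
```

Because every value below 12 is matched by a value the same distance above 12, the balance point and the middle value land on the same number.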

If the distribution is skewed, the median will not sit exactly at the central value 12. Looking at the histogram, we see more data values to the right of 12 than to the left. By some calculations (as described in the video), the median lies between 16 and 18, i.e. $16 < m < 18$. Since the mean is the balance point, skewness shifts it toward the tail. Thus, for a left-skewed distribution, the mean is closer to the left tail and the median is closer to the right side of the distribution, so the mean is less than the median: $\text{mean} < \text{median}$. Conversely, for a right-skewed distribution, the mean is drawn toward the right tail and the median sits toward the left, so the mean is greater than the median: $\text{mean} > \text{median}$. These relationships match the intuition that skewness pulls the mean toward the side with the longer tail, while the median stays with the bulk of the data.
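The ordering of mean and median can be illustrated with small made-up data sets, one for each shape. None of these values come from the video; the right-skewed set is simply the left-skewed set mirrored, and `compare_center` is a hypothetical helper.

```python
import statistics

def compare_center(values):
    """Describe how the mean compares with the median."""
    mean, median = statistics.fmean(values), statistics.median(values)
    if mean < median:
        return "mean < median"
    if mean > median:
        return "mean > median"
    return "mean = median"

# Made-up data: bulk on the right with a tail to the left.
left_skewed = [3, 7, 9, 11, 13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 19, 19, 20, 21]
# Mirror image of the data above: bulk on the left with a tail to the right.
right_skewed = [24 - v for v in left_skewed]
symmetric = [8, 10, 10, 12, 12, 12, 14, 14, 16]

for name, values in [("left-skewed", left_skewed),
                     ("symmetric", symmetric),
                     ("right-skewed", right_skewed)]:
    print(f"{name:>12}: mean={statistics.fmean(values):.1f}, "
          f"median={statistics.median(values)}, {compare_center(values)}")
```

In the left-skewed set, the few small values in the tail drag the mean down to 15 while the median stays at 16.5, inside the main cluster. Mirroring the data reverses the inequality.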

Worked Example: How Skewness Affects the Mean and Median

In the symmetric case, the center of symmetry is at the median, and the mean equals the median. In a skewed distribution, the center shifts. The video illustrates a scenario where the central point (12) does not coincide with the actual median due to an unequal distribution of values around the center. The median falls between 16 and 18 (i.e., $16 < m < 18$), while the mean, as the balance point, is pulled toward the tail. If the distribution is skewed to the left, the mean will be less than the median, with the mean closer to the left side of the distribution and the median closer to the right side. If the distribution is skewed to the right, the mean will be greater than the median, with the mean closer to the right side and the median closer to the left side.

Practical Takeaways for Analyzing Skewness

  • Symmetry implies that the distribution can be divided into two halves that mirror each other, with the mean and median both at the center.

  • Skewness is about asymmetry and tail direction: left-skew means a longer left tail; right-skew means a longer right tail. The direction is interpreted from where the data clusters and how the tail extends.

  • Histograms, stem plots, and boxplots each provide a lens for detecting skewness, but outliers can complicate boxplot interpretation, so consider both regular and modified boxplots.

  • For symmetric distributions, mean and median coincide; for skewed distributions, the mean shifts toward the tail, and the median lies nearer to the bulk of the data.

Overall, the key ideas are: symmetry yields mean = median and a balanced shape around the center; skewness shows up as a longer tail on one side, with the mean moving toward that tail and the median staying closer to the bulk of the data. These relationships guide how we summarize and interpret data using these common plotting tools and numerical measures.