Skewness and Distribution Shapes

  • Skewness describes the asymmetry of a distribution. If the data pile up on one side and the tail stretches longer on the other side, the distribution is skewed.
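  • The transcript emphasizes the left-tailed, negatively skewed case: "Go to your peak long tail to the left. Right. Left tailed. Negatively skewed." In other words, start at the peak and find the longer tail; when it runs to the left, the distribution is left-tailed, which is the same as negatively skewed.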
  • It’s also noted that there will be examples where distributions are negatively or positively skewed; skewness is a property of the data, not a defect.

Left-tailed vs Right-tailed distributions

  • Left-tailed distribution = negatively skewed distribution where the tail extends further to the left (lower values).
  • Right-tailed distribution = positively skewed distribution where the tail extends further to the right (higher values).
  • In left-skewed data, most observations are concentrated on the right (higher values) with a tail toward smaller values.
  • In right-skewed data, most observations are concentrated on the left (lower values) with a tail toward larger values.
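
The direction of the tail can also be read from the sign of a skewness coefficient. The sketch below assumes the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments; this is one common definition, not one the transcript specifies. The function name and the three small datasets are hypothetical, chosen so the tails are easy to see.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5 (central moments divided by n)."""
    n = len(values)
    center = fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical data: bulk on the right, long tail toward low values.
left_tailed = [1, 6, 8, 9, 9, 10, 10, 10]
# Mirror image of the data above: bulk on the left, long tail toward high values.
right_tailed = [1, 1, 1, 2, 2, 3, 5, 10]
# Perfectly symmetric around 3.
symmetric = [1, 2, 3, 4, 5]

for name, data in [("left-tailed", left_tailed),
                   ("right-tailed", right_tailed),
                   ("symmetric", symmetric)]:
    print(f"{name}: skewness = {sample_skewness(data):+.3f}")
```

A negative coefficient corresponds to a left tail, a positive one to a right tail, and a value near zero to rough symmetry. Statistical libraries may apply a small-sample bias correction, so their values can differ slightly from this version.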

Negatively skewed (left-skewed) distributions

  • Characteristics:
    • Tail is longer on the left side; bulk of the data is toward the right.
    • Common intuitive statement: mean is pulled toward the tail; thus the mean is typically less than the median.
  • Typical relationship among measures of central tendency (a rule of thumb for unimodal data, not a guarantee):
    \text{Mode} > \text{Median} > \text{Mean}
  • Significance:
    • The mean may not be the best representative of the center when data are highly skewed.
    • Median often provides a better sense of a "typical" value in skewed distributions.
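  • Real-world intuition: negative skew often arises where a ceiling exists (e.g., test scores capped at a maximum) and most values sit near that ceiling, while a few much smaller values pull the tail left. A floor (e.g., nonnegative values) tends to produce the opposite, positive skew.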
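
As a small illustration of the ordering above, the sketch below computes all three measures for a made-up set of scores on an easy test. The data are hypothetical and were chosen so that the rule of thumb holds; real data will not always follow it.

```python
from statistics import mean, median, mode

# Hypothetical scores on an easy test capped at 100 points:
# most students score high, two score much lower.
scores = [40, 70, 85, 90, 90, 95, 95, 95, 100]

center_mean = mean(scores)      # pulled down by the few low scores
center_median = median(scores)  # middle value of the sorted scores
center_mode = mode(scores)      # most frequent score

print(f"mode={center_mode}, median={center_median}, mean={center_mean:.1f}")
# prints: mode=95, median=90, mean=84.4
```

The two low scores drag the mean well below the median, while the mode stays at the peak of the distribution.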

When skewness occurs: examples and interpretation

  • The transcript notes that skewness appears in various datasets; some contexts favor negative skewness, others positive.
  • Examples (typical in practice):
    • Right-skewed (positive skew): income distributions, waiting times, time-to-failure data, stock returns in certain regimes.
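    • Left-skewed (negative skew): very easy tests where most people score high with a few low performers, or completion times under a fixed time limit where most people finish close to the limit and a few finish much earlier. (If most finish quickly and a few take much longer, the tail points right and the data are positively skewed.)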
  • Practical interpretation:
    • Skewness informs which summary statistics to report (mean vs median) and which statistical methods are appropriate.
    • Skewness affects the suitability of methods that assume normality.

Relationship to mean, median, and mode

  • In skewed unimodal distributions, the mean, median, and mode typically fall in an order that depends on the direction of skewness:
    • Left-skewed (negatively skewed):
      \text{Mode} > \text{Median} > \text{Mean}.
    • Right-skewed (positively skewed):
      \text{Mean} > \text{Median} > \text{Mode}.
  • Implications:
    • The mean is sensitive to outliers and the tail; it may not represent the "typical" observation in skewed data.
    • The median is more robust to extreme values and often better represents central tendency in skewed distributions.
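
The difference in robustness can be seen by adding a single extreme value to a small dataset. This is a minimal sketch with hypothetical incomes (in thousands); the numbers are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands; roughly symmetric to start with.
incomes = [28, 30, 32, 35, 38, 40, 45]
# The same data plus one very large income, which creates a long right tail.
with_outlier = incomes + [400]

print(f"without outlier: mean={mean(incomes):.1f}, median={median(incomes)}")
print(f"with outlier:    mean={mean(with_outlier):.1f}, median={median(with_outlier)}")
# prints:
# without outlier: mean=35.4, median=35
# with outlier:    mean=81.0, median=36.5
```

One extreme value more than doubles the mean but moves the median by only 1.5, which is why the median is usually the safer description of a "typical" observation in skewed data.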

Practical implications for data analysis

  • Normality assumptions: Many parametric tests (e.g., t-tests, ANOVA) assume approximately normal data or residuals; strong skewness can violate this assumption, especially in small samples.
  • Methods to address skewness:
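    • Data transformation to reduce skewness (e.g., logarithmic, square root, Box-Cox transformations). These are most often applied to right-skewed data; the logarithm and Box-Cox require strictly positive values.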
    • Use non-parametric tests when data remain skewed or when transforming is inappropriate.
    • Report both mean and median when skewness is present to provide a fuller picture.
  • Contextual decision-making:
    • Whether skewness is "bad" depends on the data and the analysis goal; skewness is a natural feature of many real-world datasets.
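
To illustrate the transformation approach, the sketch below applies a natural-log transform to a hypothetical set of right-skewed waiting times and compares the skewness before and after. It assumes the moment coefficient of skewness (m3 / m2^(3/2)) as the measure; the helper name and the data are made up for illustration, and a log transform is only valid when every value is strictly positive.

```python
import math
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5 (central moments divided by n)."""
    n = len(values)
    center = fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical waiting times in minutes: mostly short, a few very long.
waiting_times = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]

# The log compresses large values far more than small ones,
# which shortens the right tail. Requires all values > 0.
logged = [math.log(x) for x in waiting_times]

raw_skew = sample_skewness(waiting_times)
log_skew = sample_skewness(logged)
print(f"raw skewness: {raw_skew:.2f}")
print(f"log skewness: {log_skew:.2f}")
```

For these data the transformed values are still somewhat right-skewed, but far less so than the raw values. Whether a transformation is appropriate still depends on the analysis goal, since results are then interpreted on the transformed scale.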

Takeaways and connections

  • Skewness indicates asymmetry; the long tail direction determines whether a distribution is negatively (left) or positively (right) skewed.
  • There is nothing inherently wrong with skewness; it depends on the data and the questions being asked.
  • Always consider which measure of central tendency and which statistical methods are appropriate given the skewness observed in the data.
  • Recognize that skewness informs practical decisions: transformation choices, reporting practices, and the robustness of conclusions.