Skewness and Distribution Shapes
- Skewness describes the asymmetry of a distribution. If the data pile up on one side and the tail stretches longer on the other side, the distribution is skewed.
- The transcript emphasizes left-tailed, negatively skewed distributions: "Go to your peak long tail to the left. Right. Left tailed. Negatively skewed." In other words, starting from the peak, the long tail extends to the left (toward lower values).
- It’s also noted that there will be examples where distributions are negatively or positively skewed; skewness is a property of the data, not a defect.
Left-tailed vs Right-tailed distributions
- Left-tailed distribution = negatively skewed distribution where the tail extends further to the left (lower values).
- Right-tailed distribution = positively skewed distribution where the tail extends further to the right (higher values).
- In left-skewed data, most observations are concentrated on the right (higher values) with a tail toward smaller values.
- In right-skewed data, most observations are concentrated on the left (lower values) with a tail toward larger values.
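The sign convention above can be checked numerically. The following is a minimal sketch, assuming the moment coefficient of skewness (third central moment divided by the 1.5 power of the second) as the measure; the notes do not name a specific formula, and the function name `sample_skewness` and the small datasets are made up for illustration.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5.

    m2 and m3 are the second and third central moments (dividing by n).
    This is one common definition, chosen here as an assumption.
    """
    n = len(values)
    center = fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Illustrative data: bulk at low values, one long tail to the right.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 15]
# Mirror image: bulk at high values, long tail to the left.
left_tailed = [-x for x in right_tailed]
# Evenly spaced values with no tail on either side.
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right-tailed", right_tailed),
                   ("left-tailed", left_tailed),
                   ("symmetric", symmetric)]:
    print(f"{name}: skewness = {sample_skewness(data):+.3f}")
```

The right-tailed data give a positive value, the mirrored data give a negative value of the same size, and the symmetric data give zero, matching the "positively skewed" and "negatively skewed" labels.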
Negatively skewed (left-skewed) distributions
- Characteristics:
- Tail is longer on the left side; bulk of the data is toward the right.
- Common intuitive statement: mean is pulled toward the tail; thus the mean is typically less than the median.
- Typical relationship among measures of central tendency:
\text{Mode} > \text{Median} > \text{Mean}
- Significance:
- The mean may not be the best representative of the center when data are highly skewed.
- Median often provides a better sense of a "typical" value in skewed distributions.
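- Real-world intuition: left skew tends to arise where a ceiling exists (e.g., a maximum possible test score) and most values cluster near it, while a few much smaller values stretch the tail to the left. A floor (e.g., nonnegative values) tends to produce right skew instead.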
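As a small worked example of the typical ordering in left-skewed data, the sketch below uses a hypothetical set of scores on an easy test (out of 100). The numbers are invented for illustration and are not from the transcript.

```python
from statistics import mean, median, mode

# Hypothetical scores on an easy test: most students score high,
# and two low scores form the tail on the left.
scores = [35, 60, 85, 90, 90, 95, 95, 95, 100, 100]

score_mean = mean(scores)      # pulled down by the two low scores
score_median = median(scores)  # middle of the sorted scores
score_mode = mode(scores)      # most frequent score

print(f"mode = {score_mode}, median = {score_median}, mean = {score_mean}")
# prints: mode = 95, median = 92.5, mean = 84.5
```

Here the mean sits 8 points below the median because it is dragged toward the two low scores, while the median and mode stay with the bulk of the class.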
When skewness occurs: examples and interpretation
- The transcript notes that skewness appears in various datasets; some contexts favor negative skewness, others positive.
- Examples (typical in practice):
- Right-skewed (positive skew): income distributions, waiting times, time-to-failure data, completion times where most finish quickly but a few take much longer, stock returns in certain regimes.
- Left-skewed (negative skew): very easy tests where most people score high with a few low performers, age at death in populations with long life expectancy.
- Practical interpretation:
- Skewness informs which summary statistics to report (mean vs median) and which statistical methods are appropriate.
- Skewness affects the suitability of methods that assume normality.
- In skewed distributions, the typical ordering of mean, median, and mode depends on the direction of skewness (a rule of thumb that can fail for multimodal or discrete data):
- Left-skewed (negatively skewed):
\text{Mode} > \text{Median} > \text{Mean}
- Right-skewed (positively skewed):
\text{Mean} > \text{Median} > \text{Mode}
- Implications:
- The mean is sensitive to outliers and the tail; it may not represent the "typical" observation in skewed data.
- The median is more robust to extreme values and often better represents central tendency in skewed distributions.
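To make the sensitivity of the mean concrete, here is a minimal sketch that adds one extreme value to a right-skewed dataset and compares how far the mean and the median move. The income figures (in thousands) and the helper `summarize` are hypothetical, chosen only for illustration.

```python
from statistics import mean, median, mode

def summarize(values):
    """Return the three usual measures of central tendency."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}

# Hypothetical annual incomes in thousands (illustrative, not real data).
incomes = [28, 30, 30, 30, 32, 35, 38, 40, 45]
with_outlier = incomes + [400]  # add one very high earner

before = summarize(incomes)
after = summarize(with_outlier)

mean_shift = after["mean"] - before["mean"]
median_shift = after["median"] - before["median"]

print(before)
print(after)
print(f"mean shift: {mean_shift:.2f}, median shift: {median_shift:.2f}")
```

In both versions the ordering is mean > median > mode, as expected for right-skewed data. The single extra value moves the mean by roughly 36.6 but the median by only 1.5, which is why the median is usually the safer description of a "typical" observation here.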
Practical implications for data analysis
- Normality assumptions: Many parametric tests assume approximately normal distributions; skewness can violate these assumptions.
- Methods to address skewness:
- Data transformation to reduce skewness (e.g., logarithmic, square root, Box-Cox transformations).
- Use non-parametric tests when data remain skewed or when transforming is inappropriate.
- Report both mean and median when skewness is present to provide a fuller picture.
- Contextual decision-making:
- Whether skewness is "bad" depends on the data and the analysis goal; skewness is a natural feature of many real-world datasets.
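The effect of a transformation can be sketched with a logarithm applied to right-skewed, strictly positive data. This is an illustration under assumptions: the dataset is invented, the skewness measure is the moment coefficient (one common definition, not one named in the notes), and the helper name `sample_skewness` is hypothetical. A log transform requires all values to be greater than zero.

```python
import math
from statistics import fmean

def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (moments divide by n)."""
    n = len(values)
    center = fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Illustrative right-skewed data: mostly small values, a few large ones.
raw = [1, 2, 2, 3, 4, 5, 8, 20, 100]

# The natural log compresses large values more than small ones,
# which shortens the right tail. Only valid for strictly positive data.
logged = [math.log(x) for x in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)

print(f"skewness before log transform: {skew_raw:.2f}")
print(f"skewness after log transform:  {skew_log:.2f}")
```

For this dataset the skewness stays positive but drops substantially after the transform. Whether that is enough for a method that assumes normality is a separate judgment, and results on the log scale need to be interpreted on that scale.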
Takeaways and connections
- Skewness indicates asymmetry; the long tail direction determines whether a distribution is negatively (left) or positively (right) skewed.
- There is nothing inherently wrong with skewness; it depends on the data and the questions being asked.
- Always consider which measure of central tendency and which statistical methods are appropriate given the skewness observed in the data.
- Recognize that skewness informs practical decisions: transformation choices, reporting practices, and the robustness of conclusions.