Week 2

Box Plot Overview

  • A box plot (box and whisker plot) is a type of chart used in descriptive statistics.

  • It displays the distribution of numerical data visually, including skewness and quartiles.

  • Box plots summarize data through the five-number summary: minimum score, lower quartile, median, upper quartile, and maximum score.
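The five-number summary can be computed directly from a list of scores. The following is a minimal sketch using Python's standard library; the function name and the sample scores are made up for illustration, and the "inclusive" quartile convention is an assumption (other conventions give slightly different quartiles on small samples). It uses the raw minimum and maximum, so it matches the whisker ends only when the data contain no outliers.

```python
import statistics


def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(scores)
    # Assumption: "inclusive" quartiles. Other methods may give
    # slightly different Q1 and Q3 values for small samples.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(five_number_summary(scores))  # (1, 3.0, 5.0, 7.0, 9)
```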

Key Definitions

Minimum Score

  • The lowest score, excluding outliers (indicated at the left whisker).

Lower Quartile

  • Represents the 25th percentile; 25% of scores are below this value.

Median

  • The mid-point of the data, dividing the box into two parts.

  • Half the scores are greater than or equal to this value, and half are less than or equal to it.

Upper Quartile

  • Represents the 75th percentile; 75% of values are below this score.

Maximum Score

  • The highest score, excluding outliers (indicated at the right whisker).

Whiskers

  • Extend from the quartiles to show scores outside the middle 50% (lower 25% and upper 25%).

Interquartile Range (IQR)

  • Spans the middle 50% of scores: the distance between the 25th and 75th percentiles, shown as the length of the box.

Importance of Box Plots

Average Score

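  • Shows the median, which indicates the typical (central) value of the dataset. The median is not the arithmetic mean; a standard box plot does not display the mean.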

Skewness

  • Box plot shape indicates distribution.

    • Symmetric: Median in the middle, equal whisker lengths.

    • Positively skewed: Median closer to lower quartile, shorter lower whisker.

    • Negatively skewed: Median closer to upper quartile, shorter upper whisker.
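The position of the median inside the box can be turned into a rough check. This is only an illustrative sketch: the function name and the tolerance parameter are hypothetical, it looks at the box alone, and a fuller check would compare the whisker lengths in the same way.

```python
def describe_skew(q1, median, q3, tolerance=0.0):
    """Classify the shape from where the median sits inside the box."""
    lower_part = median - q1  # box length below the median
    upper_part = q3 - median  # box length above the median
    if abs(lower_part - upper_part) <= tolerance:
        return "roughly symmetric"
    if lower_part < upper_part:
        # Median sits closer to the lower quartile.
        return "positively skewed"
    # Median sits closer to the upper quartile.
    return "negatively skewed"


print(describe_skew(3, 5, 7))  # roughly symmetric
print(describe_skew(3, 4, 9))  # positively skewed
print(describe_skew(3, 8, 9))  # negatively skewed
```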

Dispersion

  • Indicates the spread of data: the whisker ends mark the smallest and largest values that are not outliers.

  • IQR calculated as Q3 - Q1.

Outliers

  • Observations outside the whiskers, indicating extreme values.

  • Often defined as values more than 1.5 * IQR above Q3 or more than 1.5 * IQR below Q1, that is, above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR.
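The 1.5 * IQR rule can be sketched in a few lines. The function name and the data are hypothetical, and the "inclusive" quartile method is an assumption. The function returns the whisker ends (the lowest and highest scores that are not outliers) together with the outliers themselves.

```python
import statistics


def whiskers_and_outliers(scores):
    """Return (lower whisker end, upper whisker end, list of outliers)."""
    ordered = sorted(scores)
    # Assumption: "inclusive" quartiles; other conventions exist.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Scores within the fences determine where the whiskers end.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return inside[0], inside[-1], outliers


scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data
print(whiskers_and_outliers(scores))  # (2, 10, [30])
```

Here Q1 = 5 and Q3 = 9, so the IQR is 4 and the fences sit at -1 and 15; the score 30 lies above the upper fence and is flagged as an outlier.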

Comparing Box Plots

Step 1: Compare Medians

  • Check median positions to identify potential differences between groups.

Step 2: Compare IQRs and Whiskers

  • Assess box lengths for data dispersion; longer boxes indicate more spread in the middle 50% of scores, and longer whiskers indicate more spread in the lower and upper 25%.

Step 3: Identify Outliers

  • Outliers are points outside the whiskers.

Step 4: Analyze Skewness

  • Determine if each sample exhibits similar asymmetry.
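Steps 1 to 3 can also be carried out numerically by putting the same statistics for each group side by side. This is a rough sketch with hypothetical group data and a made-up helper name, again assuming "inclusive" quartiles; skewness (Step 4) is still easiest to judge from the plots themselves.

```python
import statistics


def box_stats(scores):
    """Collect the values a box plot comparison relies on."""
    q1, median, q3 = statistics.quantiles(sorted(scores), n=4, method="inclusive")
    iqr = q3 - q1
    outliers = [x for x in scores if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    return {"median": median, "iqr": iqr, "outliers": outliers}


groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],    # hypothetical data
    "B": [2, 4, 5, 6, 7, 8, 9, 10, 30],  # hypothetical data
}
for name, scores in groups.items():
    print(name, box_stats(scores))
```

In this made-up example, group B has the higher median (7 versus 5), both groups have the same IQR (4), and only group B contains an outlier (30).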

Conclusion

  • Box plots visually summarize data, making it easy to identify the median, data dispersion, skewness, and outliers.