Graphs for Categorical Data – Pie Charts & Bar Charts

Module Context

  • Continuing from earlier modules where measures of central tendency (mean), variability (range, standard deviation) and histograms (Module 1) were introduced.

  • Current focus: Pictorial/graphical summaries of behaviour—especially for categorical data.

  • Graph types covered in this mini-lecture:
    • Pie charts
    • Bar charts
    (Mentioned but not detailed here: histograms, box-and-whisker plots for continuous data.)

Pie Charts

  • Purpose: Visualise categorical data by partitioning a circle into slices.

  • Mechanism:
    • Area (equivalently, central angle) of each slice ∝ category frequency or percentage.
    • A legend or colour coding is needed to identify the slices.

  • Example 1 — Favourite Season Survey
    • Sample size N = 150 (fictitious).
    • Categories: spring, summer, winter, autumn.
    • Observation: largest slice = autumn ⇒ most popular choice (a true majority only if the slice exceeds 50 %).

  • Advantages
    • Intuitive, quick at-a-glance comparison of parts to whole.
    • Works well with roughly 10 or fewer categories, each occupying a sizeable portion.

  • Pitfalls & Limitations
    • "Too many slices" problem – visual clutter when categories exceed ≈10.
    • Hard to discriminate tiny percentages.
    • Labels often overlap; legends become unreadable.

  • Example 2 — Causes of Death in Shakespeare’s Plays (many categories)
    • Stabbing > 50 % readily visible.
    • Remaining categories are too small to tell apart.

  • Example 3 — US 2007 Federal Budget Allocation (Wikipedia)
    • Large categories readable; small-budget items effectively invisible.
    • Demonstrates difficulty with small percentages.
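
The proportionality rule under "Mechanism" can be made concrete with a short Python sketch. The counts below are hypothetical (the notes give only N = 150 and the ranking of the seasons), and `slice_angles` is an illustrative helper name rather than a library function.

```python
def slice_angles(counts):
    """Return {category: (percentage, angle_in_degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }

# Hypothetical counts chosen to sum to N = 150; not taken from the survey.
season_counts = {"autumn": 60, "summer": 45, "winter": 30, "spring": 15}

angles = slice_angles(season_counts)
for season, (pct, deg) in angles.items():
    print(f"{season:<7} {pct:5.1f} %  {deg:6.1f} degrees")
```

With these assumed counts, autumn is the largest slice at 40 %, which makes it the most popular season without being a majority. A category holding 1 % of responses would get only 3.6 degrees, which is why tiny slices are hard to read.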

Bar Charts

  • Also visualise categorical data; employ bars whose height encodes a number.

  • Horizontal (x-) axis = categories; Vertical (y-) axis = numerical value.

  • Flexibility of y-axis:

    1. Raw counts (frequencies).

    2. Summary statistics (mean, median, proportion, etc.).

Bar-Chart Example 1 — Favourite Season (Counts)
  • Same N = 150 dataset shown as bars.

  • Height of each bar = count of votes.

  • Simple comparison: autumn > summer > winter > spring.
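
As a minimal sketch of how raw responses become bar heights, the snippet below tallies a list of answers and prints a rough text bar chart. The response list is hypothetical (built to match N = 150 and the ordering above), and the one-'#'-per-5-votes scale is an arbitrary choice for illustration.

```python
from collections import Counter

# Hypothetical raw responses; the actual survey answers are not in the notes.
responses = (
    ["autumn"] * 60 + ["summer"] * 45 + ["winter"] * 30 + ["spring"] * 15
)

# Tally the votes: each count becomes the height of one bar.
counts = Counter(responses)

# most_common() lists categories from highest to lowest count.
for season, n in counts.most_common():
    bar = "#" * (n // 5)  # one '#' per 5 votes
    print(f"{season:<7} {bar} {n}")
```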

Bar-Chart Example 2 — Athlete Sprint Times (Means)
  • Hypothetical study: 20 athletes per sport.

  • Categories: bowling, athletics, swimming, hockey, football.

  • y-axis = mean 100 m race time (seconds).
    • Bowling ≈ 25 s, Athletics ≈ 10 s, etc.

  • Demonstrates summary-statistic usage rather than counts.

Error Bars & Variability
  • Bar charts can incorporate variability indicators:
    • e.g., ±1 SD error bars.
    • Graph conveys both central tendency (mean) and dispersion.
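
The sketch below shows one way the bar height and the ±1 SD whiskers could be computed for each sport. The race times are hypothetical, and only five per sport are listed for brevity (the study in the notes has 20 athletes per sport). `bar_with_error` is an illustrative helper name.

```python
from statistics import mean, stdev

# Hypothetical 100 m times in seconds; not data from the notes.
times = {
    "bowling": [24.0, 25.0, 26.0, 25.5, 24.5],
    "athletics": [9.8, 10.0, 10.2, 10.1, 9.9],
}

def bar_with_error(values):
    """Return (bar height, lower whisker, upper whisker) for mean +/- 1 SD."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return m, m - sd, m + sd

for sport, values in times.items():
    m, lo, hi = bar_with_error(values)
    print(f"{sport:<10} mean = {m:.2f} s, error bar from {lo:.2f} to {hi:.2f} s")
```

Whatever the whiskers represent (SD, standard error, or a confidence interval) should be stated in the figure caption, since the three can differ greatly in length.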

Multiple Variables & Interactions (Grouped / Clustered Bars)
  • Extension: Compare categories across a second variable (country, condition, gender…).

  • Example 3 — Athlete Type × Country (A vs B)
    • Each athlete type has two bars (country A & country B).
    • Interpretation paths:
      – Main effect: bowling slower than hockey in both countries.
      – Interaction:
        • Bowling: country A slower than B.
        • Swimming: country A faster than B.
        → The direction of the country difference depends on athlete type, i.e. the two variables interact.

  • Bar charts are especially intuitive for spotting such interactions.
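
What the eye picks up in a grouped bar chart can be expressed as a "difference of differences". The sketch below uses hypothetical cell means; only the direction of each country difference comes from the example above, and `country_gap` is an illustrative name.

```python
# Hypothetical cell means in seconds; only the signs of the gaps follow the notes.
cell_means = {
    ("bowling", "A"): 26.0, ("bowling", "B"): 24.0,
    ("swimming", "A"): 14.0, ("swimming", "B"): 16.0,
}

def country_gap(sport):
    """Mean time in country A minus mean time in country B for one sport."""
    return cell_means[(sport, "A")] - cell_means[(sport, "B")]

gap_bowling = country_gap("bowling")    # positive: A slower than B
gap_swimming = country_gap("swimming")  # negative: A faster than B

# If the gaps were equal, the bar pairs would look parallel (no interaction).
# A non-zero difference of differences suggests an interaction.
interaction = gap_bowling - gap_swimming
print(f"bowling gap = {gap_bowling:+.1f} s, swimming gap = {gap_swimming:+.1f} s")
print(f"difference of differences = {interaction:+.1f} s")
```

This is a descriptive check only; deciding whether an interaction is larger than chance variation would need a statistical test, which these notes do not cover.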

Bar-Chart Pitfalls
  1. Category Overload
    • Re-plotting Shakespeare deaths as bars results in a tall “stabbing” bar beside numerous near-zero bars; the pattern is still unreadable.

  2. Axis Manipulation / Misleading Scales
    • Ethical & interpretive risk: truncating the y-axis exaggerates perceived differences.
    • Case Study — Fox News Infographic
      – Data: welfare recipients ≈ 108,000,000 vs full-time employees ≈ 101,000,000.
      – Original graph y-axis starts at 100,000,000 → makes gap look huge.
      – Re-plot starting at 0 shows modest difference.
    • Lesson: Always inspect and label axes; a zero baseline is recommended for bar charts unless there is strong justification for another.
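
The size of the distortion in the case study can be worked out directly from the two approximate figures. In the sketch below, `visual_ratio` is an illustrative helper that compares the drawn bar heights for a given axis baseline.

```python
def visual_ratio(a, b, baseline=0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

welfare = 108_000_000   # approximate figure from the case study
employed = 101_000_000  # approximate figure from the case study

truncated = visual_ratio(welfare, employed, baseline=100_000_000)
honest = visual_ratio(welfare, employed)

print(f"axis from 100,000,000: first bar drawn {truncated:.1f}x as tall")
print(f"axis from 0:           first bar drawn {honest:.2f}x as tall")
```

With the truncated axis the first bar is drawn 8 times as tall as the second, although the underlying values differ by only about 7 %.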

Practical & Ethical Implications

  • Graph choice matters for clarity, transparency, and integrity.

  • Over-complicated pie/bar charts hinder comprehension and invite misinterpretation.

  • Axis manipulation can bias viewers, intentionally or inadvertently; presenters have an ethical responsibility to show data honestly.

  • When data have many categories or wide dynamic ranges, consider alternatives (e.g., dot plots, treemaps, lollipop charts, log scales, or collapsing categories).

Summary Cheat-Sheet

  • Use pie charts when:
    • ≤ 10 categories, fairly even or large slice sizes, need part-to-whole insight.

  • Avoid pie charts when: there are many tiny segments or precise comparison is needed.

  • Use bar charts when:
    • Need to compare magnitudes across categories, display summary statistics, add error bars, or visualise interactions between variables.

  • Watch-outs:
    • Too many categories → clutter.
    • Truncated or inconsistent axes → misleading impressions.
    • Provide clear labels, scales, legends.

  • Both graphs supplement numerical summaries (mean, SD, etc.) and can powerfully communicate behavioural data when designed correctly.