Graphs for Categorical Data – Pie Charts & Bar Charts

This mini-lecture continues from earlier modules, which introduced measures of central tendency (mean), variability (range, standard deviation) and histograms (Module 1).
Current focus: Pictorial/graphical summaries of behaviour—especially for categorical data.
Graph types covered in this mini-lecture:
• Pie charts
• Bar charts
(Mentioned but not detailed here: histograms, box-and-whisker plots for continuous data.)

Purpose of pie charts: visualise categorical data by partitioning a circle into slices.
Mechanism:
• Area of each slice ∝ category frequency or percentage.
• A clear legend or colour coding is needed to identify the slices.
Example 1 — Favourite Season Survey
• Sample size $N = 150$ (fictitious).
• Categories: spring, summer, winter, autumn.
• Observation: largest slice = autumn ⇒ most popular choice (a majority only if the slice exceeds 50 %).
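The proportionality rule can be checked numerically. The sketch below converts category counts into slice angles. The per-season counts are invented for illustration (the notes give only N = 150 and say autumn is the largest slice), and `slice_angles` is a hypothetical helper name.

```python
def slice_angles(counts):
    """Map each category to its slice angle in degrees (angle is proportional to count)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one observation")
    return {category: 360.0 * n / total for category, n in counts.items()}


# Hypothetical counts that sum to N = 150; not taken from the lecture.
season_counts = {"spring": 30, "summer": 40, "winter": 20, "autumn": 60}

angles = slice_angles(season_counts)
for season, angle in angles.items():
    share = 100 * season_counts[season] / sum(season_counts.values())
    print(f"{season:<7} {angle:6.1f} degrees  ({share:.0f} %)")
```

With these made-up counts autumn takes 144 degrees of the circle, i.e. 40 % of responses: the largest slice, though still under half.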
Advantages
• Intuitive, quick at-a-glance comparison of parts to whole.
• Good when there are roughly 10 or fewer categories and each occupies a sizeable portion.
Pitfalls & Limitations
• "Too many slices" problem – visual clutter when categories exceed ≈10.
• Hard to discriminate tiny percentages.
• Labels often overlap; legends become unreadable.
Example 2 — Causes of Death in Shakespeare’s Plays (many categories)
• Stabbing > 50 % readily visible.
• Remaining categories are indistinguishable.
Example 3 — US 2007 Federal Budget Allocation (Wikipedia)
• Large categories readable; small-budget items effectively invisible.
• Demonstrates difficulty with small percentages.

Bar charts also visualise categorical data, using bars whose height encodes a number.
Horizontal (x-) axis = categories; Vertical (y-) axis = numerical value.
Flexibility of y-axis:
1. Raw counts (frequencies).
2. Summary statistics (mean, median, proportion, etc.).

Hypothetical study: 20 athletes per sport.
Categories: bowling, athletics, swimming, hockey, football.
y-axis = mean 100 m race time (seconds).
• Bowling ≈ 25 s, Athletics ≈ 10 s, etc.
Demonstrates summary-statistic usage rather than counts.
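As a rough sketch of how bar heights come from a summary statistic rather than a count, the code below averages raw times per sport and prints a text-only bar for each. The individual times are hypothetical (three athletes per sport for brevity, chosen to match the approximate means above), and `bar_heights` is an assumed helper name.

```python
from statistics import mean

# Hypothetical 100 m times in seconds; the lecture reports only approximate means.
race_times = {
    "bowling": [24.0, 25.0, 26.0],
    "athletics": [9.5, 10.0, 10.5],
}


def bar_heights(samples):
    """Return the bar height (group mean) for each category."""
    return {group: mean(values) for group, values in samples.items()}


heights = bar_heights(race_times)
for sport, height in heights.items():
    # One '#' per second gives a crude text rendering of the bar.
    print(f"{sport:<10} {'#' * round(height)} {height:.1f} s")
```

Swapping `mean` for `statistics.median`, or for a count via `len`, changes what the y-axis represents without changing the chart type.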

Bar charts can incorporate variability indicators:
• e.g., ±1 SD error bars.
• Graph conveys both central tendency (mean) and dispersion.
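The ends of a ±1 SD error bar are simply the mean minus and plus one standard deviation. A minimal sketch, assuming the sample standard deviation (n − 1 denominator) is wanted; the times and the `error_bar` helper are hypothetical.

```python
from statistics import mean, stdev


def error_bar(values):
    """Return (centre, lower, upper) for a mean +/- 1 SD error bar."""
    centre = mean(values)
    spread = stdev(values)  # sample SD; use statistics.pstdev for the population SD
    return centre, centre - spread, centre + spread


athletics_times = [9.5, 10.0, 10.5]  # hypothetical 100 m times in seconds
centre, lower, upper = error_bar(athletics_times)
print(f"bar height {centre:.1f} s, error bar from {lower:.1f} s to {upper:.1f} s")
```

Whichever spread measure is plotted (SD, standard error, confidence interval), the figure caption should say so, because the three can differ considerably in length.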

Extension: Compare categories across a second variable (country, condition, gender…).
Example 3 — Athlete Type × Country (A vs B)
• Each athlete type has two bars (country A & country B).
• Interpretation paths:
– Main effect: bowling slower than hockey in both countries.
– Interaction:
• Bowling: country A slower than B.
• Swimming: country A faster than B.
→ Effect of athlete type depends on country.
Bar charts are especially intuitive for spotting such interactions.
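One way to make the interaction concrete is to compute the country gap (A minus B) for each athlete type and see whether the gaps agree. The cell means below are hypothetical, and the check is purely descriptive: it is a sketch of the pattern a reader spots in the chart, not a significance test.

```python
# Hypothetical mean 100 m times (seconds) per athlete type and country.
cell_means = {
    "bowling": {"A": 26, "B": 24},   # country A slower than B
    "swimming": {"A": 14, "B": 16},  # country A faster than B
}


def country_gaps(means):
    """Difference A - B for every athlete type."""
    return {athlete: by_country["A"] - by_country["B"]
            for athlete, by_country in means.items()}


def has_interaction(means, tolerance=0.0):
    """True when the country gap is not the same for every athlete type."""
    gaps = list(country_gaps(means).values())
    return max(gaps) - min(gaps) > tolerance


gaps = country_gaps(cell_means)
print(gaps)
print("interaction pattern present:", has_interaction(cell_means))
```

Here the gap changes sign between bowling and swimming, which corresponds to the paired bars swapping order in the chart. If every gap were equal, the pairs of bars would look the same apart from their overall height, and there would be no interaction to report.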

Category Overload
• Re-plotting Shakespeare deaths as bars results in a tall “stabbing” bar beside numerous near-zero bars; the pattern is still unreadable.
Axis Manipulation / Misleading Scales
• Ethical & interpretive risk: truncating the y-axis exaggerates perceived differences.
• Case Study — Fox News Infographic
– Data: welfare recipients $\approx 108{,}000{,}000$ vs full-time employees $\approx 101{,}000{,}000$.
– Original graph y-axis starts at $100{,}000{,}000$ → makes gap look huge.
– Re-plot starting at $0$ shows modest difference.
• Lesson: Always inspect and label axes; a zero baseline is recommended for bar charts unless there is strong justification for another.
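The size of the distortion can be worked out from the figures in the case study. The sketch below compares the ratio of the drawn bar heights under each baseline; `drawn_height_ratio` is a hypothetical helper, and the values are the approximate ones quoted above.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of two bars when the y-axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (a - baseline) / (b - baseline)


welfare = 108_000_000
employed = 101_000_000

truncated = drawn_height_ratio(welfare, employed, baseline=100_000_000)
zero_based = drawn_height_ratio(welfare, employed)

print(f"axis from 100,000,000: first bar drawn {truncated:.1f}x as tall")
print(f"axis from 0:           first bar drawn {zero_based:.2f}x as tall")
```

With the truncated axis the first bar is drawn eight times as tall as the second, although the underlying values differ by only about 7 %.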

Graph choice matters for clarity, transparency, and integrity.
Over-complicated pie/bar charts hinder comprehension and invite misinterpretation.
Axis manipulation can intentionally or inadvertently bias viewers—ethical responsibility to present data honestly.
When data have many categories or wide dynamic ranges, consider alternatives (e.g., dot plots, treemaps, lollipop charts, log scales, or collapsing categories).
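Collapsing categories is the simplest of these alternatives to sketch. The code below merges every category whose share falls under a threshold into a single "Other" group. The 5 % cut-off, the category labels and the counts are all assumptions made for illustration.

```python
def collapse_small(counts, min_share=0.05, other_label="Other"):
    """Merge categories whose share of the total is below `min_share`."""
    total = sum(counts.values())
    kept = {}
    other = 0
    for category, n in counts.items():
        if n / total >= min_share:
            kept[category] = n
        else:
            other += n
    if other:
        kept[other_label] = other
    return kept


# Hypothetical counts: one dominant category and a long tail of rare ones.
raw_counts = {"A": 55, "B": 20, "C": 15, "D": 4, "E": 3, "F": 2, "G": 1}

collapsed = collapse_small(raw_counts)
print(collapsed)
```

The total is preserved, so percentages computed from the collapsed counts remain valid. The cost is that the detail inside "Other" is hidden, which is worth stating in the caption.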

Use pie charts when:
• ≤ 10 categories, fairly even or large slice sizes, need part-to-whole insight.
Avoid pie charts when:
• There are many tiny segments, or precise comparison is needed.
Use bar charts when:
• Need to compare magnitudes across categories, display summary statistics, add error bars, or visualise interactions between variables.
Watch-outs:
• Too many categories → clutter.
• Truncated or inconsistent axes → misleading impressions.
• Provide clear labels, scales, legends.
Both graphs supplement numerical summaries (mean, SD, etc.) and can powerfully communicate behavioural data when designed correctly.