Displaying Data

Why Visualise Data?

• Generates insight that is often hidden in raw tables or spreadsheets.
• Uncovers underlying structure (e.g., clustering, skewness, multimodality).
• Pinpoints important variables worth modelling further.
• Detects outliers or data-entry errors early.
• Helps researchers, practitioners, and lay readers make sense of patterns quickly.
• Bridges technical analysis and decision-making (policy, policing, social change).

Frequency Distributions

A frequency distribution table summarises how often each category occurs.

Parts of a Frequency Table

• Absolute Frequency (Count): raw tally in each category.
• Relative Frequency: proportion or percentage of the total.
• Cumulative Frequency: running total of the absolute frequencies.
• Cumulative Relative Frequency: running total of the percentages.
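The four columns above can be built from a raw list of category labels. The following is a minimal sketch using only the Python standard library; the function name `frequency_table` and the sample labels are hypothetical and are not taken from the dataset in these notes.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (category, count, relative, cumulative, cumulative_relative)."""
    counts = Counter(values)
    total = sum(counts.values())
    # Most frequent category first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    running = list(accumulate(count for _, count in ordered))
    return [
        (cat, count, count / total, cum, cum / total)
        for (cat, count), cum in zip(ordered, running)
    ]


# Hypothetical sample, not the notes' dataset.
sample = ["Theft", "Assault", "Theft", "Fraud", "Theft", "Assault"]
table = frequency_table(sample)
for cat, count, rel, cum, cum_rel in table:
    print(f"{cat:<8} {count:>3} {rel:>7.1%} {cum:>3} {cum_rel:>7.1%}")
```

Sorting by count is a presentation choice; for ordinal data it is usually better to keep the natural category order so that the cumulative columns remain meaningful.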

When to Use

• Data are categorical (nominal or ordinal) or discrete numerical.
• Goal: compare categories, identify dominant groups, or prepare for further visualisation.

Worked Example – Offence Type (600+ rows of data)

• Dataset: 611 criminal incident records between 24/2/22 and 23/3/22.
• Research question: “What is the distribution of the Offence Type variable?”

Absolute Frequencies (Pivot Table in Excel)

• Assault – 42
• Drug Offences – 117
• Fraud – 7
• Good Order Offences – 168
• Handling Stolen Goods – 6
• Liquor (excl. Drunkenness) – 4
• Miscellaneous Offences – 2
• … (total 16 categories, grand total 611).

Relative Frequencies (same order)

• Assault – 6.87%
• Drug Offences – 19.15%
• Fraud – 1.15%
• Good Order Offences – 27.50% ← most common
• Handling Stolen Goods – 0.98%
• Liquor (excl. Drunkenness) – 0.65%
• … (sums to 100%).
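Each relative frequency is the category count divided by the grand total of 611. The sketch below recomputes the listed percentages from the listed counts; the helper name `relative_percent` is an assumption, and the categories elided in the notes are deliberately left out.

```python
GRAND_TOTAL = 611  # grand total stated in the notes

# Only the categories listed in the notes; the rest are elided there.
listed_counts = {
    "Assault": 42,
    "Drug Offences": 117,
    "Fraud": 7,
    "Good Order Offences": 168,
    "Handling Stolen Goods": 6,
    "Liquor (excl. Drunkenness)": 4,
}


def relative_percent(count, total):
    """Relative frequency as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {
    cat: relative_percent(n, GRAND_TOTAL) for cat, n in listed_counts.items()
}
for cat, pct in percentages.items():
    print(f"{cat}: {pct:.2f}%")
```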

Cumulative Frequencies & Percentages (key checkpoints)

To find the cumulative frequency of a class, add its frequency to the cumulative frequency of the previous class (for the first class, the cumulative frequency is simply its own frequency).

• By the time we include Good Order Offences we’ve reached 334 cases (≈ 54.66% of total).
• Including Other Theft raises the cumulative relative frequency to 85.76%.
• Final running totals match the grand total 611 and 100%.
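The first checkpoint can be reproduced as a running total over the first four categories in the order listed. This is a small sketch using `itertools.accumulate`; the variable names are assumptions, and later categories are not included because their counts are not given in the notes.

```python
from itertools import accumulate

GRAND_TOTAL = 611  # grand total stated in the notes

# First four categories, in the order they appear in the notes.
first_four = [
    ("Assault", 42),
    ("Drug Offences", 117),
    ("Fraud", 7),
    ("Good Order Offences", 168),
]

# Running total of the absolute frequencies.
cumulative = list(accumulate(n for _, n in first_four))
# Running total expressed as a percentage of all 611 records.
cumulative_pct = [round(100 * c / GRAND_TOTAL, 2) for c in cumulative]

for (cat, _), cum, pct in zip(first_four, cumulative, cumulative_pct):
    print(f"{cat:<20} {cum:>4} {pct:>6.2f}%")
```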

Significance

• Highlights a “long-tail” distribution: a few categories dominate, many are rare.
• Guides resource allocation (e.g., policing priorities) and suggests merging rare categories before modelling.

Bar Charts

• Visual representation for categorical/discrete data.
• Each bar’s height = frequency (absolute or relative).
• Bars separated by gaps to signify categorical, non-continuous nature.

Example – Crime Counts by Offence Type

• The raw Excel plot was initially unordered and cluttered.
• Improved version:
– Categories re-sorted from most to least frequent.
– Title contextualised: “Crime Counts by Offence Type between 24/2/22 and 23/3/22”.
– X-axis labels angled/abridged for readability.
• Presentation tips: order logically, keep axis units consistent, avoid 3-D effects.
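The re-sorting step can be illustrated without a plotting library. The sketch below orders the categories listed earlier in these notes from most to least frequent and draws a plain-text bar for each; `text_bar_chart` and its `width` parameter are hypothetical names, and a real chart would normally be drawn in Excel or a plotting package instead.

```python
def text_bar_chart(counts, width=40):
    """Return lines of a horizontal bar chart, most frequent category first."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    largest = ordered[0][1] if ordered else 0
    label_width = max((len(cat) for cat in counts), default=0)
    lines = []
    for cat, n in ordered:
        # Scale every bar against the largest count so the longest bar is `width`.
        bar = "#" * round(width * n / largest) if largest else ""
        lines.append(f"{cat:<{label_width}} | {bar} {n}")
    return lines


# Counts listed in the notes; the remaining categories are elided there.
listed_counts = {
    "Assault": 42,
    "Drug Offences": 117,
    "Fraud": 7,
    "Good Order Offences": 168,
    "Handling Stolen Goods": 6,
    "Liquor (excl. Drunkenness)": 4,
    "Miscellaneous Offences": 2,
}

chart = text_bar_chart(listed_counts)
print("\n".join(chart))
```

Very small categories may round down to a bar of zero length at this scale, which is why the count is printed next to each bar.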

Histograms

• Designed for continuous or large-range discrete variables.
• X-axis divided into contiguous bins; no gaps (data are numeric and continuous along the axis).
• Each rectangle’s area represents frequency or density; with equal-width bins, bar height alone carries the same information.
• Reveals modality (one peak vs multiple), skew, and outliers.

Example – Respondent Age (World Values Survey 2018)

• N = 1,795 ages; observed min 17, max 98 (a span of 81 years, so up to 82 distinct integer ages).
• Instead of one thin bar per age, group into broader bins (e.g., [0, 20], [21, 30], …).
• Final histogram quickly shows:
– Mode cluster in the 31–50 range.
– Tapering tail in 71+ ages.
– Potential data-entry errors if extreme ages (e.g., >100) pop up.
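Binning amounts to counting how many ages fall into each interval. The sketch below is one way to do this with the standard library; only the first two bins follow the notes, while the remaining bin edges, the `71+` catch-all, the function name `bin_ages`, and the sample ages are all assumptions for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of each bin. The first two match the notes'
# [0, 20] and [21, 30]; the later edges are assumed for illustration.
UPPER_BOUNDS = [20, 30, 40, 50, 60, 70]
LABELS = ["0-20", "21-30", "31-40", "41-50", "51-60", "61-70", "71+"]


def bin_ages(ages):
    """Count ages per bin; anything above the last bound lands in '71+'."""
    counts = [0] * len(LABELS)
    for age in ages:
        # bisect_left gives the index of the first upper bound >= age.
        counts[bisect_left(UPPER_BOUNDS, age)] += 1
    return dict(zip(LABELS, counts))


# Hypothetical ages, not the World Values Survey data.
sample_ages = [17, 20, 21, 30, 31, 45, 50, 52, 67, 71, 98]
binned = bin_ages(sample_ages)
# Flag implausible values before plotting (threshold taken from the notes).
suspicious = [age for age in sample_ages if age > 100]
print(binned, suspicious)
```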

Research Example – Violent Crime Counts per Neighbourhood

• Histogram across Brisbane suburbs (Figure 4.9).
• Highlights positively skewed distribution: many suburbs with low counts, few with very high counts.
• Assists analysts in targeting outlier suburbs for situational crime prevention.
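A quick numerical companion to the histogram is to compare the mean with the median: in a positively skewed distribution a few very large values pull the mean above the median. The sketch below uses made-up suburb counts (not the Brisbane data behind Figure 4.9) and should be read as a rough heuristic rather than a formal test of skewness.

```python
from statistics import mean, median

# Hypothetical violent-crime counts per suburb, for illustration only.
suburb_counts = [2, 3, 3, 4, 5, 6, 8, 40, 95]

avg = mean(suburb_counts)
mid = median(suburb_counts)

# Mean well above the median hints at a long right tail.
looks_positively_skewed = avg > mid
print(f"mean={avg:.2f}, median={mid}, positive skew suspected: {looks_positively_skewed}")
```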

Line Charts (Time-Series)

• Best for showing change over ordered time periods.
• X-axis = time (years, months, weeks); Y-axis = value of interest.
• Multiple lines may compare groups (male vs female), variables, or geographies.

Example – Queensland Imprisonment Rate 2011–2021

• Data excerpt (rates per 100,000 population):
– 2011: Male 302.1, Female 24.0.
– 2014: Male 353.5, Female 36.2.
– 2021: Male 460.7, Female 45.2.
• Observations:
– Overall upward trend for both genders; steeper for males in absolute terms.
– Temporary dip in 2012 (male 293.7) before sustained growth.
– Public policy relevance: prison crowding, gender-specific interventions.
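How steep a line looks depends on whether change is read in absolute or relative terms. The sketch below computes both for the 2011 and 2021 values in the data excerpt; the helper name `change` and the dictionary layout are assumptions.

```python
# Rates per 100,000 population, as given in the data excerpt.
rates = {
    2011: {"male": 302.1, "female": 24.0},
    2014: {"male": 353.5, "female": 36.2},
    2021: {"male": 460.7, "female": 45.2},
}


def change(series, start, end, group):
    """Return (absolute change, percentage change) between two years."""
    first, last = series[start][group], series[end][group]
    absolute = last - first
    return absolute, 100 * absolute / first


male_abs, male_pct = change(rates, 2011, 2021, "male")
female_abs, female_pct = change(rates, 2011, 2021, "female")
print(f"Male:   +{male_abs:.1f} per 100,000 ({male_pct:.1f}%)")
print(f"Female: +{female_abs:.1f} per 100,000 ({female_pct:.1f}%)")
```

On these endpoints the male rate rises by far more in absolute terms, while the female rate rises by a larger percentage of its starting value, so a chart caption should say which comparison is intended.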

Practical / Ethical / Philosophical Considerations

• Clarity vs deception: mis-scaled axes or truncated zeros can mislead.
• Accessibility: colour palettes should be colour-blind friendly.
• Privacy: granular maps or line charts can inadvertently re-identify individuals.
• Equity: focusing solely on dominant categories might obscure minority experiences (e.g., low-frequency offences that disproportionately affect vulnerable groups).
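The effect of a truncated axis can be quantified by comparing the ratio of the drawn bar heights with the ratio of the underlying values. The sketch below uses the male imprisonment rates for 2011 and 2021 from these notes; the function `drawn_ratio` and the axis start of 290 are hypothetical choices made purely for illustration.

```python
def drawn_ratio(small, large, baseline=0.0):
    """Ratio of drawn bar heights when the value axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


male_2011, male_2021 = 302.1, 460.7

honest = drawn_ratio(male_2011, male_2021)            # axis starts at zero
truncated = drawn_ratio(male_2011, male_2021, 290.0)  # hypothetical axis start

print(f"Zero-based axis: 2021 bar is {honest:.2f}x the 2011 bar")
print(f"Axis from 290:   2021 bar is {truncated:.2f}x the 2011 bar")
```

With a zero-based axis the taller bar is about one and a half times the shorter one, matching the data; starting the axis at 290 makes the same difference look roughly fourteen-fold.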