Displaying Data

Why Visualise Data?

• Generates insight that is often hidden in raw tables or spreadsheets.
• Uncovers underlying structure (e.g., clustering, skewness, multimodality).
• Pinpoints important variables worth modelling further.
• Detects outliers or data-entry errors early.
• Helps researchers, practitioners, and lay readers make sense of patterns quickly.
• Bridges technical analysis and decision-making (policy, policing, social change).

Frequency Distributions

A frequency distribution table summarises how often each category occurs.

Parts of a Frequency Table

• Absolute Frequency (Count): raw tally in each category.
• Relative Frequency: proportion or percentage of the total.
• Cumulative Frequency: running total of the absolute frequencies.
• Cumulative Relative Frequency: running total of the percentages.
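
The four columns above can be built from raw categorical values in a few lines. The following is a minimal sketch rather than a prescribed method: the function name `frequency_table` and the toy `sample` list are hypothetical, and the alphabetical row order is an arbitrary choice for illustration.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return one row per category:
    (category, count, relative, cumulative, cumulative relative)."""
    counts = Counter(values)
    total = sum(counts.values())
    categories = sorted(counts)  # alphabetical order, an arbitrary choice here
    running = list(accumulate(counts[c] for c in categories))
    return [
        (c, counts[c], counts[c] / total, cum, cum / total)
        for c, cum in zip(categories, running)
    ]


# Hypothetical toy sample, not taken from the crime dataset
sample = ["Fraud", "Assault", "Assault", "Fraud", "Assault", "Drug Offences"]

for category, count, rel, cum, cum_rel in frequency_table(sample):
    print(f"{category:<15}{count:>3}{rel:>8.1%}{cum:>4}{cum_rel:>8.1%}")
```

The last row's cumulative columns should always equal the sample size and 100%, which is a quick sanity check on any frequency table.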

When to Use

• Data are categorical (nominal or ordinal) or discrete numerical.
• Goal: compare categories, identify dominant groups, or prepare for further visualisation.

Worked Example – Offence Type (600+ rows of data)

• Dataset: $611$ criminal incident records from $24/2/22$ to $23/3/22$.
• Research question: “What is the distribution of the Offence Type variable?”

Absolute Frequencies (Pivot Table in Excel)

• Assault – $42$
• Drug Offences – $117$
• Fraud – $7$
• Good Order Offences – $168$
• Handling Stolen Goods – $6$
• Liquor (excl. Drunkenness) – $4$
• Miscellaneous Offences – $2$
• … (total $16$ categories, grand total $611$).

Relative Frequencies (same order)

• Assault – $6.87\%$
• Drug Offences – $19.15\%$
• Fraud – $1.15\%$
• Good Order Offences – $27.50\%$ ← most common
• Handling Stolen Goods – $0.98\%$
• Liquor (excl. Drunkenness) – $0.65\%$
• … (sums to $100\%$).

Cumulative Frequencies & Percentages (key checkpoints)

To find the cumulative frequency of each class, add its frequency to the cumulative frequency of the previous class (for the first class, the cumulative frequency is simply its own frequency).

• By the time we include Good Order Offences we’ve reached $334$ cases (≈ $54.66\%$ of the total).
• Including Other Theft raises the cumulative relative frequency to $85.76\%$.
• Final running totals match the grand total $611$ and $100\%$.
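
The first checkpoint can be reproduced from the counts listed earlier. This is a small sketch that uses only the first four categories in their listed order and the reported grand total of $611$; the variable names are illustrative.

```python
from itertools import accumulate

TOTAL = 611  # grand total reported for all 16 categories

# First four categories in the listed order, with the counts given in the notes
counts = {
    "Assault": 42,
    "Drug Offences": 117,
    "Fraud": 7,
    "Good Order Offences": 168,
}

# Running total: each entry adds the current count to the previous cumulative value
cumulative = list(accumulate(counts.values()))

for (name, count), running in zip(counts.items(), cumulative):
    print(f"{name:<22}{count:>4}{count / TOTAL:>9.2%}{running:>5}{running / TOTAL:>9.2%}")
```

The final row gives a running total of 334, matching the checkpoint quoted for Good Order Offences.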

Significance

• Highlights a “long-tail” distribution: a few categories dominate, many are rare.
• Guides resource allocation (e.g., policing priorities) and suggests which rare categories could be merged before modelling.

Bar Charts

• Visual representation for categorical/discrete data.
• Each bar’s height = frequency (absolute or relative).
• Bars separated by gaps to signify categorical, non-continuous nature.

Example – Crime Counts by Offence Type

• Raw Excel plot first appeared unordered, cluttered.
• Improved version:
– Categories re-sorted from most to least frequent.
– Title contextualised: “Crime Counts by Offence Type between $24/2/22$ – $23/3/22$”.
– X-axis labels angled/abridged for readability.
• Presentation tips: order logically, keep axis units consistent, avoid 3-D effects.
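
The re-sorting step can be sketched without any plotting library. The example below uses only the seven counts listed in the worked example (the remaining categories are elided in the notes and left out here) and draws a rough text bar chart; the `SCALE` value is an assumption chosen for readability.

```python
# Counts for the seven categories listed in the worked example;
# the remaining categories are elided in the notes and omitted here.
counts = {
    "Assault": 42,
    "Drug Offences": 117,
    "Fraud": 7,
    "Good Order Offences": 168,
    "Handling Stolen Goods": 6,
    "Liquor (excl. Drunkenness)": 4,
    "Miscellaneous Offences": 2,
}

# Re-sort from most to least frequent, as in the improved chart
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

SCALE = 4  # assumed scale: one '#' per 4 incidents, rounded up
for name, count in ordered:
    bar = "#" * -(-count // SCALE)  # ceiling division keeps small counts visible
    print(f"{name:<28}{bar} {count}")
```

Sorting before plotting is the key step; the same `ordered` list could be passed to any charting tool.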

Histograms

• Designed for continuous or large-range discrete variables.
• X-axis divided into contiguous bins; no gaps (data are numeric and continuous along the axis).
• Each rectangle’s area is proportional to frequency; with equal-width bins, height alone can be read as frequency (or density).
• Reveals modality (one peak vs multiple), skew, and outliers.

Example – Respondent Age (World Values Survey 2018)

• $N = 1{,}795$ ages; observed min $17$, max $98$ (up to $82$ distinct integer values).
• Instead of up to $82$ thin bars, group into broader bins (e.g., $[0,20], [21,30], …$).
• Final histogram quickly shows:
– Mode cluster in $31$ – $50$ range.
– Tapering tail in the $71+$ ages.
– Potential data-entry errors, if extreme ages (e.g., $>100$) appear.
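
Grouping ages into bins can be sketched as follows. The ages are hypothetical (not the survey data), and the bin edges beyond the two given in the notes are an assumption: ten-year bins up to 100, plus an overflow bin used to flag suspicious values.

```python
from bisect import bisect_left
from collections import Counter

# Assumed inclusive upper bounds: 0-20, 21-30, ..., 91-100; anything above goes to "101+"
UPPER_BOUNDS = [20, 30, 40, 50, 60, 70, 80, 90, 100]
LABELS = ["0-20"] + [f"{hi - 9}-{hi}" for hi in UPPER_BOUNDS[1:]] + ["101+"]


def bin_label(age):
    """Return the label of the bin containing an integer age."""
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


# Hypothetical ages for illustration only; 117 mimics a data-entry typo
ages = [17, 20, 21, 34, 35, 38, 42, 47, 49, 55, 63, 71, 98, 117]

histogram = Counter(bin_label(a) for a in ages)
for label in LABELS:
    print(f"{label:>7} | {'#' * histogram[label]}")

suspect = [a for a in ages if a > 100]  # candidates for a data-entry check
print("check these values:", suspect)
```

Note that the first bin is wider than the others, so its bar height is not strictly comparable with the rest unless heights are rescaled to densities.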

Research Example – Violent Crime Counts per Neighbourhood

• Histogram across Brisbane suburbs (Figure 4.9).
• Highlights positively skewed distribution: many suburbs with low counts, few with very high counts.
• Assists analysts in targeting outlier suburbs for situational crime prevention.
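
Positive skew also shows up numerically: the long right tail pulls the mean above the median. The sketch below uses hypothetical suburb counts (not the Figure 4.9 data) and, as an assumption not taken from the notes, flags outliers with the common $Q_3 + 1.5 \times IQR$ rule.

```python
from statistics import mean, median, quantiles

# Hypothetical violent-crime counts per suburb (illustrative only)
counts = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 9, 40]

avg = mean(counts)
mid = median(counts)

# In a positively skewed distribution the mean sits above the median
print(f"mean = {avg:.2f}, median = {mid}")

# Assumed rule of thumb for outliers: values above Q3 + 1.5 * IQR
q1, _, q3 = quantiles(counts, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = [c for c in counts if c > upper_fence]
print("outlier suburbs (by count):", outliers)
```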

Line Charts (Time-Series)

• Best for showing change over ordered time periods.
• X-axis = time (years, months, weeks); Y-axis = value of interest.
• Multiple lines may compare groups (male vs female), variables, or geographies.

Example – Queensland Imprisonment Rate $2011$ – $2021$

• Data excerpt (rates per $100{,}000$ population):
– $2011$: Male $302.1$, Female $24.0$.
– $2014$: Male $353.5$, Female $36.2$.
– $2021$: Male $460.7$, Female $45.2$.
• Observations:
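– Overall upward trend for both genders; steeper for males in absolute terms ($+158.6$ vs $+21.2$ per $100{,}000$ from $2011$ to $2021$), although the female rate grew faster proportionally (about $88.3\%$ vs $52.5\%$).
– Temporary dip in $2012$ (male $293.7$) before sustained growth.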
– Public policy relevance: prison crowding, gender-specific interventions.
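
The size of the trend can be checked directly from the excerpt. This sketch computes the absolute and proportional change between $2011$ and $2021$ for each group; only the three years quoted above are included, and the variable names are illustrative.

```python
# Imprisonment rates per 100,000 population, from the excerpt in the notes
rates = {
    "Male": {2011: 302.1, 2014: 353.5, 2021: 460.7},
    "Female": {2011: 24.0, 2014: 36.2, 2021: 45.2},
}

changes = {}
for group, series in rates.items():
    start, end = series[2011], series[2021]
    absolute = end - start          # change in rate per 100,000
    relative = absolute / start     # change as a proportion of the 2011 rate
    changes[group] = (absolute, relative)
    print(f"{group:<7}{absolute:>+8.1f} per 100,000 {relative:>+8.1%}")
```

Comparing both measures matters here: a line that looks steeper on a shared axis reflects absolute change, while proportional change can tell a different story.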

Practical / Ethical / Philosophical Considerations

• Clarity vs deception: mis-scaled axes, or axes truncated so that they do not start at zero, can mislead.
• Accessibility: colour palettes should be colour-blind friendly.
• Privacy: granular maps or line charts can inadvertently re-identify individuals.
• Equity: focusing solely on dominant categories might obscure minority experiences (e.g., low-frequency offences that disproportionately affect vulnerable groups).
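
The effect of a truncated axis can be quantified. The sketch below compares the true ratio of two values with the ratio of their drawn heights when the axis starts above zero. It reuses the male imprisonment rates from the line-chart example; the helper `apparent_ratio` and the axis start of $290$ are assumptions made purely for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of b and a when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


# Male imprisonment rates from the line-chart example (per 100,000)
rate_2011, rate_2021 = 302.1, 460.7

honest = apparent_ratio(rate_2011, rate_2021)            # axis starts at 0
truncated = apparent_ratio(rate_2011, rate_2021, 290.0)  # assumed axis start of 290

print(f"true ratio: {honest:.2f}, drawn ratio with truncated axis: {truncated:.2f}")
```

The underlying data are identical in both cases; only the visual impression changes, which is why the axis origin should be stated or shown clearly.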