Descriptive Statistics – Summarising Qualitative Data
Raw Data & Why We Summarise
Raw data = unprocessed observations, exactly as collected
May appear as questionnaires, lists, tables, spreadsheets, etc.
Example given: table of 20 student responses showing province of origin alongside other variables.
Disadvantages of raw data
Contains “too much information” → cognitively heavy and time-consuming to read.
Lacks visual impact → the underlying “story” is hidden.
Scaling problem: 100 or 1 000 observations exacerbate both issues.
Practical implication: always convert raw qualitative data into concise tables or graphs before analysis or presentation.
Key Definitions (Revision)
Qualitative variable: records non-numeric information (e.g., province, gender, colour).
Descriptive statistics: methods for summarising, organising and presenting data.
Focus of the lecture: descriptive techniques for qualitative variables.
Frequency Table – The First Summarising Step
Structure (minimum two columns)
Categories (e.g., Eastern Cape, Free State, Gauteng, …).
Frequencies ($f$) = counts for each category.
Building procedure
List all observed categories (order can be alphabetical or by size).
Make one tally mark (/) in the matching category's row for each raw observation.
Add tallies → obtain $f$ for every row.
Check: $\sum f = n$ (sample size).
• Example: $n = 20$ students.
Why important?
Gives immediate insight into “how often” each outcome appears.
Underlies every subsequent graph (pie, bar, etc.).
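The building procedure above can be sketched in a few lines of Python. This is only an illustration: the `responses` list is invented for the example (it is not the lecture's actual table of 20 students), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical responses, invented for illustration (n = 20).
responses = (
    ["Gauteng"] * 7 + ["Eastern Cape"] * 5 + ["Free State"] * 4 + ["Limpopo"] * 4
)

def frequency_table(observations):
    """Return {category: f} with categories in alphabetical order."""
    counts = Counter(observations)            # one tally per raw observation
    table = dict(sorted(counts.items()))      # alphabetical category order
    # Check: the frequencies must add up to the sample size n.
    assert sum(table.values()) == len(observations)
    return table

freq = frequency_table(responses)
for category, f in freq.items():
    # Category, tally marks, frequency f
    print(f"{category:<14}{'/' * f:<8}{f:>3}")
print(f"{'Total':<14}{'':<8}{sum(freq.values()):>3}")
```

Sorting by size instead of alphabetically would only change the `sorted(...)` call; the counts and the total check stay the same.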
Extra Informative Columns
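Relative frequency ($rf$)
Formula: $rf = f / n$
• Properties: $0 \le rf \le 1$ and $\sum rf = 1$.
Percentage (%)
Formula: $\% = rf \times 100$
• Properties: $\sum \% = 100$.
Angle size (°) – required for pie charts
Formula: $\text{angle} = rf \times 360^\circ$
• Properties: $\sum \text{angle} = 360^\circ$.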
Flexibility: include only the columns that match the story you wish to convey.
• Minimalists: categories + f.
• Presenters: add % and angle for automatic charting.
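As a rough sketch, the extra columns can be derived from the frequencies alone. The counts below are invented for illustration, and `add_columns` and its dictionary keys (`rf`, `pct`, `angle`) are hypothetical names, not notation from the lecture.

```python
def add_columns(freqs):
    """Extend {category: f} with relative frequency, percentage and angle."""
    n = sum(freqs.values())                   # sample size
    rows = []
    for category, f in freqs.items():
        rf = f / n                            # relative frequency
        rows.append({
            "category": category,
            "f": f,
            "rf": rf,
            "pct": rf * 100,                  # percentage
            "angle": rf * 360,                # pie-chart angle in degrees
        })
    return rows

# Hypothetical frequencies, invented for illustration (n = 20).
freqs = {"Eastern Cape": 5, "Free State": 4, "Gauteng": 7, "Limpopo": 4}
rows = add_columns(freqs)
for r in rows:
    print(f"{r['category']:<14}{r['f']:>3}{r['rf']:>7.2f}{r['pct']:>7.1f}{r['angle']:>8.1f}")
```

A minimalist table would simply print the first two fields and ignore the rest.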
Graphical Summaries for One Qualitative Variable
Pie Chart
Preparation: compute angle sizes as above.
Construction rules
Slice angle $= \frac{f}{n} \times 360^\circ$ (relative frequency times $360^\circ$).
Use colour/legend to identify categories.
Label or annotate slices OR place legend beside chart.
Always give a descriptive title (e.g., “Pie Chart of Province of Origin”).
Interpretation: area of slice visualises proportion (immediate percentage feel).
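When drawing a pie chart by hand or with a protractor, each slice starts where the previous one ended. The sketch below computes those boundaries; the counts are invented for illustration, `slice_boundaries` is a hypothetical helper, and starting at 0° is an assumption (any starting direction works).

```python
from itertools import accumulate

def slice_boundaries(freqs):
    """Return {category: (start_deg, end_deg)} for consecutive pie slices."""
    n = sum(freqs.values())
    angles = [f * 360 / n for f in freqs.values()]   # angle of each slice
    ends = list(accumulate(angles))                  # running total of angles
    starts = [0.0] + ends[:-1]                       # each slice starts at the previous end
    return {c: (s, e) for c, s, e in zip(freqs, starts, ends)}

# Hypothetical frequencies, invented for illustration (n = 20).
freqs = {"Eastern Cape": 5, "Free State": 4, "Gauteng": 7, "Limpopo": 4}
bounds = slice_boundaries(freqs)
for category, (start, end) in bounds.items():
    print(f"{category:<14}{start:>6.1f}° to {end:>6.1f}°")
```

The last slice should close the circle at 360°, which doubles as a check on the angle column.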
Bar Chart (Simple / Unstacked)
Axes
$x$-axis = categories.
$y$-axis = frequency or relative frequency or percentage.
Drawing rules
Bars of equal width, do not touch (space distinguishes categories).
Scale $y$-axis from 0 to at least the highest frequency (e.g., 0 → 7 when the largest category has 7 entries).
Label both axes and supply graph title.
Pros: easy comparison of heights; quick spotting of most/least common categories.
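The drawing rules can be illustrated with a plain-text bar chart. This is a minimal sketch with invented counts; `text_bar_chart` is a hypothetical helper, and the bars are turned sideways (categories down the side, frequency along the bar) purely because that is easier to print.

```python
def text_bar_chart(freqs, title):
    """Return a text bar chart: one '#' per observation, every bar starting at 0."""
    width = max(len(c) for c in freqs)        # align category labels
    lines = [title]
    for category, f in freqs.items():
        # Each bar sits on its own line, so bars never touch.
        lines.append(f"{category:<{width}} | {'#' * f} {f}")
    return "\n".join(lines)

# Hypothetical frequencies, invented for illustration (n = 20).
freqs = {"Eastern Cape": 5, "Free State": 4, "Gauteng": 7, "Limpopo": 4}
chart = text_bar_chart(freqs, "Bar Chart of Province of Origin (n = 20)")
print(chart)
```

Because every bar uses the same unit (one `#` per observation) and starts at zero, bar lengths can be compared directly, which is the point of the equal-width and zero-baseline rules.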
Two Qualitative Variables – Contingency Table
Definition: frequency table that records joint occurrences of two (categorical) variables.
Example: Province × Gender (Male/Female).
Layout
Rows = categories of variable 1, columns = categories of variable 2 (or vice-versa).
Each cell gives the count $f_{ij}$ of observations that fall in row category $i$ and column category $j$.
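A contingency table is a joint tally over pairs of categories. The sketch below assumes the raw data arrive as (province, gender) pairs; the pairs are invented for illustration, and `contingency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (province, gender) pairs, invented for illustration (n = 20).
pairs = (
    [("Gauteng", "Male")] * 3 + [("Gauteng", "Female")] * 4
    + [("Eastern Cape", "Male")] * 2 + [("Eastern Cape", "Female")] * 3
    + [("Free State", "Male")] * 3 + [("Free State", "Female")] * 1
    + [("Limpopo", "Male")] * 2 + [("Limpopo", "Female")] * 2
)

def contingency_table(observations):
    """Return (table, rows, cols) where table[row][col] is the joint count."""
    counts = Counter(observations)                    # tally each (row, col) pair
    rows = sorted({r for r, _ in observations})       # categories of variable 1
    cols = sorted({c for _, c in observations})       # categories of variable 2
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols

ctable, rows, cols = contingency_table(pairs)
row_totals = {r: sum(ctable[r].values()) for r in rows}
col_totals = {c: sum(ctable[r][c] for r in rows) for c in cols}

print(f"{'':<14}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    print(f"{r:<14}" + "".join(f"{ctable[r][c]:>8}" for c in cols) + f"{row_totals[r]:>8}")
print(f"{'Total':<14}" + "".join(f"{col_totals[c]:>8}" for c in cols) + f"{len(pairs):>8}")
```

The row totals reproduce the one-variable frequency table for province, and the column totals do the same for gender.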
Graphical Options for Two-Way Tables
Stacked Bar Chart
$x$-axis = primary categories (e.g., province).
Each bar’s height = overall frequency for that province.
Bar is subdivided (stacked) by second variable (gender) using different colours.
Legend identifies segments (e.g., red = Male, blue = Female).
Alternative orientation: swap axes (make gender primary, stack provinces).
Multiple (Clustered) Bar Chart
Bars for each sub-category drawn side-by-side within the same primary category.
Facilitates direct visual comparison between sub-categories at each category level.
Choice criteria
Stacked: highlights composition of totals.
Clustered: highlights direct comparison of sub-groups.
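The difference between the two layouts comes down to where each bar segment is placed. The following sketch computes only that geometry, without any plotting library; the counts are invented, and `stacked_segments`, `clustered_bars` and the `bar_width` parameter are hypothetical names chosen for illustration.

```python
def stacked_segments(table):
    """For each primary category, list (sub-category, bottom, top) segments."""
    layout = {}
    for primary, subs in table.items():
        bottom, segments = 0, []
        for sub, f in subs.items():
            segments.append((sub, bottom, bottom + f))   # segment sits on the previous one
            bottom += f
        layout[primary] = segments
    return layout

def clustered_bars(table, bar_width=0.4):
    """For each primary category, list (sub-category, x_centre, height) bars."""
    layout = {}
    for i, (primary, subs) in enumerate(table.items()):
        k = len(subs)
        layout[primary] = [
            # Bars are centred around x = i and all start at height 0.
            (sub, i + (j - (k - 1) / 2) * bar_width, f)
            for j, (sub, f) in enumerate(subs.items())
        ]
    return layout

# Hypothetical two-way counts, invented for illustration.
counts = {
    "Gauteng": {"Female": 4, "Male": 3},
    "Eastern Cape": {"Female": 3, "Male": 2},
}
stacked = stacked_segments(counts)
clustered = clustered_bars(counts)
print(stacked["Gauteng"])
print(clustered["Gauteng"])
```

In the stacked layout the top of the last segment equals the province total, so totals are easy to read. In the clustered layout every bar starts at zero, so the sub-groups are easy to compare.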
Best-Practice Checklist for Categorical Graphs
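Verify $\sum f = n$, $\sum rf = 1$, $\sum \% = 100$ and $\sum \text{angle} = 360^\circ$ (frequencies, relative frequencies, percentages and pie-chart angles).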
Use meaningful, non-abbreviated category labels or provide a clear legend.
Keep colours distinct & colour-blind-friendly; avoid misleading 3-D effects.
Scales must start at 0 for bar charts (avoids exaggeration of differences).
Titles should state what and for whom/when (e.g., “Bar Chart of Gender Distribution among First-Year Students, 2023”).
Mention sample size either in caption or footnote.
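The totals check in the first checklist item is easy to automate. In this sketch the function name `check_totals`, the row keys and the `rel_tol` parameter are all assumptions made for illustration, and the example rows are invented; the tolerance exists because rounded columns rarely add up exactly.

```python
def check_totals(rows, n, rel_tol=1e-9):
    """Return a list of failed total checks (an empty list means all passed)."""
    problems = []
    if sum(r["f"] for r in rows) != n:               # frequencies are whole counts
        problems.append("sum f != n")
    for key, target in (("rf", 1), ("pct", 100), ("angle", 360)):
        total = sum(r[key] for r in rows)
        if abs(total - target) > rel_tol * target:   # allow a small rounding error
            problems.append(f"sum {key} != {target}")
    return problems

# Hypothetical table with three equal categories, values rounded for display.
rounded_rows = [{"f": 1, "rf": 0.333, "pct": 33.3, "angle": 120.0}] * 3

print(check_totals(rounded_rows, n=3))                 # strict: rounding is flagged
print(check_totals(rounded_rows, n=3, rel_tol=0.002))  # lenient: rounding is accepted
```

A strict tolerance suits unrounded working values, while a looser one suits a table rounded for presentation.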
Real-World Relevance & Ethical Notes
Appropriate summaries prevent information overload for audiences (managers, policy makers, the public).
Proper labelling avoids misinterpretation; mis-labelled axes or missing totals can lead to unethical miscommunication of results.
In surveys with sensitive categories (e.g., gender identity, ethnicity), anonymised frequency tables protect respondent privacy.
Connections to Previous & Upcoming Lectures
Builds directly on earlier definitions of qualitative variable and descriptive statistics (Revision ✓).
Sets the foundation for upcoming lecture: Summarising quantitative data (histograms, stem-and-leaf, etc.).
Practice Assignment 1 should now be complete; expect Assignment 2 after the next session → keep pace to avoid falling behind.
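Quick Formula Recap (All in One Place)
Sample size check: $\sum f = n$
Relative frequency: $rf = f / n$, with $0 \le rf \le 1$ and $\sum rf = 1$
Percentage: $\% = rf \times 100$, with $\sum \% = 100$
Pie-chart angle: $\text{angle} = rf \times 360^\circ$, with $\sum \text{angle} = 360^\circ$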
End of qualitative-data summarising techniques — be ready to apply these to homework and projects.