Descriptive Statistics – Summarising Qualitative Data

Raw Data & Why We Summarise

  • Raw data = unprocessed observations, exactly as collected

    • May appear as questionnaires, lists, tables, spreadsheets, etc.

    • Example given: table of 20 student responses showing province of origin alongside other variables.

  • Disadvantages of raw data

    • Contains “too much information” → cognitively heavy and time-consuming to read.

    • Lacks visual impact → the underlying “story” is hidden.

    • Scaling problem: both issues get worse as the number of observations grows (e.g., 100 or 1 000).

  • Practical implication: always convert raw qualitative data into concise tables or graphs before analysis or presentation.

Key Definitions (Revision)

  • Qualitative variable: records non-numeric information (e.g., province, gender, colour).

  • Descriptive statistics: methods for summarising, organising and presenting data.
    Focus of the lecture: descriptive techniques for qualitative variables.

Frequency Table – The First Summarising Step

  • Structure (minimum two columns)

    1. Categories (e.g., Eastern Cape, Free State, Gauteng, …).

    2. Frequencies (f) = counts for each category.

  • Building procedure

    1. List all observed categories (order can be alphabetical or by size).

    2. Make one tally mark (/) per raw observation in the row of the matching category.

    3. Add tallies → obtain f for every row.

    4. Check: \sum f = n (sample size).
      • Example: 7+4+\dots = 20 students.

  • Why important?

    • Gives immediate insight into “how often” each outcome appears.

    • Underlies every subsequent graph (pie, bar, etc.).
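
The building procedure above can be sketched in a few lines of Python. This is only an illustration: the `responses` list is hypothetical (it is not the lecture's 20-student data set), and `frequency_table` is a name chosen here for convenience.

```python
from collections import Counter

# Hypothetical raw responses (NOT the lecture's data set).
responses = [
    "Gauteng", "Free State", "Gauteng", "Eastern Cape", "Gauteng",
    "Eastern Cape", "Free State", "Gauteng", "Eastern Cape", "Gauteng",
]


def frequency_table(observations):
    """Return {category: f} with categories listed alphabetically."""
    counts = Counter(observations)  # the tallying step
    return {category: counts[category] for category in sorted(counts)}


table = frequency_table(responses)
n = len(responses)

# Step 4 of the procedure: the frequencies must add up to the sample size.
assert sum(table.values()) == n

for category, f in table.items():
    print(f"{category:<15}{f:>3}")
print(f"{'Total':<15}{n:>3}")
```

Sorting by size instead of alphabetically would only change the order of the rows, not the counts.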

Extra Informative Columns
  1. Relative frequency (rf)
    Formula: rf=\frac{f}{n}
    • Properties: 0\le rf\le1 and \sum rf =1.

  2. Percentage (%)
    \text{Percentage}=rf\times100
    • Properties: \sum\text{Percentages}=100.

  3. Angle size (°) – required for pie charts
    \text{Angle}=rf\times360
    • Properties: \sum\text{Angles}=360.

  • Flexibility: include only the columns that match the story you wish to convey.
    • Minimalists: categories + f.
    • Presenters: add % and angle so the table can feed directly into a bar or pie chart.
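
A minimal sketch of how the three extra columns follow from f and n. The frequencies are hypothetical and `extend_table` is an illustrative helper name, not something defined in the lecture.

```python
def extend_table(freqs):
    """Add rf, percentage and pie-chart angle to a {category: f} table."""
    n = sum(freqs.values())
    rows = []
    for category, f in freqs.items():
        rf = f / n  # relative frequency
        rows.append({
            "category": category,
            "f": f,
            "rf": rf,
            "percent": rf * 100,
            "angle": rf * 360,
        })
    return rows


# Hypothetical frequencies (NOT the lecture's data set).
freqs = {"Eastern Cape": 3, "Free State": 2, "Gauteng": 5}
rows = extend_table(freqs)

print(f"{'Category':<15}{'f':>3}{'rf':>7}{'%':>8}{'Angle':>8}")
for row in rows:
    print(f"{row['category']:<15}{row['f']:>3}{row['rf']:>7.2f}"
          f"{row['percent']:>8.1f}{row['angle']:>8.1f}")
```

Because the columns are stored as floating-point numbers, their totals are best compared with 1, 100 and 360 using a small tolerance rather than exact equality.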

Graphical Summaries for One Qualitative Variable

Pie Chart
  • Preparation: compute angle sizes as above.

  • Construction rules

    • Slice angle \propto rf.

    • Use colour/legend to identify categories.

    • Label or annotate slices OR place legend beside chart.

    • Always give a descriptive title (e.g., “Pie Chart of Province of Origin”).

  • Interpretation: each slice's angle (and therefore its area) is proportional to the category's share, so percentages can be read at a glance.
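
To draw the slices by hand or with a plotting tool, each slice needs a start and an end angle. The sketch below accumulates the angle sizes around the circle; the data and the helper name `pie_slices` are assumptions made for illustration only.

```python
def pie_slices(freqs):
    """Return [(category, start_angle, end_angle), ...] in degrees."""
    n = sum(freqs.values())
    slices = []
    start = 0.0
    for category, f in freqs.items():
        angle = f / n * 360  # angle size = rf x 360
        slices.append((category, start, start + angle))
        start += angle  # the next slice begins where this one ends
    return slices


# Hypothetical frequencies (NOT the lecture's data set).
freqs = {"Eastern Cape": 3, "Free State": 2, "Gauteng": 5}
slices = pie_slices(freqs)

for category, start, end in slices:
    print(f"{category:<15}{start:>7.1f} to {end:>6.1f} degrees")
```

The last slice should end at 360 degrees (up to floating-point rounding), which is a quick check that no category was left out.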

Bar Chart (Simple / Unstacked)
  • Axes

    • $x$-axis = categories.

    • $y$-axis = frequency or relative frequency or percentage.

  • Drawing rules

    • Bars have equal width and do not touch; the gaps signal that the categories are distinct.

    • Scale the $y$-axis from 0 to at least the highest frequency (e.g., 0 → 7 when the largest frequency is 7).

    • Label both axes and supply graph title.

  • Pros: easy comparison of heights; quick spotting of most/least common categories.
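
A text-only sketch of a simple bar chart, using one `#` per observation. It is drawn sideways (categories run down the page rather than along the $x$-axis) purely so that it needs no plotting library; the data and the function name `text_bar_chart` are hypothetical.

```python
def text_bar_chart(freqs, title):
    """Return a sideways text bar chart with one '#' per observation."""
    label_width = max(len(category) for category in freqs)
    lines = [title]
    for category, f in freqs.items():
        # Every bar starts at 0, so its length is directly proportional to f.
        lines.append(f"{category:<{label_width}} | {'#' * f} ({f})")
    return "\n".join(lines)


# Hypothetical frequencies (NOT the lecture's data set).
freqs = {"Eastern Cape": 3, "Free State": 2, "Gauteng": 5}
chart = text_bar_chart(
    freqs, "Bar Chart of Province of Origin (hypothetical data, n = 10)"
)
print(chart)
```

The most and least common categories are visible immediately from the bar lengths, which is the main advantage listed above.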

Two Qualitative Variables – Contingency Table

  • Definition: frequency table that records joint occurrences of two qualitative (categorical) variables.

    • Example: Province × Gender (Male/Female).

  • Layout

    • Rows = categories of variable 1, columns = categories of variable 2 (or vice-versa).

    • Cell (i, j) gives the count f_{ij} of observations that fall in row category i and column category j.
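
A contingency table can be built by counting (row category, column category) pairs. The sketch below assumes hypothetical records and an illustrative helper name, `contingency_table`; combinations that never occur get a count of 0.

```python
from collections import Counter

# Hypothetical (province, gender) pairs (NOT the lecture's data set).
records = [
    ("Gauteng", "Female"), ("Gauteng", "Male"), ("Free State", "Female"),
    ("Eastern Cape", "Male"), ("Gauteng", "Female"), ("Eastern Cape", "Female"),
]


def contingency_table(pairs):
    """Return {row_category: {column_category: f_ij}}."""
    counts = Counter(pairs)  # joint frequencies
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Counter returns 0 for combinations that were never observed.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


table = contingency_table(records)
columns = list(next(iter(table.values())))

print(f"{'':<15}" + "".join(f"{col:>8}" for col in columns) + f"{'Total':>8}")
for row, cells in table.items():
    row_total = sum(cells.values())
    print(f"{row:<15}" + "".join(f"{f:>8}" for f in cells.values())
          + f"{row_total:>8}")

# All cell counts together must equal the sample size n.
grand_total = sum(sum(cells.values()) for cells in table.values())
assert grand_total == len(records)
```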

Graphical Options for Two-Way Tables
  1. Stacked Bar Chart

    • $x$-axis = primary categories (e.g., province).

    • Each bar’s height = overall frequency for that province.

    • Bar is subdivided (stacked) by second variable (gender) using different colours.

    • Legend identifies segments (e.g., red = Male, blue = Female).

    • Alternative arrangement: swap the roles of the two variables (make gender the primary category and stack provinces within each bar).

  2. Multiple (Clustered) Bar Chart

    • Bars for each sub-category drawn side-by-side within the same primary category.

    • Facilitates direct visual comparison between sub-categories at each category level.

  • Choice criteria

    • Stacked: highlights composition of totals.

    • Clustered: highlights direct comparison of sub-groups.
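
The difference between the two charts comes down to where each segment is placed. In the sketch below (hypothetical counts, illustrative function names), a stacked bar gives each segment a bottom and a top, while a clustered chart keeps every sub-category as its own bar starting at 0.

```python
# Hypothetical two-way table: {province: {gender: f}} (NOT the lecture's data).
table = {
    "Eastern Cape": {"Female": 1, "Male": 1},
    "Free State": {"Female": 1, "Male": 0},
    "Gauteng": {"Female": 2, "Male": 1},
}


def stacked_segments(two_way):
    """For each primary category, return [(sub, bottom, top), ...]."""
    layout = {}
    for primary, subs in two_way.items():
        bottom = 0
        segments = []
        for sub, f in subs.items():
            segments.append((sub, bottom, bottom + f))
            bottom += f  # the next segment sits on top of this one
        layout[primary] = segments
    return layout


def clustered_bars(two_way):
    """For each primary category, return side-by-side (sub, height) bars."""
    return {primary: list(subs.items()) for primary, subs in two_way.items()}


stacked = stacked_segments(table)
clustered = clustered_bars(table)

print(stacked["Gauteng"])    # segments stacked within one bar
print(clustered["Gauteng"])  # separate bars, each starting at 0
```

In the stacked layout the top of the last segment equals the province total, which is why that chart shows the composition of totals; in the clustered layout the heights can be compared directly.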

Best-Practice Checklist for Categorical Graphs

  • Verify \sum f = n, \sum rf =1, \sum\text{\%}=100, \sum\text{Angle}=360 (allow for small differences when the displayed values have been rounded).

  • Use meaningful, non-abbreviated category labels or provide a clear legend.

  • Keep colours distinct & colour-blind-friendly; avoid misleading 3-D effects.

  • Scales must start at 0 for bar charts (avoids exaggeration of differences).

  • Titles should state what and for whom/when (e.g., “Bar Chart of Gender Distribution among First-Year Students, 2023”).

  • Mention sample size n either in caption or footnote.

Real-World Relevance & Ethical Notes

  • Appropriate summaries prevent information overload for audiences (managers, policy makers, the public).

  • Proper labelling avoids misinterpretation; mis-labelled axes or missing totals can lead to unethical miscommunication of results.

  • In surveys with sensitive categories (e.g., gender identity, ethnicity), anonymised frequency tables protect respondent privacy.

Connections to Previous & Upcoming Lectures

  • Builds directly on earlier definitions of qualitative variable and descriptive statistics (Revision ✓).

  • Sets the foundation for upcoming lecture: Summarising quantitative data (histograms, stem-and-leaf, etc.).

  • Practice Assignment 1 should now be complete; expect Assignment 2 after the next session → keep pace to avoid falling behind.

Quick Formula Recap (All in One Place)

  • \text{Relative Frequency}=\frac{f}{n}

  • \sum f = n

  • \sum \text{Relative Frequencies}=1

  • \text{Percentage}=\text{Relative Frequency}\times100

  • \sum \text{Percentages}=100

  • \text{Angle Size}=\text{Relative Frequency}\times360

  • \sum \text{Angle Sizes}=360
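
As a worked instance of the formulas above, take a category with f = 7 in a sample of n = 20. Both numbers appear in the lecture's check example; which category the 7 belongs to is not assumed here.

```latex
\begin{align*}
\text{Relative Frequency} &= \frac{f}{n} = \frac{7}{20} = 0.35 \\
\text{Percentage} &= 0.35 \times 100 = 35 \\
\text{Angle Size} &= 0.35 \times 360 = 126
\end{align*}
```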


End of qualitative-data summarising techniques — be ready to apply these to homework and projects.