Descriptive Statistics – Summarising Qualitative Data

Raw data = unprocessed observations, exactly as collected
- May appear as questionnaires, lists, tables, spreadsheets, etc.
- Example given: table of 20 student responses showing province of origin alongside other variables.
Disadvantages of raw data
- Contains “too much information” → cognitively heavy and time-consuming to read.
- Lacks visual impact → the underlying “story” is hidden.
- Scaling problem: both issues get worse as the sample grows to 100 or 1 000 observations.
Practical implication: always convert raw qualitative data into concise tables or graphs before analysis or presentation.

Qualitative variable: records non-numeric information (e.g., province, gender, colour).
Descriptive statistics: methods for summarising, organising, presenting data.
Focus of the lecture: descriptive techniques for qualitative variables.

Frequency table: structure (minimum two columns)
1. Categories (e.g., Eastern Cape, Free State, Gauteng, …).
2. Frequencies (f) = counts for each category.
Building procedure
1. List all observed categories (order can be alphabetical or by size).
2. Work through the raw observations one at a time, adding one tally mark (/) to the row of the matching category.
3. Add tallies → obtain f for every row.
4. Check: \sum f = n (sample size).
  • Example: 7+4+\dots = 20 students.
Why important?
- Gives immediate insight into “how often” each outcome appears.
- Underlies every subsequent graph (pie, bar, etc.).
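
The building procedure above can be sketched in a few lines of Python. This is only an illustration: the individual observations are invented (only the sample size n = 20 and the province names Eastern Cape, Free State and Gauteng come from the notes), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw observations: province of origin for 20 students.
# The values are invented for illustration; only n = 20 comes from the notes.
raw = (
    ["Gauteng"] * 7
    + ["Eastern Cape"] * 4
    + ["Free State"] * 3
    + ["Limpopo"] * 6
)

def frequency_table(observations):
    """Return (category, f) rows, sorted alphabetically by category."""
    counts = Counter(observations)  # steps 1-3: list categories, tally, add up
    return sorted(counts.items())

table = frequency_table(raw)
n = len(raw)

# Step 4: the frequencies must add up to the sample size.
assert sum(f for _, f in table) == n

for category, f in table:
    print(f"{category:<15}{f:>3}")
print(f"{'Total':<15}{n:>3}")
```

Sorting by size instead of alphabetically would only change the `sorted` call, for example `sorted(counts.items(), key=lambda row: -row[1])`.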

Relative frequency (rf)
Formula: rf=\frac{f}{n}
• Properties: 0\le rf\le1 and \sum rf =1.
Percentage (%)
\text{Percentage}=rf\times100
• Properties: \sum\text{Percentages}=100.
Angle size (°) – required for pie charts
\text{Angle}=rf\times360
• Properties: \sum\text{Angles}=360.
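
As a worked illustration of the three formulas, the sketch below extends a frequency table with rf, percentage and angle columns. The counts are invented for illustration (n = 20) and `derived_columns` is a hypothetical helper name.

```python
# Hypothetical frequency table (counts invented for illustration; n = 20).
freq = {"Eastern Cape": 4, "Free State": 3, "Gauteng": 7, "Limpopo": 6}

def derived_columns(freq):
    """Map each category to (f, rf, percentage, angle in degrees)."""
    n = sum(freq.values())
    rows = {}
    for category, f in freq.items():
        rf = f / n                      # rf = f / n
        rows[category] = (f, rf, rf * 100, rf * 360)
    return rows

rows = derived_columns(freq)
for category, (f, rf, pct, angle) in rows.items():
    print(f"{category:<15}{f:>3}{rf:>7.2f}{pct:>7.1f}%{angle:>7.1f} deg")
```

For example, a category with f = 7 out of n = 20 has rf = 0.35, which corresponds to 35% and a slice of 126°.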

Flexibility: include only the columns that match the story you wish to convey.
• Minimalists: categories + f.
• Presenters: add % and angle for automatic charting.

Pie chart: preparation. Compute the angle size of each category as above.
Construction rules
- Slice angle \propto rf.
- Use colour/legend to identify categories.
- Label or annotate slices OR place legend beside chart.
- Always give a descriptive title (e.g., “Pie Chart of Province of Origin”).
Interpretation: the area of each slice is proportional to its rf, so the reader sees the percentage split at a glance.
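
When drawing a pie chart by hand, each slice starts where the previous one ends. The sketch below computes those start and end angles; it assumes slices are laid out in table order starting from 0°, which is a drawing convention chosen here, not something the notes prescribe. The counts are invented and `slice_boundaries` is a hypothetical helper name.

```python
from itertools import accumulate

# Hypothetical frequency table (counts invented for illustration; n = 20).
freq = {"Eastern Cape": 4, "Free State": 3, "Gauteng": 7, "Limpopo": 6}

def slice_boundaries(freq):
    """Map each category to the (start, end) angle of its slice, in degrees."""
    n = sum(freq.values())
    angles = [f / n * 360 for f in freq.values()]  # Angle = rf * 360
    ends = list(accumulate(angles))                # running total of angles
    starts = [0.0] + ends[:-1]                     # each slice starts at the previous end
    return {cat: (s, e) for cat, s, e in zip(freq, starts, ends)}

boundaries = slice_boundaries(freq)
for category, (start, end) in boundaries.items():
    print(f"{category:<15}{start:>7.1f} to {end:>6.1f} deg")
```

The last slice should end at 360° (up to rounding), which doubles as a check on the angle column.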

Bar chart: axes
- $x$-axis = categories.
- $y$-axis = frequency or relative frequency or percentage.
Drawing rules
- Bars have equal width and do not touch (the gaps distinguish the categories).
- Scale the $y$-axis from 0 up to the highest frequency (e.g., 0 → 7 when the largest f is 7).
- Label both axes and supply graph title.
Pros: easy comparison of heights; quick spotting of most/least common categories.
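
A rough text-only sketch of a bar chart is shown below. Note the assumptions: the bars run horizontally (categories down the side rather than along the $x$-axis) purely because that is easier to print as text, the counts are invented, and `text_bar_chart` is a hypothetical helper name. Every bar starts at 0 and uses the same scale of one `#` per observation.

```python
# Hypothetical frequency table (counts invented for illustration; n = 20).
freq = {"Eastern Cape": 4, "Free State": 3, "Gauteng": 7, "Limpopo": 6}

def text_bar_chart(freq, title):
    """Return a titled, horizontal text bar chart: one '#' per observation."""
    width = max(len(category) for category in freq)  # align the category labels
    lines = [title]
    for category, f in freq.items():
        lines.append(f"{category:<{width}} | {'#' * f} {f}")
    return "\n".join(lines)

chart = text_bar_chart(freq, "Bar Chart of Province of Origin (n = 20)")
print(chart)
```

Even in this crude form the most and least common categories stand out from the bar lengths alone.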

Two-way frequency table (cross-tabulation): a frequency table that records the joint occurrences of two categorical variables.
- Example: Province × Gender (Male/Female).
Layout
- Rows = categories of variable 1, columns = categories of variable 2 (or vice-versa).
- Cell (i, j) gives the count f_{ij} of observations that fall in row category i and column category j.
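
A minimal sketch of building such a table from paired observations follows. The (province, gender) pairs are invented for illustration and `cross_tabulate` is a hypothetical helper name; rows are provinces and columns are genders, though the roles could be swapped.

```python
from collections import Counter

# Hypothetical (province, gender) pairs, invented for illustration (n = 20).
pairs = (
    [("Eastern Cape", "Female")] * 1 + [("Eastern Cape", "Male")] * 3
    + [("Free State", "Female")] * 2 + [("Free State", "Male")] * 1
    + [("Gauteng", "Female")] * 4 + [("Gauteng", "Male")] * 3
    + [("Limpopo", "Female")] * 3 + [("Limpopo", "Male")] * 3
)

def cross_tabulate(pairs):
    """Return (table, row_totals, col_totals) for (row, column) observations."""
    cells = Counter(pairs)                    # f_ij for every observed pair
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    # A Counter returns 0 for pairs that never occurred.
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals

table, row_totals, col_totals = cross_tabulate(pairs)
for province, cells in table.items():
    print(f"{province:<15}{cells['Female']:>3}{cells['Male']:>3}{row_totals[province]:>4}")
print(f"{'Total':<15}{col_totals['Female']:>3}{col_totals['Male']:>3}{len(pairs):>4}")
```

The row totals reproduce the one-variable frequency table for province, and all cells together still sum to n.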

Stacked Bar Chart
- $x$-axis = primary categories (e.g., province).
- Each bar’s height = overall frequency for that province.
- Bar is subdivided (stacked) by second variable (gender) using different colours.
- Legend identifies segments (e.g., red = Male, blue = Female).
- Alternative orientation: swap axes (make gender primary, stack provinces).
Multiple (Clustered) Bar Chart
- Bars for each sub-category drawn side-by-side within the same primary category.
- Facilitates direct visual comparison between sub-categories at each category level.

Choice criteria
- Stacked: highlights composition of totals.
- Clustered: highlights direct comparison of sub-groups.
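
To make the difference concrete, the sketch below computes what each chart type would draw from the same two-way table. The counts are invented for illustration, and `stacked_segments` and `clustered_bars` are hypothetical helper names. In the stacked layout each segment sits on top of the previous one; in the clustered layout every bar starts at 0.

```python
# Hypothetical Province x Gender counts, invented for illustration (n = 20).
table = {
    "Eastern Cape": {"Female": 1, "Male": 3},
    "Free State": {"Female": 2, "Male": 1},
    "Gauteng": {"Female": 4, "Male": 3},
    "Limpopo": {"Female": 3, "Male": 3},
}

def stacked_segments(table):
    """Per primary category: (sub-category, bottom, height) for each segment."""
    layout = {}
    for category, cells in table.items():
        bottom = 0
        segments = []
        for sub, f in cells.items():
            segments.append((sub, bottom, f))
            bottom += f  # the next segment starts where this one ends
        layout[category] = segments
    return layout

def clustered_bars(table):
    """Per primary category: (sub-category, height) bars, all starting at 0."""
    return {category: list(cells.items()) for category, cells in table.items()}

stacked = stacked_segments(table)
clustered = clustered_bars(table)
print(stacked["Gauteng"])
print(clustered["Gauteng"])
```

In the stacked layout the top of the last segment equals the province total, which is why that chart shows composition of totals well; the clustered layout keeps a common baseline, which makes sub-group heights directly comparable.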

Verify \sum f = n, \sum rf = 1, \sum\% = 100, \sum\text{Angle} = 360.
Use meaningful, non-abbreviated category labels or provide a clear legend.
Keep colours distinct & colour-blind-friendly; avoid misleading 3-D effects.
Scales must start at 0 for bar charts (avoids exaggeration of differences).
Titles should state what and for whom/when (e.g., “Bar Chart of Gender Distribution among First-Year Students, 2023”).
Mention sample size n either in caption or footnote.
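
The sum checks in the first item of this checklist are easy to automate. The sketch below is one way to do it, with invented counts and a hypothetical helper name `check_sums`; it compares floating-point sums with a tolerance via `math.isclose` because rf values such as 3/20 are not stored exactly.

```python
import math

# Hypothetical frequency table (counts invented for illustration; n = 20).
freq = {"Eastern Cape": 4, "Free State": 3, "Gauteng": 7, "Limpopo": 6}
n = 20

def check_sums(freq, n):
    """Return the four sum checks as a dict of name -> True/False."""
    rf = [f / n for f in freq.values()]
    return {
        "sum f = n": sum(freq.values()) == n,
        "sum rf = 1": math.isclose(sum(rf), 1.0),
        "sum % = 100": math.isclose(sum(r * 100 for r in rf), 100.0),
        "sum angle = 360": math.isclose(sum(r * 360 for r in rf), 360.0),
    }

checks = check_sums(freq, n)
for name, ok in checks.items():
    print(f"{name:<16}{'ok' if ok else 'FAILED'}")
```

If the table has been rounded for presentation, the percentage and angle columns may sum to slightly more or less than 100 and 360; the check should be run on the unrounded values.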

Appropriate summaries prevent information overload for audiences (managers, policy makers, the public).
Proper labelling avoids misinterpretation; mis-labelled axes or missing totals can lead to unethical miscommunication of results.
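In surveys with sensitive categories (e.g., gender identity, ethnicity), reporting aggregated frequency tables instead of individual responses helps protect respondent privacy; note that categories with very small counts can still make individuals identifiable.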

Builds directly on earlier definitions of qualitative variable and descriptive statistics (Revision ✓).
Sets the foundation for upcoming lecture: Summarising quantitative data (histograms, stem-and-leaf, etc.).
Practice Assignment 1 should now be complete; expect Assignment 2 after the next session → keep pace to avoid falling behind.

End of qualitative-data summarising techniques — be ready to apply these to homework and projects.