Descriptive Statistics – Summarising Quantitative Data

Context & Purpose

  • Lecture continues the Descriptive Statistics block, focusing on summarising quantitative data.
  • Goal: transform raw numerical data into tables and graphs that reveal structure and patterns.

Quantitative Variables Refresher

  • Defined as variables that assume numerical values (e.g. audit time in days).

Frequency-Distribution Table: Step-by-Step Construction

1. Decide on Number of Classes (Categories)

  • Use Sturges’ Rule: k = 1 + 3.3 \log n
    • n = sample size; \log is the base-10 logarithm.
    • Always round up the result.
  • Example data set: 30 observations.
    • k = 1 + 3.3 \log 30 \approx 5.875 \rightarrow 6\;\text{classes}
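Step 1 can be checked with a few lines of Python. This is a minimal sketch: the function name `sturges_classes` is hypothetical, and `math.log10` is used on the assumption that the 3.3 constant in Sturges' Rule goes with a base-10 logarithm.

```python
import math

def sturges_classes(n: int) -> int:
    """Hypothetical helper: k = 1 + 3.3 log10(n), always rounded up."""
    raw = 1 + 3.3 * math.log10(n)
    return math.ceil(raw)  # round up, never down

# Lecture example: n = 30 observations
print(round(1 + 3.3 * math.log10(30), 3))  # 5.875
print(sturges_classes(30))                 # 6
```

Rounding up with `math.ceil` (rather than `round`) follows the "always round up" rule, so a raw value such as 5.875 becomes 6 classes, not 5.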

2. Compute Class Width c

  • Formula: c = \frac{\text{max} - \text{min}}{k}
  • Example: max = 32, min = 11.5
    • c = \frac{32-11.5}{6} \approx 3.417 \rightarrow 4 (always round up to the next integer, even if the result exceeds an integer by only 0.0001).
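The class-width rule can be sketched the same way (the function name `class_width` is hypothetical; the inputs are the example's max, min and k):

```python
import math

def class_width(max_value: float, min_value: float, k: int) -> int:
    """Hypothetical helper: c = (max - min) / k, rounded up to the next integer."""
    return math.ceil((max_value - min_value) / k)

raw = (32 - 11.5) / 6
print(round(raw, 3))             # 3.417
print(class_width(32, 11.5, 6))  # 4
```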

3. Establish Class Boundaries

  • If min value is an integer → start exactly at min.
  • If min value is not an integer → start at next lower integer.
  • Example:
    • Min = 11.5 → first lower boundary = 11.
    • Upper boundary of each class = lower boundary + c.
    • Six intervals produced (square bracket = inclusive, round bracket = exclusive):
    1. [11,15)
    2. [15,19)
    3. [19,23)
    4. [23,27)
    5. [27,31)
    6. [31,35)
    • Boundary logic: value 15 belongs to 2nd class, 19 to 3rd, etc.
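The boundary rules and the half-open bracket convention can be expressed as a short sketch. The helper names `class_boundaries` and `class_number` are hypothetical; `math.floor` implements "start at min if it is an integer, otherwise at the next lower integer".

```python
import math

def class_boundaries(min_value: float, c: int, k: int) -> list[tuple[int, int]]:
    """Hypothetical helper: k half-open intervals [lower, upper) of width c."""
    start = math.floor(min_value)  # 11.5 -> 11; an integer min such as 11.0 stays 11
    return [(start + i * c, start + (i + 1) * c) for i in range(k)]

def class_number(value: float, bounds: list[tuple[int, int]]) -> int:
    """Hypothetical helper: 1-based class whose [lower, upper) contains value."""
    for i, (lower, upper) in enumerate(bounds, start=1):
        if lower <= value < upper:  # lower inclusive, upper exclusive
            return i
    raise ValueError(f"{value} lies outside all classes")

bounds = class_boundaries(11.5, 4, 6)
print(bounds)  # [(11, 15), (15, 19), (19, 23), (23, 27), (27, 31), (31, 35)]
print(class_number(15, bounds), class_number(19, bounds))  # 2 3
```

The `lower <= value < upper` test is exactly the [ ) notation: a value equal to an upper boundary moves into the next class.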

4. Tally Observations

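  • Scan the raw list once, marking one stroke per observation in the correct class (strokes are grouped in fives: four vertical strokes |||| and then a fifth diagonal stroke across them).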
  • Example tallies lead to frequencies:
    • f = [8,7,3,8,3,1] (sum = 30, matches n).
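The single-pass tally can be sketched as below. The lecture's 30 raw audit times are not listed in these notes, so the `sample` list is purely hypothetical and only illustrates the mechanics; the function name `tally` is also hypothetical.

```python
def tally(data: list[float], start: float, c: float, k: int) -> list[int]:
    """Hypothetical helper: one pass over the data, one 'stroke' per observation."""
    freqs = [0] * k
    for x in data:
        idx = int((x - start) // c)  # x lies in [start + idx*c, start + (idx+1)*c)
        if not 0 <= idx < k:
            raise ValueError(f"{x} falls outside the classes")
        freqs[idx] += 1
    return freqs

# Hypothetical observations (NOT the lecture's 30 audit times)
sample = [11.5, 14, 15, 18.5, 19, 22, 27, 32]
f = tally(sample, start=11, c=4, k=6)
print(f)                      # [2, 2, 2, 0, 1, 1]
assert sum(f) == len(sample)  # check: frequencies sum to n
```

Integer-dividing the distance from the first lower boundary by c gives the class index directly, so boundary values (15, 19, ...) land in the higher class, as in the notes.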

5. Calculate Additional Columns

  • Relative frequency: rf = \frac{f}{n}. Sum = 1.
  • Cumulative frequency: F_i = f_i + \sum_{j<i} f_j (the class's own frequency plus those of all earlier classes).
  • Relative cumulative frequency: \frac{F}{n}.
  • Class midpoint (x_m): x_m = \frac{\text{lower} + \text{upper}}{2}.
    • Example midpoints: [13,17,21,25,29,33].
  • Optional columns: percentage, proportion, etc.

Minimum required for a basic table: Class Intervals + Frequencies.
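A sketch that builds the extended table from the example's frequencies f = [8,7,3,8,3,1] and the boundaries from step 3 (variable names are illustrative; `itertools.accumulate` produces the running totals F_i):

```python
from itertools import accumulate

lowers = [11, 15, 19, 23, 27, 31]  # lower class boundaries from step 3
c = 4                               # class width from step 2
f = [8, 7, 3, 8, 3, 1]              # frequencies from step 4
n = sum(f)                          # 30

rf = [fi / n for fi in f]                    # relative frequency f/n
F = list(accumulate(f))                      # cumulative frequency F_i
rcf = [Fi / n for Fi in F]                   # relative cumulative frequency F/n
xm = [(lo + (lo + c)) / 2 for lo in lowers]  # class midpoints

print(f"{'Class':<10}{'f':>4}{'rf':>8}{'F':>5}{'F/n':>8}{'x_m':>6}")
for lo, fi, r, Fi, rc, m in zip(lowers, f, rf, F, rcf, xm):
    label = f"[{lo},{lo + c})"
    print(f"{label:<10}{fi:>4}{r:>8.3f}{Fi:>5}{rc:>8.3f}{m:>6.0f}")

# Check sums: sum f = n and sum rf = 1 (up to floating-point rounding)
assert F[-1] == n
assert abs(sum(rf) - 1) < 1e-9
```

The last cumulative frequency always equals n, which doubles as the "sum f = n" check.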


Graphical Methods for Quantitative Data

A. Histogram

  • X-axis: class intervals (continuous scale).
  • Y-axis: frequencies.
  • Bars touch because data are continuous.
  • Example bar heights: 8,7,3,8,3,1.
  • Label axes: e.g. “Audit Time (days)” and “Number of Clients”.

Interpreting Histogram Shape

  • Symmetric: frequencies cluster near centre, mirror-like tails.
  • Uniform: all classes have ~equal frequency.
  • Negatively skewed (skew-left): bulk of data on right, tail extends left.
  • Positively skewed (skew-right): bulk on left, tail extends right.
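Without a plotting library, the bar heights 8, 7, 3, 8, 3, 1 can be previewed as a sideways text histogram. This is only a rough sketch (the helper `text_histogram` is hypothetical): a proper histogram uses a continuous X-axis with touching bars, which plain text cannot show.

```python
lowers = [11, 15, 19, 23, 27, 31]  # lower class boundaries
c = 4                               # class width
f = [8, 7, 3, 8, 3, 1]              # class frequencies

def text_histogram(lowers, c, f, mark="#"):
    """Hypothetical helper: one line per class, bar length = frequency."""
    return [f"[{lo:>2},{lo + c:>2}) | {mark * fi} {fi}" for lo, fi in zip(lowers, f)]

print("Audit Time (days) vs. Number of Clients")
for line in text_histogram(lowers, c, f):
    print(line)
```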

B. Ogive (Cumulative Frequency Curve)

  • Requires Class Boundaries + Cumulative Frequencies.
  • Plot upper class boundary vs. cumulative frequency; join with straight segments.
  • Always non-decreasing.
  • Interpretation:
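    • At 19 days, F = 15 → 15 clients finished in < 19 days.
    • Trace vertically from a value on the X-axis to the curve, then horizontally to the Y-axis, to answer "fewer than value" questions (e.g. < 27 days → 26 clients; 27 itself belongs to the class [27,31)).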
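The ogive's points and the tracing procedure can be sketched as follows. Two assumptions are not in the notes: the curve is started at (11, 0), the first lower boundary, which is a common convention; and reading between plotted points uses linear interpolation, which assumes observations are spread evenly within each class. The function name `ogive_at` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

lowers = [11, 15, 19, 23, 27, 31]
c = 4
f = [8, 7, 3, 8, 3, 1]

# Assumed start point (first lower boundary, 0), then (upper boundary, F_i)
xs = [lowers[0]] + [lo + c for lo in lowers]
ys = [0] + list(accumulate(f))

def ogive_at(x: float) -> float:
    """Hypothetical helper: read the ogive at x along its straight segments."""
    if x <= xs[0]:
        return 0.0
    if x >= xs[-1]:
        return float(ys[-1])
    i = bisect_right(xs, x) - 1              # segment with xs[i] <= x < xs[i+1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])    # fraction of the way along it
    return ys[i] + t * (ys[i + 1] - ys[i])

print(list(zip(xs, ys)))
print(ogive_at(19), ogive_at(27))  # 15.0 26.0
```

At a plotted boundary the reading is exact (F = 15 at 19, F = 26 at 27); between boundaries it is only an estimate, which is what tracing a straight-segment graph by eye gives as well.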

C. Frequency Polygon

  • Uses midpoints vs. frequencies.
  • Add two artificial midpoints, one class width beyond each end and with f = 0, so the polygon starts and ends on the X-axis:
    • Start: first midpoint -c ( 13-4=9 ) with f=0.
    • End: last midpoint +c ( 33+4=37 ) with f=0.
  • Plot points (9,0), (13,8), (17,7), (21,3), (25,8), (29,3), (33,1), (37,0) and connect with straight lines.
  • Because every class has a nonzero frequency, the polygon touches the X-axis only at the two artificial endpoints.
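The polygon's points follow mechanically from the table; a minimal sketch (the helper name `polygon_points` is hypothetical) that pads the midpoints with one zero-frequency point at each end:

```python
lowers = [11, 15, 19, 23, 27, 31]
c = 4
f = [8, 7, 3, 8, 3, 1]

def polygon_points(lowers, c, f):
    """Hypothetical helper: (midpoint, f) pairs plus a zero point one width beyond each end."""
    mids = [lo + c / 2 for lo in lowers]  # same as (lower + upper) / 2
    return [(mids[0] - c, 0)] + list(zip(mids, f)) + [(mids[-1] + c, 0)]

print(polygon_points(lowers, c, f))
```

Joining these eight points in order with straight lines gives the frequency polygon described above.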

Practical & Pedagogical Notes

  • Choice of extra columns/graphs depends on message & audience.
  • Frequency tables suffice for numerical summaries; graphs convey visual intuition.
  • Check sums: \sum f = n and \sum rf = 1.
  • Rounding-up principle applies to both k and c, ensuring full coverage without data loss.
  • Always label graphs fully (title, axes, units).
  • Contrast: histogram (continuous data, touching bars) vs. bar chart (separate categories, bars separated).
  • Cumulative frequency tools (table or ogive) allow percentile-type inquiries without raw data.

Course Logistics Mentioned

  • Unit 2 concluded; Practice Assignment 2 open (work over next 7-10 days).
  • Memo/solutions available in 5-7 days.
  • Additional video provided for full worked example; slides alone are “sufficient”.
  • Encouragement to “work diligently”.

Summary of Key Formulas & Symbols

  • Sturges: k = 1 + 3.3 \log n
  • Class width: c = \dfrac{\max - \min}{k}
  • Relative frequency: rf = \dfrac{f}{n}
  • Cumulative frequency: F_i = \sum_{j\le i} f_j
  • Class midpoint: x_m = \dfrac{\text{lower}+\text{upper}}{2}
  • Notation: square bracket [ = inclusive, round bracket ) = exclusive.

Ethical & Practical Implications

  • Transparent summarisation prevents misinterpretation: incorrect class widths or mislabelled histograms, for example, can bias audience perception.
  • Proper rounding avoids hiding observations outside chosen boundaries.
  • Graph choice impacts how variability & skewness are communicated to stakeholders (auditors, managers, regulators, etc.).