Descriptive Statistics – Summarising Quantitative Data

Context & Purpose

Lecture continues the Descriptive Statistics block, focusing on summarising quantitative data.
Goal: transform raw numerical data into tables and graphs that reveal structure and patterns.

Quantitative Variables Refresher

Defined as variables that assume numerical values (e.g. audit time in days).

Frequency-Distribution Table: Step-by-Step Construction

1. Decide on Number of Classes (Categories)

Use Sturges’ Rule: $k = 1 + 3.3 \log_{10} n$
- $n$ = sample size.
- Always round up the result.
Example data set: 30 observations.
- $k = 1 + 3.3 \log_{10} 30 \approx 5.875 \rightarrow 6\;\text{classes}$
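The rule can be checked with a few lines of Python. This is a minimal sketch: the function name `sturges_classes` is an illustrative choice, not part of the lecture, and the logarithm is taken to base 10, which is the base that reproduces the 5.875 in the example.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes by Sturges' rule, k = 1 + 3.3*log10(n), rounded up."""
    if n < 1:
        raise ValueError("sample size must be a positive integer")
    return math.ceil(1 + 3.3 * math.log10(n))

k = sturges_classes(30)
print(k)  # 6
```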

2. Compute Class Width $c$

Formula: $c = \frac{\text{max} - \text{min}}{k}$
Example: max $= 32$ , min $= 11.5$
- $c = \frac{32-11.5}{6}=3.417 \rightarrow 4$ (round up to next integer, even if only 0.0001 over).
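The same arithmetic as a short sketch, with `class_width` as a hypothetical helper name. It rounds the raw width up to the next integer, as the notes prescribe.

```python
import math

def class_width(max_value: float, min_value: float, k: int) -> int:
    """Class width c = (max - min) / k, rounded up to the next integer."""
    raw = (max_value - min_value) / k
    return math.ceil(raw)

c = class_width(32, 11.5, 6)
print(c)  # 4
```

One caveat to keep in mind: if the raw width happens to be an exact integer and the first class starts exactly at the minimum, the maximum lands on the excluded upper boundary of the last class, so it is worth checking that the last interval still covers it.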

3. Establish Class Boundaries

If min value is an integer → start exactly at min.
If min value is not an integer → start at next lower integer.
Example:
- Min = 11.5 → first lower boundary = $11$ .
- Upper boundary of each class = lower boundary $+ c$ .
- Six intervals produced (square bracket = inclusive, round bracket = exclusive):
1. $[11,15)$
2. $[15,19)$
3. $[19,23)$
4. $[23,27)$
5. $[27,31)$
6. $[31,35)$
- Boundary logic: value $15$ belongs to 2nd class, $19$ to 3rd, etc.
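The boundary rules above can be sketched as follows. The function names are illustrative assumptions; `math.floor` covers both cases of the starting rule, because it leaves an integer minimum unchanged and moves a non-integer minimum to the next lower integer.

```python
import math

def class_intervals(min_value: float, k: int, c: int) -> list[tuple[int, int]]:
    """Return k half-open intervals [lower, upper) of width c."""
    start = math.floor(min_value)
    return [(start + i * c, start + (i + 1) * c) for i in range(k)]

def class_index(x: float, intervals: list[tuple[int, int]]) -> int:
    """Zero-based index of the class containing x (lower inclusive, upper exclusive)."""
    for i, (lower, upper) in enumerate(intervals):
        if lower <= x < upper:
            return i
    raise ValueError(f"{x} lies outside all class intervals")

intervals = class_intervals(11.5, 6, 4)
print(intervals)                   # [(11, 15), (15, 19), (19, 23), (23, 27), (27, 31), (31, 35)]
print(class_index(15, intervals))  # 1, i.e. the 2nd class
```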

4. Tally Observations

Scan the raw list once, marking one stroke per observation in the correct class (four strokes, then a diagonal stroke through them for every fifth observation).
Example tallies lead to frequencies:
- $f = [8,7,3,8,3,1]$ (sum = $30$ , matches $n$ ).
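A single-pass tally might look like the sketch below. The lecture's 30 raw observations are not reproduced in these notes, so the list `sample` is invented purely for illustration and does not give the frequencies above. `tally` is a hypothetical helper name.

```python
def tally(values, lower, c, k):
    """Count observations per class for k classes of width c starting at lower."""
    freq = [0] * k
    for x in values:
        i = int((x - lower) // c)  # class index; the upper boundary goes to the next class
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the class intervals")
        freq[i] += 1
    return freq

# Invented data, NOT the lecture's data set.
sample = [11.5, 12, 14, 15, 18.5, 22, 23, 26.5, 31, 32]
f = tally(sample, lower=11, c=4, k=6)
print(f)                      # [3, 2, 1, 2, 0, 2]
print(sum(f) == len(sample))  # True: the frequencies must add up to n
```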

5. Calculate Additional Columns

Relative frequency: $rf = \frac{f}{n}$ . Sum = 1.
Cumulative frequency: $F_i = f_i + \sum_{j<i} f_j$.
- Example: $F = [8,15,18,26,29,30]$ .
Relative cumulative frequency: $\frac{F}{n}$ .
Class midpoint ($x_m$): $x_m = \frac{\text{lower} + \text{upper}}{2}$.
- Example midpoints: $[13,17,21,25,29,33]$ .
Optional columns: percentage, proportion, etc.

Minimum required for a basic table: Class Intervals + Frequencies.
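The extra columns can all be derived from the class intervals and the frequencies. The sketch below uses the example frequencies from the notes; the variable names are illustrative.

```python
from itertools import accumulate

intervals = [(11, 15), (15, 19), (19, 23), (23, 27), (27, 31), (31, 35)]
f = [8, 7, 3, 8, 3, 1]
n = sum(f)

rf = [fi / n for fi in f]                  # relative frequencies
F = list(accumulate(f))                    # cumulative frequencies
rel_F = [Fi / n for Fi in F]               # relative cumulative frequencies
x_m = [(lo + up) / 2 for lo, up in intervals]  # class midpoints

print(F)    # [8, 15, 18, 26, 29, 30]
print(x_m)  # [13.0, 17.0, 21.0, 25.0, 29.0, 33.0]

# Check sums: frequencies add to n, relative frequencies add to 1.
assert n == 30
assert abs(sum(rf) - 1) < 1e-9
```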

Graphical Methods for Quantitative Data

A. Histogram

X-axis: class intervals (continuous scale).
Y-axis: frequencies.
Bars touch because data are continuous.
Example bar heights: 8,7,3,8,3,1.
Label axes: e.g. “Audit Time (days)” and “Number of Clients”.
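As a rough stand-in for a plotted histogram, the frequencies can be rendered as horizontal text bars. This is only a sketch for checking the shape quickly; `text_histogram` is a hypothetical helper, and a real histogram would be drawn with vertical touching bars and labelled axes.

```python
def text_histogram(intervals, freqs, symbol="#"):
    """Render one text bar per class; bar length equals the class frequency."""
    lines = []
    for (lower, upper), f in zip(intervals, freqs):
        lines.append(f"[{lower:>2}, {upper:>2}) | {symbol * f} {f}")
    return "\n".join(lines)

intervals = [(11, 15), (15, 19), (19, 23), (23, 27), (27, 31), (31, 35)]
freqs = [8, 7, 3, 8, 3, 1]
chart = text_histogram(intervals, freqs)
print("Audit Time (days) | Number of Clients")
print(chart)
```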

Interpreting Histogram Shape

Symmetric: frequencies cluster near centre, mirror-like tails.
Uniform: all classes have ~equal frequency.
Negatively skewed (skew-left): bulk of data on right, tail extends left.
Positively skewed (skew-right): bulk on left, tail extends right.

B. Ogive (Cumulative Frequency Curve)

Requires Class Boundaries + Cumulative Frequencies.
Plot upper class boundary vs. cumulative frequency; join with straight segments.
Always non-decreasing.
Interpretation:
- At 19 days, $F=15$ → 15 clients finished in <19 days.
- Use vertical then horizontal tracing to answer “fewer than value” questions (e.g. <27 days → 26 clients), since each class excludes its upper boundary.
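The tracing procedure amounts to linear interpolation along the straight segments of the ogive. The sketch below makes two assumptions that the notes do not state: the curve starts at the first lower boundary with cumulative frequency 0, and observations are spread evenly within each class. `ogive_value` is a hypothetical helper name.

```python
def ogive_value(x, boundaries, cum_freq):
    """Approximate number of observations below x, read off the ogive."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cum_freq[-1])
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cum_freq[i - 1], cum_freq[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

boundaries = [11, 15, 19, 23, 27, 31, 35]
cum_freq = [0, 8, 15, 18, 26, 29, 30]  # leading 0 is the assumed starting point

print(ogive_value(19, boundaries, cum_freq))  # 15.0
print(ogive_value(27, boundaries, cum_freq))  # 26.0
print(ogive_value(21, boundaries, cum_freq))  # 16.5 (interpolated estimate)
```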

C. Frequency Polygon

Uses midpoints vs. frequencies.
Add two extra (arbitrary) midpoints so curve starts/ends on X-axis:
- Start: first midpoint $-c$ ( $13-4=9$ ) with $f=0$ .
- End: last midpoint $+c$ ( $33+4=37$ ) with $f=0$ .
Plot points (9,0), (13,8), (17,7), (21,3), (25,8), (29,3), (33,1), (37,0) and connect with straight lines.
In this example the polygon touches the X-axis only at the two artificial endpoints, because every class has a non-zero frequency.
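The list of plotting points can be generated from the midpoints, frequencies, and class width. A minimal sketch, with `polygon_points` as an illustrative name:

```python
def polygon_points(midpoints, freqs, c):
    """Frequency-polygon vertices, padded with a zero-frequency point at each end."""
    start = (midpoints[0] - c, 0)
    end = (midpoints[-1] + c, 0)
    return [start] + list(zip(midpoints, freqs)) + [end]

points = polygon_points([13, 17, 21, 25, 29, 33], [8, 7, 3, 8, 3, 1], c=4)
print(points)
# [(9, 0), (13, 8), (17, 7), (21, 3), (25, 8), (29, 3), (33, 1), (37, 0)]
```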

Practical & Pedagogical Notes

Choice of extra columns/graphs depends on message & audience.
Frequency tables suffice for numerical summaries; graphs convey visual intuition.
Check sums: $\sum f = n$ and $\sum rf = 1$ .
Rounding-up principle applies to both $k$ and $c$ , ensuring full coverage without data loss.
Always label graphs fully (title, axes, units).
Contrast: histogram (touching bars, continuous) vs. bar chart (separate categories, bars separated).
Cumulative frequency tools (table or ogive) allow percentile-type inquiries without raw data.
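A percentile-type inquiry reads the ogive in the other direction: start from a cumulative count and find the matching value. The sketch below assumes the ogive starts at the first lower boundary with cumulative frequency 0 and interpolates linearly within a class, so the results are estimates rather than exact sample percentiles. `percentile_from_ogive` is a hypothetical helper name.

```python
def percentile_from_ogive(p, boundaries, cum_freq):
    """Estimate the value below which p percent of observations fall."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    target = p * cum_freq[-1] / 100  # cumulative count to look up
    for i in range(1, len(boundaries)):
        if target <= cum_freq[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cum_freq[i - 1], cum_freq[i]
            if y1 == y0:  # empty class: no slope to interpolate along
                return float(x0)
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)

boundaries = [11, 15, 19, 23, 27, 31, 35]
cum_freq = [0, 8, 15, 18, 26, 29, 30]  # leading 0 is the assumed starting point

print(percentile_from_ogive(50, boundaries, cum_freq))  # 19.0
print(percentile_from_ogive(80, boundaries, cum_freq))  # 26.0
```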

Course Logistics Mentioned

Unit 2 concluded; Practice Assignment 2 open (work over next 7-10 days).
Memo/solutions available in 5-7 days.
Additional video provided for full worked example; slides alone are “sufficient”.
Encouragement to “work diligently”.

Summary of Key Formulas & Symbols

Sturges: $k = 1 + 3.3 \log_{10} n$
Class width: $c = \dfrac{\max - \min}{k}$
Relative frequency: $rf = \dfrac{f}{n}$
Cumulative frequency: $F_i = \sum_{j \le i} f_j$
Class midpoint: $x_m = \dfrac{\text{lower}+\text{upper}}{2}$
Notation: square bracket [ = inclusive, round bracket ) = exclusive.

Ethical & Practical Implications

Transparent summarisation prevents misinterpretation: e.g. incorrect class widths or mis-labelled histograms can bias audience perception.
Proper rounding avoids hiding observations outside chosen boundaries.
Graph choice impacts how variability & skewness are communicated to stakeholders (auditors, managers, regulators, etc.).
