WEEK 2 NOTES – Graphing & Describing Data

SSF1093 – Statistics for Social Sciences

WEEK 2: Graphing Categorical & Quantitative Data

Week-2 Learning Objectives

Understand different data types and which graphs suit each.
Learn to create visual representations of both categorical and quantitative variables.

Recap: Types of Data

Categorical (Qualitative) – values sorted into non-numeric groups.
– Examples: Gender, favourite colour, type of vehicle.
Quantitative (Numeric) – measurable numeric values.
– Examples: Height, weight, temperature.

Graphing Categorical Data: Bar Charts

Rectangular bars, equal width, gaps between categories.
Both frequency and percentage can be displayed.
Key parts: Title, axes labels, uniform scale.
Illustrations:
– Vegetables bought (kg).
– Favourite sports (students).

Graphing Categorical Data: Pie Charts

Circle divided into sectors proportional to category share.
Sector angle $= \text{proportion} \times 360^{\circ}$.
Examples shown: Favourite sports angles (Football $108^{\circ}$, Basketball $54^{\circ}$, etc.).

Practice: Creating Bar / Pie Charts

Dataset 1 (Ice-cream sales): Chocolate 30, Vanilla 25, Strawberry 20, Mint 15
Bar chart.
Dataset 2 (Favourite pets): Dogs 50%, Cats 30%, Birds 10%, Hamsters 10%
Pie chart.
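
As a rough sketch of the arithmetic behind both practice charts, the Python snippet below converts the Dataset 1 counts into percentages and the Dataset 2 percentages into sector angles using angle = proportion × 360°. The function and variable names are illustrative choices, not part of the course material.

```python
# Dataset 1 (ice-cream sales) and Dataset 2 (favourite pets) from the notes.
ice_cream_sales = {"Chocolate": 30, "Vanilla": 25, "Strawberry": 20, "Mint": 15}
pet_percentages = {"Dogs": 50, "Cats": 30, "Birds": 10, "Hamsters": 10}


def to_percentages(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


def to_sector_angles(percentages):
    """Convert percentages to pie-chart sector angles in degrees."""
    return {category: pct * 360 / 100 for category, pct in percentages.items()}


sales_percentages = to_percentages(ice_cream_sales)
pet_angles = to_sector_angles(pet_percentages)

for category, pct in sales_percentages.items():
    print(f"{category}: {pct:.1f}%")
for category, angle in pet_angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The sector angles should always add up to 360°, which is a quick check before drawing the pie chart.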

Frequency Table Example (Quantitative)

Statistics-test scores of 20 students.
Class intervals: 61–70, 71–80, 81–90, 91–100.
Tasks:
(i) Frequency distribution,
(ii) Relative frequency $= \frac{\text{frequency}}{20}$,
(iii) Percentages.
Question: What percentage scored $\ge 81$? (Add the percentages of the two highest classes.)
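
The three tasks can be sketched in Python as follows. The notes do not list the 20 actual scores, so the `scores` list below is invented purely for illustration; only the class intervals and the procedure come from the notes.

```python
# HYPOTHETICAL scores: the notes do not give the 20 real test scores,
# so these values are made up to illustrate the procedure only.
scores = [62, 65, 68, 70, 72, 74, 75, 77, 78, 80,
          81, 83, 84, 86, 88, 90, 92, 94, 97, 100]

# Class intervals from the notes (both limits inclusive).
classes = [(61, 70), (71, 80), (81, 90), (91, 100)]

n = len(scores)
table = []
for lower, upper in classes:
    frequency = sum(lower <= s <= upper for s in scores)  # (i) frequency
    relative = frequency / n                              # (ii) relative frequency
    percent = 100 * relative                              # (iii) percentage
    table.append((lower, upper, frequency, relative, percent))

for lower, upper, frequency, relative, percent in table:
    print(f"{lower}-{upper}: f={frequency}, rel={relative:.2f}, {percent:.0f}%")

# Percentage scoring 81 or more = sum of the two highest classes.
percent_81_plus = sum(row[4] for row in table if row[0] >= 81)
print(f"Scored >= 81: {percent_81_plus:.0f}%")
```

With real data, only the `scores` list needs to change; the frequencies should still add up to 20.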

Frequency Table for Categorical Opinions

30 responses on CEO salaries: Y, N, O codes.
Tasks:
a) Frequency table.
b) Relative frequencies & percentages $= \frac{f}{30} \times 100\%$.
c) Bar graph of relative frequencies.
d) Pie chart of percentages.

Graphing Quantitative Data: Histograms

Adjacent bars (no gaps); width = class interval.
Height = frequency (or density).
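Difference from bar chart: a histogram shows a quantitative variable on a continuous, ordered x-axis, so its bars touch; a bar chart shows separate categories, so its bars have gaps.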
Examples: Mathematics scores, Library visits.

Graphing Quantitative Data: Scatter Plots

Each point = observation (x,y).
Used to explore relationships / correlation.
Example: Scuba‐diver depth vs water temperature (negative association).
Example: Number of birds vs time of day (possible peak times).
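
One possible way to put a number on a "negative association" is the Pearson correlation coefficient, sketched below. The depth and temperature readings are invented for illustration (the notes give no data), and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

# HYPOTHETICAL readings: depth in metres and water temperature in degrees C.
depth = [5, 10, 15, 20, 25, 30]
temperature = [24.0, 22.5, 21.0, 18.5, 17.0, 15.5]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / sqrt(ss_x * ss_y)


r = pearson_r(depth, temperature)
print(f"r = {r:.3f}")  # a value near -1 indicates a strong negative association
```

A scatter plot should still be drawn first, since a single coefficient can hide curved patterns such as the peak times in the bird example.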

Choosing the Right Graph

Bar/Pie ➜ categorical.
Histogram ➜ single quantitative variable.
Scatter plot ➜ relationship between two quantitative variables.

Describing Quantitative Data Numerically

Central Tendency: Mean, Median, Mode.
Variation: Range, Inter-quartile Range (IQR), Variance, Standard Deviation, Coefficient of Variation.
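
A minimal sketch of the variation measures, using Python's standard `statistics` module and the exercise-hours data from the practice problem in these notes. It assumes the data are a sample (so the variance divides by n − 1); quartile conventions differ between textbooks and software, so the IQR may not match a hand calculation exactly.

```python
import statistics

# Exercise-hours data reused from the practice problem in these notes.
hours = [1, 2, 2, 3, 4, 5, 6, 6, 8, 10]

data_range = max(hours) - min(hours)

# quantiles() uses the "exclusive" method by default; other quartile
# conventions can give slightly different values.
q1, q2, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1

sample_variance = statistics.variance(hours)  # divides by n - 1
sample_sd = statistics.stdev(hours)           # square root of the variance
cv_percent = 100 * sample_sd / statistics.mean(hours)

print(f"Range = {data_range}")
print(f"IQR = {iqr}")
print(f"Variance = {sample_variance:.2f}")
print(f"Standard deviation = {sample_sd:.2f}")
print(f"Coefficient of variation = {cv_percent:.1f}%")
```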

Why Numeric Description?

Graphs give quick visual impact but lack precision.
Exact numbers communicate “how old, how rich, how tall”.

Central Tendency Concept

Represents “centre” or typical value.
Simplifies understanding of general trend.

Appropriate Measure Depends on Data

Mean is common starting point.
Median/Mode useful under skewness, categorical data, etc.
Interpretation caveat: a claim such as “men spend more time on the internet” refers to a typical value (a higher mean or median), not to every individual.

Definitions

Mean (Arithmetic average).
Median (midpoint of ranked values).
Mode (most frequent value).

Formulae for Arithmetic Mean

Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Mean Sensitivity to Outliers

Including every value ➜ mean easily distorted.
Illustrative numeric examples (outlier increases mean from 3 to 4).

Salary Example (Skewed)

Most workers earn 12k–18k, but two extremely high salaries pull the mean upward.
The median is the better summary here.

Median Characteristics

Middle value in ordered list (50 % above, 50 % below).
Resistant to extremes; preferred with skewness.

Finding Median (Procedure)

Rank data.
Median position $= \frac{n+1}{2}$.
If n even, average two central values.

Example: 8 car speeds.

Even vs Odd n

8 observations: median position 4.5 ➜ average of 4th & 5th values.
Practice data: 9,13,9,11,9,13,11,9,10,8,11 – find mean and median.
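
The ranking procedure can be sketched in Python as below and applied to the practice data. The function name `median_by_position` is an illustrative choice; the logic follows the $\frac{n+1}{2}$ position rule from the notes.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule."""
    ranked = sorted(values)                # step 1: rank the data
    n = len(ranked)
    position = (n + 1) / 2                 # step 2: 1-based median position
    if n % 2 == 1:
        return ranked[int(position) - 1]   # odd n: the single middle value
    lower = ranked[n // 2 - 1]             # even n: the two central values
    upper = ranked[n // 2]
    return (lower + upper) / 2             # step 3: average them


practice = [9, 13, 9, 11, 9, 13, 11, 9, 10, 8, 11]
practice_mean = sum(practice) / len(practice)
practice_median = median_by_position(practice)
print(f"mean = {practice_mean:.2f}, median = {practice_median}")
```

For 8 observations the position is 4.5, so the function averages the 4th and 5th ranked values, matching the rule above.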

Mode

Highest frequency value.
Can be none, one, or multiple.
Not altered by outliers; works for qualitative variables.

Mode Example

Data: 9,13,9,11,9,13,11,9,10,8,11 ⇒ Mode = 9 (appears 4 times).
Graphical illustration of modal bar.
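
A small sketch of finding the mode in Python, covering the none, one, or multiple cases. The helper name `modes` is hypothetical, and treating "all values equally frequent" as "no mode" is one common convention rather than a universal rule.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency.

    Returns an empty list for empty input, or when several distinct
    values all occur equally often (treated here as "no mode").
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


data = [9, 13, 9, 11, 9, 13, 11, 9, 10, 8, 11]
print(modes(data))  # prints [9]
```

Because it only counts occurrences, the same function also works for qualitative values such as response codes.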

Caveat on Using Mode

If the mode lies far from the rest of the data (i.e. it is itself an outlier), it misrepresents central tendency.

Mean vs Median Comparison

Data Sets:

1–10 (no outliers): mean = median = 5.5.
1–9 plus 1000 (positive outlier): mean = 104.5 $\gg$ median = 5.5; median preferred.
1–6 plus 70, 80, 90, 100 (cluster of high values): median = 5.5 still reflects the majority; mean = 36.1 is inflated.

Conclusion: choose measure that better reflects majority.
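
The comparison can be reproduced with a short Python sketch using the standard `statistics` module. The dictionary labels are illustrative; the three data sets are the ones listed above.

```python
import statistics

# The three data sets described in the comparison.
datasets = {
    "1-10 (no outliers)": list(range(1, 11)),
    "1-9 plus 1000": list(range(1, 10)) + [1000],
    "1-6 plus 70, 80, 90, 100": list(range(1, 7)) + [70, 80, 90, 100],
}

# Map each label to its (mean, median) pair.
summary = {
    label: (statistics.mean(values), statistics.median(values))
    for label, values in datasets.items()
}

for label, (mean, median) in summary.items():
    print(f"{label}: mean = {mean}, median = {median}")
```

The median stays at 5.5 in all three cases, while the mean moves with the extreme values.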

Level of Measurement & Best Location Measure

Ratio/Interval ➜ Mean valid.
Ordinal ➜ Median (rank-based).
Nominal ➜ Mode only.

Shape of Distribution Guides Choice

Normal (symmetric): mean OK.
Skewed: median safer.
Open-ended class limits: use median.

Pros & Cons Summary

Mean:
– Pro: uses all data.
– Con: sensitive to extremes.
Median:
– Pro: not distorted by extremes.
– Con: slow for very large n.
Mode:
– Pro: only option for nominal data; unaffected by extremes.
– Con: may be multiple / none; poor for skewed quantitative sets.

Quick Quiz

Identify modal class.
State disadvantage of modal class as average (may not be unique/representative).
State disadvantage of mean (sensitivity to outliers).

Building a Histogram (Telephone Bills Example)

Steps:

Collect 200 bills.
Build frequency/relative-frequency table.
Draw histogram.

Interpretation Example

Approx. $\frac{1}{2}$ (108/200 = 54 %) of bills are < 30 (small).
Only 16 % (32/200) fall in the middle range [30–75].
Nearly $\frac{1}{3}$ (60/200 = 30 %) are > 75 (large).

Bell-Shaped (Normal) Histogram

Symmetric, unimodal curve resembling bell.
Many inferential methods assume population bell-shaped.

Skewness

Positively skewed: long right tail.
Negatively skewed: long left tail.

Shape & Central Tendency Relationship

Symmetric: $\text{mean} = \text{median} = \text{mode}$.
Positive skew: $\text{mean} > \text{median} > \text{mode}$.
Negative skew: $\text{mean} < \text{median} < \text{mode}$.

Skewness Example (Income)

Few high-income earners distort mean upward; distribution positively skewed.
Understanding skew helps target interventions (tax, welfare, etc.).

Skewness Example (Cricket Scores)

Most team players score > 50, few < 10 ⇒ negatively skewed.
Insightful for performance analysis.

Zero Skew (Symmetric)

Mean = Median = Mode condition.
Represents balanced data around centre.

Comparing Histograms

Task: decide which of two presented histograms is “more skewed”.
Visual: longer tail implies greater skewness.

Quick Shape Identification

Question: choose shape description (nearly symmetric, left skew, right skew, bimodal).

Practice Problem (Ages)

Data: 22,25,25,30,30,30,32,35,40
a) Mode = 30.
b) Median position $= \frac{9+1}{2} = 5$.
c) Median = value in 5th place (30).

Practice Problem (Exercise Hours)

Hours: 1,2,2,3,4,5,6,6,8,10
Mean $= \frac{47}{10} = 4.7$ h.
Median: average of 5th & 6th ranked values (4 & 5) ⇒ $\frac{4+5}{2} = 4.5$ h.