WEEK 2 NOTES – Graphing & Describing Data

SSF1093 – Statistics for Social Sciences

WEEK 2: Graphing Categorical & Quantitative Data

Week-2 Learning Objectives

  • Understand different data types and which graphs suit each.

  • Learn to create visual representations of both categorical and quantitative variables.

Recap: Types of Data

  • Categorical (Qualitative) – values sorted into non-numeric groups.
    – Examples: Gender, favourite colour, type of vehicle.

  • Quantitative (Numeric) – measurable numeric values.
    – Examples: Height, weight, temperature.

Graphing Categorical Data: Bar Charts

  • Rectangular bars, equal width, gaps between categories.

  • Both frequency and percentage can be displayed.

  • Key parts: Title, axes labels, uniform scale.

  • Illustrations:
    – Vegetables bought (kg).
    – Favourite sports (students).

Graphing Categorical Data: Pie Charts

  • Circle divided into sectors proportional to category share.

  • $\text{Sector angle} = \text{proportion} \times 360^{\circ}$.

  • Example shown: favourite-sports sector angles (Football $108^{\circ}$, Basketball $54^{\circ}$, etc.).

Practice: Creating Bar / Pie Charts

  • Dataset 1 (Ice-cream sales): Chocolate 30, Vanilla 25, Strawberry 20, Mint 15.
    Task: draw a bar chart.

  • Dataset 2 (Favourite pets): Dogs 50%, Cats 30%, Birds 10%, Hamsters 10%.
    Task: draw a pie chart.
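
The two practice datasets can be checked numerically before drawing. The sketch below converts counts to percentages (bar heights) and percentages to sector angles (proportion × 360°). The function names are illustrative only and are not part of the course material.

```python
# Practice datasets from the notes above.
ice_cream_sales = {"Chocolate": 30, "Vanilla": 25, "Strawberry": 20, "Mint": 15}
favourite_pets_pct = {"Dogs": 50, "Cats": 30, "Birds": 10, "Hamsters": 10}


def to_percentages(counts):
    """Convert raw counts to percentages of the total (bar-chart heights)."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


def to_sector_angles(percentages):
    """Convert percentages to pie-chart sector angles: proportion x 360 degrees."""
    return {category: pct / 100 * 360 for category, pct in percentages.items()}


ice_cream_pct = to_percentages(ice_cream_sales)
pet_angles = to_sector_angles(favourite_pets_pct)

for category, pct in ice_cream_pct.items():
    print(f"{category:<10} {pct:5.1f}%")
for category, angle in pet_angles.items():
    print(f"{category:<10} {angle:5.1f} degrees")
```

The sector angles should always add up to 360°, which is a quick check on a hand-drawn pie chart.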

Frequency Table Example (Quantitative)

  • Statistics-test scores of 20 students.

  • Class intervals: 61–70, 71–80, 81–90, 91–100.

  • Tasks:
    (i) Frequency distribution,
    (ii) Relative frequency $= \frac{\text{frequency}}{20}$,
    (iii) Percentages.

  • Question: What percentage scored $\ge 81$? (Add the two highest classes.)
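
The three tasks can be sketched in Python as follows. The 20 scores are hypothetical, made up for illustration because the actual test scores are not listed in these notes. Only the four classes come from the example.

```python
# Hypothetical scores for 20 students (made up for illustration only).
scores = [62, 65, 68, 70, 72, 74, 75, 77, 78, 80,
          81, 83, 85, 86, 88, 90, 92, 94, 97, 100]

# Classes from the notes, as inclusive (lower, upper) limits.
classes = [(61, 70), (71, 80), (81, 90), (91, 100)]

n = len(scores)
table = []
for lower, upper in classes:
    freq = sum(lower <= s <= upper for s in scores)   # (i) frequency
    rel_freq = freq / n                               # (ii) relative frequency
    table.append((f"{lower}-{upper}", freq, rel_freq, 100 * rel_freq))  # (iii) %

for label, freq, rel_freq, pct in table:
    print(f"{label:<8} {freq:>3} {rel_freq:>6.2f} {pct:>6.1f}%")

# Percentage scoring 81 or more: add the two highest classes.
pct_81_plus = table[2][3] + table[3][3]
print(f"Scored >= 81: {pct_81_plus:.1f}%")
```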

Frequency Table for Categorical Opinions

  • 30 responses on CEO salaries: Y, N, O codes.

  • Tasks:
    a) Frequency table.
    b) Relative frequencies ($\frac{f}{30}$) & percentages ($\frac{f}{30} \times 100\%$).
    c) Bar graph of relative frequencies.
    d) Pie chart of percentages.
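
Tasks a) to c) can be sketched with `collections.Counter`. The response list is hypothetical (15 Y, 10 N, 5 O), since the 30 survey answers are not listed in these notes. The text bars are only a rough stand-in for a drawn bar graph.

```python
from collections import Counter

# Hypothetical coded responses (made up for illustration; 30 in total).
responses = ["Y"] * 15 + ["N"] * 10 + ["O"] * 5

n = len(responses)
counts = Counter(responses)

# code -> (frequency, relative frequency, percentage)
freq_table = {code: (f, f / n, 100 * f / n) for code, f in counts.items()}

for code, (f, rel, pct) in freq_table.items():
    bar = "#" * round(pct / 5)  # crude text bar: one '#' per 5 percentage points
    print(f"{code}  {f:>2}  {rel:.3f}  {pct:5.1f}%  {bar}")
```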

Graphing Quantitative Data: Histograms

  • Adjacent bars (no gaps); width = class interval.

  • Height = frequency (or density).

  • Difference from bar chart: the x-axis is a continuous numeric scale (quantitative data), so bars touch; a bar chart shows separate categories.

  • Examples: Mathematics scores, Library visits.

Graphing Quantitative Data: Scatter Plots

  • Each point = observation (x,y).

  • Used to explore relationships / correlation.

  • Example: Scuba‐diver depth vs water temperature (negative association).

  • Example: Number of birds vs time of day (possible peak times).

Choosing the Right Graph

  • Bar/Pie ➜ categorical.

  • Histogram ➜ single quantitative variable.

  • Scatter plot ➜ relationship between two quantitative variables.

Describing Quantitative Data Numerically

  • Central Tendency: Mean, Median, Mode.

  • Variation: Range, Inter-quartile Range (IQR), Variance, Standard Deviation, Coefficient of Variation.
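
As a rough illustration of the variation measures, the sketch below computes each one for the exercise-hours data used in the practice problem in these notes. It assumes sample (n − 1) formulas and the default quartile method of Python's `statistics.quantiles`. Other textbooks and software use slightly different quartile conventions, so the IQR may differ from a hand calculation.

```python
import statistics

# Exercise-hours data from the practice problem in these notes.
hours = [1, 2, 2, 3, 4, 5, 6, 6, 8, 10]

data_range = max(hours) - min(hours)              # largest minus smallest value
q1, q2, q3 = statistics.quantiles(hours, n=4)     # quartiles (default "exclusive" method)
iqr = q3 - q1                                     # spread of the middle 50%
variance = statistics.variance(hours)             # sample variance (divides by n - 1)
std_dev = statistics.stdev(hours)                 # sample standard deviation
cv = std_dev / statistics.mean(hours)             # coefficient of variation (SD / mean)

print(f"Range    = {data_range}")
print(f"IQR      = {iqr}")
print(f"Variance = {variance:.3f}")
print(f"Std dev  = {std_dev:.3f}")
print(f"CV       = {100 * cv:.1f}%")
```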

Why Numeric Description?

  • Graphs give quick visual impact but lack precision.

  • Exact numbers communicate “how old, how rich, how tall”.

Central Tendency Concept

  • Represents “centre” or typical value.

  • Simplifies understanding of general trend.

Appropriate Measure Depends on Data

  • Mean is common starting point.

  • Median/Mode useful under skewness, categorical data, etc.

  • Interpretation caveat: saying “men spend more time on internet” is a claim about a typical value (higher mean/median), not about every individual.

Definitions

  • Mean (Arithmetic average).

  • Median (midpoint of ranked values).

  • Mode (most frequent value).

Formulae for Arithmetic Mean

  • Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

  • Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Mean Sensitivity to Outliers

  • The mean uses every value ➜ a single extreme value can easily distort it.

  • Illustrative numeric examples (outlier increases mean from 3 to 4).
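
A minimal sketch of this effect follows. The values are hypothetical, chosen only so that the mean moves from 3 to 4; they are not the numbers from the original illustration.

```python
from statistics import mean

# Hypothetical values chosen so that the mean moves from 3 to 4.
values = [1, 2, 3, 4, 5]
values_with_outlier = values + [9]   # one extra, unusually large observation

mean_before = mean(values)                 # (1 + 2 + 3 + 4 + 5) / 5
mean_after = mean(values_with_outlier)     # (15 + 9) / 6

print(mean_before, mean_after)
```

Only one of the six values is above 5, yet the mean now sits at 4, higher than four of the six observations.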

Salary Example (Skewed)

  • Most workers earn 12k–18k, but two extremely high salaries pull the mean upward.

  • The median is the better summary here.

Median Characteristics

  • Middle value in ordered list (50 % above, 50 % below).

  • Resistant to extremes; preferred with skewness.

Finding Median (Procedure)

  1. Rank data.

  2. Median position $= \frac{n+1}{2}$.

  3. If n even, average two central values.

  • Example: 8 car speeds.

Even vs Odd n

  • 8 observations: median position 4.5 ➜ average of 4th & 5th values.

  • Practice data: 9,13,9,11,9,13,11,9,10,8,11 – find mean and median.
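
One way to check the practice answer is to code the position rule directly. This is a sketch: `median_by_position` is an illustrative name, and it simply follows the three-step procedure (rank, find position (n + 1)/2, average the two central values when n is even).

```python
def median_by_position(data):
    """Median via the (n + 1) / 2 position rule applied to ranked data."""
    if not data:
        raise ValueError("median of empty data is undefined")
    ranked = sorted(data)                  # step 1: rank the data
    n = len(ranked)
    position = (n + 1) / 2                 # step 2: 1-based median position
    if n % 2 == 1:
        return ranked[int(position) - 1]   # odd n: the single middle value
    lower = ranked[int(position) - 1]      # even n: value just below the position
    upper = ranked[int(position)]          # and the value just above it
    return (lower + upper) / 2             # step 3: average the two


practice = [9, 13, 9, 11, 9, 13, 11, 9, 10, 8, 11]
practice_mean = sum(practice) / len(practice)
practice_median = median_by_position(practice)

print(f"mean   = {practice_mean:.2f}")
print(f"median = {practice_median}")
```

With 11 observations the position is (11 + 1)/2 = 6, so the median is the 6th ranked value.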

Mode

  • Highest frequency value.

  • Can be none, one, or multiple.

  • Not altered by outliers; works for qualitative variables.

Mode Example

  • Data: 9,13,9,11,9,13,11,9,10,8,11 ⇒ Mode = 9 (appears 4 times).

  • Graphical illustration of modal bar.
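
The mode can be found by counting, as in this sketch. The helper `modes` is an illustrative name; it returns a list so that the "none, one, or multiple" cases are all covered, and it assumes the convention that data with no repeated value has no mode.

```python
from collections import Counter


def modes(data):
    """Return every value with the highest frequency (empty list if no mode)."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []                          # no value repeats: no mode
    return sorted(value for value, count in counts.items() if count == top)


data = [9, 13, 9, 11, 9, 13, 11, 9, 10, 8, 11]
print(modes(data))                         # one mode
print(modes([1, 1, 2, 2, 3]))              # two modes (bimodal)
print(modes([1, 2, 3]))                    # no mode
print(modes(["red", "blue", "red"]))       # works for qualitative data too
```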

Caveat on Using Mode

  • If the mode lies far from the rest of the data (e.g. a repeated outlier), it misrepresents central tendency.

Mean vs Median Comparison

  • Data Sets:

  1. 1–10 (no outliers): mean = median = 5.5

  2. 1–9 plus 1000 (positive outlier): mean (104.5) ≫ median (5.5); median preferred.

  3. 1–6 plus 70,80,90,100 (clustered high): median (5.5) in middle still OK; mean (36.1) inflated.

  • Conclusion: choose measure that better reflects majority.
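
The three data sets are small enough to verify directly. The sketch below recomputes the mean and median of each with Python's `statistics` module; the dictionary labels are just descriptive names.

```python
from statistics import mean, median

# The three data sets described above.
datasets = {
    "1-10": list(range(1, 11)),
    "1-9 plus 1000": list(range(1, 10)) + [1000],
    "1-6 plus 70, 80, 90, 100": list(range(1, 7)) + [70, 80, 90, 100],
}

# label -> (mean, median)
summary = {label: (mean(values), median(values)) for label, values in datasets.items()}

for label, (m, med) in summary.items():
    print(f"{label:<26} mean = {m:6.1f}   median = {med:4.1f}")
```

The median is the same in all three sets because it depends only on the two central ranked values, while the mean moves with the extreme values.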

Level of Measurement & Best Location Measure

  • Ratio/Interval ➜ Mean valid.

  • Ordinal ➜ Median (rank-based).

  • Nominal ➜ Mode only.

Shape of Distribution Guides Choice

  • Normal (symmetric): mean OK.

  • Skewed: median safer.

  • Open-ended class limits: use median.

Pros & Cons Summary

  • Mean:

    • Uses all data.
      – Sensitive to extremes.

  • Median:

    • Not distorted by extremes.
      – Slow for very large n.

  • Mode:

    • Only option for nominal; unaffected by extremes.
      – May be multiple / none; poor for skewed quantitative sets.

Quick Quiz

  1. Identify modal class.

  2. State disadvantage of modal class as average (may not be unique/representative).

  3. State disadvantage of mean (sensitivity to outliers).

Building a Histogram (Telephone Bills Example)

  • Steps:

  1. Collect 200 bills.

  2. Build frequency/relative-frequency table.

  3. Draw histogram.

Interpretation Example

  • Approx. $\frac{1}{2}$ (108/200 = 54%) of bills are below 30 (small).

  • Only 16% (32/200) fall in the middle range [30, 75].

  • Nearly $\frac{1}{3}$ (60/200 = 30%) exceed 75 (large).

Bell-Shaped (Normal) Histogram

  • Symmetric, unimodal curve resembling bell.

  • Many inferential methods assume population bell-shaped.

Skewness

  • Positively skewed: long right tail.

  • Negatively skewed: long left tail.

Shape & Central Tendency Relationship

  • Symmetric: mean = median = mode.

  • Positive skew: mean > median > mode.

  • Negative skew: mean < median < mode.
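
These orderings suggest a quick check: compare the mean with the median. The sketch below does only that. It is a rule of thumb rather than a formal skewness measure, it can mislead for some data sets, and `skew_hint` is an illustrative name. The second data set is hypothetical.

```python
from statistics import mean, median


def skew_hint(data):
    """Rule-of-thumb skew direction from comparing mean and median."""
    m, med = mean(data), median(data)
    if m > med:
        return "positive (right) skew suggested"
    if m < med:
        return "negative (left) skew suggested"
    return "no skew suggested"


right_tail = list(range(1, 10)) + [1000]        # 1-9 plus 1000, from these notes
left_tail = [1] + list(range(92, 101))          # hypothetical: one very low value
balanced = list(range(1, 11))                   # 1-10, from these notes

print(skew_hint(right_tail))
print(skew_hint(left_tail))
print(skew_hint(balanced))
```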

Skewness Example (Income)

  • Few high-income earners distort mean upward; distribution positively skewed.

  • Understanding skew helps target interventions (tax, welfare, etc.).

Skewness Example (Cricket Scores)

  • Most team players score > 50, few < 10 ⇒ negatively skewed.

  • Insightful for performance analysis.

Zero Skew (Symmetric)

  • Mean = Median = Mode condition.

  • Represents balanced data around centre.

Comparing Histograms

  • Task: decide which of two presented histograms is “more skewed”.

  • Visual: longer tail implies greater skewness.

Quick Shape Identification

  • Question: choose shape description (nearly symmetric, left skew, right skew, bimodal).

Practice Problem (Ages)

  • Data: 22,25,25,30,30,30,32,35,40
    a) Mode = 30.
    b) Median position $= \frac{9+1}{2} = 5$.
    c) Median = value in 5th place (30).

Practice Problem (Exercise Hours)

  • Hours: 1,2,2,3,4,5,6,6,8,10

  • Mean $= \frac{47}{10} = 4.7$ h.

  • Median: average of 5th & 6th ranked (4 & 5) ⇒ $\frac{4+5}{2} = 4.5$ h.
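
Both practice problems can be checked with Python's built-in `statistics` module, as in this short sketch. The variable names are illustrative.

```python
import statistics

ages = [22, 25, 25, 30, 30, 30, 32, 35, 40]
hours = [1, 2, 2, 3, 4, 5, 6, 6, 8, 10]

ages_mode = statistics.mode(ages)        # most frequent age
ages_median = statistics.median(ages)    # 5th of 9 ranked values
hours_mean = statistics.mean(hours)      # sum of hours divided by 10
hours_median = statistics.median(hours)  # average of 5th and 6th ranked values

print(f"Ages:  mode = {ages_mode}, median = {ages_median}")
print(f"Hours: mean = {hours_mean}, median = {hours_median}")
```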