WEEK 2 NOTES – Graphing & Describing Data
SSF1093 – Statistics for Social Sciences
WEEK 2: Graphing Categorical & Quantitative Data
Week-2 Learning Objectives
Understand different data types and which graphs suit each.
Learn to create visual representations of both categorical and quantitative variables.
Recap: Types of Data
Categorical (Qualitative) – values sorted into non-numeric groups.
– Examples: Gender, favourite colour, type of vehicle.
Quantitative (Numeric) – measurable numeric values.
– Examples: Height, weight, temperature.
Graphing Categorical Data: Bar Charts
Rectangular bars, equal width, gaps between categories.
Both frequency and percentage can be displayed.
Key parts: Title, axes labels, uniform scale.
Illustrations:
– Vegetables bought (kg).
– Favourite sports (students).
Graphing Categorical Data: Pie Charts
Circle divided into sectors proportional to category share.
Sector angle $= \text{proportion} \times 360^{\circ}$.
Examples shown: Favourite sports angles (Football $108^{\circ}$, Basketball $54^{\circ}$, etc.).
Practice: Creating Bar / Pie Charts
Dataset 1 (Ice-cream sales): Chocolate 30, Vanilla 25, Strawberry 20, Mint 15 ➜ Bar chart.
Dataset 2 (Favourite pets): Dogs 50%, Cats 30%, Birds 10%, Hamsters 10% ➜ Pie chart.
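A minimal sketch of the arithmetic behind both practice charts, assuming plain Python with no plotting library. The helper names `percentages` and `sector_angles` are illustrative and not part of the course material.

```python
# Practice datasets taken from the notes.
ice_cream_sales = {"Chocolate": 30, "Vanilla": 25, "Strawberry": 20, "Mint": 15}
favourite_pets_pct = {"Dogs": 50, "Cats": 30, "Birds": 10, "Hamsters": 10}


def percentages(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


def sector_angles(percent_by_category):
    """Sector angle = proportion x 360 degrees, where proportion = percent / 100."""
    return {category: pct * 360 / 100 for category, pct in percent_by_category.items()}


# Bar heights for Dataset 1 can be the counts themselves or these percentages.
ice_cream_pct = percentages(ice_cream_sales)

# Pie chart angles for Dataset 2.
pet_angles = sector_angles(favourite_pets_pct)

for category, angle in pet_angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The sector angles should always sum to 360°, which is a quick check when drawing a pie chart by hand.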
Frequency Table Example (Quantitative)
Statistics-test scores of 20 students.
Class intervals: 61–70, 71–80, 81–90, 91–100.
Tasks:
(i) Frequency distribution,
(ii) Relative frequency $= \frac{\text{freq}}{20}$,
(iii) Percentages.
Question: What % scored $\ge 81$? (Add the two highest classes.)
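The three tasks can be sketched in Python as follows. The notes do not list the 20 scores, so the `scores` values below are HYPOTHETICAL and only illustrate the method. The function name `frequency_table` is also an assumption.

```python
# HYPOTHETICAL scores: made up for illustration, not the data used in class.
scores = [62, 65, 68, 70, 71, 73, 75, 76, 78, 80,
          81, 83, 85, 86, 88, 90, 92, 95, 97, 100]

# Class intervals from the notes (both limits inclusive).
classes = [(61, 70), (71, 80), (81, 90), (91, 100)]


def frequency_table(values, class_limits):
    """Return one row per class with frequency, relative frequency and percentage."""
    n = len(values)
    table = []
    for lower, upper in class_limits:
        freq = sum(lower <= v <= upper for v in values)
        table.append({
            "class": f"{lower}-{upper}",
            "freq": freq,
            "rel_freq": freq / n,
            "percent": 100 * freq / n,
        })
    return table


table = frequency_table(scores, classes)
for row in table:
    print(f'{row["class"]:>7}  f={row["freq"]:2d}  '
          f'rel={row["rel_freq"]:.2f}  {row["percent"]:.0f}%')

# Share scoring 81 or more = sum of the two highest classes.
pct_81_plus = sum(row["percent"] for row in table[-2:])
print(f"Scored 81 or more: {pct_81_plus:.0f}%")
```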
Frequency Table for Categorical Opinions
30 responses on CEO salaries: Y, N, O codes.
Tasks:
a) Frequency table.
b) Relative frequencies & percentages: relative frequency $= \frac{f}{30}$, percentage $= \frac{f}{30}\times 100\%$.
c) Bar graph of relative frequencies.
d) Pie chart of percentages.
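A possible sketch of tasks (a) to (c) using `collections.Counter`. The 30 individual responses are not reproduced in the notes, so the `responses` list is HYPOTHETICAL. The text bar graph stands in for a drawn chart.

```python
from collections import Counter

# HYPOTHETICAL responses: 30 codes invented for illustration only.
responses = ["Y"] * 12 + ["N"] * 15 + ["O"] * 3

counts = Counter(responses)   # (a) frequency table
n = len(responses)

rel_freq = {code: counts[code] / n for code in ("Y", "N", "O")}        # (b)
percent = {code: 100 * counts[code] / n for code in ("Y", "N", "O")}   # (b)

# (c) crude text bar graph: one '#' per response in that category.
for code in ("Y", "N", "O"):
    print(f"{code} | {'#' * counts[code]} {rel_freq[code]:.2f} ({percent[code]:.0f}%)")
```

For task (d), each percentage converts to a pie sector by multiplying the relative frequency by 360°.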
Graphing Quantitative Data: Histograms
Adjacent bars (no gaps); width = class interval.
Height = frequency (or density).
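Difference from bar chart: a histogram shows a quantitative variable on a continuous x-axis, so the bars touch; a bar chart shows separate categories, so the bars have gaps.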
Examples: Mathematics scores, Library visits.
Graphing Quantitative Data: Scatter Plots
Each point = observation (x,y).
Used to explore relationships / correlation.
Example: Scuba‐diver depth vs water temperature (negative association).
Example: Number of birds vs time of day (possible peak times).
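To put a number on the direction of an association, one common choice is Pearson's correlation coefficient $r$. The notes do not define it, so treating "correlation" as Pearson's $r$ is an assumption here. The depth and temperature readings below are HYPOTHETICAL values chosen to show a negative association.

```python
from math import sqrt

# HYPOTHETICAL readings: depth in metres, water temperature in degrees Celsius.
depth_m = [5, 10, 15, 20, 25, 30]
temp_c = [24.0, 22.5, 21.0, 19.0, 17.5, 16.0]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(depth_m, temp_c)
print(f"r = {r:.3f}")  # a value below 0 means temperature falls as depth rises
```

A single coefficient like this only summarises straight-line association. A pattern with peaks, such as bird counts over the day, still needs the scatter plot itself.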
Choosing the Right Graph
Bar/Pie ➜ categorical.
Histogram ➜ single quantitative variable.
Scatter plot ➜ relationship between two quantitative variables.
Describing Quantitative Data Numerically
Central Tendency: Mean, Median, Mode.
Variation: Range, Inter-quartile Range (IQR), Variance, Standard Deviation, Coefficient of Variation.
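As a rough illustration, the variation measures can be computed with Python's standard `statistics` module. The data are the exercise-hours values from the practice problem in these notes. Two assumptions: the sample formulas (divisor $n-1$) are used, and quartiles follow the module's default method, which may differ slightly from the convention taught in class.

```python
import statistics

# Exercise-hours data from the practice problem in these notes.
hours = [1, 2, 2, 3, 4, 5, 6, 6, 8, 10]

data_range = max(hours) - min(hours)

# Quartiles: textbooks use several conventions; this is the module's default.
q1, q2, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1

variance = statistics.variance(hours)   # sample variance (divisor n - 1)
std_dev = statistics.stdev(hours)       # square root of the sample variance
cv_percent = 100 * std_dev / statistics.mean(hours)  # SD relative to the mean

print(f"Range = {data_range}")
print(f"IQR = {iqr}")
print(f"Variance = {variance:.2f}")
print(f"Standard deviation = {std_dev:.2f}")
print(f"Coefficient of variation = {cv_percent:.1f}%")
```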
Why Numeric Description?
Graphs give quick visual impact but lack precision.
Exact numbers communicate “how old, how rich, how tall”.
Central Tendency Concept
Represents “centre” or typical value.
Simplifies understanding of general trend.
Appropriate Measure Depends on Data
Mean is common starting point.
Median/Mode useful under skewness, categorical data, etc.
Interpretation caveat: saying “men spend more time on the internet” is a claim about a higher mean or median, not about every individual.
Definitions
Mean (Arithmetic average).
Median (midpoint of ranked values).
Mode (most frequent value).
Formulae for Arithmetic Mean
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Mean Sensitivity to Outliers
Including every value ➜ mean easily distorted.
Illustrative numeric examples (outlier increases mean from 3 to 4).
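A small sketch of this effect. The two lists are HYPOTHETICAL, chosen only so that the means match the figures quoted above (3 and 4). The slide's own numbers are not reproduced in the notes.

```python
from statistics import mean, median

# HYPOTHETICAL values picked to reproduce the quoted means of 3 and 4.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # largest value replaced by an extreme one

mean_before = mean(without_outlier)
mean_after = mean(with_outlier)

print(f"Mean without outlier: {mean_before}")
print(f"Mean with outlier:    {mean_after}")

# The median looks only at the middle position, so it does not move.
print(f"Median in both cases: {median(without_outlier)} and {median(with_outlier)}")
```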
Salary Example (Skewed)
Most workers earn 12k–18k, but two extremely high salaries pull the mean upward.
Use the median instead.
Median Characteristics
Middle value in ordered list (50 % above, 50 % below).
Resistant to extremes; preferred with skewness.
Finding Median (Procedure)
Rank data.
Median position $= \frac{n+1}{2}$.
If n is even, average the two central values.
Example: 8 car speeds.
Even vs Odd n
8 observations: median position 4.5 ➜ average of 4th & 5th values.
Practice data: 9,13,9,11,9,13,11,9,10,8,11 – find mean and median.
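The ranking procedure can be sketched as a short function and applied to the practice data. The function name `median_by_position` is illustrative. The 8-value list in the last line is a HYPOTHETICAL stand-in for the even case, since the car speeds are not listed in the notes.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (values must be non-empty)."""
    ranked = sorted(values)               # step 1: rank the data
    n = len(ranked)
    position = (n + 1) / 2                # step 2: 1-based median position
    if position.is_integer():             # odd n: a single middle value
        return ranked[int(position) - 1]
    # Even n: position ends in .5, so average the values on either side.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2


practice = [9, 13, 9, 11, 9, 13, 11, 9, 10, 8, 11]
practice_mean = sum(practice) / len(practice)
practice_median = median_by_position(practice)

print(f"n = {len(practice)}, mean = {practice_mean:.2f}, median = {practice_median}")

# HYPOTHETICAL even-n case: position 4.5, so average the 4th and 5th values.
print(median_by_position([1, 2, 3, 4, 5, 6, 7, 8]))
```

With 11 observations the median position is 6, so the median is the 6th ranked value.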
Mode
Highest frequency value.
Can be none, one, or multiple.
Not altered by outliers; works for qualitative variables.
Mode Example
Data: 9,13,9,11,9,13,11,9,10,8,11 ⇒ Mode = 9 (appears 4 times).
Graphical illustration of modal bar.
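A minimal sketch for finding the mode by counting, covering the "none, one, or multiple" cases. The helper `find_modes` is illustrative. Treating a dataset in which no value repeats as having no mode is a convention assumed here, and conventions differ between textbooks.

```python
from collections import Counter


def find_modes(values):
    """Return every value with the highest frequency ([] if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())   # assumes values is non-empty
    if highest == 1:
        return []                    # no value repeats: treated as "no mode"
    return [value for value, count in counts.items() if count == highest]


data = [9, 13, 9, 11, 9, 13, 11, 9, 10, 8, 11]
counts = Counter(data)

print(find_modes(data), "appears", counts[9], "times")
print(find_modes([1, 1, 2, 2]))   # two modes (bimodal)
print(find_modes([1, 2, 3]))      # no mode under the convention above
```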
Caveat on Using Mode
If the mode lies far from the rest of the data (the most frequent value is itself extreme), it misrepresents central tendency.
Mean vs Median Comparison
Data Sets:
1–10 (no outliers): mean = median = 5.5
1–9 plus 1000 (positive outlier): mean $\gg$ median, median preferred.
1–6 plus 70,80,90,100 (clustered high): median in middle still OK; mean inflated.
Conclusion: choose measure that better reflects majority.
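The three data sets can be checked directly. This is a sketch using Python's `statistics` module. The dictionary labels are just descriptions of the sets listed above.

```python
from statistics import mean, median

datasets = {
    "1-10 (no outliers)": list(range(1, 11)),
    "1-9 plus 1000": list(range(1, 10)) + [1000],
    "1-6 plus 70, 80, 90, 100": list(range(1, 7)) + [70, 80, 90, 100],
}

# Map each label to its (mean, median) pair.
summary = {label: (mean(values), median(values)) for label, values in datasets.items()}

for label, (m, med) in summary.items():
    print(f"{label}: mean = {m}, median = {med}")
```

The median is 5.5 in all three sets, while the mean moves from 5.5 to 104.5 and 36.1, which shows how strongly the mean reacts to the large values.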
Level of Measurement & Best Location Measure
Ratio/Interval ➜ Mean valid.
Ordinal ➜ Median (rank-based).
Nominal ➜ Mode only.
Shape of Distribution Guides Choice
Normal (symmetric): mean OK.
Skewed: median safer.
Open-ended class limits: use median.
Pros & Cons Summary
Mean:
Uses all data.
– Sensitive to extremes.
Median:
Not distorted by extremes.
– Requires ranking the data, which is slow for very large n.
Mode:
Only option for nominal; unaffected by extremes.
– May be multiple / none; poor for skewed quantitative sets.
Quick Quiz
Identify modal class.
State disadvantage of modal class as average (may not be unique/representative).
State disadvantage of mean (sensitivity to outliers).
Building a Histogram (Telephone Bills Example)
Steps:
Collect 200 bills.
Build frequency/relative-frequency table.
Draw histogram.
Interpretation Example
Approx $\frac{1}{2}$ (108/200) of bills < 30 (small).
Only 16 % (32/200) fall in the middle range [30–75].
Nearly $\frac{1}{3}$ (60/200 = 30 %) > 75 (large).
Bell-Shaped (Normal) Histogram
Symmetric, unimodal curve resembling bell.
Many inferential methods assume population bell-shaped.
Skewness
Positively skewed: long right tail.
Negatively skewed: long left tail.
Shape & Central Tendency Relationship
Symmetric: $\text{mean} = \text{median} = \text{mode}$.
Positive skew: $\text{mean} > \text{median} > \text{mode}$.
Negative skew: $\text{mean} < \text{median} < \text{mode}$.
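These orderings suggest a rough rule of thumb: compare the mean with the median to guess the direction of skew. The sketch below assumes that rule. It can mislead for some distributions, so the histogram should still be inspected. The function name `skew_direction` and the third dataset are HYPOTHETICAL.

```python
from statistics import mean, median


def skew_direction(values, tolerance=1e-9):
    """Guess skew direction from the sign of (mean - median). Rule of thumb only."""
    m, med = mean(values), median(values)
    if abs(m - med) <= tolerance:
        return "roughly symmetric"
    return "positive (right) skew" if m > med else "negative (left) skew"


print(skew_direction(list(range(1, 11))))            # 1-10: mean equals median
print(skew_direction(list(range(1, 10)) + [1000]))   # one very large value
print(skew_direction([1] + list(range(91, 100))))    # HYPOTHETICAL: one very small value
```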
Skewness Example (Income)
Few high-income earners distort mean upward; distribution positively skewed.
Understanding skew helps target interventions (tax, welfare, etc.).
Skewness Example (Cricket Scores)
Most team players score > 50, few < 10 ⇒ negatively skewed.
Insightful for performance analysis.
Zero Skew (Symmetric)
Mean = Median = Mode condition.
Represents balanced data around centre.
Comparing Histograms
Task: decide which of two presented histograms is “more skewed”.
Visual: longer tail implies greater skewness.
Quick Shape Identification
Question: choose shape description (nearly symmetric, left skew, right skew, bimodal).
Practice Problem (Ages)
Data: 22,25,25,30,30,30,32,35,40
a) Mode = 30.
b) Median position $= \frac{9+1}{2} = 5$.
c) Median = value in 5th place (30).
Practice Problem (Exercise Hours)
Hours: 1,2,2,3,4,5,6,6,8,10
Mean $= \frac{47}{10} = 4.7$ h.
Median: average of 5th & 6th ranked (4 & 5) ⇒ $\frac{4+5}{2} = 4.5$ h.
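Both practice problems (ages and exercise hours) can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original exercise.

```python
from statistics import mean, median, mode

ages = [22, 25, 25, 30, 30, 30, 32, 35, 40]
hours = [1, 2, 2, 3, 4, 5, 6, 6, 8, 10]

ages_mode = mode(ages)        # most frequent age
ages_median = median(ages)    # 5th of 9 ranked values
hours_mean = mean(hours)      # 47 / 10
hours_median = median(hours)  # average of 5th and 6th ranked values

print(f"Ages: mode = {ages_mode}, median = {ages_median}")
print(f"Hours: mean = {hours_mean}, median = {hours_median}")
```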