MA213 – Chapter 2: Methods for Describing Sets of Data

Where We’ve Been

Reviewed foundational themes of statistics
- Inferential Statistics: draw conclusions about a population based on sample data
- Descriptive Statistics: organize, summarize, and present data
Key elements of any statistical problem
- Population & Sample definitions
- Variable type identification
- Data-collection methodology
Two broad data types
- Quantitative (numerical)
- Qualitative (categorical)

Where We’re Going (Chapter Road-Map)

Describe data visually
- Graphs for quantitative variables
- Graphs for qualitative variables
Describe data numerically
- Central tendency: Mean, Median, Mode
- Variability: Variance, Standard Deviation
- Relative standing, outliers, association
Depict relationships between two quantitative variables
- (e.g., scatterplots—later section)

2.1 Describing Qualitative (Categorical) Data

Definition: qualitative values represent distinct classes—no intrinsic numeric meaning
- Examples: eye color, gender, political party, geographic location
Goal: organize raw category listings into useful summaries to support description & inference
Two numerical summaries
- Class Frequency: count of observations in each class
- Class Relative Frequency: $\dfrac{\text{class frequency}}{\text{total}}$ ; multiply by $100$ for Class Percentage

Class Frequency (Illustrative Example)

Transportation to school study
- Classes: car, bus, bike, walk
- Frequencies (sample): $12,7,6,11$ respectively

Adult Aphasia Study (Example 1)

Raw listing of $22$ subjects → classes:
- Anomic: $10$ patients
- Broca’s: $5$ patients
- Conduction: $7$ patients
Relative measures
- Anomic: $10/22 = 0.455 \; (45.5\%)$
- Broca’s: $5/22 = 0.227 \; (22.7\%)$
- Conduction: $7/22 = 0.318 \; (31.8\%)$
- Totals check: $22/22 = 1.00 \; (100\%)$
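
The tallying in Example 1 can be sketched in Python as below. This is a minimal illustration, not part of the course materials: `frequency_table` is a hypothetical helper name, and the raw listing is rebuilt from the class counts above because the notes do not give the order of the individual subjects.

```python
from collections import Counter

def frequency_table(observations):
    """Return {class: (frequency, relative frequency, percentage)}."""
    counts = Counter(observations)
    n = sum(counts.values())
    return {
        label: (freq, freq / n, 100 * freq / n)
        for label, freq in counts.items()
    }

# Raw listing rebuilt from the class counts in Example 1 (subject order is assumed).
aphasia = ["Anomic"] * 10 + ["Broca's"] * 5 + ["Conduction"] * 7

table = frequency_table(aphasia)
for label, (freq, rel, pct) in table.items():
    print(f"{label:<11} {freq:>2}  {rel:.3f}  {pct:.1f}%")
```

The relative frequencies always sum to 1, which gives a quick check on the tally.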

Blood-Type Survey (Example 2; $n=24$ )

Tallied frequencies
- $A = 5$ , $B = 7$ , $AB = 4$ , $O = 8$
Relative frequencies / percentages
- $A: 5/24 \approx 0.208 \; (20.8\%)$
- $B: 7/24 \approx 0.292 \; (29.2\%)$
- $AB: 4/24 \approx 0.167 \; (16.7\%)$
- $O: 8/24 \approx 0.333 \; (33.3\%)$

Graphical Displays for Qualitative Data

Pie Chart
- Slice angle: $\dfrac{\text{class freq}}{n}\times360^\circ$
- Slice %: $\dfrac{\text{class freq}}{n}\times100$
- Example (snack preferences): $\text{Potato chips}$ slice $= 37.3\%$ ⇒ $134^\circ$ , etc.
Bar Graph
- Height = frequency, relative frequency, or % per class
- Bars separated (qualitative axis has no inherent order)
- Coffee reasons example
- Taste $27.6\%$ , Awake $30.3\%$ , Coffeehouse $2.6\%$ , Other $3.9\%$ , Never drink $35.5\%$
Pareto Diagram
- Bars ordered left→right by descending height; superimposed line shows cumulative counts/percentages
- Highlights the principle that a few categories often account for the majority of observations
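
The ordering step behind a Pareto diagram can be sketched as follows, using the coffee percentages from the bar graph example. This is an illustrative sketch only; `pareto_order` is a hypothetical helper name, and the plotting itself is left out.

```python
def pareto_order(percentages):
    """Sort classes by descending height and attach cumulative percentages."""
    ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for label, pct in ordered:
        running += pct
        rows.append((label, pct, round(running, 1)))
    return rows

# Percentages from the coffee reasons example.
coffee = {
    "Taste": 27.6,
    "Awake": 30.3,
    "Coffeehouse": 2.6,
    "Other": 3.9,
    "Never drink": 35.5,
}

pareto = pareto_order(coffee)
for label, pct, cumulative in pareto:
    print(f"{label:<12} {pct:>5.1f}%  cumulative {cumulative:>5.1f}%")
```

The first three classes (Never drink, Awake, Taste) already account for 93.4% of responses. The cumulative total ends at 99.9% rather than 100% because the listed percentages are rounded.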

Practice: Pie Chart Construction (Blood Types)

Percentages as computed in Example 2; each slice angle is the relative frequency times $360^\circ$
Resulting distribution
- $O = 33.3\%$ , $B = 29.2\%$ , $A = 20.8\%$ , $AB = 16.7\%$
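
As a worked check of the slice computations for the blood-type data, here is a short Python sketch. The helper name `pie_slices` is an assumption made for illustration; the frequencies are those tallied in Example 2.

```python
def pie_slices(frequencies):
    """Map each class to (percentage, slice angle in degrees)."""
    n = sum(frequencies.values())
    return {
        label: (100 * freq / n, 360 * freq / n)
        for label, freq in frequencies.items()
    }

blood_types = {"A": 5, "B": 7, "AB": 4, "O": 8}  # frequencies from Example 2

slices = pie_slices(blood_types)
for label, (pct, angle) in slices.items():
    print(f"{label:<2} {pct:5.1f}%  {angle:5.1f} degrees")
```

The angles come out to 75, 105, 60, and 120 degrees for A, B, AB, and O, which sum to 360 degrees as required.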

2.2 Graphical Methods for Quantitative Data

Quantitative values are measured on a true numeric scale (age, height, body temperature …)
Primary tools, ordered from small to large data sets
- Dot plots
- Stem-and-leaf plots
- Histograms

Dot Plots

Each numeric observation plotted as a dot above a horizontal axis
Construction steps
1. Determine min/max → choose axis scale
2. Draw baseline
3. Plot a separate stacked dot for each occurrence (duplicates vertically stack)
Pros: great for small datasets; preserves every value; quickly shows clusters & outliers
Cons: becomes cluttered as $n$ grows
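
The construction steps can be mimicked with a text-only dot plot. The sketch below is an illustration under stated assumptions: it handles non-negative integer data only, `text_dot_plot` is a hypothetical helper name, and the sample values are made up for demonstration rather than taken from the course.

```python
from collections import Counter

def text_dot_plot(values):
    """Return the lines of a text dot plot for non-negative integer data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)          # step 1: min/max set the axis scale
    axis = list(range(lo, hi + 1))
    width = len(str(hi)) + 1                   # column width for each axis tick
    lines = []
    for level in range(max(counts.values()), 0, -1):   # step 3: stack duplicates
        row = "".join(("*" if counts[x] >= level else " ").rjust(width) for x in axis)
        lines.append(row.rstrip())
    lines.append("".join(str(x).rjust(width) for x in axis))  # step 2: baseline
    return lines

# Made-up illustration values, not data from the course.
sample = [2, 3, 3, 4, 4, 4, 5, 9]
print("\n".join(text_dot_plot(sample)))
```

In the printed plot the stack above 4 marks the cluster, and the lone dot at 9 stands apart as a possible outlier.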

Stem-and-Leaf Plots

Split each number into stem (leading digit(s)) & leaf (trailing digit)
- $34 \to 3 | 4$ ; $356 \to 35 | 6$
Example 6: $20$ -day cardiogram counts
- Raw data: $\{25,31,20,32,13,14,43,02,57,23,36,32,33,32,44,32,52,44,51,45\}$
- Organized display (stem | leaves):
- $0 | 2$
- $1 | 3 \; 4$
- $2 | 0 \; 3 \; 5$
- $3 | 1 \; 2 \; 2 \; 2 \; 2 \; 3 \; 6$
- $4 | 3 \; 4 \; 4 \; 5$
- $5 | 1 \; 2 \; 7$
Strength: retains raw values and shows order; useful for moderate (<100ish) sample sizes
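
The display in Example 6 can be reproduced with a few lines of Python. This is a minimal sketch, assuming non-negative integers with a one-digit leaf; `stem_and_leaf` is a hypothetical helper name. The value written as $02$ in the raw data is entered as `2`, since Python does not allow leading zeros in integer literals.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but last digit) and leaf (last digit)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {stem: groups.get(stem, []) for stem in range(min(groups), max(groups) + 1)}

# Cardiogram counts from Example 6.
cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

display = stem_and_leaf(cardiograms)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Sorting before splitting is what puts the leaves in increasing order within each stem.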

Histograms

Group quantitative values into class intervals (bins) on horizontal axis
Vertical axis = frequency or relative frequency within each bin; bars touch (continuity)
Example 7: $20$ test scores $[45\dots 90]$
- Possible 5-class grouping (given solution)
- $45{-}53:3$ , $54{-}62:4$ , $63{-}71:4$ , $72{-}80:5$ , $81{-}90:4$
- Visual histogram depicts distribution shape across score intervals
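
A rough text rendering of the Example 7 histogram, built directly from the grouped frequencies given above, might look like this. It is only a sketch: the raw scores are not listed in these notes, so the code starts from the class counts.

```python
# Class limits and frequencies from the given 5-class solution to Example 7.
classes = {"45-53": 3, "54-62": 4, "63-71": 4, "72-80": 5, "81-90": 4}

n = sum(classes.values())
relative = {label: freq / n for label, freq in classes.items()}

for label, freq in classes.items():
    # One '#' per observation stands in for the bar height.
    print(f"{label}  {'#' * freq:<5}  {freq}  ({relative[label]:.2f})")
```

The frequencies sum to $20$, matching the number of test scores. Note that the last class as given ($81$ to $90$) is one unit wider than the other four.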

Choosing Number of Classes

General guideline: $5 \text{–} 20$ classes (depends on $n$ )
As $n$ increases, more classes (and hence narrower class widths) become useful
Classes must be
- Mutually exclusive
- Exhaustive (cover entire range)
- Continuous (no gaps even if frequency $=0$ )
- Equal width
Practical width calculation: $\text{Class width} = \dfrac{\text{max} - \text{min}}{\text{desired \# classes}}$, then round up to a convenient value
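
The width rule can be sketched in Python as below. The sketch assumes integer-valued data with inclusive class limits and rounds up to the next whole number; `class_width` and `covers_maximum` are hypothetical helper names, and the coverage check is an addition for illustration rather than a step stated in the notes.

```python
import math

def class_width(minimum, maximum, num_classes):
    """Range divided by the desired number of classes, rounded up to a whole number."""
    return math.ceil((maximum - minimum) / num_classes)

def covers_maximum(minimum, maximum, num_classes, width):
    """Check that integer classes starting at `minimum` reach `maximum`."""
    last_upper_limit = minimum + num_classes * width - 1
    return last_upper_limit >= maximum

width = class_width(100, 134, 7)   # record-high range used later in these notes
print(width, covers_maximum(100, 134, 7, width))
```

When the range divides evenly by the number of classes (for example, $0$ to $35$ with $7$ classes), the rounded width can leave the maximum just outside the last class, so the exhaustiveness check is worth running before tallying.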

Bin-Number Illustration (MPG Data)

Too few bins (2) ⇒ oversmoothing; too many (100) ⇒ noise; select middle ground (e.g., 10–20) for balance

Grouped Frequency Distribution Example (50-State Record Highs)

Data range $\approx 100 \text{–} 134$
Using $7$ classes
- Determine class width $= \dfrac{134-100}{7} \approx 4.9$ ⇒ round to $5$
- Create intervals (e.g., $100{-}104,105{-}109,\dots$ ) & tally frequencies
- Histogram plotted accordingly (details in slides)
- Analysis targets distribution’s center (~ $115{-}120$ °F), spread (~ $100{-}134$ °F), possible skewness
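
The interval-building and tallying steps can be sketched as follows. This is an illustration only: `class_intervals` and `tally` are hypothetical helper names, and because the 50 record-high temperatures are not listed in these notes, the sketch prints the class limits and leaves the tally to be run on the actual data.

```python
def class_intervals(minimum, width, num_classes):
    """Return inclusive (lower, upper) limits for equal-width integer classes."""
    return [
        (minimum + i * width, minimum + (i + 1) * width - 1)
        for i in range(num_classes)
    ]

def tally(values, intervals):
    """Count how many values fall in each class; every value must fit exactly one."""
    counts = {interval: 0 for interval in intervals}
    for value in values:
        matches = [iv for iv in intervals if iv[0] <= value <= iv[1]]
        if len(matches) != 1:
            raise ValueError(f"{value} does not fall in exactly one class")
        counts[matches[0]] += 1
    return counts

intervals = class_intervals(minimum=100, width=5, num_classes=7)
print(", ".join(f"{lo}-{hi}" for lo, hi in intervals))
```

The `ValueError` in `tally` enforces the mutually exclusive and exhaustive requirements: a value that lands in no class, or in more than one, signals badly chosen limits.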

Interpreting Histograms & Exploring Distributions

Identify shape
- Symmetric bell-shaped
- Uniform (flat)
- J-shaped / Reverse J
- Left-skewed or Right-skewed
- Bimodal, U-shaped, etc.
Center: the point with about half the data on each side (visual estimate); later formalized by mean/median
Spread: range, or overall variability, of the data (min to max)
Outliers: isolated bars/dots far from main cluster; flag for error check or substantive investigation

Common Shapes (visual palette)

Bell-shaped (Gaussian)
Uniform (rectangular)
J / Reverse-J
Left- or Right-skewed (tail direction)
Bimodal (two peaks) or U-shaped

Comparative Summary of Graph Types

Dot Plot
- Precise values shown, simple, small $n$
Stem-and-Leaf
- Values shown & ordered; still compact; moderate $n$
Histogram
- Conceals individual values but better for large $n$ ; reveals overall pattern more clearly

Conceptual & Practical Implications

Choosing the correct summary/graph depends on
- Data type (qualitative vs quantitative)
- Sample size
- Audience need: raw detail vs overall pattern
Ethical responsibility
- Avoid misleading through inappropriate class widths or selective category ordering
- Clearly label axes & include units
Real-world relevance
- Data visualization underpins decision-making in business, healthcare, public policy
- Recognizing skew/outliers prevents faulty “average” interpretations (e.g., median salary vs mean)

Key Formulas (LaTeX)

Relative frequency: $\text{RF}_i = \dfrac{f_i}{n}$
Class percentage: $\%_i = \text{RF}_i \times 100$
Pie-slice angle: $\theta_i = \text{RF}_i \times 360^{\circ}$
Histogram class width (rounded): $w = \left\lceil\dfrac{\text{max}-\text{min}}{k}\right\rceil$ where $k$ = desired # classes

Study Tips & Connections

Master categorical vs numerical distinction first; drives all later analytic choices
Re-draw given examples by hand to reinforce procedure memory (dot plot, stem-leaf, histogram)
Link shapes to potential real processes (e.g., right-skewed income, symmetric test errors)
Practice converting raw tallies to relative frequencies & percentages—vital for reports
Preview upcoming numeric measures: mean & standard deviation integrate with the visual tools learned here
