Variables Part Two: Organizing & Displaying Numeric Data

Categorical vs. Numeric Data

  • Categorical (qualitative)
    • Order of categories does not matter; they can be alphabetized or arranged arbitrarily.
  • Numeric (quantitative)
    • Natural order from smallest → largest.
    • Must be organized respecting that order.
    • Primary tools introduced:
      • Frequency & relative-frequency distributions.
      • Graphs: dot plots, stem-and-leaf plots, histograms.

Frequency & Relative-Frequency Distributions

  • Purpose: tabulate how often each value (or interval) occurs.
  • Two grouping styles:
    1. Single-value grouping: each class is one specific value.
    2. Cut-point grouping: each class is an interval.
       • Lower cut point (class lower limit) is included.
       • Upper cut point is excluded (it becomes the lower limit of the next class).
  • Relative frequency formula: $\text{rel. freq.} = \frac{f}{n}$, where $f$ is the class frequency and $n$ is the total number of observations.

Single-Value Example: Ages of 10 Statistics Students

  • Raw ages: 18, 18, 18, 19, 19, 20, 17, 18, 19, 20.
  • Youngest = 17; oldest = 20 ⇒ list all integers 17–20 even if absent.
  • Tally → Frequency table
    • 17 (1), 18 (4), 19 (3), 20 (2)  (sum = 10).
  • Relative frequencies: 0.10, 0.40, 0.30, 0.20.
  • Always include: title, variable label ("Age, yrs"), frequency & relative-frequency columns, and grand total.
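The single-value tally above can be sketched in Python. This is a minimal sketch that assumes the data are whole-number ages stored in a list. The function name `single_value_table` is hypothetical and does not come from the course. It lists every integer from the youngest age to the oldest, so a value with no observations still gets a row with frequency 0.

```python
from collections import Counter


def single_value_table(data):
    """Return (value, frequency, relative frequency) for every integer
    from min(data) to max(data), including values that never occur."""
    counts = Counter(data)
    n = len(data)
    return [(v, counts[v], counts[v] / n)   # Counter gives 0 for absent values
            for v in range(min(data), max(data) + 1)]


ages = [18, 18, 18, 19, 19, 20, 17, 18, 19, 20]   # raw ages from the notes
table = single_value_table(ages)

print("Ages of 10 Statistics Students")
print(f"{'Age, yrs':>8}  {'Freq.':>5}  {'Rel. freq.':>10}")
for value, f, rf in table:
    print(f"{value:>8}  {f:>5}  {rf:>10.2f}")
total_f = sum(f for _, f, _ in table)
total_rf = sum(rf for _, _, rf in table)
print(f"{'Total':>8}  {total_f:>5}  {total_rf:>10.2f}")
```

The grand-total row works as a quick check: the frequencies should add up to $n$, and the relative frequencies should add up to 1 (up to rounding).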

Cut-Point Grouping

Rules

  • Recommended number of classes $k$: between 5 and 20.
  • All classes same width ww (prefer whole number when possible).
  • Compute the width $w = \frac{\text{max} - \text{min}}{k}$, then round up.
  • Each observation belongs to exactly one class.
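A short Python sketch can check the width and cut-point rules. The helper names `class_width` and `cut_points` are hypothetical. The sketch assumes the starting cut point has already been chosen by hand, as a convenient whole number at or below the minimum. It rounds $w$ up with `math.ceil`, but first rounds the quotient to 9 decimals so that tiny floating-point error does not bump an exact whole number up by one.

```python
import math


def class_width(minimum, maximum, k):
    """w = (max - min) / k, rounded up to the next whole number."""
    raw = (maximum - minimum) / k
    # Round to 9 decimals first so float noise (e.g. 2.0000000000000013)
    # does not push an exact whole number up by one.
    return math.ceil(round(raw, 9))


def cut_points(start, width, maximum):
    """Cut points start, start + w, ... for classes [lower, upper).

    The lower cut point is included and the upper one excluded, so cut
    points are added until the last one is strictly greater than maximum.
    """
    points = [start]
    while points[-1] <= maximum:
        points.append(points[-1] + width)
    return points


examples = [
    # (variable, min, max, k, chosen start) taken from the notes
    ("Egg weight, g", 54.4, 62.1, 5, 54),
    ("Retirement-home age, yrs", 81, 90, 5, 81),
    ("College-coach age, yrs", 35, 80, 8, 35),
]
for name, lo, hi, k, start in examples:
    w = class_width(lo, hi, k)
    pts = cut_points(start, w, hi)
    classes = ", ".join(f"{a}–<{b}" for a, b in zip(pts, pts[1:]))
    print(f"{name}: k = {k}, w = {w}; {classes}")
```

The upper cut point is excluded. So when the maximum falls exactly on a cut point, `cut_points` adds one extra class to keep the maximum in the table. For example, data running from 0 to 10 with $w = 2$ need the class 10–<12.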

Egg-Weight Example (20 eggs, grams)

  • Min = 54.4 g; Max = 62.1 g.
  • Range $R = 62.1 - 54.4 = 7.7$ (small).
  • Choose $k = 5$ ⇒ $w = \frac{7.7}{5} = 1.54 \to 2$.
  • Start at a convenient whole number at or just below the minimum: 54.
  • Classes (lower inclusive, upper exclusive):
    • 54–<56, 56–<58, 58–<60, 60–<62, 62–<64.
  • After tally: frequencies 2, 6, 6, 4, 2 (total 20) → relative frequencies .10, .30, .30, .20, .10.
  • Midpoint of a class: $\text{mid} = \frac{\text{lower} + \text{upper}}{2}$.
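The notes give the egg-weight cut points and tallied frequencies but do not reproduce the 20 raw weights. From those two lists, a small Python sketch can fill in the relative-frequency and midpoint columns. The function names `relative_frequencies` and `midpoints` are hypothetical.

```python
def relative_frequencies(freqs):
    """f / n for each class, where n is the total of all frequencies."""
    n = sum(freqs)
    return [f / n for f in freqs]


def midpoints(cuts):
    """(lower + upper) / 2 for each class [lower, upper)."""
    return [(lo + hi) / 2 for lo, hi in zip(cuts, cuts[1:])]


egg_cuts = [54, 56, 58, 60, 62, 64]   # cut points from the notes
egg_freqs = [2, 6, 6, 4, 2]           # tallied frequencies from the notes

rel = relative_frequencies(egg_freqs)
mids = midpoints(egg_cuts)

print("Weights of 20 Eggs")
print(f"{'Weight, g':>9}  {'Freq.':>5}  {'Rel. freq.':>10}  {'Midpoint':>8}")
for i, f in enumerate(egg_freqs):
    label = f"{egg_cuts[i]}–<{egg_cuts[i + 1]}"
    print(f"{label:>9}  {f:>5}  {rel[i]:>10.2f}  {mids[i]:>8}")
print(f"{'Total':>9}  {sum(egg_freqs):>5}")
```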

Retirement-Home Ages (20 people, 81–90)

  • Range $R = 90 - 81 = 9$ ⇒ pick $k = 5$ ⇒ $w = \lceil 1.8 \rceil = 2$.
  • Start at 81 → classes 81–<83, 83–<85, …, 89–<91.
  • Tally the observations, then complete the frequency and relative-frequency columns and add a title.
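The tally step places each observation in exactly one class, and Python's standard `bisect` module can sketch it. The function name `tally` is hypothetical. The ten ages in the usage example are made up for illustration, because the notes do not list the 20 residents' actual ages.

```python
from bisect import bisect_right


def tally(data, cuts):
    """Frequencies for the classes [cuts[i], cuts[i + 1]).

    bisect_right(cuts, x) - 1 is the index of the largest cut point <= x,
    so a value equal to a cut point goes to the class that starts there:
    the lower cut point is included and the upper one excluded.
    """
    freqs = [0] * (len(cuts) - 1)
    for x in data:
        i = bisect_right(cuts, x) - 1
        if not 0 <= i < len(freqs):
            raise ValueError(f"{x} is outside {cuts[0]}–<{cuts[-1]}")
        freqs[i] += 1
    return freqs


cuts = [81, 83, 85, 87, 89, 91]                  # classes from the notes
ages = [81, 82, 83, 83, 85, 86, 88, 89, 90, 90]  # hypothetical ages
freqs = tally(ages, cuts)
n = len(ages)
for lo, hi, f in zip(cuts, cuts[1:], freqs):
    print(f"{lo}–<{hi}: {f}  ({f / n:.2f})")
```

An age of exactly 83 is counted in 83–<85, not in 81–<83. Any value outside the first and last cut points raises an error, so it cannot be silently dropped.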

College-Coach Ages (100 coaches, 35–80)

  • Range $R = 80 - 35 = 45$; choose $k = 8$ for the larger spread.
  • $w = 45/8 = 5.625 \to 6$.
  • Classes starting at 35: 35–<41, 41–<47, …, 77–<83 (8 total).
  • After tallying, compute the frequencies and relative frequencies and add a descriptive title.

Graphical Displays

Dot Plot

  • Small/medium data sets; each dot represents one observation positioned above its value on a number line.
  • Example: Resting heart rates of 15 ASU students (52–93 bpm).
    • Axes: horizontal = heart rate (beats per minute), vertical often omitted; number of stacked dots = frequency.
    • Must include a title and units; the instructor showed a plot missing its labels as a teaching point.
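A dot plot can be mocked up in plain text. This sketch is an illustrative assumption, not the course's method. It stacks one `o` per observation above each whole-number value and keeps a column for every value between the minimum and the maximum. The notes do not list the 15 heart rates, so the sketch reuses the ten student ages from the single-value example. The function name `dot_plot` is hypothetical.

```python
from collections import Counter


def dot_plot(data, title, axis_label, dot="o"):
    """Text dot plot for whole-number data: one dot per observation,
    stacked above its value on a horizontal number line."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)    # no values are skipped
    col = max(len(str(v)) for v in values) + 1  # column width per value
    lines = [title]
    for level in range(max(counts.values()), 0, -1):   # top row first
        row = "".join((dot if counts[v] >= level else " ").center(col)
                      for v in values)
        lines.append(row.rstrip())
    lines.append("".join(str(v).center(col) for v in values))
    lines.append(axis_label)
    return lines


ages = [18, 18, 18, 19, 19, 20, 17, 18, 19, 20]
print("\n".join(dot_plot(ages, "Ages of 10 Statistics Students", "Age, yrs")))
```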

Stem-and-Leaf Plot

  • Shows raw data while giving distribution shape.
  • Split each value into:
    • Stem = all but final digit.
    • Leaf = final (right-most) digit.
  • Draw a vertical line; stems go on the left, leaves on the right (in ascending order).
  • Two formats:
    1. One line per stem: every stem appears once.
       • The resting-heart-rate example produced rows for stems 5|, 6|, 7|, 8|, 9|.
    2. Two lines per stem: the first line holds leaves 0–4, the second leaves 5–9.
       • GPA example (2.0–4.0): stems 2, 3, 4 each appear twice; the first row holds leaves 0–4, the second leaves 5–9.
       • Demonstrated sorting, handling duplicates, and possible GPAs above 4.0 (honors cases).
  • Power-walking 10 km times (60–89 min) used two lines per stem; stems with no leaves are still listed, just as a number line does not skip values.
  • Advantages: preserves individual observations, quick to construct by hand.
  • Disadvantages: cluttered with very large data sets (hundreds of observations or more); use a histogram instead.
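Python can sketch both stem-and-leaf formats for non-negative whole numbers. The function name `stem_and_leaf` and its `split` flag are hypothetical. The walking times in the usage example are invented for illustration, because the notes give only their 60–89 min range. Decimal data such as GPAs could be handled by first multiplying by 10, so that 3.5 becomes stem 3, leaf 5.

```python
def stem_and_leaf(data, split=False):
    """Stem-and-leaf rows for non-negative integers.

    Stem = every digit except the last; leaf = the last digit.
    split=False: one line per stem.
    split=True: two lines per stem, leaves 0-4 first, then 5-9.
    Lines with no leaves are kept so no stems are skipped.
    """
    per_stem = 2 if split else 1

    def line_of(x):
        stem, leaf = divmod(x, 10)
        return stem * per_stem + (1 if split and leaf >= 5 else 0)

    values = sorted(data)          # sorting puts leaves in ascending order
    leaves = {}
    for x in values:
        leaves.setdefault(line_of(x), []).append(str(x % 10))
    rows = []
    for line in range(line_of(values[0]), line_of(values[-1]) + 1):
        stem = line // per_stem
        rows.append(f"{stem:>3} | {''.join(leaves.get(line, []))}".rstrip())
    return rows


ages = [18, 18, 18, 19, 19, 20, 17, 18, 19, 20]   # ages from the notes
print("\n".join(stem_and_leaf(ages)))

times = [61, 63, 67, 68, 75, 77, 79, 84, 86, 88]  # hypothetical 10 km times, min
print("\n".join(stem_and_leaf(times, split=True)))
```

In the split version, the 70–74 line has no leaves but still prints as `7 |`, so the gap in the data stays visible.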

Histogram

  • Bar-like graph for numeric data; bars touch (continuous scale).
  • Y-axis: frequency or relative frequency; X-axis: class intervals.
  • Two versions:
    • Frequency histogram (counts).
    • Relative-frequency histogram (proportions).
  • Egg-weight example displayed both; bars spanned 54–<56, …, 62–<64 g.
  • Choosing $k$ too small (e.g., 2 classes → $w = 8$) hides structure; too large (e.g., 20 classes) shows noisy spikes. Aim for a balance guided by the range and the 5–20 rule.
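A sideways text histogram lists the class intervals down the left and makes each bar length proportional to the class height. Drawn this way, the frequency and relative-frequency versions have identical shapes and differ only in the scale. This Python sketch is an illustration with a hypothetical function name, `text_histogram`. It uses the egg-weight cut points and frequencies from the notes.

```python
def text_histogram(cuts, freqs, relative=False, width=30):
    """Sideways histogram: one row per class [lower, upper).

    Bar length is scaled so the tallest class gets `width` characters;
    the number after each bar is the count or the proportion f / n.
    """
    n = sum(freqs)
    heights = [f / n for f in freqs] if relative else list(freqs)
    tallest = max(heights)
    rows = []
    for lo, hi, h in zip(cuts, cuts[1:], heights):
        label = f"{lo}–<{hi}"
        value = f"{h:.2f}" if relative else str(h)
        rows.append(f"{label:>7} | {'#' * round(h / tallest * width)} {value}")
    return rows


egg_cuts = [54, 56, 58, 60, 62, 64]
egg_freqs = [2, 6, 6, 4, 2]
print("Weights of 20 Eggs (frequency)")
print("\n".join(text_histogram(egg_cuts, egg_freqs)))
print("Weights of 20 Eggs (relative frequency)")
print("\n".join(text_histogram(egg_cuts, egg_freqs, relative=True)))
```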

Distribution & Shape Terminology

  • Distribution: table, graph, or formula indicating possible values of a variable and their frequencies.

Modality (Number of Peaks)

  • Unimodal: one peak (e.g., the normal bell curve).
    • Instructor analogy: unicycle (1 wheel) = unimodal.
  • Bimodal: two peaks.
  • Multimodal: three or more peaks (no special “trimodal” term; all 3+ fall here).

Symmetry & Skewness

  • Symmetric distribution: can be split down the middle into two mirror-image halves.
  • Right-skewed: long tail extends to larger values (right side “pulls” out).
  • Left-skewed: long tail toward smaller values (left side elongated).
    • Visual analogy: a kid grabbing one side of the bell curve and running, stretching that side into a tail.

Practical & Pedagogical Notes

  • Always include:
    • Descriptive title.
    • Variable name & measurement units on axes or table headings.
  • Whole numbers preferred for class boundaries ("the world doesn’t like decimals"), but scientists may tolerate precise values.
  • When deciding kk:
    • Large range (≈1000) → lean toward upper limit (≈20).
    • Small range (≈10) → lean toward lower limit (≈5).
    • Trial-and-error acceptable until display "looks best".
  • Empty classes should still appear in tables/plots (analogous to not skipping numbers on a number line).
  • Dot plots & stem-and-leaf ideal for quick insight, homework, or small n; histograms preferred for large datasets or presentations.

Key Formula Recap

  • Range: $R = \text{max} - \text{min}$.
  • Class width (before rounding): $w = \frac{R}{k}$.
  • Relative frequency: $\text{rel. freq.} = \frac{f}{n}$.
  • Class midpoint: $\text{mid} = \frac{\text{lower cut point} + \text{upper cut point}}{2}$.

Ethical & Real-World Connections

  • Examples contextualized: nutritional supplements in poultry industry, retirement-home demographics, power walking & joint health, academic GPAs.
  • Instructor encourages adult learners ("went back in mid-30s – you can too") to reduce intimidation and foster inclusive education.