Chapter 2 Notes: Frequency Tables, Class Width, Boundaries, Histograms, and Qualitative Charts

Grading workflow (as described in the transcript)

  • The instructor adjusts assignment scores to a 100-point scale and then uploads them to Blackboard.
  • Example given: a score of 83 is said to go to Blackboard as 800. This figure is probably a misstatement or a transcription error; the intended idea is that scores are rescaled to a 100-point scale before entry. The key point: grades are entered manually so the instructor keeps control of the grade book and avoids automatic synchronization problems between WebAssign and Blackboard.
  • The instructor plans to upload grades after every two or three assignments, and will upload assignment 1, 2, 3, 4 grades to Blackboard once the first exam is taken.
  • Students can ask questions if they don’t understand the process.

Chapter 2 overview: frequency tables and data organization

  • Main topics covered:
    • Frequency tables: how to create them from a dataset by classifying data into intervals (classes).
    • Class definitions: each interval has a lower limit and an upper limit (e.g., a class from 70 to 77 would have lower limit 70 and upper limit 77 in the example; other classes shown include 33–40, etc.).
    • Purpose of classes: to group data into meaningful ranges for summarization.
  • Important terms:
    • Classes (or class intervals): non-overlapping data ranges used to group observations.
    • Lower limit of a class: the smallest value that can belong to that class.
    • Upper limit of a class: the largest value that can belong to that class.

Deciding the number of classes

  • Step 1: Decide the number of classes (k).
    • Guidance mentioned: for about 60 observations, 5–6 classes are reasonable; for thousands of observations, 15–16 classes may be used. There is no strict universal rule, but a problem may specify the number of classes.
  • Step 2: Decide the class width once the number of classes is chosen.
    • The width is computed from the data:
    • Given a dataset with largest observation max, smallest observation min, and a chosen number of classes k, the class width is:
    • w = \frac{\max - \min}{k}
    • Since class width should be a practical (often integer) size, you round up to the next whole number. The transcript notes the rounding rule: "round up" the resulting value.
    • Example from the lecture (dataset described):
    • Largest observation: 47, smallest: 1, number of classes k = 6.
    • Compute w = \left\lceil \frac{47 - 1}{6} \right\rceil = \left\lceil \frac{46}{6} \right\rceil = \left\lceil 7.666… \right\rceil = 8.
    • Note: the lecturer also showed a rough 7.7 value before rounding, which led to using 8 as the class width.
  • Step 3: Determine the class limits (the smallest and largest data values each class can contain).
    • With min = 1 and w = 8, the lower class limits (L_i) are:
    • L1 = 1, L2 = 1 + 8 = 9,
      L3 = 17, L4 = 25,
      L5 = 33, L6 = 41.
    • Corresponding upper limits (U_i) for width 8 are:
    • U1 = 1 + 8 - 1 = 8, U2 = 16,
      U3 = 24, U4 = 32,
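      U5 = 40, U6 = 41 + 8 - 1 = 48 by the same formula; the lecture stops at 47 instead.
    • Resulting classes (inclusive on both ends):
    • [1,8], [9,16], [17,24], [25,32], [33,40], [41,47].
    • The transcript notes stopping at 47 because the largest observation is 47. Keeping the last class as [41,48] would give all six classes the same width of 8 and would hold exactly the same observations.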
  • Step 4: Populate the frequency column
    • For each class, count how many observations fall into the interval (including the endpoints).
    • Example counts given in the lecture:
    • Class [1,8]: 14 observations.
    • Class [9,16]: 11 observations.
    • Counts for the remaining classes were described as "and so on" (the instructor indicated you can count directly or use the book).
  • Step 5: Midpoints
    • The midpoint of a class is the average of its lower and upper limits:
    • \text{Midpoint}_i = \frac{L_i + U_i}{2}.
    • Example: for [1,8], midpoint is \frac{1+8}{2} = 4.5.
    • Midpoints can be used for summaries or for constructing certain plots.
  • Step 6: Class boundaries (to prepare for a histogram)
    • Class boundaries are used to avoid gaps between adjacent classes when drawing a histogram.
    • The idea: there is a gap between the upper limit of one class and the lower limit of the next (for example, between 8 and 9); each boundary sits halfway across that gap.
    • In this dataset the observations are whole numbers, so the gap is 1 and you add 0.5 to each upper limit (and subtract 0.5 from each lower limit):
    • Boundaries for the first few classes:
    • First class [1,8]: lower boundary = 1 - 0.5 = 0.5, upper boundary = 8 + 0.5 = 8.5
    • Second class [9,16]: lower boundary = 8.5, upper boundary = 16.5
    • Third class [17,24]: lower boundary = 16.5, upper boundary = 24.5
    • And so on, producing a continuous set of boundaries such that there is no gap between bars.
    • The boundaries can be described generically as:
    • B_i = [L_i - \tfrac{1}{2}, U_i + \tfrac{1}{2}), so the upper boundary of one class is the lower boundary of the next.
    • Boundary example for a start value of 2 (for illustration): if a class starts at 2 and has width 8, the boundary sequence would be 1.5, 9.5, 17.5, etc.
  • Step 7: Relative frequency
    • Relative frequency for class i:
    • r_i = \frac{f_i}{N},
    • where $f_i$ is the class frequency and $N$ is the total number of observations.
    • The sum of all relative frequencies equals 1 (ignoring rounding errors):
    • \sum_i r_i = 1.
    • Example from the transcript: if $N=60$ and $f_1=14$, then $r_1 = \frac{14}{60} \approx 0.233$.
    • A class with a relative frequency of 0.35 corresponds to 35% of observations in that class.
  • Step 8: From the frequency distribution to interpretation
    • The distribution lets you describe the dataset succinctly (e.g., what ranges most observations fall into).
    • A brief written summary is commonly added below the histogram to interpret the results for readers (biologists, agriculturists, etc.).
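
Steps 2 through 7 can be sketched in a few lines of Python. This is a minimal illustration rather than the lecturer's method: the function names and the 12-value `sample` list are made up for the example (only min = 1, max = 47, and k = 6 come from the notes), and the data are assumed to be whole numbers. Unlike the lecture, the sketch keeps every class the same width, so the last class comes out as 41 to 48 instead of 41 to 47.

```python
import math

def build_classes(min_value, max_value, k):
    """Return (width, limits) for k equal-width classes of whole-number data."""
    # Step 2: class width, rounded up to the next whole number.
    width = math.ceil((max_value - min_value) / k)
    # Step 3: lower limit L_i, upper limit U_i = L_i + w - 1.
    limits = []
    for i in range(k):
        lower = min_value + i * width
        limits.append((lower, lower + width - 1))
    return width, limits

def frequency_table(data, limits):
    """Return one row per class with frequency, midpoint, boundaries, rel. freq."""
    n = len(data)
    rows = []
    for lower, upper in limits:
        # Step 4: count observations inside the class, endpoints included.
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append({
            "limits": (lower, upper),
            "midpoint": (lower + upper) / 2,            # Step 5
            "boundaries": (lower - 0.5, upper + 0.5),   # Step 6
            "frequency": freq,
            "relative_frequency": freq / n,             # Step 7
        })
    return rows

# Hypothetical data, chosen only so that min = 1 and max = 47.
sample = [1, 3, 8, 9, 12, 16, 17, 25, 30, 33, 41, 47]

width, limits = build_classes(min(sample), max(sample), 6)
table = frequency_table(sample, limits)

for row in table:
    print(row["limits"], row["boundaries"], row["midpoint"],
          row["frequency"], round(row["relative_frequency"], 3))
```

One caveat about the rounding rule: if (max - min) / k is already a whole number, rounding up changes nothing and k classes of that width stop one short of the maximum. In that case the width has to be increased by 1 or an extra class added.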

Creating and interpreting a histogram

  • To draw a histogram:
    • Use the dataset and the class boundaries on the x-axis.
    • Use either the frequency or the relative frequency as the y-axis.
    • Bars should touch each other (no gaps) when using a histogram with class boundaries.
    • The bars represent the frequencies (or relative frequencies) for each class.
  • Variants of the y-axis:
    • Frequency histogram: height equals $f_i$.
    • Relative frequency histogram: height equals $r_i$.
  • Example class heights (based on the described frequencies):
    • The first class might have a height corresponding to 14, the second to 11, and so on (as per the example in the lecture).
  • Interpreting the shape of a histogram:
    • Symmetric (approximately bell-shaped) histograms suggest a normal-like distribution and allow certain calculations to rely on symmetry.
    • A right-skewed (positively skewed) histogram has a longer tail to the right; a left-skewed (negatively skewed) histogram has a longer tail to the left. Either one indicates a departure from normality.
    • The modal class is the class with the highest frequency. A histogram with a single peak is unimodal; two peaks make it bimodal; a roughly flat histogram (a plateau) has no clear mode.
  • Ogive (cumulative frequency polygon)
    • After constructing a cumulative frequency table, you can plot the cumulative frequencies against class boundaries (often the upper boundaries) to form an ogive.
    • The ogive helps in understanding the cumulative distribution and locating percentiles.
    • The construction involves adding the frequencies cumulatively (e.g., after the first class, the cumulative frequency is $f_1$; after the second class, $f_1+f_2$; etc.).
    • A common practice is to connect these points with straight line segments (hence "cumulative frequency polygon") to visualize the cumulative distribution.
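
To see how boundaries and frequencies turn into bars, here is a rough text-only sketch (bars drawn sideways, one row per class). The helper names are invented for this example. Only the first two frequencies (14 and 11) come from the lecture; the other four are hypothetical values chosen so that the total is 60, and the last boundary of 48.5 assumes the final class is kept at full width.

```python
def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one text line per class: 'lower to upper | bar (frequency)'."""
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    lines = []
    for i, freq in enumerate(frequencies):
        # Adjacent classes share a boundary, so the bars have no gaps.
        label = f"{boundaries[i]:>5.1f} to {boundaries[i + 1]:>5.1f}"
        lines.append(f"{label} | {symbol * freq} ({freq})")
    return lines

def modal_class(boundaries, frequencies):
    """Return the boundaries of the class with the highest frequency."""
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])
    return boundaries[i], boundaries[i + 1]

boundaries = [0.5, 8.5, 16.5, 24.5, 32.5, 40.5, 48.5]
# 14 and 11 are from the lecture; 12, 9, 8, 6 are hypothetical.
frequencies = [14, 11, 12, 9, 8, 6]

bars = text_histogram(boundaries, frequencies)
print("\n".join(bars))
print("Modal class:", modal_class(boundaries, frequencies))
```

Dividing each frequency by N before drawing would give the relative frequency version; the shape of the picture stays the same.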

Cumulative frequency and ogives: a quick example framework

  • Suppose you have the class frequencies $f_1, f_2, \ldots, f_k$.
  • Cumulative frequencies are defined as:
    • C_1 = f_1, \quad C_i = \sum_{j=1}^{i} f_j \quad (i = 2, \ldots, k).
  • The ogive is drawn by plotting points $(B_i, C_i)$ where $B_i$ are the class upper (or boundary) values.
  • This provides a way to read percentiles directly from the curve.
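
A short sketch of the cumulative column and of reading a percentile off the ogive follows. Several things here are assumptions and not part of the lecture: the function names, the frequencies after the first two (14 and 11 are from the notes; 12, 9, 8, 6 are hypothetical), the convention of starting the ogive at the lowest boundary with a cumulative count of 0, and the use of straight-line interpolation between plotted points.

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """C_1 = f_1, C_i = f_1 + ... + f_i."""
    return list(accumulate(frequencies))

def ogive_points(boundaries, frequencies):
    """Pair each boundary with the count of observations below it."""
    return list(zip(boundaries, [0] + cumulative_frequencies(frequencies)))

def percentile_from_ogive(points, p):
    """Estimate the p-th percentile by interpolating along the ogive."""
    total = points[-1][1]
    target = p / 100 * total
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must be between 0 and 100")

boundaries = [0.5, 8.5, 16.5, 24.5, 32.5, 40.5, 48.5]
frequencies = [14, 11, 12, 9, 8, 6]  # last four values are hypothetical

cumulative = cumulative_frequencies(frequencies)
points = ogive_points(boundaries, frequencies)
median_estimate = percentile_from_ogive(points, 50)

print(cumulative)                  # [14, 25, 37, 46, 54, 60]
print(round(median_estimate, 2))   # 19.83
```

With these made-up counts, the 30th of 60 observations falls in the third class, so the estimated median lies between the boundaries 16.5 and 24.5.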

Qualitative data: bar charts, pie charts, and Pareto charts

  • When data are qualitative (categories), you typically use:
    • Bar chart: bars represent the frequency (or proportion) of each category; bars are separated by gaps, since categories are distinct.
    • Pie chart: slices represent the proportion of each category relative to the whole.
  • Pareto chart (special bar chart):
    • Bars are ordered from most frequent to least frequent (in descending order).
    • Often used with an overlaid cumulative percentage line to emphasize the Pareto principle (80/20 rule).
    • The descending order and the cumulative line are what distinguish a Pareto chart from a standard bar chart.
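
The ordering and the cumulative percentage line of a Pareto chart can be computed before any drawing is done. The sketch below is illustrative only: `pareto_table` is a made-up helper name and the commuting responses are hypothetical data, not something from the lecture.

```python
from collections import Counter

def pareto_table(observations):
    """Return (category, count, cumulative percent) rows, most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows = []
    running = 0
    # most_common() yields categories in descending order of count.
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical qualitative data: how ten students commute.
responses = ["bus"] * 5 + ["car"] * 3 + ["walk"] * 2

pareto_rows = pareto_table(responses)
for category, count, cumulative_pct in pareto_rows:
    print(f"{category:<5} {count:>2} {cumulative_pct:>6}%")
```

The counts give the bar heights in left-to-right order, and the third column gives the points for the overlaid cumulative line, which always ends at 100%.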

Practical takeaways and strategies for exams

  • When constructing a frequency table from a dataset:
    • Decide the number of classes (k) first, keeping in mind the size of your dataset.
    • Compute class width using w = \frac{\max - \min}{k} and round up to a convenient integer.
    • Determine class limits with a consistent width, ensuring the entire data range is covered without gaps or overlaps.
    • Count frequencies by scanning the data or using a systematic counting approach.
    • Compute midpoints and class boundaries to prepare for histograms.
    • Compute relative frequencies and verify they sum to 1.
  • For histograms:
    • Use class boundaries on the x-axis to ensure bars touch (no gaps).
    • Decide whether to plot frequencies or relative frequencies on the y-axis.
  • For ogives:
    • Build a cumulative frequency column and plot against class boundaries to analyze percentiles.
  • For qualitative data:
    • Use bar charts for category frequencies; use Pareto charts when you want to emphasize the most frequent categories and show cumulative impact.
  • Conceptual notes:
    • Histograms reveal the shape of the distribution (symmetric, skewed, unimodal, bimodal).
    • The choice of class width and boundaries affects the appearance of the histogram and the interpretation of the data.
    • Always be mindful of rounding and how it affects totals and boundaries; document the rules used (e.g., rounding up the width, using boundaries 0.5 units away from class limits).

Quick cheat sheet: key formulas to memorize

  • Class width (rounded up):
  • w = \left\lceil \frac{\max - \min}{k} \right\rceil
  • Class limits (example start at L1):
  • L_1 = \min, \quad L_{i+1} = L_i + w, \quad U_i = L_i + w - 1
  • Class boundaries (to avoid gaps):
  • B_i = [L_i - \tfrac{1}{2}, U_i + \tfrac{1}{2})
  • Midpoints:
  • M_i = \frac{L_i + U_i}{2}
  • Frequencies: f_i
  • Total observations: N = \sum_i f_i
  • Relative frequencies:
  • r_i = \frac{f_i}{N}
  • Cumulative frequencies (ogive):
  • C_1 = f_1, \quad C_i = \sum_{j=1}^{i} f_j
  • Percentiles and cumulative interpretation come from the ogive plot.

Note on examples in the transcript

  • Several working numbers are provided as examples (e.g., max=47, min=1, k=6 leading to width 8; specific counts per class such as 14 in the first class and 11 in the second). These illustrate the step-by-step approach described above. If you work with a concrete dataset, apply the same steps and fill in the exact frequencies from your data.
  • The transcript also includes practical remarks about grading workflows and manual grade entry, which are context-specific to the course administration.