Chapter 2 Notes: Frequency Tables, Class Width, Boundaries, Histograms, and Qualitative Charts

Grading workflow (as described in the transcript)

The instructor adjusts assignment scores to a 100-point scale and then uploads them to Blackboard.
Example given: a student's score of 83 is said to go to Blackboard as 800. This figure is likely a misstatement or a transcription error; the intended idea is that scores are rescaled to a 100-point scale before entry. The key point is that grades are entered manually, which keeps the grade book under the instructor's control and avoids automatic synchronization issues between WebAssign and Blackboard.
The instructor plans to upload grades after every two or three assignments, and will upload assignment 1, 2, 3, 4 grades to Blackboard once the first exam is taken.
Students can ask questions if they don’t understand the process.

Chapter 2 overview: frequency tables and data organization

Main topics covered:
- Frequency tables: how to create them from a dataset by classifying data into intervals (classes).
- Class definitions: each interval has a lower limit and an upper limit (e.g., a class from 70 to 77 would have lower limit 70 and upper limit 77 in the example; other classes shown include 33–40, etc.).
- Purpose of classes: to group data into meaningful ranges for summarization.
Important terms:
- Classes (or class intervals): non-overlapping data ranges used to group observations.
- Lower limit of a class: the smallest value that can belong to that class.
- Upper limit of a class: the largest value that can belong to that class.

Deciding the number of classes

Step 1: Decide the number of classes (k).
- Guidance mentioned: for about 60 observations, 5–6 classes are reasonable; for thousands of observations, 15–16 classes may be used. There is no strict universal rule, but a problem may specify the number of classes.
Step 2: Decide the class width once the number of classes is chosen.
- The width is computed from the data:
- Given dataset with largest observation max and smallest observation min, and chosen number of classes k, the class width is:
- $w = \frac{\max - \min}{k}$
- Since class width should be a practical (often integer) size, you round up to the next whole number. The transcript notes the rounding rule: "round up" the resulting value.
- Example from the lecture (dataset described):
- Largest observation: 47, smallest: 1, number of classes k = 6.
- Compute $w = \left\lceil \frac{47 - 1}{6} \right\rceil = \left\lceil \frac{46}{6} \right\rceil = \left\lceil 7.666\ldots \right\rceil = 8.$
- Note: the lecturer also showed a rough 7.7 value before rounding, which led to using 8 as the class width.
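The width calculation above can be checked with a short sketch. The function name `class_width` is an illustrative choice, not something from the lecture.

```python
import math

def class_width(largest, smallest, k):
    """Class width: the data range divided by the number of classes, rounded up."""
    return math.ceil((largest - smallest) / k)

# Lecture example: max = 47, min = 1, k = 6
print(class_width(47, 1, 6))  # 46 / 6 = 7.67, rounded up to 8
```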
Step 3: Determine the class limits (the smallest and largest data values each class can contain; these are distinct from the class boundaries introduced in Step 6).
- With min = 1 and w = 8, the lower class limits (L_i) are:
- L1 = 1, L2 = 1 + 8 = 9,
  L3 = 17, L4 = 25,
  L5 = 33, L6 = 41.
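- Corresponding upper limits (U_i = L_i + w - 1) for width 8 are:
- U1 = 1 + 8 - 1 = 8, U2 = 16,
  U3 = 24, U4 = 32,
  U5 = 40, U6 = 41 + 8 - 1 = 48, which the lecture truncates to 47, the largest observation.
- Resulting classes (inclusive on both ends):
- $[1,8], [9,16], [17,24], [25,32], [33,40], [41,47].$
- The transcript notes stopping at 47 because the largest observation is 47 (no need for an extra empty class). This makes the last class one unit narrower than the others; keeping a uniform width of 8 would give $[41,48]$, which contains the same observations.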
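As a rough sketch of Step 3, the limits can be generated from the minimum, the width, and the number of classes. The helper `class_limits` is hypothetical and assumes whole-number data. The generic rule $U_i = L_i + w - 1$ gives 48 for the last upper limit, while the lecture's table stops at 47, the largest observation.

```python
def class_limits(smallest, width, k):
    """Return (lower, upper) limit pairs for k classes of whole-number data."""
    limits = []
    for i in range(k):
        lower = smallest + i * width   # L_{i+1} = L_i + w
        upper = lower + width - 1      # U_i = L_i + w - 1
        limits.append((lower, upper))
    return limits

limits = class_limits(1, 8, 6)
print(limits)
# [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 48)]
```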
Step 4: Populate the frequency column
- For each class, count how many observations fall into the interval (including the endpoints).
- Example counts given in the lecture:
- Class [1,8]: 14 observations.
- Class [9,16]: 11 observations.
- Counts for the remaining classes were described as "and so on" (the instructor indicated you can count directly or use the book).
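Counting can be sketched as follows. The lecture's full dataset is not reproduced in these notes, so `sample` below is a made-up list used only to show the mechanics, and `count_frequencies` is an illustrative name.

```python
def count_frequencies(data, limits):
    """Count observations in each class, with both endpoints included."""
    return [sum(1 for x in data if lower <= x <= upper) for lower, upper in limits]

# Hypothetical sample, NOT the lecture's dataset.
sample = [1, 3, 8, 9, 12, 16, 17, 20, 25, 33, 41, 47]
limits = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 47)]

freqs = count_frequencies(sample, limits)
print(freqs)                      # [3, 3, 2, 1, 1, 2]
print(sum(freqs) == len(sample))  # True: every observation lands in exactly one class
```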
Step 5: Midpoints
- The midpoint of a class is the average of its lower and upper limits:
- $\text{Midpoint}_i = \frac{L_i + U_i}{2}.$
- Example: for [1,8], midpoint is $\frac{1+8}{2} = 4.5.$
- Midpoints can be used for summaries or for constructing certain plots.
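A minimal sketch of the midpoint rule, applied to the class limits listed in Step 3 (the function name `midpoint` is an illustrative choice):

```python
def midpoint(lower, upper):
    """Midpoint of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2

limits = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 47)]
midpoints = [midpoint(lo, hi) for lo, hi in limits]
print(midpoints)  # [4.5, 12.5, 20.5, 28.5, 36.5, 44.0]
```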
Step 6: Class boundaries (to prepare for a histogram)
- Class boundaries are used to avoid gaps between adjacent classes when drawing a histogram.
- The idea: whole-number class limits leave a gap between the upper limit of one class and the lower limit of the next (here, between 8 and 9); half of that gap is assigned to each side.
- In this dataset the gap between adjacent limits is 1, so you add 0.5 to the upper limit of a class to get the lower boundary of the next class:
- Boundaries for the first few classes:
- First class [1,8]: lower boundary = 1 - 0.5 = 0.5, upper boundary = 8 + 0.5 = 8.5
- Second class [9,16]: lower boundary = 8.5, upper boundary = 16.5
- Third class [17,24]: lower boundary = 16.5, upper boundary = 24.5
- And so on, producing a continuous set of boundaries such that there is no gap between bars.
- The boundaries can be described generically as:
- $B_i = [L_i - \tfrac{1}{2},\ U_i + \tfrac{1}{2})$, so the upper boundary of each class coincides with the lower boundary of the next.
- Boundary example for a start value of 2 (for illustration): if a class starts at 2 and has width 8, the boundary sequence would be 1.5, 9.5, 17.5, etc.
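The boundary rule can be sketched by measuring the gap between adjacent limits and splitting it in half. The helper `class_boundaries` is hypothetical; it assumes at least two classes and the same gap between every pair of neighbours.

```python
def class_boundaries(limits):
    """Shift each limit outward by half the gap between adjacent classes."""
    # Gap between the first class's upper limit and the second class's lower limit
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in limits]

print(class_boundaries([(1, 8), (9, 16), (17, 24)]))
# [(0.5, 8.5), (8.5, 16.5), (16.5, 24.5)]

print(class_boundaries([(2, 9), (10, 17), (18, 25)]))
# [(1.5, 9.5), (9.5, 17.5), (17.5, 25.5)]
```

Computing the gap from the limits, rather than hard-coding 0.5, keeps the sketch usable for data recorded to one decimal place, where the gap would be 0.1.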
Step 7: Relative frequency
- Relative frequency for class i:
- $r_i = \frac{f_i}{N},$
- where $f_i$ is the class frequency and $N$ is the total number of observations.
- The sum of all relative frequencies equals 1 (ignoring rounding errors):
- $\sum_i r_i = 1.$
- Example from the transcript: if $N=60$ and $f_1=14$, then $r_1 = \frac{14}{60} \approx 0.233.$
- A class with a relative frequency of 0.35 corresponds to 35% of observations in that class.
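A small sketch of the relative frequency calculation follows. Only the first two counts (14 and 11) and the total of 60 come from the lecture; the remaining four counts are invented here so that the list sums to 60.

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]

# 14 and 11 are from the lecture; 12, 9, 8, 6 are hypothetical fillers (total 60).
freqs = [14, 11, 12, 9, 8, 6]
rel = relative_frequencies(freqs)

print(round(rel[0], 3))           # 0.233
print(abs(sum(rel) - 1) < 1e-9)   # True, up to floating-point rounding
```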
Step 8: From the frequency distribution to interpretation
- The distribution lets you describe the dataset succinctly (e.g., what ranges most observations fall into).
- A brief written summary is commonly added below the histogram to interpret the results for readers (biologists, agriculturists, etc.).

Creating and interpreting a histogram

To draw a histogram:
- Use the dataset and the class boundaries on the x-axis.
- Use either the frequency or the relative frequency as the y-axis.
- Bars should touch each other (no gaps) when using a histogram with class boundaries.
- The bars represent the frequencies (or relative frequencies) for each class.
Variants of the y-axis:
- Frequency histogram: height equals $f_i$.
- Relative frequency histogram: height equals $r_i$.
Example class heights (based on the described frequencies):
- The first class might have a height corresponding to 14, the second to 11, and so on (as per the example in the lecture).
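To see the bar heights without a plotting library, a text-only sketch can print one row of `#` marks per class, turned on its side. The function `text_histogram` is an illustrative helper, and only the two counts given in the lecture are used.

```python
def text_histogram(boundaries, freqs):
    """Return one text line per class: boundary range, a bar of '#' marks, the count."""
    lines = []
    for (low, high), f in zip(boundaries, freqs):
        lines.append(f"{low:5.1f} to {high:5.1f} | {'#' * f} {f}")
    return lines

# First two classes from the lecture example (frequencies 14 and 11).
lines = text_histogram([(0.5, 8.5), (8.5, 16.5)], [14, 11])
for line in lines:
    print(line)
```

Because consecutive rows share a boundary value (8.5 here), the classes tile the axis with no gaps, which is the reason boundaries rather than limits are used.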
Interpreting the shape of a histogram:
- Symmetric (approximately bell-shaped) histograms suggest a normal-like distribution and allow certain calculations to rely on symmetry.
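- A histogram that is skewed right (positively skewed, with a longer tail toward larger values) or skewed left (negatively skewed, with a longer tail toward smaller values) indicates a deviation from normality.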
- The mode is the class with the highest frequency; a histogram with a single peak is unimodal; two peaks indicate bimodal; no clear peak indicates no mode or a plateau.
Ogive (cumulative frequency polygon)
- After constructing a cumulative frequency table, you can plot the cumulative frequencies against class boundaries (often the upper boundaries) to form an ogive.
- The ogive helps in understanding the cumulative distribution and locating percentiles.
- The construction involves adding the frequencies cumulatively (e.g., after the first class, the cumulative frequency is $f_1$; after the second class, $f_1+f_2$; etc.).
- A common practice is to connect these points, with straight line segments or a smooth curve, to visualize the cumulative distribution.

Cumulative frequency and ogives: a quick example framework

Suppose you have the class frequencies $f_1, f_2, \ldots, f_k$.
Cumulative frequencies are defined as:
- $C_1 = f_1, \quad C_i = \sum_{j=1}^{i} f_j \quad (i = 2, \ldots, k).$
The ogive is drawn by plotting points $(B_i, C_i)$, where $B_i$ here denotes the upper boundary of class $i$.
This provides a way to read percentiles directly from the curve.
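A sketch of the cumulative column and a crude percentile lookup is given below. As before, only the counts 14 and 11 come from the lecture and the rest are hypothetical. The helper `class_reaching` is an illustrative name; it reports only which class contains a percentile and does not interpolate along the curve.

```python
from itertools import accumulate

# 14 and 11 are from the lecture; the other counts are hypothetical (total 60).
freqs = [14, 11, 12, 9, 8, 6]
upper_boundaries = [8.5, 16.5, 24.5, 32.5, 40.5, 47.5]

cumulative = list(accumulate(freqs))              # running totals C_i
ogive_points = list(zip(upper_boundaries, cumulative))
print(cumulative)  # [14, 25, 37, 46, 54, 60]

def class_reaching(points, fraction):
    """Upper boundary of the first class whose cumulative count reaches fraction * N."""
    total = points[-1][1]          # the last cumulative frequency equals N
    target = fraction * total
    for boundary, cum in points:
        if cum >= target:
            return boundary

print(class_reaching(ogive_points, 0.5))  # 24.5: the median falls in the third class
```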

Qualitative data: bar charts, pie charts, and Pareto charts

When data are qualitative (categories), you typically use:
- Bar chart: bars represent the frequency (or proportion) of each category; bars are separated by gaps, since categories are distinct.
- Pie chart: slices represent the proportion of each category relative to the whole.
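For a pie chart, each slice's angle is its proportion of the whole multiplied by 360 degrees. The sketch below uses made-up categories, since the notes give no qualitative dataset; `pie_slices` is an illustrative name.

```python
from collections import Counter

def pie_slices(observations):
    """Map each category to (proportion, slice angle in degrees)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {cat: (n / total, 360 * n / total) for cat, n in counts.items()}

# Hypothetical categorical data, not from the lecture.
colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2
slices = pie_slices(colors)
print(slices["red"])   # (0.5, 180.0)
print(slices["blue"])  # (0.3, 108.0)
```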
Pareto chart (special bar chart):
- Bars are ordered from most frequent to least frequent (in descending order).
- Often used with an overlaid cumulative percentage line to emphasize the Pareto principle (80/20 rule).
- What distinguishes a Pareto chart from a standard bar chart is the descending order of the bars and the cumulative line.
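The table behind a Pareto chart can be sketched by sorting categories from most to least frequent and keeping a running percentage. The data and the name `pareto_table` are hypothetical.

```python
from collections import Counter

def pareto_table(observations):
    """Rows of (category, count, cumulative percent), most frequent category first."""
    counts = Counter(observations).most_common()  # sorted by count, descending
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical categorical data, not from the lecture.
defects = ["scratch"] * 5 + ["dent"] * 3 + ["stain"] * 2
rows = pareto_table(defects)
for row in rows:
    print(row)
# ('scratch', 5, 50.0)
# ('dent', 3, 80.0)
# ('stain', 2, 100.0)
```

The counts give the bar heights, and the third column gives the points for the overlaid cumulative line.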

Practical takeaways and strategies for exams

When constructing a frequency table from a dataset:
- Decide the number of classes (k) first, keeping in mind the size of your dataset.
- Compute class width using $w = \frac{\max - \min}{k}$ and round up to a convenient integer.
- Determine class limits with a consistent width, ensuring the entire data range is covered without gaps or overlaps.
- Count frequencies by scanning the data or using a systematic counting approach.
- Compute midpoints and class boundaries to prepare for histograms.
- Compute relative frequencies and verify they sum to 1.
For histograms:
- Use class boundaries on the x-axis to ensure bars touch (no gaps).
- Decide whether to plot frequencies or relative frequencies on the y-axis.
For ogives:
- Build a cumulative frequency column and plot against class boundaries to analyze percentiles.
For qualitative data:
- Use bar charts for category frequencies; use Pareto charts when you want to emphasize the most frequent categories and show cumulative impact.
Conceptual notes:
- Histograms reveal the shape of the distribution (symmetric, skewed, unimodal, bimodal).
- The choice of class width and boundaries affects the appearance of the histogram and the interpretation of the data.
- Always be mindful of rounding and how it affects totals and boundaries; document the rules used (e.g., rounding up the width, using boundaries 0.5 units away from class limits).

Quick cheat sheet: key formulas to memorize

Class width (rounded up):
$w = \left\lceil \frac{\max - \min}{k} \right\rceil$
Class limits (example start at L1):
$L_1 = \min, \quad L_{i+1} = L_i + w, \quad U_i = L_i + w - 1$
Class boundaries (to avoid gaps):
$B_i = [L_i - \tfrac{1}{2},\ U_i + \tfrac{1}{2})$
Midpoints:
$M_i = \frac{L_i + U_i}{2}$
Frequencies: $f_i$
Total observations: $N = \sum_i f_i$
Relative frequencies:
$r_i = \frac{f_i}{N}$
Cumulative frequencies (ogive):
$C_1 = f_1, \quad C_i = \sum_{j=1}^{i} f_j$
Percentiles and cumulative interpretation come from the ogive plot.

Note on examples in the transcript

Several working numbers are provided as examples (e.g., max=47, min=1, k=6 leading to width 8; specific counts per class such as 14 in the first class and 11 in the second). These illustrate the step-by-step approach described above. If you work with a concrete dataset, apply the same steps and fill in the exact frequencies from your data.
The transcript also includes practical remarks about grading workflows and manual grade entry, which are context-specific to the course administration.