Chapter 2 Notes: Frequency Tables, Class Width, Boundaries, Histograms, and Qualitative Charts
Grading workflow (as described in the transcript)
- The instructor adjusts assignment scores to a 100-point scale and then uploads them to Blackboard.
- Example given: a score of 83 "will go to Blackboard as 800." This phrasing is likely a misstatement or a transcription error; the intended idea is that scores are rescaled to the 100-point scale before entry. The key point: grades are entered manually so the instructor keeps control of the grade book and avoids automatic synchronization issues between WebAssign and Blackboard.
- The instructor plans to upload grades after every two or three assignments, and will upload the grades for assignments 1 through 4 to Blackboard once the first exam is taken.
- Students can ask questions if they don’t understand the process.
Chapter 2 overview: frequency tables and data organization
- Main topics covered:
- Frequency tables: how to create them from a dataset by classifying data into intervals (classes).
- Class definitions: each interval has a lower limit and an upper limit (e.g., a class from 70 to 77 would have lower limit 70 and upper limit 77 in the example; other classes shown include 33–40, etc.).
- Purpose of classes: to group data into meaningful ranges for summarization.
- Important terms:
- Classes (or class intervals): non-overlapping data ranges used to group observations.
- Lower limit of a class: the smallest value that can belong to that class.
- Upper limit of a class: the largest value that can belong to that class.
Deciding the number of classes
- Step 1: Decide the number of classes (k).
- Guidance mentioned: for about 60 observations, 5–6 classes are reasonable; for thousands of observations, 15–16 classes may be used. There is no strict universal rule, but a problem may specify the number of classes.
- Step 2: Decide the class width once the number of classes is chosen.
- The width is computed from the data:
- Given dataset with largest observation max and smallest observation min, and chosen number of classes k, the class width is:
- w = \frac{\max - \min}{k}
- Since class width should be a practical (often integer) size, you round up to the next whole number. The transcript notes the rounding rule: "round up" the resulting value.
- Example from the lecture (dataset described):
- Largest observation: 47, smallest: 1, number of classes k = 6.
- Compute w = \left\lceil \frac{47 - 1}{6} \right\rceil = \left\lceil \frac{46}{6} \right\rceil = \left\lceil 7.666… \right\rceil = 8.
- Note: the lecturer also showed a rough 7.7 value before rounding, which led to using 8 as the class width.
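The width calculation in Step 2 can be checked with a short Python sketch. The function name `class_width` is an illustrative choice, not something from the lecture; the numbers (max 47, min 1, k = 6) are the ones quoted above.

```python
import math

def class_width(data_min, data_max, k):
    """Return (max - min) / k rounded up to the next whole number."""
    return math.ceil((data_max - data_min) / k)

# Lecture example: (47 - 1) / 6 = 7.666..., rounded up to 8
width = class_width(1, 47, 6)
print(width)  # 8
```

One caveat: if (max - min) / k is already a whole number, rounding up leaves it unchanged, and k inclusive classes of that width starting at min would stop one unit short of max. The transcript does not say how the lecturer handles that case.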
- Step 3: Determine the class limits (the smallest and largest data values each class can contain; these are not the same as the class boundaries in Step 6).
- With min = 1 and w = 8, the lower class limits (L_i) are:
- L_1 = 1, L_2 = 1 + 8 = 9, L_3 = 17, L_4 = 25, L_5 = 33, L_6 = 41.
- Corresponding upper limits (U_i) for width 8 are:
- U_1 = 1 + 8 - 1 = 8, U_2 = 16, U_3 = 24, U_4 = 32, U_5 = 40, U_6 = 47 (the formula gives 41 + 8 - 1 = 48, but the lecture caps the last class at the largest observation, 47).
- Resulting classes (inclusive on both ends):
- [1,8], [9,16], [17,24], [25,32], [33,40], [41,47].
- The transcript notes stopping at 47 because the largest observation is 47 (no need for an extra empty class).
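The limits in Step 3 follow a fixed pattern (each lower limit is the previous one plus w, each upper limit is the lower limit plus w - 1), so they can be generated mechanically. The sketch below is one way to do it; `class_limits` and its optional `data_max` argument are illustrative names, not part of the lecture. Note that the plain formula gives 48 as the last upper limit, while the lecture lists 47 because that is the largest observation.

```python
def class_limits(data_min, width, k, data_max=None):
    """Build k inclusive (lower, upper) class limits of the given width.

    If data_max is given, the last upper limit is capped at it,
    which mirrors the lecture's choice of stopping at 47.
    """
    limits = []
    for i in range(k):
        lower = data_min + i * width
        upper = lower + width - 1
        limits.append((lower, upper))
    if data_max is not None:
        last_lower, last_upper = limits[-1]
        limits[-1] = (last_lower, min(last_upper, data_max))
    return limits

print(class_limits(1, 8, 6))
# [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 48)]
print(class_limits(1, 8, 6, data_max=47))
# [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 47)]
```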
- Step 4: Populate the frequency column
- For each class, count how many observations fall into the interval (including the endpoints).
- Example counts given in the lecture:
- Class [1,8]: 14 observations.
- Class [9,16]: 11 observations.
- Counts for the remaining classes were described as "and so on" (the instructor indicated you can count directly or use the book).
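Counting by hand is error-prone, so the tally in Step 4 can be sketched in Python. The lecture's full dataset is not reproduced in these notes, so the `sample` list below is a small hypothetical dataset invented purely for illustration; it will not reproduce the counts of 14 and 11 quoted above. The function name `count_frequencies` is also an assumption.

```python
def count_frequencies(data, limits):
    """Count how many observations fall in each inclusive class [lower, upper]."""
    counts = [0] * len(limits)
    for x in data:
        for i, (lower, upper) in enumerate(limits):
            if lower <= x <= upper:
                counts[i] += 1
                break  # classes do not overlap, so stop at the first match
    return counts

limits = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 47)]
sample = [1, 3, 8, 9, 12, 16, 17, 30, 33, 40, 41, 47]  # hypothetical data
freqs = count_frequencies(sample, limits)
print(freqs)  # [3, 3, 1, 1, 2, 2]
```

A quick sanity check is that the frequencies add up to the number of observations (here 12).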
- Step 5: Midpoints
- The midpoint of a class is the average of its lower and upper limits:
- \text{Midpoint}_i = \frac{L_i + U_i}{2}.
- Example: for [1,8], midpoint is \frac{1+8}{2} = 4.5.
- Midpoints can be used for summaries or for constructing certain plots.
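As a quick check of the midpoint formula, the sketch below applies it to the six classes listed in Step 3. The function name `midpoints` is an illustrative choice.

```python
def midpoints(limits):
    """Return the average of the lower and upper limit of each class."""
    return [(lower + upper) / 2 for lower, upper in limits]

limits = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 47)]
print(midpoints(limits))
# [4.5, 12.5, 20.5, 28.5, 36.5, 44.0]
```

The last midpoint is 44.0 rather than 44.5 only because the final class was capped at 47 in the lecture.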
- Step 6: Class boundaries (to prepare for a histogram)
- Class boundaries are used to avoid gaps between adjacent classes when drawing a histogram.
- The idea: there is a space between the upper limit of one class and the lower limit of the next class; take half of that space to assign as a boundary.
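- In this dataset, the gap between the upper limit of one class and the lower limit of the next is 1 (for example, 8 to 9), so half of the gap is 0.5: subtract 0.5 from each lower limit and add 0.5 to each upper limit: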
- Boundaries for the first few classes:
- First class [1,8]: lower boundary = 1 - 0.5 = 0.5, upper boundary = 8 + 0.5 = 8.5
- Second class [9,16]: lower boundary = 8.5, upper boundary = 16.5
- Third class [17,24]: lower boundary = 16.5, upper boundary = 24.5
- And so on, producing a continuous set of boundaries such that there is no gap between bars.
- The boundaries can be described generically as:
- B_i = [L_i - \tfrac{1}{2}, U_i + \tfrac{1}{2}), so that successive classes share a boundary.
- Boundary example for a start value of 2 (for illustration): if a class starts at 2 and has width 8, the boundary sequence would be 1.5, 9.5, 17.5, etc.
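The "half the gap" rule in Step 6 can be written as a small function. This is a sketch that assumes at least two classes and the same gap between every pair of adjacent classes; `class_boundaries` is an illustrative name.

```python
def class_boundaries(limits):
    """Convert inclusive class limits into boundaries that leave no gaps.

    The gap is measured between the first upper limit and the second
    lower limit; half of it is moved to each side of every class.
    """
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]

print(class_boundaries([(1, 8), (9, 16), (17, 24)]))
# [(0.5, 8.5), (8.5, 16.5), (16.5, 24.5)]

# Classes starting at 2 with width 8, as in the illustration above
print(class_boundaries([(2, 9), (10, 17), (18, 25)]))
# [(1.5, 9.5), (9.5, 17.5), (17.5, 25.5)]
```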
- Step 7: Relative frequency
- Relative frequency for class i:
- r_i = \frac{f_i}{N},
- where $f_i$ is the class frequency and $N$ is the total number of observations.
- The sum of all relative frequencies equals 1 (ignoring rounding errors):
- \sum_i r_i = 1.
- Example from the transcript: if $N=60$ and $f_1=14$, then r_1 = \frac{14}{60} \approx 0.233.
- A class with a relative frequency of 0.35 corresponds to 35% of observations in that class.
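The relative-frequency step is a single division per class. The sketch below first reproduces the transcript's 14/60 figure, then checks the sum-to-1 property on a hypothetical frequency list (the lecture's full set of counts is not given, so those six numbers are invented for illustration). `relative_frequencies` is an illustrative name.

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]

# Transcript example: f_1 = 14 out of N = 60
print(round(14 / 60, 3))  # 0.233

# Hypothetical frequencies, used only to check that the shares sum to 1
hypothetical_freqs = [3, 3, 1, 1, 2, 2]
rel = relative_frequencies(hypothetical_freqs)
print(rel[0])                      # 0.25
print(abs(sum(rel) - 1.0) < 1e-9)  # True (up to floating-point rounding)
```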
- Step 8: From the frequency distribution to interpretation
- The distribution lets you describe the dataset succinctly (e.g., what ranges most observations fall into).
- A brief written summary is commonly added below the histogram to interpret the results for readers (biologists, agriculturists, etc.).
Creating and interpreting a histogram
- To draw a histogram:
- Use the dataset and the class boundaries on the x-axis.
- Use either the frequency or the relative frequency as the y-axis.
- Bars should touch each other (no gaps) when using a histogram with class boundaries.
- The bars represent the frequencies (or relative frequencies) for each class.
- Variants of the y-axis:
- Frequency histogram: height equals $f_i$.
- Relative frequency histogram: height equals $r_i$.
- Example class heights (based on the described frequencies):
- The first class might have a height corresponding to 14, the second to 11, and so on (as per the example in the lecture).
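To see how bar heights relate to frequencies without any plotting library, here is a minimal text-only sketch. It uses only the two class counts quoted in the lecture (14 and 11) with their boundaries; `text_histogram` is an illustrative name, and a real histogram would normally be drawn with a plotting tool instead.

```python
def text_histogram(boundaries, freqs):
    """Render one row per class: its boundaries, then one '#' per observation."""
    rows = []
    for (lower, upper), f in zip(boundaries, freqs):
        rows.append(f"{lower:5.1f} to {upper:5.1f} | " + "#" * f + f" ({f})")
    return rows

boundaries = [(0.5, 8.5), (8.5, 16.5)]  # first two classes from the lecture
freqs = [14, 11]                        # counts quoted in the lecture
for row in text_histogram(boundaries, freqs):
    print(row)
```

Because each row starts at the boundary where the previous one ends, the classes cover the axis with no gaps, which is the reason histogram bars touch.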
- Interpreting the shape of a histogram:
- Symmetric (approximately bell-shaped) histograms suggest a normal-like distribution and allow certain calculations to rely on symmetry.
- Skewed right (positively skewed) or left (negatively skewed) indicates deviations from normality.
- The mode is the class with the highest frequency; a histogram with a single peak is unimodal; two peaks indicate bimodal; no clear peak indicates no mode or a plateau.
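Finding the modal class from a frequency table is a simple maximum search. The sketch below uses hypothetical frequencies invented for illustration, and `modal_classes` is an illustrative name. It returns every class tied for the highest count; whether a histogram is truly bimodal still has to be judged from its overall shape, which this function does not attempt.

```python
def modal_classes(limits, freqs):
    """Return the class or classes with the highest frequency."""
    top = max(freqs)
    return [cls for cls, f in zip(limits, freqs) if f == top]

limits = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 47)]
hypothetical_freqs = [2, 5, 9, 6, 3, 1]  # made-up counts with a single peak
print(modal_classes(limits, hypothetical_freqs))  # [(17, 24)]
```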
- Ogive (cumulative frequency polygon)
- After constructing a cumulative frequency table, you can plot the cumulative frequencies against class boundaries (often the upper boundaries) to form an ogive.
- The ogive helps in understanding the cumulative distribution and locating percentiles.
- The construction involves adding the frequencies cumulatively (e.g., after the first class, the cumulative frequency is $f_1$; after the second class, $f_1+f_2$; etc.).
- A common practice is to connect these points with straight line segments (some texts draw a smooth curve instead) to visualize the cumulative distribution.
Cumulative frequency and ogives: a quick example framework
- Suppose you have the class frequencies $f_1, f_2, …, f_k$.
- Cumulative frequencies are defined as:
- C_1 = f_1, \quad C_i = \sum_{j=1}^{i} f_j \quad (i = 2, …, k).
- The ogive is drawn by plotting points $(B_i, C_i)$, where $B_i$ here denotes the upper boundary of class $i$.
- This provides a way to read percentiles directly from the curve.
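The framework above can be sketched in Python: build the cumulative column, pair it with the upper boundaries, and read a percentile by linear interpolation between ogive points. The frequencies are hypothetical (invented for illustration), and the function names and the straight-line interpolation are assumptions; the lecture only says percentiles can be read from the curve.

```python
from itertools import accumulate

def ogive_points(upper_boundaries, freqs):
    """Pair each upper class boundary with the cumulative frequency up to it."""
    return list(zip(upper_boundaries, accumulate(freqs)))

def value_at_percentile(points, start_boundary, p):
    """Read the data value at the p-th percentile by linear interpolation.

    start_boundary is the lower boundary of the first class, where the
    cumulative frequency is 0.
    """
    total = points[-1][1]
    target = p / 100 * total
    prev_x, prev_c = start_boundary, 0
    for x, c in points:
        if c >= target:
            if c == prev_c:  # flat segment: nothing to interpolate
                return x
            return prev_x + (target - prev_c) / (c - prev_c) * (x - prev_x)
        prev_x, prev_c = x, c
    return points[-1][0]

freqs = [3, 3, 1, 1, 2, 2]  # hypothetical frequencies
upper = [8.5, 16.5, 24.5, 32.5, 40.5, 47.5]
points = ogive_points(upper, freqs)
print(points)
# [(8.5, 3), (16.5, 6), (24.5, 7), (32.5, 8), (40.5, 10), (47.5, 12)]
print(value_at_percentile(points, 0.5, 50))  # 16.5
```

The last cumulative frequency always equals N, which is a useful check on the table.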
Qualitative data: bar charts, pie charts, and Pareto charts
- When data are qualitative (categories), you typically use:
- Bar chart: bars represent the frequency (or proportion) of each category; bars are separated by gaps, since categories are distinct.
- Pie chart: slices represent the proportion of each category relative to the whole.
- Pareto chart (special bar chart):
- Bars are ordered from most frequent to least frequent (in descending order).
- Often used with an overlaid cumulative percentage line to emphasize the Pareto principle (80/20 rule).
- What distinguishes a Pareto chart from a standard bar chart is the ordered arrangement and the cumulative line.
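The two features that define a Pareto chart (descending order and a cumulative percentage line) can be computed before any drawing is done. The sketch below builds that table; the category names and counts are hypothetical, and `pareto_table` is an illustrative name.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, most frequent first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

defects = {"scratch": 5, "dent": 12, "misprint": 3}  # hypothetical categories
for row in pareto_table(defects):
    print(row)
# ('dent', 12, 60.0)
# ('scratch', 5, 85.0)
# ('misprint', 3, 100.0)
```

The first two columns give the bar order and heights; the third column gives the points of the overlaid cumulative line.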
Practical takeaways and strategies for exams
- When constructing a frequency table from a dataset:
- Decide the number of classes (k) first, keeping in mind the size of your dataset.
- Compute class width using w = \frac{\max - \min}{k} and round up to a convenient integer.
- Determine class limits with a consistent width, ensuring the entire data range is covered without gaps or overlaps.
- Count frequencies by scanning the data or using a systematic counting approach.
- Compute midpoints and class boundaries to prepare for histograms.
- Compute relative frequencies and verify they sum to 1.
- For histograms:
- Use class boundaries on the x-axis to ensure bars touch (no gaps).
- Decide whether to plot frequencies or relative frequencies on the y-axis.
- For ogives:
- Build a cumulative frequency column and plot against class boundaries to analyze percentiles.
- For qualitative data:
- Use bar charts for category frequencies; use Pareto charts when you want to emphasize the most frequent categories and show cumulative impact.
- Conceptual notes:
- Histograms reveal the shape of the distribution (symmetric, skewed, unimodal, bimodal).
- The choice of class width and boundaries affects the appearance of the histogram and the interpretation of the data.
- Always be mindful of rounding and how it affects totals and boundaries; document the rules used (e.g., rounding up the width, using boundaries 0.5 units away from class limits).
Key formulas (summary)
- Class width (rounded up):
- w = \left\lceil \frac{\max - \min}{k} \right\rceil
- Class limits (starting at L_1):
- L_1 = \min, \quad L_{i+1} = L_i + w, \quad U_i = L_i + w - 1
- Class boundaries (to avoid gaps):
- B_i = [L_i - \tfrac{1}{2}, U_i + \tfrac{1}{2})
- Midpoints:
- M_i = \frac{L_i + U_i}{2}
- Frequencies: f_i
- Total observations: N = \sum_i f_i
- Relative frequencies:
- r_i = \frac{f_i}{N}
- Cumulative frequencies (ogive):
- C_1 = f_1, \quad C_i = \sum_{j=1}^{i} f_j
- Percentiles and cumulative interpretation come from the ogive plot.
Note on examples in the transcript
- Several working numbers are provided as examples (e.g., max=47, min=1, k=6 leading to width 8; specific counts per class such as 14 in the first class and 11 in the second). These illustrate the step-by-step approach described above. If you work with a concrete dataset, apply the same steps and fill in the exact frequencies from your data.
- The transcript also includes practical remarks about grading workflows and manual grade entry, which are context-specific to the course administration.