Displaying Qualitative & Quantitative Variables – Comprehensive Study Notes
Frequency Distribution Tables (FDT) – Categorical Variables
Purpose
• Organizes raw, unordered qualitative observations into an easy-to-scan table that gives a structured overview of the data.
• Shows how individuals or objects are spread across various distinct categories or classes, making patterns and distributions immediately visible.
Essential columns
• Class/Category name (qualitative label): A descriptive name for each group or type of observation.
• Frequency f: The raw count, indicating the number of observations that fall into each specific category. This is derived by tallying the occurrences of each category.
• Relative Frequency \text{rf}=f/n: The proportion of observations in a given category, calculated by dividing the category's frequency (f) by the total number of observations (n). This can also be expressed as a percentage (\text{rf} \times 100\%).
Interpreting an FDT
• Allows for quick identification of the categories with the highest and lowest frequencies, highlighting dominant or rare groups.
• Relative frequencies are particularly useful for comparing distributions across different data sets, even if they have varying total sample sizes (n), as they standardize the counts into proportions.
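A minimal sketch of why relative frequencies make data sets of different sizes comparable. The first set of counts is taken from the blood-type example in these notes; the second class (n = 120) is hypothetical and exists only for illustration, as is the helper name `relative_frequencies`.

```python
def relative_frequencies(counts):
    """Convert a {category: frequency} mapping into {category: rf}, rf = f / n."""
    n = sum(counts.values())
    return {category: f / n for category, f in counts.items()}

class_a = {"O": 13, "B": 8, "A": 7, "AB": 2}     # n = 30, from the notes' example
class_b = {"O": 52, "B": 30, "A": 30, "AB": 8}   # n = 120, hypothetical data

rf_a = relative_frequencies(class_a)
rf_b = relative_frequencies(class_b)

# Raw counts differ by roughly a factor of four, yet the proportions line up.
for category in class_a:
    print(f"{category:<3} {rf_a[category]:.1%} vs {rf_b[category]:.1%}")
```

Comparing the raw counts (13 versus 52 for type O) would suggest a large difference, while the proportions show that type O holds the same share in both groups.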
Classroom blood-type example (30 observations)
• RAW DATA: A, O, AB, A, A, …, B (An unsorted list of individual blood types from 30 students).
• Finished FDT (computed during lecture):
O: 13 observations, representing the most frequent blood type.
B: 8 observations.
A: 7 observations.
AB: 2 observations, representing the least frequent blood type.
– Corresponding Relative Frequencies (calculated as (f/30) \times 100\%):
O: 43.3\% (13/30)
B: 26.7\% (8/30)
A: 23.3\% (7/30)
AB: 6.7\% (2/30)
– The relative frequencies must sum to 100\%. Rounding each value can shift the displayed total slightly, although here the rounded values add to exactly 100.0\%.
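The table above can be rebuilt with a short script. This is a sketch only: the original order of the 30 raw observations is not given in the notes, so the list below is reconstructed from the finished counts, and the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# Reconstructed from the finished counts; the true ordering of the raw
# data is unknown, so this ordering is an assumption.
observations = ["O"] * 13 + ["B"] * 8 + ["A"] * 7 + ["AB"] * 2

def frequency_table(data):
    """Return rows of (category, f, rf), sorted by descending frequency."""
    counts = Counter(data)          # tally occurrences of each category
    n = len(data)                   # total number of observations
    return [(category, f, f / n) for category, f in counts.most_common()]

table = frequency_table(observations)
for category, f, rf in table:
    print(f"{category:<3} f={f:>2}  rf={rf:.1%}")
```

Sorting by descending frequency puts the most and least common categories at the top and bottom of the table, which matches how the finished FDT is listed.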
Bar Graphs
Definition & Anatomy
• Consist of vertical or horizontal bars where the height or length of each bar is directly proportional to the frequency (or relative frequency) of a category.
• One axis (typically the horizontal axis for vertical bars) lists the distinct, non-overlapping categories, while the other axis shows a numeric scale for frequency, starting from zero.
• Bars are separated by gaps to visually indicate that the categories are distinct and qualitative.
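As a rough illustration of the anatomy described above, the sketch below draws a horizontal bar graph in plain text, using the blood-type counts from the earlier example. The function name `text_bar_chart` and the `#` symbol are arbitrary choices for this sketch; a plotting library would normally be used instead.

```python
def text_bar_chart(counts, symbol="#"):
    """Render one horizontal bar per category; bar length equals the frequency."""
    label_width = max(len(label) for label in counts)
    lines = []
    for label, f in counts.items():
        # Every bar starts at zero and grows one symbol per observation,
        # so bar length is directly proportional to frequency.
        lines.append(f"{label:<{label_width}} | {symbol * f} {f}")
    return "\n".join(lines)

blood_types = {"O": 13, "B": 8, "A": 7, "AB": 2}
chart = text_bar_chart(blood_types)
print(chart)
```

Each category sits on its own line, which plays the role of the gap between bars, and all bars share the same thickness.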
Strengths
• Patterns, such as the most or least common categories, and extreme values stand out much faster than when reviewing a table.
• Extremely useful for direct comparison of magnitudes and clear visual ranking of categories. For example, it's easy to see if one category has twice the frequency of another.
Limitations / Pitfalls
• Too many categories can lead to chart clutter, making it difficult to distinguish individual bars or their labels.
• A very wide range of frequencies lets the tallest bar dominate and makes the smallest bars hard to read, especially when the scale is not chosen carefully.
Best-practice checklist
• Clear Axis labels: Both axes must be clearly labeled (e.g., "Blood Type" and "Number of Students").
• Descriptive Title: A concise title that explains the graph's content.
• Proper Scaling: The frequency axis should use evenly spaced increments suited to the range of the data.
• Uniform Bar Width: All bars must have the same width to prevent misinterpretation of magnitude.
• Zero Baseline: The frequency axis must start at 0 to accurately represent magnitudes; omitting this can severely distort perceived differences.
• Avoid 3-D effects: Three-dimensional effects or artistic enhancements can distort proportions and make interpretation difficult.
• Accessible Colour Palettes: Use colour schemes that are colour-blind friendly and provide sufficient contrast.
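The zero-baseline rule can be checked with simple arithmetic. The sketch below compares the ratio of two drawn bar lengths when the axis starts at 0 and when it starts at a non-zero value. The frequencies 13 and 8 come from the blood-type example; the truncated baseline of 6 and the function name `apparent_ratio` are hypothetical.

```python
def apparent_ratio(f_large, f_small, baseline=0):
    """Ratio of drawn bar lengths when the frequency axis starts at `baseline`."""
    if baseline >= f_small:
        raise ValueError("baseline must be below the smaller frequency")
    return (f_large - baseline) / (f_small - baseline)

true_ratio = apparent_ratio(13, 8)                    # axis starts at 0
distorted_ratio = apparent_ratio(13, 8, baseline=6)   # axis starts at 6

print(f"true ratio:      {true_ratio:.3f}")
print(f"distorted ratio: {distorted_ratio:.3f}")
```

With a zero baseline the O bar is 1.625 times as long as the B bar, matching the data. With the axis starting at 6, the drawn lengths are 7 and 2, so the O bar looks 3.5 times as long.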
Pie Charts
Definition
• A circular graph partitioned into wedges (sectors), where each wedge represents a category.
• The angle of each wedge (and thus its area) is proportional to the frequency or percentage that the category represents of the whole circle (360^\circ).
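A short sketch of the wedge-angle calculation, assuming each angle is the category's relative frequency times 360 degrees. It reuses the blood-type counts from the earlier example; the function name `wedge_angles` is illustrative.

```python
def wedge_angles(counts):
    """Map each category to its wedge angle in degrees: 360 * f / n."""
    n = sum(counts.values())
    return {category: 360 * f / n for category, f in counts.items()}

angles = wedge_angles({"O": 13, "B": 8, "A": 7, "AB": 2})
for category, angle in angles.items():
    print(f"{category:<3} {angle:.0f} degrees")
```

For the 30 students this gives 156, 96, 84, and 24 degrees for O, B, A, and AB, which together fill the full 360-degree circle.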
When to use
• Primarily used to highlight part-to-whole relationships, illustrating how each category contributes to the total sum. It visually represents proportions.
• Best suited to audiences that want a simple, at-a-glance view of proportions and do not need exact numerical comparisons. Works best with a small number of categories (about six at most is recommended for clarity).
Strengths
• Provides an immediate visual sense of the weight or proportion each class holds relative to the entire dataset.
• Requires minimal statistical background to interpret, making them accessible to a broad audience.
Limitations
• They do not effectively reveal patterns or trends across categories, such as changes over time or relationships between categories.
• Easy to mislead viewers through techniques such as re-ordering slices, using