Displaying Qualitative & Quantitative Variables – Comprehensive Study Notes

Frequency Distribution Tables (FDT) – Categorical Variables

  • Purpose

    • Organize raw, unordered qualitative observations into an easy-to-scan table of counts, giving a structured overview of the data.

    • Shows how individuals or objects are spread across various distinct categories or classes, making patterns and distributions immediately visible.

  • Essential columns

    Class/Category name (qualitative label): A descriptive name for each group or type of observation.

    Frequency f: The raw count, indicating the number of observations that fall into each specific category. This is derived by tallying the occurrences of each category.

    Relative Frequency \text{rf}=f/n: The proportion of observations in a given category, calculated by dividing the category's frequency (f) by the total number of observations (n). This can also be expressed as a percentage (\text{rf} \times 100\%).
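The three columns above can be produced mechanically from a list of raw observations. The following is a minimal sketch, assuming the data are stored as a Python list of category labels. The helper name `frequency_table` and the 8-item sample are hypothetical and for illustration only; they are not the lecture data. `collections.Counter` does the tallying, and `most_common()` sorts the rows from most to least frequent.

```python
from collections import Counter


def frequency_table(observations):
    """Hypothetical helper: build an FDT for qualitative data.

    Returns (category, f, rf) tuples sorted from most to least frequent,
    where f is the tally and rf = f / n.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(observations)  # tally each category
    # most_common() with no argument lists every category by descending count
    return [(category, f, f / n) for category, f in counts.most_common()]


# Hypothetical sample of 8 observations (not the lecture data)
sample = ["A", "O", "O", "B", "O", "A", "AB", "O"]
for category, f, rf in frequency_table(sample):
    print(f"{category:<3} f={f}  rf={rf:.3f} ({rf * 100:.1f}%)")
```

Because the rows come out sorted, the first and last rows directly show the most and least frequent categories.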

  • Interpreting an FDT

    • Allows for quick identification of the categories with the highest and lowest frequencies, highlighting dominant or rare groups.

    • Relative frequencies are particularly useful for comparing distributions across different data sets, even if they have varying total sample sizes (n), as they standardize the counts into proportions.
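To see why proportions matter when sample sizes differ, compare type O in the lecture class (13 of 30) with a second class. The second class's numbers (20 of 60) are hypothetical and chosen only to illustrate the point. The second class has the larger raw count, yet type O makes up a smaller share of it.

```python
def relative_frequency(f, n):
    """Proportion of n observations that fall in a category (rf = f / n)."""
    return f / n


# Lecture class: 13 of 30 students are type O.
# Second class: hypothetical numbers, for illustration only.
class_1 = relative_frequency(13, 30)
class_2 = relative_frequency(20, 60)

print(f"Class 1: {class_1:.1%}")  # 43.3%
print(f"Class 2: {class_2:.1%}")  # 33.3%
```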

  • Classroom blood-type example (30 observations)

    • RAW DATA: A, O, AB, A, A, …, B (An unsorted list of individual blood types from 30 students).

    • Finished FDT (computed during lecture):

    • O: 13 observations, representing the most frequent blood type.

    • B: 8 observations.

    • A: 7 observations.

    • AB: 2 observations, representing the least frequent blood type.

    – Corresponding Relative Frequencies (calculated as (f/30) \times 100\%):

    • O: 43.3\% (13/30)

    • B: 26.7\% (8/30)

    • A: 23.3\% (7/30)

    • AB: 6.7\% (2/30)

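    – The relative frequencies should sum to 100\%, although rounding can make the total differ slightly (e.g., 99.9\% or 100.1\%). Here 43.3 + 26.7 + 23.3 + 6.7 = 100.0\%, and the counts sum to 13 + 8 + 7 + 2 = 30.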
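The relative-frequency column for the blood-type example can be checked with a few lines of arithmetic. This sketch starts from the finished counts rather than the raw list, because the raw data are only partially shown. It rounds each percentage to one decimal place, as in the notes.

```python
# Frequencies from the lecture's finished FDT (n = 30)
blood_types = {"O": 13, "B": 8, "A": 7, "AB": 2}
n = sum(blood_types.values())  # 30

# rf as a percentage, rounded to one decimal place
percentages = {bt: round(f / n * 100, 1) for bt, f in blood_types.items()}
for bt, pct in percentages.items():
    print(f"{bt:<2} {blood_types[bt]:>2}  {pct:5.1f}%")

total = sum(percentages.values())
print(f"Total: {total:.1f}%")  # rounded shares can drift slightly from 100
```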

Bar Graphs

  • Definition & Anatomy

    • Consist of vertical or horizontal bars where the height or length of each bar is directly proportional to the frequency (or relative frequency) of a category.

    • One axis (typically the horizontal axis for vertical bars) lists the distinct, non-overlapping categories, while the other axis shows a numeric scale for frequency, starting from zero.

    • Bars are separated by gaps to visually indicate that the categories are distinct and qualitative.
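The anatomy described above can be imitated in plain text, without a plotting library. The following is a minimal sketch: it draws horizontal bars whose length is exactly `f * chars_per_unit` characters, so every bar starts at zero and is directly proportional to its frequency. Each category gets its own line, which plays the role of the gaps between bars. The function name and its parameters are hypothetical.

```python
def text_bar_chart(freqs, chars_per_unit=2, symbol="#"):
    """Hypothetical helper: render a horizontal bar chart as text lines.

    Each bar is f * chars_per_unit symbols long, so length is directly
    proportional to frequency and every bar starts from zero.
    """
    label_width = max(len(label) for label in freqs)
    lines = []
    for label, f in freqs.items():
        bar = symbol * (f * chars_per_unit)
        lines.append(f"{label:<{label_width}} | {bar} {f}")
    return lines


blood_types = {"O": 13, "B": 8, "A": 7, "AB": 2}
for line in text_bar_chart(blood_types):
    print(line)
```

Using a fixed number of characters per observation, rather than stretching the longest bar to a fixed width, keeps the proportionality exact for integer counts.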

  • Strengths

    • Patterns (such as the most and least common categories) and extreme values stand out much faster than they do in a table.

    • Extremely useful for directly comparing magnitudes and visually ranking categories; for example, it is easy to see whether one category has twice the frequency of another.

  • Limitations / Pitfalls

    • Too many categories can lead to chart clutter, making it difficult to distinguish individual bars or their labels.

    • A very wide range of frequencies can let the tallest bar dominate and shrink the others until they are hard to see or compare, especially when the scale is not chosen carefully.

  • Best-practice checklist

    Clear Axis Labels: Both axes must be clearly labeled (e.g., "Blood Type" and "Number of Students").

    Descriptive Title: A concise title that explains the graph's content.

    Proper Scaling: The frequency axis should use evenly spaced, readable increments (e.g., steps of 2 or 5 for the blood-type counts).

    Uniform Bar Width: All bars must have the same width to prevent misinterpretation of magnitude.

    Zero Baseline: The frequency axis must start at 0 to accurately represent magnitudes; omitting this can severely distort perceived differences.

    Avoid 3-D effects: Three-dimensional effects or artistic enhancements can distort proportions and make interpretation difficult.

    Accessible Colour Palettes: Use colour schemes that are colour-blind friendly and provide sufficient contrast.
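The zero-baseline rule can be quantified. If the frequency axis starts at some value b instead of 0, the drawn bar lengths become f - b, so the visual ratio of two bars is (f_1 - b)/(f_2 - b) instead of f_1/f_2. This sketch applies that formula to B (8) and A (7) from the blood-type example. The function name and the baseline of 6 are hypothetical choices made only for illustration.

```python
def apparent_ratio(a, b, baseline=0):
    """Ratio of the drawn bar lengths when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both frequencies")
    return (a - baseline) / (b - baseline)


# B (f = 8) versus A (f = 7) from the blood-type example
print(f"{apparent_ratio(8, 7):.2f}")              # 1.14: true ratio
print(f"{apparent_ratio(8, 7, baseline=6):.2f}")  # 2.00: axis starts at 6
```

With the truncated axis, B's bar looks twice as long as A's, even though B has only one more observation.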

Pie Charts

  • Definition

    • A circular graph partitioned into wedges (sectors), where each wedge represents a category.

    • The angle of each wedge (and thus its area) is proportional to the frequency or percentage that the category represents of the whole (a full circle of 360^\circ).
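The wedge angle follows directly from the relative frequency: \theta = (f/n) \times 360^\circ. This sketch applies that formula to the lecture's blood-type counts. It multiplies before dividing (f * 360 / n) so that these particular angles come out as exact whole numbers.

```python
blood_types = {"O": 13, "B": 8, "A": 7, "AB": 2}
n = sum(blood_types.values())  # 30

# Each wedge's angle is its share of the full 360-degree circle: f / n * 360
angles = {bt: f * 360 / n for bt, f in blood_types.items()}
for bt, angle in angles.items():
    print(f"{bt:<2} {angle:5.1f} degrees")
print(f"Total: {sum(angles.values()):.1f} degrees")
```

The angles (156, 96, 84 and 24 degrees) add up to the full 360 degrees, mirroring the relative frequencies that add up to 100\%.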

  • When to use

    • Primarily used to highlight part-to-whole relationships, showing how much each category contributes to the total.

    • Best suited for audiences who want a simple, at-a-glance view of proportions rather than exact numerical comparisons, and only when there are few categories (a common guideline is no more than about six).

  • Strengths

    • Provides an immediate visual sense of the weight or proportion each class holds relative to the entire dataset.

    • Requires minimal statistical background to interpret, making it accessible to a broad audience.

  • Limitations

    • They do not effectively reveal patterns or trends across categories, such as changes over time or relationships between categories.

    • Easy to mislead viewers through techniques such as re-ordering slices, using