Notes: Visual Display of Data — Frequency Distributions, Bar Charts, Histograms, Scatter Plots

1 Frequency Distribution

  • Definition: A frequency distribution is a way of organizing data so that we can see how often each value (or group of values) occurs. Instead of looking at raw numbers in a long list, the data are grouped into categories or intervals, and the number of times each category appears (its frequency) is recorded.

2 Tabular Form

  • This shows the data in a table with categories (or intervals) and their corresponding frequencies.

  • Example: test scores of 20 students

    • Table 1: Example of a frequency distribution in tabular form

    • Score Range | Frequency

    • 0–10 | 2

    • 11–20 | 4

    • 21–30 | 6

    • 31–40 | 5

    • 41–50 | 3

  • Interpretation: This table tells us, for example, that 6 students scored between 21 and 30.
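As a minimal sketch, the frequency table can be held in Python as a list of (score range, frequency) pairs. The variable names are illustrative assumptions; the values are copied from Table 1, with a plain hyphen in place of the dash in each range label.

```python
# Table 1 as (score range, frequency) pairs; values copied from the table.
table_1 = [
    ("0-10", 2),
    ("11-20", 4),
    ("21-30", 6),
    ("31-40", 5),
    ("41-50", 3),
]

# The frequencies should add up to the number of students (20).
total = sum(freq for _, freq in table_1)

# A dictionary makes it easy to look up the frequency of a single class.
frequency_of = dict(table_1)

print(total)                  # 20
print(frequency_of["21-30"])  # 6
```

Checking that the frequencies sum to the number of observations is a quick way to catch a value that was missed or counted twice.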

3 Graphical Form

  • Data can also be displayed graphically. Common chart types include bar charts, histograms, and scatter plots.

3.1 Bar Chart

  • A bar chart is a graphical way of showing a frequency distribution.

  • It uses rectangular bars of equal width, where the height (or length) of each bar represents the frequency of a category or group.

  • Bar charts are best for categorical data (like favorite colors, fruits, or movie genres).

  • The categories are placed along the x-axis, and the frequencies are shown on the y-axis.

  • Bars are separated by gaps (unlike histograms, where bars touch).

  • Example: Survey of 10 students about their favorite fruit.

    • List: Apple, Orange, Banana, Apple, Mango, Apple, Banana, Mango, Apple, Orange

    • Table 2: Frequency distribution of favorite fruits

    • Fruit | Frequency

    • Apple | 4

    • Orange | 2

    • Banana | 2

    • Mango | 2

    • Figure 1: Bar chart of favorite fruits
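The tally for the fruit survey can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library to count the responses and draws a rough text version of the bar chart, one `#` per student. The helper name `text_bar_chart` is an assumption made for this illustration, and the bars run horizontally, so bar length rather than height shows the frequency.

```python
from collections import Counter

# Survey responses copied from the list in the example.
responses = [
    "Apple", "Orange", "Banana", "Apple", "Mango",
    "Apple", "Banana", "Mango", "Apple", "Orange",
]

# Counter tallies how many times each category occurs.
counts = Counter(responses)


def text_bar_chart(frequencies):
    """Return one line per category, with one '#' per observation."""
    width = max(len(name) for name in frequencies)
    return [
        f"{name.ljust(width)} | {'#' * n} {n}"
        for name, n in frequencies.items()
    ]


for line in text_bar_chart(counts):
    print(line)
```

Each category gets its own separate bar, which mirrors the gaps between bars in a drawn bar chart.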

3.2 Histogram

  • A histogram is a graphical way of showing a frequency distribution for numerical data.

  • Unlike bar charts (for categorical data), histograms group numbers into intervals (called bins or classes), and the bins touch each other to show the continuous nature of the data.

  • Details:

    • The x-axis shows the intervals (e.g., score ranges).

    • The y-axis shows the frequency (how many data points fall in each interval).

    • Bins are adjacent (no gaps) because the data are numerical and continuous.

How to choose the number of classes, class width, and class limits for a histogram

  • Step (1): Decide on the number of classes (bins).

    • If the dataset is large, choose around 10 to 20 classes; if the dataset is small, choose around 4 to 6 classes.

    • Rule of thumb: Number of bins = number of observations ÷ desired class size (at least 4 observations per class).

  • Step (2): Compute the width of each class:

    • Class width = Range of data ÷ Number of classes

    • Always round this result up to a convenient number.

    • In formula: $w = \left\lceil \frac{\text{Range}}{k} \right\rceil$, where $k$ is the number of classes.

  • Step (3): Select the smallest data value as the lower limit of the first class, then add multiples of the class width to generate the lower limits of the remaining classes.

  • Step (4): Find the upper class limits by adding the class width to each lower limit, then subtract the smallest significant unit in the data (for whole numbers, subtract 1). This avoids overlap of class intervals.

    • In formula: $U_i = L_i + w - s$, where $L_i$ and $U_i$ are the lower and upper limits of class $i$, $w$ is the class width, and $s$ is the smallest unit (for whole numbers, $s = 1$).

  • Step (5): Define class boundaries by taking the midpoint between the upper limit of one class and the lower limit of the next class. This ensures the intervals touch and all data values are included without ambiguity.

  • Example: Suppose 20 students had the following test scores:
    12, 25, 33, 41, 27, 38, 45, 21, 29, 19, 10, 30, 22, 36, 40, 14, 18, 32, 47, 24

    • Step (1): Number of classes. n = 20 observations. Since the dataset is small, we choose 5 classes.

    • Step (2): Class width. Range = 47 − 10 = 37. Class width = 37 ÷ 5 = 7.4 → round up to 8.

    • Step (3): Lower limits: 10, 18, 26, 34, 42.

    • Step (4): Upper class limits: 10 + 8 − 1 = 17; 18 + 8 − 1 = 25; 26 + 8 − 1 = 33; 34 + 8 − 1 = 41; 42 + 8 − 1 = 49.

    • Step (5): Class boundaries (±0.5 rule): [9.5, 17.5), [17.5, 25.5), [25.5, 33.5), [33.5, 41.5), [41.5, 49.5).

    • Frequency Table:

    • Class Interval (limits) | Class Boundaries | Frequency

    • 10–17 | 9.5–17.5 | 3

    • 18–25 | 17.5–25.5 | 6

    • 26–33 | 25.5–33.5 | 5

    • 34–41 | 33.5–41.5 | 4

    • 42–49 | 41.5–49.5 | 2

    • Figure 2: Histogram of test scores with class width = 8
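Steps (2) to (5) can be sketched in Python and checked against the worked example. This is a minimal sketch, not a definitive implementation: the function name `build_classes` and its parameters are assumptions made for this illustration, and "round up to a convenient number" is taken to mean rounding up to the next whole number.

```python
import math


def build_classes(data, k, unit=1):
    """Return (lower, upper, low_boundary, high_boundary, frequency) per class.

    k is the number of classes chosen in Step (1); unit is the smallest
    significant unit of the data (1 for whole numbers).
    """
    # Step (2): class width = range / k, rounded up to a whole number.
    width = math.ceil((max(data) - min(data)) / k)
    classes = []
    for i in range(k):
        lower = min(data) + i * width   # Step (3): lower limits
        upper = lower + width - unit    # Step (4): upper limits
        low_b = lower - unit / 2        # Step (5): boundaries lie halfway
        high_b = upper + unit / 2       # between neighbouring limits
        # Count the values that fall inside the class limits.
        freq = sum(lower <= x <= upper for x in data)
        classes.append((lower, upper, low_b, high_b, freq))
    return classes


# Test scores copied from the example.
scores = [12, 25, 33, 41, 27, 38, 45, 21, 29, 19,
          10, 30, 22, 36, 40, 14, 18, 32, 47, 24]

for lower, upper, low_b, high_b, freq in build_classes(scores, k=5):
    print(f"{lower}-{upper} | {low_b}-{high_b} | {freq}")
```

With `k=5` this reproduces the limits, boundaries, and frequencies in the table above. One caveat: when the range divides evenly by the number of classes, the largest value falls just outside the last class, so a wider class or an extra class would be needed; the sketch does not handle that case.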

3.3 Scatter Plots

  • A scatter plot is a graph that shows the relationship between two variables.

    • Each data point is shown as a dot on the coordinate plane:

    • The x-axis represents one variable.

    • The y-axis represents the other variable.

    • By looking at the pattern of points, we can see if the variables are related.

  • Uses of scatter plots:

    • To check if two variables have a positive relationship (as one increases, the other increases).

    • To check for a negative relationship (as one increases, the other decreases).

    • To detect if there is no clear relationship (points scattered randomly).

    • To identify possible outliers (points far away from the rest).

  • Examples (Figure 3): scatter plots showing a positive relationship (blue), a negative relationship (red), and no clear relationship (green) between two variables.

    • Positive Relationship: points trend upward from left to right.

    • Negative Relationship: points trend downward from left to right.

    • No Clear Relationship: points show no discernible pattern.
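One rough way to put a number on the direction of the pattern is the sign of the sample covariance between the two variables. This technique is brought in here for illustration only and is not something these notes prescribe. The function name, the tolerance `tol`, and all data values below are made up.

```python
def relationship_direction(xs, ys, tol=1e-9):
    """Classify the trend of paired data by the sign of the sample covariance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance: average product of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    if cov > tol:
        return "positive"
    if cov < -tol:
        return "negative"
    return "no linear trend"


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 7, 9]      # rises as x rises
ys_down = [9, 7, 5, 4, 2]    # falls as x rises
ys_flat = [4, 1, 6, 1, 4]    # no upward or downward trend

print(relationship_direction(xs, ys_up))
print(relationship_direction(xs, ys_down))
print(relationship_direction(xs, ys_flat))
```

Real data with no clear relationship rarely gives a covariance of exactly zero, so in practice the plot itself remains the main check. The sign only summarizes a straight-line trend and can miss curved patterns and outliers.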
