Constructing Graphical and Tabular Displays of Data

Frequency and Relative Frequency

  • The frequency of a class is the total count of observations within that class.

  • The relative frequency of a class is the proportion of total observations belonging to that class.

  • For a numerical variable, the sum of all relative frequencies is equal to 1.
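The definitions above can be illustrated with a minimal sketch. The data set is hypothetical (ten made-up observations of a discrete numerical variable), and the variable names are chosen only for illustration.

```python
from collections import Counter

# Hypothetical observations (not from the text): values of a discrete
# numerical variable, where each distinct value is treated as its own class.
observations = [0, 1, 1, 2, 0, 1, 3, 1, 2, 1]

# Frequency: the count of observations in each class.
frequency = Counter(observations)

# Relative frequency: each class count divided by the total number of observations.
total = sum(frequency.values())
relative_frequency = {cls: count / total for cls, count in frequency.items()}

for cls in sorted(frequency):
    print(cls, frequency[cls], relative_frequency[cls])

# The relative frequencies should add up to 1 (up to floating-point rounding).
print(sum(relative_frequency.values()))
```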

Constructing Histograms

  • To construct a frequency histogram, write lower class limits (e.g., 30, 40, 50, ..., 100, and 110) equally spaced on the horizontal axis and frequencies (e.g., 0, 2, 4, 6, 8, 10) on the vertical axis.

  • To construct a relative frequency histogram, the vertical axis typically displays proportions (e.g., 0.05, 0.10, 0.15, ..., 0.30).

  • Histograms consist of rectangles where the width represents the class width and the height represents the frequency or relative frequency.
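As a rough sketch of the construction described above, the following groups data into classes by lower class limit and computes each bar's height. The scores, the class width of 10, and the range of lower limits are all assumptions made for illustration. The text rendering is rotated: each row stands in for one bar, with the lower class limit on the left.

```python
# Hypothetical scores (not from the text), grouped into classes of width 10.
scores = [34, 41, 47, 52, 55, 58, 63, 66, 71, 78, 85, 92]
class_width = 10
lower_limits = list(range(30, 100, class_width))  # 30, 40, ..., 90

# Frequency of each class: observations s with lower <= s < lower + class_width.
frequencies = {
    lower: sum(lower <= s < lower + class_width for s in scores)
    for lower in lower_limits
}

# Relative frequency of each class: frequency divided by the number of observations.
n = len(scores)
relative_frequencies = {lower: f / n for lower, f in frequencies.items()}

# Rotated text histogram: bar length equals the class frequency.
for lower in lower_limits:
    bar = "#" * frequencies[lower]
    print(f"{lower:>3} | {bar:<4} {frequencies[lower]} ({relative_frequencies[lower]:.2f})")
```

Switching from a frequency histogram to a relative frequency histogram changes only the scale of the bar heights, not the shape of the display.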

Density Histograms

  • In a density histogram, the area of each bar is equal to the relative frequency of that bar's class.

  • The total area of all bars in a density histogram must equal 1.

  • Proportions can be determined using bar areas:

    • In a study of 2018 Major League Baseball® (MLB) ticket prices, the proportion of stadiums with prices between 30 and 39.99 inclusive was found by summing bar areas: 0.20 + 0.13 = 0.33.

    • The proportion for prices less than 20 was 0.03.

    • The proportion for prices "at least" 20 can be found by summing all bars from 20 upward (0.17 + 0.27 + 0.20 + 0.13 + 0.10 + 0.03 + 0.07 = 0.97) or by using the complement: 1 - 0.03 = 0.97.
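The area arithmetic in the ticket-price example can be checked with a short sketch. The bar areas are the ones quoted above. The class boundaries are an assumption (width 5, with the lowest class starting at 15), chosen because they are consistent with two bars covering 30 to 39.99; the original histogram may use different limits.

```python
# Bar areas (relative frequencies) quoted in the ticket-price example.
# ASSUMPTION: classes have width 5 and the lowest class starts at 15.
class_width = 5
areas = {15: 0.03, 20: 0.17, 25: 0.27, 30: 0.20, 35: 0.13, 40: 0.10, 45: 0.03, 50: 0.07}

# Density (bar height) = area / class width, so height * width recovers the area.
densities = {lower: area / class_width for lower, area in areas.items()}


def proportion(low, high):
    """Sum the areas of all bars whose lower class limit lies in [low, high)."""
    return sum(area for lower, area in areas.items() if low <= lower < high)


total_area = sum(height * class_width for height in densities.values())
between_30_and_40 = proportion(30, 40)   # the bars starting at 30 and 35
below_20 = proportion(15, 20)            # the single bar starting at 15
at_least_20 = 1 - below_20               # complement of "less than 20"

print(round(total_area, 2))
print(round(between_30_and_40, 2))
print(round(below_20, 2))
print(round(at_least_20, 2))
```

The complement is usually the shorter route whenever the event of interest covers most of the bars.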

Distribution Shapes and Modality

  • Unimodal: A distribution with one mound.

  • Bimodal: A distribution with two mounds.

  • Multimodal: A distribution with more than two mounds.

  • Skewed-left: The left tail is longer than the right tail.

  • Skewed-right: The right tail is longer than the left tail.

  • Symmetric: The left tail is approximately a mirror image of the right tail.

Analyzing Distributions of Numerical Variables

  • The four characteristics of a distribution should be determined in the following order:

    1. Identify all outliers. Correct or remove those stemming from errors; consider separate studies for others.

    2. Determine the shape. Evaluate if subgroups should be analyzed separately for bimodal or multimodal data.

    3. Measure and interpret the center.

    4. Describe the spread.
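The order of the four steps can be sketched in code. This is only an illustration under several assumptions not stated in the text: the data are made up, outliers are flagged with the common 1.5 × IQR fences, the center is measured with the median, and the spread with the range. Shape is left as a comment because it is usually judged from a histogram rather than computed.

```python
import statistics

# Hypothetical data (not from the text); 95 is planted as a suspicious value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]

# Step 1: identify outliers.
# ASSUMPTION: the 1.5 * IQR fence rule is used here; the text names no rule.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# ASSUMPTION: the flagged value is a recording error, so it is removed.
# A genuine outlier would instead be kept or studied separately.
cleaned = [x for x in data if x not in outliers]

# Step 2: determine the shape (inspect a histogram of `cleaned`; not computed here).

# Step 3: measure the center (median chosen for illustration).
center = statistics.median(cleaned)

# Step 4: describe the spread (range chosen for illustration).
spread = max(cleaned) - min(cleaned)

print(outliers, center, spread)
```

Handling outliers first matters because a single extreme value can distort the shape, center, and spread computed afterward.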

Mathematical Models

  • A model is defined as a mathematical description of an authentic situation.