Displaying and Summarizing Quantitative Data

  • Displaying and Summarizing Quantitative Data

    • Quantitative (numerical) variables take numeric values, often many distinct ones.
    • Example: Prices of shirts: 50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90
    • To display quantitative data, use various graphs:
    • Bar Charts, Dot Plots, Stem-and-Leaf Displays, Histograms, Time Plots, Box Plots, Scatterplots
  • Dot Plots

    • Each dot represents one individual observation.
    • Construction:
    1. Draw a horizontal or vertical line.
    2. Label the variable and mark its values.
    3. Place dots above each value according to its frequency.
    • Works well for small datasets (n ≤ 50).
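The construction steps above can be sketched in text form. This is a minimal illustration using the shirt prices from the example; the helper name `text_dot_plot` is hypothetical, and the axis is drawn vertically, so dots sit beside each value rather than above it.

```python
from collections import Counter

prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def text_dot_plot(values):
    """Return a text dot plot: one row per distinct value, one '*' per observation."""
    counts = Counter(values)              # frequency of each distinct value
    rows = []
    for value in sorted(counts):          # mark the values along the axis
        rows.append(f"{value:>3} | {'*' * counts[value]}")
    return "\n".join(rows)

print(text_dot_plot(prices))
```

The rows for 35 and 55 each carry three dots, which already hints at two peaks in the data.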
  • Describing Shapes and Spread

    • Distribution can be:
    • Uniform: roughly flat, with no distinct peak
    • Unimodal: one peak
    • Bimodal: two peaks
    • Multimodal: more than two peaks
    • Symmetry:
    • Symmetric: mirrored on both sides
    • Skewness:
      • Positively skewed: longer tail on right
      • Negatively skewed: longer tail on left
    • Outliers: observations that fall far from the overall pattern.
  • Stem-and-Leaf Displays

    • Splits each observation into a stem (leading digits) and leaf (last digit).
    • Steps to create:
    1. Order data.
    2. Divide each observation into a stem and a leaf.
    3. List stems in a column and arrange leaves accordingly on their rows.
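As a rough sketch of these three steps, the shirt prices can be split into tens (stem) and units (leaf). The function name `stem_and_leaf` is hypothetical, and the code assumes non-negative integers with the last digit as the leaf.

```python
from collections import defaultdict

prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def stem_and_leaf(values):
    """Map each stem (leading digits) to its ordered list of leaves (last digit)."""
    stems = defaultdict(list)
    for value in sorted(values):           # step 1: order the data
        stem, leaf = divmod(value, 10)     # step 2: split into stem and leaf
        stems[stem].append(leaf)
    # step 3: list every stem from smallest to largest, keeping empty stems
    lo, hi = min(stems), max(stems)
    return {stem: stems.get(stem, []) for stem in range(lo, hi + 1)}

for stem, leaves in stem_and_leaf(prices).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Empty stems (7 and 8 here) are kept so that the gap before 90 stays visible in the display.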
  • Histograms

    • Common for depicting numerical data distributions.
    • Bars represent frequency within specific intervals (bins).
    • Steps to construct:
    1. Define intervals (bins) of equal width that cover the data.
    2. Create frequency table for the intervals.
    3. Draw bars for each interval showing frequency.
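The steps above might look like this for the shirt prices. The bin start (20), width (10), and the convention that each bin includes its left endpoint but not its right are assumptions chosen for illustration; `frequency_table` is a hypothetical helper.

```python
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def frequency_table(values, start, width, n_bins):
    """Count observations in n_bins equal-width intervals [left, right)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)      # which bin the value falls in
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} falls outside the bins")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

table = frequency_table(prices, start=20, width=10, n_bins=8)
for left, right, count in table:
    # one '#' per observation stands in for the height of the bar
    print(f"[{left:>3}, {right:>3}): {'#' * count}")
```

Changing `start` or `width` can change the apparent shape, so it is worth trying more than one choice of bins.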
  • Numerical Summaries

    • Purpose: Reduce large datasets into key measures.
    • Notation:
    • Let $y$ be the variable and $n$ the sample size.
    • $y_1, y_2, \ldots, y_n$ are the data points.
  • Measures of Center

    • Mean: $\bar{y} = \frac{\text{Sum of observations}}{n}$.
    • Median: M, the middle value when the data are ordered (the average of the two middle values when n is even).
    • Mode: Most frequent value in the dataset.
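For a quick check of these definitions, the three measures can be computed for the shirt prices with Python's standard `statistics` module. `multimode` is used here instead of `mode` because this dataset turns out to have two equally frequent values.

```python
import statistics

prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

mean = statistics.mean(prices)         # sum of observations / n = 570 / 12
median = statistics.median(prices)     # n is even, so the average of 40 and 50
modes = statistics.multimode(prices)   # every value tied for most frequent

print(mean)    # 47.5
print(median)  # 45.0
print(modes)   # [35, 55]
```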
  • Comparing Mean, Median, and Mode

    • The mean is sensitive to outliers; extreme values pull it toward the tail.
    • Prefer median for skewed distributions since it resists outliers.
    • Symmetric, unimodal distributions: mean, median, and mode are approximately equal.
    • Skewed distributions (typical ordering):
    • Right skewed: Mean > Median > Mode
    • Left skewed: Mean < Median < Mode
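To illustrate the sensitivity of the mean, the sketch below replaces the largest shirt price with a hypothetical data-entry error (90 mistyped as 900). The value 900 is an assumption made up for this illustration and is not part of the original data.

```python
import statistics

prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

# Hypothetical: the 90 is recorded as 900 by mistake.
with_error = [900 if p == 90 else p for p in prices]

mean_before = statistics.mean(prices)          # 47.5
mean_after = statistics.mean(with_error)       # 115
median_before = statistics.median(prices)      # 45.0
median_after = statistics.median(with_error)   # 45.0, unchanged

print(mean_before, mean_after)
print(median_before, median_after)
```

The mean more than doubles while the median does not move, because the median depends only on the order of the middle values. In the original data the mean (47.5) already sits above the median (45), which is consistent with a longer right tail.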
  • Variability Measurements

    • Variability reflects data spread.
    • Common measures:
    • Range: $\text{Range} = \text{max} - \text{min}$
    • Variance & Standard Deviation:
      • Variance: $s^2 = \frac{\text{Sum of squared deviations}}{n-1}$
      • Standard Deviation: $s = \sqrt{s^2}$
    • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$, the spread of the middle 50% of the data.
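The formulas above can be worked through by hand for the shirt prices. In this sketch the quartiles are taken as the medians of the lower and upper halves of the ordered data; that is one common convention and an assumption here, since other conventions can give slightly different quartiles. The helper names are hypothetical.

```python
import math

prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def median(sorted_values):
    """Middle value of already-sorted data (average of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumes n >= 2)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])  # middle value is skipped when n is odd

n = len(prices)
data_range = max(prices) - min(prices)                        # 90 - 25
y_bar = sum(prices) / n                                       # mean, 47.5
variance = sum((y - y_bar) ** 2 for y in prices) / (n - 1)    # 3625 / 11
std_dev = math.sqrt(variance)
q1, q3 = quartiles(prices)
iqr = q3 - q1

print(data_range)                              # 65
print(round(variance, 2), round(std_dev, 2))
print(q1, q3, iqr)                             # 35.0 55.0 20.0
```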
  • Five-Number Summary

    • Consists of: Minimum, Q1, Median, Q3, Maximum.
    • Useful for boxplot construction:
    1. Draw an axis that covers the range of the data.
    2. Draw a box from Q1 to Q3.
    3. Mark the median with a line inside the box and extend whiskers to the most extreme observations that lie within the fences.
    4. Flag as outliers any data points that fall outside the fences:
      • $\text{Upper Fence} = Q_3 + 1.5 \times \text{IQR}$
      • $\text{Lower Fence} = Q_1 - 1.5 \times \text{IQR}$
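The numbers needed for a boxplot of the shirt prices can be computed as follows. This sketch uses `statistics.quantiles` with its default method; quartile conventions vary between textbooks and software, so the quartiles from another tool may differ slightly.

```python
import statistics

prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

q1, med, q3 = statistics.quantiles(prices, n=4)   # three cut points: Q1, median, Q3
five_number = (min(prices), q1, med, q3, max(prices))

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are flagged as outliers.
outliers = [p for p in prices if p < lower_fence or p > upper_fence]
# Whiskers end at the most extreme observations that are not outliers.
inside = [p for p in prices if lower_fence <= p <= upper_fence]
whiskers = (min(inside), max(inside))

print(five_number)               # (25, 35.0, 45.0, 55.0, 90)
print(lower_fence, upper_fence)  # 5.0 85.0
print(outliers)                  # [90]
print(whiskers)                  # (25, 65)
```

Only the 90 shirt lies beyond the upper fence, so the upper whisker stops at 65 and 90 is drawn as a separate point.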
  • Time Plots

    • Useful for data recorded over time; they show trends and fluctuations.
    • Always place time on the horizontal axis.
  • Conclusion

    • Graph data first to understand distribution.
    • Use mean and standard deviation for symmetric distributions; median and IQR for skewed ones.
    • Report the mean with and without outliers to show how much they affect it.