Displaying and Summarizing Quantitative Data
Module 2 - Section 2
Introduction to Quantitative Data
Quantitative variables often take many values, exemplified by the prices of walking shoes:
Example Prices: 90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 68, 85, 40, 65
With this many values, a graph is needed to display the data and show its distribution.
Types of Graphs by Variable Type
Categorical Variables:
Bar Chart
Pie Chart
Numerical Variables:
Dot Plots
Stem Plots
Histograms
Time Plots
Box Plots
Scatterplots
Dot Plots
Definition: A dot plot is a graphical display that portrays individual observations.
Construction Steps:
Draw a horizontal (or vertical) line.
Label the line with the name of the variable and mark regularly spaced values of the variable on it.
For each observation, place a dot above (or next to) its value on the number line.
Important Notes:
The number of dots above a value indicates the frequency of occurrence of that value.
Dot plots are more effective for small datasets (n ≤ 50).
Example: Dotplot for Prices of Walking Shoes
Data for 17 walking shoes: 90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 68, 85, 40, 65
Dotplot created using the data above displays frequency and distribution.
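The construction steps above can be sketched in a few lines of Python. This is a minimal text-only illustration, not a plotting-library recipe: it uses the vertical layout (values listed down the page, marks placed next to each value), and the helper name `dot_plot_rows` and the `*` mark are assumptions made for this example.

```python
from collections import Counter

# Prices of the 17 walking shoes from the example above.
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 68, 85, 40, 65]


def dot_plot_rows(values):
    """Return one text row per distinct value, with one mark per observation."""
    counts = Counter(values)  # frequency of each distinct value
    return [f"{value:>3} | {'*' * counts[value]}" for value in sorted(counts)]


rows = dot_plot_rows(prices)
for row in rows:
    print(row)
```

The number of marks in each row is the frequency of that price, so the longest row (at 70) shows where the data pile up.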
Describing a Distribution
Important aspects to describe in a distribution include:
Shapes: Determine the nature of the distribution (unimodal, bimodal, multimodal).
Modes: Characterization based on the number of humps or peaks in the distribution.
Symmetry or Skewness: Aspects related to the balance of the distribution.
Deviations or Outliers: Identifying unusual values that deviate from the overall pattern.
Center: The value that divides the data in half, indicating a typical value.
Spread: Measures the range of values and the concentration around the center.
Shapes of Distributions
Unimodal: One peak.
Bimodal: Two peaks; may indicate two different groups within the data.
Multimodal: More than two peaks; rarely occurs.
Uniform: No clear modes (flat distribution).
Skewness in Data
Symmetric: If a vertical line can divide the graph into mirror images on either side.
Positively Skewed: Histogram stretches out more to the right; the upper tail is longer.
Negatively Skewed: Histogram stretches out more to the left; the lower tail is longer.
Outlier: An observation that falls outside the overall pattern.
Center and Spread of Data
Center: Value that splits the data in half, indicating typical values within the dataset.
Measures of central tendency include mean, median, and mode.
Spread: Refers to the variation within the data.
Important measures include range, standard deviation, and interquartile range (IQR).
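As a rough illustration, the measures named above can be computed for the walking shoe prices with Python's standard `statistics` module. Two choices here are assumptions rather than part of the notes: the sample (n - 1) form of the standard deviation, and the module's default "exclusive" quartile method. Other textbooks and software use slightly different quartile conventions, so the IQR may differ from a hand calculation.

```python
import statistics

# Prices of the 17 walking shoes used throughout this section.
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 68, 85, 40, 65]

# Measures of center.
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
mode_price = statistics.mode(prices)

# Measures of spread.
price_range = max(prices) - min(prices)
std_dev = statistics.stdev(prices)  # sample standard deviation (n - 1 divisor)
q1, q2, q3 = statistics.quantiles(prices, n=4)  # default "exclusive" method
iqr = q3 - q1

print(f"mean={mean_price:.2f} median={median_price} mode={mode_price}")
print(f"range={price_range} stdev={std_dev:.2f} IQR={iqr}")
```

For these data the median and the mode are both $70, and the range is 95 - 40 = 55.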
Stem-and-Leaf Displays (Stemplots)
Definition: A stemplot or stem-and-leaf display effectively shows individual observations of quantitative data; suitable for small datasets (n ≤ 50).
Construction Steps:
Order the data from smallest to largest.
Divide each number into two parts: the stem (leading digits) and the leaf (last digit).
Label the bins using the stem.
List stems in a column and record leaf portions in the corresponding rows.
Provide a key to decode the stemplot.
Example: Stemplot for Prices of Walking Shoes
Ordered Prices: 40, 60, 65, 65, 68, 68, 70, 70, 70, 70, 70, 74, 75, 75, 85, 90, 95
Stem and Leaf Representation:
4 | 0
5 |
6 | 0 5 5 8 8
7 | 0 0 0 0 0 4 5 5
8 | 5
9 | 0 5
Key: 7 | 5 represents $75.
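The stemplot construction can also be sketched in Python. This is a minimal version that assumes non-negative whole-number values, with the tens part as the stem and the ones digit as the leaf; the function name `stemplot_rows` is hypothetical.

```python
from collections import defaultdict

# Prices of the 17 walking shoes from the example above.
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 68, 85, 40, 65]


def stemplot_rows(values):
    """Build stemplot rows: stem = leading digits, leaf = last digit."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):  # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(str(leaf))

    rows = []
    # List every stem between the smallest and largest, even if it has
    # no leaves, so that gaps in the data remain visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


rows = stemplot_rows(prices)
for row in rows:
    print(row)
print("Key: 7 | 5 represents $75")
```

Keeping the empty stem 5 in the display makes it easy to see that $40 sits apart from the rest of the prices.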
Back-to-back Stemplot
Definition: A back-to-back stemplot compares the distributions of two groups side by side.
Construction: Use the same principles as a stemplot, with a shared column of stems in the middle; one group's leaves extend to the left and the other group's leaves extend to the right.
Histogram
Definition: The most common graph for numerical data; it visualizes the distribution of the underlying variable.
Description: Utilizes bars to represent frequency or relative frequency of measurements falling within specified equal-width intervals (bins).
Constructing a Histogram
Steps to create:
Decide on intervals of equal length (bins) for data.
Use the left-inclusive method for class intervals: each interval includes its lower boundary but not its upper boundary, so a price of exactly $70 falls in [70; 80), not [60; 70).
Create a frequency table for intervals.
Mark interval boundaries on the horizontal axis and frequency/relative frequency on the vertical axis.
Draw the bars to represent the class intervals with heights according to frequency or relative frequency.
Example: Histogram for Prices of Walking Shoes
Data for Analysis: Prices of walking shoes include: 40, 60, 65, 65, 68, 68, 70, 70, 70, 70, 70, 74, 75, 75, 85, 90, 95
Use a bin width of 10 for the histogram.
The frequency table shows how the prices are distributed across the class intervals:
Class Intervals: [40; 50), [50; 60), [60; 70), [70; 80), [80; 90), [90; 100)
Frequencies: 1, 0, 5, 8, 1, 2
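The left-inclusive binning used for this histogram can be sketched as follows. The function name `bin_counts` is an assumption for illustration; the start (40), stop (100), and width (10) come from the class intervals listed above.

```python
# Prices of the 17 walking shoes from the example above.
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 68, 85, 40, 65]


def bin_counts(values, start, stop, width):
    """Count values in left-inclusive intervals [lower, lower + width)."""
    counts = {(lower, lower + width): 0 for lower in range(start, stop, width)}
    for value in values:
        if not start <= value < stop:
            raise ValueError(f"{value} lies outside [{start}; {stop})")
        # Integer division finds the interval whose lower boundary is at or
        # just below the value, so a boundary value goes to the right-hand bin.
        lower = start + ((value - start) // width) * width
        counts[(lower, lower + width)] += 1
    return counts


counts = bin_counts(prices, start=40, stop=100, width=10)
for (lower, upper), count in counts.items():
    print(f"[{lower}; {upper}) {'#' * count} {count}")
```

Each printed row is a sideways histogram bar; the empty row for [50; 60) is the gap between $40 and the remaining prices.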
Relative Frequency and Proportions
Important calculations:
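Proportion of walking shoe prices at or above $70: add the frequencies for [70; 80), [80; 90), and [90; 100), giving 8 + 1 + 2 = 11 of 17, or about 64.71%.
Percent of prices below $70: add the frequencies for [40; 50), [50; 60), and [60; 70), giving 1 + 0 + 5 = 6 of 17, or about 35.29%.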
Frequency Table Example:
| Class Interval | Frequency | Relative Frequency |
|---|---|---|
| [40; 50) | 1 | 5.88% |
| [50; 60) | 0 | 0.00% |
| [60; 70) | 5 | 29.41% |
| [70; 80) | 8 | 47.06% |
| [80; 90) | 1 | 5.88% |
| [90; 100) | 2 | 11.76% |
| Total | 17 | 100.00% |
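The relative frequencies, and the proportions on either side of $70, follow directly from the frequency column. A minimal sketch, with the table's frequencies typed in by hand and variable names chosen only for this example:

```python
# Frequencies from the table above, keyed by (lower, upper) of each
# left-inclusive class interval.
frequencies = {
    (40, 50): 1,
    (50, 60): 0,
    (60, 70): 5,
    (70, 80): 8,
    (80, 90): 1,
    (90, 100): 2,
}

total = sum(frequencies.values())

# Relative frequency = interval frequency / total number of observations.
relative = {interval: count / total for interval, count in frequencies.items()}

# Because 70 is an interval boundary, the intervals split cleanly at $70.
at_or_above_70 = sum(c for (lower, _), c in frequencies.items() if lower >= 70)
below_70 = total - at_or_above_70

for (lower, upper), share in relative.items():
    print(f"[{lower}; {upper}) {share:.2%}")
print(f"at or above $70: {at_or_above_70}/{total} = {at_or_above_70 / total:.2%}")
print(f"below $70: {below_70}/{total} = {below_70 / total:.2%}")
```

This split only works from the table because $70 is a class boundary; a cutoff such as $72 would need the original data values.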
Important Notes
Data values are clearly retained with dot plots and stem-and-leaf plots but are lost in histograms.
In bar charts (for categorical data), the spaces between bars show that the categories are separate; they carry no numerical meaning.
Conversely, in histograms the bars touch, so a gap indicates an interval that contains no data points.
Conclusion
Methods for visualizing and summarizing quantitative data include dot plots, stem-and-leaf displays, and histograms.