Displaying and Summarizing Quantitative Data
- Quantitative (numerical) variables take numeric values, often many distinct ones.
- Example: Prices of shirts: 50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90
- To display quantitative data, use various graphs:
- Bar Charts, Dot Plots, Stem-and-Leaf Displays, Histograms, Time Plots, Box Plots, Scatterplots
Dot Plots
- Represents individual observations.
- Construction:
- Draw a horizontal or vertical line.
- Label the variable and mark its values.
- Place dots above each value according to its frequency.
- Works well for small datasets (n ≤ 50).
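The construction steps can be sketched in a few lines of Python using the shirt prices from the example. This is only a text-based illustration: the helper name `dot_plot_rows` and the use of the character `o` as a dot are assumptions made here, and the values run down a vertical line with dots placed to the right.

```python
from collections import Counter

# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def dot_plot_rows(values):
    """Return one text row per distinct value: the value, then one dot per occurrence."""
    counts = Counter(values)  # frequency of each distinct value
    return [f"{value:>3} | " + "o" * counts[value] for value in sorted(counts)]

for row in dot_plot_rows(prices):
    print(row)
```

Each row shows a price followed by as many dots as its frequency, so the repeated prices 35 and 55 stand out with three dots each.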
Describing Shapes and Spread
- Distribution can be:
- Uniform: no clear peak; values occur with roughly equal frequency
- Unimodal: one peak
- Bimodal: two peaks
- Multimodal: more than two peaks
- Symmetry:
- Symmetric: mirrored on both sides
- Skewness:
- Positively skewed: longer tail on right
- Negatively skewed: longer tail on left
- Outliers: observations that fall far from the overall pattern.
Stem-and-Leaf Displays
- Splits each observation into a stem (leading digits) and leaf (last digit).
- Steps to create:
- Order data.
- Split each observation into its stem and leaf.
- List stems in a column and arrange leaves accordingly on their rows.
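The three steps can be sketched as follows for the shirt prices. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers, with the tens digits as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def stem_and_leaf(values):
    """Map each stem (tens digits) to its ordered list of leaves (units digits)."""
    rows = defaultdict(list)
    for value in sorted(values):        # Step 1: order the data
        stem, leaf = divmod(value, 10)  # Step 2: split into stem and leaf
        rows[stem].append(leaf)
    return dict(rows)

display = stem_and_leaf(prices)

# Step 3: list the stems in a column, keeping empty stems so gaps stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```

Printing the empty stems 7 and 8 is a design choice: it keeps the gap between 65 and 90 visible, which hints that 90 may be unusual.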
Histograms
- Common for depicting numerical data distributions.
- Bars represent frequency within specific intervals (bins).
- Steps to construct:
- Define intervals (bins) of equal width that cover the data.
- Create frequency table for the intervals.
- Draw bars for each interval showing frequency.
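As a rough illustration of these steps, the sketch below builds a frequency table for the shirt prices and prints a text histogram. The bin choice (width 20, starting at 20) and the helper name `frequency_table` are assumptions made for this example; other bin widths would give a different picture.

```python
# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def frequency_table(values, start, width, n_bins):
    """Count values in n_bins equal-width intervals [lower, upper), beginning at start."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)  # which interval the value falls in
        if 0 <= index < n_bins:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_bins)]

# Assumed bins: [20, 40), [40, 60), [60, 80), [80, 100)
table = frequency_table(prices, start=20, width=20, n_bins=4)

# Draw one bar per interval; bar length equals the frequency.
for lower, upper, count in table:
    print(f"[{lower}, {upper}) " + "#" * count)
```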
Numerical Summaries
- Purpose: Reduce large datasets into key measures.
- Notation:
- Let y be the variable and n the sample size.
- y_1, y_2, \dots, y_n are the data points.
Measures of Center
- Mean: \bar{y} = \frac{\text{Sum of observations}}{n}.
- Median: M, middle value when data is ordered.
- Mode: Most frequent value in the dataset.
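For a concrete case, the three measures can be computed for the shirt prices with Python's standard `statistics` module. The variable names are chosen here for illustration; `multimode` is used because this dataset happens to have two equally frequent values.

```python
import statistics

# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

mean = statistics.mean(prices)        # sum of observations divided by n: 570 / 12
median = statistics.median(prices)    # average of the two middle values of the ordered data
modes = statistics.multimode(prices)  # every value tied for the highest frequency

print(f"mean = {mean}, median = {median}, modes = {modes}")
```

With n = 12 (an even count), the median is the average of the 6th and 7th ordered values, 40 and 50.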
Comparing Mean, Median, and Mode
- The mean is sensitive to outliers; extreme values pull it toward them.
- Prefer median for skewed distributions since it resists outliers.
- Symmetric unimodal distributions: mean ≈ median ≈ mode.
- Skewed distributions:
- Right skewed: typically Mean > Median > Mode
- Left skewed: typically Mean < Median < Mode
Variability Measurements
- Variability reflects data spread.
- Common measures:
- Range: \text{Range} = \text{max} - \text{min}
- Variance & Standard Deviation:
- Variance: s^2 = \frac{\text{Sum of squared deviations}}{n-1}
- Standard Deviation: s = \sqrt{s^2}
- Interquartile Range (IQR): IQR = Q3 - Q1, the spread of the middle 50% of the data.
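The measures above can be checked on the shirt prices with a short sketch. One caveat: textbooks and software use several quartile conventions. This example relies on the default method of `statistics.quantiles`, which is an assumption here, and other conventions can give slightly different Q1 and Q3 on other datasets.

```python
import statistics

# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

data_range = max(prices) - min(prices)  # max - min
variance = statistics.variance(prices)  # sum of squared deviations / (n - 1)
std_dev = statistics.stdev(prices)      # square root of the sample variance

# Quartiles using the module's default convention (assumed for this sketch).
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1

print(f"range = {data_range}, s^2 = {variance:.2f}, s = {std_dev:.2f}, IQR = {iqr}")
```

For this dataset the squared deviations from the mean of 47.5 sum to 3625, so the sample variance is 3625 / 11.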
Five-Number Summary
- Consists of: Minimum, Q1, Median, Q3, Maximum.
- Useful for boxplot construction:
- Draw an axis that spans the range of the data.
- Draw a box from Q1 to Q3.
- Mark the median with a line inside the box, and extend whiskers to the most extreme observations that lie within the fences.
- Identify outliers as data points that fall outside the fences:
- Upper Fence = Q3 + 1.5 \times IQR
- Lower Fence = Q1 - 1.5 \times IQR
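As a minimal sketch, the five-number summary and the fences can be computed for the shirt prices as follows. The helper names `five_number_summary` and `fences` are hypothetical, and the quartiles again come from the default convention of `statistics.quantiles`, which is an assumption rather than the only valid choice.

```python
import statistics

# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

minimum, q1, median, q3, maximum = five_number_summary(prices)
lower_fence, upper_fence = fences(q1, q3)

# Observations outside the fences are flagged as outliers.
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print(minimum, q1, median, q3, maximum)
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Here the IQR is 20, so the fences sit at 5 and 85, and the price of 90 is the only value flagged.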
Time Plots
- Useful for data collected over time; they show trends and fluctuations.
- Always place time on the horizontal axis.
Conclusion
- Graph data first to understand distribution.
- Use mean and standard deviation for symmetric distributions; median and IQR for skewed ones.
- When outliers are present, report the mean with and without them to show their influence.
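The last point can be illustrated with the shirt prices. The sketch below drops any value outside the 1.5 × IQR fences (only the price of 90 under the quartile convention assumed here, the default of `statistics.quantiles`) and compares the summaries. The variable names are chosen for this example.

```python
import statistics

# Shirt prices from the example at the start of these notes.
prices = [50, 35, 35, 40, 35, 25, 30, 55, 55, 65, 55, 90]

# Fences from the 1.5 x IQR rule, using the module's default quartile convention.
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1
trimmed = [p for p in prices if q1 - 1.5 * iqr <= p <= q3 + 1.5 * iqr]

mean_all = statistics.mean(prices)
mean_trimmed = statistics.mean(trimmed)
median_all = statistics.median(prices)
median_trimmed = statistics.median(trimmed)

print(f"mean:   {mean_all:.2f} with outliers, {mean_trimmed:.2f} without")
print(f"median: {median_all} with outliers, {median_trimmed} without")
```

Removing the single flagged value lowers the mean from 47.5 to about 43.6, which is the kind of contrast worth reporting.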