Chapter 4: Describing Data – Displaying and Exploring Data

Definition: A dot plot is a graphical display where each data point is represented as a dot along a number line. Identical values are stacked.
Key Features:
- Retains individual observation identity.
- Useful for small datasets.
- Easily identifies clusters and gaps.
Example: Comparing the number of vehicles serviced at two dealerships using dot plots.
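A dot plot can be sketched in plain text by stacking one symbol per observation next to each value on the number line. The function name `dot_plot` and the sample counts below are hypothetical and are not the dealership data from the chapter; the sketch assumes integer-valued observations.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one row per integer value, one '*' per observation."""
    counts = Counter(values)
    lines = []
    # Walk the whole number line so that gaps (values with no data) stay visible.
    for v in range(min(values), max(values) + 1):
        lines.append((f"{v:>3} | " + "*" * counts[v]).rstrip())
    return "\n".join(lines)

# Hypothetical daily counts of vehicles serviced
print(dot_plot([2, 2, 3, 5]))
```

Rows with no symbols show gaps, and long rows show where the observations cluster.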

Standard Deviation: The most common measure of dispersion.
Alternative Measures:
- Quartiles: Divide data into four equal parts.
- Deciles: Divide data into ten equal parts.
- Percentiles: Divide data into 100 equal parts; a percentile indicates the relative position of a value within the dataset.

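Formula: Lp = (n + 1) × P / 100, where Lp is the location of the desired percentile in the ordered data, n is the number of observations, and P is the percentile.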
Example: Finding the 33rd percentile, median (50th percentile), and quartiles.
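The location formula can be sketched as follows. The function names and the sample data are hypothetical, and the linear interpolation between neighbouring values when the location is fractional is an assumption about how the fractional part is handled.

```python
def percentile_location(n, p):
    """Location of the p-th percentile in ordered data: (n + 1) * p / 100."""
    return (n + 1) * p / 100

def percentile_value(values, p):
    """Value at the p-th percentile, interpolating when the location is fractional."""
    data = sorted(values)
    n = len(data)
    # Clamp the 1-based location so it always falls inside the data.
    loc = min(max(percentile_location(n, p), 1), n)
    lower = int(loc)        # whole part of the location
    frac = loc - lower      # fractional part of the location
    if frac == 0:
        return data[lower - 1]
    # Move the fractional distance from one ordered value toward the next.
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

data = [12, 15, 18, 20, 22, 25, 28]   # hypothetical sample, n = 7
print(percentile_value(data, 25))     # first quartile, location 2 -> 15
print(percentile_value(data, 50))     # median, location 4 -> 20
print(percentile_value(data, 75))     # third quartile, location 6 -> 25
print(percentile_value(data, 33))     # location 2.64, about 16.92
```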

Definition: A box plot is a graphical representation of a data distribution based on the five-number summary:
1. Minimum Value
2. Q1 (First Quartile, 25th Percentile)
3. Median (Q2, 50th Percentile)
4. Q3 (Third Quartile, 75th Percentile)
5. Maximum Value
Interquartile Range (IQR):
- IQR = Q3 − Q1
- Middle 50% of data falls within this range.
Outliers:
- Mild outliers: More than 1.5 × IQR below Q1 or above Q3, but within 3 × IQR.
- Extreme outliers: More than 3 × IQR below Q1 or above Q3.
Example: Creating a box plot for delivery times at Alexander’s Pizza.
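The five-number summary and the outlier limits can be computed as in the sketch below. The delivery times are hypothetical, not the Alexander's Pizza data, and `box_plot_summary` is an illustrative name. The sketch assumes quartiles are located with the (n + 1) × P / 100 rule, which is what the `"exclusive"` method of `statistics.quantiles` uses.

```python
import statistics

def box_plot_summary(values):
    """Five-number summary, IQR, and outliers flagged by the 1.5 and 3 IQR rules."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr
    extreme = [x for x in data if x < extreme_low or x > extreme_high]
    # Mild outliers lie beyond the 1.5 IQR limits but inside the 3 IQR limits.
    mild = [x for x in data
            if (x < mild_low or x > mild_high) and extreme_low <= x <= extreme_high]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr, "mild_outliers": mild, "extreme_outliers": extreme,
    }

# Hypothetical delivery times in minutes
times = [13, 15, 15, 18, 18, 19, 20, 22, 23, 25, 40]
print(box_plot_summary(times))
```

For this sample, Q1 = 15 and Q3 = 23, so IQR = 8 and the mild limits are 3 and 35; the 40-minute delivery is flagged as a mild outlier.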

Definition: Skewness measures the asymmetry of a data distribution.
Types of Skewness:
- Symmetric: Mean = Median = Mode.
- Positively Skewed (Right Skewed): Long tail to the right; Mean > Median > Mode.
- Negatively Skewed (Left Skewed): Long tail to the left; Mean < Median < Mode.
- Bimodal: Two peaks in the distribution.
Skewness Coefficient (Pearson’s Coefficient of Skewness, sk = 3 × (Mean − Median) / s):
- If skewness < -1 or > +1 → Highly skewed.
- If skewness is between -1 and -0.5 or between +0.5 and +1 → Moderately skewed.
- If skewness is between -0.5 and +0.5 → Approximately symmetric.
Example: Computing skewness for earnings per share of software companies.
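A minimal sketch of the calculation, assuming Pearson's coefficient is computed as 3 × (mean − median) / s with s the sample standard deviation. The function names and the data are hypothetical, not the earnings-per-share figures from the chapter; the cut-offs are the ones listed above.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)
    return 3 * (mean - median) / s

def describe_skewness(sk):
    """Apply the rule-of-thumb cut-offs at 0.5 and 1 in absolute value."""
    if sk < -1 or sk > 1:
        return "highly skewed"
    if sk < -0.5 or sk > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

data = [1, 2, 3, 4, 10]            # hypothetical values; the 10 pulls the mean up
sk = pearson_skewness(data)        # mean 4, median 3, so sk is positive
print(round(sk, 3), describe_skewness(sk))
```

Because the mean (4) exceeds the median (3), the coefficient is positive, which matches the right-skewed pattern Mean > Median.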

Definition: A scatter diagram (scatter plot) is a graphical representation of the relationship between two variables.
How to Construct:
- X-axis: Independent variable.
- Y-axis: Dependent variable.
Example: Analyzing the relationship between profit earned on vehicle sales and the buyer’s age.
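The construction can be sketched without a plotting library by placing each (x, y) pair on a character grid. The function name `scatter_text`, the grid size, and the ages and profits are all hypothetical; the sketch assumes both variables take at least two distinct values.

```python
def scatter_text(xs, ys, width=20, height=8):
    """Render (x, y) pairs on a character grid: x runs left to right, y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column or row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # flip rows so larger y is drawn higher
    return "\n".join("".join(r).rstrip() for r in grid)

ages = [25, 32, 41, 47, 55, 63]                  # independent variable (x-axis)
profits = [900, 1400, 1600, 2100, 2300, 2900]    # dependent variable (y-axis)
print(scatter_text(ages, profits))
```

Points rising from the lower left to the upper right suggest a positive relationship between the two variables.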

Definition: A contingency table cross-classifies observations by two categorical variables and shows the count in each combination of categories.
Usage: Helps identify relationships between two variables.
Example: Comparing dealership profits by dealership location.
Key Insights from Contingency Table Analysis:
- Proportion of sales above and below the median at different dealerships.
- Identifying patterns in categorical data.
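A contingency table of this kind can be tallied as in the sketch below, which classifies each sale as above or below the overall median profit and counts the results by location. The location labels and profits are hypothetical, `contingency_table` is an illustrative name, and counting a profit exactly equal to the median as "below" is a simplifying assumption.

```python
from collections import Counter
import statistics

def contingency_table(records):
    """records: (location, profit) pairs. Returns counts of above/below median by location."""
    median_profit = statistics.median([profit for _, profit in records])
    counts = Counter()
    for location, profit in records:
        # Assumption: a profit equal to the median is counted as "below".
        group = "above" if profit > median_profit else "below"
        counts[(location, group)] += 1
    locations = sorted({location for location, _ in records})
    return {loc: {"above": counts[(loc, "above")], "below": counts[(loc, "below")]}
            for loc in locations}

# Hypothetical sales: (dealership location, profit)
sales = [("A", 1500), ("A", 2100), ("B", 900), ("B", 2500), ("B", 1200), ("A", 1900)]
print(contingency_table(sales))
```

Here the median profit is 1700, so location A has two sales above and one below, while location B has one above and two below.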