Business Analytics I - Chapter 2: Displaying Data
Intro
- Technology aids in displaying statistics.
- The purpose of displaying data is to summarize information that isn't easily observable from a simple data table.
- Data summarization is used for:
  - Categorical variables (nominal or ordinal).
  - Quantitative variables (interval or ratio).
  - Two variables using tables.
  - Two variables using graphs.
- The chapter covers techniques for these cases.
Display: Categorical Variables
- Example: A survey of 25 JMU students regarding their state of residence.
- Question: Is state of residence a categorical or a numerical variable?
- Question: What is its scale of measurement?
- Summarizing data for categorical variables involves:
  - Frequency distributions (relative, percent).
  - Bar charts.
  - Pie charts.
Display: Categorical Variables - Frequency Distributions
- Frequency, relative frequency, and percent frequency:
  - A tabular summary of data showing frequencies of observations in each class.
- Example:
  - Maryland: Frequency = 2, Relative Frequency = 2/25 = 0.08, Percent Frequency = 8%
  - North Carolina: Frequency = 6, Relative Frequency = 6/25 = 0.24, Percent Frequency = 24%
  - South Carolina: Frequency = 4, Relative Frequency = 4/25 = 0.16, Percent Frequency = 16%
  - Virginia: Frequency = 12, Relative Frequency = 12/25 = 0.48, Percent Frequency = 48%
  - West Virginia: Frequency = 1, Relative Frequency = 1/25 = 0.04, Percent Frequency = 4%
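The arithmetic in this table can be reproduced with a few lines of Python. The sketch below is not from the textbook. It assumes the 25 survey responses are available as a list of state names (rebuilt here from the counts in the example, so the order of the responses is made up), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Survey responses rebuilt from the frequencies in the example.
# Only the counts come from the text; the ordering is an assumption.
responses = (
    ["Maryland"] * 2
    + ["North Carolina"] * 6
    + ["South Carolina"] * 4
    + ["Virginia"] * 12
    + ["West Virginia"] * 1
)

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in sorted(counts.items())
    }

table = frequency_table(responses)
for state, (freq, rel, pct) in table.items():
    print(f"{state:<15} {freq:>3} {rel:>6.2f} {pct:>5.0f}%")
```

Relative frequency is the class count divided by the total number of observations, and percent frequency is that value multiplied by 100.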
Display: Categorical Variables - Bar Charts
- Bar charts display the frequencies of categorical data and are most useful when the actual numerical values (counts) are of interest.
- Example: A bar chart showing the count of students from each state of residence (Maryland, North Carolina, South Carolina, Virginia, West Virginia).
- Question: Is the data nominal or ordinal?
Display: Categorical Variables - Pie Charts
- Pie charts: Each pie segment represents the percent frequency of one category.
- Used when the relative sizes of the categories, NOT the actual numerical values, are of interest.
- Example: A pie chart showing the percentage of students from each state of residence.
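A pie chart turns each category's relative frequency into a share of the full circle, so a segment's central angle is 360 degrees times its relative frequency. The following is a minimal sketch of that arithmetic using the state-of-residence counts; the function name `pie_angles` is an illustrative assumption, not something from the course materials.

```python
# Frequencies from the state-of-residence example.
state_counts = {
    "Maryland": 2,
    "North Carolina": 6,
    "South Carolina": 4,
    "Virginia": 12,
    "West Virginia": 1,
}

def pie_angles(counts):
    """Return {category: central angle in degrees} for a pie chart."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}

angles = pie_angles(state_counts)
for state, angle in angles.items():
    print(f"{state:<15} {angle:6.1f} degrees")
```

Virginia, with 48% of the responses, takes up just under half of the circle.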
Display: Categorical Variables - Pie Chart vs. Bar Chart
- Pie charts are used for comparing relative sizes of all possible categories.
- Bar charts are used to highlight actual data values.
Display: Categorical Variables - Pareto Charts
- Pareto charts: A type of bar chart that shows frequencies of categories in decreasing order.
- Commonly used in manufacturing scenarios for investigating quality control issues.
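The only difference between a Pareto chart and an ordinary bar chart is the ordering of the bars. As a rough sketch, the ordering step can be written as a sort on frequency. The data reuse the state-of-residence counts from earlier in the chapter (not a quality control data set), and `pareto_order` is a hypothetical helper name.

```python
# Frequencies from the state-of-residence example.
frequencies = {
    "Maryland": 2,
    "North Carolina": 6,
    "South Carolina": 4,
    "Virginia": 12,
    "West Virginia": 1,
}

def pareto_order(freqs):
    """Return (category, frequency) pairs sorted by decreasing frequency."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

ordered = pareto_order(frequencies)
for category, count in ordered:
    print(f"{category:<15} {count}")
```

The bars would then be drawn left to right in this order, with the most frequent category first.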
Display: Quantitative Variables
- Summarizing data for quantitative variables involves:
  - Frequency distributions (relative, percent).
  - Histograms.
  - Ogive (cumulative).
  - Stem-and-leaf display.
Display: Quantitative Variables - Motivation
- Consider the question: How often did this store sell two iPads in one day? Three iPads in one day?
- It takes too long to deduce this information without summarized representations.
Display: Quantitative Variables - Frequency Distribution
- Frequency distribution: Shows the number of observations that fall into specific intervals.
- Allows for faster information retrieval.
Display: Quantitative Variables - Relative Frequency Distribution
- Relative frequency distribution: Shows the proportion of observations that fall into specific intervals.
- Example:
  - Number Sold Per Day: 0, Frequency: 5, Relative Frequency: 5/50 = 0.10
  - Number Sold Per Day: 1, Frequency: 8, Relative Frequency: 8/50 = 0.16
  - Number Sold Per Day: 2, Frequency: 14, Relative Frequency: 14/50 = 0.28
  - Number Sold Per Day: 3, Frequency: 13, Relative Frequency: 13/50 = 0.26
  - Number Sold Per Day: 4, Frequency: 6, Relative Frequency: 6/50 = 0.12
  - Number Sold Per Day: 5, Frequency: 4, Relative Frequency: 4/50 = 0.08
  - Total Frequency: 50, Total Relative Frequency: 1.00
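Once the data are summarized this way, the motivating question about days with two or three iPads sold becomes a simple lookup. The sketch below assumes the frequencies from the example are stored in a dictionary; the names `days_by_units_sold` and `relative_frequencies` are illustrative choices.

```python
# Frequencies from the iPad example: number sold per day -> number of days.
days_by_units_sold = {0: 5, 1: 8, 2: 14, 3: 13, 4: 6, 5: 4}

def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs.values())
    return {value: count / total for value, count in freqs.items()}

rel = relative_frequencies(days_by_units_sold)

# Answer the motivating question directly from the summary.
print(f"Days with 2 iPads sold: {days_by_units_sold[2]} ({rel[2]:.2f} of all days)")
print(f"Days with 3 iPads sold: {days_by_units_sold[3]} ({rel[3]:.2f} of all days)")
print(f"Total relative frequency: {sum(rel.values()):.2f}")
```

Checking that the relative frequencies sum to 1.00 is a quick way to catch a counting error.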
Display: Quantitative Variables - Histogram
- Histogram: A graph showing the number of observations in each class of a frequency distribution.
- Question: Are the number of days and the number of iPads sold per day discrete or continuous?
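Without a plotting tool, the shape of a histogram can still be previewed as text. The following sketch draws one row per class with a bar whose length equals the class frequency, using the iPad sales frequencies from the example. It is a sideways approximation of a histogram, and `text_histogram` is a hypothetical helper name.

```python
# Frequencies from the iPad example: number sold per day -> number of days.
sales_frequency = {0: 5, 1: 8, 2: 14, 3: 13, 4: 6, 5: 4}

def text_histogram(freqs, symbol="*"):
    """Return one line per class: the class label, a bar, and the frequency."""
    return [
        f"{value} | {symbol * count} {count}"
        for value, count in sorted(freqs.items())
    ]

lines = text_histogram(sales_frequency)
print("\n".join(lines))
```

The longest bars fall at two and three iPads per day, which is where the distribution peaks.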
Display: Quantitative Variables - Ogive
- Ogive: A graph plotting the cumulative relative frequency distribution on the y-axis and bin endpoints on the x-axis.
- Question: Is the data discrete or continuous?
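The values plotted on an ogive are running totals of the relative frequencies. A minimal sketch of that calculation for the iPad sales frequencies follows. It treats each number sold per day as its own class (an assumption that suits this discrete data), and `ogive_points` is an illustrative name.

```python
from itertools import accumulate

# Frequencies from the iPad example: number sold per day -> number of days.
days_by_units_sold = {0: 5, 1: 8, 2: 14, 3: 13, 4: 6, 5: 4}

def ogive_points(freqs):
    """Return (class value, cumulative relative frequency) pairs in increasing class order."""
    classes = sorted(freqs)
    total = sum(freqs.values())
    running = accumulate(freqs[c] for c in classes)  # running count of days
    return [(c, count / total) for c, count in zip(classes, running)]

points = ogive_points(days_by_units_sold)
for units, cumulative in points:
    print(f"{units} or fewer sold: {cumulative:.2f}")
```

The last point always reaches 1.00, because every observation falls at or below the largest class.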
Display: Quantitative Variables - Stem-and-Leaf Display
- Stem-and-leaf display:
  - Splits data into stems (large place value) and leaves (small place value).
  - Lists leaves to the right of the stem values.
  - Provides a rotated histogram-like view of the distribution.
Display: Quantitative Variables - Stem-and-Leaf Display Example
- Example: Stem-and-leaf display (leaf unit = 1)
  - 7 | 8 8 9 9 9
  - 8 | 0 0 0 0 1 1 2 3 3 4 4 4 5 6 7 8
  - 9 | 0 2 5
- For more detail, each stem can be split in half (e.g., 70-74 and 75-79 instead of a single stem for the 70s).
- Task: Construct a stem-and-leaf display for the data below (leaf unit = 0.1).
  - Data: 11.3, 9.6, 10.4, 7.5, 8.3, 10.5, 10.0, 9.3, 8.1, 7.7, 7.5, 8.4, 6.3, 8.8
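One way to check a hand-built display for this task is to let a short script do the splitting. The sketch below is an illustration rather than the textbook's procedure. It assumes non-negative values, uses the whole-number part as the stem and the tenths digit as the leaf, and lists only stems that actually occur in the data; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

data = [11.3, 9.6, 10.4, 7.5, 8.3, 10.5, 10.0, 9.3, 8.1, 7.7, 7.5, 8.4, 6.3, 8.8]

def stem_and_leaf(values, leaf_unit=0.1):
    """Group values into {stem: sorted leaves}, one leaf digit per observation."""
    stems = defaultdict(list)
    for value in values:
        scaled = round(value / leaf_unit)  # value expressed in whole leaf units
        stem, leaf = divmod(scaled, 10)    # last digit is the leaf, the rest is the stem
        stems[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Rounding after dividing by the leaf unit avoids floating-point surprises such as 11.3 being stored as slightly more than 11.3.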
Display: Two Variables - Tables
- Summarizing data for two variables using tables:
  - Contingency tables (display data for one variable in rows and data for another variable in columns).
Display: Two Variables - Contingency Tables
- Contingency table: A cross-tabulation method to summarize two categorical and/or quantitative variables.
- Example: ABC Store Sales Details for August (count of customers)
- 18 out of 200 customers were returning customers and paid using cash: 18/200=0.09
- Contingency Table:
  - Rows: Returning customers, First-time customers, Total
  - Columns: Cash, Credit, Payment app, Total
  - Example data points: Returning customers using Cash: 18 (count), 0.09 (relative frequency).
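The cross-tabulation itself is a matter of counting how many observations fall in each row and column combination. The sketch below shows the idea on a small set of hypothetical customer records, because the text reports only one cell of the ABC Store table. The records, `contingency_table`, and `relative_frequency` are all illustrative assumptions; only the 18 out of 200 figure comes from the example.

```python
from collections import Counter

# Hypothetical (customer type, payment method) records for illustration only.
# These are NOT the ABC Store figures, which the text gives only in part.
records = [
    ("Returning", "Cash"),
    ("Returning", "Credit"),
    ("First-time", "Credit"),
    ("First-time", "Payment app"),
    ("Returning", "Cash"),
    ("First-time", "Cash"),
    ("Returning", "Payment app"),
    ("First-time", "Credit"),
]

def contingency_table(pairs):
    """Count how many observations fall in each (row, column) cell."""
    return Counter(pairs)

def relative_frequency(cell_count, total):
    """Relative frequency of one cell: cell count divided by the grand total."""
    return cell_count / total

cells = contingency_table(records)
rows = ["Returning", "First-time"]
cols = ["Cash", "Credit", "Payment app"]

# Print the table with row totals, column totals, and the grand total.
print(f"{'':<12}" + "".join(f"{c:>13}" for c in cols) + f"{'Total':>8}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{n:>13}" for n in counts) + f"{sum(counts):>8}")
col_totals = [sum(cells[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<12}" + "".join(f"{n:>13}" for n in col_totals) + f"{len(records):>8}")

# The one cell the text does report: 18 returning cash customers out of 200.
print(relative_frequency(18, 200))  # 0.09
```

Dividing every cell by the grand total, as in the last line, converts the table of counts into a table of relative frequencies.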
Display: Two Variables - Graphs
- Summarizing data for two variables using graphs:
  - Scatter plot.
  - Side-by-side bar chart.
  - Stacked bar chart.
Display: Two Variables - Graphs - Scatter Plot
- Scatter plot: Provides a picture of the relationship between two quantitative variables whose observations are paired.
- The dependent variable (y-axis) is influenced by changes in the independent variable (x-axis).
- Example: Hours studied (x-axis) vs. Exam score (y-axis) for twenty students.
Display: Two Variables - Graphs - Scatter Plot Example
- Example: Scatter Plot of Exam Performance vs. Hours Studied
- Illustrates the relationship between hours studied and exam scores.
Display: Two Variables - Graphs - Side-by-Side and Stacked Bar Charts
- Example: Restaurant data depicting average price per plate and quality rating.
  - Average price per plate: $10, $15, $20, $25
  - Quality rating: Poor, Average, Good, Excellent
- Side-by-side bar chart: A graph that places multiple bar charts on the same display, with the bars for each group drawn next to one another.
  - Used to compare average price per plate vs. quality rating.
Display: Two Variables - Graphs - Stacked Bar Chart
- Stacked bar chart: A graph that shows multiple variables on the same display by stacking the segments for one variable within the bars of the other.
  - Used to compare average price per plate vs. quality rating.
Display: Summary
- Distribution of data:
  - Bar chart: Frequencies for categorical data.
  - Pie chart: Relative frequencies for categorical data.
  - Histogram: Frequency distribution for quantitative data.
  - Ogive: Cumulative percent frequency distribution for quantitative data.
  - Stem-and-leaf display: Rank order and shape of distribution for quantitative data.
- Comparisons of data:
  - Side-by-side bar chart: Compare relative frequency of two variables.
  - Stacked bar chart: Compare relative frequency of two variables.
- Relationships of data:
  - Scatter plot: Shows the relationship between two quantitative variables.
Copyright
- Note: Select figures, tables, and examples were obtained from the course textbook publisher.
- Donnelly, Robert A. (2015) Business Statistics. Upper Saddle River, New Jersey: Pearson Education, Inc. (Custom edition for James Madison University) ISBN:978-1-269-88344-3.