Business Analytics I - Chapter 2: Displaying Data

Intro

  • Technology aids in displaying statistics.
  • The purpose of displaying data is to summarize information that isn't easily observable from a simple data table.

Intro

  • Data summarization is used for:
    • Categorical variables (nominal or ordinal).
    • Quantitative variables (interval or ratio).
    • Two variables using tables.
    • Two variables using graphs.
  • The chapter covers techniques for these cases.

Display: Categorical Variables

  • Example: A survey of 25 JMU students regarding their state of residence.
    • Question: Categorical or Numerical?
    • Scale of measurement?

Display: Categorical Variables

  • Summarizing data for categorical variables involves:
    • Frequency distributions (relative, percent).
    • Bar charts.
    • Pie charts.

Display: Categorical Variables - Frequency Distributions

  • Frequency, relative frequency, and percent frequency:
    • A tabular summary of data showing frequencies of observations in each class.
    • Example:
      • Maryland: Frequency = 2, Relative Frequency = 2/25 = 0.08, Percent Frequency = 8%
      • North Carolina: Frequency = 6, Relative Frequency = 0.24, Percent Frequency = 24%
      • South Carolina: Frequency = 4, Relative Frequency = 0.16, Percent Frequency = 16%
      • Virginia: Frequency = 12, Relative Frequency = 0.48, Percent Frequency = 48%
      • West Virginia: Frequency = 1, Relative Frequency = 0.04, Percent Frequency = 4%
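The arithmetic in the table above can be checked with a short script. This is a minimal sketch, not part of the course material: the list `responses` is rebuilt from the frequencies shown above, and the function name `frequency_distribution` is an illustrative choice.

```python
from collections import Counter

# Survey responses rebuilt from the frequency table above (25 students).
responses = (
    ["Maryland"] * 2
    + ["North Carolina"] * 6
    + ["South Carolina"] * 4
    + ["Virginia"] * 12
    + ["West Virginia"] * 1
)


def frequency_distribution(values):
    """Return {category: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in sorted(counts.items())
    }


table = frequency_distribution(responses)
for state, (freq, rel, pct) in table.items():
    print(f"{state:<15} {freq:>3} {rel:>6.2f} {pct:>5.0f}%")
```

Relative frequency is the class frequency divided by the number of observations, and percent frequency is the relative frequency multiplied by 100, so the relative frequencies always sum to 1.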

Display: Categorical Variables - Bar Charts

  • Bar charts display frequencies of categorical data, especially when numerical values are of interest.
  • Example: A bar chart showing the count of students from each state of residence (Maryland, North Carolina, South Carolina, Virginia, West Virginia).
  • Question: Is the data nominal or ordinal?

Display: Categorical Variables - Pie Charts

  • Pie charts: Pie segments represent the percentage of each category frequency.
  • Used when numerical values are NOT of interest.
  • Example: A pie chart showing the percentage of students from each state of residence.
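One way to see how a pie segment represents a percentage: each slice takes the same share of the 360-degree circle as its category takes of the total. The sketch below assumes the state counts from the survey example; the name `pie_slices` is hypothetical.

```python
# State-of-residence counts from the survey of 25 students.
state_counts = {
    "Maryland": 2,
    "North Carolina": 6,
    "South Carolina": 4,
    "Virginia": 12,
    "West Virginia": 1,
}


def pie_slices(frequencies):
    """Return {category: (percent of total, slice angle in degrees)}."""
    total = sum(frequencies.values())
    return {
        category: (100 * count / total, 360 * count / total)
        for category, count in frequencies.items()
    }


slices = pie_slices(state_counts)
for state, (percent, angle) in slices.items():
    print(f"{state:<15} {percent:>5.1f}% {angle:>6.1f} degrees")
```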

Display: Categorical Variables - Pie Chart vs. Bar Chart

  • Pie charts are used for comparing relative sizes of all possible categories.
  • Bar charts are used to highlight actual data values.

Display: Categorical Variables - Pareto Charts

  • Pareto charts: A type of bar chart that shows frequencies of categories in decreasing order.
  • Commonly used in manufacturing scenarios for investigating quality control issues.
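The ordering behind a Pareto chart can be sketched in a few lines. The example reuses the state counts from the survey rather than quality-control data, and the cumulative percent column is an addition that often accompanies Pareto charts, not something this chapter specifies. The name `pareto_order` is hypothetical.

```python
# State-of-residence counts from the survey of 25 students.
state_counts = {
    "Maryland": 2,
    "North Carolina": 6,
    "South Carolina": 4,
    "Virginia": 12,
    "West Virginia": 1,
}


def pareto_order(frequencies):
    """Sort categories by decreasing frequency and attach cumulative percent."""
    total = sum(frequencies.values())
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


rows = pareto_order(state_counts)
for category, count, cumulative in rows:
    # One "#" per observation stands in for the height of the bar.
    print(f"{category:<15} {'#' * count:<12} {count:>2} {cumulative:>6.1f}%")
```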

Display: Quantitative Variables

  • Summarizing data for quantitative variables involves:
    • Frequency distributions (relative, percent).
    • Histograms.
    • Ogive (cumulative).
    • Stem-and-leaf display.

Display: Quantitative Variables - Motivation

  • Consider the question: How often did this store sell two iPads in one day? Three iPads in one day?
  • It takes too long to deduce this information without summarized representations.

Display: Quantitative Variables - Frequency Distribution

  • Frequency distribution: Shows the number of observations that fall into specific intervals.
  • Allows for faster information retrieval.

Display: Quantitative Variables - Relative Frequency Distribution

  • Relative frequency distribution: Shows the proportion of observations that fall into specific intervals.
  • Example:
    • Number Sold Per Day: 0, Frequency: 5, Relative Frequency: 5/50 = 0.10
    • Number Sold Per Day: 1, Frequency: 8, Relative Frequency: 8/50 = 0.16
    • Number Sold Per Day: 2, Frequency: 14, Relative Frequency: 14/50 = 0.28
    • Number Sold Per Day: 3, Frequency: 13, Relative Frequency: 13/50 = 0.26
    • Number Sold Per Day: 4, Frequency: 6, Relative Frequency: 6/50 = 0.12
    • Number Sold Per Day: 5, Frequency: 4, Relative Frequency: 4/50 = 0.08
    • Total Frequency: 50, Total Relative Frequency: 1.00

Display: Quantitative Variables - Histogram

  • Histogram: A graph showing the number of observations in each class of a frequency distribution.
  • Question: Are the number of days and the iPad sales per day discrete or continuous?
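A histogram can be approximated in plain text to show what the graph encodes: one bar per class, with length equal to the class frequency. This sketch uses the iPad sales frequencies from the chapter (50 days in total); the function name `text_histogram` is hypothetical, and a real histogram would normally be drawn with a charting tool.

```python
# Frequency distribution from the chapter: iPads sold per day -> number of days.
sales_frequency = {0: 5, 1: 8, 2: 14, 3: 13, 4: 6, 5: 4}


def text_histogram(frequency, symbol="*"):
    """Return one line per class; the bar length equals the class frequency."""
    return [
        f"{value} | {symbol * count} ({count})"
        for value, count in sorted(frequency.items())
    ]


lines = text_histogram(sales_frequency)
print("\n".join(lines))
```

The longest bars sit at 2 and 3 iPads per day, which answers the "how often" question at a glance.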

Display: Quantitative Variables - Ogive

  • Ogive: A graph plotting the cumulative relative frequency distribution on the y-axis and bin endpoints on the x-axis.
  • Question: Is the data discrete or continuous?
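The points plotted on an ogive are running totals of the relative frequencies. A minimal sketch, assuming the iPad sales frequencies from the chapter; the function name `cumulative_relative_frequency` is an illustrative choice.

```python
from itertools import accumulate

# Frequency distribution from the chapter: iPads sold per day -> number of days.
sales_frequency = {0: 5, 1: 8, 2: 14, 3: 13, 4: 6, 5: 4}


def cumulative_relative_frequency(frequency):
    """Return (value, cumulative relative frequency) pairs in increasing order."""
    values = sorted(frequency)
    total = sum(frequency.values())
    running_counts = accumulate(frequency[value] for value in values)
    return [(value, count / total) for value, count in zip(values, running_counts)]


ogive_points = cumulative_relative_frequency(sales_frequency)
for value, cumulative in ogive_points:
    print(f"{value} or fewer sold: {cumulative:.2f}")
```

The last point always reaches 1.00, because every observation falls at or below the largest value.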

Display: Quantitative Variables - Stem-and-Leaf Display

  • Stem-and-leaf display:
    • Splits data into stems (large place value) and leaves (small place value).
    • Lists leaves to the right of the stem values.
    • Provides a rotated histogram-like view of the distribution.

Display: Quantitative Variables - Stem-and-Leaf Display Example

  • Example: Stem-and-leaf display
    • 7 | 8 8 9 9 9
    • 8 | 0 0 0 0 1 1 2 3 3 4 4 4 5 6 7 8
    • 9 | 0 2 5
  • Leaf unit = 1
  • For more detail, each stem can be split in half (e.g., 70-74 and 75-79 instead of a single stem for the 70s).
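The display above can be reproduced programmatically. The sketch below rebuilds the 24 values from the stems and leaves shown, then splits each value into a stem and a one-digit leaf. The function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, and the sketch assumes non-negative values and does not print empty stems.

```python
from collections import defaultdict

# Values rebuilt from the stem-and-leaf display above (leaf unit = 1).
scores = [
    78, 78, 79, 79, 79,
    80, 80, 80, 80, 81, 81, 82, 83, 83, 84, 84, 84, 85, 86, 87, 88,
    90, 92, 95,
]


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a stem-and-leaf display for non-negative values."""
    stems = defaultdict(list)
    for value in sorted(values):
        scaled = round(value / leaf_unit)  # express the value in leaf units
        stem, leaf = divmod(scaled, 10)    # last digit is the leaf
        stems[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]


display = stem_and_leaf(scores)
print("\n".join(display))
```

Sorting the values first is what gives the display its rank-order property: leaves appear in increasing order within each stem.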

Display: Extra Credit Example

  • Data: 11.3, 9.6, 10.4, 7.5, 8.3, 10.5, 10.0, 9.3, 8.1, 7.7, 7.5, 8.4, 6.3, 8.8
  • Task: Construct a stem-and-leaf display for the data.
  • Leaf unit = 0.1

Display: Two Variables - Tables

  • Summarizing data for two variables using tables:
    • Contingency tables (displays data for one variable in rows and data for another variable in columns).

Display: Two Variables - Contingency Tables

  • Contingency table: A cross-tabulation method to summarize two categorical and/or quantitative variables.
  • Example: ABC Store Sales Details for August (count of customers)
    • 18 out of 200 customers were returning customers and paid using cash: 18/200 = 0.09
    • Contingency Table:
      • Rows: Returning customers, First-time customers, Total
      • Columns: Cash, Credit, Payment app, Total
      • Example data points: Returning customers using Cash: 18 (count), 0.09 (relative frequency).
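A cross-tabulation counts how often each combination of row and column category occurs. The sketch below shows the mechanics only: the ten `records` are hypothetical and invented for illustration (they are not the ABC Store data, of which only the returning/cash cell is given here), and the function name `contingency_table` is an illustrative choice.

```python
from collections import Counter

# HYPOTHETICAL records (customer type, payment method), invented for
# illustration only; they are not the ABC Store data.
records = [
    ("Returning", "Cash"), ("Returning", "Credit"), ("Returning", "Credit"),
    ("Returning", "Payment app"), ("First-time", "Cash"), ("First-time", "Cash"),
    ("First-time", "Credit"), ("First-time", "Payment app"),
    ("First-time", "Payment app"), ("First-time", "Payment app"),
]


def contingency_table(pairs):
    """Count each (row, column) combination and its share of all observations."""
    counts = Counter(pairs)
    n = len(pairs)
    relative = {cell: count / n for cell, count in counts.items()}
    return counts, relative


counts, relative = contingency_table(records)
row_totals = Counter(row for row, _ in records)  # the "Total" column
col_totals = Counter(col for _, col in records)  # the "Total" row

for cell, count in sorted(counts.items()):
    print(f"{cell[0]:<10} {cell[1]:<11} {count:>2} {relative[cell]:.2f}")
```

The chapter's cell follows the same rule: 18 returning cash customers out of 200 gives 18/200 = 0.09.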

Display: Two Variables - Graphs

  • Summarizing data for two variables using graphs:
    • Scatter plot.
    • Side-by-side bar chart.
    • Stacked bar chart.

Display: Two Variables - Graphs - Scatter Plot

  • Scatter plot: Provides a picture of the relationship between two quantitative variables whose observations are paired.
    • The dependent variable (y-axis) is influenced by changes in the independent variable (x-axis).
    • Example: Hours studied (x-axis) vs. Exam score (y-axis) for twenty students.

Display: Two Variables - Graphs - Scatter Plot Example

  • Example: Scatter Plot of Exam Performance vs. Hours Studied
    • Illustrates the relationship between hours studied and exam scores.

Display: Two Variables - Graphs - Side-by-Side and Stacked Bar Charts

  • Example: Restaurant data depicting average price per plate and quality rating.
    • Average price per plate: $10, $15, $20, $25
    • Quality rating: Poor, Average, Good, Excellent

Display: Two Variables - Graphs - Side-by-Side Bar Chart

  • Side-by-side bar chart: A graphical display that places multiple bar charts next to one another on the same axes.
    • Used to compare average price per plate vs. quality rating.

Display: Two Variables - Graphs - Stacked Bar Chart

  • Stacked bar chart: A graphical display that stacks the categories of a second variable as segments within each bar.
    • Used to compare average price per plate vs. quality rating.

Display: Summary

  • Distribution of data:
    • Bar chart: Frequencies for categorical data.
    • Pie chart: Relative frequencies for categorical data.
    • Histogram: Frequency distribution for quantitative data.
    • Ogive: Cumulative relative (or percent) frequency distribution for quantitative data.
    • Stem-and-leaf display: Rank order and shape of distribution for quantitative data.
  • Comparisons of data:
    • Side-by-side bar chart: Compare relative frequency of two variables.
    • Stacked bar chart: Compare relative frequency of two variables.
  • Relationships of data:
    • Scatter plot: Shows the relationship between two quantitative variables.
  • Note: Select figures, tables, and examples were obtained from the course textbook publisher.
  • Donnelly, Robert A. (2015). Business Statistics. Upper Saddle River, New Jersey: Pearson Education, Inc. (Custom edition for James Madison University). ISBN: 978-1-269-88344-3.