Stats Chapter 2
Chapter Overview: Statistics for Managers Using Microsoft® Excel®
Introduction
Authors: David M. Levine, David F. Stephan, Kathryn A. Szabat, Marking Bulger
Subject: Overview of organizing and visualizing data using Excel, including strategies for both categorical and numerical variables.
Objectives of Chapter 2
Understand how to:
Organize and visualize categorical variables.
Organize and visualize numerical variables.
Summarize a mix of variables.
Avoid common errors in data organization and visualization.
Organizing and Visualizing Data
Importance of Summarization
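Tabular Summaries: Help you explore the data in more detail and make decisions more easily.
Visual Summaries: Let you quickly spot patterns and trends in the data.
DCOVA Process (Define, Collect, Organize, Visualize, Analyze): This chapter covers the Organize and Visualize steps.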
Organizing Categorical Data
Types of Data Tables
Summary Table:
Tallies frequencies or percentages across categories.
Example: Devices used to watch TV shows:
Television Set: 49%
Tablet: 9%
Smartphone: 10%
Laptop/Desktop: 32%
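A summary table of this kind can be sketched in a few lines of Python. The raw responses below are hypothetical: 100 answers invented only so that the tallies reproduce the percentages above. The helper name summary_table is likewise an illustration, not something from the chapter.

```python
from collections import Counter

# Hypothetical raw responses (not from the chapter): 100 answers chosen
# so that the tallies match the percentages in the example above.
responses = (
    ["Television Set"] * 49
    + ["Tablet"] * 9
    + ["Smartphone"] * 10
    + ["Laptop/Desktop"] * 32
)

def summary_table(values):
    """Return {category: (frequency, percentage)} for one categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = summary_table(responses)
for category, (freq, pct) in table.items():
    print(f"{category:<15} {freq:>4} {pct:>6.1f}%")
```

The same tally is what an Excel PivotTable or COUNTIF would produce; the percentages must add up to 100.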
Contingency Table
Purpose: Organizes two or more categorical variables to study relationships.
Structure: Rows represent one categorical variable and columns represent another.
Example Data: Invoices cross-classified by size (small, medium, large) and by whether they contain errors:
Small Amount: 170 No Errors, 20 Errors
Total: 190
Data Analysis with Contingency Tables
Frequency Analysis
No Errors vs. Errors, as percentages of all invoices:
Small Amounts: 42.50% no errors, 5.00% errors.
Medium Amounts: 25.00% no errors, 10.00% errors.
Large Amounts: 16.25% no errors, 1.25% errors.
Percentage Breakdown
Percentages of Row Totals: Show how likely an error is within each invoice size:
Medium invoices have a higher error rate (28.57%) than small invoices (10.53%).
Percentages of Column Totals: Show how the errors are distributed across invoice sizes:
61.54% of invoices with errors are medium-sized.
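The three kinds of percentages can be checked with a short sketch. Only the counts for small invoices appear in these notes; the counts for medium and large invoices are an assumption, back-calculated from the overall percentages under a grand total of 400 invoices (170 / 0.425 = 400).

```python
# Counts per invoice size: (no errors, errors).
# "Small" is given in the notes. "Medium" and "Large" are ASSUMED values,
# back-calculated from the overall percentages with a grand total of 400.
counts = {
    "Small": (170, 20),
    "Medium": (100, 40),
    "Large": (65, 5),
}

grand_total = sum(ok + err for ok, err in counts.values())
total_errors = sum(err for _, err in counts.values())

def pct(part, whole):
    """Percentage rounded to two decimals, matching the notes."""
    return round(100 * part / whole, 2)

for size, (ok, err) in counts.items():
    overall = (pct(ok, grand_total), pct(err, grand_total))  # % of all invoices
    row_error_rate = pct(err, ok + err)        # % of this size that has errors
    col_error_share = pct(err, total_errors)   # % of all errors from this size
    print(f"{size:<7} overall={overall} row={row_error_rate} column={col_error_share}")
```

Under these assumed counts the sketch reproduces the figures quoted above: 28.57% and 10.53% as row percentages, and 61.54% as a column percentage.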
Organizing Numerical Data
Ordered Array
Definition: A sequence of data in rank order (smallest to largest).
Purpose: Indicates range and helps identify outliers.
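As a minimal sketch, an ordered array is just the sorted data, and the range falls out of its first and last entries. The sample values are hypothetical and used only for illustration.

```python
# Hypothetical sample values, used only for illustration.
data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30]

ordered = sorted(data)                       # smallest to largest
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print(ordered)
print(f"min={smallest}, max={largest}, range={data_range}")
```

Values that sit far from their neighbors at either end of the ordered array are the candidates to inspect as possible outliers.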
Frequency Distribution
Structure: Arranges the data into numerically ordered classes; building one requires choosing:
Class groupings (5-15 classes recommended).
Class boundaries.
Width of class intervals.
Example of Frequency Distribution
Sort the raw data (e.g., temperatures) into an ordered array and examine it for patterns:
Compute the class width, class boundaries, and class midpoints, then tally the values that fall in each class.
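The steps can be sketched as follows. The temperatures are illustrative values that do not come from these notes, and the choices of five classes and of rounding the width up to a multiple of 10 are assumptions made for the example.

```python
import math

# Illustrative temperatures (not taken from these notes).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

def frequency_distribution(values, num_classes, start, width):
    """Tally values into classes [lower, upper) and report each midpoint."""
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        rows.append({"lower": lower, "upper": upper,
                     "midpoint": (lower + upper) / 2, "frequency": freq})
    return rows

ordered = sorted(temps)
num_classes = 5                                   # assumed; 5-15 is the guideline
raw_width = (ordered[-1] - ordered[0]) / num_classes
width = math.ceil(raw_width / 10) * 10            # round up to a multiple of 10
start = (ordered[0] // width) * width             # first class boundary

rows = frequency_distribution(temps, num_classes, start, width)
for r in rows:
    print(f"{r['lower']:>3} but less than {r['upper']:<3} "
          f"midpoint={r['midpoint']:<5} frequency={r['frequency']}")
```

Using half-open classes (lower boundary included, upper excluded) guarantees that every value lands in exactly one class.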
Visualizing Categorical Data
Graphical Displays
Bar Chart: Represents categorical data; bars indicate frequency or percentage.
Pie Chart: Illustrates percentage of categories within a whole.
Doughnut Chart: Similar to a pie chart, but with a central hole.
Pareto Chart: Displays categories as bars in descending order of frequency, overlaid with a cumulative percentage line.
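The ordering and cumulative line behind a Pareto chart can be sketched without any plotting library. This example reuses the device percentages from the summary table earlier in these notes; the function name pareto_rows is a hypothetical helper.

```python
# Percentages from the "devices used to watch TV shows" example in these notes.
shares = {
    "Television Set": 49,
    "Tablet": 9,
    "Smartphone": 10,
    "Laptop/Desktop": 32,
}

def pareto_rows(percentages):
    """Sort categories by descending percentage and attach a running total."""
    rows, cumulative = [], 0
    ranked = sorted(percentages.items(), key=lambda kv: kv[1], reverse=True)
    for category, pct in ranked:
        cumulative += pct
        rows.append((category, pct, cumulative))
    return rows

for category, pct, cumulative in pareto_rows(shares):
    print(f"{category:<15} {pct:>3}%  cumulative {cumulative:>3}%")
```

The bars would be drawn from the second column and the line from the third, which always ends at 100%.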
Visualizing Numerical Data
Histogram
Definition: Vertical bar chart of a frequency distribution, drawn with no gaps between adjacent bars.
Axes: Class boundaries (or midpoints) on the horizontal axis, frequency or percentage on the vertical axis.
Percentage Polygon
Purpose: Plots the percentage of each class at its class midpoint and connects the points with lines; useful for comparing two or more groups on the same chart.
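As a rough sketch, the points of a percentage polygon are (class midpoint, percentage) pairs, computed per group so that groups of different sizes stay comparable. The midpoints and both sets of frequencies below are hypothetical.

```python
def polygon_points(midpoints, frequencies):
    """Return (midpoint, percentage) pairs for one group's percentage polygon."""
    total = sum(frequencies)
    return [(m, 100 * f / total) for m, f in zip(midpoints, frequencies)]

# Hypothetical class midpoints and class frequencies for two groups.
midpoints = [15, 25, 35, 45, 55]
group_a = [3, 6, 5, 4, 2]
group_b = [1, 4, 8, 5, 2]

points_a = polygon_points(midpoints, group_a)
points_b = polygon_points(midpoints, group_b)

for (m, pa), (_, pb) in zip(points_a, points_b):
    print(f"midpoint {m}: group A {pa:.1f}%, group B {pb:.1f}%")
```

Plotting both point lists as connected lines on shared axes gives the comparison; percentages rather than raw frequencies are used so that unequal group sizes do not distort it.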
Visualizing Relationships in Data
Scatter Plots
Definition: Plots paired observations of two numerical variables, one on each axis, to assess whether the variables are related.
Time-Series Plot
Usage: Plots the values of a numerical variable against time (time on the horizontal axis) to reveal patterns over time.
Best Practices for Data Visualization
Utilize simple visualizations; ensure clarity through proper labeling.
Start vertical axes at zero and maintain consistent scales.
Avoid complex or 3D chart types that can confuse data interpretation.
Common Pitfalls in Data Presentation
Poor presentation can obscure the data or create a false impression.
Use proper scaling and avoid chart junk so that the visualization stays useful.
Chapter Summary
The chapter focuses on organizing and visualizing both categorical and numerical variables.
It also covers strategies for summarizing a mix of variables and for avoiding common visualization errors.