Ch. 2
Copyright Information
Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Chapter Overview
Chapter 2: Methods for Describing Sets of Data
Key Terms
Class: A category into which qualitative data can be classified.
Class frequency: Number of observations in the data set in a particular class.
Class relative frequency: Class frequency divided by total number of observations in data set.
Class percentage: Class relative frequency multiplied by 100.
Data Presentation Methods
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram.
Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram.
Summary Table
Lists categories & number of elements in each category.
Obtained by tallying responses in each category.
Can show frequencies (counts), percentages, or both.
Example:
Major | Count
Accounting | 130
Economics | 20
Management | 50
Total | 200
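The class frequency, class relative frequency, and class percentage defined under Key Terms can be computed for this table with a short Python sketch. The variable names are illustrative assumptions; the counts are the ones from the example.

```python
# Counts taken from the example summary table (Major | Count).
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

# Total number of observations in the data set.
total = sum(counts.values())

# Class relative frequency = class frequency / total number of observations.
relative_freq = {major: n / total for major, n in counts.items()}

# Class percentage = class relative frequency * 100.
percentage = {major: 100 * n / total for major, n in counts.items()}

# Print a summary table showing counts, relative frequencies, and percentages.
for major, n in counts.items():
    print(f"{major:<12}{n:>5}{relative_freq[major]:>7.2f}{percentage[major]:>7.1f}%")
print(f"{'Total':<12}{total:>5}")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick check that the tally is complete.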
Bar Graph
Vertical Bars for Qualitative Variables.
Bar height shows frequency or percentage.
The vertical axis should start at a zero point and may show percentages instead of counts; bar widths should be equal.
Pie Chart
Shows breakdown of total quantity into categories.
Useful for showing relative differences within the data.
Angle Size: Calculated as 360° * (percentage / 100), i.e., 360° times the class relative frequency.
Example: Accounting 65% → 360° * 0.65 = 234°.
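As a minimal sketch, the same angle calculation can be applied to every category of the Major example. The dictionary and variable names are assumptions made for illustration.

```python
# Counts from the Major example (Accounting 130, Economics 20, Management 50).
counts = {"Accounting": 130, "Economics": 20, "Management": 50}
total = sum(counts.values())

# Slice angle = 360 degrees * class relative frequency.
angles = {major: 360 * n / total for major, n in counts.items()}

for major, angle in angles.items():
    print(f"{major}: {angle:.0f} degrees")
# Accounting: 234 degrees
# Economics: 36 degrees
# Management: 90 degrees
```

The slice angles always add up to 360°, since the relative frequencies add up to 1.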
Pareto Diagram
Similar to bar graph but categories are arranged by height in descending order from left to right.
Vertical Bars for Qualitative Variables.
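The ordering step that turns a bar graph into a Pareto diagram can be sketched in Python as a sort by descending frequency. The data reuse the Major example; the names are illustrative assumptions.

```python
# Counts from the Major example, in their original (unordered) listing.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

# Arrange categories by bar height, tallest first (left to right).
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for major, n in pareto_order:
    print(f"{major:<12}{n:>5}")
# Accounting    130
# Management     50
# Economics      20
```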
Summary of Data Presentation
Bar Graph: Represents categories of qualitative variable with bars showing frequencies.
Pie Chart: Represents categories of qualitative variable as slices of a pie.
Pareto Diagram: A bar graph with categories arranged in descending order of frequency.
Data Presentation Challenges
As an Analyst: You may need to create multiple graphical presentations for different data sets.
Example: Market Shares of Web Browsers (e.g., Firefox, Internet Explorer).
Numerical Measures of Central Tendency
Mean: Acts as balance point. Affected by extreme values (outliers).
Median: Middle value in ordered sequence; unaffected by extreme values.
Mode: Value that occurs most often; can be none, one, or multiple modes.
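A small sketch using Python's standard `statistics` module illustrates the three measures and how an extreme value pulls the mean but not the median. The two samples are hypothetical and chosen only for illustration.

```python
from statistics import fmean, median, multimode

data = [2, 3, 5, 5, 10]            # hypothetical sample
with_outlier = [2, 3, 5, 5, 100]   # same sample with one extreme value

# Mean is the balance point; it shifts when an outlier is introduced.
print(fmean(data), fmean(with_outlier))    # 5.0 23.0

# Median is the middle value of the ordered data; the outlier leaves it unchanged.
print(median(data), median(with_outlier))  # 5 5

# multimode returns every most-frequent value, so it handles multiple modes.
print(multimode(data))                     # [5]
print(multimode([1, 1, 2, 2, 3]))          # [1, 2]
```

When every value occurs equally often, `multimode` returns all of them, which corresponds to the "no mode" case in the notes.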
Numerical Measures of Variability
Range: Difference between largest and smallest observations (Range = x_largest - x_smallest).
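Variance: Measure of how data vary about the mean; the sample variance is s^2 = Σ(x_i - x̄)^2 / (n - 1).
Standard Deviation: Positive square root of the variance (s = √(s^2)); reflects dispersion about the mean in the original units of the data.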
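The three measures of variability can be computed as follows. This is a sketch on a hypothetical sample; it assumes the sample (n - 1) versions of variance and standard deviation, which is what `statistics.variance` and `statistics.stdev` compute.

```python
from statistics import fmean, stdev, variance

data = [1, 2, 3, 4, 5]  # hypothetical sample

# Range = largest observation - smallest observation.
data_range = max(data) - min(data)

# Sample variance by hand: sum of squared deviations from the mean over (n - 1).
mean = fmean(data)
s2_manual = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# The standard library gives the same sample variance and its square root.
s2 = variance(data)
s = stdev(data)

print(data_range)      # 4
print(s2_manual, s2)   # 2.5 2.5
print(round(s, 4))     # 1.5811
```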
Chebyshev’s Theorem
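Applies to any data set, regardless of the shape of its distribution.
At least 3/4 of the data lie within (x̄ - 2s, x̄ + 2s).
At least 8/9 of the data lie within (x̄ - 3s, x̄ + 3s).
In general, for any k > 1, at least 1 - 1/k^2 of the data lie within (x̄ - ks, x̄ + ks).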
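The fractions 3/4 and 8/9 are the k = 2 and k = 3 cases of the bound 1 - 1/k^2. The sketch below computes the bound and compares it with the observed fraction in a hypothetical skewed sample; the function names and the data are assumptions made for illustration.

```python
from statistics import fmean, stdev

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of data strictly inside (mean - k*s, mean + k*s)."""
    m, s = fmean(data), stdev(data)
    return sum(m - k * s < x < m + k * s for x in data) / len(data)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # hypothetical, right-skewed sample

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), fraction_within(data, k))
# 2 0.75 0.9
# 3 0.889 1.0
```

The observed fractions are at or above the bound, as the theorem guarantees; the bound is a minimum, so actual data often do considerably better.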
z-Scores
Measure how far a data point is from the mean, in standard deviations: z = (x - x̄) / s.
Useful for comparing values from different data sets on a common scale.
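A minimal sketch of the z-score calculation, assuming the sample mean and sample standard deviation are used. The helper name `z_scores` and the data are hypothetical.

```python
from statistics import fmean, stdev

def z_scores(data):
    """Return (x - mean) / s for each observation, using the sample s."""
    m = fmean(data)
    s = stdev(data)
    return [(x - m) / s for x in data]

data = [1, 2, 3, 4, 5]  # hypothetical sample
z = z_scores(data)

for x, zx in zip(data, z):
    print(x, round(zx, 3))
# 1 -1.265
# 2 -0.632
# 3 0.0
# 4 0.632
# 5 1.265
```

A negative z-score marks a value below the mean and a positive one a value above it; the value equal to the mean has a z-score of 0.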
Graphical Methods for Bivariate Relationships
Scattergram: Used to describe relationships between two quantitative variables, indicating positive, negative, or no relationship.
Time Series Plot
Used to visually display data over time; shows trends and changes.
Points are typically connected by straight lines.
Misleading Data Presentation
Common issues include distorted scales, axes that do not start at a zero point, and misleading wording. Reporting a measure of central tendency without a measure of variability can also lead to misconceptions, so both should be considered.
Central tendency describes the average or typical value in a set of data and indicates where most of the data points lie. The most common measures of central tendency are:
Mean: The average value, found by adding all the numbers together and dividing by how many numbers there are.
Median: The middle value when all the numbers are arranged in order. If there's an even number of values, it's the average of the two middle ones.
Mode: The number that appears most often in the data set.
Variability, on the other hand, tells us how much the data points differ from each other. It helps us understand whether the values are spread out or all close together. Common measures of variability include:
Range: The difference between the highest and lowest values.
Variance: The average of the squared distances of the values from the mean.
Standard Deviation: The square root of the variance; it shows how much the values typically vary from the mean.
In short, central tendency summarizes where the data are centered, while variability shows how spread out the data points are.