Ch. 2

Copyright Information

  • Copyright © 2014, 2011, and 2008 Pearson Education, Inc.

Chapter Overview

  • Chapter 2: Methods for Describing Sets of Data


Key Terms

  • Class: A category into which qualitative data can be classified.

  • Class frequency: Number of observations in the data set in a particular class.

  • Class relative frequency: Class frequency divided by total number of observations in data set.

  • Class percentage: Class relative frequency multiplied by 100.


Data Presentation Methods

  • Qualitative Data

    • Summary Table

    • Bar Graph

    • Pie Chart

    • Pareto Diagram

  • Quantitative Data

    • Dot Plot

    • Stem-&-Leaf Display

    • Frequency Distribution

    • Histogram


Summary Table

  • Lists categories & number of elements in each category.

  • Obtained by tallying responses in each category.

  • Can show frequencies (counts), percentages, or both.

  • Example:

    • Major | Count

    • Accounting | 130

    • Economics | 20

    • Management | 50

    • Total: 200
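
The key terms defined earlier (class frequency, class relative frequency, class percentage) can be computed directly from the counts in this example. The following is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
# Counts taken from the example summary table above.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

total = sum(counts.values())  # total number of observations

# Class relative frequency = class frequency / total observations.
relative_freq = {major: n / total for major, n in counts.items()}

# Class percentage = class relative frequency * 100.
percentage = {major: 100 * rf for major, rf in relative_freq.items()}

# Print a summary table showing counts, relative frequencies, and percentages.
for major in counts:
    print(f"{major:<12}{counts[major]:>6}{relative_freq[major]:>8.2f}{percentage[major]:>7.1f}%")
```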


Bar Graph

  • Vertical Bars for Qualitative Variables.

  • Bar height shows frequency or percentage.

  • The frequency (or percentage) axis should start at a zero point, and bar widths should be equal.


Pie Chart

  • Shows breakdown of total quantity into categories.

  • Useful for showing relative differences within the data.

  • Angle Size: Calculated as 360° × class relative frequency (the percentage written as a proportion).

    • Example: Accounting 65% → 360° × 0.65 = 234°.
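
As a rough illustration, the slice angles for all three majors in the Summary Table example can be computed the same way. This sketch assumes the counts from that table; the names `counts` and `angles` are hypothetical.

```python
# Counts from the Summary Table example (Accounting, Economics, Management).
counts = {"Accounting": 130, "Economics": 20, "Management": 50}
total = sum(counts.values())

# Slice angle = 360 degrees * (class frequency / total).
angles = {major: 360 * n / total for major, n in counts.items()}

for major, angle in angles.items():
    print(f"{major}: {angle:.0f} degrees")
```

The angles of all slices should add up to a full circle of 360°.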


Pareto Diagram

  • Similar to bar graph but categories are arranged by height in descending order from left to right.

  • Vertical Bars for Qualitative Variables.
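
The ordering that distinguishes a Pareto diagram from an ordinary bar graph can be sketched as a sort by frequency. The example below reuses the counts from the Summary Table and draws a crude text bar for each category; it is only an illustration, not a plotting recipe.

```python
# Counts from the Summary Table example.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

# Arrange categories by frequency, largest first (left to right in the diagram).
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Text rendering: one '#' per 10 observations.
for major, n in pareto_order:
    print(f"{major:<12}{'#' * (n // 10)} {n}")
```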


Summary of Data Presentation

  • Bar Graph: Represents categories of a qualitative variable with bars whose heights show frequencies or percentages.

  • Pie Chart: Represents categories of a qualitative variable as slices of a pie.

  • Pareto Diagram: Bar graph with categories arranged by frequency in descending order from left to right.


Data Presentation Challenges

  • As an analyst: Create multiple graphical presentations for different data sets.

    • Example: Market Shares of Web Browsers (e.g., Firefox, Internet Explorer).


Numerical Measures of Central Tendency

  • Mean: Acts as balance point. Affected by extreme values (outliers).

  • Median: Middle value in ordered sequence; unaffected by extreme values.

  • Mode: Value that occurs most often; can be none, one, or multiple modes.
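
A small sketch of the three measures using Python's standard `statistics` module. The data values are made up for illustration; adding one extreme value shows how the mean shifts while the median barely moves.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # average of the two middle values here
modes = statistics.multimode(data)  # list of the most frequent value(s)

# Add an outlier and recompute to compare sensitivity.
with_outlier = data + [100]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, modes)
print(round(mean_out, 2), median_out)
```

`multimode` returns a list because a data set can have more than one mode; for example, `statistics.multimode([1, 1, 2, 2])` gives both values.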


Numerical Measures of Variability

  • Range: Difference between largest and smallest observations (Range = x_largest - x_smallest).

  • Variance: Measure of how data varies about the mean.

  • Standard Deviation: Reflects dispersion about the mean.
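
The notes do not say whether the sample or population formulas are intended. The sketch below assumes the sample versions (dividing by n - 1), which matches the use of s for the standard deviation later in these notes. The data values are hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

# Range = largest observation - smallest observation.
data_range = max(data) - min(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
mean = statistics.mean(data)
manual_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# The standard library gives the same quantities.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)  # square root of the sample variance

print(data_range, sample_variance, round(sample_sd, 3))
```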


Chebyshev’s Theorem

  • Applies to any shape of data.

  • At least 3/4 of the data lie within 2 standard deviations of the mean: (x̄ - 2s, x̄ + 2s).

  • At least 8/9 of the data lie within 3 standard deviations of the mean: (x̄ - 3s, x̄ + 3s).
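
Both fractions are instances of the general bound 1 - 1/k² for k > 1 standard deviations, which is not stated in the notes but is the standard form of the theorem. The sketch below compares that bound with the observed fraction for a deliberately skewed, hypothetical data set; the function names are illustrative.

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of data inside (mean - k*s, mean + k*s)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if mean - k * s < x < mean + k * s]
    return len(inside) / len(data)

data = [2, 3, 3, 5, 7, 10, 100]  # hypothetical, strongly skewed sample

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), round(fraction_within(data, k), 3))
```

The observed fraction should never fall below the bound, whatever the shape of the data.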


z-Scores

  • Measure how far away a data point is from the mean, in standard deviations.

  • Useful for standardizing different data sets.
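
A minimal sketch of the calculation, assuming the sample z-score z = (x - x̄) / s and a hypothetical data set:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# z-score: distance from the mean, measured in standard deviations.
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:>2}  z = {z:+.2f}")
```

A value equal to the mean has a z-score of 0; negative z-scores lie below the mean and positive ones above it.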


Graphical Methods for Bivariate Relationships

  • Scattergram: Used to describe relationships between two quantitative variables, indicating positive, negative, or no relationship.


Time Series Plot

  • Used to visually display data over time; shows trends and changes.

  • Points are typically connected by straight lines.


Misleading Data Presentation

  • Common issues include distorted scales, axes that omit the zero point, and misleading wording. Awareness of both central tendency and variability is crucial to avoid misconceptions.

Central tendency describes the average or typical value in a data set, indicating where most of the data points lie. The most common measures of central tendency are:

  1. Mean: The average value, found by adding all the numbers together and dividing by how many numbers there are.

  2. Median: The middle value when all the numbers are arranged in order. If there's an even number of values, it's the average of the two middle ones.

  3. Mode: The number that appears most often in the data set.

Variability, on the other hand, tells us how much the data points differ from each other. It helps us understand whether the values are spread out or all close together. Common measures of variability include:

  1. Range: The difference between the highest and lowest values.

  2. Variance: A measure of how far the numbers in the set are from the mean.

  3. Standard Deviation: The square root of the variance; it expresses the spread about the mean in the same units as the data.

In short, central tendency summarizes where the data are centered, while variability shows how spread out the data points are.