Describing Data: Measures of Center and Average
Introduction to Describing Data: What is Average?
Learning Goals for Section 4.1:
Understand the fundamental differences between the three primary measures of center: mean, median, and mode.
Analyze how outliers affect these different measures of average.
Determine when it is appropriate to apply a weighted mean instead of a standard mean.
Measures of Center: Mean, Median, and Mode
The Mean:
Definition: The mean is the value most commonly referred to as the "average."
Calculation: It is calculated by summing all the values in a data set and then dividing by the total number of values.
Metaphor/Visualization (Figure 4.1): If a histogram were constructed using physical blocks, the mean would represent the specific point on the horizontal axis where the distribution would balance perfectly.
The Median:
Definition: The median is the value that occupies the middle position when the data set is sorted in ascending or descending order.
Calculating for Even Data Sets: If the data set contains an even number of values, the median is defined as the value halfway between the two middle values (calculated as the average of those two values).
The Mode:
Definition: The mode is the most frequently occurring value (or group of values) in a data set.
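Illustrative Sketch (Python): The three definitions above can be checked with Python's standard `statistics` module. The data values below are hypothetical and chosen only for illustration; they do not come from the text.

```python
from statistics import mean, median, multimode

# Hypothetical data, chosen only for illustration (not from the text).
values = [2, 3, 3, 5, 7, 10]

data_mean = mean(values)        # sum of the values divided by the count
data_median = median(values)    # even count: average of the two middle values
data_modes = multimode(values)  # list of every most frequently occurring value

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
```

`multimode` returns a list, which matches the definition's allowance for a group of values sharing the highest frequency.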
Rounding Rule for Statistical Calculations
General Principle: When performing statistical calculations, the final answer should typically be expressed with one more decimal place of precision than what is provided in the original list of data values.
Examples:
If original data are whole numbers (0 decimal places), the mean should be rounded to the nearest tenth (1 decimal place).
If original data are given to the nearest tenth (1 decimal place), the mean should be rounded to the nearest hundredth (2 decimal places).
Precision Note: One must always round only the final answer; do not round any intermediate values used during the calculation process.
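Illustrative Sketch (Python): One possible way to apply the rounding rule in code is shown below. The helper name `round_by_rule` and the data values are assumptions made for this sketch, not part of the text.

```python
def round_by_rule(result, data_decimal_places):
    """Round a final result to one more decimal place than the original data."""
    return round(result, data_decimal_places + 1)

# Hypothetical whole-number data (0 decimal places), for illustration only.
values = [4, 7, 9]

# Keep the intermediate value unrounded; round only the final answer.
raw_mean = sum(values) / len(values)
final_mean = round_by_rule(raw_mean, data_decimal_places=0)

print(final_mean)
```

Note that Python's built-in `round` sends exact ties to the nearest even digit, so a value sitting exactly halfway between two candidates may round differently than it would by hand.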
Practical Application: Price Data Example
Data Set: Eight grocery stores sell a PR energy bar at the following prices:
Calculating the Mean:
Sum of prices:
Count ($n$): 8
Mean =
Using the Rounding Rule ( decimal places):
Calculating the Median:
Sorted Data (Ascending Order):
Since there are eight values, the middle consists of the fourth and fifth values: and .
Calculation:
Using the Rounding Rule ( decimal places):
Calculating the Mode:
The mode is because it appears twice, which is more frequent than any other price in the set.
The Impact of Outliers on Statistical Measures
Definition of Outlier: An outlier (or outlying value) is a value in a data set that is significantly higher or significantly lower than almost all other values.
The Basketball Contract Scenario:
Five graduating seniors are up for first-year NBA contract offers. Four receive no offer (recorded as 0), and one receives .
Data:
Mean Offer Calculation:
Problem of Representation: While the mean indicates an "average" of , this number is unrepresentative because is an extreme outlier. If the outlier is removed, the mean drops to zero.
Resistance to Outliers:
Mean: Significantly affected by outliers; they pull the mean toward the extreme value.
Median: Generally unaffected because outliers exist at the ends of the sorted list, not the center. (Note: Deleting an outlier may change the count of values and thus shift the median slightly, but the value of the outlier itself does not distort the result).
Mode: Generally unaffected by outliers.
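Illustrative Sketch (Python): The resistance comparison above can be seen by computing each measure with and without an extreme value. The numbers below are hypothetical and are not the contract figures from the scenario.

```python
from statistics import mean, median, multimode

# Hypothetical data: four small values plus one extreme value (the outlier).
with_outlier = [1, 2, 2, 3, 100]
without_outlier = [1, 2, 2, 3]

for label, data in [("with outlier", with_outlier),
                    ("without outlier", without_outlier)]:
    # The mean shifts sharply; the median and mode stay where they were.
    print(label, "mean:", mean(data),
          "median:", median(data),
          "mode(s):", multimode(data))
```

In this sketch, only the mean changes when the extreme value is removed.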
"Average" Confusion and Real-World Examples
The Wage Dispute Scenario:
Context: News reports an average wage of per hour in the industry. Workers at a firm claim their average is only , while management claims the firm's average is .
Resolution: Both can be correct if they use different measures of center.
Hypothetical Data: Five workers with wages and .
Median calculation: The middle value is .
Mean calculation: According to the text, the mean of such a distribution is reported as .
Confusion Source: Misunderstandings often arise when the specific type of "average" (mean vs. median) is not specified, or when calculating methods are not transparent.
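Illustrative Sketch (Python): A short computation shows how two parties can each quote a truthful "average" from the same wages. The five hourly wages below are hypothetical and are not the figures from the scenario.

```python
from statistics import mean, median

# Hypothetical hourly wages for five workers (illustration only).
wages = [12, 12, 13, 14, 29]

median_wage = median(wages)  # the middle value of the sorted list
mean_wage = mean(wages)      # pulled upward by the single high wage

print("median wage:", median_wage)
print("mean wage:", mean_wage)
```

Labeling each figure with the measure used, as the print statements do here, removes the ambiguity described above.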
Calculating Weighted Mean
Definition: A weighted mean accounts for variations in the relative importance or "weight" of individual data values within a set.
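Formula: $\text{weighted mean} = \dfrac{\text{sum of (each data value} \times \text{its weight)}}{\text{sum of all weights}}$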
Course Grade Example:
Structure: 4 Quizzes (each worth 15%) and 1 Final Exam (worth 40%).
Scores: Quiz scores are ; Final Exam score is .
Calculation using the percentages as weights (15 and 40):
Sum of weighted values:
Sum of weights: $15 + 15 + 15 + 15 + 40 = 100$
Final Score:
Rounding Rule Application: The final score is rounded to .
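Illustrative Sketch (Python): The weighted mean calculation can be written as a small function. The weights (15 for each quiz, 40 for the final exam) follow the course structure above; the scores are hypothetical and are not the scores from the example, and the function name `weighted_mean` is an assumption of this sketch.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) divided by sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Four quizzes weighted 15 each and one final exam weighted 40.
weights = [15, 15, 15, 15, 40]
# Hypothetical scores (illustration only): four quizzes, then the final exam.
scores = [80, 90, 70, 100, 85]

course_score = weighted_mean(scores, weights)
print(course_score)
```

Dividing by the sum of the weights, rather than by the number of scores, is what distinguishes this from a standard mean.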
Questions & Discussion
Think About It (Weights as Decimals):
Question: Because weights often represent percentages, would calculating the weighted mean using decimals (e.g., 0.15 and 0.40) change the final answer?
Logic: No, the answer remains the same because both the numerator (the sum of the weighted values) and the denominator (the sum of the weights) are scaled by the same factor, which cancels and leaves the ratio unchanged.
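Worked Derivation: The cancellation can be written out explicitly. The symbols here are assumptions of this sketch: $x$ stands for each score and $w$ for its weight expressed as a percentage, so the decimal weight is $w/100$.

```latex
\frac{\sum \left( x \cdot \frac{w}{100} \right)}{\sum \frac{w}{100}}
  = \frac{\frac{1}{100} \sum (x \cdot w)}{\frac{1}{100} \sum w}
  = \frac{\sum (x \cdot w)}{\sum w}
```

The common factor of $1/100$ appears in both the numerator and the denominator, so it cancels.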
Formalizing the Mean with Summation Notation
The Summation Sign ($\Sigma$): This Greek capital letter sigma indicates that a set of numbers should be added together.
Variables:
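$x$: Represents each individual value in a data set.
$n$: Represents the total number of values in a sample.
$\bar{x}$: The standard symbol for the mean of a sample.
$\mu$ (mu): The Greek letter used to represent the mean of a population.
General Formulas:
Mean: $\bar{x} = \dfrac{\sum x}{n}$
Weighted Mean: $\dfrac{\sum (x \cdot w)}{\sum w}$, where $w$ is the weight assigned to each value $x$.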
Calculating Measures for Binned Data
Approach: For data organized into bins (ranges), assume that the middle value of the bin represents every data value within that bin.
Binned Data Example (Table of 50 values):
Bin 1 (0–6): Middle value = 3. Frequency = . Contribution = .
Bin 2 (7–13): Middle value = 10. Frequency = . Contribution = .
Bin 3 (14–20): Middle value = 17. Frequency = . Contribution = .
Bin 4 (21–27): Middle value = 24. Frequency = . Contribution = .
Calculating the Mean:
Total Sum:
Total Count ($n$): 50
Mean:
Determining the Median and Mode:
Median: With 50 values, the median is located between the 25th and 26th sorted values. Counting frequencies (), the 25th and 26th values fall into the bin, known as the median class.
Mode: The bin with the highest frequency. In this data set, the mode is the bin (frequency of ).
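Illustrative Sketch (Python): The binned-data procedure can be sketched as follows. The bin ranges match the example above, but the frequencies are hypothetical (chosen to total 50) and are not the values from the original table; the helper name `bin_of_position` is also an assumption of this sketch.

```python
# Bin ranges from the example; frequencies are hypothetical and sum to 50.
bins = [(0, 6), (7, 13), (14, 20), (21, 27)]
frequencies = [10, 20, 15, 5]

# Assume the middle value of each bin represents every value in that bin.
midpoints = [(low + high) / 2 for low, high in bins]
n = sum(frequencies)
binned_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

def bin_of_position(position, freqs):
    """Return the index of the bin holding the value at a 1-based sorted position."""
    cumulative = 0
    for index, f in enumerate(freqs):
        cumulative += f
        if position <= cumulative:
            return index
    raise ValueError("position exceeds the total count")

# With an even count, the median lies between positions n/2 and n/2 + 1.
lower_index = bin_of_position(n // 2, frequencies)
upper_index = bin_of_position(n // 2 + 1, frequencies)

# The modal bin is the one with the highest frequency.
modal_index = frequencies.index(max(frequencies))

print("mean:", binned_mean)
print("median class:", bins[lower_index], bins[upper_index])
print("modal bin:", bins[modal_index])
```

With these hypothetical frequencies, both middle positions land in the same bin; with other frequencies they could straddle two adjacent bins, in which case the two printed ranges would differ.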