Describing Data: Measures of Center and Average

Introduction to Describing Data: What is Average?

  • Learning Goals for Section 4.1:

    • Understand the fundamental differences between the three primary measures of center: mean, median, and mode.

    • Analyze how outliers affect these different measures of average.

    • Determine when it is appropriate to apply a weighted mean instead of a standard mean.

Measures of Center: Mean, Median, and Mode

  • The Mean:

    • Definition: The mean is the value most commonly referred to as the "average."

    • Calculation: It is calculated by summing all the values in a data set and then dividing by the total number of values.

    • Metaphor/Visualization (Figure 4.1): If a histogram were constructed using physical blocks, the mean would represent the specific point on the horizontal axis where the distribution would balance perfectly.

  • The Median:

    • Definition: The median is the value that occupies the middle position when the data set is sorted in ascending or descending order.

    • Calculating for Even Data Sets: If the data set contains an even number of values, the median is defined as the value halfway between the two middle values (calculated as the average of those two values).

  • The Mode:

    • Definition: The mode is the most frequently occurring value (or group of values) in a data set.
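
The three definitions above can be sketched directly in Python. This is a minimal illustration rather than a reference implementation; the function names `mean`, `median`, and `modes` and the sample list are assumptions made for this example and do not come from the text.

```python
from collections import Counter


def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    With an even count, return the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


sample = [2, 3, 3, 5, 7, 10]  # hypothetical data, not from the text
print(mean(sample))    # 5.0
print(median(sample))  # 4.0
print(modes(sample))   # [3]
```

Returning a list from `modes` is a design choice that accommodates the "group of values" case in the definition, where two or more values tie for the highest frequency.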

Rounding Rule for Statistical Calculations

  • General Principle: When performing statistical calculations, the final answer should typically be expressed with one more decimal place of precision than what is provided in the original list of data values.

  • Examples:

    • If original data are whole numbers (0 decimal places), the mean should be rounded to the nearest tenth (0.1).

    • If original data are given to the nearest tenth (1 decimal place), the mean should be rounded to the nearest hundredth (0.01).

  • Precision Note: Round only the final answer; do not round any intermediate values used during the calculation.
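
The rounding rule can be sketched as a small helper. The name `round_final` and the sample data are hypothetical, and half-up tie-breaking is an assumption, since the rule as stated does not say how to treat a trailing 5.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_final(value, data_decimals):
    """Round a final answer to one more decimal place than the data.

    Half-up tie-breaking is an assumption; the rule in the text does not
    specify how ties are handled.
    """
    places = data_decimals + 1
    quantum = Decimal(1).scaleb(-places)  # places=1 -> 0.1, places=2 -> 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


whole_numbers = [3, 4, 4]  # hypothetical data with 0 decimal places
raw_mean = sum(whole_numbers) / len(whole_numbers)  # unrounded intermediate
print(round_final(raw_mean, 0))  # 3.7
```

The unrounded `raw_mean` is carried through the calculation, and rounding happens only once, at the end, as the precision note requires.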

Practical Application: Price Data Example

  • Data Set: Eight grocery stores sell a PR energy bar at the following eight prices:

    • $1.79

    • $1.29

    • $1.29

    • $1.35

    • $1.39

    • $1.49

    • $1.59

    • $1.09

  • Calculating the Mean:

    • Sum of prices: \(\$1.79 + \$1.29 + \$1.29 + \$1.35 + \$1.39 + \$1.49 + \$1.59 + \$1.09 = \$11.28\)

    • Count (\(n\)): 8

    • Mean = \(\frac{\$11.28}{8} = \$1.41\)

    • Using the Rounding Rule (3 decimal places): $1.410

  • Calculating the Median:

    • Sorted Data (Ascending Order): $1.09, $1.29, $1.29, $1.35, $1.39, $1.49, $1.59, $1.79

    • Since there are 8 values, the middle consists of the fourth and fifth values: $1.35 and $1.39.

    • Calculation: \(\frac{\$1.35 + \$1.39}{2} = \$1.37\)

    • Using the Rounding Rule (3 decimal places): $1.370

  • Calculating the Mode:

    • The mode is $1.29 because it appears twice, which is more frequent than any other price in the set.
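
The price calculations above can be checked with Python's standard `statistics` module. This is a sketch for verification only; using `Decimal` for the prices is a choice made here to keep the cent arithmetic exact, not something the text prescribes.

```python
from decimal import Decimal
from statistics import mean, median, multimode

# The eight energy bar prices from the example, as exact decimals.
prices = [Decimal(p) for p in
          ["1.79", "1.29", "1.29", "1.35", "1.39", "1.49", "1.59", "1.09"]]

total = sum(prices)
print(total)              # 11.28
print(mean(prices))       # 1.41
print(median(prices))     # 1.37
print(multimode(prices))  # [Decimal('1.29')]
```

`multimode` returns a list, so a data set with several equally frequent prices would report all of them.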

The Impact of Outliers on Statistical Measures

  • Definition of Outlier: An outlier (or outlying value) is a value in a data set that is significantly higher or significantly lower than almost all other values.

  • The Basketball Contract Scenario:

    • Five graduating seniors receive first-year contract offers for the NBA. Four receive no offer ($0), and one receives $10,000,000.

    • Data: $0, $0, $0, $0, $10,000,000

    • Mean Offer Calculation: \(\frac{\$0 + \$0 + \$0 + \$0 + \$10,000,000}{5} = \$2,000,000\)

    • Problem of Representation: While the mean indicates an "average" of $2 million, this number is unrepresentative because $10,000,000 is an extreme outlier. If the outlier is removed, the mean drops to zero.

  • Resistance to Outliers:

    • Mean: Significantly affected by outliers; they pull the mean toward the extreme value.

    • Median: Generally unaffected because outliers exist at the ends of the sorted list, not the center. (Note: Deleting an outlier may change the count of values and thus shift the median slightly, but the value of the outlier itself does not distort the result).

    • Mode: Generally unaffected by outliers.
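
The resistance claims above can be illustrated with the basketball data. The sketch below compares each measure with and without the outlier; the variable names are assumptions made for this example.

```python
from statistics import mean, median, mode

offers = [0, 0, 0, 0, 10_000_000]                  # data from the scenario
trimmed = [x for x in offers if x != max(offers)]  # outlier removed

for label, data in [("with outlier", offers), ("without outlier", trimmed)]:
    print(f"{label}: mean={mean(data):,.0f} "
          f"median={median(data):,.0f} mode={mode(data):,.0f}")
# with outlier: mean=2,000,000 median=0 mode=0
# without outlier: mean=0 median=0 mode=0
```

Only the mean changes when the $10,000,000 offer is removed; the median and mode are $0 in both cases.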

"Average" Confusion and Real-World Examples

  • The Wage Dispute Scenario:

    • Context: News reports an average wage of $42 per hour in the industry. Workers at a firm claim their average is only $36, while management claims the firm's average is $48.

    • Resolution: Both can be correct if they use different measures of center.

    • Hypothetical Data: Five workers with hourly wages of $36, $36, $36, $36, and $96.

    • Median calculation: The middle value of the sorted wages is $36, which matches the workers' claim.

    • Mean calculation: \(\frac{\$36 + \$36 + \$36 + \$36 + \$96}{5} = \frac{\$240}{5} = \$48\), which matches management's claim.

  • Confusion Source: Misunderstandings often arise when the specific type of "average" (mean vs. median) is not specified, or when the calculation method is not transparent.

Calculating Weighted Mean

  • Definition: A weighted mean accounts for variations in the relative importance or "weight" of individual data values within a set.

  • Formula: \(\text{Weighted Mean} = \frac{\sum (\text{value} \times \text{weight})}{\sum \text{weights}}\)

  • Course Grade Example:

    • Structure: 4 Quizzes (each worth 15%) and 1 Final Exam (worth 40%).

    • Scores: Quiz scores are 75, 80, 84, 88; Final Exam score is 96.

    • Calculation using the percentages (15 and 40) as weights:

      • Sum of weighted values: \((75 \times 15) + (80 \times 15) + (84 \times 15) + (88 \times 15) + (96 \times 40) = 1125 + 1200 + 1260 + 1320 + 3840 = 8745\)

      • Sum of weights: \(15 + 15 + 15 + 15 + 40 = 100\)

      • Final Score: \(\frac{8745}{100} = 87.45\)

    • Rounding Rule Application: Because the scores are whole numbers, the final score is rounded to the nearest tenth: 87.5.
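
The course grade calculation can be sketched as a short function that follows the weighted mean formula. The name `weighted_mean` and its input checks are assumptions added for this example.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [75, 80, 84, 88, 96]    # four quizzes, then the final exam
weights = [15, 15, 15, 15, 40]   # percentages used as weights
print(weighted_mean(scores, weights))  # 87.45
```

The function returns the unrounded result, so the rounding rule is applied afterwards, to the final answer only.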

Questions & Discussion

  • Think About It (Weights as Decimals):

    • Question: Because weights often represent percentages, would calculating the weighted mean using decimals (e.g., 0.15 and 0.40) change the final answer?

    • Logic: No, the answer remains the same because both the numerator (\(\sum (x \times w)\)) and the denominator (\(\sum w\)) are scaled by the same factor, maintaining the same ratio.
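
The scaling argument can be written out as a short derivation. Here \(c\) is a symbol introduced for this sketch to stand for the common scale factor (for example \(c = \frac{1}{100}\) when converting percentages to decimals); it does not appear in the text.

```latex
\[
\frac{\sum (x \times c\,w)}{\sum c\,w}
  = \frac{c \sum (x \times w)}{c \sum w}
  = \frac{\sum (x \times w)}{\sum w},
  \qquad c \neq 0.
\]
% Course grade example with decimal weights:
\[
\frac{(75 + 80 + 84 + 88)(0.15) + (96)(0.40)}{4(0.15) + 0.40}
  = \frac{49.05 + 38.40}{1.00}
  = 87.45
\]
```

The result matches the 87.45 obtained when the weights are written as 15 and 40.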

Formalizing the Mean with Summation Notation

  • The Summation Sign (\(\Sigma\)): This Greek capital letter sigma indicates that a set of numbers should be added together.

  • Variables:

    • \(x\): Represents each individual value in a data set.

    • \(n\): Represents the total number of values in a sample.

    • \(\bar{x}\): The standard symbol for the mean of a sample.

    • \(\mu\) (mu): The Greek letter used to represent the mean of a population.

  • General Formulas:

    • Mean: \(\bar{x} = \frac{\sum x}{n}\)

    • Weighted Mean: \(\text{Weighted Mean} = \frac{\sum (x \times w)}{\sum w}\)

Calculating Measures for Binned Data

  • Approach: For data organized into bins (ranges), assume that the middle value of the bin represents every data value within that bin.

  • Binned Data Example (Table of 50 values):

    • Bin 1 (0–6): Middle value = 3. Frequency = 10. Contribution = \(3 \times 10 = 30\).

    • Bin 2 (7–13): Middle value = 10. Frequency = 12. Contribution = \(10 \times 12 = 120\).

    • Bin 3 (14–20): Middle value = 17. Frequency = 11. Contribution = \(17 \times 11 = 187\).

    • Bin 4 (21–27): Middle value = 24. Frequency = 17. Contribution = \(24 \times 17 = 408\).

  • Calculating the Mean:

    • Total Sum: \(30 + 120 + 187 + 408 = 745\)

    • Total Count (\(n\)): 50

    • Mean: \(\frac{745}{50} = 14.9\) years

  • Determining the Median and Mode:

    • Median: With 50 values, the median is located between the 25th and 26th sorted values. Counting cumulative frequencies, the first two bins hold 10 + 12 = 22 values and the first three hold 22 + 11 = 33, so the 25th and 26th values fall into the 14–20 bin, known as the median class.

    • Mode: The bin with the highest frequency. In this data set, the mode is the 21–27 bin (frequency of 17).
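
The binned calculations above can be sketched as follows. The representation of each bin as a `((low, high), frequency)` pair and the helper name `class_containing` are assumptions made for this example.

```python
# Each entry: ((low, high), frequency), copied from the example above.
bins = [((0, 6), 10), ((7, 13), 12), ((14, 20), 11), ((21, 27), 17)]

n = sum(freq for _, freq in bins)

# Mean: treat every value in a bin as equal to the bin's middle value.
total = sum((low + high) / 2 * freq for (low, high), freq in bins)
binned_mean = total / n


def class_containing(position):
    """Return the bin holding the value at a 1-based sorted position."""
    cumulative = 0
    for bounds, freq in bins:
        cumulative += freq
        if position <= cumulative:
            return bounds
    raise ValueError("position exceeds the number of values")


# With an even n, the median lies between positions n/2 and n/2 + 1.
lower_mid, upper_mid = (n + 1) // 2, n // 2 + 1
median_classes = {class_containing(lower_mid), class_containing(upper_mid)}

# Modal class: the bin with the highest frequency.
modal_class = max(bins, key=lambda item: item[1])[0]

print(n, binned_mean)   # 50 14.9
print(median_classes)   # {(14, 20)}
print(modal_class)      # (21, 27)
```

Collecting the median classes in a set covers the case where the two middle positions land in different bins; for this data both fall in the 14–20 bin.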