Mean, Median, Mode

SECTION 1: MEASURES OF CENTER

Measures of center are single numerical values used to summarize the central position or a typical value of a data set.

1. Mean (Average)

Definition: The balance point or the average value of a data set.
How to Calculate: Add all data values together and divide the sum by the total count of numbers in the data set.
Formula: Mean = (Sum of all numbers) / (Total count of numbers)
Real-World Example: For the instructor age data set {21, 23, 25, 26, 30}:
- Sum = 21 + 23 + 25 + 26 + 30 = 125
- Mean = 125 / 5 = 25 years old
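
The same calculation can be sketched in Python. This is a minimal illustration, not a required method; the function name `mean` and the variable `ages` are chosen here for the example.

```python
def mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)

ages = [21, 23, 25, 26, 30]  # instructor ages from the example above
print(mean(ages))  # 25.0
```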

2. Median (Middle Value)

Definition: The exact middle value when data points are arranged in order.
How to Find:
1. Arrange the numbers in order from least to greatest.
2. If the count is odd, choose the middle number.
3. If the count is even, add the two middle values together and divide by 2 to find their average.
Real-World Example (Odd Count): For data {21, 23, 25, 26, 30}, the middle value is 25.
Real-World Example (Even Count): For quiz score data where the middle numbers are 45 and 45:
- Median = (45 + 45) / 2 = 45
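
The three steps can be sketched as a short Python function. The quiz score list below is hypothetical; it was made up so that its two middle values are 45 and 45, matching the example above.

```python
def median(values):
    """Return the middle value of the data, following the three steps above."""
    ordered = sorted(values)           # Step 1: least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # Step 2: odd count, take the middle value
        return ordered[mid]
    # Step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([21, 23, 25, 26, 30]))  # 25
print(median([40, 45, 45, 50]))      # 45.0 (hypothetical quiz scores)
```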

3. Mode

Definition: The number or numbers that appear most frequently in a data set.
Key Guidelines: A data set can have one mode, more than one mode, or no mode at all (if no numbers repeat).
Real-World Example: In the earnings data set {6, 23, 23, 23, 24, 24, 24, 25, 26}, the modes are both 23 and 24 because they each repeat three times.
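
A minimal Python sketch of the same idea, using the standard library's `collections.Counter` to count repeats. The function name `modes` is an assumption made for this example; it returns an empty list when no value repeats.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no number repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

earnings = [6, 23, 23, 23, 24, 24, 24, 25, 26]
print(modes(earnings))      # [23, 24]
print(modes([1, 2, 3, 4]))  # []
```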

SECTION 2: MEASURES OF VARIATION (SPREAD)

Measures of variation describe how spread out the data values are: how far they lie from one another and from the center.

1. Range

Definition: The total distance covered between the highest and lowest points of the data set.
Formula: Range = Maximum value - Minimum value
Real-World Example: Given temperature values {3, 10, 14, 19, 22, 29, 32, 36, 49, 61}:
- Range = 61 - 3 = 58
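
In Python, the range formula maps directly onto the built-in `max` and `min` functions. The name `data_range` is chosen for this sketch to avoid clashing with Python's built-in `range`.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)

temps = [3, 10, 14, 19, 22, 29, 32, 36, 49, 61]
print(data_range(temps))  # 58
```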

2. Quartiles

Definition: Values that divide an ordered data set into four equal parts (quarters).
First Quartile (Q1): The median of the lower half of the ordered data; about 25% of the values fall at or below it.
Third Quartile (Q3): The median of the upper half of the ordered data; about 75% of the values fall at or below it.
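
The sketch below finds quartiles with the "median of each half" method described here, applied to the temperature data from the Range example. It assumes that when the count is odd, the middle value is left out of both halves; other quartile conventions exist, and software packages may give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, skip the middle value itself.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

temps = [3, 10, 14, 19, 22, 29, 32, 36, 49, 61]
print(quartiles(temps))  # (14, 25.5, 36)
```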

3. Interquartile Range (IQR)

Definition: The distance between the first and third quartiles, representing the spread of the middle 50% of the data.
Formula: IQR = Q3 - Q1
Real-World Example: In a data set with a lower median (Q1) of 14 and an upper median (Q3) of 36:
- IQR = 36 - 14 = 22
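
As a quick check of the arithmetic, the formula can be written as a one-line Python function. The name `interquartile_range` is illustrative only.

```python
def interquartile_range(q1, q3):
    """Return the distance between the third and first quartiles."""
    return q3 - q1

print(interquartile_range(q1=14, q3=36))  # 22
```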

4. Mean Absolute Deviation (MAD)

Definition: The average distance between each data point and the mean of the data set. A small MAD means the values sit close to the mean (consistent data); a large MAD means they are widely spread (volatile data).
How to Calculate:
1. Find the mean of the data set.
2. Calculate the positive distance (absolute difference) between each data value and that mean.
3. Find the average of those calculated distances.
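
The three steps can be sketched in Python as follows. The example reuses the instructor age data from the Mean section; the function name is an assumption made for this illustration.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the data set."""
    center = sum(values) / len(values)              # Step 1: the mean
    distances = [abs(v - center) for v in values]   # Step 2: absolute differences
    return sum(distances) / len(distances)          # Step 3: average distance

ages = [21, 23, 25, 26, 30]
# Mean is 25; distances are 4, 2, 0, 1, 5; their sum is 12.
print(mean_absolute_deviation(ages))  # 2.4
```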

SECTION 3: CALCULATING OUTLIERS

An outlier is an extreme value within a data set that is significantly higher or lower than the rest of the observations. Rather than judging by eye, we test for outliers by computing numerical limits from the quartiles.

The 4-Step Outlier Formula Plan

To determine mathematically if an outlier exists, use the following steps:

Step 1: Calculate the Interquartile Range (IQR = Q3 - Q1).
Step 2: Multiply the IQR by the standard scaling constant of 1.5.
- Magic Product = 1.5 x IQR
Step 3: Establish the Lower Limit
- Lower Limit = Q1 - (1.5 x IQR)
- Any data value below this lower limit is a low outlier.
Step 4: Establish the Upper Limit
- Upper Limit = Q3 + (1.5 x IQR)
- Any data value above this upper limit is a high outlier.
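
The four steps can be sketched as two small Python functions. The function names are hypothetical, and the quartiles are passed in directly so the sketch does not depend on any particular way of computing them. The check below uses the temperature data from Section 2, where Q1 = 14 and Q3 = 36.

```python
def outlier_limits(q1, q3):
    """Return (lower_limit, upper_limit) from the 1.5 x IQR steps."""
    iqr = q3 - q1              # Step 1
    margin = 1.5 * iqr         # Step 2
    return q1 - margin, q3 + margin   # Steps 3 and 4

def find_outliers(values, q1, q3):
    """Return the values that fall below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]

temps = [3, 10, 14, 19, 22, 29, 32, 36, 49, 61]
print(outlier_limits(14, 36))        # (-19.0, 69.0)
print(find_outliers(temps, 14, 36))  # []
```

Here no temperature lies outside the limits, so the data set has no outliers under this rule.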

Real-World Walkthrough Example

Data Set: {0, 8, 10, 11, 13, 16, 22, 24, 27, 33, 40, 58, 77}

Median: 22
Lower Half Median (Q1): 10.5 (average of 10 and 11)
Upper Half Median (Q3): 36.5 (average of 33 and 40)
IQR Calculation: 36.5 - 10.5 = 26
Scale Factor Application: 26 x 1.5 = 39
Boundary Check:
- Lower Limit = 10.5 - 39 = -28.5 (No data point is below this)
- Upper Limit = 36.5 + 39 = 75.5 (The data value 77 is greater than 75.5)
Conclusion: 77 is verified as a mathematical outlier.
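
The walkthrough can be checked with a short Python script. This sketch is written for this 13-value data set specifically: because the count is odd, it leaves the middle value (22) out of both halves, as the walkthrough does.

```python
from statistics import median

data = [0, 8, 10, 11, 13, 16, 22, 24, 27, 33, 40, 58, 77]
ordered = sorted(data)
half = len(ordered) // 2           # index of the middle value (22)

lower_half = ordered[:half]        # [0, 8, 10, 11, 13, 16]
upper_half = ordered[half + 1:]    # [24, 27, 33, 40, 58, 77]

q1 = median(lower_half)            # 10.5
q3 = median(upper_half)            # 36.5
iqr = q3 - q1                      # 26.0
lower_limit = q1 - 1.5 * iqr       # -28.5
upper_limit = q3 + 1.5 * iqr       # 75.5

outliers = [v for v in ordered if v < lower_limit or v > upper_limit]
print(outliers)  # [77]
```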

SECTION 4: DATA VISUALIZATION TIERS

1. Line Plots (Dot Plots)

What it shows: Displays how often each value occurs by stacking dots or X marks above that value on a number line.
Pros: Every individual data point is visible.
Key Rule: The raw data set does not need to be ordered before marking points onto the plot.
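
As a rough illustration, a dot plot can be drawn as text, one row per value on the number line. This sketch assumes whole-number data, and the sample values are hypothetical. The list is deliberately left unordered to show that sorting first is not required.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return one text row per integer from min to max, with an x per occurrence."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v:>2} | {'x' * counts[v]}".rstrip())
    return rows

sample = [3, 1, 2, 3, 5, 3, 2]  # hypothetical, unordered data
for row in dot_plot_rows(sample):
    print(row)
```

The row for 4 is printed with no marks, which is how a gap in the data shows up on the plot.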

2. Histograms

What it shows: Graphs continuous quantitative data sorted into equal-sized ranges called intervals along the X-axis. The Y-axis tracks how often (frequency) values land in those intervals.
Key Graphic Guidelines:
- Because the intervals are continuous, adjacent bars must touch (unlike a bar chart, where the bars are separated).
- All interval blocks must be equal in width.
- Intervals with zero items recorded have a bar height of 0.
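
Before drawing a histogram, the data has to be counted into intervals. The sketch below does that counting for the temperature data from Section 2. The interval choices (start at 0, width 10, seven intervals) are assumptions made for this example, and each interval includes its lower edge but not its upper edge.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values land in each equal-width interval.

    Interval i covers start + i*width up to (but not including)
    start + (i+1)*width. Values outside all intervals are ignored.
    """
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts

temps = [3, 10, 14, 19, 22, 29, 32, 36, 49, 61]
# Intervals: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69
print(histogram_counts(temps, start=0, width=10, bins=7))  # [1, 3, 2, 2, 1, 0, 1]
```

The 50-59 interval has a count of 0, so its bar would have a height of 0.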

3. Box Plots (Box & Whisker Plots)

What it shows: A graphic summary of a data set mapped across its Five-Number Summary boundaries.
The 5-Number Summary Framework:
1. Minimum: The lowest value in the data set (left whisker tip).
2. First Quartile (Q1): The left edge of the box.
3. Median (Q2): The dividing line inside the box.
4. Third Quartile (Q3): The right edge of the box.
5. Maximum: The highest value in the data set (right whisker tip).
Percentages to Memorize:
- The central box (Q1 to Q3) contains about 50% of the data.
- Each of the four sections (each whisker and each half of the box) represents about 25% of the observations.
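
The five values needed to draw a box plot can be gathered with a short Python function. This sketch uses the median-of-halves approach from Section 2 and assumes the middle value is left out of both halves when the count is odd; the function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box plot."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

temps = [3, 10, 14, 19, 22, 29, 32, 36, 49, 61]
print(five_number_summary(temps))  # (3, 14, 25.5, 36, 61)
```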

SECTION 5: SHAPE & DISTRIBUTION PATTERNS

When analyzing a data display, look for these distribution patterns and shapes:

Peaks: High points on the graph, marking the values or intervals with the greatest frequency.
Gaps: Empty sections along the axis showing where absolutely no observations occurred.
Clusters: Tight groupings where numerous data entries occur close together.
Symmetric: A balanced distribution shape where the left half of the graph looks like a mirror reflection of the right half.
Skewed Right (Positive Skew): The bulk of your data values clump on the left side, and a long tail or extreme outlier pulls the mean outward toward the right side.
Skewed Left (Negative Skew): The bulk of your data values clump on the right side, and a long tail extends toward lower values on the left side.
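
Because a long tail pulls the mean toward it while the median stays near the bulk of the data, comparing the two gives a rough numerical hint about skew. The sketch below is only a heuristic, not a formal skewness measure, and it should be confirmed by looking at the graph. The function name and all three data sets are hypothetical.

```python
from statistics import mean, median

def skew_hint(values):
    """Rough hint only: compare the mean to the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right"   # mean pulled toward higher values
    if m < md:
        return "left"    # mean pulled toward lower values
    return "roughly symmetric"

print(skew_hint([1, 2, 2, 3, 3, 3, 4, 20]))        # right (mean 4.75, median 3.0)
print(skew_hint([1, 17, 18, 18, 19, 19, 19, 20]))  # left (mean 16.375, median 18.5)
print(skew_hint([1, 2, 3, 4, 5]))                  # roughly symmetric
```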