Ch 3 - Numerical Summaries of Data
Sanduni Palliyage, PhD
MATH 220 – Fall 2024
James Madison University
The Mean
Overview
The mean, or average, summarizes a quantitative variable with a single value that measures the center of the data.
Calculation
Formula: Mean = (Sum of all data points) / (Number of data points)
Example:
For the data set [10, 15, 9, 21, 17, 10]:
Mean = (10 + 15 + 9 + 21 + 17 + 10) / 6 = 82 / 6 ≈ 13.667
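As a quick check of the arithmetic above, the same calculation can be sketched in Python. The variable names are illustrative only.

```python
# Data set from the example above.
data = [10, 15, 9, 21, 17, 10]

# Mean = (sum of all data points) / (number of data points)
total = sum(data)   # 82
n = len(data)       # 6
mean = total / n

print(round(mean, 3))  # 13.667
```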
Notation for the Mean
Sample Mean and Population Mean
Denote the data points as: x_1, x_2, …, x_n
Sample Mean:
x̄ = (x_1 + x_2 + … + x_n) / n
Population Mean:
μ = (x_1 + x_2 + … + x_N) / N
The Median
Definition and Importance
The median is the middle value in a dataset when organized in ascending order, marking the 50th percentile.
About half of the data values lie at or below the median, and about half lie at or above it.
Calculation differs based on whether n (number of observations) is odd or even.
Calculation Steps
Organize: Arrange your data in increasing order.
Find n: Determine the total number of data values, n.
Determine Median:
If n is odd: Median = the value in position (n + 1) / 2 of the sorted data
If n is even: Median = the average of the two middle values, in positions n / 2 and n / 2 + 1
Examples of Finding the Median
Odd Number of Observations
Data: 10, 8, 14, 20, 19, 17, 8
Sorted: 8, 8, 10, 14, 17, 19, 20
Median Calculation:
n = 7 (Odd) -> Median position: (7+1)/2 = 4th observation = 14
Even Number of Observations
Data: 10, 8, 14, 20, 19, 17, 8, 21
Sorted: 8, 8, 10, 14, 17, 19, 20, 21
Median Calculation:
n = 8 (Even) -> Median = (4th + 5th) / 2 = (14 + 17) / 2 = 15.5
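The calculation steps can be sketched as a small Python function. This is a minimal illustration, not the only way to compute a median; the function name `median` is chosen here for convenience, and Python's 0-based indexing shifts the positions by one compared with the slides.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # Step 1: arrange in increasing order
    n = len(ordered)           # Step 2: find n
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in 1-based counting is index mid.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([10, 8, 14, 20, 19, 17, 8]))      # 14
print(median([10, 8, 14, 20, 19, 17, 8, 21]))  # 15.5
```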
Comparing Mean and Median
Key Differences
Mean includes all data points; sensitive to extremes.
Median only considers the middle value(s); resistant to extreme values.
Example Analysis
Data Set 1: 1, 2, 3, 4, 5, 6
Mean = 3.5, Median = 3.5
Data Set 2: 1, 2, 3, 4, 5, 21
Mean = 6, Median = 3.5
Conclusion: The median stays at 3.5, while the mean is pulled upward from 3.5 to 6 by the outlier (21).
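The comparison can be reproduced with Python's standard `statistics` module. This is only a sketch to confirm the numbers in the example; the variable names are illustrative.

```python
from statistics import mean, median

set1 = [1, 2, 3, 4, 5, 6]
set2 = [1, 2, 3, 4, 5, 21]   # same data, but 6 is replaced by the outlier 21

mean1, median1 = mean(set1), median(set1)
mean2, median2 = mean(set2), median(set2)

# The mean moves from 3.5 to 6; the median is 3.5 in both data sets.
print("Set 1:", mean1, median1)
print("Set 2:", mean2, median2)
```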
The Mode
Definition
The mode is the value that appears most frequently in a dataset.
If several values tie for the highest frequency, all of them are modes (a dataset with two modes is called bimodal).
Example Data: [1, 2, 2, 3, 4, 5, 5, 5, 6]
Mode = 5 (it appears three times, more than any other value)
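A short Python sketch of finding the mode, using `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. The second list is a made-up dataset added only to illustrate the bimodal case.

```python
from statistics import multimode

data = [1, 2, 2, 3, 4, 5, 5, 5, 6]
modes = multimode(data)
print(modes)  # [5]

# Hypothetical data with a tie: 1 and 3 each appear twice.
bimodal = [1, 1, 2, 3, 3]
bimodal_modes = multimode(bimodal)
print(bimodal_modes)  # [1, 3]
```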
Measures of Spread
Overview
Measures of center do not provide insight into the data's spread.
Example: A single summary value, such as an average temperature, can be misleading without information about how spread out the data are.
Range
Definition: Difference between the maximum and minimum values.
Example:
SF Range = 63 - 51 = 12
STL Range = 79 - 30 = 49
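The range is a one-line calculation in code. Because the temperature data behind the example are not listed here, this sketch reuses the data set from the mean example earlier in the chapter.

```python
# Data set from the mean example: [10, 15, 9, 21, 17, 10]
data = [10, 15, 9, 21, 17, 10]

# Range = maximum - minimum
data_range = max(data) - min(data)
print(data_range)  # 12  (21 - 9)
```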
The Variance
Definition
Variance measures data points’ dispersion from the mean.
Importance
Low Variance: Data values close to the mean.
High Variance: Values spread out from the mean, indicating varied data points.
Population Variance
Calculation Steps
Find the population mean μ.
Subtract μ from each data value and square the result.
Sum the squared deviations.
Divide by the population size N.
Formula
Variance (σ²) = Σ(𝑥 - μ)² / N
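The population variance formula can be sketched step by step in Python. The data set [1, 2, 3, 4, 5, 6] is borrowed from the mean and median comparison and is treated here as an entire population, which is an assumption made for illustration.

```python
from statistics import pvariance

data = [1, 2, 3, 4, 5, 6]   # treated as a full population (assumption)
N = len(data)

mu = sum(data) / N                            # population mean: 3.5
squared_devs = [(x - mu) ** 2 for x in data]  # (x - mu)^2 for each value
pop_var = sum(squared_devs) / N               # 17.5 / 6

print(round(pop_var, 3))          # 2.917
print(round(pvariance(data), 3))  # 2.917, the standard library agrees
```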
Sample Variance
Key Differences
Sample variance (s²) uses sample mean (x̄) and divides by n - 1 to compensate for bias:
Formula: s² = Σ(𝑥 - x̄)² / (n - 1)
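To see the effect of dividing by n - 1, here is a minimal sketch that treats [1, 2, 3, 4, 5, 6] as a sample (an assumption for illustration) and compares the result with dividing by n.

```python
from statistics import variance

sample = [1, 2, 3, 4, 5, 6]   # treated as a sample (assumption)
n = len(sample)

x_bar = sum(sample) / n                         # sample mean: 3.5
sum_sq = sum((x - x_bar) ** 2 for x in sample)  # 17.5

sample_var = sum_sq / (n - 1)   # divide by n - 1
divide_by_n = sum_sq / n        # what dividing by n would give

print(sample_var)               # 3.5
print(round(divide_by_n, 3))    # 2.917, smaller than the sample variance
print(variance(sample))         # statistics.variance also divides by n - 1
```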
The Standard Deviation
Explanation
Definition: The standard deviation is the square root of the variance. It measures spread in the same units as the data.
Population Standard Deviation (σ) = √σ²
Sample Standard Deviation (s) = √s²
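Both standard deviations follow from the corresponding variance by taking a square root. The sketch below uses the illustrative data [1, 2, 3, 4, 5, 6] and cross-checks against `statistics.pstdev` and `statistics.stdev`.

```python
import math
from statistics import pstdev, stdev

data = [1, 2, 3, 4, 5, 6]
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)   # 17.5

sigma = math.sqrt(sum_sq / n)        # population standard deviation
s = math.sqrt(sum_sq / (n - 1))      # sample standard deviation

print(round(sigma, 3))  # 1.708
print(round(s, 3))      # 1.871
```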
z-Scores
Definition
A z-score indicates how many standard deviations a data point is from the mean:
z = (x - μ) / σ
Positive z-scores indicate above-average values; negative z-scores indicate below-average values.
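A minimal sketch of the z-score formula in Python. The helper name `z_score` is hypothetical, and the data set [1, 2, 3, 4, 5, 21] from the mean and median comparison is treated as a population for illustration.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

data = [1, 2, 3, 4, 5, 21]   # treated as a population (assumption)
mu = mean(data)              # 6
sigma = pstdev(data)         # about 6.83

z_high = z_score(21, mu, sigma)   # above the mean, so positive
z_low = z_score(1, mu, sigma)     # below the mean, so negative

print(round(z_high, 2))  # 2.2
print(round(z_low, 2))   # -0.73
```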
Quartiles and Percentiles
Quartiles
Quartiles divide data into four parts:
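Q1 (25th percentile) separates the lowest 25% of the data from the rest
Median (Q2, 50th percentile) separates the lower 50% from the upper 50%
Q3 (75th percentile) separates the highest 25% of the data from the rest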
Five-Number Summary
Components
Minimum
Q1
Median
Q3
Maximum
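The five-number summary can be sketched in Python as follows. Quartile conventions differ between textbooks and software; this sketch assumes the "median of each half" convention (Q1 is the median of the lower half and Q3 the median of the upper half, with the middle value left out of both halves when n is odd), which is also what the TI-84 reports. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two data values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Data from the median examples earlier in the chapter.
print(five_number_summary([10, 8, 14, 20, 19, 17, 8]))      # (8, 8, 14, 19, 20)
print(five_number_summary([10, 8, 14, 20, 19, 17, 8, 21]))  # (8, 9.0, 15.5, 19.5, 21)
```

Other tools (for example, spreadsheet percentile functions) may interpolate differently and return slightly different quartiles for the same data.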
Outliers
Detection
An outlier is a value that differs significantly from the other data points. Classify outliers using the interquartile range (IQR).
IQR Method Steps
Determine Q1 and Q3.
Calculate IQR = Q3 - Q1.
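Compute the boundaries (fences): Lower = Q1 - 1.5 × IQR and Upper = Q3 + 1.5 × IQR. Any value below the lower boundary or above the upper boundary is classified as an outlier.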
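The IQR method can be sketched in Python. The usual form of the 1.5 × IQR rule flags any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. This sketch assumes the median-of-halves convention for quartiles (other conventions can give slightly different fences), and the function name `iqr_outliers` is hypothetical.

```python
from statistics import median

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    # Step 1: Q1 and Q3 as medians of the lower and upper halves.
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    # Step 2: IQR = Q3 - Q1
    iqr = q3 - q1
    # Step 3: fences at 1.5 * IQR beyond the quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Data Set 2 from the mean and median comparison: Q1 = 2, Q3 = 5, IQR = 3.
print(iqr_outliers([1, 2, 3, 4, 5, 21]))  # (-2.5, 9.5, [21])
```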
Boxplots
Explanation
Visual representation of the five-number summary. Useful in identifying outliers and comparing distributions.
Comparative Boxplots
Utility
Allow easy visual comparisons between datasets (e.g., rainfall over different time periods).
Conclusion
To find n, which represents the total number of data values in your dataset, you simply count how many values are present. For example, if your dataset is [10, 8, 14, 20, 19, 17, 8], you would count these values:
10
8
14
20
19
17
8
In this case, n = 7, since there are 7 values in the dataset.
To calculate the population variance using a TI-84 calculator, follow these steps:
Enter your data:
Press the STAT button.
Select 1: Edit... to access the lists.
Enter your data into one of the columns (e.g., L1).
Calculate statistics:
After entering all your data points, press the STAT button again.
Arrow over to the CALC menu.
Select 1: 1-Var Stats.
Input the list you used for your data (e.g., L1) and press ENTER.
Find the variance:
Look for the value labeled σx in the output. This is the population standard deviation; the 1-Var Stats output does not list σ² itself. Square σx to get the population variance σ² of the dataset you entered.
The TI-84 does not have a built-in function that returns a z-score directly, but you can calculate one with the formula:
z = (x - μ) / σ
Here, x is your data point, μ is the mean of the dataset, and σ is the standard deviation. Follow these steps:
First, find the mean (μ) and standard deviation (σ):
Press the STAT button.
Choose 1: Edit to enter your data in one of the lists (e.g., L1).
After entering data, press STAT, then arrow over to CALC, and select 1: 1-Var Stats. The output lists the mean (x̄) and the population standard deviation (σx).
Once you have μ and σ, use the z-score formula to manually calculate z for your specific data point x.