Statistics - Z-Scores, Boxplots, Quartiles, Percentiles & Outliers
Z Scores
Z-score indicates how many standard deviations a value is from its population mean.
: value is one standard deviation above the mean.
: value is two standard deviations below the mean.
Computing a Z Score
- Let be a value from a population with mean and standard deviation .
- The z-score for is calculated as: .
Example
Mean height for adult men in the US: inches, inches.
Mean height for adult women in the US: inches, inches.
Man's height: 73 inches. Woman's height: 68 inches.
Z-score for the man's height:
- .
Z-score for the woman's height:
- .
The woman is taller relative to the population because of a higher z score.
Empirical Rule and Z Scores
- For bell-shaped populations:
- Approximately 68% of data have z scores between -1 and 1.
- Approximately 95% of data have z scores between -2 and 2.
- Almost all data have z scores between -3 and 3.
Boxplots
- Boxplot: A graph presenting the five-number summary and additional data information.
- Modified boxplot: a type of boxplot.
Constructing a Box Plot
- Data: Number of students absent in a middle school in Northwestern Montana during January.
- Step 1: Compute quartiles using technology (e.g., TI-84 Plus).
- Step 2: Draw vertical lines at Q1, Q2, and Q3; complete the box with horizontal lines.
- Step 3: Calculate the interquartile range (IQR).
- .
- Compute outlier boundaries:
- Lower outlier boundary: .
- Upper outlier boundary: .
- Step 4: Find the largest data value less than the upper boundary (77) and draw a horizontal line from to it.
- Step 5: Find the smallest data value greater than the lower boundary (41) and draw a horizontal line from to it.
- Step 6: Identify outliers (e.g., 100) and plot them separately.
Skewness and Boxplots
- Right Skew:
- Median closer to than .
- Upper whisker longer than lower whisker.
- Left Skew:
- Median closer to than .
- Lower whisker longer than upper whisker.
- Symmetric:
- Median approximately halfway between and .
- Whiskers approximately equal in length.
Quartiles
- Quartiles divide a dataset into four equal parts.
- Every dataset has three quartiles: , , and .
- : separates the lowest 25% from the highest 75%.
- : (median) separates the lowest 50% from the highest 50%.
- : separates the lowest 75% from the highest 25%.
Calculating Quartiles
- Arrange data in increasing order.
- Let = number of values.
- For : .
- For : .
- If is a whole number, the quartile is the average of the values in positions and .
- If is not a whole number, round up to the next whole number, and the quartile is the value in that position.
- is the median.
Example
- Annual rainfall in Los Angeles during February over several years (45 values, already sorted).
- For : . .
- For : . .
- Median: .
Five-Number Summary
- Consists of: minimum, , median, , maximum.
- Rainfall data summary: 0.14, 0.92, 3.21, 4.89, 13.68.
Using Technology for Quartiles
- Different technologies may use different procedures for finding quartiles.
- Example using TI-84 Plus calculator.
Detecting Outliers
- Outlier: A value much larger or smaller than the other values in a data set.
- Outliers can result from errors or reflect extreme values in the population.
IQR Method
- Find and .
- Compute .
- Compute outlier boundaries:
- Lower boundary: .
- Upper boundary: .
- Any data value below the lower boundary or above the upper boundary is an outlier.
Example
- Absent students data: , .
- .
- Lower boundary: .
- Upper boundary: .
- The value 100 is greater than the upper boundary and is an outlier.
Interpreting Quartiles
- The median divides the dataset into two parts.
- Quartiles divide a dataset into four parts.
- : separates the lowest 25% from the highest 75%.
- : (median) separates the lowest 50% from the highest 50%.
- : separates the lowest 75% from the highest 25%.
Examples
- Kayla's high B grade is more likely to be on the third quartile.
- Zorida (Q1), Phoebe (Q3), and Joanne (median); Zorida had the shortest average sleep duration.
Percentiles
- Percentiles divide a dataset into hundredths.
- The percentile separates the lowest % of the data from the highest %.
- Example: The 1st percentile separates the lowest 1% from the highest 99%.
Calculating Percentiles
- Arrange data in increasing order.
- Let = the number of values in the dataset.
- For the percentile, calculate .
- If is a whole number, the percentile is the average of the numbers in positions and .
- If is not a whole number, round it up to the next higher whole number, and the percentile is the number in this position.
Example
- Rainfall data in Los Angeles (45 values, already sorted); find the 60th percentile.
- .
- 60th percentile = .
Finding Percentile for a Given Value
- Arrange data in increasing order.
- Let be the value whose percentile is to be computed.
- Percentile .
- Round the result to the nearest whole number.
Example
- In 1989, rainfall was 1.9 inches; what percentile does this correspond to?
- 17 values are less than 1.9.
- Percentile . The value 1.9 corresponds to the 39th percentile.