Key vocabulary terms from Chapter 3 on numerically summarizing data, including measures of center, spread, position, and methods for identifying outliers.
Shape (distribution)
The overall form of a distribution, described by its symmetry or skewness, its number of peaks, any clusters or gaps, and any outliers.
Center
A typical or representative value of a distribution; numerical measures include the mean and the median.
Spread
The variability or dispersion of the data, describing how far values are from each other or from the center.
Mean (x̄)
The sum of all observations divided by the number of observations, x̄ = Σxᵢ / n (sample mean); the population mean is denoted μ.
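The definition translates directly into code. What follows is a minimal Python sketch; the function name `sample_mean` and the scores are hypothetical, chosen only for illustration.

```python
def sample_mean(values):
    """Sample mean x̄: the sum of the observations divided by their count n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

# Hypothetical sample of quiz scores (illustration only).
scores = [6, 8, 9, 10, 12]
print(sample_mean(scores))  # 9.0
```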
Median
The middle value when data are ordered; if n is odd, it is the middle observation; if even, it is the average of the two middle observations.
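The odd/even rule can be sketched in Python as follows. The function name and the small data sets are hypothetical. Python's built-in `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values when n is even)."""
    ordered = sorted(values)  # the median is defined on ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle observations

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```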
Mode
The value that occurs most frequently; a data set can have more than one mode. Often used with categorical data to identify the most frequent category.
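Counting frequencies gives the mode for numbers and for category labels alike. This hypothetical sketch uses `collections.Counter`. When several values tie for the highest count, it returns all of them.

```python
from collections import Counter

def modes(values):
    """All values tied for the highest frequency (numbers or category labels)."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([2, 3, 3, 5, 5, 5, 8]))          # [5]
print(modes(["bus", "car", "car", "bike"]))  # ['car']
print(modes([1, 1, 2, 2, 3]))                # [1, 2]
```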
Resistant
A statistic that is little affected by outliers; the median is resistant, the mean is not.
Symmetric
A distribution whose left and right sides are roughly mirror images, with tails of similar length; typically mean ≈ median.
Skewed right
A distribution with a longer tail to the right; generally mean > median.
Skewed left
A distribution with a longer tail to the left; generally mean < median.
Outlier
An observation unusually far from the rest of the data; causes can include measurement error, different population, or a rare event.
Range
Difference between the largest and smallest observations; simple but sensitive to outliers.
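A short Python sketch shows why the range is sensitive to outliers: one extreme value changes it drastically. The function is named `data_range` so that it does not shadow Python's built-in `range`. The values are hypothetical.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    if not values:
        raise ValueError("need at least one observation")
    return max(values) - min(values)

sample = [12, 15, 15, 18, 20]      # hypothetical data
print(data_range(sample))          # 8
print(data_range(sample + [60]))   # 48: a single outlier inflates the range
```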
Interquartile Range (IQR)
Q3 − Q1; the spread of the middle 50% of the data; resistant to outliers.
Variance
The average squared deviation from the mean: σ² = Σ(xᵢ − μ)² / N for a population; the sample variance s² = Σ(xᵢ − x̄)² / (n − 1) divides by n − 1 instead of n.
Standard Deviation
Square root of the variance; s for a sample, σ for a population; has the same units as the data and is not resistant.
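The following Python sketch computes the variance and standard deviation. It shows the n − 1 (sample) versus N (population) divisor. The function names and the data are hypothetical. Python's `statistics.variance`/`statistics.pvariance` and `statistics.stdev`/`statistics.pstdev` follow the same two conventions.

```python
import math

def variance(values, sample=True):
    """Sample variance s² (divide by n - 1) or population variance σ² (divide by N)."""
    n = len(values)
    minimum = 2 if sample else 1
    if n < minimum:
        raise ValueError(f"need at least {minimum} observations")
    center = sum(values) / n
    squared_deviations = sum((x - center) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Square root of the variance, expressed in the same units as the data."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations (mean 5)
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
print(round(variance(data), 3), round(std_dev(data), 3))          # 4.571 2.138
```

The squared deviations sum to 32 here. Dividing by N = 8 gives σ² = 4, and dividing by n − 1 = 7 gives s² ≈ 4.571.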
Five-number summary
Minimum, Q1, M (median), Q3, Maximum.
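The five-number summary can be computed with the by-hand halves method as a hypothetical Python sketch. It assumes one common convention: when n is odd, the median is excluded from both halves. Some textbooks include it instead, which can change Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max) using the by-hand halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]          # observations below the median position
    upper_half = ordered[half + n % 2:]  # skip the middle observation when n is odd
    return (ordered[0], median(lower_half), median(ordered), median(upper_half), ordered[-1])

data = [12, 1, 9, 4, 8, 3, 6]   # hypothetical, deliberately unsorted
summary = five_number_summary(data)
print(summary)                  # (1, 3, 6, 9, 12)
print(summary[3] - summary[1])  # IQR = Q3 - Q1 = 6
```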
Boxplot
Graph of the five-number summary: a box runs from Q1 to Q3 with a line at the median; whiskers extend to the smallest and largest observations that are not outliers, and outliers are plotted individually as points or asterisks.
Quartile
Values that divide the ordered data into four parts, each containing about 25% of the observations: Q1 (25th percentile), Q2 (the median, 50th percentile), Q3 (75th percentile).
Z-score
The number of standard deviations an observation lies above or below the mean: z = (x − x̄)/s for a sample or z = (x − μ)/σ for a population; positive z-scores lie above the mean and negative z-scores below it.
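A z-score is a one-line calculation. In this hypothetical sketch, the exam mean of 70 and SD of 8 are made-up values.

```python
def z_score(x, mean, sd):
    """Standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical exam with mean 70 and standard deviation 8.
print(z_score(86, 70, 8))  # 2.0  -> two SDs above the mean
print(z_score(62, 70, 8))  # -1.0 -> one SD below the mean
```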
Percentile
The pth percentile is a value such that p% of the observations fall below it; common examples are the quartiles (25th, 50th, and 75th percentiles).
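To find which percentile a given value sits at, count the share of observations that fall below it. The hypothetical sketch below counts values strictly below `x` to match the definition above. Other conventions count values at or below it, so results can differ slightly.

```python
def percent_below(values, x):
    """Percentage of observations strictly below x."""
    if not values:
        raise ValueError("need at least one observation")
    count_below = sum(1 for v in values if v < x)
    return 100 * count_below / len(values)

scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical scores
print(percent_below(scores, 80))  # 60.0 -> a score of 80 is at the 60th percentile here
```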
Empirical Rule
For bell-shaped data: about 68% of observations fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
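The Empirical Rule intervals are simply mean ± k·SD for k = 1, 2, 3. This is a hypothetical sketch; the mean of 100 and SD of 15 are assumed values for a bell-shaped variable.

```python
def empirical_rule_intervals(mean, sd):
    """Map k -> (mean - k*sd, mean + k*sd, approximate % of bell-shaped data inside)."""
    approx_percent = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in approx_percent.items()}

# Hypothetical bell-shaped variable with mean 100 and SD 15.
for k, (low, high, pct) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% within {k} SD: {low} to {high}")
```

This prints the intervals 85 to 115, 70 to 130, and 55 to 145. The percentages are approximations that hold only when the distribution is roughly bell-shaped.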
1.5 × IQR Rule (outliers)
Fences for identifying outliers: Lower fence = Q1 − 1.5·IQR; Upper fence = Q3 + 1.5·IQR; observations below the lower fence or above the upper fence are classified as outliers.
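Here is a hypothetical Python sketch of the 1.5 × IQR rule. It finds Q1 and Q3 by hand, as the medians of the lower and upper halves, excluding the median when n is odd (an assumed convention). It then computes the fences and flags observations outside them.

```python
from statistics import median

def quartiles_by_hand(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    return median(ordered[:half]), median(ordered[half + n % 2:])

def iqr_outliers(values):
    """Return (lower fence, upper fence, observations outside the fences)."""
    q1, q3 = quartiles_by_hand(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

data = [1, 3, 4, 6, 8, 9, 30]  # hypothetical data
print(iqr_outliers(data))      # (-6.0, 18.0, [30])
```

Here Q1 = 3 and Q3 = 9, so IQR = 6. The fences are 3 − 9 = −6 and 9 + 9 = 18, which flags only 30.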
Boxplot shape guidelines
In a boxplot, a symmetric distribution has its median near the center of the box and whiskers of similar length; a skewed distribution has a longer whisker on the side of the skew (the right side for right skew, the left side for left skew).
Describing distributions (center vs. spread)
For roughly symmetric distributions without outliers, report the mean and SD; for skewed distributions or distributions with outliers, report the median and IQR; when comparing groups, use the same measures for every group.
Population vs Sample notation
x̄ denotes the sample mean; μ denotes the population mean; s denotes the sample standard deviation; σ denotes the population standard deviation.
Outlier effect on center
Outliers tend to have a large influence on the mean but little on the median; removing an outlier can change the mean more than the median.
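A quick Python check makes the contrast concrete. This uses Python's `statistics` module on hypothetical data: the same five values, with and without one extreme value.

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same values plus an outlier (45).
without_outlier = [4, 5, 5, 6, 7]
with_outlier = without_outlier + [45]

for label, data in (("with outlier", with_outlier), ("without outlier", without_outlier)):
    print(f"{label:16} mean = {mean(data):.1f}, median = {median(data):.1f}")
```

Removing the outlier moves the mean from 12.0 to 5.4, while the median moves only from 5.5 to 5.0. This is why the median is called resistant.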
Quartiles by hand vs. software
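By hand, Q1 and Q3 are typically found as the medians of the lower and upper halves of the ordered data; software such as JMP may use a different rule, so its quartiles can differ slightly from hand calculations.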
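Quartile rules disagree most on small data sets. In this hypothetical comparison, the by-hand halves method is set against Python's `statistics.quantiles`, which offers "exclusive" (the default) and "inclusive" interpolation. Python is only a stand-in for "software" here; this does not claim that either method is the one JMP uses.

```python
from statistics import median, quantiles

def quartiles_by_hand(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    return median(ordered[:half]), median(ordered[half + n % 2:])

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data

print("by hand:  ", quartiles_by_hand(data))                  # (2.5, 6.5)
# With n=4, statistics.quantiles returns the three cut points [Q1, Q2, Q3].
print("exclusive:", quantiles(data, n=4))                      # [2.25, 4.5, 6.75]
print("inclusive:", quantiles(data, n=4, method="inclusive"))  # [2.75, 4.5, 6.25]
```

All three methods agree on the median (4.5) but give three different values for Q1 and Q3. That is why hand and software answers can disagree.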