September 11th

Treat data as quantitative only when computation and numeric comparison are meaningful; non-quantitative (categorical) data are not suitable for calculations.
When planning data entry, ensure the data will work for the intended analysis.
Different acceptable answers may exist depending on how data are loaded or entered into a computer; check the feedback and adjust accordingly.

Five-number summary: $\min,\; Q_1,\; Q_2,\; Q_3,\; \max$
$Q_2$ is the median (the 50th percentile). For an odd sample size, it is the middle value of the sorted data; for an even sample size, it is the average of the two middle values.
$Q_1$ is the median of the lower half; $Q_3$ is the median of the upper half.
This summary provides key cut points of the data distribution.
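The median-of-halves rule above can be sketched in Python as follows. This is a minimal illustration, not a definitive method: the function name `five_number_summary` and the sample list are made up for the example, and it assumes that for an odd sample size the middle value is left out of both halves (the notes do not say which convention applies, and software packages differ).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves rule.

    Assumption: for an odd sample size the middle value is excluded
    from both the lower and the upper half. Needs at least 2 values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the middle position
    upper = data[half + (n % 2):]   # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical sample, for illustration only
scores = [7, 1, 9, 3, 12, 5, 15]
print(five_number_summary(scores))  # (1, 3, 7, 12, 15)
```

Other quartile conventions (for example, interpolation-based ones) can give slightly different $Q_1$ and $Q_3$ on the same data.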

Box spans from $Q_1$ to $Q_3$; a vertical line inside the box marks the median $Q_2$.
Whiskers extend from the box to the data’s minimum and maximum values.
Interquartile Range: $\text{IQR} = Q_3 - Q_1$
50% of data lie inside the box (between $Q_1$ and $Q_3$).
Box shape reveals skewness: a longer whisker on one side suggests skew toward that side.

Outlier thresholds: $Q_1 - 1.5\times\text{IQR} \quad \text{and} \quad Q_3 + 1.5\times\text{IQR}$
Any data point outside these thresholds is considered an outlier.
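Outliers can be spotted visually as values lying far from the box (in plots that stop the whiskers at the most extreme non-outlying values, they appear as isolated points beyond the whiskers) and should be confirmed by the threshold calculation.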
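As a rough sketch of the threshold check, the snippet below computes the two fences from given quartiles and flags values outside them. The helper names `outlier_fences` and `find_outliers` and the data are hypothetical, and the quartiles are passed in directly so the example does not depend on any particular quartile convention.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) thresholds: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return the values lying strictly outside the two thresholds."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Hypothetical data; median-of-halves gives Q1 = 3 and Q3 = 12 here
data = [1, 3, 5, 7, 9, 12, 40]
print(outlier_fences(3, 12))        # (-10.5, 25.5)
print(find_outliers(data, 3, 12))   # [40]
```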

Quartiles divide data into 4 equal parts: Q1 (25th percentile), Q2 (50th percentile / median), Q3 (75th percentile).
Deciles divide data into 10 equal parts.
Percentiles indicate the percentage of values at or below a certain point.
Two common definitions for percentile p exist:
- Definition A (book-style): percentile p is the value $x_p$ where $\frac{\#\{X_i < x_p\}}{n} = \frac{p}{100}$ (data strictly below $x_p$).
- Definition B (CDF style): $F(x_p) = \frac{p}{100}$ where $F$ is the cumulative distribution function (data at or below $x_p$).
An Ogive (cumulative frequency plot) can be used to estimate percentiles.
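To see how the two definitions can disagree, the sketch below computes both proportions for the same cut point. The function names and the data set are illustrative assumptions, not taken from the course materials.

```python
def fraction_strictly_below(values, x):
    """Definition A style: share of observations strictly below x."""
    return sum(1 for v in values if v < x) / len(values)

def fraction_at_or_below(values, x):
    """Definition B style (empirical CDF): share of observations <= x."""
    return sum(1 for v in values if v <= x) / len(values)

# Hypothetical data set of n = 10 values
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
x = 5
print(fraction_strictly_below(data, x))  # 0.3 -> x is the 30th percentile under A
print(fraction_at_or_below(data, x))     # 0.4 -> x is the 40th percentile under B
```

The two results differ whenever the cut point coincides with an observed value, which is why the choice of definition matters on small data sets.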

Standardization converts data to a common scale with mean 0 and standard deviation 1.
Z-score formulas:
- Population: $z = \frac{X - \mu}{\sigma}$
- Sample: $z = \frac{\,X - \bar{X}\,}{s}$
Z-scores enable direct comparison across different distributions, and standardizing preserves probabilities: if $z$ is the z-score of $x$, then $P(X \le x) = P(Z \le z)$.
Z-scores are typically rounded to two decimal places.
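A minimal sketch of the sample formula, using only Python's standard library. The function name `sample_z_scores` and the data are hypothetical; `statistics.stdev` is the sample standard deviation (divides by $n - 1$), matching $s$ in the sample formula.

```python
from statistics import mean, stdev

def sample_z_scores(sample):
    """Standardize each value with the sample mean and sample SD,
    rounded to two decimal places."""
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return [round((x - x_bar) / s, 2) for x in sample]

# Hypothetical sample with mean 5
sample = [2, 4, 4, 4, 5, 5, 7, 9]
z = sample_z_scores(sample)
print(z[-1])  # z-score of the largest value, 9
```

For the population formula, `statistics.pstdev` and a known $\mu$ would take the place of `stdev` and the sample mean.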

Empirical rule (68-95-99.7 rule) for a normal distribution:
- Within $\pm 1\sigma$ of the mean: $P(|Z|\le 1) \approx 0.68$
- Within $\pm 2\sigma$ of the mean: $P(|Z|\le 2) \approx 0.95$
- Within $\pm 3\sigma$ of the mean: $P(|Z|\le 3) \approx 0.997$
These approximations help estimate probabilities from mean and standard deviation without detailed tables for normally distributed data.
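The three figures can be checked numerically. The sketch below relies on the standard identity $P(|Z| \le k) = \operatorname{erf}(k/\sqrt{2})$ for a standard normal $Z$; the function name `prob_within` is chosen here for illustration.

```python
from math import erf, sqrt

def prob_within(k):
    """P(|Z| <= k) for a standard normal Z, via the error function."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```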

Be consistent with the percentile definition used in the course/book when solving problems.
Use z-scores to streamline probability questions instead of recalculating from scratch.
Read box-and-whisker plots quickly to assess spread (IQR), center (median), skewness (whisker length), and potential outliers.
Report IQR along with quartiles to concisely convey data spread.