Descriptive Statistics
Measures of Location
Descriptive statistics summarize the data before inferences are made about the population.
A measure of location describes the center or middle of the sample data.
Arithmetic Mean
Definition: The sum of all observations divided by the number of observations.
Formula: x̄ = (x₁ + x₂ + … + xₙ) / n, where x₁, x₂, …, xₙ are the sample values and n is the total number of observations.
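A minimal Python sketch of the formula, added for illustration. The function name arithmetic_mean is an assumption of this sketch; the sample values are the lamb weight gain data used later in these notes.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


# Lamb weight gain data from these notes (n = 6); the sum is 56.
lamb_gain = [1, 2, 10, 11, 13, 19]
print(arithmetic_mean(lamb_gain))  # 56 / 6, roughly 9.33
```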
Sample Median
Definition: The middle value that splits the data in half.
Order the data from smallest to largest, then split it into two halves:
If n is odd, take the middle point.
If n is even, take the average of the two middle points.
Notation: Q2
Example: Given ordered data: 3, 5, 7, 8, 8, 9, 10, 12, 35.
Since n = 9 is odd, the sample median Q2 is the fifth point in the ordered list, which is 8.
Calculation of Median
For lamb weight gain (n = 6, even), find Q2:
Ordered data: 1, 2, 10, 11, 13, 19.
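Q2 calculation: Q2 = (10 + 11) / 2 = 10.5, the average of the two middle points (the third and fourth ordered values).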
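The odd/even rule for the median can be sketched in Python as follows. The function name sample_median is an assumption of this sketch; both data sets come from the examples in these notes.

```python
def sample_median(values):
    """Return the sample median using the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle point
        return ordered[mid]
    # n even: the average of the two middle points
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([3, 5, 7, 8, 8, 9, 10, 12, 35]))  # 8
print(sample_median([1, 2, 10, 11, 13, 19]))          # 10.5
```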
Data Distributions
Data distributions can be symmetric or asymmetric.
Symmetric: Left half mirrors the right half.
Asymmetric (skewed): Left and right halves do not mirror each other.
Mean vs. Median in Distributions:
Symmetric: Mean ≈ Median
Positively skewed: Mean > Median
Negatively skewed: Mean < Median
Median is more robust to extreme values, but mean is easier to compute.
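As an illustrative check of the robustness claim, the sketch below uses Python's standard statistics module on the nine-point example from these notes, whose largest value (35) lies far from the rest. Dropping that value is done here only to show the effect and is not a recommendation to discard data.

```python
import statistics

# Ordered data from the median example; 35 is far from the other values.
data = [3, 5, 7, 8, 8, 9, 10, 12, 35]
mean_all = statistics.mean(data)      # 97 / 9, roughly 10.78
median_all = statistics.median(data)  # 8

# Same data without the extreme value.
trimmed = data[:-1]
mean_trimmed = statistics.mean(trimmed)      # 62 / 8 = 7.75
median_trimmed = statistics.median(trimmed)  # (8 + 8) / 2 = 8.0

# The mean exceeds the median, as expected for positively skewed data.
print(mean_all > median_all)  # True
# The extreme value shifts the mean by about 3 units but leaves the median at 8.
print(mean_all - mean_trimmed, median_all - median_trimmed)
```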
Mode
Definition: Most frequently occurring value in a dataset.
Can be unimodal (1 mode), bimodal (2 modes), trimodal (3 modes), etc.
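A short sketch of finding the mode(s), using statistics.multimode from the Python standard library (Python 3.8 or later). The helper name describe_modality and the second data set are hypothetical and chosen only for illustration.

```python
from statistics import multimode


def describe_modality(values):
    """Return the list of modes and a label for how many there are."""
    if not values:
        raise ValueError("the mode of an empty sample is undefined")
    # multimode returns every value tied for the highest frequency.
    # Note: if all values are distinct, every value is returned.
    modes = multimode(values)
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), "multimodal")


print(describe_modality([3, 5, 7, 8, 8, 9, 10, 12, 35]))  # ([8], 'unimodal')
print(describe_modality([1, 1, 2, 3, 3]))                 # ([1, 3], 'bimodal')
```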
Percentiles
Definition: The p-th percentile (Vp) is the value such that p% of the data points are less than or equal to Vp.
Median as a Percentile: Median (Q2) is the 50th percentile.
Commonly used percentiles: Quartiles (Q1, Q2, Q3), Quintiles, Deciles.
Quartiles
Definition: Split the data into four equal parts.
Q1: 25th percentile (lower quartile).
Q2: 50th percentile (median).
Q3: 75th percentile (upper quartile).
Calculating Quartiles
Q1 calculation:
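If n/4 is not an integer, let k be the largest integer less than n/4 and take the (k + 1)-th point in the ordered sample (counting from the smallest).
If n/4 is an integer, take the average of the (n/4)-th and (n/4 + 1)-th points in the ordered sample.
Q2 follows the same rule with n/2 in place of n/4.
Q3 follows the same rule with 3n/4 in place of n/4.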
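The quartile rule above can be sketched in Python as follows, with ranks counted from the smallest ordered value. The function name quartile is an assumption of this sketch. Statistical libraries often use different interpolation rules by default, so their results may not match this rule exactly.

```python
def quartile(values, q):
    """Return the q-th quartile (q = 1, 2 or 3) using the rank rule in these notes."""
    if q not in (1, 2, 3):
        raise ValueError("q must be 1, 2 or 3")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty sample are undefined")
    # Split q*n/4 into its integer part k and remainder r. Integer
    # arithmetic avoids floating-point issues when testing for an integer.
    k, r = divmod(q * n, 4)
    if r != 0:
        # q*n/4 is not an integer: take the (k + 1)-th ordered point,
        # which sits at index k in a 0-based list.
        return ordered[k]
    # q*n/4 is an integer: average the k-th and (k + 1)-th ordered points.
    return (ordered[k - 1] + ordered[k]) / 2


data = [3, 5, 7, 8, 8, 9, 10, 12, 35]  # n = 9
print([quartile(data, q) for q in (1, 2, 3)])  # [7, 8, 10]

lamb_gain = [1, 2, 10, 11, 13, 19]  # n = 6
print([quartile(lamb_gain, q) for q in (1, 2, 3)])  # [2, 10.5, 13]
```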
Inter-Quartile Range (IQR)
Definition: IQR = Q3 - Q1
Minimum (y(1)): Smallest value in dataset.
Maximum (y(n)): Largest value in dataset.
Five Number Summary
Comprises: {y(1), Q1, Q2, Q3, y(n)}
Useful for quick data overview.
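A compact sketch of the five-number summary and the IQR, assuming the quartiles are computed with the rank rule given in these notes (other quartile conventions would give slightly different values). The function name five_number_summary is hypothetical.

```python
def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) for a sample."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the summary of an empty sample is undefined")

    def q(j):
        # Rank rule from these notes applied to j*n/4, for j = 1, 2, 3.
        k, r = divmod(j * n, 4)
        return ordered[k] if r else (ordered[k - 1] + ordered[k]) / 2

    return (ordered[0], q(1), q(2), q(3), ordered[-1])


summary = five_number_summary([3, 5, 7, 8, 8, 9, 10, 12, 35])
print(summary)  # (3, 7, 8, 10, 35)

iqr = summary[3] - summary[1]  # Q3 - Q1
print(iqr)  # 3
```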
Boxplot
Graphical representation of the five-number summary showing the IQR and quartiles: Q1, Q2, Q3.
Outliers
Definition: Points significantly different from other points in the dataset.
Can result from errors or unique circumstances.
Fence calculations:
Lower fence: Q1 - 1.5 × IQR (using the conventional multiplier of 1.5)
Upper fence: Q3 + 1.5 × IQR
Outliers are data points outside these fences.
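A sketch of fence-based outlier detection. It assumes the conventional multiplier of 1.5 × IQR for the fences, which is exposed as a parameter so another value can be used. The function names are hypothetical; Q1 = 7 and Q3 = 10 are the quartiles of the nine-point example under the rank rule in these notes.

```python
def fences(q1, q3, multiplier=1.5):
    """Return (lower fence, upper fence); multiplier=1.5 is an assumed convention."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def find_outliers(values, q1, q3, multiplier=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = fences(q1, q3, multiplier)
    return [v for v in values if v < lower or v > upper]


data = [3, 5, 7, 8, 8, 9, 10, 12, 35]
lower, upper = fences(7, 10)        # IQR = 10 - 7 = 3
print(lower, upper)                 # 2.5 14.5
print(find_outliers(data, 7, 10))   # [35]
```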