Descriptive Statistics

Measures of Location
  • Helps in summarizing the data before making inferences about the population.

  • Defines the center or middle of the sample data.

Arithmetic Mean
  • Definition: The sum of all observations divided by the number of observations.

  • Formula: x̄ = (x₁ + x₂ + … + xₙ) / n, where x₁, x₂, …, xₙ are the sample values and n is the total number of observations.
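
A minimal Python sketch of the formula above. The function name `arithmetic_mean` and the sample values are illustrative assumptions, not part of these notes.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


# Hypothetical sample values chosen only for illustration.
sample = [2, 4, 6, 8]
print(arithmetic_mean(sample))  # (2 + 4 + 6 + 8) / 4 = 5.0
```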

Sample Median
  • Definition: The middle value that splits the data in half.

  • Procedure: order the data from smallest to largest, then:

    • If n is odd, take the middle point.

    • If n is even, take the average of the two middle points.

  • Notation: Q2

  • Example: Given ordered data: 3, 5, 7, 8, 8, 9, 10, 12, 35 (n = 9):

    • Since n is odd, the sample median Q2 is the fifth point in the ordered data, which is 8.

Calculation of Median
  • For lamb weight gain (n = 6, even), find Q2:

    • Ordered data: 1, 2, 10, 11, 13, 19.

    • Q2 calculation:
      Q2 = \frac{10 + 11}{2} = 10.5 \text{ lb}
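
The odd and even cases can be sketched in Python as follows, using the two data sets from these notes. The function name `sample_median` is an assumption made for illustration.

```python
def sample_median(values):
    """Return the sample median (Q2) using the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle point of the ordered data.
        return ordered[mid]
    # n even: the average of the two middle points.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([3, 5, 7, 8, 8, 9, 10, 12, 35]))  # 8
print(sample_median([1, 2, 10, 11, 13, 19]))          # 10.5
```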

Data Distributions
  • Data distributions can be symmetric or asymmetric.

    • Symmetric: Left half mirrors the right half.

    • Asymmetric (skewed): Left and right halves do not mirror each other.

  • Mean vs. Median in Distributions:

    • Symmetric: Mean ≈ Median

    • Positively skewed: Mean > Median

    • Negatively skewed: Mean < Median

    • The median is more robust to extreme values; the mean is easier to compute but is pulled toward them.
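
To illustrate the robustness point, the sketch below compares the mean and median of the nine-point example from these notes with and without its extreme value. Replacing 35 with 13 is a hypothetical change made only for this comparison. A mean above the median is consistent with positive skew but does not prove it on its own.

```python
from statistics import mean, median

# Nine-point example from the notes; 35 is far from the other values.
with_extreme = [3, 5, 7, 8, 8, 9, 10, 12, 35]
# Hypothetical variant: the extreme value replaced by 13.
without_extreme = [3, 5, 7, 8, 8, 9, 10, 12, 13]

mean_with, median_with = mean(with_extreme), median(with_extreme)
mean_without, median_without = mean(without_extreme), median(without_extreme)

print(round(mean_with, 2), median_with)        # 10.78 8
print(round(mean_without, 2), median_without)  # 8.33 8
```

The median is 8 in both cases, while the mean moves by more than two units.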

Mode
  • Definition: Most frequently occurring value in a dataset.

  • Can be unimodal (1 mode), bimodal (2 modes), trimodal (3 modes), etc.
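
A short sketch of finding modes with Python's standard library (`statistics.multimode`, available from Python 3.8). The helper names `modes` and `modality` and the second data set are assumptions for illustration. Note that when every value occurs exactly once, `multimode` reports every value as a mode.

```python
from statistics import multimode


def modes(values):
    """Return all most-frequent values, sorted in ascending order."""
    return sorted(multimode(values))


def modality(values):
    """Label the sample by its number of modes."""
    count = len(modes(values))
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return names.get(count, f"{count} modes")


print(modes([3, 5, 7, 8, 8, 9, 10, 12, 35]))  # [8]
print(modality([1, 1, 2, 3, 3]))              # bimodal
```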

Percentiles
  • Definition: The p-th percentile (Vp) is the value such that p% of the data points are less than or equal to Vp.

  • Median as a Percentile: Median (Q2) is the 50th percentile.

  • Commonly used percentiles: Quartiles (Q1, Q2, Q3), Quintiles, Deciles.

Quartiles
  • Definition: Split the data into four equal parts.

    • Q1: 25th percentile (lower quartile).

    • Q2: 50th percentile (median).

    • Q3: 75th percentile (upper quartile).

Calculating Quartiles
  • Q1 calculation:

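    • If n/4 is not an integer, let k be the largest integer less than n/4 and take the (k + 1)-th point in the ordered data (counting from the smallest).

    • If n/4 is an integer, take the average of the (n/4)-th and (n/4 + 1)-th points in the ordered data.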

  • Q2 follows similar rules for n/2.

  • Q3 follows similar rules for 3n/4.
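
The rule above generalizes to any percentile by using np/100 in place of n/4. The sketch below is one way to write it, assuming p is a whole number strictly between 0 and 100; the function name `percentile_by_rule` is hypothetical. Statistical software often uses other interpolation conventions, so library results may differ slightly from this rule.

```python
def percentile_by_rule(values, p):
    """Return the p-th percentile using the ordered-position rule."""
    if not isinstance(p, int) or not 0 < p < 100:
        raise ValueError("p must be an integer strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentiles of an empty sample are undefined")
    # Integer arithmetic avoids floating-point trouble when testing
    # whether n*p/100 is a whole number.
    k, remainder = divmod(n * p, 100)
    if remainder != 0:
        # Not an integer: take the (k + 1)-th ordered point (index k).
        return ordered[k]
    # Integer: average the k-th and (k + 1)-th ordered points.
    return (ordered[k - 1] + ordered[k]) / 2


data = [3, 5, 7, 8, 8, 9, 10, 12, 35]  # n = 9
print([percentile_by_rule(data, p) for p in (25, 50, 75)])  # [7, 8, 10]

lambs = [1, 2, 10, 11, 13, 19]  # n = 6
print([percentile_by_rule(lambs, p) for p in (25, 50, 75)])  # [2, 10.5, 13]
```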

Inter-Quartile Range (IQR)
  • Definition: IQR = Q3 - Q1

  • Minimum (y(1)): Smallest value in dataset.

  • Maximum (y(n)): Largest value in dataset.

Five Number Summary
  • Comprises: {y(1), Q1, Q2, Q3, y(n)}

  • Useful for quick data overview.
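
A minimal sketch that assembles the five-number summary and the IQR for the nine-point example from these notes. The helper `_quartile` restates the quartile rule from these notes so that the block is self-contained; both function names are assumptions.

```python
def _quartile(ordered, q):
    """Return quartile q (1, 2 or 3) of already-sorted data."""
    # Position q*n/4, tested for wholeness with integer arithmetic.
    k, remainder = divmod(len(ordered) * q, 4)
    if remainder != 0:
        return ordered[k]  # (k + 1)-th ordered point
    return (ordered[k - 1] + ordered[k]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("an empty sample has no summary")
    return (
        ordered[0],
        _quartile(ordered, 1),
        _quartile(ordered, 2),
        _quartile(ordered, 3),
        ordered[-1],
    )


y_min, q1, q2, q3, y_max = five_number_summary([3, 5, 7, 8, 8, 9, 10, 12, 35])
print((y_min, q1, q2, q3, y_max))  # (3, 7, 8, 10, 35)
print(q3 - q1)                     # IQR = 3
```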

Boxplot
  • Graphical representation of the five-number summary showing the IQR and quartiles: Q1, Q2, Q3.

Outliers
  • Definition: Points significantly different from other points in the dataset.

  • Can result from errors or unique circumstances.

  • Fence calculations:

    • Lower fence: Q1 - 1.5 \times IQR

    • Upper fence: Q3 + 1.5 \times IQR

  • Outliers are data points outside these fences.
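
The fence rule (lower fence Q1 - 1.5 × IQR, upper fence Q3 + 1.5 × IQR) can be sketched as below. The function names are hypothetical. The quartiles are passed in rather than computed; the values Q1 = 7 and Q3 = 10 are what the quartile rule in these notes gives for the nine-point example.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the two quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the points lying strictly outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


data = [3, 5, 7, 8, 8, 9, 10, 12, 35]
print(fences(7, 10))               # (2.5, 14.5)
print(find_outliers(data, 7, 10))  # [35]
```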