Class 3

Class Overview

  • Course: Analysis of univariate data: measures of location of a data set

  • Institution: Department of Statistics, UC3M

  • Focus: Understanding data distributions through centrality measures and other descriptive statistics.

Chapter 3: Analysis of Univariate Data

  • Key Topics:

    • Centrality measures: mean, median, mode

    • Other distribution points: minimum, maximum, quartiles, and percentiles

  • Recommended Reading: Explore how Wikipedia defines percentiles for a comprehensive understanding.

Measures of Location

  • Standard Measures:

    • Mode: The most frequent value in a dataset.

    • Median: The middle value once the data are sorted in ascending order.

    • Mean: The average calculated by summing the values and dividing by the number of values.

  • Other Points of Interest:

    • Minimum: The lowest value in the dataset.

    • Maximum: The highest value in the dataset.

    • Quartiles: Values that divide the ordered data into four equal parts; the points of interest are Q1 (first quartile) and Q3 (third quartile), with the median in between as the second quartile.

    • Percentiles: Values that divide the ordered data into 100 equal parts.

Measures of Centrality

Mode

  • Definition: The mode represents the most frequent observation in the dataset.

  • Example Table - Political Parties (absolute frequency, with relative frequency in parentheses):

    • PSOE: 20 (0.30)

    • PP: 18 (0.27)

    • UP: 11 (0.17)

    • VOX: 9 (0.14)

    • Cs: 3 (0.05)

    • Más Madrid: 3 (0.05)

    • Other: 2 (0.03)

    • Total: 66 (1.00)

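  • Notes: In the table, the mode is PSOE, the category with the highest frequency (20). The mode can be computed for qualitative data, where the median and mean are not defined. A dataset may be unimodal, bimodal, or multimodal.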
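The table above can be checked with a few lines of Python. This is a minimal sketch, not part of the original notes: the variable names are illustrative, and the counts are copied from the table. It recomputes the total, the relative frequencies, and the mode, returning a list so that bimodal data would be handled too.

```python
# Absolute frequencies copied from the political-parties table.
party_counts = {
    "PSOE": 20,
    "PP": 18,
    "UP": 11,
    "VOX": 9,
    "Cs": 3,
    "Más Madrid": 3,
    "Other": 2,
}

total = sum(party_counts.values())

# Relative frequency = absolute frequency / total number of observations.
relative = {party: count / total for party, count in party_counts.items()}

# The mode is every category that reaches the highest frequency.
# Collecting them in a list also covers bimodal or multimodal data.
highest = max(party_counts.values())
modes = [party for party, count in party_counts.items() if count == highest]

print(total)                        # 66
print(modes)                        # ['PSOE']
print(round(relative["PSOE"], 2))   # 0.3
```

No ordering or arithmetic on the categories themselves is needed, which is why the mode works for qualitative variables such as party names.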

Median

  • Definition: The median is the value that separates the higher half from the lower half of the data set.

  • Sample Calculations: With the data ordered, if n is odd the median is the value at position (n + 1)/2; if n is even it is the average of the values at positions n/2 and n/2 + 1.
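The odd and even cases can be sketched as a small function. This is an illustration under the usual convention that, for even n, the median is the average of the two middle ordered values. The odd-sized sample reuses the nine values from the Mean section; the even-sized sample is hypothetical.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single value at position (n + 1) / 2 of the ordered data.
        return ordered[middle]
    # Even n: average the values at positions n / 2 and n / 2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_sample = [5, 3, 11, 21, 7, 5, 2, 1, 3]   # n = 9, values from the Mean section
even_sample = [4, 1, 7, 2]                   # n = 4, hypothetical values

print(median(odd_sample))    # 5
print(median(even_sample))   # 3.0
```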

Mean

  • Calculation of Mean:

    • Example with values: 5, 3, 11, 21, 7, 5, 2, 1, 3

    • Formula: \( \bar{x} = \frac{5 + 3 + 11 + 21 + 7 + 5 + 2 + 1 + 3}{9} = \frac{58}{9} \approx 6.44 \)
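The same arithmetic in Python, as a quick check of the worked example. The variable names are illustrative; `statistics.mean` from the standard library is used only as a cross-check.

```python
import statistics

values = [5, 3, 11, 21, 7, 5, 2, 1, 3]

total = sum(values)   # 58
n = len(values)       # 9
mean = total / n      # sum of the values divided by the number of values

print(round(mean, 2))                      # 6.44
print(round(statistics.mean(values), 2))   # 6.44
```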

Sensitivity of Measures

  • Example Comparison:

  • Sample 1: All values clustered around 3.

    • Mode = 3, Median = 3, Mean = 3

  • Sample 2: Contains an extreme value (500).

    • Mode = 3, Median = 3, Mean = 52.5

  • Conclusion: The mean is highly sensitive to outliers: a single extreme value moves it from 3 to 52.5, while the mode and median stay at 3.
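The notes do not list the raw samples, so the sketch below uses two hypothetical ten-value samples chosen only so that they reproduce the summary values quoted above. The second sample is the first with its largest value replaced by 500.

```python
import statistics

# Hypothetical data, NOT from the original notes: picked to match the
# quoted results (mode 3, median 3, mean 3 and mode 3, median 3, mean 52.5).
sample_1 = [1, 2, 3, 3, 3, 3, 3, 3, 4, 5]
sample_2 = [1, 2, 3, 3, 3, 3, 3, 3, 4, 500]   # largest value replaced by an outlier

for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    print(
        name,
        "mode =", statistics.mode(sample),
        "median =", statistics.median(sample),
        "mean =", statistics.mean(sample),
    )
```

Only one observation differs between the two samples, yet the mean changes by a factor of more than 17, while the mode and median do not move.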

Measures of Distribution Extremes

  • Minimum and Maximum:

    • The smallest and largest observations in the dataset; their difference gives the dataset's range.

Percentiles

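  • Definition: The p × 100% percentile is the value below which approximately p × 100% of the ordered observations fall.

  • Calculation: To find the p × 100% percentile of n ordered observations, compute the rank \( r = p \times (n + 1) \) and take the value at position r. If r is not a whole number, the percentile lies between the two neighbouring ordered values and is obtained by interpolating between them.

    • Example for the 80% percentile: With p = 0.8, compute \( r = 0.8 \times (n + 1) \) and read off the value at that position in the ordered dataset.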
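A minimal sketch of the rank rule \( r = p \times (n + 1) \). Two points are assumptions rather than something stated in the notes: a fractional rank is resolved by linear interpolation between the neighbouring ordered values, and ranks that fall outside 1..n are clamped to the minimum or maximum. The function name `percentile` is illustrative, and the data are the nine values from the Mean section, used here only as an example.

```python
def percentile(values, p):
    """p*100% percentile via the rank r = p * (n + 1), with linear interpolation."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty dataset is undefined")
    r = p * (n + 1)            # 1-based rank in the ordered data
    # Assumption: ranks outside 1..n are clamped to the extreme values.
    if r <= 1:
        return ordered[0]
    if r >= n:
        return ordered[-1]
    lower = int(r)             # whole part of the rank
    fraction = r - lower       # fractional part of the rank
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)


data = [5, 3, 11, 21, 7, 5, 2, 1, 3]   # ordered: 1, 2, 3, 3, 5, 5, 7, 11, 21

# 80% percentile: r = 0.8 * (9 + 1) = 8, so the 8th ordered value.
print(percentile(data, 0.8))   # 11.0
```

Other software may use a different rank formula, so results for small samples can differ from this rule.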

Quartiles

  • Q1 and Q3: The 25th and 75th percentiles, respectively; together they bound the middle 50% of the ordered data.

  • Example Calculation: For dataset 5, 3, 11, 21, 7, 5...

  • Result: Q1 = 2.5, Q3 = 9
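The quoted result can be reproduced with Python's standard library. The dataset in the example is truncated, so this sketch assumes it is the same nine values used in the Mean section; that assumption is consistent with Q1 = 2.5 and Q3 = 9 but is not confirmed by the notes. `statistics.quantiles` with `method="exclusive"` places the quartiles at ranks 0.25 × (n + 1), 0.5 × (n + 1) and 0.75 × (n + 1) and interpolates linearly between neighbouring ordered values.

```python
import statistics

# Assumed to be the full dataset behind the truncated example.
data = [5, 3, 11, 21, 7, 5, 2, 1, 3]   # ordered: 1, 2, 3, 3, 5, 5, 7, 11, 21

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(q1)   # 2.5  (rank 2.5: midway between the 2nd and 3rd ordered values)
print(q2)   # 5.0  (rank 5: the median)
print(q3)   # 9.0  (rank 7.5: midway between the 7th and 8th ordered values)
```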

Exercises and Applications

  • Exercises involve calculating measures of central tendency and percentiles from given tables.

  • Example scenarios include mayoral ages and student commuting times, requiring practical analysis of discrete data.