Class 3
Class Overview
Course: Analysis of univariate data: measures of location of a data set
Institution: Department of Statistics, UC3M
Focus: Understanding data distributions through centrality measures and other descriptive statistics.
Chapter 3: Analysis of Univariate Data
Key Topics:
Centrality measures: mean, median, mode
Other distribution points: minimum, maximum, quartiles, and percentiles
Recommended Reading: The Wikipedia entry on percentiles, for its definition and calculation methods.
Measures of Location
Standard Measures:
Mode: The most frequent value in a dataset.
Median: The middle value once the data are sorted in ascending order.
Mean: The average calculated by summing the values and dividing by the number of values.
Other Points of Interest:
Minimum: The lowest value in the dataset.
Maximum: The highest value in the dataset.
Quartiles: The three values (Q1, Q2, Q3) that divide the ordered data into four equal parts; Q1 is the first quartile, Q2 is the median, and Q3 is the third quartile.
Percentiles: Values that divide the ordered data into 100 equal parts.
Measures of Centrality
Mode
Definition: The mode represents the most frequent observation in the dataset.
Example Table - Political Parties (absolute frequency, with relative frequency in parentheses):
PSOE: 20 (0.30)
PP: 18 (0.27)
UP: 11 (0.17)
VOX: 9 (0.14)
Cs: 3 (0.05)
Más Madrid: 3 (0.05)
Other: 2 (0.03)
Total: 66 (1.00)
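Notes: In this table the mode is PSOE, the category with the highest frequency (20). The mode can also be computed for qualitative data, and a dataset may have one mode (unimodal), two (bimodal), or more.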
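The table above can be turned into a short sketch that recovers the relative frequencies and the mode. The counts are copied from the table; the variable names (counts, relative, modes) are illustrative and do not come from the course material.

```python
# Absolute frequencies copied from the political-parties table above.
counts = {
    "PSOE": 20,
    "PP": 18,
    "UP": 11,
    "VOX": 9,
    "Cs": 3,
    "Más Madrid": 3,
    "Other": 2,
}

total = sum(counts.values())  # 66 observations in total

# Relative frequency of each category, rounded to two decimals as in the table.
relative = {party: round(n / total, 2) for party, n in counts.items()}

# The mode is every category that reaches the highest count;
# a list is used so that bimodal or multimodal data is handled as well.
highest = max(counts.values())
modes = [party for party, n in counts.items() if n == highest]

print(modes)  # ['PSOE']
```

Because the mode only needs counts, the same sketch works for any qualitative variable.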
Median
Definition: The median is the value that separates the higher half from the lower half of the data set.
Calculation: With the data sorted, if n is odd the median is the value at position (n + 1)/2; if n is even it is the average of the values at positions n/2 and n/2 + 1.
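A minimal sketch of the odd and even cases follows. The function name and both sample lists are hypothetical and chosen only for illustration; they are not the samples used in class.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value (1-based position (n + 1) / 2).
        return ordered[middle]
    # Even n: average of the two central values
    # (1-based positions n / 2 and n / 2 + 1).
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_sample = [4, 1, 7, 2, 9]   # hypothetical, n = 5, sorted: 1 2 4 7 9
even_sample = [4, 1, 7, 2]     # hypothetical, n = 4, sorted: 1 2 4 7

print(median(odd_sample))   # 4
print(median(even_sample))  # 3.0
```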
Mean
Calculation of Mean:
Example with values: 5, 3, 11, 21, 7, 5, 2, 1, 3
Formula: \( \bar{x} = \frac{5 + 3 + 11 + 21 + 7 + 5 + 2 + 1 + 3}{9} = \frac{58}{9} \approx 6.44 \)
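The same arithmetic can be checked in a few lines. The values are the nine from the example; the variable names are illustrative.

```python
values = [5, 3, 11, 21, 7, 5, 2, 1, 3]

total = sum(values)   # sum of the observations: 58
n = len(values)       # number of observations: 9
mean = total / n

print(round(mean, 2))  # 6.44
```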
Sensitivity of Measures
Example Comparison:
Sample 1: All values clustered around 3.
Mode = 3, Median = 3, Mean = 3
Sample 2: Contains an extreme value (500).
Mode = 3, Median = 3, Mean = 52.5
Conclusion: The mean is highly sensitive to outliers, whereas the mode and median are unchanged by the extreme value in this example.
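The comparison can be reproduced with Python's standard statistics module. The lecture's actual samples are not reproduced in these notes, so the two lists below are hypothetical values chosen only so that they match the reported figures (mode 3, median 3, and means of 3 and 52.5).

```python
from statistics import mean, median, mode

# Hypothetical samples: the second is the first with its largest value
# replaced by the extreme value 500.
sample_1 = [1, 2, 3, 3, 3, 3, 3, 3, 4, 5]
sample_2 = [1, 2, 3, 3, 3, 3, 3, 3, 4, 500]

for name, data in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    print(f"{name}: mode={mode(data)}, median={median(data)}, mean={mean(data)}")
```

Only one observation changes between the two lists, yet the mean moves from 3 to 52.5 while the mode and median stay at 3.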
Measures of Distribution Extremes
Minimum and Maximum:
The smallest and largest observations; together they bound the dataset and determine its range (maximum minus minimum).
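As a quick illustration, the extremes of the nine values from the mean example can be read off directly. Reusing that dataset here is a choice made for this sketch, and the variable names are illustrative.

```python
values = [5, 3, 11, 21, 7, 5, 2, 1, 3]

lowest = min(values)            # smallest observation: 1
highest = max(values)           # largest observation: 21
data_range = highest - lowest   # spread between the extremes: 20

print(lowest, highest, data_range)  # 1 21 20
```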
Percentiles
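Definition: The p x 100% percentile is the value below which roughly p x 100% of the ordered observations fall.
Calculation: To find the p x 100% percentile, sort the data and compute the rank \( r = p \times (n + 1) \), where n is the number of observations; the percentile is the value at position r in the ordered data. If r is not an integer, the percentile lies between the two neighboring ordered values.
Example for 80% Percentile: With p = 0.8, compute r = 0.8 x (n + 1) and read off the value at that rank in the ordered dataset.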
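The rank rule r = p x (n + 1) can be sketched as below. Several points are assumptions rather than course material: the function name percentile, the use of linear interpolation when r is not an integer, and the clamping of ranks that fall outside 1..n. The data for the lecture's 80% example is not shown in these notes, so the nine values from the mean example are reused purely for illustration.

```python
import math


def percentile(values, p):
    """Value at rank r = p * (n + 1) in the ordered data, for 0 < p < 1."""
    ordered = sorted(values)
    n = len(ordered)
    # Rounding guards against floating-point noise in the product.
    r = round(p * (n + 1), 9)
    # Assumption: ranks falling outside 1..n are clamped to the extremes.
    r = min(max(r, 1), n)
    lower = math.floor(r)      # 1-based rank at or just below r
    fraction = r - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: linear interpolation between the two neighboring values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


values = [5, 3, 11, 21, 7, 5, 2, 1, 3]   # sorted: 1 2 3 3 5 5 7 11 21

# p = 0.8 gives r = 0.8 * 10 = 8, i.e. the 8th ordered value.
print(percentile(values, 0.8))  # 11
```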
Quartiles
Q1 and Q3: Represent the 25th and 75th percentiles respectively, providing insight into data distribution.
Example Calculation: For dataset 5, 3, 11, 21, 7, 5...
Result: Q1 = 2.5, Q3 = 9
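These results can be checked with statistics.quantiles from the Python standard library, whose "exclusive" method places the cut points at rank p x (n + 1) and interpolates linearly. The dataset in the example above is truncated, so the sketch assumes it is the nine-value dataset from the mean example; that assumption is consistent with the stated results but is not confirmed by the notes.

```python
from statistics import quantiles

# Assumption: the truncated dataset is the one from the mean example.
values = [5, 3, 11, 21, 7, 5, 2, 1, 3]   # sorted: 1 2 3 3 5 5 7 11 21

# n=4 asks for the three quartile cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = quantiles(values, n=4, method="exclusive")

print(q1, q2, q3)  # 2.5 5.0 9.0
```

Q1 falls at rank 2.5, halfway between the 2nd and 3rd ordered values (2 and 3), and Q3 at rank 7.5, halfway between the 7th and 8th (7 and 11).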
Exercises and Applications
Exercises involve calculating measures of central tendency and percentiles from given tables.
Example scenarios include mayoral ages and student commuting times, requiring practical analysis of discrete data.