Knowledge Clip 3: Descriptive Statistics: Numerical Measures
Introduction
Prof. Dr. Tammo H.A. Bijmolt
Date: 9/7/2022
Literature for this topic: Chapter 3
Statistical Methods
Descriptive Statistics
Overview of data analysis
Importance: Start the data analysis of each project with descriptive statistics to understand the characteristics of the dataset.
Types of Statistics:
Descriptive Statistics: Summarizes and describes characteristics of data.
Inferential Statistics: Makes inferences and predictions about a population based on a sample.
Properties of Distributions
Key Characteristics:
Measures of Central Tendency (Location):
Mode: The most frequently occurring value in the dataset. It is the only measure of central tendency applicable to nominal variables.
Median: The middle value when data is ordered from lowest to highest. In the case of an even number of observations, it is the average of the two middle values. It is usable for ordinal and higher measurement levels.
Mean: The average value calculated as the sum of all values divided by the number of cases. Applicable only for interval and ratio variables.
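The three measures above can be sketched with Python's standard `statistics` module. The `ages` and `provinces` lists are hypothetical values invented for illustration; they are not the dataset used in this clip.

```python
import statistics

# Hypothetical data, invented for illustration (not the clip's dataset).
ages = [18, 25, 43, 43, 50, 62, 71, 89]
provinces = ["Groningen", "Drenthe", "Groningen", "Friesland"]

mode_age = statistics.mode(ages)      # most frequently occurring value
median_age = statistics.median(ages)  # even n: average of the two middle values
mean_age = statistics.mean(ages)      # sum of all values / number of cases

# The mode is the only one of the three that works for a nominal variable.
mode_province = statistics.mode(provinces)

print(mode_age, median_age, mean_age, mode_province)
```

With eight observations the median is the average of the fourth and fifth ordered values (43 and 50), which illustrates the even-count rule described above.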
Measures of Variability
Key Concepts:
Minimum & Maximum: The lowest and highest values respectively, applicable to ordinal and higher measurement levels.
Range: The difference between the maximum and minimum values observed in the data. It is only valid for interval or ratio measurement levels.
Interquartile Range (IQR): The difference between the third quartile (75th percentile) and the first quartile (25th percentile), representing the middle 50% of the data.
Example Data Illustration
Data Calculation Examples:
For Age: IQR = 71 - 43 = 28
For Expenditure: IQR = 1215 - 409.50 = 805.50
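As a minimal sketch, the two IQR calculations above can be reproduced from the reported quartiles, and the same measures can be computed from raw data. The helper `interquartile_range` and the `ages` list are assumptions made for illustration. Statistical packages use slightly different quartile conventions, so the quartiles of a small sample may differ between tools; the `"inclusive"` method below is just one of them.

```python
import statistics

def interquartile_range(q1, q3):
    """Return Q3 - Q1, the spread of the middle 50% of the data."""
    return q3 - q1

# Quartiles as reported in the examples above.
iqr_age = interquartile_range(43, 71)
iqr_expenditure = interquartile_range(409.50, 1215)

# Hypothetical raw sample, invented for illustration.
ages = [18, 25, 43, 50, 62, 71, 73, 89]
data_range = max(ages) - min(ages)  # range = maximum - minimum

# statistics.quantiles returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr_sample = interquartile_range(q1, q3)

print(iqr_age, iqr_expenditure, data_range, iqr_sample)
```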
Additional Measures of Variability
Standard Deviation: The square root of the variance. It measures how far observations typically lie from the mean, expressed in the same units as the variable. Applicable to interval or ratio variables only.
Variance: The square of the standard deviation, also only applicable for interval or ratio variables.
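The relation between the two measures can be checked with a short sketch. The `values` list is hypothetical. The clip does not state whether the sample (divide by n - 1) or population (divide by n) formula is meant, so both are shown.

```python
import math
import statistics

# Hypothetical data, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample versions divide the sum of squared deviations by n - 1.
sample_var = statistics.variance(values)
sample_sd = statistics.stdev(values)

# Population versions divide by n.
pop_var = statistics.pvariance(values)
pop_sd = statistics.pstdev(values)

# In both cases the variance is the square of the standard deviation.
print(math.isclose(sample_sd ** 2, sample_var))
print(math.isclose(pop_sd ** 2, pop_var))
```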
Coefficient of Variation (CV)
Definition: A relative measure of variability, calculated as the standard deviation divided by the mean. Because it is unit-free, it allows variability to be compared across variables or datasets that differ in scale.
CV Calculation Examples:
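Age: CV = 16.970 / 56.96 ≈ 0.30 (standard deviation divided by mean, using the summary statistics reported for Age)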
Expenditures:
Interpretation: The CV is higher for expenditures than for age, so expenditures vary more relative to their mean than age does.
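A minimal sketch of the CV calculation follows, using the mean and standard deviation that this clip reports for Age. The function name `coefficient_of_variation` is an assumption made for illustration. The corresponding figures for Expenditure are not given in these notes, so only Age is computed.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation relative to the mean (unit-free)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean

# Standard deviation and mean of Age as reported in this clip.
cv_age = coefficient_of_variation(16.970, 56.96)

print(round(cv_age, 3))
```

The result can be multiplied by 100 to express the CV as a percentage of the mean.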
Properties of Distributions II
Additional Characteristics:
Shape of the Distribution: Can be symmetric (skewness = 0), skewed to the left (skewness < 0), or skewed to the right (skewness > 0). Relevant for interval or ratio measurement levels.
Modal Classes:
Unimodal: One mode.
Bimodal: Two modes.
Multimodal: More than two modes; less common.
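The modal classes can be illustrated with a small sketch that counts the most frequent values. The helper `modal_class` is a hypothetical name. It classifies by exact ties in frequency, which is a simplification: in practice, modality is usually judged from the peaks of a histogram, and a dataset in which every value is unique would be labelled multimodal by this helper.

```python
import statistics

def modal_class(values):
    """Label a dataset by how many values share the highest frequency."""
    modes = statistics.multimode(values)  # all most-frequent values
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return label, modes

# Hypothetical data, invented for illustration.
print(modal_class([1, 2, 2, 3]))
print(modal_class([1, 1, 2, 3, 3]))
print(modal_class([1, 1, 2, 2, 3, 3]))
```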
Frequency Distribution Example
Data Sample: Valid N=833, Missing=0 for variables such as Age, Gender, Type of Household, Province, Education Level, Expenditure.
Summary Statistics:
Mean Age: 56.96
Median Age: 62.00
Mode Age: 73
Standard Deviation (Age): 16.970
Variance (Age): 287.985
Skewness (Age): -0.485 (indicates slight left skew)
Range (Age): 71 (age range from 18 to 89)
Shape of Distributions Visualized
Example of Positive/Right Skewness
Expenditures Analysis:
Skewness = 1.224, indicating a right skew (tail extends rightward).
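The sign of the skewness statistic can be illustrated with a minimal sketch based on the moment definition (third central moment divided by the second central moment raised to the power 1.5). This is an assumption about the formula: statistical packages often apply a small-sample correction, so their output may differ slightly from this version. The data below are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data, invented for illustration.
right_skewed = [1, 2, 2, 3, 10]       # long tail to the right
left_skewed = [-x for x in right_skewed]  # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)
print(skewness(left_skewed) < 0)
print(skewness(symmetric))
```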
Consider a variable X with the following characteristics:
Mean (A) = 1.40
Median (B) = 1.00
Mode (C) = 0.75
Which of the following statements is correct?
a) P(X > A) = 0.5
b) P(X > B) = 0.5
c) P(X > C) = 0.5
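Assuming X is continuous, the median is by definition the value that splits the distribution in half, so statement b) is the one that holds. With mode < median < mean, as in this right-skewed example, fewer than half of the values lie above the mean and more than half lie above the mode. The sketch below checks the first two claims on a hypothetical right-skewed sample (values evenly spaced on a log scale); the sample and the helper `share_above` are assumptions made for illustration.

```python
import math
import statistics

# Hypothetical right-skewed sample: 40 distinct values, evenly spaced in logs.
x = [math.exp(i / 10) for i in range(-20, 20)]

mean_x = statistics.mean(x)
median_x = statistics.median(x)

def share_above(values, threshold):
    """Fraction of observations strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)

share_above_median = share_above(x, median_x)
share_above_mean = share_above(x, mean_x)

print(mean_x > median_x)    # right skew pulls the mean above the median
print(share_above_median)   # half of the observations
print(share_above_mean)     # less than half of the observations
```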
Conclusion
Summary of Key Points
Understanding descriptive statistics is critical for effective data analysis.
Basic measures such as mean, median, mode, range, variance, and standard deviation are essential to summarize and describe data.
Properties of distributions, including skewness and modal classification, provide insights into the characteristics of the data.
Acknowledgements
Thanks from the University of Groningen, Faculty of Economics and Business.
Additional resources available through other knowledge clips.