Statistics clip 3 Descriptive Statistics: Numerical Measures

Knowledge Clip 3: Descriptive Statistics: Numerical Measures

Introduction

  • Prof. Dr. Tammo H.A. Bijmolt

  • Date: 9/7/2022

  • Literature for this topic: Chapter 3

Statistical Methods

  • Descriptive Statistics

    • Overview of data analysis

    • Importance: Start the data analysis of each project with descriptive statistics to understand the characteristics of the dataset.

  • Types of Statistics:

    • Descriptive Statistics: Summarizes and describes characteristics of data.

    • Inferential Statistics: Makes inferences and predictions about a population based on a sample.

Properties of Distributions

Key Characteristics:
  1. Measures of Central Tendency (Location):

    • Mode: The most frequently occurring value in the dataset. It is the only measure of central tendency applicable to nominal variables.

    • Median: The middle value when data is ordered from lowest to highest. In the case of an even number of observations, it is the average of the two middle values. It is usable for ordinal and higher measurement levels.

    • Mean: The average value calculated as the sum of all values divided by the number of cases. Applicable only for interval and ratio variables.
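The three measures above can be tried out with Python's standard `statistics` module. This is a minimal sketch: the `ages` and `provinces` lists are invented for illustration and are not the dataset used in the clip.

```python
import statistics

# Hypothetical data, invented for illustration (not the course dataset).
ages = [23, 35, 35, 41, 58, 62, 73, 73, 73, 80]
provinces = ["Groningen", "Drenthe", "Groningen", "Friesland"]

mode_age = statistics.mode(ages)            # most frequently occurring value
median_age = statistics.median(ages)        # even n: average of the two middle values
mean_age = statistics.mean(ages)            # sum of values / number of cases

# The mode is the only one of the three that works for a nominal variable.
mode_province = statistics.mode(provinces)

print(mode_age, median_age, mean_age, mode_province)
```

Taking a median or mean of `provinces` would be meaningless, which is why those measures require at least an ordinal or an interval level respectively.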

Measures of Variability
  • Key Concepts:

    • Minimum & Maximum: The lowest and highest values respectively, applicable to ordinal and higher measurement levels.

    • Range: The difference between the maximum and minimum values observed in the data. It is only valid for interval or ratio measurement levels.

    • Interquartile Range (IQR): The difference between the third quartile (75th percentile) and the first quartile (25th percentile), representing the middle 50% of the data.
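As a rough sketch of these definitions, the range and IQR can be computed as follows. The `values` list is hypothetical, and the choice of `method="inclusive"` is an assumption: statistical packages use slightly different quartile conventions, so results may not match other software exactly.

```python
import statistics

# Hypothetical observations, invented for illustration.
values = [18, 25, 31, 43, 50, 62, 66, 71, 80, 89]

minimum = min(values)
maximum = max(values)
value_range = maximum - minimum          # maximum minus minimum

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                            # spread of the middle 50% of the data

print(minimum, maximum, value_range, q1, q3, iqr)
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so a single extreme value does not affect it.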

Example Data Illustration

  • Data Calculation Examples:

    • For Age: IQR = 71 - 43 = 28

    • For Expenditure: IQR = 1215 - 409.50 = 805.50

Additional Measures of Variability

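  • Standard Deviation: The square root of the variance, expressed in the same units as the variable. It indicates how far observations typically lie from the mean, and is applicable for interval or ratio variables only.

  • Variance: The average squared deviation from the mean, i.e. the square of the standard deviation. Also applicable for interval or ratio variables only.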
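The relationship between the two measures can be checked numerically. The sketch below uses invented data and assumes the sample versions of both statistics (dividing by n - 1); the clip does not state which version it uses.

```python
import math
import statistics

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)   # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)       # square root of the sample variance

# The variance is the square of the standard deviation.
squares_match = math.isclose(sample_sd ** 2, sample_var)

print(sample_var, sample_sd, squares_match)
```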

Coefficient of Variation (CV)

  • Definition: A relative measure of variability: the standard deviation divided by the mean, usually reported as a percentage. Because it is unit-free, it allows comparison across variables that differ in scale.

  • CV Calculation Examples:

    • Age: $\frac{16.97}{56.96} = 0.298 = 29.8\%$

    • Expenditures: $\frac{744.65}{960.38} = 0.775 = 77.5\%$

  • Interpretation: Expenditures have the higher CV, so relative to their mean they vary more than age does.
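The CV figures above can be reproduced in a few lines. The function name `coefficient_of_variation` is a hypothetical helper chosen for this sketch; the inputs are the standard deviations and means quoted in the examples.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the standard deviation as a fraction of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean


cv_age = coefficient_of_variation(16.97, 56.96)
cv_expenditure = coefficient_of_variation(744.65, 960.38)

print(f"Age: {cv_age:.1%}")                   # Age: 29.8%
print(f"Expenditure: {cv_expenditure:.1%}")   # Expenditure: 77.5%
```

Dividing by the mean removes the unit (years versus currency), which is what makes the two numbers comparable.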

Properties of Distributions II

Additional Characteristics:
  • Shape of the Distribution: Can be symmetric (skewness = 0) or skewed (to the left or right), relevant for interval or ratio measurement levels.

  • Modal Classes:

    • Unimodal: One mode.

    • Bimodal: Two modes.

    • Multimodal: More than two modes; less common.
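A simplified way to label a dataset by its number of modes is sketched below. `classify_modality` is a hypothetical helper, and it counts only values tied for the highest frequency; in practice modality is usually judged from the peaks of a histogram, so treat this as an illustration rather than a general method.

```python
import statistics


def classify_modality(values):
    """Label data by how many values share the highest frequency."""
    if not values:
        raise ValueError("values must not be empty")
    modes = statistics.multimode(values)  # all values tied for most frequent
    if len(modes) == 1:
        return "unimodal"
    if len(modes) == 2:
        return "bimodal"
    return "multimodal"


print(classify_modality([1, 2, 2, 3]))
print(classify_modality([1, 1, 2, 3, 3]))
print(classify_modality([1, 1, 2, 2, 3, 3]))
```

Note that a list in which every value occurs once is reported as multimodal by this rule, which is one reason the approach suits discrete or grouped data better than raw continuous measurements.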

Frequency Distribution Example

  • Data Sample: Valid N=833, Missing=0 for variables such as Age, Gender, Type of Household, Province, Education Level, Expenditure.

  • Summary Statistics:

    • Mean Age: 56.96

    • Median Age: 62.00

    • Mode Age: 73

    • Standard Deviation (Age): 16.970

    • Variance (Age): 287.985

    • Skewness (Age): -0.485 (indicates slight left skew)

    • Range (Age): 71 (age range from 18 to 89)
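The sign of a skewness statistic can be illustrated with a short function. The clip does not say which formula produced its skewness value, so the bias-adjusted sample formula below is an assumption, and `sample_skewness` and the small datasets are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Bias-adjusted sample skewness (an assumed formula, see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)         # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


symmetric = sample_skewness([1, 2, 3, 4, 5])      # zero: symmetric data
left_skewed = sample_skewness([1, 8, 9, 9, 10])   # negative: tail to the left
right_skewed = sample_skewness([1, 1, 2, 2, 10])  # positive: tail to the right

print(symmetric, left_skewed, right_skewed)
```

A negative value, as reported for Age, goes together with a mean below the median, because the long left tail pulls the mean down.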

Shape of Distributions Visualized

Example of Positive/Right Skewness
  • Expenditures Analysis:

    • Skewness = 1.224, indicating a right skew (tail extends rightward).

    • Consider a variable X with the following characteristics:

    • Mean (A) = 1.40

    • Median (B) = 1.00

    • Mode (C) = 0.75

    • Which of the following statements is correct?

    • a) P(X > A) = 0.5

    • b) P(X > B) = 0.5

    • c) P(X > C) = 0.5
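One way to reason about these statements is to simulate a right-skewed variable and count how many observations fall above the mean and above the median. This is only a sketch: the lognormal distribution, the seed, and the sample size are arbitrary assumptions and do not reproduce the variable X above.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Simulated right-skewed data (lognormal chosen arbitrarily for illustration).
sample = [random.lognormvariate(0.0, 1.0) for _ in range(10_001)]

mean_x = statistics.fmean(sample)
median_x = statistics.median(sample)


def share_above(values, threshold):
    """Fraction of observations strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


share_above_mean = share_above(sample, mean_x)
share_above_median = share_above(sample, median_x)

print(f"mean={mean_x:.3f}, median={median_x:.3f}")
print(f"share above mean={share_above_mean:.3f}")
print(f"share above median={share_above_median:.3f}")
```

By definition the median splits the ordered data in half, so the share above it stays close to 0.5 whatever the shape of the distribution. In right-skewed data the mean is pulled toward the long tail, so fewer than half of the observations lie above it.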

Conclusion

Summary of Key Points
  • Understanding descriptive statistics is critical for effective data analysis.

  • Basic measures such as mean, median, mode, range, variance, and standard deviation are essential to summarize and describe data.

  • Properties of distributions, including skewness and modal classification, provide insights into the characteristics of the data.

Acknowledgements

  • Thanks from the University of Groningen, Faculty of Economics and Business.

  • Additional resources available through other knowledge clips.