Descriptive Statistics Notes

Descriptive Statistics Overview

  • Descriptive Statistics: The branch of statistics that summarizes and describes the main features of a sample, using measures such as central tendency and variation.

Key Terms to Understand

  • Central Tendency: Indicates the average or center of a data set.
  • Variation: Describes how much the data varies or spreads out.
  • Skewness: Measures asymmetry of the probability distribution of a real-valued random variable.
  • Kurtosis: Measures the tailedness of the probability distribution.

Learning Objectives

Know
  • Definitions of central tendency, variation, skewness, and kurtosis.
  • Three principal measures of central tendency (mean, median, mode) and their application.
  • Types of mean (arithmetic, weighted, geometric, harmonic).
Understand
  • Importance of calculating summary statistics.
  • Computation of the following measurements:
    • Arithmetic mean
    • Weighted mean
    • Geometric mean
    • Harmonic mean
    • Range
    • Mean deviation
    • Variance
    • Standard deviation
    • Coefficient of Variation
  • Impact of skewness on the relative positions of the three measures of central tendency.
Be Able to
  • Compute various measures of central tendency and variation.
  • Create frequency distributions and histograms.
  • Assess skewness and kurtosis within data distributions.

Measures of Central Tendency

Mean
  • Arithmetic Mean: Sum of all observations divided by the number of observations.
    Formula: $\bar{Y} = \frac{1}{n}\left(Y_1 + Y_2 + \dots + Y_n\right)$

    • Excel Command: =AVERAGE(A1:An)
  • Weighted Mean: The mean calculated by giving different weights to different observations.
    Formula: $\bar{Y} = \frac{w_1 Y_1 + w_2 Y_2 + \dots + w_n Y_n}{w_1 + w_2 + \dots + w_n}$

  • Geometric Mean: Use when values have multiplicative relationships.
    Formula: $GM = \sqrt[n]{Y_1 \times Y_2 \times \dots \times Y_n}$

    • Excel Command: =GEOMEAN(A1:An)
  • Harmonic Mean: Useful for rates.
    Formula: $H = \frac{n}{\frac{1}{Y_1} + \frac{1}{Y_2} + \dots + \frac{1}{Y_n}}$

    • Excel Command: =HARMEAN(A1:An)
  • Median: The middle value when data is arranged in order.

    • Excel Command: =MEDIAN(A1:An)
  • Mode: The most frequently occurring value in a distribution.

    • Excel Command: =MODE(A1:An)
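
The formulas above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the sample data are assumptions made for this example, and the median and mode are taken from Python's standard statistics module.

```python
import math
import statistics


def arithmetic_mean(values):
    # Sum of all observations divided by the number of observations.
    return sum(values) / len(values)


def weighted_mean(values, weights):
    # Sum of w_i * Y_i divided by the sum of the weights.
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def geometric_mean(values):
    # n-th root of the product, computed through logarithms.
    # Assumes every value is strictly positive.
    return math.exp(sum(math.log(y) for y in values) / len(values))


def harmonic_mean(values):
    # n divided by the sum of reciprocals. Assumes no value is zero.
    return len(values) / sum(1 / y for y in values)


# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 8]

print("Arithmetic mean:", arithmetic_mean(data))   # 4.5
print("Geometric mean: ", geometric_mean(data))    # approximately 4
print("Harmonic mean:  ", harmonic_mean(data))     # approximately 3.56
print("Median:         ", statistics.median(data))
print("Mode:           ", statistics.mode(data))

# Weighted mean: two scores of 80 and 90 with weights 1 and 3.
print("Weighted mean:  ", weighted_mean([80, 90], [1, 3]))  # 87.5
```

For this sample the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean. That ordering holds for any set of positive values that are not all equal.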

Relationship Between Shape of Distribution and Averages

  • In a symmetric distribution:
    • Mean = Median = Mode
  • In a negatively skewed (left-skewed) distribution, typically:
    • Mean < Median < Mode
  • In a positively skewed (right-skewed) distribution, typically:
    • Mean > Median > Mode
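
A small Python check can make these orderings concrete. This is only a sketch: the two data sets are made up for illustration, and the helper name center_summary is an assumption of this example.

```python
import statistics


def center_summary(values):
    # Returns (mean, median, mode) for a list of numbers.
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


# Hypothetical right-skewed sample: a few large values pull the mean upward.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror image of the sample above, which gives a left-skewed sample.
left_skewed = [20 - y for y in right_skewed]

mean_r, median_r, mode_r = center_summary(right_skewed)
mean_l, median_l, mode_l = center_summary(left_skewed)

print("Right-skewed:", mean_r, median_r, mode_r)  # mean 5, median 3, mode 2
print("Left-skewed: ", mean_l, median_l, mode_l)  # mean 15, median 17, mode 18
```

The mean reacts most strongly to the extreme values in the long tail, so it sits furthest toward the tail, with the median between the mean and the mode.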

Measures of Dispersion

  • Range: The difference between the highest and lowest value.
    Formula: $\text{Range} = \text{Max} - \text{Min}$
  • Interquartile Range (IQR): Difference between the 3rd quartile (Q3) and 1st quartile (Q1).
    Formula: $\text{IQR} = Q_3 - Q_1$
  • Mean Deviation: The average of the absolute deviations from the mean.
    Formula: $MD = \frac{1}{n}\left(|Y_1 - \bar{Y}| + |Y_2 - \bar{Y}| + \dots + |Y_n - \bar{Y}|\right)$
  • Variance: The average squared deviation from the mean.
    Formula: $V = \frac{1}{n}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_n - \bar{Y})^2\right)$
    • Excel Command: =VARP(A1:An) (or =VAR.P in newer versions) matches this 1/n formula; =VAR(A1:An) divides by n - 1 instead (sample variance).
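  • Standard Deviation: The square root of the variance.
    Formula: $SD = \sqrt{V}$
    • Excel Command: =STDEVP(A1:An) (or =STDEV.P in newer versions) takes the square root of the 1/n variance; =STDEV(A1:An) uses the n - 1 (sample) variance.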
  • Coefficient of Variation (CV): Ratio of the standard deviation to the mean, expressed as a percentage.
    Formula: $CV = \frac{SD}{\bar{Y}} \times 100$
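
The dispersion measures above can be sketched in Python as follows. The function names and sample data are assumptions made for this example. The variance here divides by n, as in the formula in these notes, and the quartiles use the "inclusive" method of Python's statistics.quantiles, which is only one of several conventions, so other tools may report a slightly different IQR.

```python
import math
import statistics


def value_range(values):
    # Highest value minus lowest value.
    return max(values) - min(values)


def interquartile_range(values):
    # Q3 - Q1. Quartile conventions differ between tools;
    # the 'inclusive' method is an assumption of this sketch.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def mean_deviation(values):
    # Average of the absolute deviations from the mean.
    mean = sum(values) / len(values)
    return sum(abs(y - mean) for y in values) / len(values)


def variance(values):
    # Average squared deviation from the mean (divides by n, not n - 1).
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values) / len(values)


def standard_deviation(values):
    # Square root of the variance.
    return math.sqrt(variance(values))


def coefficient_of_variation(values):
    # Standard deviation relative to the mean, as a percentage.
    # Assumes the mean is not zero.
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean * 100


# Hypothetical sample data with mean 5, chosen so the results are round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Range:             ", value_range(data))         # 7
print("IQR:               ", interquartile_range(data))
print("Mean deviation:    ", mean_deviation(data))      # 1.5
print("Variance:          ", variance(data))            # 4.0
print("Standard deviation:", standard_deviation(data))  # 2.0
print("CV (%):            ", coefficient_of_variation(data))
```

Because the CV is unit-free, it allows the spread of data sets measured on different scales to be compared.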

Assessing Shape of Distribution

  • Skewness and Kurtosis: Measures of the asymmetry and the tailedness (often described as peakedness) of the distribution.
  • Skewness range: -∞ to +∞.
    • Positive: Right-skewed
    • Negative: Left-skewed
    • Zero: Symmetric
  • Kurtosis range: 1 to ∞.
    • Leptokurtic: Peaked distribution (Kurtosis > 3)
    • Platykurtic: Flat distribution (Kurtosis < 3)
    • Mesokurtic: Normal distribution (Kurtosis = 3)
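
These notes do not give formulas for skewness and kurtosis, so the sketch below assumes the common moment-based definitions: skewness as m3 / m2^1.5 and kurtosis as m4 / m2^2, where mk is the k-th central moment computed with a 1/n divisor. Under this definition a normal distribution has kurtosis 3 and the smallest possible value is 1. The function names and data are hypothetical, and spreadsheet or library functions may apply sample corrections that give somewhat different numbers.

```python
def central_moment(values, k):
    # k-th central moment: average of (Y_i - mean)^k, dividing by n.
    mean = sum(values) / len(values)
    return sum((y - mean) ** k for y in values) / len(values)


def skewness(values):
    # Moment-based skewness: m3 / m2^(3/2).
    # Assumes the values are not all identical (m2 > 0).
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    # Moment-based (non-excess) kurtosis: m4 / m2^2.
    # Assumes the values are not all identical (m2 > 0).
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical samples, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
two_point = [0, 1]

print("Symmetric skewness:   ", skewness(symmetric))     # 0.0
print("Right-skewed skewness:", skewness(right_skewed))  # positive
print("Symmetric kurtosis:   ", kurtosis(symmetric))     # below 3 (platykurtic)
print("Two-point kurtosis:   ", kurtosis(two_point))     # 1.0, the lower bound
```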