Measures of Central Tendency, Dispersion and Association

Introduction

  • Presented by Zhenisbek Assylbekov

  • Department of Mathematical Sciences

  • Course: STAT 524: Applied Multivariate Analysis

Overview of Key Concepts

  1. Central Tendency

    • Definition: Represents a typical value for a variable.

  2. Dispersion

    • Definition: Refers to the extent to which individual observations deviate from a central value for a variable.

  3. Association

    • Definition: Examines how variables are related to one another simultaneously.

  4. Statistics as a Subject

    • Definition: The science and art of using sample information to infer knowledge about populations.

Population and Sample

  • Population:

    • Definition: The collection of all objects of interest from which inferences may be made.

    • Alternative view: The collection of all possible random draws from a stochastic model (e.g., independent tosses of a coin).

  • Population Parameter:

    • Definition: A numerical characteristic of a population, often unknown.

  • Sampling:

    • Definition: Selecting a subset of the population for measurement or observation.

    • Sample Statistic: A numerical characteristic of a sample that estimates an unknown population parameter.

Big Picture of Statistics

  • Example Scenario: Assessment of public opinion on the death penalty.

  • Samples & Inference:

    • Sample percentage: 65% in favor.

    • Sample size: 1082 responses.

    • Conclusion: With 95% confidence, the population percentage is between 62% and 68%.
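The interval in this example can be reproduced approximately with a short calculation. The sketch below assumes the standard normal-approximation (Wald) interval for a proportion; the notes do not say which method was used, so this is an illustration rather than the survey's actual procedure.

```python
import math

p_hat = 0.65   # sample proportion in favor (from the example)
n = 1082       # sample size (from the example)
z = 1.96       # approximate 97.5th percentile of the standard normal

# Standard error of a sample proportion under the Wald approximation
# (an assumed method, not stated in the notes)
std_err = math.sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - z * std_err
upper = p_hat + z * std_err

print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 0.62 to 0.68
```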

Notation

  • Let:

    • $x_{ij}$ = Measurement for variable j in observation i.

    • $p$ = Number of variables.

    • $n$ = Number of observations.

  • Data Vector (ith observation):

    • $\mathbf{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}$

  • Data Matrix:

    • $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\top} \\ \mathbf{x}_2^{\top} \\ \vdots \\ \mathbf{x}_n^{\top} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$
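As a concrete illustration of this notation, the sketch below stores a data matrix as a list of rows, one row per observation. It borrows the height and weight values from the covariance example later in these notes; the variable names are illustrative only.

```python
# Rows are observations, columns are variables (height, weight).
# Values are taken from the height/weight example in these notes.
X = [
    [62, 120],
    [65, 135],
    [70, 150],
    [72, 160],
    [68, 155],
]

n = len(X)       # number of observations
p = len(X[0])    # number of variables

x_3 = X[2]       # data vector for observation i = 3 (Python indexes from 0)
x_32 = X[2][1]   # measurement of variable j = 2 on observation i = 3

print(n, p)        # 5 2
print(x_3, x_32)   # [70, 150] 150
```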

Central Tendency

  1. Population Mean ( µ ):

    • Definition: Theoretical population average.

    • Notation: $\mu_j = E[X_j]$, where $E$ denotes expectation; $\mu_j$ is the average of variable $X_j$ across the population.

  2. Sample Mean ( x̄ ):

    • Definition: Empirical average derived from sample data.

    • Notation: $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$

  3. Properties of Sample Mean:

    • The sample mean is a function of random data: $\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}$

    • Its expectation equals the population mean: $E[\bar{X}_j] = \mu_j$

The Mean Vector

  • Population Mean Vector ( µ ):

    • A vector containing population means for all variables: $\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix}$

  • Sample Mean Vector ( x̄ ):

    • Vector representing sample means for all variables:
      $\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}$
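The sample mean vector is just the column-wise average of the data matrix. A minimal sketch, reusing the height and weight values from the covariance example later in these notes:

```python
# Data matrix: rows are observations, columns are (height, weight)
X = [
    [62, 120],
    [65, 135],
    [70, 150],
    [72, 160],
    [68, 155],
]

n = len(X)
p = len(X[0])

# Average each column j over the n observations
x_bar = [sum(row[j] for row in X) / n for j in range(p)]

print(x_bar)  # [67.4, 144.0]
```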

Variance

  • Population Variance ( σ² ):

    • Measures spread in variable values.

    • Definition: $\sigma_j^2 = E[(X_j - \mu_j)^2]$

    • Interpretation: Larger values indicate more spread from the mean.

  • Sample Variance ( s² ):

    • Definition: Estimates population variance:
      $s_j^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$

  • Properties:

    • The expectation of sample variance equals the population variance:
      $E[S_j^2] = \sigma_j^2$
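The unbiasedness property can be checked informally by simulation. The sketch below is an illustration under assumed settings (a normal population with variance 4, samples of size 5, a fixed random seed); none of these choices come from the notes. It compares the average of the n − 1 divisor estimates with the average of the n divisor estimates.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
sigma2 = 4.0        # assumed true population variance (sd = 2)
n = 5               # assumed sample size
reps = 10000        # number of simulated samples

unbiased, biased = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(sample))   # divides by n - 1
    biased.append(statistics.pvariance(sample))    # divides by n

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)

# The first average should be close to sigma2 = 4,
# the second close to sigma2 * (n - 1) / n = 3.2.
print(round(mean_unbiased, 2), round(mean_biased, 2))
```

Dividing by n instead of n − 1 systematically underestimates the variance, which is why the sample variance uses n − 1.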

Example: Pulse Rates Calculation

  • Given sample pulse rates: 64, 68, 74, 76, 78.

    1. Sample Mean:
      $\bar{x} = \frac{64 + 68 + 74 + 76 + 78}{5} = 72$

    2. Sample Variance:
      $s^2 = \frac{(64 - 72)^2 + (68 - 72)^2 + (74 - 72)^2 + (76 - 72)^2 + (78 - 72)^2}{5 - 1} = 34$

    3. Standard Deviation:
      $s = \sqrt{34} \approx 5.83$
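The pulse-rate arithmetic can be double-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. This is only a verification sketch, not part of the original example.

```python
import statistics

pulse = [64, 68, 74, 76, 78]

x_bar = statistics.mean(pulse)     # sample mean
s2 = statistics.variance(pulse)    # sample variance (divides by n - 1)
s = statistics.stdev(pulse)        # sample standard deviation

print(f"mean={x_bar:.1f}, variance={s2:.1f}, sd={s:.2f}")
# mean=72.0, variance=34.0, sd=5.83
```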

Introduction to Measures of Association

  • Purpose: Quantifies relationships between two or more variables.

  • Key Measures:

    • Covariance

    • Correlation

Covariance

  • Definition:

    • Indicates linear relationship between two variables.

    • Formula:
      $\sigma_{jk} = E[(X_j - \mu_j)(X_k - \mu_k)]$

  • Example Application:

    • Assessing if height and weight are positively correlated (e.g., taller individuals tend to weigh more).

Interpretation of Covariance

  1. Positive Covariance:

    • Indicates that one variable tends to increase as the other increases.

  2. Negative Covariance:

    • Indicates that one variable tends to decrease as the other increases.

  3. Zero Covariance:

    • Implies the absence of a linear relationship between the variables.

  4. Limitation:

    • Covariance's scale-dependence makes it hard to interpret without standardization.
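The scale-dependence can be seen numerically: multiplying one variable by a constant multiplies the covariance by the same constant, even though the underlying relationship is unchanged. The sketch below uses the height and weight values from the example in these notes; the rescaling factor 2.54 and the helper `sample_cov` are hypothetical choices made for illustration.

```python
def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

height = [62, 65, 70, 72, 68]
weight = [120, 135, 150, 160, 155]

c = 2.54  # hypothetical rescaling factor (a change of measurement units)
height_scaled = [c * h for h in height]

cov_original = sample_cov(height, weight)
cov_scaled = sample_cov(height_scaled, weight)

# The covariance changes by exactly the rescaling factor
print(round(cov_scaled / cov_original, 4))  # 2.54
```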

Sample Covariance

  • Formula for estimating population covariance from samples:

    • $s_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{n - 1}$

  • Properties:

    • Sample covariance is unbiased for population covariance:
      $E[s_{jk}] = \sigma_{jk}$

Example: Calculating Covariance

  • Given dataset of heights and weights (individuals 1-5):

    Individual   Height (x₁)   Weight (x₂)
    1            62            120
    2            65            135
    3            70            150
    4            72            160
    5            68            155

    • Calculation steps:

    1. Compute summations:

      • $\sum_{i=1}^{n} x_{i1} = 337$

      • $\sum_{i=1}^{n} x_{i2} = 720$

      • $\sum_{i=1}^{n} x_{i1} x_{i2} = 48775$

    2. Covariance:

      • $s_{12} = \frac{\sum_{i=1}^{n} x_{i1} x_{i2} - \left(\sum_{i=1}^{n} x_{i1}\right)\left(\sum_{i=1}^{n} x_{i2}\right)/n}{n - 1} = \frac{48775 - (337)(720)/5}{4} = 61.75$
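The calculation above can be checked in a few lines. The sketch computes the covariance two ways: with the shortcut formula based on the three sums, and directly from the definition using deviations from the means. Variable names are illustrative.

```python
height = [62, 65, 70, 72, 68]
weight = [120, 135, 150, 160, 155]
n = len(height)

# Step 1: the three summations
sum_x1 = sum(height)
sum_x2 = sum(weight)
sum_x1x2 = sum(h * w for h, w in zip(height, weight))

# Step 2a: shortcut formula
s12_shortcut = (sum_x1x2 - sum_x1 * sum_x2 / n) / (n - 1)

# Step 2b: definitional formula using deviations from the means
x1_bar = sum_x1 / n
x2_bar = sum_x2 / n
s12_definition = sum(
    (h - x1_bar) * (w - x2_bar) for h, w in zip(height, weight)
) / (n - 1)

print(sum_x1, sum_x2, sum_x1x2)                          # 337 720 48775
print(round(s12_shortcut, 2), round(s12_definition, 2))  # 61.75 61.75
```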

Variance-Covariance Matrix

  • Definition:

    • Organized pattern of population variances and covariances:
      $\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{bmatrix}$

  • Properties:

    • Both $\Sigma$ and $S$ (the sample variance-covariance matrix, with entries $s_{jk}$) are symmetric matrices (e.g., $\sigma_{jk} = \sigma_{kj}$).

  • Unbiasedness:

    • $E[S] = \Sigma$ (the population variance-covariance matrix)
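For a small data set the sample variance-covariance matrix can be assembled entry by entry. The sketch below does this for the height and weight data from these notes; `sample_cov` is a hypothetical helper, and the diagonal entries are the sample variances because the covariance of a variable with itself is its variance.

```python
def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

height = [62, 65, 70, 72, 68]
weight = [120, 135, 150, 160, 155]
columns = [height, weight]

# Entry (j, k) is the sample covariance between variables j and k
S = [[sample_cov(a, b) for b in columns] for a in columns]

for row in S:
    print([round(v, 2) for v in row])
# [15.8, 61.75]
# [61.75, 267.5]

print(S[0][1] == S[1][0])  # True: the matrix is symmetric
```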

Correlation

  • Definition:

    • Standardizes covariance to yield a unit-free measure of the strength of relationship.

    • Formula:
      $\rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}$

    • Estimate by:
      $r_{jk} = \frac{s_{jk}}{s_j s_k}$

  • Sample Correlation Matrix (R):
    $R = \begin{bmatrix} 1 & r_{12} & \cdots \\ r_{21} & 1 & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$

Example: Calculating Correlation

  • Given Standard Deviations:

    • $s_1 = 3.97$

    • $s_2 = 16.36$

  • Correlation Calculation:

    • $r_{12} = \frac{61.75}{3.97 \times 16.36} \approx 0.95$

    • Interpretation: A value this close to 1 indicates a strong positive linear relationship between height and weight.

  • Correlation Matrix Example:
    $R = \begin{bmatrix} 1 & 0.95 \\ 0.95 & 1 \end{bmatrix}$
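The standard deviations and the correlation quoted in this example can be recomputed from the raw height and weight data. A minimal sketch, with a hypothetical `sample_cov` helper:

```python
import math

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

height = [62, 65, 70, 72, 68]
weight = [120, 135, 150, 160, 155]

s12 = sample_cov(height, weight)
s1 = math.sqrt(sample_cov(height, height))  # sd of height
s2 = math.sqrt(sample_cov(weight, weight))  # sd of weight

r12 = s12 / (s1 * s2)

print(round(s1, 2), round(s2, 2), round(r12, 2))  # 3.97 16.36 0.95
```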

Introduction to Additional Measures of Dispersion

  • Definition: Provides insights into data spread or variability.

  • Previous focus: Individual variances of variables.

  • Key Measures:

    • Total Variation

    • Generalized Variance

Total Variation

  • Definition: Measures overall variability of a set of variables.

  • Defined as the trace of the variance-covariance matrix:

    • $\operatorname{trace}(\Sigma) = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_p^2$

  • Estimation:

    • $\operatorname{trace}(S) = s_1^2 + s_2^2 + \cdots + s_p^2$
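As a small illustration, the estimated total variation is the sum of the diagonal entries of S. The sketch below uses the sample variance-covariance matrix of the height and weight data from these notes (entries 15.8, 61.75, and 267.5), not the nutrient intake data.

```python
# Sample variance-covariance matrix for (height, weight),
# computed from the five observations in these notes
S = [
    [15.8, 61.75],
    [61.75, 267.5],
]

# Trace: sum of the diagonal entries, i.e. the sum of the sample variances
total_variation = sum(S[j][j] for j in range(len(S)))

print(round(total_variation, 2))  # 283.3
```

Note that the off-diagonal covariance 61.75 plays no role in this quantity.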

Example of Total Variation

  • Application on USDA women’s health survey nutrient intake data using R software.

Weakness of Total Variation

  • Example: Simulated pairs of variables with different levels of correlation:

    • Correlations:

      • $r = 0$ (no linear relationship)

      • $r = 0.7$ (moderate correlation)

      • $r = 0.9$ (high correlation)

    • Observation: Total variation equals 2 in all three cases, regardless of the correlation, because the trace ignores the covariances.

Determinant of a Matrix

  • Definition: A scalar summary of a square matrix, used here as the basis for an alternative measure of dispersion.

  • For a 2x2 Matrix:

    • $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

    • Determinant:
      $|A| = ad - bc$

  • For general p × p matrix:

    • $|B| = \sum_{j=1}^{p} (-1)^{j+1} b_{1j} |B_{1j}|$, where $B_{1j}$ is the submatrix obtained by deleting row 1 and column j of $B$.
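The cofactor expansion along the first row translates directly into a recursive function. The sketch below is for illustration only: it is exponentially slow for large matrices, and the function names and test matrices are made up for this example.

```python
def minor(B, row, col):
    """Submatrix of B with the given row and column removed."""
    return [r[:col] + r[col + 1:] for i, r in enumerate(B) if i != row]

def det(B):
    """Determinant by cofactor expansion along the first row."""
    p = len(B)
    if p == 1:
        return B[0][0]
    # Python's j starts at 0, so (-1) ** j plays the role of
    # (-1)^(j+1) in the formula, where j starts at 1.
    return sum((-1) ** j * B[0][j] * det(minor(B, 0, j)) for j in range(p))

A = [[1, 2],
     [3, 4]]
print(det(A))  # -2, matching ad - bc = 1*4 - 2*3

B = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(det(B))  # 18
```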

Generalized Variance

  • Definition: An overall dispersion measure in multivariate data.

  • Defined as the determinant of the variance-covariance matrix:

    • Generalized Population Variance = det(Σ)

    • Sample Generalized Variance = det(S)
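The contrast with total variation can be sketched for two variables. The example assumes both variables have variance 1, so the covariance equals the correlation r; this unit-variance setup is an assumption chosen to match a total variation of 2, not something stated in the notes. The trace stays at 2 while the determinant, 1 − r², shrinks as the correlation grows.

```python
results = {}
for r in (0.0, 0.7, 0.9):
    # 2x2 variance-covariance matrix with unit variances (assumed)
    # and covariance equal to the correlation r
    S = [[1.0, r],
         [r, 1.0]]

    total_variation = S[0][0] + S[1][1]                        # trace(S)
    generalized_variance = S[0][0] * S[1][1] - S[0][1] * S[1][0]  # det(S)

    results[r] = (total_variation, generalized_variance)
    print(r, total_variation, round(generalized_variance, 2))
# 0.0 2.0 1.0
# 0.7 2.0 0.51
# 0.9 2.0 0.19
```

Unlike the trace, the determinant uses the off-diagonal entries, so it reflects how strongly the variables move together.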

Example of Generalized Variance

  • Find the generalized sample variance for the Women's Health Survey data.