Measures of Central Tendency, Dispersion and Association

Introduction

Presented by Zhenisbek Assylbekov
Department of Mathematical Sciences
Course: STAT 524: Applied Multivariate Analysis

Overview of Key Concepts

Central Tendency
- Definition: Represents a typical value for a variable.
Dispersion
- Definition: Refers to the extent to which individual observations deviate from a central value for a variable.
Association
- Definition: Examines how variables are related to one another simultaneously.
Statistics as a Subject
- Definition: The science and art of using sample information to infer knowledge about populations.

Population and Sample

Population:
- Definition: The collection of all objects of interest from which inferences may be made.
- Alternative view: The collection of all possible random draws from a stochastic model (e.g., independent tosses of a coin).
Population Parameter:
- Definition: A numerical characteristic of a population, often unknown.
Sampling:
- Definition: Selecting a subset of the population for measurement or observation.
- Sample Statistic: A numerical characteristic of a sample that estimates an unknown population parameter.

Big Picture of Statistics

Example Scenario: Assessment of public opinion on the death penalty.
Samples & Inference:
- Sample percentage: 65% in favor.
- Sample size: 1082 responses.
- Conclusion: With 95% confidence, the population percentage is between 62% and 68%.
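
The interval above can be reproduced approximately with a short calculation. This is a minimal sketch that assumes the 65% is the observed sample proportion and that the interval comes from the usual normal approximation with multiplier 1.96; the source does not state which method was used, and the variable names are illustrative.

```python
import math

p_hat = 0.65  # observed proportion in favor (from the example)
n = 1082      # sample size (from the example)
z = 1.96      # assumed multiplier for 95% confidence (normal approximation)

# Standard error of a sample proportion under the normal approximation.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"margin of error: {margin:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

Rounded to whole percentages, the limits agree with the 62% and 68% quoted in the conclusion.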

Notation

Let:
- $x_{ij}$ = Measurement for variable j in observation i.
- $p$ = Number of variables.
- $n$ = Number of observations.
Data Vector (ith observation):
- $\mathbf{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}$
Data Matrix:
- $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\top} \\ \mathbf{x}_2^{\top} \\ \vdots \\ \mathbf{x}_n^{\top} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$
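
To make the indexing concrete, the sketch below stores a data matrix as a list of rows, one row per observation. The numbers are hypothetical and serve only to illustrate the notation; note that Python indices start at 0 while the notation starts at 1.

```python
# Hypothetical data: n = 3 observations (rows) on p = 2 variables (columns).
X = [
    [1.0, 10.0],  # observation 1
    [2.0, 20.0],  # observation 2
    [3.0, 60.0],  # observation 3
]

n = len(X)     # number of observations
p = len(X[0])  # number of variables

i, j = 2, 1             # 1-based indices, as in the notation
x_i = X[i - 1]          # data vector for observation i
x_ij = X[i - 1][j - 1]  # measurement of variable j in observation i

print(n, p, x_i, x_ij)
```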

Central Tendency

Population Mean ( µ ):
- Definition: Theoretical population average.
- Notation: $\mu_j = E[X_j]$, where $E$ denotes expectation, i.e., the average of variable $X_j$ across the population.
Sample Mean ( x̄ ):
- Definition: Empirical average derived from sample data.
- Notation: $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$
Properties of Sample Mean:
- The sample mean is a function of random data: $\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}$
- Its expectation equals the population mean: $E[\bar{X}_j] = \mu_j$
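
As a minimal sketch of the sample mean formula, the snippet below averages each column of a small data matrix, giving one sample mean per variable. The data values and the helper name `sample_means` are hypothetical, chosen only for illustration.

```python
def sample_means(X):
    """Return the sample mean of each variable (column) of X.

    X is a list of rows, one row per observation.
    """
    n = len(X)
    p = len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]


# Hypothetical data: 3 observations on 2 variables.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
x_bar = sample_means(X)
print(x_bar)  # [2.0, 30.0]
```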

The Mean Vector

Population Mean Vector ( µ ):
- A vector containing the population means of all variables: $\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix}$
Sample Mean Vector ( x̄ ):
- A vector containing the sample means of all variables:
  $\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}$

Variance

Population Variance ( σ² ):
- Measures spread in variable values.
- Definition: $\sigma_j^2 = E[(X_j - \mu_j)^2]$
- Interpretation: Larger values indicate more spread from the mean.
Sample Variance ( s² ):
- Definition: Estimates population variance:
 $s_j^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$
Properties:
- The expectation of sample variance equals the population variance:
 $E[S_j^2] = \sigma_j^2$
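
The unbiasedness property can be checked exactly on a toy case. The sketch below assumes a hypothetical population in which the values 1, 2, 3 are equally likely, enumerates every possible sample of size 2 drawn independently, and averages the sample variances. Exact fractions are used so that no rounding is involved; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical: each value equally likely
n = 2                   # sample size

# Population mean and variance (average squared deviation over the population).
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)


def sample_variance(sample):
    """Sample variance with the n - 1 divisor."""
    m = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - m) ** 2 for v in sample) / (len(sample) - 1)


# All equally likely samples of size n drawn independently (with replacement).
samples = list(product(population, repeat=n))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print(sigma2, expected_s2)
```

Both printed values are 2/3, so in this toy case the average of the sample variance over all samples equals the population variance.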

Example: Pulse Rates Calculation

Given sample pulse rates: 64, 68, 74, 76, 78.
1. Sample Mean:
  $\bar{x} = \frac{64 + 68 + 74 + 76 + 78}{5} = 72$
2. Sample Variance:
  $s^{2} = \frac{(64 - 72)^{2} + (68 - 72)^{2} + (74 - 72)^{2} + (76 - 72)^{2} + (78 - 72)^{2}}{5 - 1} = \frac{136}{4} = 34$
3. Standard Deviation:
  $s = \sqrt{34} \approx 5.83$
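
The same three quantities can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. This is only a quick verification sketch of the hand calculation.

```python
import statistics

pulse = [64, 68, 74, 76, 78]  # sample pulse rates from the example

mean = statistics.mean(pulse)          # sample mean
variance = statistics.variance(pulse)  # sample variance (n - 1 divisor)
sd = statistics.stdev(pulse)           # sample standard deviation

print(mean, variance, round(sd, 2))
```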

Introduction to Measures of Association

Purpose: Quantifies relationships between two or more variables.
Key Measures:
- Covariance
- Correlation

Covariance

Definition:
- Measures the direction of the linear relationship between two variables.
- Formula:
 $\sigma_{jk} = E[(X_j - \mu_j)(X_k - \mu_k)]$
Example Application:
- Assessing if height and weight are positively correlated (e.g., taller individuals tend to weigh more).

Interpretation of Covariance

Positive Covariance:
- One variable tends to increase as the other increases.
Negative Covariance:
- One variable tends to increase as the other decreases.
Zero Covariance:
- Indicates the absence of a linear relationship between the variables.
Limitation:
- Covariance depends on the units of measurement, so its magnitude is hard to interpret without standardization.

Sample Covariance

Formula for estimating population covariance from samples:
- $s_{jk} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$
Properties:
- Sample covariance is unbiased for population covariance:
 $E[S_{jk}] = \sigma_{jk}$

Example: Calculating Covariance

Given dataset of heights and weights (individuals 1-5):
Height (x₁)	Weight (x₂)
62	120
65	135
70	150
72	160
68	155
- Calculation steps:
1. Compute summations:
 - $\sum_{i=1}^{5} x_{i1} = 337$
 - $\sum_{i=1}^{5} x_{i2} = 720$
 - $\sum_{i=1}^{5} x_{i1} x_{i2} = 48775$
2. Covariance:
 - $s_{12} = \frac{48775 - (337)(720)/5}{5 - 1} = \frac{48775 - 48528}{4} = 61.75$
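
The calculation steps above can be sketched in code. The snippet computes the covariance twice, once with the summation shortcut used in the steps and once directly from the definition of the sample covariance, to show that the two agree. Variable names are illustrative.

```python
height = [62, 65, 70, 72, 68]       # x_1, from the example
weight = [120, 135, 150, 160, 155]  # x_2, from the example
n = len(height)

# Step 1: the three summations.
sum_x1 = sum(height)
sum_x2 = sum(weight)
sum_x1x2 = sum(h * w for h, w in zip(height, weight))

# Step 2: shortcut formula based on the summations.
s12_shortcut = (sum_x1x2 - sum_x1 * sum_x2 / n) / (n - 1)

# Same quantity from the definition: average product of deviations (n - 1 divisor).
mean_h = sum_x1 / n
mean_w = sum_x2 / n
s12_definition = sum(
    (h - mean_h) * (w - mean_w) for h, w in zip(height, weight)
) / (n - 1)

print(sum_x1, sum_x2, sum_x1x2)
print(s12_shortcut, round(s12_definition, 2))
```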

Variance-Covariance Matrix

Definition:
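- A p × p array collecting the population variances (on the diagonal) and covariances (off the diagonal):
  $\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{bmatrix}$
Properties:
- Both $\Sigma$ and its sample counterpart $S$ (built from the $s_j^2$ and $s_{jk}$) are symmetric matrices (i.e., $\sigma_{jk} = \sigma_{kj}$ and $s_{jk} = s_{kj}$).
Unbiasedness:
- $E[S] = \Sigma$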
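
A sample variance-covariance matrix can be assembled by applying the sample covariance formula to every pair of variables. The sketch below does this for the height and weight data from the earlier example and checks symmetry; the helper name `covariance_matrix` is hypothetical.

```python
def covariance_matrix(X):
    """Sample variance-covariance matrix (n - 1 divisor) of the columns of X.

    X is a list of rows, one row per observation.
    """
    n = len(X)
    p = len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    return [
        [sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(p)]
        for j in range(p)
    ]


# Height and weight data from the covariance example (one row per individual).
X = [[62, 120], [65, 135], [70, 150], [72, 160], [68, 155]]
S = covariance_matrix(X)

for row in S:
    print([round(v, 2) for v in row])
print("symmetric:", S[0][1] == S[1][0])
```

The diagonal entries are the sample variances of height and weight, and the off-diagonal entry reproduces the covariance of 61.75.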

Correlation

Definition:
- Standardizes covariance to yield a unit-free measure of the strength of relationship.
- Formula:
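 $\rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}$
- Estimate by:
 $r_{jk} = \frac{s_{jk}}{s_j s_k}$
Sample Correlation Matrix (R):
$R = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{bmatrix}$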

Example: Calculating Correlation

Given standard deviations (height and weight data):
- $s_{1} = 3.97$
- $s_{2} = 16.36$
Correlation Calculation:
- $r_{12} = \frac{61.75}{3.97 \times 16.36} \approx 0.95$
- Interpretation: A strong positive linear relationship between height and weight.
Correlation Matrix Example:
$R = \begin{bmatrix} 1 & 0.95 \\ 0.95 & 1 \end{bmatrix}$
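
The correlation calculation can be reproduced from the raw height and weight data of the covariance example. This sketch uses `statistics.stdev` from the standard library for the standard deviations and computes the covariance by hand; variable names are illustrative.

```python
import statistics

height = [62, 65, 70, 72, 68]
weight = [120, 135, 150, 160, 155]
n = len(height)

mean_h = statistics.mean(height)
mean_w = statistics.mean(weight)

# Sample covariance (n - 1 divisor).
s12 = sum((h - mean_h) * (w - mean_w) for h, w in zip(height, weight)) / (n - 1)

# Sample standard deviations.
s1 = statistics.stdev(height)
s2 = statistics.stdev(weight)

# Correlation: covariance standardized by both standard deviations.
r12 = s12 / (s1 * s2)
R = [[1.0, r12], [r12, 1.0]]

print(round(s1, 2), round(s2, 2), round(r12, 2))
```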

Introduction to Additional Measures of Dispersion

Definition: Provides insights into data spread or variability.
Previous focus: Individual variances of variables.
Key Measures:
- Total Variation
- Generalized Variance

Total Variation

Definition: Measures overall variability of a set of variables.
Defined as the trace of the variance-covariance matrix:
- $\text{trace}(\Sigma) = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_p^2$
Estimation:
- $\text{trace}(S) = s_1^2 + s_2^2 + \cdots + s_p^2$

Example of Total Variation

Application to the USDA Women's Health Survey nutrient intake data using R software.

Weakness of Total Variation

Example: simulated pairs of variables with different correlation levels:
- Correlations:
  - $r=0$ (unrelated)
  - $r=0.7$ (moderate correlation)
  - $r=0.9$ (high correlation)
- Observation: Total variation equals 2 in every case, regardless of the correlation, so it does not reflect the association between the variables.
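
The observation can be illustrated with a small sketch. It assumes each simulated variable has unit variance, so that the covariance equals the correlation r and the covariance matrix has ones on the diagonal; this assumption is consistent with a total variation of 2 but is not stated in the source. The helper name `total_variation` is hypothetical.

```python
def total_variation(S):
    """Trace of a square covariance matrix given as a list of rows."""
    return sum(S[j][j] for j in range(len(S)))


# Assumed setup: unit variances, covariance equal to the correlation r.
for r in (0.0, 0.7, 0.9):
    S = [[1.0, r], [r, 1.0]]
    print(r, total_variation(S))
```

The trace only reads the diagonal, so the off-diagonal entry r never enters the result.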

Determinant of a Matrix

Definition: A scalar computed from a square matrix; it is used to define an overall measure of dispersion.
For a 2x2 Matrix:
- $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
- Determinant:
  $|A| = ad - bc$
For general p × p matrix:
- $|B| = \sum_{j=1}^{p} (-1)^{j+1} b_{1j} |B_{1j}|$, where $B_{1j}$ is the matrix obtained by deleting row 1 and column $j$ of $B$.
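
The expansion along the first row translates directly into a recursive function. The sketch below is meant to mirror the formula, not to be efficient (the cost grows factorially with p); the helper names `minor` and `det` and the test matrices are hypothetical.

```python
def minor(B, row, col):
    """Copy of B with the given row and column removed (0-based indices)."""
    return [r[:col] + r[col + 1:] for k, r in enumerate(B) if k != row]


def det(B):
    """Determinant by cofactor expansion along the first row."""
    p = len(B)
    if p == 1:
        return B[0][0]
    # With 0-based j, the sign (-1)**j matches (-1)**(j+1) for 1-based j.
    return sum((-1) ** j * B[0][j] * det(minor(B, 0, j)) for j in range(p))


A = [[1, 2], [3, 4]]
B = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print(det(A))  # -2, matching ad - bc = 1*4 - 2*3
print(det(B))  # 18
```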

Generalized Variance

Definition: An overall dispersion measure in multivariate data.
Defined as the determinant of the variance-covariance matrix:
- Generalized Population Variance = $|\Sigma|$
- Sample Generalized Variance = $|S|$
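
Unlike the trace, the determinant reacts to correlation. The sketch below evaluates the generalized variance for the correlation levels r = 0, 0.7, and 0.9 from the total variation example, assuming unit variances so that the covariance matrix has ones on the diagonal and r off the diagonal (an assumption, not stated in the source). The helper name `generalized_variance_2x2` is hypothetical.

```python
def generalized_variance_2x2(S):
    """Determinant ad - bc of a 2x2 covariance matrix given as a list of rows."""
    (a, b), (c, d) = S
    return a * d - b * c


# Assumed setup: unit variances, covariance equal to the correlation r.
for r in (0.0, 0.7, 0.9):
    S = [[1.0, r], [r, 1.0]]
    print(r, round(generalized_variance_2x2(S), 2))
```

In this setup the determinant equals 1 - r², so it shrinks from 1 toward 0 as the correlation strengthens, while the total variation stays at 2.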

Example of Generalized Variance

Find the generalized sample variance for the Women's Health Survey data.