Measures of Central Tendency, Dispersion and Association
Introduction
Presented by Zhenisbek Assylbekov
Department of Mathematical Sciences
Course: STAT 524: Applied Multivariate Analysis
Overview of Key Concepts
Central Tendency
Definition: Represents a typical value for a variable.
Dispersion
Definition: Refers to the extent to which individual observations deviate from a central value for a variable.
Association
Definition: Examines how variables are related to one another simultaneously.
Statistics as a Subject
Definition: The science and art of using sample information to infer knowledge about populations.
Population and Sample
Population:
Definition: The collection of all objects of interest from which inferences may be made.
Alternative view: The collection of all possible random draws from a stochastic model (e.g., independent tosses of a coin).
Population Parameter:
Definition: A numerical characteristic of a population, often unknown.
Sampling:
Definition: Selecting a subset of the population for measurement or observation.
Sample Statistic: A numerical characteristic of a sample that estimates an unknown population parameter.
Big Picture of Statistics
Example Scenario: Assessment of public opinion on the death penalty.
Samples & Inference:
Sample percentage: 65% in favor.
Sample size: 1082 responses.
Conclusion: With 95% confidence, the population percentage is between 62% and 68%.
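The interval quoted above can be approximately reproduced with a normal-approximation (Wald) interval. This is only a sketch: the source does not say which interval method was used, so the formula and the critical value `z = 1.96` are assumptions, and 65% is treated as the proportion observed among the 1082 responses. The function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a population proportion.

    p_hat: observed sample proportion, n: sample size,
    z: standard normal critical value (1.96 is assumed for 95% confidence).
    """
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.65, 1082)
print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.622 to 0.678
```

Rounded to whole percentages, this gives the 62% to 68% range stated in the conclusion.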
Notation
Let:
$x_{ij}$ = Measurement for variable $j$ in observation $i$.
$p$ = Number of variables.
$n$ = Number of observations.
Data Vector (ith observation):
$$\mathbf{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}$$
Data Matrix:
$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\top} \\ \mathbf{x}_2^{\top} \\ \vdots \\ \mathbf{x}_n^{\top} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix}$$
Central Tendency
Population Mean ( µ ):
Definition: Theoretical population average.
Notation: $\mu_j = E(X_j)$, where $E$ denotes expectation, i.e. the average of variable $X_j$ across the population.
Sample Mean ( x̄ ):
Definition: Empirical average derived from sample data.
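Notation: $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$
Properties of Sample Mean:
The sample mean is a function of random data, so it is itself a random variable: $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}$
Its expectation equals the population mean (it is unbiased): $E(\bar{x}_j) = \mu_j$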
The Mean Vector
Population Mean Vector ( µ ):
A vector containing population means for all variables:
$$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix}$$
Sample Mean Vector ( x̄ ):
Vector representing sample means for all variables:
$$\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}$$
Variance
Population Variance ( σ² ):
Measures spread in variable values.
Definition: $\sigma_j^2 = E\left[(X_j - \mu_j)^2\right]$
Interpretation: Larger values indicate more spread from the mean.
Sample Variance ( s² ):
Definition: Estimates population variance: $s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$
Properties:
The expectation of sample variance equals the population variance: $E(s_j^2) = \sigma_j^2$
Example: Pulse Rates Calculation
Given sample pulse rates: 64, 68, 74, 76, 78.
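Sample Mean: $\bar{x} = (64 + 68 + 74 + 76 + 78)/5 = 360/5 = 72$
Sample Variance: $s^2 = \frac{(-8)^2 + (-4)^2 + 2^2 + 4^2 + 6^2}{5 - 1} = \frac{136}{4} = 34$
Standard Deviation: $s = \sqrt{34} \approx 5.83$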
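The pulse-rate calculation can be checked with a few lines of Python. This is a minimal sketch using the $n - 1$ divisor for the sample variance; the variable names are illustrative only.

```python
import math

pulse = [64, 68, 74, 76, 78]
n = len(pulse)

# Sample mean: sum of observations divided by n.
mean = sum(pulse) / n

# Sample variance: sum of squared deviations divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in pulse]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the sample variance.
sd = math.sqrt(variance)

print(mean, variance, round(sd, 2))  # 72.0 34.0 5.83
```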
Introduction to Measures of Association
Purpose: Quantifies relationships between two or more variables.
Key Measures:
Covariance
Correlation
Covariance
Definition:
Measures the direction of the linear relationship between two variables.
Formula: $\sigma_{jk} = E\left[(X_j - \mu_j)(X_k - \mu_k)\right]$
Example Application:
Assessing if height and weight are positively correlated (e.g., taller individuals tend to weigh more).
Interpretation of Covariance
Positive Covariance:
Indicates that one variable tends to increase as the other increases.
Negative Covariance:
Indicates that one variable tends to increase as the other decreases.
Zero Covariance:
Implies the absence of a linear relationship between the variables.
Limitation:
Covariance's scale-dependence makes it hard to interpret without standardization.
Sample Covariance
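Formula for estimating population covariance from samples: $s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$
Properties:
Sample covariance is unbiased for population covariance: $E(s_{jk}) = \sigma_{jk}$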
Example: Calculating Covariance
Given dataset of heights and weights (individuals 1-5):
Individual   Height (x₁)   Weight (x₂)
1            62            120
2            65            135
3            70            150
4            72            160
5            68            155
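Calculation steps:
Compute summations: $\sum x_{i1} = 337$, $\sum x_{i2} = 720$, $\sum x_{i1} x_{i2} = 48775$
Covariance: $s_{12} = \frac{\sum x_{i1} x_{i2} - \left(\sum x_{i1}\right)\left(\sum x_{i2}\right)/n}{n - 1} = \frac{48775 - (337)(720)/5}{4} = \frac{247}{4} = 61.75$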
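The summation-based calculation for the height and weight data can be sketched as follows. The helper name `sample_covariance` is hypothetical, and the shortcut formula used here is algebraically equivalent to averaging products of deviations with an $n - 1$ divisor.

```python
heights = [62, 65, 70, 72, 68]
weights = [120, 135, 150, 160, 155]


def sample_covariance(x, y):
    """Sample covariance via the summation shortcut, with divisor n - 1."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


s12 = sample_covariance(heights, weights)
print(s12)  # 61.75
```

The positive value agrees with the expectation that taller individuals tend to weigh more.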
Variance-Covariance Matrix
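Definition:
Collects the population variances (diagonal) and covariances (off-diagonal) in a single matrix:
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \dots & \sigma_p^2 \end{bmatrix}$$
The sample variance-covariance matrix $\mathbf{S}$ has the same layout, with $s_j^2$ and $s_{jk}$ in place of $\sigma_j^2$ and $\sigma_{jk}$.
Properties:
Both $\Sigma$ and $\mathbf{S}$ are symmetric matrices (e.g., $\sigma_{jk} = \sigma_{kj}$).
Unbiasedness: $E(\mathbf{S}) = \Sigma$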
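As an illustration, a sample variance-covariance matrix can be assembled entry by entry from a data matrix stored as a list of rows. This is a sketch rather than an efficient implementation: the function name `covariance_matrix` is hypothetical, and the data are the five height and weight observations from the covariance example.

```python
def covariance_matrix(X):
    """Sample variance-covariance matrix of an n x p data matrix (list of rows)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            # Entry (j, k): averaged product of deviations, with divisor n - 1.
            S[j][k] = sum(
                (row[j] - means[j]) * (row[k] - means[k]) for row in X
            ) / (n - 1)
    return S


# Rows are individuals; columns are height and weight.
X = [[62, 120], [65, 135], [70, 150], [72, 160], [68, 155]]
S = covariance_matrix(X)
for row in S:
    print([round(value, 2) for value in row])
```

The diagonal holds the two sample variances, and the two off-diagonal entries are the same sample covariance, which is the symmetry property noted above.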
Correlation
Definition:
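Standardizes covariance to yield a unit-free measure of the strength of the linear relationship.
Formula: $\rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}$
Estimate by: $r_{jk} = \frac{s_{jk}}{s_j s_k}$
Sample Correlation Matrix (R):
$$\mathbf{R} = \begin{bmatrix} 1 & r_{12} & \dots & r_{1p} \\ r_{21} & 1 & \dots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \dots & 1 \end{bmatrix}$$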
Example: Calculating Correlation
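Given standard deviations (from the height and weight data): $s_1 = \sqrt{15.8} \approx 3.97$, $s_2 = \sqrt{267.5} \approx 16.36$, with sample covariance $s_{12} = 61.75$.
Correlation Calculation: $r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{61.75}{\sqrt{15.8 \times 267.5}} \approx 0.95$
Interpretation: A value this close to 1 indicates a strong positive linear relationship between height and weight.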
Correlation Matrix Example:
$$\mathbf{R} = \begin{bmatrix} 1 & 0.95 \\ 0.95 & 1 \end{bmatrix}$$
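The correlation of about 0.95 between height and weight can be recomputed directly from the five observations. The sketch below divides the sample covariance by the product of the two sample standard deviations; the function name `sample_correlation` is hypothetical.

```python
import math

heights = [62, 65, 70, 72, 68]
weights = [120, 135, 150, 160, 155]


def sample_correlation(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


r = sample_correlation(heights, weights)
print(round(r, 2))  # 0.95
```

Because the $n - 1$ divisors cancel in the ratio, the result does not depend on whether $n$ or $n - 1$ is used, as long as the same divisor is used throughout.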
Introduction to Additional Measures of Dispersion
Definition: Provides insights into data spread or variability.
Previous focus: Individual variances of variables.
Key Measures:
Total Variation
Generalized Variance
Total Variation
Definition: Measures overall variability of a set of variables.
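Defined as the trace of the variance-covariance matrix: $\operatorname{trace}(\Sigma) = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_p^2$
Estimation: $\operatorname{trace}(\mathbf{S}) = s_1^2 + s_2^2 + \dots + s_p^2$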
Example of Total Variation
Applied to the USDA Women's Health Survey nutrient intake data, using R.
Weakness of Total Variation
Example: simulated pairs of variables with different correlation levels.
Correlations:
Unrelated (no correlation)
Moderate correlation
High correlation
Observation: Total variation equals 2 regardless of correlation, so it cannot distinguish these cases.
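The observation can be illustrated with a small sketch. It assumes two variables that each have variance 1, so the variance-covariance matrix has ones on the diagonal and the correlation off the diagonal. The correlation values 0.0, 0.5, and 0.9 are hypothetical stand-ins for the unrelated, moderate, and high cases; the values used in the original simulation are not given here.

```python
def total_variation(S):
    """Total variation: the trace (sum of diagonal entries) of a square matrix."""
    return sum(S[j][j] for j in range(len(S)))


# Hypothetical correlation levels: unrelated, moderate, high.
for rho in (0.0, 0.5, 0.9):
    # Unit variances on the diagonal (an assumption), correlation off the diagonal.
    Sigma = [[1.0, rho], [rho, 1.0]]
    print(rho, total_variation(Sigma))  # the second value is 2.0 every time
```

The trace only reads the diagonal, so no off-diagonal information can affect it.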
Determinant of a Matrix
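Definition: A scalar computed from a square matrix; it is used here to build a measure of dispersion.
For a 2 × 2 matrix:
$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
Determinant: $|\mathbf{A}| = ad - bc$
For general p × p matrix: $|\mathbf{A}| = \sum_{j=1}^{p} (-1)^{1+j} a_{1j} |\mathbf{A}_{1j}|$, where $\mathbf{A}_{1j}$ is the matrix obtained by deleting row 1 and column $j$ of $\mathbf{A}$ (cofactor expansion along the first row).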
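One standard way to compute the determinant of a general p × p matrix is cofactor expansion along the first row, which reduces to $ad - bc$ in the 2 × 2 case. The recursive sketch below is for illustration only: its cost grows factorially with p, so numerical libraries use other methods in practice. The function name `det` and the example matrices are made up for this illustration.

```python
def det(A):
    """Determinant of a square matrix (list of rows) by cofactor expansion."""
    p = len(A)
    if p == 1:
        return A[0][0]
    if p == 2:
        # Base case: ad - bc.
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for j in range(p):
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total


print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```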
Generalized Variance
Definition: An overall dispersion measure in multivariate data.
Defined as the determinant of the variance-covariance matrix: $|\Sigma|$
Sample generalized variance: $|\mathbf{S}| = \det(\mathbf{S})$
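Unlike the trace, the determinant responds to correlation. For two variables that each have variance 1 and correlation $\rho$, the generalized variance is $1 - \rho^2$, so it shrinks as the variables become more strongly related. The sketch below assumes unit variances and uses hypothetical correlation values; the helper name `generalized_variance_2x2` is also hypothetical.

```python
def generalized_variance_2x2(S):
    """Determinant of a 2 x 2 variance-covariance matrix."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


# Hypothetical correlation levels: unrelated, moderate, high.
for rho in (0.0, 0.5, 0.9):
    Sigma = [[1.0, rho], [rho, 1.0]]
    print(rho, round(generalized_variance_2x2(Sigma), 2))
# 0.0 1.0
# 0.5 0.75
# 0.9 0.19
```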
Example of Generalized Variance
Find the generalized sample variance for the Women's Health Survey data.