Notes on Unbiased Estimators, Variability, Z-scores, and Inference

Unbiased Estimators, Variability, and Inference

  • Key population parameters (μ, σ^2, σ)

    • μ (mu): population mean
    • σ^2 (sigma squared): population variance
    • σ (sigma): population standard deviation
    • The goal of statistics is to learn about these population parameters from samples.
  • Unbiased estimators and what they estimate

    • Unbiased estimator: an estimator whose expected value equals the population parameter it estimates.
    • Sample mean as an estimator of the population mean
    • If we take many samples (e.g., all possible samples of size n from a population), the means of those samples vary, but the mean of all those sample means equals the population mean:
    • \mathbb{E}[\bar{X}] = \mu
    • The population mean is what we’re trying to estimate.
    • Sample variance as an estimator of the population variance
    • The usual unbiased estimator for the population variance uses the divisor (n − 1):
    • s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
    • \mathbb{E}[s^2] = \sigma^2
    • Standard deviation as an estimator
    • The standard deviation is the square root of the variance estimator: s = \sqrt{s^2}
    • In the lecture, the (sample) standard deviation is described as an unbiased estimator of the population standard deviation. In standard theory this is not quite right: even though s^2 is unbiased for σ^2, E[s] ≠ σ in general, because the square root is concave (by Jensen’s inequality, E[s] ≤ σ). The important take-away for inference is that s^2 is unbiased for σ^2, and s is the natural measure of dispersion in the same units as the data.
    • Mean Absolute Deviation (MAD)
    • \text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |X_i - \bar{X}|
    • The MAD is not an unbiased estimator of the population variance or standard deviation; it is often used as a descriptive measure of dispersion, not for inferential purposes.
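The unbiasedness claim for s^2 can be checked by simulation. This is a minimal sketch (the population of integers 1–10 is an illustrative assumption, not from the lecture) showing that dividing by (n − 1) gives an average close to σ^2, while dividing by n systematically underestimates it:

```python
# Simulation sketch: averaging many sample variances shows the (n - 1)
# divisor is unbiased for sigma^2, while the n divisor underestimates it.
# The population here (integers 1..10) is an illustrative assumption.
import random
import statistics

random.seed(0)
population = list(range(1, 11))
sigma2 = statistics.pvariance(population)  # true population variance (8.25)

n, trials = 5, 20000
s2_unbiased = s2_biased = 0.0
for _ in range(trials):
    sample = [random.choice(population) for _ in range(n)]  # sample with replacement
    xbar = statistics.mean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)               # sum of squared deviations
    s2_unbiased += ss / (n - 1)
    s2_biased += ss / n
s2_unbiased /= trials
s2_biased /= trials

print(sigma2, round(s2_unbiased, 2), round(s2_biased, 2))
```

The biased average lands near σ^2 · (n − 1)/n, which is exactly the factor the (n − 1) divisor corrects for.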
  • Why we use samples and the role of the sampling distribution

    • Sampling enables inference about the population when full data are unavailable or impractical.
    • The intuitive idea: the mean of the sample means equals the population mean; the spread of sample means relates to population variance (conceptually leading to the idea of inferential statistics).
    • Inference relies on distributional properties of estimators (e.g., sample mean’s distribution) to quantify uncertainty about population parameters.
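The two intuitions above — the mean of the sample means equals μ, and larger samples give less variable means — can be illustrated with a short simulation (the population of integers 1–100 is an illustrative assumption):

```python
# Simulation sketch: the average of many sample means sits at the population
# mean, and the spread of sample means shrinks as n grows.
# The population (integers 1..100) is an illustrative assumption.
import random
import statistics

random.seed(1)
population = list(range(1, 101))
mu = statistics.mean(population)  # 50.5

def sample_means(n, trials=5000):
    """Draw `trials` samples of size n (with replacement) and return their means."""
    return [statistics.mean(random.choices(population, k=n)) for _ in range(trials)]

means_small = sample_means(5)
means_large = sample_means(50)

print(mu, round(statistics.mean(means_large), 2))                       # both near 50.5
print(round(statistics.stdev(means_small), 2),
      round(statistics.stdev(means_large), 2))                          # second is smaller
```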
  • Variability and dispersion measures

    • Variance
    • Population variance: \operatorname{Var}(X) = \sigma^2 = \mathbb{E}[(X-\mu)^2]
    • In a sample, the variance is estimated by s^2 as above.
    • Standard deviation
    • Population standard deviation: \sigma = \sqrt{\operatorname{Var}(X)}
    • Sample standard deviation: s = \sqrt{s^2}
    • Relationship to data description
    • The variance/standard deviation quantify how much data vary around the center.
    • MAD is another dispersion measure but not used for inferential estimation of population parameters.
    • Example context from the lecture
    • The data set described in the lecture produced reported values such as MAD ≈ 25.5 and a standard deviation of about 25.7 (illustrative values from the session).
    • A data point of 74 with a mean around 50 gives a deviation of 24, which is roughly one standard deviation (24/25.7 ≈ 0.93) when the SD is about 25–26.
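These dispersion measures are straightforward to compute. A minimal sketch with made-up numbers (the lecture's raw data set is not reproduced here, so the values below are illustrative only):

```python
# Sketch: sample variance (divisor n - 1), sample SD, and MAD for one data set.
# The data values are illustrative, not the lecture's actual data.
import statistics

data = [74, 12, 38, 65, 50, 21, 90]
xbar = statistics.mean(data)                          # 50.0

s2 = statistics.variance(data)                        # sum of squared deviations / (n - 1)
s = statistics.stdev(data)                            # sqrt of s2
mad = sum(abs(x - xbar) for x in data) / len(data)    # mean absolute deviation

print(round(xbar, 2), round(s2, 2), round(s, 2), round(mad, 2))
```

Note that the sample SD and the MAD are both sensible dispersion summaries but generally differ in value, as the lecture's ≈25.7 vs ≈25.5 example shows.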
  • Z-scores and relative location

    • Z-score definition (relative location in the data set)
    • Population form:
    • z_i = \frac{X_i - \mu}{\sigma}
    • Sample form (relative to the sample):
    • z_i = \frac{X_i - \bar{X}}{s}
    • Interpretations
    • A z-score tells how many standard deviations an observation is away from the mean.
    • Sign indicates direction (positive above the mean, negative below).
    • Example from the lecture
    • With mean ≈ 50 and SD ≈ 25.7, the value 74 yields z = \frac{74 - 50}{25.7} \approx 0.93.
    • A value below the mean yields a negative z-score; e.g., a value 32 units below the mean gives z ≈ −32/25.7 ≈ −1.25 with these figures.
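The z-score computation is a one-liner; a minimal sketch reproducing the lecture's 74-with-mean-50 example (25.7 is the lecture's rounded SD):

```python
# Sketch: z-score as (observation - center) / spread, using the lecture's
# rounded figures (mean 50, SD 25.7).
def z_score(x, center, spread):
    """How many standard deviations x lies from the center; sign gives direction."""
    return (x - center) / spread

z = z_score(74, 50, 25.7)
print(round(z, 2))            # about 0.93: roughly one SD above the mean
print(z_score(32, 50, 25.7))  # negative: below the mean
```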
  • Coefficient of variation (CV)

    • Definition (for a sample):
    • \text{CV} = \frac{s}{\bar{X}}
    • Purpose
    • A dimensionless measure that compares the extent of variability relative to the mean; useful for comparing variation across data sets with different units or means.
    • Example interpretation
    • A larger CV indicates more relative variability for a given mean.
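Because the CV is dimensionless, it lets you compare data sets measured on entirely different scales. A minimal sketch (both data sets are illustrative assumptions):

```python
# Sketch: the coefficient of variation compares relative spread across
# data sets with different units and means. Both data sets are illustrative.
import statistics

def cv(data):
    """Coefficient of variation: sample SD divided by sample mean."""
    return statistics.stdev(data) / statistics.mean(data)

heights_cm = [160, 170, 165, 175, 180]
incomes_k = [30, 80, 45, 120, 60]

print(round(cv(heights_cm), 3), round(cv(incomes_k), 3))
```

Here the incomes have a much larger CV than the heights: relative to their mean, they are far more variable, even though the raw units aren't comparable.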
  • Chebyshev’s inequality (a general bound)

    • Statement
    • For any distribution with mean μ and standard deviation σ, for any z > 0:
    • \Pr(|X - \mu| \le z\sigma) \ge 1 - \frac{1}{z^2}
    • Special case with z = 2
    • \Pr(|X - \mu| \le 2\sigma) \ge 1 - \frac{1}{4} = \frac{3}{4} = 0.75
    • Interpretation from the lecture
    • No matter how the data look, at least 75% of observations lie within two standard deviations of the mean.
    • Note
    • This bound applies to all distributions, but it is often loose; actual data from normal distributions adhere to tighter empirical rules (68-95-99.7).
  • Empirical rule vs. normal distribution (the bell curve)

    • Normal (bell-shaped) distribution and the empirical rule (68-95-99.7)
    • About 68% of data within ±1σ
    • About 95% within ±2σ
    • About 99.7% within ±3σ
    • The lecture’s statement
    • The lecture asserted that “98% of the data are within two standard deviations” for a normal distribution; this is a common misstatement. The correct figure for a normal distribution is about 95% within two standard deviations.
    • Practical takeaway
    • If a data set is approximately normal, most data lie within a few standard deviations of the mean; outside of that, data are increasingly rare.
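For approximately normal data, the 68-95-99.7 proportions can be confirmed directly by sampling. A minimal sketch (the mean 50 and SD 25.7 reuse the lecture's rounded figures; the simulated sample itself is an assumption):

```python
# Sketch: for (approximately) normal data, the empirical 68-95-99.7 rule
# holds, not just Chebyshev's looser >= 75% bound at 2 SDs.
# Mean 50 and SD 25.7 echo the lecture's figures; the sample is simulated.
import random
import statistics

random.seed(2)
data = [random.gauss(50, 25.7) for _ in range(100_000)]
mu, sigma = statistics.mean(data), statistics.pstdev(data)

def frac_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)

print(round(frac_within(1), 2), round(frac_within(2), 2), round(frac_within(3), 3))
```

Note how the 2-SD coverage (~0.95) comfortably beats Chebyshev's universal floor of 0.75, which is what "the bound is often loose" means in practice.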
  • Practical computation notes (Excel and calculators)

    • Variance and standard deviation in Excel
    • Population variance: =VAR.P(data)
    • Sample variance: =VAR.S(data)
    • Population standard deviation: =STDEV.P(data)
    • Sample standard deviation: =STDEV.S(data)
    • How to use (described in the lecture)
    • For a small set of numbers (e.g., four numbers), you can type =VAR.S(A1:A4) or =STDEV.S(A1:A4) to get the sample variance or standard deviation.
    • For a larger data block (e.g., 200 observations), you can select the full range (e.g., A1:A200) and use STDEV.S to get the sample SD directly.
    • Manual calculation notes (if not using Excel)
    • Compute the mean: \bar{X}
    • Compute deviations from the mean; square them for variance, or take absolute values for MAD
    • For variance, divide the sum of squared deviations by (n − 1) for a sample: s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}
    • For standard deviation, take the square root: s = \sqrt{s^2}
    • The process is computationally heavy by hand for large samples; software or calculators greatly speed it up.
    • A note on practice
    • The lecture emphasizes using software (Excel or calculators) to avoid tedious hand calculations, especially for large data sets.
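If a spreadsheet isn't handy, Python's standard library mirrors Excel's population/sample split. A minimal sketch (the four numbers are illustrative, echoing the lecture's small-set example):

```python
# Sketch: Python's statistics module mirrors Excel's population vs. sample
# functions (VAR.P/VAR.S, STDEV.P/STDEV.S). The data values are illustrative.
import statistics

data = [4, 8, 6, 2]

pop_var = statistics.pvariance(data)   # like =VAR.P(A1:A4), divisor n
samp_var = statistics.variance(data)   # like =VAR.S(A1:A4), divisor n - 1
pop_sd = statistics.pstdev(data)       # like =STDEV.P(A1:A4)
samp_sd = statistics.stdev(data)       # like =STDEV.S(A1:A4)

print(pop_var, round(samp_var, 3), round(pop_sd, 3), round(samp_sd, 3))
```

As expected, the sample versions (divisor n − 1) come out slightly larger than the population versions.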
  • Connections to broader concepts (course context)

    • Chapter 3 focus: Measures of location (mean, median, mode) and measures of variability (variance, standard deviation, coefficient of variation, MAD).
    • Why we care about estimators in inferential statistics (Chapter 7): Use sample statistics to make inferences about population parameters with quantified uncertainty.
    • The relationship between sampling error and confidence in population conclusions: smaller variance of the sampling distribution (e.g., by increasing n) leads to more precise estimates.
  • Summary of key ideas to study

    • Population vs. sample parameters: μ, σ^2, σ vs. X̄, s^2, s
    • Unbiased estimators: E[X̄] = μ and E[s^2] = σ^2 (MAD is not an unbiased estimator for these parameters)
    • Variance and standard deviation: how dispersion around the mean is quantified; SD is the natural scale of dispersion
    • Z-scores: standardize observations to compare locations on a common scale
    • Coefficient of variation: relative dispersion measure independent of unit scale
    • Chebyshev’s inequality: a universal bound on how data must spread around the mean, applicable to any distribution
    • Empirical rule (normal distribution): what to expect for data that are approximately normally distributed, and awareness of common misstatements
    • Practical computation: use spreadsheet tools to compute variance, SD, and related measures; know basic manual steps if software isn’t available
  • Quick reference formulas (LaTeX)

    • Population mean: \mu
    • Population variance: \sigma^2 = \operatorname{Var}(X) = \mathbb{E}[(X-\mu)^2]
    • Population standard deviation: \sigma = \sqrt{\sigma^2}
    • Sample mean: \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
    • Sample variance (unbiased estimator): s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
    • Sample standard deviation: s = \sqrt{s^2}
    • Expected value of sample mean: \mathbb{E}[\bar{X}] = \mu
    • Expected value of sample variance: \mathbb{E}[s^2] = \sigma^2
    • Z-score (population): z_i = \frac{X_i - \mu}{\sigma}
    • Z-score (sample): z_i = \frac{X_i - \bar{X}}{s}
    • Coefficient of variation (sample): \text{CV} = \frac{s}{\bar{X}}
    • Chebyshev’s inequality: \Pr(|X - \mu| \le z\sigma) \ge 1 - \frac{1}{z^2}
    • Normal empirical rules: within ±1σ ≈ 68%, within ±2σ ≈ 95%, within ±3σ ≈ 99.7%