Finding Point Estimators

Overview of Estimation Methods

  • Two main methods for finding point estimators:
    • Method of Moments (MoM)
    • Maximum Likelihood Estimation (MLE)
  • These methods are generalizable to any parametric model.
  • Focus on random samples from one- or two-parameter distributions with closed-form solutions.

Method of Moments (MoM)

  • Equate sample moments with theoretical moments.
  • Involves solving a system of equations where the number of equations equals the number of parameters.
  • Theoretical (population) moments are denoted \mu_k and sample moments m_k.
  • \mu_k = E[X^k] = \begin{cases} \sum_{x} x^k f(x) & \text{discrete} \\ \int_{-\infty}^{\infty} x^k f(x) \, dx & \text{continuous} \end{cases}
  • m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k
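
As a quick sketch (in Python, with made-up data), a sample moment is just an average of powers:

```python
def sample_moment(xs, k):
    """k-th sample moment about the origin: m_k = (1/n) * sum(x_i^k)."""
    return sum(x ** k for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations
m1 = sample_moment(data, 1)  # the sample mean
m2 = sample_moment(data, 2)  # the mean of the squares
```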

Calculation Methods

  • Informally

    1. Equate the first sample moment about the origin to the first theoretical moment.
    2. Equate the second sample moment about the origin to the second theoretical moment.
    3. Repeat for all parameters.
    4. Solve the system of equations for the parameters.
  • Formally

    • Given X_1, X_2, …, X_n \sim f(x; \theta), where \theta = (\theta_1, …, \theta_t) has dimension t, the t moment estimators \hat{\theta}_1, …, \hat{\theta}_t of \theta are the unique solutions of the system obtained by setting each sample moment equal to the corresponding theoretical moment:
      • m_1 = \mu_1
      • m_2 = \mu_2
      • \vdots
      • m_t = \mu_t
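
The recipe can be sketched in code for the two-parameter normal case worked out below: solve m_1 = \mu and m_2 = \mu^2 + \sigma^2 for (\mu, \sigma^2). This is a minimal illustration, not a general solver.

```python
def mom_normal(xs):
    """Method of moments for Normal(mu, sigma^2):
    solve m1 = mu and m2 = mu^2 + sigma^2, giving
    mu_hat = m1 and sigma2_hat = m2 - m1^2."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, m2 - m1 ** 2  # (mu_hat, sigma2_hat)
```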

Examples of Method of Moments Estimators

  • Poisson:

    • f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}
    • E[X] = \lambda
    • \hat{\lambda}_{MM} = \bar{X}
  • Bernoulli:

    • f(x; p) = p^x (1-p)^{1-x}
    • E[X] = p
    • \hat{p}_{MM} = \bar{X}
  • Uniform (0, θ):

    • f(x; \theta) = \frac{1}{\theta - 0}, \quad 0 < x < \theta
    • E[X] = \frac{1}{2}(0 + \theta) = \frac{\theta}{2}
    • \hat{\theta}_{MM} = 2\bar{X}
  • Normal (\mu, \sigma^2):

    • E[X] = \mu
    • E[X^2] = \mu^2 + \sigma^2
    • \hat{\mu}_{MM} = \bar{X}
    • \hat{\sigma}^2_{MM} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2
  • Binomial:

    • f(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}
    • E[X] = np
    • E[X^2] = n(n-1)p^2 + np
    • \hat{n}_{MM} = \frac{\bar{X}^2}{\bar{X} - \left(\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2\right)}
    • \hat{p}_{MM} = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2}{\bar{X}}
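
The binomial formulas are easy to get wrong by hand, so here is a small numerical sketch (with hypothetical count data). Writing s^2 for the biased sample variance m_2 - \bar{X}^2, the two estimators reduce to \hat{p} = 1 - s^2/\bar{X} and \hat{n} = \bar{X}/\hat{p}:

```python
def mom_binomial(xs):
    """Method-of-moments estimators for Binomial(n, p), both parameters unknown.
    Uses xbar and the biased sample variance s2 = m2 - xbar^2:
        p_hat = 1 - s2 / xbar
        n_hat = xbar^2 / (xbar - s2)  (equivalently xbar / p_hat)"""
    N = len(xs)
    xbar = sum(xs) / N
    s2 = sum(x * x for x in xs) / N - xbar ** 2
    n_hat = xbar ** 2 / (xbar - s2)
    p_hat = 1 - s2 / xbar
    return n_hat, p_hat
```

Note that \hat{n} need not be an integer; in practice it is often rounded.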

Consistency of Moments

  • If Y_1, …, Y_n are i.i.d. and E[|Y|^k] < \infty, then m_k \rightarrow \mu_k in probability (the law of large numbers applied to Y_i^k).
  • MoM estimators are often consistent.
    • Consistent Estimators Examples
      • Uniform(0, θ): \hat{\theta} = 2\bar{Y} \rightarrow \theta
      • Poisson(λ): \hat{\lambda} = \bar{Y} \rightarrow \lambda
      • Normal(\mu, \sigma^2): \hat{\mu} = \bar{Y} \rightarrow \mu and \hat{\sigma}^2 = \frac{1}{n} \sum (Y_i - \bar{Y})^2 \rightarrow \sigma^2
      • Bernoulli(p): \hat{p} = \bar{Y} \rightarrow p
  • If a moment estimator \hat{\theta} is a continuous function g(m_1, …, m_t) and g(\mu_1, …, \mu_t) = \theta, then \hat{\theta} is a consistent estimator.
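
Consistency is easy to see empirically. A small simulation (assumed setup: Uniform(0, 3) data and the estimator \hat{\theta} = 2\bar{Y} from above) shows the estimate settling near the true value as n grows:

```python
import random

random.seed(0)
theta = 3.0  # true parameter of Uniform(0, theta)
for n in (100, 10_000, 100_000):
    ys = [random.uniform(0, theta) for _ in range(n)]
    theta_hat = 2 * sum(ys) / n  # MoM estimator 2 * ybar
    print(n, theta_hat)  # drifts toward theta as n grows
```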

Maximum Likelihood Estimation (MLE)

  • Find the maximum of the likelihood function to get the most likely parameter values.
  • The value(s) of \theta that maximize \mathcal{L}(\theta; y) = f(y; \theta).
  • The MLE of θ is \hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} \mathcal{L}(\theta; y)

Optimization Refresher

  • For a twice-differentiable function g(x), any optimal point x^* satisfies:
    • \frac{d}{dx} g(x) |_{x=x^*} = 0
    • x^* is a local maximum if: \frac{d^2}{dx^2} g(x) |_{x=x^*} < 0
    • x^* is a local minimum if: \frac{d^2}{dx^2} g(x) |_{x=x^*} > 0
  • If g(x) > 0 for all x on an interval (a, b), then g(x) achieves a local minimum (or maximum) at x^* if and only if \log(g(x)) achieves a local minimum (or maximum) at x^*.
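
The last point is why MLE is almost always carried out on the log scale. A quick numerical check, using an arbitrary positive function g(x) = x e^{-x} (chosen for illustration; its maximum is at x = 1), confirms that g and \log g peak at the same point:

```python
import math

# g(x) = x * exp(-x) is positive on (0, inf); log g(x) = log(x) - x.
# Both should attain their maximum at the same grid point.
xs = [i / 1000 for i in range(1, 5000)]
argmax_g = max(xs, key=lambda x: x * math.exp(-x))
argmax_log_g = max(xs, key=lambda x: math.log(x) - x)
```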

Remarks on Maximum Likelihood

  • In the context of i.i.d. univariate models, Y_1, …, Y_n \sim f(y; \theta) i.i.d., the likelihood factors as \mathcal{L}(\theta; y_1, …, y_n) = \prod_{i=1}^{n} f(y_i; \theta), and it is usually easier to work with the log-likelihood \ell(\theta) = \log \mathcal{L}(\theta).

Examples of Finding the MLE

  • Bernoulli:

    • f(Y_i; p) = p^{Y_i} (1-p)^{1-Y_i}
    • \mathcal{L}(p; Y_1, …, Y_n) = \prod_{i=1}^{n} p^{Y_i} (1-p)^{1-Y_i} = p^{\sum_{i=1}^{n} Y_i} (1-p)^{n - \sum_{i=1}^{n} Y_i}
    • \ell(p; Y_1, …, Y_n) = \left(\sum_{i=1}^{n} Y_i\right) \log(p) + \left(n - \sum_{i=1}^{n} Y_i\right) \log(1-p)
    • \frac{\partial}{\partial p} \ell(p; Y_1, …, Y_n) = \frac{\sum_{i=1}^{n} Y_i}{p} - \frac{n - \sum_{i=1}^{n} Y_i}{1-p} = 0
    • \hat{p}_{ML} = \frac{\sum_{i=1}^{n} Y_i}{n} = \bar{Y}
  • Gaussian:

    • f(Y_i; \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(\frac{-(Y_i - \mu)^2}{2 \sigma^2}\right)
    • \mathcal{L}(\mu, \sigma^2; Y_1, …, Y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(\frac{-(Y_i - \mu)^2}{2 \sigma^2}\right) = \sigma^{-n} (2 \pi)^{-\frac{n}{2}} \exp\left(-\frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{2 \sigma^2}\right)
    • \ell(\mu, \sigma^2; Y_1, …, Y_n) = -\frac{n}{2} \log(\sigma^2) - \frac{n}{2} \log(2 \pi) - \frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{2 \sigma^2}
    • \frac{\partial}{\partial \mu} \ell(\mu, \sigma^2; Y_1, …, Y_n) = \frac{\sum_{i=1}^{n} (Y_i - \mu)}{\sigma^2} = 0
    • \hat{\mu}_{ML} = \frac{\sum_{i=1}^{n} Y_i}{n} = \bar{Y}
    • \frac{\partial}{\partial \sigma^2} \ell(\mu, \sigma^2; Y_1, …, Y_n) = -\frac{n}{2 \sigma^2} + \frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{2 (\sigma^2)^2} = 0
    • \hat{\sigma}^2_{ML} = \frac{\sum_{i=1}^{n} (Y_i - \hat{\mu})^2}{n}
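
A sanity check on the Gaussian derivation (with hypothetical data): evaluate the log-likelihood above at the closed-form MLEs and confirm that nearby parameter values do no better.

```python
import math

def gaussian_loglik(mu, sigma2, ys):
    """l(mu, sigma2) = -n/2 log(sigma2) - n/2 log(2 pi) - sum((y - mu)^2) / (2 sigma2)"""
    n = len(ys)
    ss = sum((y - mu) ** 2 for y in ys)
    return -0.5 * n * math.log(sigma2) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma2)

ys = [1.2, 0.7, 2.1, 1.5, 0.9]  # made-up sample
n = len(ys)
mu_hat = sum(ys) / n                                   # ybar
sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n    # mean squared deviation

# the closed-form MLEs should beat nearby parameter values
best = gaussian_loglik(mu_hat, sigma2_hat, ys)
for d in (-0.1, 0.1):
    assert best >= gaussian_loglik(mu_hat + d, sigma2_hat, ys)
    assert best >= gaussian_loglik(mu_hat, sigma2_hat + d, ys)
```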

Properties of MLEs

  • MLEs have many favorable limiting properties:
    • Consistency
    • Efficiency
    • Invariance (or “functional equivariance”)
    • Asymptotic normality (under regularity conditions)

Invariance

  • If \hat{\theta} is the MLE for \theta, and if g(\theta) is any transformation of \theta, then the MLE for \alpha = g(\theta) is \hat{\alpha} = g(\hat{\theta}).
  • Another way to put this: If \hat{\theta} is MLE of \theta, and t is any function with a twice-differentiable inverse on \Theta, then t(\hat{\theta}) is the MLE of t(\theta).
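
For instance, with Bernoulli data the MLE of p is \bar{Y}, so by invariance the MLE of the odds \alpha = p/(1-p) should be \bar{Y}/(1-\bar{Y}). A grid-search sketch (made-up data; a crude maximizer, for illustration only) of the likelihood reparameterized in \alpha agrees:

```python
import math

# Invariance check: for Bernoulli data the MLE of p is ybar, so the MLE of
# the odds alpha = p / (1 - p) should be ybar / (1 - ybar).
ys = [1, 0, 1, 1, 0, 1, 1, 0]  # hypothetical 0/1 data
ybar = sum(ys) / len(ys)
odds_mle_via_invariance = ybar / (1 - ybar)

def loglik_odds(alpha, ys):
    """Bernoulli log-likelihood reparameterized via p = alpha / (1 + alpha)."""
    p = alpha / (1 + alpha)
    s, n = sum(ys), len(ys)
    return s * math.log(p) + (n - s) * math.log(1 - p)

# maximize directly over alpha on a fine grid
grid = [i / 1000 for i in range(1, 10000)]
odds_mle_numeric = max(grid, key=lambda a: loglik_odds(a, ys))
```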

Limiting Distribution of MLE

  • Under several regularity conditions:

    • \sqrt{n}(\hat{\theta} - \theta) \rightarrow N(0, I(\theta)^{-1}), where I(\theta) is the Fisher information for a single observation.
    • I(\theta) = E\left[-\frac{d^2}{d \theta^2} \ell(\theta; Y)\right]
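
A simulation makes this concrete. For Poisson(\lambda) the MLE is \hat{\lambda} = \bar{Y} and I(\lambda) = 1/\lambda, so \sqrt{n}(\hat{\lambda} - \lambda) should have variance near \lambda. The sketch below (assumed setup: \lambda = 2, repeated samples, Knuth's method for Poisson draws) checks this:

```python
import math
import random

random.seed(1)

def poisson_sample(lam):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

lam, n, reps = 2.0, 400, 2000
zs = []
for _ in range(reps):
    ys = [poisson_sample(lam) for _ in range(n)]
    lam_hat = sum(ys) / n                      # MLE of lambda is ybar
    zs.append(math.sqrt(n) * (lam_hat - lam))  # standardized error

mean_z = sum(zs) / reps
var_z = sum(z * z for z in zs) / reps  # should be near 1 / I(lam) = lam
```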

MLE Terms and Conditions

  • Let Y_1, Y_2, … \sim f(y; \theta^*), and assume:

    • (A0) f is "identifiable": if \theta \neq \theta', then f(y; \theta) \neq f(y; \theta') for at least one y in the support set
    • (A1) f "has common support": \lbrace y : f(y; \theta) > 0 \rbrace = \lbrace y : f(y; \theta') > 0 \rbrace for any \theta, \theta'
    • (A2) the parameter space \Theta contains an open set \omega and \theta^* is an interior point of \omega
    • (A3) f is differentiable in \theta on \omega
  • Under (A0) – (A3), the likelihood equation

    • \frac{d}{d \theta} \ell(\theta; \mathcal{Y}) = 0
    • has a root (or solution) \hat{\theta}_n such that \hat{\theta}_n \rightarrow \theta^*
  • Corollary: If \hat{\theta}_n is unique and \Theta is an open interval, then with probability tending to 1, \hat{\theta}_n is the MLE.

Asymptotic Normality of MLE

  • Let Y_1, Y_2, … \sim f(y; \theta^*), and now assume (A0) – (A1) hold along with:

    • (A2') the parameter space \Theta is an open interval;
    • (A3') the Fisher information I(\theta^*) is positive and finite, plus a more technical smoothness condition (roughly, f thrice differentiable in \theta).
  • Then any consistent root of the likelihood equation satisfies:
    • \sqrt{n} (\hat{\theta}_{MLE} - \theta^*) \rightarrow N\left(0, \frac{1}{I(\theta^*)}\right)
    • \hat{\theta}_{MLE} \rightarrow \theta^*

Recap

  • Method of moments derives point estimators by equating sample moments and population moments.
  • Maximum likelihood estimation provides a framework for deriving point estimators by optimizing the likelihood of data.
  • MoM estimates are usually consistent.
  • MLEs are invariant for transformations of parameters and asymptotically normal.