Deriving Point Estimators
Overview of Estimation Methods
- Two main methods for finding point estimators:
- Method of Moments (MoM)
- Maximum Likelihood Estimation (MLE)
- These methods are generalizable to any parametric model.
- Focus on random samples from one- or two-parameter distributions with closed-form solutions.
Method of Moments (MoM)
- Equate sample moments with theoretical moments.
- Involves solving a system of equations where the number of equations equals the number of parameters.
- Theoretical moments (population moments) are denoted \mu_k and sample moments m_k.
- \mu_k = E[X^k] = \begin{cases} \sum_{x} x^k f(x) & \text{discrete} \\ \int_{-\infty}^{\infty} x^k f(x) \, dx & \text{continuous} \end{cases}
- m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k
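The definitions above can be sketched in a few lines of code. This is an added illustration (not from the original notes): it computes the first two sample moments of simulated N(2, 1) data, which should be close to the theoretical moments \mu_1 = 2 and \mu_2 = \mu^2 + \sigma^2 = 5.

```python
import random

def sample_moment(xs, k):
    """k-th sample moment about the origin: m_k = (1/n) * sum(X_i^k)."""
    return sum(x ** k for x in xs) / len(xs)

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(100_000)]

m1 = sample_moment(data, 1)  # estimates E[X] = mu = 2
m2 = sample_moment(data, 2)  # estimates E[X^2] = mu^2 + sigma^2 = 5
```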
Calculation Methods
Informally
- Equate the first sample moment about the origin to the first theoretical moment.
- Equate the second sample moment about the origin to the second theoretical moment.
- Repeat for all parameters.
- Solve the system of equations for the parameters.
Formally
- Given X_1, X_2, \ldots, X_n \sim f(x; \theta), where \theta = (\theta_1, \ldots, \theta_t) has dimension t, the t moment estimators of \theta are the unique solutions \hat{\theta}_1, \ldots, \hat{\theta}_t satisfying the system of equations:
- m_1 = \mu_1
- m_2 = \mu_2
- \vdots
- m_t = \mu_t
- where each theoretical moment \mu_k = E[X^k] is a function of \theta.
Examples of Method of Moments Estimator
Poisson:
- f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}
- E[X] = \lambda
- \hat{\lambda}_{MM} = \bar{X}
Bernoulli:
- f(x; p) = p^x (1-p)^{1-x}
- E[X] = p
- \hat{p}_{MM} = \bar{X}
Uniform (0, θ):
- f(x; \theta) = \frac{1}{\theta}, \quad 0 < x < \theta
- E[X] = \frac{1}{2}(0 + \theta)
- \hat{\theta}_{MM} = 2\bar{X}
Normal (\mu, \sigma^2):
- E[X] = \mu
- E[X^2] = \mu^2 + \sigma^2
- \hat{\mu}_{MM} = \bar{X}
- \hat{\sigma}^2_{MM} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2
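A quick numerical check of the Normal case (an added sketch, with arbitrary true values \mu = 3, \sigma = 2): compute the first two sample moments and solve the two moment equations.

```python
import random

random.seed(1)
mu_true, sigma_true = 3.0, 2.0
xs = [random.gauss(mu_true, sigma_true) for _ in range(50_000)]

n = len(xs)
m1 = sum(xs) / n                 # first sample moment
m2 = sum(x * x for x in xs) / n  # second sample moment

mu_hat = m1                      # solves m1 = mu
sigma2_hat = m2 - m1 ** 2        # solves m2 = mu^2 + sigma^2
```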
Binomial:
- f(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}
- E[X] = np
- E[X^2] = n(n-1)p^2 + np
- \hat{n}_{MM} = \frac{\bar{X}^2}{\bar{X} - (\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2)}
- \hat{p}_{MM} = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2}{\bar{X}}
- (Note the notation clash: the n in \binom{n}{x} is the binomial trial count being estimated, while the n in the sums is the sample size.)
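The Binomial case is the first one requiring both moment equations. Below is an added sketch (trial count written as N to avoid the clash with the sample size n; true values N = 20, p = 0.3 are arbitrary):

```python
import random

random.seed(2)
N_true, p_true = 20, 0.3
# each observation is the number of successes in N_true Bernoulli trials
xs = [sum(random.random() < p_true for _ in range(N_true)) for _ in range(100_000)]

n = len(xs)
xbar = sum(xs) / n
v = sum(x * x for x in xs) / n - xbar ** 2  # (1/n) sum X_i^2 - xbar^2

N_hat = xbar ** 2 / (xbar - v)  # moment estimator of the trial count
p_hat = 1 - v / xbar            # moment estimator of the success probability
```

The trial-count estimator is notoriously unstable when \bar{X} and v are close, since the denominator \bar{X} - v can be near zero.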
Consistency of Moments
- If Y_1, \ldots, Y_n are i.i.d. and E[|Y|^k] < \infty, then m_k \rightarrow \mu_k in probability (by the law of large numbers).
- MoM estimators are often consistent.
- Consistent Estimators Examples
- Uniform(0, θ): \hat{\theta} = 2\bar{Y} \rightarrow \theta
- Poisson(λ): \hat{\lambda} = \bar{Y} \rightarrow \lambda
- Normal(\mu, \sigma^2): \hat{\mu} = \bar{Y} \rightarrow \mu and \hat{\sigma}^2 = \frac{1}{n} \sum (Y_i - \bar{Y})^2 \rightarrow \sigma^2
- Bernoulli(p): \hat{p} = \bar{Y} \rightarrow p
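Consistency can be seen empirically. The added sketch below uses the Uniform(0, \theta) example with \theta = 5: the estimation error of \hat{\theta} = 2\bar{Y} shrinks as n grows.

```python
import random

random.seed(3)
theta = 5.0

# error of theta_hat = 2 * sample mean at increasing sample sizes
errors = []
for n in [100, 10_000, 1_000_000]:
    ys = [random.uniform(0, theta) for _ in range(n)]
    errors.append(abs(2 * sum(ys) / n - theta))
```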
- Consistency via continuous mappings
- If a moment estimator \hat{\theta} is a continuous function g(m_1, \ldots, m_t) and g(\mu_1, \ldots, \mu_t) = \theta, then \hat{\theta} is a consistent estimator.
Maximum Likelihood Estimation (MLE)
- Find the maximum of the likelihood function to get the most likely parameter values.
- The value(s) of \theta that maximize \mathcal{L}(\theta; y) = f(y; \theta).
- The MLE of θ is \hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} \mathcal{L}(\theta; y)
Optimization Refresher
- For a twice-differentiable function g(x), any optimal point x^* satisfies:
- \frac{d}{dx} g(x) |_{x=x^*} = 0
- x^* is a local maximum if: \frac{d^2}{dx^2} g(x) |_{x=x^*} < 0
- x^* is a local minimum if: \frac{d^2}{dx^2} g(x) |_{x=x^*} > 0
- If g(x) > 0 for all x on an interval (a, b), then g(x) achieves a local minimum (or maximum) at x^* if and only if \log(g(x)) achieves a local minimum (or maximum) at x^*.
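As a quick illustration of the log trick (an added example, not from the original notes), maximize g(x) = x e^{-x} on (0, \infty):

```latex
\log g(x) = \log x - x, \qquad
\frac{d}{dx} \log g(x) = \frac{1}{x} - 1 = 0 \implies x^* = 1, \qquad
\frac{d^2}{dx^2} \log g(x) = -\frac{1}{x^2} < 0
```

so x^* = 1 is a local maximum of \log g, hence of g. Differentiating g directly gives g'(x) = e^{-x}(1 - x) = 0, the same x^* = 1, but the logarithm turns the product rule into a simpler sum.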
Remarks on Maximum Likelihood
- In the context of i.i.d. univariate models Y_1, \ldots, Y_n \sim f(y; \theta) i.i.d., the likelihood factors as \mathcal{L}(\theta; y_1, \ldots, y_n) = \prod_{i=1}^{n} f(y_i; \theta), and it is usually easier to maximize the log-likelihood \ell(\theta) = \log \mathcal{L}(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta).
Examples of Finding the MLE
Bernoulli:
- f(Y_i; p) = p^{Y_i} (1-p)^{1-Y_i}
- \mathcal{L}(p; Y_1, \ldots, Y_n) = \prod_{i=1}^{n} p^{Y_i} (1-p)^{1-Y_i} = p^{\sum_{i=1}^{n} Y_i} (1-p)^{n - \sum_{i=1}^{n} Y_i}
- \ell(p; Y_1, \ldots, Y_n) = \sum_{i=1}^{n} Y_i \log(p) + (n - \sum_{i=1}^{n} Y_i) \log(1-p)
- \frac{\partial}{\partial p} \ell(p; Y_1, \ldots, Y_n) = \frac{\sum_{i=1}^{n} Y_i}{p} - \frac{n - \sum_{i=1}^{n} Y_i}{1-p} = 0
- \hat{p}_{ML} = \frac{\sum_{i=1}^{n} Y_i}{n} = \bar{Y}
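The closed-form answer can be verified numerically (an added sketch with arbitrary true value p = 0.6): grid-maximizing the log-likelihood should land next to \bar{Y}.

```python
import math
import random

random.seed(4)
ys = [1 if random.random() < 0.6 else 0 for _ in range(10_000)]
s, n = sum(ys), len(ys)

def loglik(p):
    """Bernoulli log-likelihood: s*log(p) + (n - s)*log(1 - p)."""
    return s * math.log(p) + (n - s) * math.log(1 - p)

# grid-search the log-likelihood; its maximizer should match the closed form
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=loglik)
p_closed = s / n  # the derived MLE, i.e. the sample mean
```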
Gaussian:
- f(Y_i; \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(\frac{-(Y_i - \mu)^2}{2 \sigma^2}\right)
- \mathcal{L}(\mu, \sigma^2; Y_1, \ldots, Y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(\frac{-(Y_i - \mu)^2}{2 \sigma^2}\right) = \sigma^{-n} (2 \pi)^{-\frac{n}{2}} \exp\left(-\frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{2 \sigma^2}\right)
- \ell(\mu, \sigma^2; Y_1, \ldots, Y_n) = -\frac{n}{2} \log(\sigma^2) - \frac{n}{2} \log(2 \pi) - \frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{2 \sigma^2}
- \frac{\partial}{\partial \mu} \ell(\mu, \sigma^2; Y_1, \ldots, Y_n) = \frac{\sum_{i=1}^{n} (Y_i - \mu)}{\sigma^2} = 0
- \hat{\mu}_{ML} = \frac{\sum_{i=1}^{n} Y_i}{n} = \bar{Y}
- \frac{\partial}{\partial \sigma^2} \ell(\mu, \sigma^2; Y_1, \ldots, Y_n) = -\frac{n}{2 \sigma^2} + \frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{2 (\sigma^2)^2} = 0
- \hat{\sigma}^2_{ML} = \frac{\sum_{i=1}^{n} (Y_i - \hat{\mu})^2}{n}
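In code, the two Gaussian MLEs are one line each (an added sketch with arbitrary true values \mu = 1.5, \sigma = 0.8). Note the divisor n rather than n - 1, so \hat{\sigma}^2_{ML} is slightly biased in finite samples:

```python
import random

random.seed(5)
ys = [random.gauss(1.5, 0.8) for _ in range(50_000)]
n = len(ys)

mu_hat = sum(ys) / n                                 # MLE of mu: the sample mean
sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n  # MLE of sigma^2 (divisor n, not n - 1)
```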
Properties of MLEs
- MLEs have many favorable limiting properties:
- Consistency
- Efficiency
- Invariance (or “functional equivariance”)
- Asymptotic normality (under regularity conditions)
Invariance
- If \hat{\theta} is the MLE for \theta, and if g(\theta) is any transformation of \theta, then the MLE for \alpha = g(\theta) is \hat{\alpha} = g(\hat{\theta}).
- Another way to put this: If \hat{\theta} is MLE of \theta, and t is any function with a twice-differentiable inverse on \Theta, then t(\hat{\theta}) is the MLE of t(\theta).
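Invariance in action (an added sketch, not from the original notes): for Poisson data, \hat{\lambda}_{ML} = \bar{Y}, so by invariance the MLE of \alpha = P(Y = 0) = e^{-\lambda} is simply e^{-\hat{\lambda}}. The helper `poisson` below is an ad-hoc inversion sampler, since the stdlib `random` module has no Poisson generator.

```python
import math
import random

random.seed(6)
lam_true = 2.0

def poisson(lam):
    """Crude Poisson sampler by inversion of the CDF."""
    u = random.random()
    k, p = 0, math.exp(-lam)
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

ys = [poisson(lam_true) for _ in range(20_000)]
lam_hat = sum(ys) / len(ys)     # MLE of lambda: the sample mean

alpha_hat = math.exp(-lam_hat)  # by invariance: MLE of alpha = P(Y = 0) = e^{-lambda}

# sanity check against the empirical frequency of zeros
freq_zero = sum(y == 0 for y in ys) / len(ys)
```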
Limiting Distribution of MLE
Under several regularity conditions:
- \sqrt{n}(\hat{\theta} - \theta) \rightarrow N(0, I(\theta)^{-1}) in distribution, where I(\theta) is the Fisher information.
- I(\theta) = E\left[-\frac{d^2}{d \theta^2} \log f(Y; \theta)\right]
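For instance, a worked example for the Bernoulli model:

```latex
\log f(y; p) = y \log p + (1-y) \log(1-p), \qquad
\frac{d^2}{dp^2} \log f(y; p) = -\frac{y}{p^2} - \frac{1-y}{(1-p)^2}
```

so, using E[Y] = p,

```latex
I(p) = E\left[\frac{Y}{p^2} + \frac{1-Y}{(1-p)^2}\right]
     = \frac{1}{p} + \frac{1}{1-p}
     = \frac{1}{p(1-p)}
```

which matches the familiar asymptotic variance p(1-p) of \hat{p}_{ML} = \bar{Y}.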
MLE Terms and Conditions
Let Y_1, Y_2, \ldots \sim f(y; \theta^*), and assume:
- (A0) f is "identifiable": if \theta \neq \theta', then for at least one y in the support set f(y; \theta) \neq f(y; \theta')
- (A1) f “has common support”: \lbrace y: f(y; \theta) > 0 \rbrace = \lbrace y: f(y; \theta') > 0 \rbrace for any \theta, \theta'
- (A2) the parameter space \Theta contains an open set \omega and \theta^* is an interior point of \omega
- (A3) f(y; \theta) is differentiable in \theta on \omega
Under (A0) – (A3), the likelihood equation
- \frac{d}{d \theta} \ell(\theta; \mathcal{Y}) = 0
- Has a root (or solution) \hat{\theta}_n such that \hat{\theta}_n \rightarrow \theta^* in probability.
Corollary: If \hat{\theta}_n is unique and \Theta is an open interval, then with probability tending to 1, \hat{\theta}_n is the MLE.
Asymptotic Normality of MLE
Let Y_1, Y_2, \ldots \sim f(y; \theta^*), and now assume (A0) – (A1) hold along with:
- (A2’) the parameter space \Theta is an open interval;
- (A3’) the Fisher information I(\theta^*) is positive and finite, and f is three times differentiable (with additional regularity conditions on the third derivative).
Then any consistent root of the likelihood equation satisfies:
- \sqrt{n} (\hat{\theta}_{MLE} - \theta) \rightarrow N(0, \frac{1}{I(\theta)}) in distribution
- \hat{\theta}_{MLE} \rightarrow \theta in probability
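Asymptotic normality can be checked by simulation (an added sketch with arbitrary values p = 0.4, n = 500): over many replicates, the standardized Bernoulli MLE \sqrt{n}(\hat{p} - p) should have mean near 0 and variance near I(p)^{-1} = p(1-p) = 0.24.

```python
import random
import statistics

random.seed(7)
p, n, reps = 0.4, 500, 4000

zs = []
for _ in range(reps):
    p_hat = sum(random.random() < p for _ in range(n)) / n  # MLE for this replicate
    zs.append(n ** 0.5 * (p_hat - p))                       # standardized estimate

mean_z = statistics.fmean(zs)     # should be near 0
var_z = statistics.pvariance(zs)  # should be near p(1-p) = I(p)^{-1} = 0.24
```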
Recap
- Method of moments derives point estimators by equating sample moments and population moments.
- Maximum likelihood estimation provides a framework for deriving point estimators by optimizing the likelihood of data.
- MoM estimates are usually consistent.
- MLEs are invariant for transformations of parameters and asymptotically normal.