E300 Key Definitions and Formulae


1

FGLS process (multiplicative heteroskedasticity)

See slides + notes
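The full routine isn't written out in this export; a minimal sketch of the standard FGLS procedure for multiplicative heteroskedasticity, Var(u | x) = σ²exp(xδ) (function and variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

def fgls_multiplicative(y, X):
    """FGLS assuming Var(u | x) = sigma^2 * exp(x'delta)."""
    ols = sm.OLS(y, X).fit()
    log_u2 = np.log(ols.resid ** 2)      # log of squared OLS residuals
    aux = sm.OLS(log_u2, X).fit()        # estimate delta
    h_hat = np.exp(aux.fittedvalues)     # fitted variance function
    return sm.WLS(y, X, weights=1.0 / h_hat).fit()  # weight by 1/h_hat
```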

2

Cochrane-Orcutt and Prais-Winsten FGLS (serial error correlation)

  1. Run OLS on y = Xβ + ε to obtain residuals

  2. Estimate the AR(1) coefficient ρ by regressing residuals at t on residuals at t-1

  3. Perform the pictured GLS transformation with your estimated ρ

  4. Run OLS on the transformed data

This is Prais-Winsten estimation; if we instead drop the first observation (which needs a different transformation), we have the Cochrane-Orcutt procedure.
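The transformation image isn't reproduced in this export; for reference, the standard quasi-differencing step with estimated ρ is

$$y_t - \hat\rho\, y_{t-1} = (1-\hat\rho)\beta_0 + \beta_1 (x_t - \hat\rho\, x_{t-1}) + e_t, \qquad t \ge 2,$$

with the first observation rescaled by $\sqrt{1-\hat\rho^2}$ under Prais-Winsten.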

3

Formula for F statistic

[image]
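The formula image isn't reproduced; the standard sum-of-squared-residuals form for testing q restrictions is

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)},$$

compared against F(q, n-k-1) critical values.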
4

Newey-West variance estimator

Sample autocovariances at large lags j are noisy, but those at small j are well estimated, so the estimator downweights the higher-order terms.
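The pictured estimator isn't reproduced; a standard Newey-West (Bartlett-kernel) form with truncation lag L is

$$\widehat{\operatorname{Var}}_{HAC} = \hat\gamma_0 + \sum_{j=1}^{L}\Big(1 - \frac{j}{L+1}\Big)\big(\hat\gamma_j + \hat\gamma_j'\big),$$

where $\hat\gamma_j$ is the j-th sample autocovariance term.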

5

GLS

With heteroskedasticity, the conditional error variance is some function of X. To remove it, let P denote the ‘square root’ of that function, then transform each variable by dividing it by P. Running the transformed regression gives a conditional error variance of 1.
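In symbols, assuming Var(ε_i | x_i) = h(x_i) and P_i = √h(x_i):

$$\frac{y_i}{P_i} = \Big(\frac{x_i}{P_i}\Big)'\beta + \frac{\varepsilon_i}{P_i}, \qquad \operatorname{Var}\Big(\frac{\varepsilon_i}{P_i}\,\Big|\, x_i\Big) = 1.$$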

6

Stationarity of VAR(p)

From the pictured VAR(p) model, stationarity is ensured when all roots of the lag polynomial expression (below) are greater than one in absolute value.
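In symbols (the pictured model isn't reproduced), for a VAR(p) $y_t = c + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + \varepsilon_t$, stationarity requires every root z of

$$\det\!\big(I_m - \Phi_1 z - \Phi_2 z^2 - \cdots - \Phi_p z^p\big) = 0$$

to satisfy |z| > 1.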

7

Vector MA(inf) and impulse response functions

Like AR(1), a VAR(1) can be represented in vector MA(∞) form, pictured. The top-left m×m block of F^s is the impulse response function.

The (i, j) entry of this m×m block is the effect of a one-unit increase in the j-th component of the s-th lag of the errors on the i-th component of y at t. The response is dynamic, hence different effects on y over time.

The impulse response function can also be read as the plot over time of the response of the i-th component of y at t+s to a one-time impulse in the j-th component of the error term at t.
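In symbols, from the MA(∞) form:

$$y_{t+s} = \mu + \varepsilon_{t+s} + F\varepsilon_{t+s-1} + F^2\varepsilon_{t+s-2} + \cdots, \qquad \operatorname{IRF}_{ij}(s) = \frac{\partial y_{i,t+s}}{\partial \varepsilon_{j,t}} = \big[F^s\big]_{ij},$$

taking the top-left m×m block of F^s when the VAR is written in companion form.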

8

Strict and weak stationarity

Strict stationarity: (y_t1, …, y_tk) and (y_t1+s, …, y_tk+s) have the same joint distribution for all time points and all shifts s

Weak stationarity: Mean and variance are time invariant, and the autocovariance function depends only on the lag (not on the time)
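In symbols, weak stationarity requires

$$E[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \gamma_0, \qquad \operatorname{Cov}(y_t, y_{t-s}) = \gamma_s \quad \text{for all } t.$$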

9

Calculating variance, autocovariance and autocorrelation for an AR(1) process (finite and infinite time)

[image]

Represent your stable AR(1) as MA(inf) (Wold representation), then use this to find variance, autocovariance and autocorrelation. In the finite past, this can still be stationary so long as we assume that the initial condition (Y0) is distributed with 0 mean and the same variance as pictured for the infinite past.
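For reference (image not reproduced), for a stable AR(1) $y_t = \rho y_{t-1} + \varepsilon_t$ with $|\rho| < 1$ and error variance σ²:

$$y_t = \sum_{j=0}^{\infty} \rho^j \varepsilon_{t-j}, \qquad \gamma_0 = \frac{\sigma^2}{1-\rho^2}, \qquad \gamma_s = \rho^s \gamma_0, \qquad \rho_s = \rho^s.$$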

10

Calculating variance, autocovariance and autocorrelation for an MA(q) process

[image]

The autocovariance at lag s sums the products of the MA coefficients that overlap (e.g. for an MA(3) at lag 1, the t-1, t-2 and t-3 error terms overlap).
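In symbols, for $y_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}$ (with $\theta_0 = 1$):

$$\gamma_0 = \sigma^2\sum_{j=0}^{q}\theta_j^2, \qquad \gamma_s = \sigma^2\sum_{j=0}^{q-s}\theta_j\theta_{j+s} \ (s \le q), \qquad \gamma_s = 0 \ (s > q).$$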

11

Invertibility and the difference between it and stationarity

Stationarity means that the MA(∞) representation of the AR process converges, so you can write your AR process Y in terms of current and past errors. This holds if the roots of the AR lag polynomial are all greater than one in absolute value.

Invertibility means that your MA model can be inverted so that the errors are written in terms of current and past values of Y. Again, this ensures that a unique model can be fit to the data. It requires the roots of the MA lag polynomial to lie outside the unit circle (equivalently, its inverse roots lie inside the unit circle).

12

Summing up AR / MA / ARMA processes

AR(1) + AR(1) = ARMA(2,1)

AR(1) + White noise process = ARMA(1,1)

MA(1) + MA(1) = MA(1)

Where all processes on the left side are uncorrelated at all lags and leads. This is useful because it allows aggregation of individual AR(1) processes
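To see why an AR(1) plus white noise gives an ARMA(1,1), apply the AR polynomial to the sum z_t = y_t + w_t, where (1 - ρL)y_t = ε_t:

$$(1-\rho L)z_t = \varepsilon_t + w_t - \rho w_{t-1},$$

and the right-hand side has zero autocovariance beyond lag 1, so it is an MA(1).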

13

Optimal forecasting

Minimises mean squared forecast error, proof shown

[image]
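The proof isn't reproduced in this export; the key result is that the conditional expectation minimises mean squared forecast error:

$$\hat y_{t+s} = E[y_{t+s} \mid \mathcal{I}_t] = \arg\min_{g}\; E\big[(y_{t+s} - g(\mathcal{I}_t))^2\big].$$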

14

OLS consistency in FDL models (but not autoregression)

  • No multicollinearity

  • Strict exogeneity: the conditional expectation of the error term given all past, present and future values of X is 0

15

Why does strict exogeneity fail for AR processes

[image]
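The pictured argument isn't reproduced; the standard one: in an AR(1) the regressor for period t+1 is y_t, which depends on ε_t, so the error is correlated with a future regressor:

$$\operatorname{Cov}(y_t, \varepsilon_t) = \operatorname{Var}(\varepsilon_t) = \sigma^2 \neq 0,$$

which violates strict exogeneity even though contemporaneous exogeneity can still hold.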

16

Calculating autocorrelation functions, variance, and autocovariance for random walks, with and without drift

[image]
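For reference (image not reproduced), for a pure random walk y_t = y_{t-1} + ε_t with y_0 = 0:

$$\operatorname{Var}(y_t) = t\sigma^2, \qquad \operatorname{Cov}(y_t, y_{t-s}) = (t-s)\sigma^2, \qquad \operatorname{Corr}(y_t, y_{t-s}) = \sqrt{\tfrac{t-s}{t}}.$$

With drift δ the mean becomes δt, while the variance and autocovariances are unchanged.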

17

Unit roots, difference and trend stationarity

Unit root: the lag polynomial has a root with an absolute value of 1 (on the unit circle)

Difference stationary: Becomes stationary after first differencing (i.e. the difference between a variable and its first lag is a stationary process)

Trend stationary: Becomes stationary after removing a deterministic time trend.

18

Dickey-Fuller and ADF tests (including consequences of under- and overspecification)

Tests for non-stationarity in the process. Different Dickey-Fuller tests shown; choose accordingly. Estimate parameters via OLS, then use a t-test against the relevant DF critical values (if significant, reject the null of a unit root).

Use the ADF test if you believe there is autocorrelation in the errors of your OLS estimation. Add lags of the differenced term to the regression. Critical values are the same as before.
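A minimal sketch using statsmodels (the series y and the deterministic specification are illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # a random walk, so the unit-root null should survive

# regression="c" includes an intercept; "ct" would add a deterministic trend.
stat, pval, usedlag, nobs, crit, _ = adfuller(y, regression="c", autolag="AIC")
print(f"ADF stat = {stat:.2f}, 5% critical value = {crit['5%']:.2f}, p = {pval:.3f}")
```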

19

Cointegration and Error Correction Mechanisms

Cointegration: the property of a collection of I(1) processes that have an I(0) linear combination. The coefficients of this linear combination form the cointegrating vector.

Error Correction Mechanisms: Show short term adjustment dynamics of cointegrated variables, ensuring that deviations from their long-run equilibrium are corrected over time. Often interpreted via economic theory (see t-bill example in slides, great ratio example in problem set)
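A generic single-equation ECM for cointegrated y and x with cointegrating vector (1, -β):

$$\Delta y_t = \gamma\,\Delta x_t + \alpha\,(y_{t-1} - \beta x_{t-1}) + \varepsilon_t, \qquad \alpha < 0,$$

where α measures the speed at which deviations from long-run equilibrium are corrected.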

20

Engle-Granger procedure for testing for cointegration

Regress one variable on the variable you believe it is cointegrated with, then compute an ADF statistic on the residuals of this regression. In this case you have to use special Engle-Granger critical values. In particular, reject the null of no cointegration if t is less than the pictured critical values.

Additionally, this can be used to estimate the coefficients in the cointegrating vector, which is useful when used in error correction models.
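A minimal sketch of the two-step procedure (statsmodels' coint applies the Engle-Granger critical values for you; x and y are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=300))   # I(1) driver
y = 2.0 * x + rng.normal(size=300)    # cointegrated with x

# Step 1: estimate the cointegrating regression (gives the cointegrating vector).
beta_hat = sm.OLS(y, sm.add_constant(x)).fit().params

# Steps 1+2 in one call: ADF on the residuals with Engle-Granger critical values.
t_stat, pval, crit = coint(y, x)
print(beta_hat, t_stat, pval)
```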

21

Dynamic models in differences and Error Correction Models

Different models for short-run dynamics. Note that the inclusion of the error correction term (warranted when the variables are cointegrated, because both sides of the equation remain stationary) better captures how short-run deviations revert back to the long-run equilibrium.

22

Working out impulse response from a VAR(p) process

Convert to VAR(1) → convert to MA(inf) as shown → interpret the upper left m x m block of your coefficient matrix to get the impulse response function, remembering the exponent to get the requisite number of lags.

23

How to test for whether an estimator is consistent

If the estimator is asymptotically unbiased and its variance approaches 0 as the sample size tends to infinity, it converges in mean square to the true value and is therefore consistent.

24

ADF test

Run the regression as shown before (see PS6 answers for funky way of dealing with extra lags even though you likely won’t need to do this extra maths) then compare to DF critical values, ensuring you are using the right critical values (i.e. with / without intercept / trend). When doing this in the exam, test it over both 5% and 1% significance levels.

25

Changing models into their error correction form

Transform into error correction form by isolating a cointegration relationship, and then first differencing / manipulating other terms to ensure that every term on either side of the equation is stationary.

26

Deriving a likelihood / log likelihood function

The product over observations of the joint density of (y_i, x_i) evaluated at a given parameter. The log likelihood is just the log of this (so the sum of the log joint densities over all observations).
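In symbols:

$$L(\theta) = \prod_{i=1}^{n} f(y_i, x_i;\theta), \qquad \ell(\theta) = \sum_{i=1}^{n} \log f(y_i, x_i;\theta).$$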

27

Formula for joint density f(y(i),x(i); theta)

$f(y_i, x_i;\theta) = f(y_i \mid x_i;\theta)\, f(x_i;\theta)$, i.e. conditional density times marginal density (extension: put this in vector/matrix form)

28

Impact of marginal density on MLE

Assume that the marginal density doesn’t depend on the parameter of interest, so it can be ignored for likelihood maximisation.

If the marginal density does depend on the parameter, you can still estimate the parameter with MLE, but the conditional ML estimator would obviously be less efficient than the unconditional one that utilises information about the parameter from the marginal distribution of x.

29

Gaussian pdf

[image]
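For reference (image not reproduced):

$$f(y;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\Big(-\frac{(y-\mu)^2}{2\sigma^2}\Big).$$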

30

Process for MLE estimate for a standard regression model with normal errors (remember two parameters of interest)

Identify conditional density of Y|X (normal since errors are normal), then derive the log likelihood function. Maximise log likelihood with respect to beta first, which should give the same estimator as OLS (because you are essentially minimising the residual sum of squares). Then take the FOC wrt the variance term, noting that the ML estimator for the variance is biased.
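In symbols, for y_i = x_i'β + u_i with u_i | x_i ~ N(0, σ²):

$$\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2, \qquad \hat\beta_{ML} = \hat\beta_{OLS}, \qquad \hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}\hat u_i^2,$$

where the variance estimator is biased because it divides by n rather than the degrees of freedom.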

31

Finding a conditional density function for the dependent variable in time series data (not i.i.d.)

Use repeated factorisation, and remember the unconditional distribution of the initial value (normal with mean 0 and the same variance as in the AR(1) case covered in time series), whose likelihood needs to be derived as well.
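In symbols:

$$f(y_1,\dots,y_T;\theta) = f(y_1;\theta)\prod_{t=2}^{T} f(y_t \mid y_{t-1},\dots,y_1;\theta).$$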

32

Process to show consistency of the ML estimator

Derive the log likelihood, noting that the maximiser is unchanged if you instead maximise the log likelihood minus the log likelihood at the true parameter value. Divide by n and use the LLN to obtain the expectation of this log-likelihood difference under the true density. Then use Jensen's inequality to show that this expectation must be less than or equal to 0 (pictured), which shows that the limit objective is maximised when the estimated parameter equals the true parameter.

33

Score function + individual score

Score function: partial derivative of the log likelihood function with respect to the parameter

Individual score: partial derivative of the log density of a single observation

34

Proof that the expected value of the score function at the true parameter is 0
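The answer isn't included in this export; the standard proof differentiates $\int f(y;\theta)\,dy = 1$ under the integral sign:

$$E_{\theta_0}[s(\theta_0)] = \int \frac{\partial \log f(y;\theta_0)}{\partial\theta} f(y;\theta_0)\,dy = \int \frac{\partial f(y;\theta_0)}{\partial\theta}\,dy = \frac{\partial}{\partial\theta}\int f(y;\theta)\,dy\,\Big|_{\theta_0} = 0.$$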

35

Fisher information

[image]

Variance of the score function at the true parameter, formula pictured.
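In symbols (per-observation form):

$$I(\theta_0) = \operatorname{Var}\big(s_i(\theta_0)\big) = E\big[s_i(\theta_0)s_i(\theta_0)'\big] = -E\Big[\frac{\partial^2 \log f(y_i;\theta_0)}{\partial\theta\,\partial\theta'}\Big],$$

where the last equality is the information equality.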

36

Derivation of Fisher information and information equality.

37

Asymptotic distribution of the score function
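The answer isn't in the export; the standard result under random sampling, by the CLT applied to the i.i.d. individual scores (mean 0, variance I(θ₀)):

$$\frac{1}{\sqrt n}\sum_{i=1}^{n} s_i(\theta_0) \xrightarrow{d} N\big(0,\ I(\theta_0)\big).$$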

38

Use of Taylor approximation of the score function evaluated at the ML estimate to show that the ML estimator is asymptotically normal + asymptotic distribution of MLE
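The derivation isn't in the export; a sketch: expand the score around θ₀, use the first-order condition at the ML estimate, then apply the LLN and CLT:

$$0 = s_n(\hat\theta) \approx s_n(\theta_0) + \frac{\partial s_n(\theta_0)}{\partial\theta}\,(\hat\theta - \theta_0) \ \Rightarrow\ \sqrt n\,(\hat\theta - \theta_0) \xrightarrow{d} N\big(0,\ I(\theta_0)^{-1}\big).$$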

39

Cramer-Rao Lower Bound

The lowest possible variance for an unbiased estimator, equal to the reciprocal of the Fisher information over the whole sample. Since this is the asymptotic variance of the ML estimator, MLE is asymptotically efficient (it is asymptotically unbiased and achieves the Cramér-Rao lower bound).
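In symbols (scalar case), for any unbiased estimator:

$$\operatorname{Var}(\hat\theta) \ \ge\ \frac{1}{n\,I(\theta_0)}.$$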

40

Likelihood ratio test formula + procedure (all tests have m restrictions)

Likelihood ratio lambda is the ratio of maximum likelihood under the restriction to maximum likelihood without the restriction.

The LR statistic is -2*log(lambda) and has chi squared distribution with m degrees of freedom.
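In symbols, with restricted and unrestricted ML estimates:

$$\lambda = \frac{L(\hat\theta_R)}{L(\hat\theta_U)}, \qquad LR = -2\log\lambda = 2\big(\ell(\hat\theta_U) - \ell(\hat\theta_R)\big) \xrightarrow{d} \chi^2_m.$$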

41

Wald test formula + procedure

Uses the absolute value of the second derivative of the log likelihood function and has a chi-squared distribution with 1 degree of freedom (single restriction). The proof uses a second-order Taylor approximation of the restricted log likelihood around the ML estimate.
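The pictured formula isn't reproduced; a standard scalar (single-restriction) version consistent with the description above:

$$W = (\hat\theta - \theta_0)^2\,\big|\ell''(\hat\theta)\big| \xrightarrow{d} \chi^2_1 \quad \text{under } H_0:\theta = \theta_0.$$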

42

LM test formula + procedure

Remember again that this is the absolute value, not brackets
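The pictured formula isn't reproduced; a standard scalar version evaluated at the restricted estimate:

$$LM = \frac{s(\tilde\theta)^2}{\big|\ell''(\tilde\theta)\big|} \xrightarrow{d} \chi^2_1.$$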

43

Which is the best likelihood-based test

Tests are asymptotically equivalent but may give different results in finite samples

LR test requires both restricted and unrestricted ML estimates, so may be more computationally demanding

The W and LM tests are not invariant to reparameterisation of the model, whereas the LR test is invariant to it.

Order of likelihood to reject hypotheses: W → LR → LM (most likely to reject under the Wald test; in standard linear models W ≥ LR ≥ LM).

44

Probit Derivation (Latent Variable model)

[image]
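The derivation image isn't reproduced; the standard latent-variable setup:

$$y^* = x'\beta + \varepsilon, \quad \varepsilon \mid x \sim N(0,1), \quad y = \mathbf{1}\{y^* > 0\} \ \Rightarrow\ P(y = 1 \mid x) = \Phi(x'\beta).$$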

45

Probit Partial Effects

[image]

(Remember it is different for discrete variables: take the difference between the probit probability with and without the change in the discrete variable)
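For a continuous regressor (image not reproduced):

$$\frac{\partial P(y = 1 \mid x)}{\partial x_j} = \phi(x'\beta)\,\beta_j.$$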

46

Delta Method

Derives the asymptotic distribution of a differentiable function g of an asymptotically Gaussian RV: if $\sqrt{n}(X - \theta) \xrightarrow{d} N(0, \sigma^2)$, then $\sqrt{n}\big(g(X) - g(\theta)\big) \xrightarrow{d} N\big(0,\ g'(\theta)^2\sigma^2\big)$.

47

Bernoulli PDF

[image]
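For reference (image not reproduced):

$$f(y;p) = p^{y}(1-p)^{1-y}, \qquad y \in \{0,1\}.$$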

48

Conditional likelihood function for Probit

Applies because Y|X is a Bernoulli random variable with y either 0 or 1. The total likelihood function is this * marginal density of X, but can be ignored so long as the marginal density doesn’t depend on the parameters of interest. Even if it does, it may be better to limit attention to the conditional likelihood since misspecification of the marginal density may result in inconsistent MLE
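The pictured likelihood isn't reproduced; the standard probit conditional log likelihood:

$$\ell(\beta) = \sum_{i=1}^{n}\Big[y_i\log\Phi(x_i'\beta) + (1-y_i)\log\big(1-\Phi(x_i'\beta)\big)\Big].$$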

49

Asymptotic distribution of a differentiable nonlinear function of a parameter estimated by MLE

Works because we know that beta is asymptotically normal from MLE

50

Stationarity of VAR(p) as VAR(1)

When writing VAR(p) as VAR(1), (as shown), stationarity can be shown when the eigenvalues of F are less than one in absolute value (see picture for reminder of how to calculate eigenvalues)
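The companion form (image not reproduced): stack $\xi_t = (y_t', y_{t-1}', \dots, y_{t-p+1}')'$ so that $\xi_t = F\xi_{t-1} + v_t$ with

$$F = \begin{pmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I_m & 0 & \cdots & 0 & 0 \\ 0 & I_m & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_m & 0 \end{pmatrix};$$

the eigenvalues solve $\det(F - \lambda I) = 0$.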

51

Derivation of univariate IV estimator
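The derivation image isn't included; the standard steps from the moment condition E[zε] = 0:

$$E\big[z(y - \beta_0 - \beta_1 x)\big] = 0 \ \Rightarrow\ \hat\beta_{1,IV} = \frac{\sum_i (z_i - \bar z)(y_i - \bar y)}{\sum_i (z_i - \bar z)(x_i - \bar x)} = \frac{\widehat{\operatorname{Cov}}(z,y)}{\widehat{\operatorname{Cov}}(z,x)}.$$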

52

Wald estimator (with binary instrument)
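The formula image isn't included; with a binary instrument z, the IV estimator reduces to the Wald estimator:

$$\hat\beta_{Wald} = \frac{\bar y_{z=1} - \bar y_{z=0}}{\bar x_{z=1} - \bar x_{z=0}}.$$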

53

Why might the IV estimator be biased in small samples (even though it is consistent)

If the instrument is weak, the denominator may become very small or even 0, hence the expected value of your estimated coefficient may not even exist in some small samples.

54

How to show asymptotic normality of the IV estimator
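The derivation isn't in the export; a sketch for the no-intercept model y = βx + ε with E[zε] = 0 and homoskedasticity:

$$\sqrt{n}\,(\hat\beta_{IV} - \beta) = \frac{n^{-1/2}\sum_i z_i\varepsilon_i}{n^{-1}\sum_i z_i x_i} \xrightarrow{d} N\!\Big(0,\ \frac{\sigma^2 E[z^2]}{E[zx]^2}\Big),$$

using the CLT on the numerator and the LLN plus Slutsky on the denominator.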

55

Calculating the variance of the IV estimator

[image]

(note that the variance may be far larger than OLS if the correlation between z and x is very small)
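A standard textbook expression for the simple model (image not reproduced), under homoskedasticity:

$$\operatorname{Var}(\hat\beta_{1,IV}) \approx \frac{\sigma^2}{n\,\sigma_x^2\,\rho_{x,z}^2},$$

so a small correlation between x and z inflates the variance relative to OLS, which replaces ρ²_{x,z} by 1.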

56

How do weak instruments jeopardise consistency of the IV estimator?

57

How do weak instruments exacerbate the effects of instrument endogeneity

(Note that if this is pronounced enough, the IV estimator may have greater inconsistency than the endogenous OLS estimate)

58

Derive the IV estimator with additional exogenous explanatory variables

59

Derivation of the 2SLS estimator
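The derivation image isn't included; the result, with instrument matrix Z and projection $P_Z = Z(Z'Z)^{-1}Z'$:

$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X' y = (X'P_Z X)^{-1}X'P_Z y, \qquad \hat X = P_Z X.$$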

60

Show consistency of the 2SLS estimator and asymptotic normality

[image]

61

What is the variance estimate for the 2SLS estimator?
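The estimator isn't pictured in the export; the standard homoskedastic form:

$$\widehat{\operatorname{Var}}(\hat\beta_{2SLS}) = \hat\sigma^2\,(X'P_Z X)^{-1}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - x_i'\hat\beta_{2SLS}\big)^2,$$

where the residuals use the original X, not the fitted $\hat X$.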

62

Testing for endogeneity

Regress your suspected endogenous variable on the full set of instruments (and exogenous regressors), then regress y on all explanatory variables plus the residuals from this first regression (the other coefficients are unaffected here, by the FWL theorem). If the coefficient on the residuals is significantly different from zero, the variable is endogenous. If not, it is exogenous and 2SLS is unnecessary and costly.
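A minimal sketch of this control-function test (names are illustrative; x2 is the suspected endogenous regressor, Z the instruments):

```python
import numpy as np
import statsmodels.api as sm

def endogeneity_test(y, X_exog, x2, Z):
    """Regress x2 on (exogenous regressors, instruments); add those residuals
    to the structural regression and t-test their coefficient."""
    first = sm.OLS(x2, np.column_stack([X_exog, Z])).fit()
    X_aug = np.column_stack([X_exog, x2, first.resid])
    second = sm.OLS(y, X_aug).fit()
    return second.tvalues[-1], second.pvalues[-1]  # test on the residual term
```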

63

Rule of thumb value for finding weak instruments

If the F stat for coefficients in the first stage of the 2SLS regression is less than 10, then you have weak instruments and should either find new ones or remove the weakest ones to see if this strengthens your F stat.

64

Anderson Rubin Test for linear hypotheses when you have weak (but exogenous) instruments

[image]
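The pictured test isn't reproduced; the standard construction: under H0: β = β0, regress y - xβ0 on the full set of instruments and F-test that all instrument coefficients are zero. The test remains valid however weak the instruments are, because it never estimates the first stage.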

65

Hausman test (overidentifying restrictions test) for instrument endogeneity

(Example of 2 instruments for 1 endogenous variable)

Do 2SLS twice: once with one of the instruments, once with the other. If the difference between the estimates is large, then one or both of the instruments is endogenous (note that if the inconsistency from both instruments is similar, then the Hausman test won’t detect it)

66

Alternative test for instrument endogeneity

Use 2SLS to obtain estimated residuals (note these are computed with the original regressors and the 2SLS coefficients, not the second-stage residuals). Then regress the residuals on all exogenous regressors and all instruments. Obtain the R², where nR² is asymptotically chi-squared distributed with the number of overidentifying restrictions as its degrees of freedom. If nR² exceeds the 95th quantile of this, then some IVs are endogenous.
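A minimal sketch of this overidentification (Sargan) test, assuming a 2SLS fit is already available (names are illustrative):

```python
import statsmodels.api as sm
from scipy import stats

def sargan_test(resid_2sls, Z, df_overid):
    """nR^2 from regressing the 2SLS residuals on all exogenous regressors and
    instruments Z; asymptotically chi-squared with df_overid degrees of freedom
    (the number of overidentifying restrictions)."""
    aux = sm.OLS(resid_2sls, Z).fit()
    stat = len(resid_2sls) * aux.rsquared
    return stat, stats.chi2.sf(stat, df_overid)  # reject if p < 0.05
```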

67

Asymptotic distribution of the 2SLS estimator with heteroskedasticity

[image]

68

Conditions for asymptotic inference to hold for 2SLS with time series data

69

Conditions for finite sample inference to hold with any data

Must satisfy the Gauss-Markov assumptions

70

How do Method of Moments estimators work

Estimate k parameters using given expressions for the k moments of your random variable (therefore getting k expressions for k parameters) and then replacing these moments with the sample moments to get the estimate.
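A small illustration (the distribution and names are my own example, not from the notes): matching the first two moments of a Gamma(k, θ), which has mean kθ and variance kθ²:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=5000)

m1, m2 = x.mean(), x.var()   # sample moments replace population moments
theta_hat = m2 / m1          # variance/mean = theta
k_hat = m1 / theta_hat       # mean = k * theta
print(k_hat, theta_hat)      # should be near (2.0, 3.0)
```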

71

Formal definition of the GMM estimator

Uses a weighting matrix to combine multiple moment conditions (estimated via sample moments) into a quadratic form. This weighting matrix is chosen optimally so as to minimise the asymptotic variance of the estimator.
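In symbols:

$$\hat\theta_{GMM} = \arg\min_{\theta}\ \bar g_n(\theta)'\,W\,\bar g_n(\theta), \qquad \bar g_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} g(z_i;\theta).$$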

72

What is the optimal weighting matrix for GMM

It is the inverse of the variance of the moment conditions, W = S⁻¹ with S = Var(g(z_i; θ₀)) (which you will likely have to estimate, but may be given)

73

Show how optimal GMM under homoskedasticity with an endogenous variable is 2SLS
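The derivation isn't included in the export; a sketch: with moments g_i(β) = z_i(y_i - x_i'β) and homoskedasticity, S = σ²E[z_i z_i'], so (up to the irrelevant factor σ²) the optimal GMM objective is

$$\min_\beta\ (y - X\beta)'Z(Z'Z)^{-1}Z'(y - X\beta) = \min_\beta\ (y - X\beta)'P_Z(y - X\beta),$$

whose first-order condition X'P_Z(y - Xβ) = 0 gives β̂ = (X'P_Z X)⁻¹X'P_Z y, the 2SLS estimator.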

74

For any matrix of regressors A, when is P_A·X1 = X1? (P_A is the projection matrix of A)

This is true when X1 lies in the column space of A (i.e. X1 itself, or columns whose linear combination equals X1, appear in A)

75

Meaning of an iid sample of (y,x)

Each pair of (yi, xi) is independent with identical distribution. DOES NOT MEAN that y is independent of x, so you cannot use this to assume conditional / unconditional independence / 0 covariance between errors and regressors.

76

Points to evaluate an ADF specification

  • Even if the ADF specification includes additional lags, it assumes that shocks are all serially uncorrelated which may be very demanding

  • It may not include sufficient consideration for any potential deterministic breaks which you might be able to see in the data.

77

Tests for serial correlation / heteroskedasticity

  • Heteroskedasticity:

    • Breusch-Pagan: Regress squared residuals linearly on all explanatory variables, then compare the F statistic to F(k, n-k-1) critical values, or use the asymptotic chi-squared (LM = nR²) version, which is almost identical for large n. (A statsmodels sketch appears after this list.)

    • White: Regress squared residuals on powers of predicted y (typically ŷ and ŷ², which are functions of the explanatory variables), then compute the F statistic and compare against critical values

  • Serial Correlation:

    • Breusch-Godfrey: Compute LM statistic (T-q)*R² using R² from regression of residuals on explanatory variables and lagged residuals (q restrictions). Use chi-squared critical values with q d.f.

    • F-test from regression of residuals on explanatory variables and lagged residuals (Testing hypothesis that the coefficients on the residual lags are all 0)
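A minimal sketch of the Breusch-Pagan and Breusch-Godfrey tests via statsmodels (the data and model are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, acorr_breusch_godfrey

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)
res = sm.OLS(y, X).fit()

lm, lm_pval, f, f_pval = het_breuschpagan(res.resid, res.model.exog)   # H0: homoskedasticity
bg_lm, bg_pval, bg_f, bg_f_pval = acorr_breusch_godfrey(res, nlags=4)  # H0: no serial correlation
print(lm_pval, bg_pval)
```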

78

What to do if there is heteroskedasticity / serial correlation in errors

  • Find a better model

  • Use heteroskedasticity robust / HAC standard errors

  • Use GLS / FGLS (can do it in different ways if needs be)

79

F Statistic - Both formulas

Also recall derivation for the first formula
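The formulas aren't pictured in the export; the two standard forms for q restrictions:

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}.$$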

80

F and t distributions

Use t when there is only one restriction. It is the ratio of a standard normal RV (i.e. your hypothesis) to the square root of a chi-squared RV (i.e. the variance estimator) normalised by its degrees of freedom.

81

Difference in handling endogeneity between the causal and ‘best linear predictor’ interpretations of OLS

  • The causal interpretation sees OLS as estimating the causal mechanism between x and y, making the error term ‘all other factors’. Endogeneity is thus likely and will greatly harm the accuracy of your estimated causal effect

  • The BLP interpretation gives the OLS estimate as just the best way to fit x to y in a linear fashion. The error term is thus just the error of the observed y given x, and is not a function of all other factors as in the causal interpretation. Therefore endogeneity is not a problem, even though the coefficient would be biased for a causal interpretation. The assumption E(xe) = 0 is essentially free.

82

Dynamic completeness as a sufficient condition for the no serial correlation assumption in asymptotic time series OLS

Dynamic completeness means the model explains all of y's dynamics (no further lags of y or x help once the included regressors are controlled for), which implies no serial correlation; it can therefore be assumed in place of assuming no serial correlation directly.

83

Source of the spurious regression problem
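The answer isn't included in the export; the standard account: if y and x are independent I(1) processes, the error in a levels regression of y on x is itself I(1), so the usual OLS asymptotics break down, t statistics diverge, and R² stays large even though there is no true relationship. Working in differences, or establishing cointegration first, avoids the problem.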
