Time Series exam study guide
This guide covers stationary and covariance stationary processes, weak dependence, AR/MA models, consistency, asymptotic normality, random walks, unit roots, and I(1) processes:
Stationary Process: A time series process where the probability distributions are stable over time. Formally, for any collection of time indices 1 ≤ t1 < t2 < ... < tm, the joint distribution of (xt1, xt2, ..., xtm) is the same as the joint distribution of (xt1+h, xt2+h, ..., xtm+h) for all integers h ≥ 1. This implies that xt has the same distribution as x1 for all t = 2, 3, ….
Covariance Stationary Process: A stochastic process {xt: t = 1, 2, ...} with a finite second moment [E(xt^2) < ∞] is covariance stationary if:
E(xt) is constant.
Var(xt) is constant.
For any t, h ≥ 1, Cov(xt, xt+h) depends only on h and not on t.
It follows that the correlation between xt and xt+h also depends only on h.
Relationship: If a stationary process has a finite second moment, then it must be covariance stationary, but the converse is not necessarily true. "Stationarity" typically refers to strict stationarity.
Weak Dependence: Places restrictions on how strongly related the random variables xt and xt+h can be as the time distance between them (h) gets large. A stationary time series process {xt: t = 1, 2, …} is said to be weakly dependent if xt and xt+h are "almost independent" as h increases without bound. Covariance stationary sequences are weakly dependent if the correlation between xt and xt+h goes to zero "sufficiently quickly" as h → ∞. Covariance stationary sequences where Corr(xt, xt+h) → 0 as h → ∞ are said to be asymptotically uncorrelated.
MA(1) Model: A moving average process of order one [MA(1)] takes the form xt = et + α1et-1, where {et: t = 0, 1, ...} is an i.i.d. sequence with zero mean and variance σe^2. Adjacent terms in the sequence are correlated: Cov(xt, xt+1) = α1Var(et) = α1σe^2, and Corr(xt, xt+1) = α1/(1 + α1^2). However, variables in the sequence that are two or more time periods apart are uncorrelated because they are independent. An MA(1) is a stationary, weakly dependent sequence.
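The MA(1) correlations above can be checked numerically. The following is a minimal sketch, assuming standard normal shocks and α1 = 0.5; the function names are illustrative and not from the source.

```python
import random

def ma1_theoretical_corr(alpha1, h):
    """Corr(x_t, x_{t+h}) for x_t = e_t + alpha1 * e_{t-1}."""
    if h == 0:
        return 1.0
    if h == 1:
        return alpha1 / (1.0 + alpha1 ** 2)
    return 0.0  # terms two or more periods apart share no shocks

def simulate_ma1(alpha1, n, seed=0):
    """Simulate n observations of an MA(1) with N(0, 1) shocks (an assumption)."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + alpha1 * e[t - 1] for t in range(1, n + 1)]

def sample_autocorr(x, h):
    """Sample autocorrelation of x at lag h."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    return numer / denom

x = simulate_ma1(alpha1=0.5, n=20000)
for h in (1, 2, 3):
    print(h, round(ma1_theoretical_corr(0.5, h), 3), round(sample_autocorr(x, h), 3))
```

The lag-1 sample correlation should sit near α1/(1 + α1^2) = 0.4, while lags 2 and 3 should be close to zero.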
AR(1) Model: An autoregressive process of order one [AR(1)] takes the form yt = ρ1yt-1 + et, t = 1, 2, …, where {et: t = 1, 2, …} is an i.i.d. sequence with zero mean and variance σe^2, and the et are independent of y0. The crucial assumption for weak dependence of an AR(1) process is the stability condition |ρ1| < 1. When |ρ1| < 1, {yt} is a stable AR(1) process, and Corr(yt, yt+h) = ρ1^h, which goes to zero as h → ∞.
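For a stable AR(1), the autocorrelation at lag h is ρ1^h, so dependence dies out geometrically. The sketch below illustrates this under assumed values (ρ1 = 0.8, standard normal shocks, y0 = 0 with a burn-in period); the helper names are hypothetical.

```python
import random

def simulate_ar1(rho1, n, seed=1, burn_in=200):
    """Simulate y_t = rho1 * y_{t-1} + e_t with N(0, 1) shocks (an assumption)."""
    rng = random.Random(seed)
    y = 0.0  # illustrative starting value; the burn-in reduces its influence
    path = []
    for t in range(n + burn_in):
        y = rho1 * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(y)
    return path

def sample_autocorr(x, h):
    """Sample autocorrelation of x at lag h."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    return numer / denom

y = simulate_ar1(rho1=0.8, n=20000)
for h in (1, 5, 20):
    print(h, round(0.8 ** h, 3), round(sample_autocorr(y, h), 3))
```

Setting rho1 = 1.0 in the same function produces a random walk, for which the stability condition fails and sample autocorrelations stay high at long lags.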
Consistency: An estimator is consistent if it converges in probability to the true parameter value as the sample size grows without bound. Formally, plim(Wn) = θ, where Wn is an estimator of θ based on a sample of size n.
Unbiasedness: An estimator is unbiased if its expected value equals the true parameter value.
Difference: Unbiasedness is a finite sample property, while consistency is an asymptotic property. An estimator can be unbiased but inconsistent, or consistent but biased.
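The distinction between unbiasedness and consistency can be seen in a small Monte Carlo sketch. The two estimators of a population mean below are standard textbook illustrations rather than anything taken from these notes: the first observation alone (unbiased but inconsistent) and the sample mean plus 1/n (biased but consistent). The normal population with mean 2 is an assumption.

```python
import random
import statistics

def first_obs(sample):
    """Unbiased for the mean, but its spread never shrinks with n."""
    return sample[0]

def shifted_mean(sample):
    """Biased by 1/n in any finite sample, but the bias vanishes as n grows."""
    n = len(sample)
    return sum(sample) / n + 1.0 / n

def monte_carlo(estimator, n, mu=2.0, reps=500, seed=42):
    """Return (average estimate, spread of estimates) across simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.mean(estimates), statistics.pstdev(estimates)

results = {}
for n in (10, 400):
    results[("first_obs", n)] = monte_carlo(first_obs, n)
    results[("shifted_mean", n)] = monte_carlo(shifted_mean, n)
    print(n, results[("first_obs", n)], results[("shifted_mean", n)])
```

The spread of first_obs stays near 1 at both sample sizes, while shifted_mean concentrates around the true mean as n grows despite its finite-sample bias.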
Assumptions for Consistency: For OLS to be consistent, the following assumptions are needed:
TS.1': The stochastic process {(xt1, xt2, …, xtk, yt ): t = 1, 2, …, n} is stationary, weakly dependent, and follows the linear model yt = β0 + β1xt1 + β2xt2 + … + βkxtk + ut.
TS.2': No perfect collinearity.
TS.3': The explanatory variables are contemporaneously exogenous, that is, E(ut|xt1, …, xtk) = 0.
It is useful to know that the consistency result only requires ut to have zero unconditional mean and to be uncorrelated with each xtj: E(ut) = 0, Cov(xtj, ut) = 0, j = 1, …, k.
Assumptions necessary for asymptotic normality (consistency assumptions + 2 variance assumptions)
TS.4': The errors are contemporaneously homoskedastic, that is, Var(ut|Xt) = σ^2, where Xt is shorthand for (xt1, xt2, …, xtk).
TS.5': No serial correlation: For all t ≠ s, E(ut us|Xt, Xs) = 0.
Random Walk: A time series process where next period's value is obtained as this period's value, plus an independent (or at least an uncorrelated) error term. A random walk can be expressed as yt = yt-1 + et, where {et} is an i.i.d. sequence with zero mean and variance σe^2.
Unit Root Process: A highly persistent time series process where the current value equals last period's value, plus a weakly dependent disturbance. A random walk is a special case of a unit root process.
Random Walk with Drift: A unit root process that also has a constant term (drift) added in each period.
I(1) Process: A time series process that is integrated of order one. This means that the first difference of the process is weakly dependent (and often stationary).
Difference Stationary: A time series that becomes weakly dependent (and often stationary) after first differencing.
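The persistence of a random walk and the effect of first differencing can be illustrated with a short simulation. This is a sketch under assumed settings (normal shocks, y0 = 0); for such a random walk Var(yt) = σe^2·t, so the variance at t = 100 should be roughly four times the variance at t = 25.

```python
import random
import statistics

def simulate_random_walk(n, drift=0.0, sigma=1.0, seed=None):
    """y_t = drift + y_{t-1} + e_t, started at y_0 = 0 (an assumption)."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        y = y + drift + rng.gauss(0.0, sigma)
        path.append(y)
    return path

def first_difference(series):
    """Return the series of changes y_t - y_{t-1}."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

# Cross-path variance at two dates: it grows in proportion to t.
paths = [simulate_random_walk(100, seed=s) for s in range(2000)]
var_at_25 = statistics.pvariance([p[24] for p in paths])
var_at_100 = statistics.pvariance([p[99] for p in paths])
print(round(var_at_25, 2), round(var_at_100, 2))

# Differencing a random walk with drift recovers drift + e_t, an I(0) series.
changes = first_difference(simulate_random_walk(5000, drift=0.5, seed=11))
print(round(statistics.mean(changes), 3))
```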
Overview of Chapter 12 topics related to serial correlation and heteroskedasticity:
How (why) is variance biased with serial correlation and heteroskedasticity?
In the presence of serial correlation, OLS is no longer the Best Linear Unbiased Estimator (BLUE).
The usual OLS standard errors and test statistics are not valid, even asymptotically. The variance estimator will usually be biased when ρ ≠ 0 because it ignores the second term in (12.4).
Typically, the OLS standard errors underestimate the true uncertainty in the parameter estimates. This is especially true when errors are positively serially correlated.
How can we test for serial correlation? Alternative Durbin-Watson and Breusch-Godfrey tests
t-test for AR(1) Serial Correlation: Regress the OLS residuals ût on their lagged counterparts ût-1 and use the usual t statistic on the coefficient of ût-1.
While derived from the AR(1) model, this test can detect other kinds of serial correlation, specifically correlation between adjacent errors.
The heteroskedasticity-robust t-statistic should be used to account for heteroskedasticity in et.
Durbin-Watson (DW) test: Based on OLS residuals, it provides a test for AR(1) serial correlation under classical assumptions.
Breusch-Godfrey test: An LM test for AR(q) serial correlation. The LM statistic also requires the homoskedasticity assumption (12.23), but it can be made robust to heteroskedasticity.
Seasonal serial correlation: With quarterly data, when the regressors are strictly exogenous, a t-test can be performed on ût-4 in the regression of ût on ût-4, for all t = 5, ..., n.
When the xtj are not strictly exogenous, the regression of ût on the explanatory variables and the lagged residual can be used, with ût-4 replacing ût-1.
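The AR(1) t-test and the Durbin-Watson statistic described above can be sketched as follows. This assumes strictly exogenous regressors and runs the residual regression without an intercept; the simulated series is only a stand-in for OLS residuals from a real regression, and all names are illustrative.

```python
import math
import random

def ar1_residual_test(residuals):
    """Regress u_t on u_{t-1} (no intercept); return (rho_hat, t statistic)."""
    u = residuals[1:]
    u_lag = residuals[:-1]
    sxx = sum(v * v for v in u_lag)
    rho_hat = sum(a * b for a, b in zip(u, u_lag)) / sxx
    errors = [a - rho_hat * b for a, b in zip(u, u_lag)]
    sigma2 = sum(e * e for e in errors) / (len(u) - 1)
    return rho_hat, rho_hat / math.sqrt(sigma2 / sxx)

def durbin_watson(residuals):
    """DW = sum of squared changes in residuals / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / sum(v * v for v in residuals)

# Stand-in residuals with AR(1) correlation of 0.6 (made-up illustration).
rng = random.Random(5)
resid, prev = [], 0.0
for _ in range(2000):
    prev = 0.6 * prev + rng.gauss(0.0, 1.0)
    resid.append(prev)

rho_hat, t_stat = ar1_residual_test(resid)
dw = durbin_watson(resid)
print(round(rho_hat, 3), round(t_stat, 1), round(dw, 3))
```

The DW statistic is approximately 2(1 − ρ̂), so values well below 2 point to positive serial correlation.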
Correcting serial correlation: quasi-differencing, GLS, FGLS, Prais-Winsten, differencing
Quasi-Differencing & Feasible GLS (FGLS): Estimate ρ (the AR(1) correlation parameter) and use it to transform the variables, then apply OLS on the transformed equation.
This involves the Cochrane-Orcutt (CO) or Prais-Winsten (PW) procedure. CO drops the first time period, while PW keeps it by using a transformed first observation; the two are asymptotically equivalent but can differ in small samples.
Differencing: Can eliminate serial correlation, especially with highly persistent data.
First-differencing has the advantage of turning an integrated time series process into a weakly dependent process.
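The quasi-differencing step can be sketched as a single transformation applied to the dependent variable and to each regressor. This is an illustration, assuming ρ has already been estimated (for example from the residual regression); the function name and the keep_first flag are hypothetical.

```python
import math

def quasi_difference(series, rho, keep_first=False):
    """Return x_t - rho * x_{t-1} for t = 2, ..., n.

    With keep_first=True, the first observation is retained as
    sqrt(1 - rho^2) * x_1 (Prais-Winsten style); otherwise it is
    dropped (Cochrane-Orcutt style).
    """
    transformed = [series[t] - rho * series[t - 1]
                   for t in range(1, len(series))]
    if keep_first:
        transformed.insert(0, math.sqrt(1.0 - rho ** 2) * series[0])
    return transformed

y = [1.0, 2.0, 4.0]
print(quasi_difference(y, 0.5))                   # Cochrane-Orcutt style
print(quasi_difference(y, 0.5, keep_first=True))  # Prais-Winsten style
print(quasi_difference(y, 1.0))                   # rho = 1 is first differencing
```

The intercept must be transformed in the same way: the constant column becomes (1 − ρ) for t ≥ 2 and sqrt(1 − ρ^2) for the first observation when it is kept.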
Serial correlation robust standard errors
These standard errors account for serial correlation without needing to specify the form of serial correlation.
The serial correlation-robust standard errors are sometimes called heteroskedasticity and autocorrelation consistent, or HAC, standard errors.
The approach is to compute standard errors, confidence intervals, and test statistics that are valid in large samples under the weakest set of assumptions.
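A serial correlation-robust standard error can be sketched for the slope in a simple regression. The version below uses Bartlett (Newey-West) weights with a user-chosen truncation lag g; it is an illustrative simplification to one regressor, and the data values are made up.

```python
import math

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def newey_west_se_slope(x, y, g):
    """HAC standard error of the slope with Bartlett weights and lag g."""
    n = len(x)
    xbar = sum(x) / n
    _, _, u = ols_simple(x, y)
    r = [xi - xbar for xi in x]  # residuals from regressing x on a constant
    score = [ri * ui for ri, ui in zip(r, u)]
    v = sum(s * s for s in score)
    for h in range(1, g + 1):
        weight = 1.0 - h / (g + 1.0)
        v += 2.0 * weight * sum(score[t] * score[t - h] for t in range(h, n))
    return math.sqrt(v) / sum(ri * ri for ri in r)

x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_demo = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.4]
print(newey_west_se_slope(x_demo, y_demo, g=0))  # heteroskedasticity-robust only
print(newey_west_se_slope(x_demo, y_demo, g=2))  # also allows serial correlation
```

With g = 0 the formula reduces to a heteroskedasticity-robust standard error without a degrees-of-freedom adjustment; larger g allows correlation between errors up to g periods apart.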
Does heteroskedasticity or serial correlation cause bias?
As long as the explanatory variables are strictly exogenous, the β̂j are unbiased, regardless of the degree of serial correlation in the errors. This is analogous to the observation that heteroskedasticity in the errors does not cause bias in the β̂j.
Chapter 11 relaxed the strict exogeneity assumption to E(ut|xt) = 0 and showed that, when the data are weakly dependent, the β̂j are still consistent (although not necessarily unbiased). This result did not hinge on any assumption about serial correlation in the errors.
TS.3 rules out misspecifications such as omitted variables and certain kinds of measurement error, while TS.5 rules out serial correlation in the errors. It is important to remember that serially correlated errors cause problems that adjustments for heteroskedasticity are not able to address.
What is an ARCH model?
ARCH stands for Autoregressive Conditional Heteroskedasticity.
It models dynamic heteroskedasticity, where the conditional variance of the error term depends on past squared errors. The first-order ARCH model is E(ut^2 | ut-1, ut-2, …) = E(ut^2 | ut-1) = α0 + α1(ut-1)^2.
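An ARCH(1) error process can be simulated to see what the model implies: the errors themselves are serially uncorrelated, but their squares are positively correlated. The sketch below assumes normal innovations and the illustrative values α0 = 1 and α1 = 0.3, for which the unconditional variance is α0/(1 − α1).

```python
import math
import random

def simulate_arch1(alpha0, alpha1, n, seed=0, burn_in=500):
    """u_t = sqrt(alpha0 + alpha1 * u_{t-1}^2) * v_t with v_t ~ N(0, 1)."""
    rng = random.Random(seed)
    u_prev, out = 0.0, []
    for t in range(n + burn_in):
        cond_var = alpha0 + alpha1 * u_prev ** 2  # conditional variance
        u_prev = math.sqrt(cond_var) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(u_prev)
    return out

def sample_autocorr(x, h):
    """Sample autocorrelation of x at lag h."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    return numer / denom

u = simulate_arch1(alpha0=1.0, alpha1=0.3, n=50000)
u_sq = [v * v for v in u]
print("variance of u:", round(sum(u_sq) / len(u), 3))
print("lag-1 autocorr of u:", round(sample_autocorr(u, 1), 3))
print("lag-1 autocorr of u^2:", round(sample_autocorr(u_sq, 1), 3))
```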
Overview of Chapter 18 topics: unit root tests, forecasting, cointegration, infinite distributed lag models, vector autoregressive models, Granger causality, and forecasting AR(1) processes:
Unit Root Tests
Define a unit root process: A highly persistent time series process where the current value equals last period's value, plus a weakly dependent disturbance.
Dickey-Fuller (DF) Test: A t test of the unit root null hypothesis in an AR(1) model. A convenient equation for carrying out the unit root test is obtained by subtracting yt-1 from both sides of the AR(1) model (18.17), yt = α + ρyt-1 + et, and defining θ = ρ − 1: Δyt = α + θyt-1 + et. The null hypothesis is H0: θ = 0, and the alternative is H1: θ < 0.
How DF distribution compares to t: Under H0, yt-1 is I(1), and so the usual central limit theorem that underlies the asymptotic standard normal distribution for the t statistic does not apply: the t statistic does not have an approximate standard normal distribution even in large sample sizes. The asymptotic distribution of the t statistic under H0 has come to be known as the Dickey-Fuller distribution.
Augmented Dickey-Fuller Test: An extended version of the Dickey-Fuller test in which the regression is augmented with the lagged changes, Δyt-h. The inclusion of the lagged changes in (18.24) is intended to clean up any serial correlation in Δyt.
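The basic Dickey-Fuller regression can be sketched in a few lines. The code below estimates Δyt = α + θyt-1 + et by OLS and returns the t statistic on θ, which must be compared with a Dickey-Fuller critical value rather than a normal one. The value −2.86 is the commonly tabulated asymptotic 5% critical value for the case with an intercept and no time trend; it is supplied here as an assumption and should be checked against the table in your text. The simulated series are illustrative.

```python
import math
import random

DF_CRITICAL_5PCT = -2.86  # asymptotic, intercept and no time trend (assumed)

def simulate_ar1(rho, n, seed):
    """Simulate y_t = rho * y_{t-1} + e_t from y_0 = 0 with N(0, 1) shocks."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        y = rho * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path

def dickey_fuller(y):
    """OLS of delta_y_t on a constant and y_{t-1}; returns (theta_hat, t stat)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    y_lag = y[:-1]
    n = len(dy)
    lag_bar, dy_bar = sum(y_lag) / n, sum(dy) / n
    sxx = sum((v - lag_bar) ** 2 for v in y_lag)
    theta = sum((v - lag_bar) * (d - dy_bar) for v, d in zip(y_lag, dy)) / sxx
    alpha = dy_bar - theta * lag_bar
    ssr = sum((d - alpha - theta * v) ** 2 for v, d in zip(y_lag, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return theta, theta / se

theta_rw, t_rw = dickey_fuller(simulate_ar1(1.0, 500, seed=3))  # unit root
theta_st, t_st = dickey_fuller(simulate_ar1(0.5, 500, seed=3))  # stable AR(1)
print("random walk:", round(theta_rw, 3), round(t_rw, 2))
print("stable AR(1):", round(theta_st, 3), round(t_st, 2),
      "reject H0:", t_st < DF_CRITICAL_5PCT)
```

An augmented version would add lagged changes of y as extra regressors, which requires a multiple-regression routine and is omitted from this sketch.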
Forecasting
Terms:
Information Set: Contains information observed through time t-1.
Loss Function: Specifies how to measure the cost of forecast errors.
Point Forecast: A single-number forecast of a future outcome, as opposed to a forecast interval.
Conditional Forecast: Forecast that relies on hypothesized values of unknown, future explanatory variables.
Unconditional Forecast: Model yt as a function of past information observed at the time the forecast is needed.
Calculation of Forecasting Confidence Intervals: A forecast interval is obtained in exactly the same way as a prediction interval. If the model does not satisfy the classical linear model assumptions, the forecast interval is still approximately valid, provided ut given It-1 is normally distributed with zero mean and constant variance. The forecast error is:
ên+1 = yn+1 − f̂n [18.45]
where f̂n, the forecast of yn+1 made at time n, is usually called a point forecast.
Let se(f̂n) be the standard error of the forecast and let σ̂ be the standard error of the regression.
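The notes stop before combining these two quantities. Under the stated assumptions, the standard construction (the same one used for a prediction interval) adds the sampling variance of the forecast to the error variance. It is written out here as a worked formula with an approximate 95% interval; the multiplier 1.96 is the usual normal approximation.

```latex
\operatorname{se}(\hat e_{n+1})
  = \sqrt{\,[\operatorname{se}(\hat f_n)]^{2} + \hat\sigma^{2}\,},
\qquad
\text{approximate 95\% forecast interval: }
\hat f_n \pm 1.96 \cdot \operatorname{se}(\hat e_{n+1})
```

In large samples se(f̂n) is typically small relative to σ̂, so the interval width is driven mostly by the error variance.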
Cointegration
Define Cointegration: Exists if a linear combination of two I(1) variables is I(0). If yt and xt are I(1) but yt - xt is I(0), yt and xt cannot drift arbitrarily far apart.
Engle-Granger Test: Apply the Dickey-Fuller or augmented Dickey-Fuller test to the residuals, say, ût = yt − α̂ − β̂xt, from (18.31). The only difference is that the critical values account for estimation of β.
How EG test stat compares to t and DF: Testing for cointegration is more difficult when the (potential) cointegration parameter β is unknown. Rather than test for a unit root in {st}, where st = yt − βxt, we must first estimate β. If yt and xt are cointegrated, it turns out that the OLS estimator β̂ from the regression
yt = α̂ + β̂xt [18.31]
is consistent for β. The problem is that the null hypothesis states that the two series are not cointegrated, which means that, under H0, we are running a spurious regression. Fortunately, it is possible to tabulate critical values even when β is estimated: the Dickey-Fuller or augmented Dickey-Fuller test is applied to the residuals ût = yt − α̂ − β̂xt from (18.31), and the critical values account for estimation of β. The resulting test is called the Engle-Granger test, and the asymptotic critical values are given in Table 18.4. These are taken from Davidson and MacKinnon (1993, Table 20.2).
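The two steps of the Engle-Granger procedure can be sketched as follows: estimate the static regression by OLS, then run a Dickey-Fuller regression on its residuals. This is an illustration on simulated data that are cointegrated by construction. The 5% critical value of −3.34 is the commonly tabulated asymptotic value for the no-trend case and is supplied as an assumption; confirm it against Table 18.4.

```python
import math
import random

EG_CRITICAL_5PCT = -3.34  # asymptotic, no time trend (assumed; check the table)

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def df_t_on_residuals(u):
    """t statistic on theta in delta_u_t = theta * u_{t-1} + error."""
    du = [u[t] - u[t - 1] for t in range(1, len(u))]
    u_lag = u[:-1]
    sxx = sum(v * v for v in u_lag)
    theta = sum(v * d for v, d in zip(u_lag, du)) / sxx
    ssr = sum((d - theta * v) ** 2 for v, d in zip(u_lag, du))
    return theta / math.sqrt(ssr / (len(du) - 1) / sxx)

# Simulated example: x is a random walk and y = 1 + 2x + stationary noise.
rng = random.Random(7)
x, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

alpha_hat, beta_hat, resid = ols_simple(x, y)
eg_t = df_t_on_residuals(resid)
print(round(beta_hat, 3), round(eg_t, 2), "cointegrated:", eg_t < EG_CRITICAL_5PCT)
```

The Engle-Granger critical values are larger in magnitude than the ordinary Dickey-Fuller ones because OLS chooses β̂ to make the residuals look as stationary as possible.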
Infinite Distributed Lag (IDL) Models
Long Run Propensity (LRP): The long-run effect, or long-run propensity (LRP), is the sum of all coefficients on current and lagged values of z. It measures the long-run change in y given a permanent change in z.
Solving Geometric Distributed Lag Models: The geometric (or Koyck) distributed lag is a special case of an IDL. With the Koyck distributed lag, the coefficient on zt-h is δ^h α0, where δ is between zero and one. Therefore, the effect of zt-h on yt declines geometrically as the lag h increases.
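The geometric lag structure above can be solved in two steps: sum the coefficients to get the LRP, then subtract δ times the lagged equation to obtain an estimable model. In the worked derivation below the intercept α and the error ut are notational assumptions; the lag coefficients δ^h α0 follow the definition in these notes.

```latex
\begin{aligned}
y_t &= \alpha + \alpha_0 z_t + \alpha_0\delta z_{t-1}
       + \alpha_0\delta^{2} z_{t-2} + \dots + u_t, \qquad 0 < \delta < 1 \\[4pt]
\text{LRP} &= \alpha_0 \sum_{h=0}^{\infty} \delta^{h}
            = \frac{\alpha_0}{1-\delta} \\[4pt]
\delta y_{t-1} &= \delta\alpha + \alpha_0\delta z_{t-1}
       + \alpha_0\delta^{2} z_{t-2} + \dots + \delta u_{t-1} \\[4pt]
y_t - \delta y_{t-1} &= (1-\delta)\alpha + \alpha_0 z_t + u_t - \delta u_{t-1} \\[4pt]
y_t &= (1-\delta)\alpha + \alpha_0 z_t + \delta y_{t-1} + v_t,
       \qquad v_t = u_t - \delta u_{t-1}
\end{aligned}
```

For example, α0 = 0.5 and δ = 0.6 give an impact effect of 0.5 and an LRP of 0.5/0.4 = 1.25. Because vt contains ut-1, it is generally correlated with yt-1, so OLS on the last equation is not automatically consistent.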
Contemporaneous Exogeneity: The explanatory variables are contemporaneously exogenous, that is, E(ut|xt1, …, xtk) = 0.
Vector Autoregressive (VAR) Models
Basic Idea: A VAR model is a system of multiple time series equations where each variable is regressed on its own lags and the lags of all other variables in the system.
Granger Causality: A limited notion of causality where past values of one series (xt) are useful for predicting future values of another series (yt), after past values of yt have been controlled for.
Forecasting AR(1) at Time t+h
Standard Error of Forecast: The text does not provide a specific formula for the standard error of the forecast for |ρ| < 1 or ρ = 1 in yt = β0 + ρyt-1 + ut. However, it does mention that a forecast interval can be constructed in the same way as a prediction interval (see Section 6.4).
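Although the notes give no formula, the multi-step forecast and its error variance for an AR(1) follow from iterating the model forward. The sketch below treats β0, ρ, and σ as known, which is an assumption that ignores parameter estimation error; in that case the h-step forecast error variance is σ^2(1 + ρ^2 + … + ρ^(2(h−1))), which equals hσ^2 when ρ = 1. Function names are illustrative.

```python
import math

def ar1_forecast(y_last, beta0, rho, h):
    """h-step-ahead forecast of y for y_t = beta0 + rho * y_{t-1} + u_t."""
    forecast = y_last
    for _ in range(h):
        forecast = beta0 + rho * forecast  # iterate the conditional mean
    return forecast

def ar1_forecast_error_sd(sigma, rho, h):
    """Std. dev. of the h-step forecast error with known parameters."""
    return sigma * math.sqrt(sum(rho ** (2 * j) for j in range(h)))

for h in (1, 2, 4, 16):
    print(h,
          round(ar1_forecast(10.0, 1.0, 0.5, h), 3),
          round(ar1_forecast_error_sd(1.0, 0.5, h), 3),  # stable: levels off
          round(ar1_forecast_error_sd(1.0, 1.0, h), 3))  # unit root: sqrt(h)
```

In the stable case the forecast converges to the unconditional mean β0/(1 − ρ) and the error standard deviation approaches σ/sqrt(1 − ρ^2); with a unit root the uncertainty grows without bound as the horizon lengthens.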