
Time Series Analysis and Forecasting

Stationarity

  • A time series is stationary when it has neither a deterministic trend nor a stochastic trend.
  • Deterministic trend: Data consistently increases or decreases over time.
  • Stochastic trend: Data exhibits a random walk, fluctuating up and down unpredictably.
  • Strict Stationarity: The joint distribution of the time series (e.g., y_1, y_2, …) does not change with time shifts. Difficult to assume in practice.
  • Weak Stationarity: Easier to assume. Requires a constant mean, a constant variance, and autocovariances that depend only on the lag, not on the point in time (written out after this list).
  • Unit root tests check for a stochastic trend (a unit root) without assuming a specific distribution, so they are used to assess weak stationarity.
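
Written out (standard definitions; \mu, \sigma^2, and \gamma_k denote the constant mean, variance, and lag-k autocovariance), weak stationarity requires:

    E[y_t] = \mu for all t
    Var(y_t) = \sigma^2 for all t
    Cov(y_t, y_{t-k}) = \gamma_k for all t (depends only on the lag k, not on t)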

MA Process (Moving Average)

  • A time series can be generated from an underlying random process.
  • A pure random process (white noise) is denoted \epsilon_t.
  • An MA process of order q follows the equation: y_t = \theta_0 \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + … + \theta_q \epsilon_{t-q}
  • y_t is the sum of weighted random shocks over time.
  • q represents the number of time periods to go back into the past.
  • An MA process arises when short-term shocks affect the series for only a few periods, without creating a persistent trend.
  • The generated series is a weighted moving average of shocks across time.
  • MA processes possess autocorrelation because lagged random shocks enter the equation (an MA(1) illustration follows this list).
  • MA processes are stationary.
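
As an illustration of the autocorrelation point (a standard result, not derived in the notes), for an MA(1) process y_t = \epsilon_t + \theta_1 \epsilon_{t-1}:

    \rho_1 = \theta_1 / (1 + \theta_1^2)
    \rho_k = 0 for k > 1

The ACF therefore cuts off after lag q, which is what the ACF-based identification later in the notes relies on.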

Real-World Examples

  • Manufacturing (Quality Control):
    • Error in a machine weighing candy packs.
    • Additional weight added to packs due to machine error.
    • Even after fixing the machine, leftover errors can persist in subsequent operations.
    • Tomorrow's packs might contain new errors plus leftover errors from today (MA(1) process).
  • Restaurant Sales:
    • Social media influencers featuring restaurants can cause a sales increase.
    • The increase is a random shock (\epsilon_t).
    • The popularity lasts for a short time (e.g., one to two weeks).
    • Weekly sales then reflect the current shock plus the lingering shocks from the previous two weeks (an MA(2) process).

Generating Simulated Data (Stata)

  • Set up a time series structure with 200 observations (e.g., set obs 200 followed by gen t = _n).
  • Generate white noise (random errors) using rnormal function (e.g., gen e = rnormal(0,1) for standard normal distribution).
  • Create lagged values of the error term (e.g., L.e for lag of e) after setting the time series (e.g., tsset t).
  • Simulate the MA process (e.g., gen y = e + 0.6*L.e for an MA(1) process).
  • Plot the generated process, then check the autocorrelation function (ACF) and run the Augmented Dickey-Fuller test (a consolidated sketch follows this list).
    • Autocorrelation: the correlation between the current value and its lagged value (e.g., at lag one).
    • Augmented Dickey-Fuller (ADF) test: the null hypothesis of a unit root is rejected, indicating stationarity.
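
A minimal consolidated sketch of these steps in Stata (the seed is an added assumption for reproducibility; the MA(1) coefficient 0.6 follows the example above):

    clear
    set seed 12345           // assumed seed for reproducibility
    set obs 200              // 200 observations
    gen t = _n               // time index
    tsset t                  // declare the time-series structure
    gen e = rnormal(0,1)     // white-noise shocks
    gen y = e + 0.6*L.e      // MA(1): current shock plus 0.6 times last period's shock
    tsline y                 // plot the simulated series
    ac y                     // ACF: only the first lag should stand out
    dfuller y                // ADF test: expect to reject the unit-root null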

AR Process (Autoregressive Model)

  • Time series is a function of its own past values.
  • An AR process of lag p follows the equation: y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + … + \rho_p y_{t-p} + \epsilon_t, where \epsilon_t is the random shock from the same period.
  • y_t is a weighted sum of its own past values plus the current shock.
  • Past random shocks do not enter the equation directly (unlike in an MA process).
  • GDP growth rates and inflation rates often follow AR processes.
  • Similar to sticky prices in macroeconomics.
  • Each value remembers and tries to stick to the previous one, but with some decay.
  • Autocorrelation exists.
  • Not automatically stationary: an AR process with a unit root is non-stationary.
  • A unit root (random walk) arises from an AR(1) process when \rho = 1 (see the note after this list).
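
A standard result that makes the unit-root point concrete (not derived in the notes): for an AR(1) process y_t = \rho y_{t-1} + \epsilon_t with Var(\epsilon_t) = \sigma^2,

    |\rho| < 1:  stationary, with Var(y_t) = \sigma^2 / (1 - \rho^2)
    \rho = 1:    random walk; starting from y_0 = 0, Var(y_t) = t\sigma^2 grows with t (non-stationary)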

Possible Examples

  • Stock Returns: Similar to past values, unless there's a significant shock.
  • Inflation Rates: Related to sticky prices; difficult to experience a very high change in price quickly.
  • Unemployment Rate: Stays around the natural rate of unemployment.
  • Environment: Temperature anomalies can be modeled as AR(1) or AR(2) processes.
  • Audio signal.

Generating an AR(1) Process (Stata)

  • Set up the time series structure.
  • Generate three types of AR(1) processes with low, high, and unit persistence.
  • Generate the error term (random shocks).
  • Create variables for each AR process with different persistence levels.
  • Example. AR(1) process with low persistence:
    • gen e1 = rnormal(0,1)
    • gen y1 = .
    • replace y1 = e1 in 1 (first value)
    • Loop to generate subsequent values:
    • forvalues i = 2/200 {
    •     replace y1 = 0.3*y1[_n-1] + e1 in `i'
    • }
  • AR(1) with high persistence (\rho = 0.95) and random walk (\rho = 1) follow similar logic (see the consolidated sketch after this list).
  • Plot the processes to visualize differences.
  • ACF plots show autocorrelation.
  • Augmented Dickey-Fuller test confirms stationarity (or non-stationarity) empirically.
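
A minimal consolidated sketch in Stata for all three processes (the variable names y2 and y3, the shared shock series, and the seed are illustrative assumptions):

    clear
    set seed 12345                             // assumed seed for reproducibility
    set obs 200
    gen t = _n
    tsset t
    gen e1 = rnormal(0,1)                      // random shocks
    gen y1 = .                                 // low persistence (rho = 0.3)
    gen y2 = .                                 // high persistence (rho = 0.95)
    gen y3 = .                                 // random walk (rho = 1)
    replace y1 = e1 in 1
    replace y2 = e1 in 1
    replace y3 = e1 in 1
    forvalues i = 2/200 {
        replace y1 = 0.30*y1[_n-1] + e1 in `i'
        replace y2 = 0.95*y2[_n-1] + e1 in `i'
        replace y3 = 1.00*y3[_n-1] + e1 in `i'
    }
    tsline y1 y2 y3                            // visualize the differences in persistence
    ac y1                                      // ACF plot (repeat for y2 and y3)
    dfuller y1                                 // ADF test (repeat for y2 and y3; expect no rejection for y3)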

ARMA Process

  • Combination of AR and MA processes.
  • Past values and past random shocks both affect the value today.
  • Mathematical representation (a simulation sketch follows below):
  • y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + … + \rho_p y_{t-p} + \theta_0 \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + … + \theta_q \epsilon_{t-q}
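
Following the same simulation pattern as the AR(1) example above, a minimal sketch of an ARMA(1,1) series (the coefficients 0.5 and 0.4 and the name y4 are illustrative; it reuses the shocks e1 and the 200-observation time setup from that example):

    gen y4 = .
    replace y4 = e1 in 1
    forvalues i = 2/200 {
        replace y4 = 0.5*y4[_n-1] + e1 + 0.4*e1[_n-1] in `i'   // AR part plus current and lagged shock
    }
    tsline y4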

When are ARMA Models Useful?

  • If the series is stationary.
  • If the dynamics are not well described by an AR or MA model alone.
  • Share prices: Stickiness (AR) and shocks to the company (MA).

ARIMA Process

  • A more general version of ARMA.
  • Takes non-stationary series into account.
  • AutoRegressive Integrated Moving Average.
  • Includes autoregressive features, moving average components, and order of integration.
  • Notation: ARIMA(p, d, q), where:
    • p = lag order for AR components
    • d = order of integration (number of times differenced to achieve stationarity)
    • q = lag order for MA components
  • If the series is nonstationary, difference it until it becomes stationary.
  • Then fit the differenced series with an ARMA model (see the Stata illustration after this list).
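
To illustrate the d component in Stata (y is a placeholder series name; the orders are illustrative):

    arima y, arima(1,1,1)      // ARIMA(1,1,1): Stata applies one difference internally
    arima D.y, ar(1) ma(1)     // same idea done manually: difference first, then fit ARMA(1,1)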

Extensions to ARIMA Processes

  • ARIMAX: Includes exogenous variables (e.g., interest rates, policy changes).
  • SARIMA: Seasonal ARIMA; takes into account seasonal components.
  • SARIMAX: Seasonal ARIMA with exogenous variables. (Sketches of these specifications in Stata follow this list.)
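
Hedged sketches of how these extensions are commonly specified with Stata's arima command (y, x1, and x2 are hypothetical variable names; the seasonal period 12 assumes monthly data):

    arima y x1 x2, ar(1) ma(1)                   // exogenous regressors with ARMA errors (ARIMAX-style)
    arima y, arima(1,1,1) sarima(1,1,1,12)       // seasonal ARIMA (SARIMA)
    arima y x1, arima(1,1,1) sarima(1,1,1,12)    // seasonal ARIMA with an exogenous variable (SARIMAX-style)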

ARIMA Modeling and Forecasting: Box-Jenkins Methodology

  • Developed by George Box and G.M. Jenkins in the 1970s.
  • A procedural approach for identifying, specifying, estimating, and diagnosing ARIMA models.
  • Relies on autocorrelations and partial autocorrelations to determine lag orders.
  • Box quote: "All models are wrong, but some are useful."
    • Modeling is never perfect, so aim for parsimony (prefer simpler models).

Process of Statistical Modeling

  • Model Identification:
    • Initial Check: Determine stationarity through visual inspection and ADF tests. Apply differencing if needed to get to a stationary level.
    • Specify the order of differencing (d).
    • Choose p and q using the ACF and PACF: the ACF identifies the MA order (q); the PACF identifies the AR order (p).
  • Estimation: Estimate candidate models based on ACF and PACF:
    • Model downwards: start from the larger candidate specifications and simplify toward a parsimonious model.
  • Diagnostics: check that the residuals behave like white noise.
  • Forecasting. (A Stata sketch of the identification, estimation, and diagnostic steps follows this list.)
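
A hedged sketch of these steps in Stata (y is a placeholder series name; the candidate orders are illustrative):

    dfuller y                        // identification: check for a unit root; difference if needed
    ac y                             // ACF: suggests candidate MA orders (q)
    pac y                            // PACF: suggests candidate AR orders (p)
    arima y, ar(1) ma(1)             // estimate a larger candidate, e.g., ARMA(1,1)
    estat ic                         // record AIC/BIC
    arima y, ar(1)                   // simpler candidate, e.g., AR(1)
    estat ic                         // prefer the parsimonious model with lower AIC/BIC
    predict r, residuals             // diagnostics for the chosen model
    wntestq r                        // portmanteau test: residuals should look like white noise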

Implementing ARIMA Forecasting in Stata

  • Use a Stata (.dta) dataset of Philippine GDP growth rates from 1961 to 2023.
  • Step 1: Initial check with tsline to visualize and Augmented Dickey-Fuller test for stationarity.
  • Step 2: (If needed) Apply first difference if non-stationary.
  • Step 3: Model identification using ACF and PACF to choose p and q.
    • Example: corrgram gdpgr.
  • Estimate candidate ARIMA models using the arima command:
    • arima gdpgr, ar(1) ma(1) (ARMA(1,1))
  • Save estimated models using estimates save.
  • Step 4: Diagnostics (residuals analysis) using estimates use and predict commands.
  • Step 5: Forecasting using the chosen model.
    • Ask Stata to add more time periods (e.g., with tsappend, add()).
    • Use predict to obtain predicted values for the growth rate.
    • Create a forecast model (forecast create).
    • Add the estimated equation to the forecast model (forecast estimates); over the existing sample this produces partially fitted values.
    • Solve the model for the remaining years (forecast solve).
  • Plot actual vs. forecasted values. The forecast flattens toward the series mean, as expected for a series that was stationary to begin with. (A consolidated sketch follows this list.)
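
A hedged end-to-end sketch of steps 4 and 5 (the stored-estimates name arma11, the forecast-model name gdpfc, the f_ prefix, and the 10 added periods are illustrative assumptions, not from the notes):

    arima gdpgr, ar(1) ma(1)            // chosen model from the identification step
    estimates store arma11              // keep the estimates in memory (the notes use estimates save to write them to disk)
    predict res, residuals              // step 4: residual diagnostics
    wntestq res                         // residuals should resemble white noise
    tsappend, add(10)                   // step 5: extend the dataset by 10 extra periods
    forecast create gdpfc, replace      // create a forecast model
    forecast estimates arma11           // add the stored ARIMA equation
    forecast solve, prefix(f_)          // solve: fitted values in-sample, forecasts for the added years
    tsline gdpgr f_gdpgr                // plot actual vs. forecasted values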