Time Series Analysis and Forecasting
Stationarity
- A time series is stationary when it has neither a deterministic trend nor a stochastic trend.
- Deterministic trend: Data consistently increases or decreases over time.
- Stochastic trend: Data exhibits a random walk, fluctuating up and down unpredictably.
- Strict Stationarity: The joint distribution of the time series (y_1, y_2, …) does not change with time shifts. Difficult to assume in practice.
- Weak Stationarity: Easier to assume. Requires a constant mean and constant variance over time, and autocovariances that depend only on the lag.
- Unit root tests check for weak stationarity (by testing the null hypothesis of a unit root) without assuming a specific distribution.
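As a quick illustration (a minimal Stata sketch, not from the original notes; the seed and sample size are arbitrary), simulate a stationary white-noise series and a random walk, then apply the Dickey-Fuller test to each:

    * Sketch: unit root tests on simulated stationary vs. non-stationary series
    clear
    set obs 200
    set seed 12345
    gen t = _n
    tsset t
    gen white = rnormal(0,1)        // white noise: stationary
    gen walk  = sum(rnormal(0,1))   // running sum = random walk (stochastic trend)
    dfuller white                   // expect: unit-root null rejected
    dfuller walk                    // expect: unit-root null not rejected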
MA Process (Moving Average)
- A time series generated as a weighted combination of current and past random shocks.
- Pure random process (white noise) is notated as \epsilon_t.
- An MA process of order q follows the equation: y_t = \theta_0 \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + … + \theta_q \epsilon_{t-q}
- y_t is the sum of weighted random shocks over time.
- q represents the number of time periods to go back into the past.
- An MA process arises when short-term shocks affect the series for only a short period, without creating a persistent trend.
- The generated series is a moving average of shocks across time.
- MA processes possess autocorrelation due to the use of lagged random shock values.
- MA processes are stationary.
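A worked detail connecting the last two points (a standard result, stated here assuming \theta_0 = 1): for an MA(1) process y_t = \epsilon_t + \theta_1 \epsilon_{t-1} with shock variance \sigma^2, the autocorrelation at lag 1 is \rho(1) = \theta_1 / (1 + \theta_1^2), and \rho(k) = 0 for all k > 1. In general, the ACF of an MA(q) process cuts off after lag q, which is what makes the ACF useful for choosing q later.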
Real-World Examples
- Manufacturing (Quality Control):
- Error in a machine weighing candy packs.
- Additional weight added to packs due to machine error.
- Even after fixing the machine, leftover errors can persist in subsequent operations.
- Tomorrow's packs might contain new errors plus leftover errors from today (MA(1) process).
- Restaurant Sales:
- Social media influencers featuring restaurants can cause a sales increase.
- The increase is a random shock (\epsilon_t).
- The popularity lasts for a short time (e.g., one to two weeks).
- Weekly sales become a weighted combination of this week's shock and the shocks from the previous two weeks (an MA(2) process).
Generating Simulated Data (Stata)
- Set up a time series structure with 200 observations (e.g., set obs 200 followed by gen t = _n).
- Generate white noise (random errors) with the rnormal() function (e.g., gen e = rnormal(0,1) for a standard normal distribution).
- Declare the time variable (tsset t) so that lagged values of the error term are available (e.g., L.e for the lag of e).
- Simulate the MA process (e.g., gen y = e + 0.6*L.e for an MA(1) process).
- Plot the generated process and check the autocorrelation function (ACF) and the Augmented Dickey-Fuller test.
- Autocorrelation: the correlation between the current value and its lagged value of order one.
- Augmented Dickey-Fuller (ADF) test: the unit-root null is rejected, consistent with stationarity.
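Putting the steps above together (a minimal do-file sketch; the seed is arbitrary, and the coefficient 0.6 follows the example in the list):

    * Simulate and inspect an MA(1) process
    clear
    set obs 200
    set seed 12345
    gen t = _n
    tsset t                  // declare the time variable
    gen e = rnormal(0,1)     // white-noise shocks
    gen y = e + 0.6*L.e      // MA(1): y_t = e_t + 0.6*e_{t-1}
    tsline y                 // plot the series
    ac y                     // ACF: spike at lag 1, then cuts off
    dfuller y                // ADF: expect to reject the unit-root null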
AR Process (Autoregressive Model)
- Time series is a function of its own past values.
- An AR process of order p follows the equation: y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + … + \rho_p y_{t-p} + \epsilon_t, where \epsilon_t is the random shock from the same period.
- y_t is a weighted sum of its own past values plus the current shock.
- Past random shocks do not enter the equation directly (unlike in MA processes).
- GDP growth rate, inflation rate often follow AR processes.
- Similar to sticky prices in macroeconomics.
- Each value remembers and tries to stick to the previous one, but with some decay.
- Autocorrelation exists.
- Not automatically stationary: an AR process is stationary only if it has no unit roots (for AR(1), |\rho| < 1).
- A unit root (random walk) arises from an AR(1) process when \rho = 1.
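Why \rho = 1 breaks stationarity (a standard derivation, added for completeness): for an AR(1) process y_t = \rho y_{t-1} + \epsilon_t with Var(\epsilon_t) = \sigma^2 and |\rho| < 1, a constant variance must satisfy Var(y_t) = \rho^2 Var(y_t) + \sigma^2, which gives Var(y_t) = \sigma^2 / (1 - \rho^2). As \rho approaches 1 this diverges, and at \rho = 1 the variance grows with t, so no constant variance exists and the process cannot be weakly stationary.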
Possible Examples
- Stock returns: today's return resembles recent past values, unless there is a significant shock.
- Inflation rates: related to sticky prices; prices rarely change sharply over a short period.
- Unemployment Rate: Stays around the natural rate of unemployment.
- Environment: Temperature anomalies can be modeled as AR(1) or AR(2) processes.
- Audio signals.
Generating an AR(1) Process (Stata)
- Setup time series structure.
- Generate three types of AR(1) processes with low, high, and unit persistence.
- Generate the error term (random shocks).
- Create variables for each AR process with different persistence levels.
- Example: an AR(1) process with low persistence (\rho = 0.3):
gen e1 = rnormal(0,1)
gen y1 = .
replace y1 = e1 in 1    // first value
- Loop to generate subsequent values (Stata does not allow subscripts on the left-hand side, so the in qualifier is used):
forvalues i = 2/200 {
    replace y1 = 0.3*y1[_n-1] + e1 in `i'
}
- AR(1) with high persistence (\rho=0.95) and random walk (\rho=1) follow similar logic.
- Plot the processes to visualize differences.
- ACF plots show autocorrelation.
- Augmented Dickey-Fuller test confirms stationarity (or non-stationarity) empirically.
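A consolidated do-file sketch for the three processes (the seed is arbitrary; coefficients 0.3, 0.95, and 1 as above; a single shock series e is reused so the three paths are comparable):

    * Simulate AR(1) processes with low, high, and unit persistence
    clear
    set obs 200
    set seed 12345
    gen t = _n
    tsset t
    gen e = rnormal(0,1)
    foreach v in y_low y_high y_walk {
        gen `v' = .
        replace `v' = e in 1                        // initialize with the first shock
    }
    forvalues i = 2/200 {
        replace y_low  = 0.30*y_low[_n-1]  + e in `i'   // rho = 0.3
        replace y_high = 0.95*y_high[_n-1] + e in `i'   // rho = 0.95
        replace y_walk = 1.00*y_walk[_n-1] + e in `i'   // rho = 1 (random walk)
    }
    tsline y_low y_high y_walk       // persistence visibly increases
    dfuller y_low                    // expect rejection (stationary)
    dfuller y_walk                   // expect non-rejection (unit root)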
ARMA Process
- Combination of AR and MA processes.
- Past values and past random shocks both affect the value today.
- Mathematical representation:
- y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + … + \rho_p y_{t-p} + \theta_0 \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + … + \theta_q \epsilon_{t-q}
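An ARMA(1,1) can be simulated by combining the two mechanisms (a sketch; the coefficients \rho = 0.7 and \theta_1 = 0.4 are arbitrary, and the first observation is initialized with the shock alone):

    * Simulate an ARMA(1,1): y_t = 0.7*y_{t-1} + e_t + 0.4*e_{t-1}
    clear
    set obs 200
    set seed 12345
    gen t = _n
    tsset t
    gen e = rnormal(0,1)
    gen y = .
    replace y = e in 1
    forvalues i = 2/200 {
        replace y = 0.7*y[_n-1] + e + 0.4*e[_n-1] in `i'
    }
    tsline y                 // AR persistence plus short-lived MA shocks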
When are ARMA Models Useful?
- The series is stationary.
- The series' dynamics are not well described by an AR or MA model alone.
- Share prices: Stickiness (AR) and shocks to the company (MA).
ARIMA Process
- A more general version of ARMA.
- Takes non-stationary series into account.
- AutoRegressive Integrated Moving Average.
- Includes autoregressive features, moving average components, and order of integration.
- Notation: ARIMA(p, d, q), where:
- p = lag order for AR components
- d = order of integration (number of times differenced to achieve stationarity)
- q = lag order for MA components
- If the series is nonstationary, difference it until it becomes stationary.
- Then, fit the differenced series with an ARMA model.
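In Stata terms (a sketch assuming a tsset series named y that starts out non-stationary; the lag orders are illustrative):

    * Difference until stationary, then fit ARMA on the differenced series
    dfuller y                // assumed: fails to reject (unit root present)
    gen dy = D.y             // first difference (d = 1)
    dfuller dy               // assumed: rejects (differenced series stationary)
    arima dy, ar(1) ma(1)    // ARMA(1,1) on the differenced series
    arima y, arima(1,1,1)    // equivalent: ARIMA(1,1,1) fit directly on levels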
Extensions to ARIMA Processes
- ARIMAX: Includes exogenous variables (e.g., interest rates, policy changes).
- SARIMA: Seasonal ARIMA; takes into account seasonal components.
- SARIMAX: Seasonal ARIMA with exogenous variables.
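In Stata, these extensions map onto the same arima command (a sketch; the series y, the exogenous variable intrate, and the quarterly frequency are hypothetical):

    * ARIMAX: exogenous regressors go before the comma
    arima y intrate, arima(1,1,1)
    * SARIMA: multiplicative seasonal terms via the sarima() option
    arima y, arima(1,1,1) sarima(1,1,1,4)          // s = 4 for quarterly data
    * SARIMAX: both together
    arima y intrate, arima(1,1,1) sarima(1,1,1,4)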
ARIMA Modeling and Forecasting: Box-Jenkins Methodology
- Developed by George Box and G.M. Jenkins in the 1970s.
- A procedural approach for identifying, specifying, estimating, and diagnosing ARIMA models.
- Relies on autocorrelations and partial autocorrelations to determine lag orders.
- Box quote: "All models are wrong, but some are useful."
- Modeling is never perfect, so aim for parsimony (simpler models).
Process of Statistical Modeling
- Model Identification:
- Initial Check: Determine stationarity through visual inspection and ADF tests. Apply differencing if needed to get to a stationary level.
- Specify the differencing (d).
- Choose p and q using the ACF and PACF: the ACF identifies MA terms (q); the PACF identifies AR terms (p).
- Estimation: Estimate the candidate models suggested by the ACF and PACF.
- Diagnostics: Check that the residuals resemble white noise.
- Forecasting: Use the chosen model to predict future values.
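A compact sketch of the identification and comparison steps (assuming a tsset, already-stationary series y; the candidate orders are illustrative):

    * Identification: read lag orders off the correlograms
    ac y                     // ACF pattern suggests q (MA order)
    pac y                    // PACF pattern suggests p (AR order)
    * Estimation: compare candidate models by information criteria
    arima y, ar(1)
    estat ic                 // AIC/BIC for AR(1)
    arima y, ar(1) ma(1)
    estat ic                 // prefer the more parsimonious model if close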
Implementing ARIMA Forecasting in Stata
- Use a Stata (.dta) dataset of Philippine GDP growth rates from 1961 to 2023.
- Step 1: Initial check with tsline to visualize the series and the Augmented Dickey-Fuller test for stationarity.
- Step 2: (If needed) Apply a first difference if the series is non-stationary.
- Step 3: Model identification using the ACF and PACF to choose p and q.
- Estimate candidate ARIMA models using the arima command: arima gdpgr, ar(1) ma(1) for an ARMA(1,1).
- Save estimated models using estimates save.
- Step 4: Diagnostics (residual analysis) using estimates use and predict.
- Step 5: Forecasting using the chosen model.
- Ask Stata to add more time periods (e.g., tsappend, add(10)).
- Use predict to obtain predicted values of the growth rate.
- Create a forecasting model (e.g., forecast create).
- Add the saved estimates and forecast over the existing sample first (e.g., forecast estimates); this creates partially fitted values.
- Solve for the remaining years (e.g., forecast solve).
- Plot actual vs. forecasted values. The forecast settles around a roughly constant growth rate, since the series was stationary to begin with.
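A consolidated sketch of Steps 1 through 5 (the variable gdpgr and the 1961 to 2023 sample follow the notes; the file name, candidate orders, and forecast horizon are assumptions):

    * Step 1: visualize and test for stationarity
    use gdp_growth.dta, clear     // hypothetical file name
    tsset year
    tsline gdpgr
    dfuller gdpgr                 // stationary here, so no differencing (d = 0)
    * Step 3: identify and estimate candidates
    ac gdpgr
    pac gdpgr
    arima gdpgr, ar(1) ma(1)      // candidate ARMA(1,1)
    estimates save arma11, replace
    * Step 4: residual diagnostics
    predict res, residuals
    wntestq res                   // residuals should resemble white noise
    * Step 5: extend the sample and forecast
    tsappend, add(10)             // add 10 future years
    estimates use arma11
    predict yhat, y               // fitted values over the existing sample
    predict ydyn, y dynamic(2024) // dynamic forecast beyond the sample
    tsline gdpgr yhat ydyn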