'Stationary' series (in the context of time series)
A series whose statistical properties, such as the mean and variance, do not change over time.
What is the primary purpose of 'differencing' a time series in ARIMA modeling?
To stabilize the mean of the series by removing trends, thereby making it stationary.
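A minimal sketch of first differencing, assuming a made-up series with a linear trend (the values and the helper name `difference` are illustrative only):

```python
def difference(series, lag=1):
    """Return the lag-differenced series y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series with a linear trend: y_t = 5 + 2t
trended = [5 + 2 * t for t in range(10)]
diffed = difference(trended)

# Every change equals the slope, so the differenced series has a stable mean.
print(diffed)
```

One observation is lost per differencing step, which is why the differenced list is shorter than the input.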
In an ARIMA model, the 'Integrated' (I) component represents
The number of differencing steps required to make the time series data stationary.
Random Walk
The _____ model is a specific case where the differenced series is white noise with a zero mean, often used as the basis for the naïve forecasting method.
Formula: Random Walk with Drift
$y_t = c + y_{t-1} + \epsilon_t$, where $c$ is the average change between consecutive observations.
The series will show a long-term upward trend or drift.
In a Random Walk with Drift model, what happens to the series if the constant $c > 0$?
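The drift formula can be simulated directly. This is a sketch under assumed values (Gaussian errors, $c = 0.5$, a fixed seed); the function name `random_walk_with_drift` is hypothetical:

```python
import random

def random_walk_with_drift(n, c, sigma=1.0, y0=0.0, seed=0):
    """Simulate y_t = c + y_{t-1} + e_t with Gaussian errors e_t."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n):
        y.append(c + y[-1] + rng.gauss(0.0, sigma))
    return y

path = random_walk_with_drift(n=500, c=0.5)

# The average change between consecutive observations estimates c,
# so a positive c shows up as a long-run upward drift.
avg_change = (path[-1] - path[0]) / (len(path) - 1)
print(round(avg_change, 3))
```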
Autoregressive Model (AR)
A model that predicts the current value of a variable using a linear combination of its own previous (lagged) values.
What is the mathematical condition for an $AR(1)$ model to be considered stationary?
The absolute value of the parameter must be less than 1: $|\phi_1| < 1$.
The values tend to oscillate between positive and negative signs.
How does the series behave in an $AR(1)$ model when the parameter $\phi_1$ is negative?
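To see the sign-flipping behaviour in isolation, the sketch below drops the error term and iterates $y_t = \phi_1 y_{t-1}$ with an assumed $\phi_1 = -0.8$. A real AR(1) series also has noise, so the alternation is a tendency rather than a strict rule.

```python
def ar1_path(phi, y0, steps):
    """Noise-free AR(1) recursion y_t = phi * y_{t-1}."""
    y = [y0]
    for _ in range(steps):
        y.append(phi * y[-1])
    return y

path = ar1_path(phi=-0.8, y0=1.0, steps=6)
signs = [1 if value > 0 else -1 for value in path]

print(signs)  # the sign flips at every step while the magnitude shrinks
```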
Moving Average Model (MA)
A model that uses past forecast errors as predictors for the current value of the series.
What distinguishes the Partial Autocorrelation Function (PACF) from the Autocorrelation Function (ACF)?
PACF measures the correlation between $y_t$ and $y_{t-k}$ after removing the effects of intermediate lags, whereas ACF measures the overall correlation.
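One way to make the difference concrete is to convert autocorrelations into partial autocorrelations with the Durbin-Levinson recursion. The sketch below assumes a theoretical AR(1) process with $\phi_1 = 0.6$, whose ACF at lag $k$ is $0.6^k$; the function name is hypothetical.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho[k] is the autocorrelation at lag k (rho[0] == 1).
    Returns the partial autocorrelations for lags 1..len(rho) - 1.
    """
    pacf = []
    prev = []  # coefficients of the order-(k-1) autoregression
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

phi = 0.6
acf = [phi ** k for k in range(4)]   # lags 0..3 of an AR(1)
pacf = pacf_from_acf(acf)

# The ACF decays gradually, but the PACF is (numerically) zero after lag 1,
# because lag 2 adds nothing once lag 1 is accounted for.
print(acf[1:], pacf)
```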
The number of observations per year or the length of the seasonal cycle (e.g., 4 for quarterly, 12 for monthly).
In $SARIMA(p,d,q)(P,D,Q)_m$, the subscript $m$ represents…
It includes a correction term to prevent overfitting when the sample size $T$ is small relative to the number of parameters.
Why is the Corrected Akaike Information Criterion ($AIC_c$) preferred over standard $AIC$ in some scenarios?
$BIC$ (Bayesian Information Criterion) imposes a stricter penalty for adding parameters.
Which information criterion penalizes model complexity more heavily, $AIC$ or $BIC$?
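The cards do not give formulas, so the sketch below assumes the common generic forms $AIC = -2\log L + 2k$, $AIC_c = AIC + \frac{2k(k+1)}{T-k-1}$ and $BIC = -2\log L + k\log T$. Textbooks count $k$ slightly differently, and the log-likelihood value here is made up.

```python
import math

def information_criteria(log_likelihood, k, T):
    """Generic AIC, AICc and BIC for k estimated parameters and T observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (T - k - 1)   # small-sample correction
    bic = -2.0 * log_likelihood + k * math.log(T)    # penalty grows with log(T)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

small = information_criteria(log_likelihood=-50.0, k=5, T=20)
large = information_criteria(log_likelihood=-50.0, k=5, T=2000)

print(small)
print(large)
```

Under these forms the $AIC_c$ correction matters at $T = 20$ and almost vanishes at $T = 2000$, while the $BIC$ penalty per parameter exceeds the $AIC$ penalty whenever $\log T > 2$.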
What is the key idea behind Simple Exponential Smoothing (SES)?
Forecasts are calculated as a weighted average of past observations, with weights decaying exponentially as the data gets older.
It gives much more weight to the most recent observation, making the forecast highly reactive to recent changes.
In SES, if the smoothing parameter $\alpha$ is close to 1, how does the model prioritize data?
It gives more weight to the distant past, resulting in a smoother forecast that ignores short-term fluctuations.
In SES, if the smoothing parameter $\alpha$ is close to 0, what is the effect on the forecast?
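The effect of $\alpha$ is easiest to see in the weights themselves: the observation $j$ steps back receives weight $\alpha(1-\alpha)^j$. A short sketch with two assumed values of $\alpha$:

```python
def ses_weights(alpha, n):
    """Weight on the observation j steps back: alpha * (1 - alpha)**j."""
    return [alpha * (1 - alpha) ** j for j in range(n)]

reactive = ses_weights(alpha=0.9, n=5)   # almost all weight on the latest value
smooth = ses_weights(alpha=0.1, n=5)     # weight spread far into the past

print([round(w, 4) for w in reactive])
print([round(w, 4) for w in smooth])
```

In both cases each weight is the previous one multiplied by $(1-\alpha)$, which is the geometric decay the method is named for.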
Trend (or slope)
Holt’s Linear Trend method extends SES by adding a second smoothing equation to track the _____ of the series.
What specific problem does the 'Damped Trend' method solve in long-term forecasting?
It prevents over-forecasting by gradually reducing the strength of a trend that might not continue indefinitely in reality.
The forecast converges to a constant horizontal level rather than increasing or decreasing indefinitely.
In the Damped Trend method, what happens to the forecast as the horizon $h$ approaches infinity?
What is the role of the $\phi$ parameter in a Damped Trend model?
It is the damping parameter (between 0 and 1) that determines how quickly the trend's impact fades over time.
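The damped trend forecast can be written as $\hat{y}_{T+h|T} = \ell_T + (\phi + \phi^2 + \dots + \phi^h)\, b_T$, which converges to $\ell_T + \phi b_T / (1 - \phi)$ as $h \to \infty$. The sketch below uses made-up values for the level, trend and $\phi$:

```python
def damped_forecast(level, trend, phi, h):
    """h-step-ahead damped trend forecast from the final level and trend."""
    damp = sum(phi ** i for i in range(1, h + 1))
    return level + damp * trend

level, trend, phi = 100.0, 2.0, 0.9       # hypothetical final states
limit = level + phi * trend / (1 - phi)   # long-run plateau

near = damped_forecast(level, trend, phi, h=1)
far = damped_forecast(level, trend, phi, h=500)
linear = damped_forecast(level, trend, phi=1.0, h=10)   # phi = 1: no damping

print(near, far, limit, linear)
```

Setting $\phi = 1$ recovers Holt's linear trend, whose forecast keeps growing by one full trend step per period.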
Holt-Winters Additive Method
An extension of Holt's method that captures seasonality by adding a seasonal component to the level and trend.
When should one use Multiplicative Holt-Winters instead of the Additive method?
When the seasonal variations change in proportion to the level of the series (e.g., seasonal swings get larger as the level increases).
Match the Holt-Winters parameter to its component: $\alpha$, $\beta$, $\gamma$.
$\alpha$ for Level, $\beta$ for Trend, and $\gamma$ for Seasonality.
In ETS model notation, such as $ETS(A, N, A)$, what does each letter stand for?
The type of Error (Additive), Trend (None), and Seasonality (Additive).
How are optimal smoothing parameters (like $\alpha$ and $\beta$) typically chosen in exponential smoothing software?
By using numerical optimization to minimize the Sum of Squared Errors ($SSE$).
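As a rough illustration of choosing $\alpha$ by minimizing the $SSE$, the sketch below runs a plain grid search for SES. It assumes a made-up series and fixes the initial level at the first observation; real software typically uses a proper optimizer and estimates the initial states too.

```python
def ses_sse(y, alpha, level0):
    """Sum of squared one-step-ahead forecast errors for SES."""
    level, sse = level0, 0.0
    for obs in y:
        error = obs - level            # forecast for this period is the old level
        sse += error ** 2
        level = level + alpha * error  # error-correction form of the SES update
    return sse

def best_alpha(y, grid_size=100):
    """Pick the alpha on a grid in (0, 1] with the smallest SSE."""
    candidates = [i / grid_size for i in range(1, grid_size + 1)]
    return min(candidates, key=lambda a: ses_sse(y, a, level0=y[0]))

y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]   # hypothetical data
alpha_hat = best_alpha(y)
print(alpha_hat, ses_sse(y, alpha_hat, level0=y[0]))
```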
In multiple regression, a coefficient $\beta_j$ represents
The marginal effect of predictor $x_j$ on the response variable $y$, holding all other predictors constant.
What is a major disadvantage of using a large number of 'Dummy Variables' to model seasonality in regression?
It can lead to overfitting, where the model fits random noise instead of the true underlying patterns.
Why might 'Fourier terms' be preferred over dummy variables for modeling long seasonal cycles?
They can capture complex seasonality using fewer parameters and allow for smoother transitions between periods.
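A sketch of how Fourier terms are built, assuming weekly data with $m = 52$ and $K = 3$ harmonics (both values, and the function name, are illustrative):

```python
import math

def fourier_terms(t, m, K):
    """Return [sin(2*pi*k*t/m), cos(2*pi*k*t/m)] for k = 1..K at time index t."""
    row = []
    for k in range(1, K + 1):
        angle = 2.0 * math.pi * k * t / m
        row.extend([math.sin(angle), math.cos(angle)])
    return row

m, K = 52, 3
X = [fourier_terms(t, m, K) for t in range(2 * m)]   # two seasonal cycles

n_fourier = 2 * K   # predictors needed with Fourier terms
n_dummy = m - 1     # predictors needed with seasonal dummies
print(n_fourier, n_dummy)
```

The rows repeat exactly every $m$ periods, so the regression sees a smooth periodic pattern described by 6 columns instead of 51.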
White Noise
In time series regression, the error term $\epsilon_t$ is assumed to be _____, meaning it has no autocorrelation.
What is the relationship between the number of differencing steps $d$ and the size of prediction intervals?
The higher the value of $d$, the more rapidly the prediction intervals increase in size over the forecast horizon.
What is the general stationarity condition for AR models regarding 'complex roots'?
All roots (real or complex) of the characteristic polynomial must lie outside the unit circle in the complex plane.
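For an $AR(2)$ model the characteristic polynomial is $1 - \phi_1 z - \phi_2 z^2$, so the root condition can be checked with the quadratic formula. The coefficient values below are made up for illustration:

```python
import cmath

def ar2_roots(phi1, phi2):
    """Roots of 1 - phi1*z - phi2*z**2 = 0 (requires phi2 != 0)."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return [(phi1 + disc) / (-2.0 * phi2), (phi1 - disc) / (-2.0 * phi2)]

def is_stationary_ar2(phi1, phi2):
    """Stationary when every root has modulus greater than 1."""
    return all(abs(z) > 1.0 for z in ar2_roots(phi1, phi2))

print(ar2_roots(0.5, -0.3), is_stationary_ar2(0.5, -0.3))   # complex pair
print(ar2_roots(0.5, 0.6), is_stationary_ar2(0.5, 0.6))     # one root inside
```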
How does the 'Integrated' component in ARIMA relate to a Random Walk model?
An $ARIMA(0, 1, 0)$ model is equivalent to a Random Walk model.
Why is an ARIMA model often described as 'hard to interpret' compared to decomposition methods?
Because it models the data based on complex mathematical lags rather than visible structures like explicit trend or seasonality.
In Information Criteria, the term $L$ represents
The Likelihood of the data, which measures how well the model parameters fit the observed values.
Under what condition is $BIC$ specifically recommended over $AIC$ for model selection?
When working with very large datasets and many potential models to avoid selecting overly complex models.
What is the specific goal of 'Portmanteau tests' (like Ljung-Box) in the ARIMA modeling procedure?
To check if the residuals of the chosen model behave like white noise, indicating the model has captured all available information.
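The Ljung-Box statistic is $Q = T(T+2)\sum_{k=1}^{\ell} \frac{r_k^2}{T-k}$, where $r_k$ is the lag-$k$ autocorrelation of the residuals. The sketch below computes only $Q$, on a deliberately non-random alternating series chosen for illustration; turning $Q$ into a p-value needs a chi-square distribution, which the standard library does not provide.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box_q(residuals, max_lag):
    """Q = T (T + 2) * sum over k = 1..max_lag of r_k**2 / (T - k)."""
    T = len(residuals)
    return T * (T + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (T - k) for k in range(1, max_lag + 1)
    )

alternating = [1.0, -1.0] * 50   # strongly autocorrelated, not white noise
q_stat = ljung_box_q(alternating, max_lag=10)
print(q_stat)
```

A large $Q$ like this one corresponds to a tiny p-value, signalling leftover structure; residuals that behave like white noise give a small $Q$ and a high p-value.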
Simple Exponential Smoothing is considered the foundation for which two more advanced models?
Holt’s Linear Trend and Holt-Winters Seasonal models.
The smoothing equation for SES in component form
$\ell_t = \alpha y_t + (1 - \alpha) \ell_{t-1}$, where $\ell_t$ is the level at time $t$.
What is the impact of a 'damped trend' on long-run forecasts compared to a 'linear trend'?
Long-run forecasts for a damped trend converge to a constant, whereas a linear trend continues to increase or decrease indefinitely.
In the $ETS$ framework, what does 'N' stand for in a trend or seasonal component?
None (the component is not included in the model).
How does the $ETS(M, N, N)$ model treat errors differently than $ETS(A, N, N)$?
It uses multiplicative errors (relative/percentage-based) instead of additive errors (absolute values).
What does the 'backshift notation' $B$ do when applied to a variable $y_t$?
It shifts the data back one period in time ($B y_t = y_{t-1}$).
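A small sketch of the backshift operator on a list, using `None` for values that fall before the start of the sample (the data and the helper name `backshift` are illustrative):

```python
def backshift(series, power=1):
    """Apply B**power: position t holds y_{t-power}; early positions are None."""
    return [None] * power + series[: len(series) - power]

y = [3, 5, 8, 12, 17]
lagged = backshift(y)   # B y_t = y_{t-1}

# (1 - B) y_t = y_t - y_{t-1}, defined from the second observation onward
first_diff = [current - previous for current, previous in zip(y[1:], lagged[1:])]

print(lagged)
print(first_diff)
```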
Least Squares
In multiple regression, minimizing the sum of squared errors ($SSE$) is conceptually known as the _____ method.
$AIC_c$
Which information criterion is specifically corrected for small sample sizes $T$?
'Overfitting' (in the context of regression)
When a model is too complex and fits the random noise in the training data rather than the underlying pattern, leading to poor future predictions.
What are the three visible characteristics of a stationary series?
Roughly horizontal, constant variance, and no patterns predictable in the long term.
The weights decrease geometrically (exponentially).
What happens to the weights in Simple Exponential Smoothing as observations get older?
In $ARIMA(p, d, q)$, the parameter $p$ denotes
The order of the autoregressive part (number of lagged observations used as predictors).
In $ARIMA(p, d, q)$, the parameter $q$ denotes
The order of the moving average part (number of lagged forecast errors used as predictors).
Why do prediction intervals for ARIMA models often tend to be 'too narrow' in practice?
Because they usually do not account for uncertainty in parameter estimates or model selection.
$AIC$ (Akaike Information Criterion)
The _____ criterion is asymptotically equivalent to leave-one-out cross-validation for linear regression.
Additive Error, Additive Trend, and Additive Seasonality.
In the $ETS(A, A, A)$ model, the three 'A's represent _____
A straight line.
If $c = 0$ and $d = 2$ in an ARIMA model, the long-term forecasts follow _____
A quadratic trend.
If $c \ne 0$ and $d = 2$ in an ARIMA model, the long-term forecasts follow _____
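Both shapes can be checked with the simplest $d = 2$ model, $ARIMA(0, 2, 0)$, where $(1 - B)^2 y_t = c + \epsilon_t$ and so the point forecasts follow $\hat{y}_t = c + 2\hat{y}_{t-1} - \hat{y}_{t-2}$. The starting values below are made up:

```python
def arima_020_forecast(last_two, c, horizon):
    """Point forecasts for (1 - B)**2 y_t = c + e_t with future errors set to 0."""
    history = list(last_two)
    for _ in range(horizon):
        history.append(c + 2 * history[-1] - history[-2])
    return history[2:]

def second_differences(values):
    """y_t - 2*y_{t-1} + y_{t-2}; zero for a line, constant for a quadratic."""
    return [values[i] - 2 * values[i - 1] + values[i - 2] for i in range(2, len(values))]

straight = arima_020_forecast([10, 12], c=0, horizon=6)
curved = arima_020_forecast([10, 12], c=1, horizon=6)

print(straight, second_differences(straight))
print(curved, second_differences(curved))
```

With $c = 0$ the forecasts extend the last observed slope in a straight line; with $c \ne 0$ the slope itself changes by $c$ each step, which is a quadratic.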
What is the primary benefit of the 'Innovation State Space' (ETS) framework?
It provides a unified way to estimate parameters and calculate prediction intervals for various exponential smoothing methods.
True (though they generate different prediction intervals).
True or False: $ETS$ models with additive and multiplicative errors generate the same point forecasts if parameters are identical.
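For the simplest case this can be verified directly. The sketch below compares the level updates of $ETS(A, N, N)$ and $ETS(M, N, N)$ on made-up positive data with the same $\alpha$ and initial level; the function names are hypothetical.

```python
def level_ann(y, alpha, level0):
    """ETS(A,N,N): additive error e_t = y_t - l_{t-1}."""
    level = level0
    for obs in y:
        error = obs - level
        level = level + alpha * error
    return level

def level_mnn(y, alpha, level0):
    """ETS(M,N,N): relative error e_t = (y_t - l_{t-1}) / l_{t-1}."""
    level = level0
    for obs in y:
        rel_error = (obs - level) / level
        level = level * (1 + alpha * rel_error)
    return level

y = [20.0, 22.0, 19.0, 24.0, 23.0]   # hypothetical, strictly positive data
forecast_additive = level_ann(y, alpha=0.3, level0=y[0])
forecast_multiplicative = level_mnn(y, alpha=0.3, level0=y[0])
print(forecast_additive, forecast_multiplicative)
```

The final level is the point forecast in both models, and the two agree up to floating-point rounding. The difference lies in how the error variance scales with the level, which changes the prediction intervals.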
In a seasonal ARIMA model, $D$ represents
The number of seasonal differences used.
What is the first step in the traditional modeling procedure for ARIMA models?
Plot the data and identify any unusual observations.
Why is 'numerical optimization' required for exponential smoothing instead of a closed-form solution?
Because the relationship between the parameters and the error is non-linear, unlike standard linear regression.
The seasonal parts of the Autoregressive, Differencing, and Moving Average components.
In $SARIMA(p,d,q)(P,D,Q)_m$, $(P, D, Q)$ represents _____
What distinguishes the 'Drift method' from the 'Naive method' in terms of their underlying models?
The Drift method is based on a Random Walk with drift ($c \ne 0$), while the Naive method is based on a Random Walk without drift ($c = 0$).
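The two benchmark forecasts differ only by the drift term, estimated as the average change between the first and last observations. A sketch on made-up data:

```python
def naive_forecast(y, h):
    """Naive method: every future value equals the last observation."""
    return [y[-1]] * h

def drift_forecast(y, h):
    """Drift method: extend the line through the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)   # average historical change
    return [y[-1] + step * slope for step in range(1, h + 1)]

y = [100.0, 103.0, 101.0, 106.0, 108.0]   # hypothetical series

print(naive_forecast(y, 3))
print(drift_forecast(y, 3))
```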
In the context of Information Criteria, what happens to the penalty as the number of predictors $k$ increases?
The penalty increases, making it harder for the model to achieve a lower (better) score without a significant improvement in fit.
What does a Portmanteau test result with a high p-value indicate for an ARIMA model?
It suggests that the residuals are consistent with white noise and the model is adequate.
In regression modeling, why is 'Leave-one-out cross-validation' used?
To estimate the test error and assess how well the model will generalize to new, unseen data.
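A minimal sketch of leave-one-out cross-validation for a one-predictor regression, assuming made-up data and a closed-form least squares fit (the helper names are hypothetical):

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def loocv_mse(xs, ys):
    """Refit with each point held out and average the squared prediction errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return sum(errors) / len(errors)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # hypothetical, roughly y = 2x
print(loocv_mse(xs, ys))
```

Each squared error comes from a point the fitted line never saw, so the average approximates the error on new data rather than the in-sample fit.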