library(fpp3)
includes:
tsibble (time series data frames)
fable (forecasting models)
feasts (features and plots)
fabletools (helper functions)
library(tidyverse)
Loads data manipulation and visualization tools (dplyr, ggplot2, readr, etc.), plus lubridate for date parsing with mdy() and dmy().
dmy()
Converts "01/01/2000" into a date (day-month-year format).
seq(from = dmy("01/01/2000"), length.out = 1000, by = "1 day")
Generates a daily sequence of 1000 dates starting Jan 1, 2000
tsibble()
Makes a time-aware tibble (R dataframe)
index=date
tells the tsibble that the date column is the time variable.
key=
Tells R tsibble how to group multiple series (e.g., by country or store).
rnorm(n)
Creates n random values from a standard normal distribution (mean 0, sd 1). With n = 1000 this simulates white noise (pure randomness).
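A minimal sketch tying the pieces above together into a simulated white-noise tsibble (assumes fpp3 and tidyverse are loaded; the name wn is just for illustration):

set.seed(123)                                   # make the random draws repeatable
wn <- tsibble(
  date  = seq(from = dmy("01/01/2000"), length.out = 1000, by = "1 day"),
  value = rnorm(1000),                          # 1000 draws, mean 0, sd 1 (white noise)
  index = date                                  # the date column is the time index
)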
autoplot()
Automatically plots the time series using ggplot2
ACF()
Overall correlation with past values; shows how far correlations persist.
Bars inside the blue lines: likely just random noise.
Bars outside: significant lag relationships. Helps choose ARIMA terms.
PACF()
Direct correlation with each lag after removing indirect effects; shows where the correlation cuts off. Helps choose ARIMA terms.
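Hedged example on the simulated wn series from the sketch above:

wn |> ACF(value) |> autoplot()    # correlation of the series with its own lags
wn |> PACF(value) |> autoplot()   # direct (partial) correlations at each lag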
select()
selects variables (columns) from a tibble or data frame
filter()
chooses rows of the tibble that meet the conditions listed
gg_tsdisplay()
Shows a set of plots: the data, ACF, and PACF all together.
difference()
Calculates the difference between consecutive values — removes trend
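Example sketch on the built-in aus_production data (the choice of Beer is arbitrary):

aus_production |>
  gg_tsdisplay(difference(Beer), plot_type = "partial")   # differenced series plus ACF and PACF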
features()
Extracts statistics from the data
model()
Fits one or more models to your data
ARIMA()
Fits an ARIMA model
pdq(p,d,q)
ARIMA parameters:
p: autoregressive order (How many past values to look at)
d: differencing order (How many times we subtract to remove trend)
q: moving average order (How many past mistakes to look at)
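A sketch of fitting a hand-picked ARIMA(1,1,1) to the simulated wn series (dropping pdq() lets ARIMA() choose the orders automatically):

fit <- wn |>
  model(arima111 = ARIMA(value ~ pdq(1, 1, 1)))   # p = 1, d = 1, q = 1
report(fit)                                       # coefficients plus AIC/AICc/BIC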
AIC
(Akaike Information Criterion): Lower = better model fit with less overfitting
.resid
Column added automatically by augment() that holds the model residuals (actual minus fitted)
Positive .resid → model predicted too low
Negative .resid → model predicted too high
ljung_box test
Tests if residuals are white noise (no pattern).
Null hypothesis: residuals are white noise.
p-value > 0.05 = good, residuals look random
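Residual check on the fit from the earlier sketch (10 lags is a common but arbitrary choice):

fit |>
  augment() |>                              # adds .fitted and .resid
  features(.resid, ljung_box, lag = 10)     # p-value > 0.05 suggests white noise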
train data
Used to train the model; usually separated with filter_index(. ~ "2018 W21") (the . at the start means "from the beginning"). Usually about 80% of the data.
test data
Used to test model accuracy; usually separated with filter_index("2018 W22" ~ .) (the . at the end means "to the end"). Usually about 20% of the data.
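A sketch of an 80/20 split on the simulated daily wn series (the cut-off dates are placeholders):

train <- wn |> filter_index(. ~ "2002-03-10")    # start of the series up to the cut-off (~80%)
test  <- wn |> filter_index("2002-03-11" ~ .)    # day after the cut-off to the end (~20%)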
Box-Cox Transformation
A mathematical formula used to make data less skewed and more stable (less uneven), so models like ARIMA or ETS work better.
λ (lambda)
A number that controls how strong the Box-Cox transformation is. It decides how much to adjust the data.
λ = 0
Means take the log of the data (log(y)).
λ = 1
Means no transformation (data stays the same).
box_cox(y, λ)
Applies the Box-Cox transformation to a variable y.
inv_box_cox(y, λ)
Reverses the Box-Cox transformation, turning it back into the original scale.
features(..., guerrero)
Finds the best λ automatically for your data.
pull(lambda_guerrero)
Extracts that λ number from the result.
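Putting the Box-Cox pieces together (a sketch on aus_production Gas, a common fpp3 example):

lambda <- aus_production |>
  features(Gas, features = guerrero) |>   # estimate the best lambda
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Gas, lambda))          # plot the transformed series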
accuracy()
Compares forecasts to actual values to measure how good the predictions were.
forecast()
Creates future predictions from a fitted model (e.g., ARIMA, ETS, etc.).
h
The forecast horizon — how many steps ahead you want to predict (e.g., 10 days, 12 months). Example: forecast(h = 10).
level
The confidence level for prediction intervals (e.g., 95%). Example: forecast(level = 95) shows the 95% confidence range.
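A sketch of forecasting from the training split above and scoring it (model and horizon are illustrative):

fit_train <- train |> model(ARIMA(value))
fc <- fit_train |> forecast(h = 200)   # 200 steps ahead, roughly the test-set length here
fc |> autoplot(wn, level = 95)         # forecasts with 95% prediction intervals
fc |> accuracy(wn)                     # ME, RMSE, MAE, MAPE, MASE, ...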
ME
Mean Error: The average bias. Tells if your model overpredicts (+) or underpredicts (−) on average. Ideally close to 0.
RMSE
Root Mean Squared Error: Think “average size of your mistakes.” Big errors count more heavily. Lower = better.
MAE
Mean Absolute Error: Average absolute difference between predicted and actual values. Lower = better.
MPE
Mean Percentage Error: Average error as a percent of actual values. Can be misleading if the data contains values close to zero.
MAPE
Mean Absolute Percentage Error: Like MAE, but expressed as a percent. Example: 5 means “on average, 5% off.” Lower = better.
MASE
Mean Absolute Scaled Error: Compares your model to a “naïve” model (just repeating last value). < 1 means better than naïve.
RMSSE
Root Mean Squared Scaled Error: Similar to MASE but penalizes large errors more. < 1 = good.
ACF1
Autocorrelation of residuals (lag 1): Checks if leftover errors are related over time. Close to 0 = good (means residuals are random).
augment()
Adds fitted values and residuals (.fitted, .resid) to the dataset.
.fitted
The model's predicted value for each time point. Read it by comparing against the actual data: if .fitted is close to the real value, the model predicted well; if it is far off, the model missed that point.
glance()
Gives summary info for models (AIC, BIC, log-likelihood, etc.).
AICc
Corrected AIC: AIC adjusted for small sample sizes; use when you have limited data. Lower = better.
BIC
Bayesian Information Criterion: Similar to AIC but gives a stronger penalty for extra parameters; lower = better and prefers simpler models.
log_lik
Log-Likelihood: Shows how likely the model is to produce the observed data; higher = better fit
df
Degrees of Freedom: Number of parameters estimated by the model; higher means a more complex model.
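Example sketch pulling those summary columns from the earlier fit:

fit |> glance() |> select(.model, log_lik, AIC, AICc, BIC)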
mean
Average of the fitted values or the data (depending on model type).
variance
Spread of the fitted values; shows how much predictions vary over time.
mutate()
Adds or changes a column in your dataset
filter_index()
Filters data based on time (e.g., only 2018–2020 dates).
geom_line()
Draws a line on a plot (used to add fitted or predicted values).
autolayer()
Adds another line (like actual data) to a plot that already has forecasts.
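Sketch of both patterns, using the fit, fc, train, and test objects from the earlier examples (the autolayer call is a common fpp3 idiom, hedged here):

fit |> augment() |>
  autoplot(value) +                               # observed series
  geom_line(aes(y = .fitted), colour = "red")     # fitted values layered on top

fc |> autoplot(train, level = NULL) +             # forecasts without shaded intervals
  autolayer(test, value)                          # add the held-out actual values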
unitroot_ndiffs
Test to find how many differences are needed to make data stable (stationary).
stat_arch_lm
Checks if variance (spread) of data changes over time (heteroskedasticity).
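Both are run through features() (sketch on the simulated wn series):

wn |> features(value, features = list(unitroot_ndiffs, stat_arch_lm))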
ETS()
Error-Trend-Seasonal model — focuses on trends and repeating patterns.
NNETAR()
Neural Network AutoRegressive — machine-learning style time-series model.
TSLM()
Time Series Linear Model — regression model with trend and season terms.
pdq()
Sets non-seasonal ARIMA parameters: p = AR order, d = differences, q = MA order.
PDQ()
Sets seasonal ARIMA parameters: P = seasonal AR, D = seasonal differences, Q = seasonal MA.
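Several model types can be fitted side by side in a single model() call (a sketch; the formulas are illustrative):

aus_production |>
  filter_index("1992 Q1" ~ .) |>
  model(
    ets    = ETS(Beer),                                   # error-trend-seasonal
    nnet   = NNETAR(Beer),                                # neural network autoregression
    tslm   = TSLM(Beer ~ trend() + season()),             # regression with trend and season
    sarima = ARIMA(Beer ~ pdq(1, 1, 1) + PDQ(1, 1, 1))    # seasonal ARIMA
  )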
autoplot(level=NULL)
Plots forecasts (without shaded prediction intervals if level=NULL).
forecast(h=)
Makes future predictions; h = how many steps ahead to forecast.
rnorm()
Generates random numbers from a normal (bell-curve) distribution.
for loop
Repeats code multiple times (used to build or simulate data step-by-step).
set.seed()
Makes random results repeatable (same random numbers every time you run it).
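A sketch simulating an AR(1) series step by step (the 0.7 coefficient is arbitrary):

set.seed(42)                        # same random numbers every run
e <- rnorm(100)                     # white-noise shocks
y <- numeric(100)
for (i in 2:100) {
  y[i] <- 0.7 * y[i - 1] + e[i]     # each value depends on the previous one
}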
components()
Extracts the decomposed parts (trend, season, remainder).
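For example, an STL decomposition on aus_production (a sketch):

aus_production |>
  model(STL(Beer)) |>
  components() |>      # trend, season_year, remainder columns
  autoplot()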
summarise()
Aggregates data (e.g., find monthly average or total).
factor()
Treats a column as a category (useful for labeling seasons or quarters).
RW()
Random walk model — predicts the next value as the most recent value (aka naive).
TSLM(...~trend())
Time Series Linear Model — fits a straight line trend over time.
.model
Identifies which model each result came from (e.g., “ARIMA”, “ETS”, “TREND”).
pull()
Extracts a single column from your data (like .resid).
sum(na.rm = TRUE)
Adds values while ignoring any missing ones (NAs).
NAIVE()
The forecast is simply the very last observed value. (If sales were 100 yesterday, they will be 100 tomorrow).
SNAIVE()
Seasonal Naive: The forecast is the value from the last season. (If sales for January were 100 last year, they will be 100 this January).
MEAN()
The forecast is the average of all historical data. (If the average sales have been 100 over 10 years, the forecast is 100).
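A sketch fitting the benchmark models together and forecasting 8 quarters ahead:

aus_production |>
  filter_index("1992 Q1" ~ .) |>
  model(
    naive  = NAIVE(Beer),           # last value carried forward
    snaive = SNAIVE(Beer),          # same quarter last year
    mean   = MEAN(Beer),            # historical average
    drift  = RW(Beer ~ drift())     # random walk with drift
  ) |>
  forecast(h = 8) |>
  autoplot(aus_production, level = NULL)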
Winkler Score
Measures the accuracy of forecast intervals, balancing interval width and coverage. Lower scores indicate more precise and reliable forecasts; higher scores mean less accurate or overly wide intervals.
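In fabletools it can be computed through accuracy() with an interval measure (a sketch using the fc forecasts from earlier; the 80% level is arbitrary):

fc |> accuracy(wn, list(winkler = winkler_score), level = 80)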