FINAL Stochastic Modeling

Description and Tags

i WILL DO WELL omg


81 Terms

1
New cards

library(fpp3)

includes:

  • tsibble (time series data frames)

  • fable (forecasting models)

  • feasts (features and plots)

  • fabletools (helper functions)

2
New cards

library(tidyverse)

Loads data-manipulation and visualization tools (dplyr, ggplot2, readr, etc.), plus lubridate for date parsing such as mdy().

3
New cards

dmy()

Converts "01/01/2000" into a date (day-month-year format).

4
New cards

seq(from = dmy("01/01/2000"), length.out = 1000, by = "1 day")

Generates a daily sequence of 1000 dates starting Jan 1, 2000

5
New cards

tsibble()

Makes a time-aware tibble (an R data frame with a time index).

6
New cards

index=date

Tells the tsibble that the date column is the time variable.

7
New cards

key=

Tells the tsibble how to distinguish multiple series (e.g., by country or store).

8
New cards

rnorm(n)

Creates n random values from a standard normal distribution (mean 0, sd 1); e.g., rnorm(1000) gives 1000 values. This simulates white noise (pure randomness).
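
Cards 3–8 fit together as one small workflow. A minimal sketch (the column names date and y are made up for illustration):

```r
library(fpp3)

set.seed(1)  # make the random draws repeatable
dates <- seq(from = dmy("01/01/2000"), length.out = 1000, by = "1 day")
wn <- tsibble(date = dates, y = rnorm(1000), index = date)  # white-noise series
autoplot(wn, y)  # no trend, no seasonality -- pure randomness
```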

9
New cards

autoplot()

Automatically plots the time series using ggplot2

10
New cards

ACF()

Shows the overall correlation with past values and how far those correlations persist.

Bars inside the blue lines: likely just random noise.

Bars outside: significant lag relationships; helps choose ARIMA terms.

11
New cards

PACF()

Direct correlation at each lag after removing indirect effects; where it cuts off helps choose ARIMA terms.

12
New cards

select()

selects variables (columns) from a tibble or data frame

13
New cards

filter()

chooses rows of the tibble that meet the conditions listed

14
New cards

gg_tsdisplay()

Shows a set of plots: the data, ACF, and PACF all together.
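
Assuming a tsibble wn with a numeric column y, one call gives all three diagnostic panels:

```r
# Data, ACF and PACF together; plot_type = "partial" requests the PACF panel
wn |> gg_tsdisplay(y, plot_type = "partial")
```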

15
New cards

difference()

Calculates the difference between consecutive values — removes trend

16
New cards

features()

Extracts statistics from the data

17
New cards

model()

Fits one or more models to your data

18
New cards

ARIMA()

Fits an ARIMA model

19
New cards

pdq(p,d,q)

ARIMA parameters:

  • p: autoregressive order (How many past values to look at)

  • d: differencing order (How many times we subtract to remove trend)

  • q: moving average order (How many past mistakes to look at)
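
A sketch of fitting ARIMA both ways (the data wn and column y are hypothetical):

```r
fit <- wn |>
  model(
    auto   = ARIMA(y),                # orders chosen automatically (by AICc)
    manual = ARIMA(y ~ pdq(1, 1, 1))  # p, d, q fixed by hand
  )
report(fit |> select(auto))  # prints the chosen orders and coefficients
```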

20
New cards

AIC

(Akaike Information Criterion): Lower = better model fit with less overfitting

21
New cards

.resid

Column automatically created by augment() for model residuals.
Positive .resid → model predicted too low
Negative .resid → model predicted too high

22
New cards

ljung_box test

Tests whether residuals are white noise (no remaining pattern).

Null hypothesis: residuals are white noise.
p-value > 0.05 = good; residuals look random
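
A residual check, assuming a fitted mable fit; lag = 10 is a common choice, not a rule:

```r
fit |>
  augment() |>                           # adds .fitted and .resid columns
  features(.resid, ljung_box, lag = 10)
# lb_pvalue > 0.05: residuals look like white noise
```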

23
New cards

train data

Used to train the model; usually separated with filter_index(. ~ "date") (. ~ means from the beginning). Usually ~80% of the data goes to training.

24
New cards

test data

Used to test model accuracy; usually separated with filter_index("2018 W22" ~ .). Usually ~20% of the data goes to testing.
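
An 80/20 split sketch (the dataset mydata and the cutoff dates are placeholders):

```r
train <- mydata |> filter_index(. ~ "2018-05")  # start of data up to May 2018
test  <- mydata |> filter_index("2018-06" ~ .)  # June 2018 to the end
```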

25
New cards

Box-Cox Transformation

A mathematical formula used to make data less skewed and more stable (less uneven), so models like ARIMA or ETS work better.

26
New cards

λ (lambda)

A number that controls how strong the Box-Cox transformation is. It decides how much to adjust the data.

27
New cards

λ = 0

Means take the log of the data (log(y)).

28
New cards

λ = 1

Means no transformation (data stays the same).

29
New cards

box_cox(y, λ)

Applies the Box-Cox transformation to a variable y.

30
New cards

inv_box_cox(y, λ)

Reverses the Box-Cox transformation, turning it back into the original scale.

31
New cards

features(..., guerrero)

Finds the best λ automatically for your data.

32
New cards

pull(lambda_guerrero)

Extracts that λ number from the result.
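
The Box-Cox workflow in one piece (mydata and y are placeholders):

```r
lambda <- mydata |>
  features(y, features = guerrero) |>  # estimate the best lambda
  pull(lambda_guerrero)                # extract it as a plain number

mydata |> autoplot(box_cox(y, lambda))  # plot the transformed series
# inv_box_cox(box_cox(y, lambda), lambda) recovers the original scale
```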

33
New cards

accuracy()

Compares forecasts to actual values to measure how good the predictions were.

34
New cards

forecast()

Creates future predictions from a fitted model (e.g., ARIMA, ETS, etc.).

35
New cards

h

The forecast horizon — how many steps ahead you want to predict (e.g., 10 days, 12 months). Example: forecast(h = 10).

36
New cards

level

The prediction-interval level (e.g., 95%); in fable it is set when plotting or extracting intervals, e.g. autoplot(level = 95) or hilo(level = 95) for the 95% range.
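
A forecasting sketch, assuming a fitted mable fit and its training data mydata:

```r
fc <- fit |> forecast(h = 10)          # predict 10 steps ahead
fc |> autoplot(mydata)                 # forecasts with prediction intervals
fc |> autoplot(mydata, level = NULL)   # point forecasts only
```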

37
New cards

ME

Mean Error: The average bias. Tells if your model overpredicts (+) or underpredicts (−) on average. Ideally close to 0.

38
New cards

RMSE

Root Mean Squared Error: Think “average size of your mistakes.” Big errors count more heavily. Lower = better.

39
New cards

MAE

Mean Absolute Error: Average absolute difference between predicted and actual values. Lower = better.

40
New cards

MPE

Mean Percentage Error: Average error as a percent of actual values. Can be misleading if data has small numbers.

41
New cards

MAPE

Mean Absolute Percentage Error: Like MAE, but expressed as a percent. Example: 5 means “on average, 5% off.” Lower = better.

42
New cards

MASE

Mean Absolute Scaled Error: Compares your model to a “naïve” model (just repeating last value). < 1 means better than naïve.

43
New cards

RMSSE

Root Mean Squared Scaled Error: Similar to MASE but penalizes large errors more. < 1 = good.

44
New cards

ACF1

Autocorrelation of residuals (lag 1): Checks if leftover errors are related over time. Close to 0 = good (means residuals are random).
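
The metrics above are the columns that accuracy() returns. Assuming forecasts fc and held-out data test:

```r
fc |> accuracy(test)  # one row per model: ME, RMSE, MAE, MPE, MAPE, MASE, RMSSE, ACF1
```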

45
New cards

augment()

Adds fitted values and residuals (.fitted, .resid) to the dataset.

46
New cards

.fitted

The model’s predicted values for each time point; you read it by comparing it to the actual data—if .fitted is close to the real value, the model predicted well, but if it’s far, the model missed that point.

47
New cards

glance()

Gives summary info for models (AIC, BIC, log-likelihood, etc.).

48
New cards

AICc

Corrected AIC: AIC adjusted for small sample sizes; use when you have limited data. Lower = better.

49
New cards

BIC

Bayesian Information Criterion: Similar to AIC but gives a stronger penalty for extra parameters; lower = better and prefers simpler models.

50
New cards

log_lik

Log-Likelihood: Shows how likely the model is to produce the observed data; higher = better fit

51
New cards

df

Degrees of Freedom: Number of parameters estimated by the model; higher means a more complex model.
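
Pulling the fit statistics above out of a mable, assuming a fitted fit:

```r
glance(fit) |> select(.model, AIC, AICc, BIC, log_lik)  # lower AIC/AICc/BIC = better
```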

52
New cards

mean

Average of the fitted values or the data (depending on model type).

53
New cards

variance

Spread of the fitted values; shows how much predictions vary over time.

54
New cards

mutate()

Adds or changes a column in your dataset

55
New cards

filter_index()

Filters data based on time (e.g., only 2018–2020 dates).

56
New cards

geom_line()

Draws a line on a plot (used to add fitted or predicted values).

57
New cards

autolayer()

Adds another line (like actual data) to a plot that already has forecasts.

58
New cards

unitroot_ndiffs

Test to find how many differences are needed to make data stable (stationary).

59
New cards

stat_arch_lm

Checks if variance (spread) of data changes over time (heteroskedasticity).

60
New cards

ETS()

Error-Trend-Seasonal model — focuses on trends and repeating patterns.

61
New cards

NNETAR()

Neural Network AutoRegressive — machine-learning style time-series model.

62
New cards

TSLM()

Time Series Linear Model — regression model with trend and season terms.

63
New cards

pdq()

Sets non-seasonal ARIMA parameters: p = AR order, d = differences, q = MA order.

64
New cards

PDQ()

Sets seasonal ARIMA parameters: P = seasonal AR, D = seasonal differences, Q = seasonal MA.
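
A seasonal specification sketch (monthly_data and y are placeholders; the orders are illustrative, not recommended):

```r
monthly_data |>
  model(sarima = ARIMA(y ~ pdq(1, 0, 0) + PDQ(0, 1, 1)))
```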

65
New cards

autoplot(level=NULL)

Plots forecasts (without shaded prediction intervals when level = NULL).

66
New cards

forecast(h=)

Makes future predictions; h = how many steps ahead to forecast.

67
New cards

rnorm()

Generates random numbers from a normal (bell-curve) distribution.

68
New cards

for loop

Repeats code multiple times (used to build or simulate data step-by-step).

69
New cards

set.seed()

Makes random results repeatable (same random numbers every time you run it).
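
These three pieces combine to simulate a series step by step — here a hypothetical AR(1) with coefficient 0.7:

```r
set.seed(42)     # same random numbers on every run
n <- 200
e <- rnorm(n)    # white-noise shocks
y <- numeric(n)
for (t in 2:n) {
  y[t] <- 0.7 * y[t - 1] + e[t]  # each value depends on the previous one
}
```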

70
New cards

components()

Extracts the decomposed parts (trend, season, remainder).

71
New cards

summarise()

Aggregates data (e.g., find monthly average or total).

72
New cards

factor()

Treats a column as a category (useful for labeling seasons or quarters).

73
New cards

RW()

Random walk model — predicts the next value as the most recent value (aka naive).

74
New cards

TSLM(...~trend())

Time Series Linear Model — fits a straight line trend over time.

75
New cards

.model

Identifies which model each result came from (e.g., “ARIMA”, “ETS”, “TREND”).

76
New cards

pull()

Extracts a single column from your data (like .resid).

77
New cards

sum(x, na.rm = TRUE)

Adds the values in x while ignoring any missing ones (NAs).

78
New cards

NAIVE()

The forecast is simply the very last observed value. (If sales were 100 yesterday, they will be 100 tomorrow).

79
New cards

SNAIVE()

Seasonal Naive: The forecast is the value from the last season. (If sales for January were 100 last year, they will be 100 this January).

80
New cards

MEAN()

The forecast is the average of all historical data. (If the average sales have been 100 over 10 years, the forecast is 100).
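
The three benchmark models fit side by side (train and y are placeholders):

```r
bench <- train |>
  model(
    naive  = NAIVE(y),   # repeat the last observed value
    snaive = SNAIVE(y),  # repeat the value from the last season
    mean   = MEAN(y)     # use the historical average
  )
bench |> forecast(h = 12) |> autoplot(train, level = NULL)
```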

81
New cards

Winkler Score

Measures the accuracy of forecast intervals, balancing interval width and coverage. Lower scores indicate more precise and reliable forecasts; higher scores mean less accurate or overly wide intervals.