This collection of flashcards covers key vocabulary and concepts on time-series splits and evaluation metrics relevant to machine learning interviews.
Time-Series Data
Data that is ordered in time; the temporal dependence between observations distinguishes it from the i.i.d. datasets typical of standard ML.
Train Set
The subset of data containing the earliest timestamps used for training the model.
Validation Set
A portion of the data used to tune hyperparameters and simulate future unseen data; it comes after the training set.
Test Set
The subset of data that comes after the validation set, used for final model evaluation.
Hold-out Split
A method of splitting data for time-series where the dataset is divided into distinct training, validation, and test sets.
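A minimal plain-Python sketch of the chronological hold-out split described above; the function name `chrono_holdout` and the 70/15/15 fractions are illustrative choices, not part of the cards.

```python
def chrono_holdout(series, train_frac=0.7, val_frac=0.15):
    """Split an already time-ordered sequence into train/val/test
    without shuffling, so no split ever sees data from its future."""
    n = len(series)
    i = round(n * train_frac)               # end of training set
    j = round(n * (train_frac + val_frac))  # end of validation set
    return series[:i], series[i:j], series[j:]

data = list(range(20))  # 20 time steps, already ordered
train, val, test = chrono_holdout(data)
# train holds the earliest 70%, val the next 15%, test the last 15%
```

Note the absence of any shuffling: order is preserved so the test set is strictly later than validation, which is strictly later than training.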
Rolling Window Validation
A validation method where both training and validation windows move forward through time, retaining a fixed training size.
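A sketch of rolling-window validation, assuming index-based folds; the function name `rolling_splits` and the fold sizes are hypothetical.

```python
def rolling_splits(n, train_size, val_size, step):
    """Yield (train, validation) index lists that both slide forward
    through time; the training window keeps a fixed length."""
    start = 0
    while start + train_size + val_size <= n:
        train = list(range(start, start + train_size))
        val = list(range(start + train_size, start + train_size + val_size))
        yield train, val
        start += step

folds = list(rolling_splits(n=10, train_size=4, val_size=2, step=2))
# every fold trains on exactly 4 points; older points drop out as
# the window advances
```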
Expanding Window Validation
A validation approach where the training window grows over time, keeping all historical data.
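For contrast with the rolling window, a sketch of expanding-window validation; `expanding_splits` and its parameters are illustrative names.

```python
def expanding_splits(n, initial_train, val_size, step):
    """Yield (train, validation) index lists where the training
    window always starts at 0 and grows to include all history."""
    end = initial_train
    while end + val_size <= n:
        yield list(range(end)), list(range(end, end + val_size))
        end += step

folds = list(expanding_splits(n=10, initial_train=4, val_size=2, step=2))
# the first fold trains on 4 points, the last on 8: no history is
# ever discarded
```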
K-fold Cross-Validation
A standard cross-validation technique that shuffles the data before splitting into folds; applied naively to time-series it leaks future information into the training folds, so it is invalid there without modification.
Mean Absolute Error (MAE)
A metric measuring the average absolute difference between predicted and actual values; less sensitive to outliers than squared-error metrics such as RMSE.
Root Mean Squared Error (RMSE)
A metric that measures the square root of the average squared differences between predicted and actual values, sensitive to outliers.
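The two metrics above can be computed side by side to show the outlier sensitivity the cards mention; this plain-Python sketch uses made-up numbers with one large miss in the last prediction.

```python
import math

def mae(y_true, y_pred):
    # average of absolute errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # square root of the average squared error
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

y_true = [3.0, 5.0, 2.0, 8.0]
y_pred = [2.0, 5.0, 4.0, 0.0]  # last prediction misses by 8
# absolute errors: 1, 0, 2, 8 -> MAE = 11/4 = 2.75
# squared errors: 1, 0, 4, 64 -> RMSE = sqrt(69/4) ~ 4.15
```

Squaring inflates the single large error, so RMSE (about 4.15) exceeds MAE (2.75) on the same predictions.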
Loss Function
Quantifies how inaccurate the model is during training; it is minimized by the optimizer.
MSE (Mean Squared Error)
A regression loss function sensitive to large errors, providing smooth gradients for optimization.
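A sketch of MSE as a training loss along with its gradient, to illustrate the "smooth gradients" point: the gradient is linear in the error, so it shrinks smoothly as predictions improve. The helper names are mine, not from the cards.

```python
def mse(y_true, y_pred):
    # mean of squared errors; large errors dominate the loss
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_grad(y_true, y_pred):
    # dMSE/dy_pred_i = 2 * (y_pred_i - y_true_i) / n
    # linear in the error, hence a smooth signal for the optimizer
    n = len(y_true)
    return [2 * (p - t) / n for t, p in zip(y_true, y_pred)]

loss = mse([1.0, 2.0], [1.0, 4.0])    # (0 + 4) / 2 = 2.0
grads = mse_grad([1.0, 2.0], [1.0, 4.0])  # [0.0, 2.0]
```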
Cross-Entropy Loss
A loss function used in classification tasks, measuring the difference between predicted probability distributions and true labels.
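A minimal sketch of cross-entropy for one-hot labels, assuming predicted class probabilities are given; the `eps` guard against log(0) is a common implementation detail, and all names here are illustrative.

```python
import math

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true class under the
    predicted distribution; eps guards against log(0)."""
    total = 0.0
    for truth, pred in zip(p_true, p_pred):
        total -= sum(t * math.log(q + eps) for t, q in zip(truth, pred))
    return total / len(p_true)

labels = [[1, 0, 0], [0, 1, 0]]            # one-hot true labels
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # predicted distributions
loss = cross_entropy(labels, probs)
# only the probability assigned to the true class matters:
# -(ln 0.7 + ln 0.8) / 2, roughly 0.29
```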
Hyperparameter Tuning
The process of selecting hyperparameters (settings that govern training, such as learning rate) using the validation set; the test set must never be used for tuning.
Concept Drift
The change over time in the statistical properties of the target variable, which makes older data less representative of future behavior.
Evaluation Metrics
Measures of model performance, like MAE and RMSE, used to assess how well the model performs after training.
Non-IID Data
Data that is not independently and identically distributed, common in time-series due to the order and time dependency.