Time-Series Train/Validation/Test Splits

This flashcard set covers key vocabulary and concepts for time-series train/validation/test splits and evaluation metrics, as asked in machine learning interviews.

17 Terms

1. Time-Series Data

Data whose observations are ordered in time. Unlike standard ML datasets, the samples are not interchangeable, so splits must preserve chronological order.

2. Train Set

The subset of the data with the earliest timestamps, used to train the model.

3. Validation Set

A portion of the data used to tune hyperparameters and simulate future unseen data; it comes after the training set.

4. Test Set

The subset of data that comes after the validation set, used for final model evaluation.

5. Hold-out Split

A method of splitting time-series data in which the dataset is divided, in time order, into distinct training, validation, and test sets.
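
A minimal sketch of a chronological hold-out split, assuming a time-ordered pandas DataFrame with a `timestamp` column and illustrative 70/15/15 proportions (both the column name and the fractions are assumptions, not part of the card):

```python
import pandas as pd

def holdout_split(df: pd.DataFrame, train_frac=0.70, val_frac=0.15):
    """Split a time-ordered DataFrame into train/val/test without shuffling."""
    df = df.sort_values("timestamp")      # assumes a 'timestamp' column
    n = len(df)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    train = df.iloc[:train_end]           # earliest data, for fitting
    val = df.iloc[train_end:val_end]      # next slice, for tuning
    test = df.iloc[val_end:]              # latest data, for final evaluation
    return train, val, test
```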

6. Rolling Window Validation

A validation scheme in which both the training and validation windows move forward through time, with the training window kept at a fixed size.
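
One common way to get rolling-window folds is scikit-learn's `TimeSeriesSplit` with `max_train_size`, which caps the training window so it slides forward instead of growing; the toy array below is illustrative:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # toy series; the index stands in for time

# max_train_size caps the training window at 5 samples, so the window
# slides forward from fold to fold rather than expanding.
for train_idx, val_idx in TimeSeriesSplit(n_splits=4, max_train_size=5).split(X):
    print("train:", train_idx, "-> validate:", val_idx)
```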

7. Expanding Window Validation

A validation scheme in which the training window grows over time, so every fold retains all historical data.
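
By contrast, `TimeSeriesSplit` with its defaults produces expanding-window folds: each training window starts at the first observation and grows (same illustrative toy array as above):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)

# Without max_train_size, every fold trains on all data seen so far,
# so the training window expands while validation stays a fixed slice.
for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(X):
    print("train size:", len(train_idx), "-> validate:", val_idx)
```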

8. K-fold Cross-Validation

Standard (shuffled) k-fold cross-validation is invalid for time-series data: shuffling places future observations in the training folds, leaking future information into the model.
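
A quick sketch of why shuffled k-fold leaks: with the index standing in for time, most folds end up training on observations that come after the ones being validated (toy data, illustrative only):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # the index doubles as the timestamp

for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Any training index later than a validation index means the model
    # was trained on the "future" of the fold it is scored on.
    print("validate:", val_idx, "| trains on the future:", train_idx.max() > val_idx.min())
```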

9. Mean Absolute Error (MAE)

A metric measuring the average absolute difference between predicted and actual values; it is less sensitive to outliers than squared-error metrics such as RMSE (both are computed in the sketch after the next card).

10. Root Mean Squared Error (RMSE)

The square root of the average squared difference between predicted and actual values; the squaring makes it sensitive to outliers and large errors.
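
A small sketch computing both metrics on made-up numbers; RMSE comes out larger than MAE here because squaring amplifies the bigger errors:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # made-up actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # made-up forecasts

mae = np.mean(np.abs(y_true - y_pred))           # average absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # square, average, then root

print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")     # MAE = 0.750, RMSE = 0.935
```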

11. Loss Function

A function that quantifies how wrong the model's predictions are during training; the optimizer adjusts the model's parameters to minimize it.

12. Mean Squared Error (MSE)

A regression loss that squares each error, so large errors dominate; its gradient is linear in the error, which makes optimization smooth.
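
A minimal sketch of MSE as a loss, with its gradient written out to show why optimization is smooth (function names here are illustrative):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: squaring makes large errors dominate the loss."""
    return np.mean((y_true - y_pred) ** 2)

def mse_grad(y_true, y_pred):
    """Gradient w.r.t. the predictions: 2 * (pred - true) / n, linear in the error."""
    return 2.0 * (y_pred - y_true) / len(y_true)
```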

13. Cross-Entropy Loss

A loss function used in classification tasks, measuring the mismatch between the predicted probability distribution and the true labels (the negative log-likelihood of the correct class).
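
A minimal sketch of cross-entropy for one-hot labels (the clipping constant and variable names are illustrative assumptions):

```python
import numpy as np

def cross_entropy(p_pred, y_onehot, eps=1e-12):
    """Average negative log-probability assigned to the true class."""
    p_pred = np.clip(p_pred, eps, 1.0)  # guard against log(0)
    return -np.mean(np.sum(y_onehot * np.log(p_pred), axis=1))

# A confident, correct 3-class prediction gives a small loss: -log(0.8) ~ 0.223
print(cross_entropy(np.array([[0.8, 0.1, 0.1]]), np.array([[1.0, 0.0, 0.0]])))
```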

14. Hyperparameter Tuning

The process of choosing the settings that govern model training (e.g., learning rate, regularization strength) using the validation set; the test set must never be used for tuning.
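
A sketch of validation-set tuning, assuming chronological train/validation arrays already exist (e.g. from the hold-out split above); `Ridge` and the candidate penalties are illustrative choices, not part of the card:

```python
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

def tune_alpha(X_train, y_train, X_val, y_val, alphas=(0.1, 1.0, 10.0)):
    """Pick the penalty that minimises validation MAE; the test set stays untouched."""
    scores = {}
    for alpha in alphas:
        model = Ridge(alpha=alpha).fit(X_train, y_train)
        scores[alpha] = mean_absolute_error(y_val, model.predict(X_val))
    return min(scores, key=scores.get), scores
```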

15. Concept Drift

A change over time in the statistical properties of the target variable (or its relationship to the features), making past data less representative of the future.

16. Evaluation Metrics

Measures of model performance, such as MAE and RMSE, used to assess how well the model performs on held-out data after training.

17. Non-IID Data

Data that is not independent and identically distributed; time-series data is non-IID because observations are ordered and temporally dependent.