This collection of flashcards covers key vocabulary and concepts on time-series splits and evaluation metrics relevant to machine learning interviews.
Time-Series Data
Data that is ordered in time; the temporal dependence between observations distinguishes it from the i.i.d. datasets typical of standard ML.
Train Set
The subset of data containing the earliest timestamps used for training the model.
Validation Set
A portion of the data used to tune hyperparameters and simulate future unseen data; it comes after the training set.
Test Set
The subset of data that comes after the validation set, used for final model evaluation.
Hold-out Split
A method of splitting data for time-series where the dataset is divided into distinct training, validation, and test sets.
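A minimal plain-Python sketch of the chronological hold-out split described above; the function name `chrono_holdout` and the 70/15/15 fractions are illustrative choices, not part of the cards.

```python
def chrono_holdout(series, train_frac=0.7, val_frac=0.15):
    """Split an already time-ordered sequence into train/val/test
    without shuffling, so no split ever sees data from its future."""
    n = len(series)
    i = round(n * train_frac)               # end of training set
    j = round(n * (train_frac + val_frac))  # end of validation set
    return series[:i], series[i:j], series[j:]

data = list(range(20))  # 20 time steps, already ordered
train, val, test = chrono_holdout(data)
# train holds the earliest 70%, val the next 15%, test the last 15%
```

Note the absence of any shuffling: order is preserved so the test set is strictly later than validation, which is strictly later than training.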
Rolling Window Validation
A validation method where both training and validation windows move forward through time, retaining a fixed training size.
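A sketch of rolling-window validation, assuming index-based folds; the function name `rolling_splits` and the fold sizes are hypothetical.

```python
def rolling_splits(n, train_size, val_size, step):
    """Yield (train, validation) index lists that both slide forward
    through time; the training window keeps a fixed length."""
    start = 0
    while start + train_size + val_size <= n:
        train = list(range(start, start + train_size))
        val = list(range(start + train_size, start + train_size + val_size))
        yield train, val
        start += step

folds = list(rolling_splits(n=10, train_size=4, val_size=2, step=2))
# every fold trains on exactly 4 points; older points drop out as
# the window advances
```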
Expanding Window Validation
A validation approach where the training window grows over time, keeping all historical data.
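For contrast with the rolling window, a sketch of expanding-window validation; `expanding_splits` and its parameters are illustrative names.

```python
def expanding_splits(n, initial_train, val_size, step):
    """Yield (train, validation) index lists where the training
    window always starts at 0 and grows to include all history."""
    end = initial_train
    while end + val_size <= n:
        yield list(range(end)), list(range(end, end + val_size))
        end += step

folds = list(expanding_splits(n=10, initial_train=4, val_size=2, step=2))
# the first fold trains on 4 points, the last on 8: no history is
# ever discarded
```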
K-fold Cross-Validation
A standard cross-validation technique that shuffles the data before splitting into folds; applied naively to time-series it leaks future information into the training folds, so it is invalid there without modification.
Mean Absolute Error (MAE)
A metric measuring the average absolute difference between predicted and actual values; less sensitive to outliers than squared-error metrics such as RMSE.
Root Mean Squared Error (RMSE)
A metric that measures the square root of the average squared differences between predicted and actual values, sensitive to outliers.
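The two metrics above can be computed side by side to show the outlier sensitivity the cards mention; this plain-Python sketch uses made-up numbers with one large miss in the last prediction.

```python
import math

def mae(y_true, y_pred):
    # average of absolute errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # square root of the average squared error
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

y_true = [3.0, 5.0, 2.0, 8.0]
y_pred = [2.0, 5.0, 4.0, 0.0]  # last prediction misses by 8
# absolute errors: 1, 0, 2, 8 -> MAE = 11/4 = 2.75
# squared errors: 1, 0, 4, 64 -> RMSE = sqrt(69/4) ~ 4.15
```

Squaring inflates the single large error, so RMSE (about 4.15) exceeds MAE (2.75) on the same predictions.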
Loss Function
Quantifies how inaccurate the model is during training; it is minimized by the optimizer.
MSE (Mean Squared Error)
A regression loss function sensitive to large errors, providing smooth gradients for optimization.
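A sketch of MSE as a training loss along with its gradient, to illustrate the "smooth gradients" point: the gradient is linear in the error, so it shrinks smoothly as predictions improve. The helper names are mine, not from the cards.

```python
def mse(y_true, y_pred):
    # mean of squared errors; large errors dominate the loss
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_grad(y_true, y_pred):
    # dMSE/dy_pred_i = 2 * (y_pred_i - y_true_i) / n
    # linear in the error, hence a smooth signal for the optimizer
    n = len(y_true)
    return [2 * (p - t) / n for t, p in zip(y_true, y_pred)]

loss = mse([1.0, 2.0], [1.0, 4.0])    # (0 + 4) / 2 = 2.0
grads = mse_grad([1.0, 2.0], [1.0, 4.0])  # [0.0, 2.0]
```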
Cross-Entropy Loss
A loss function used in classification tasks, measuring the difference between predicted probability distributions and true labels.
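A minimal sketch of cross-entropy for one-hot labels, assuming predicted class probabilities are given; the `eps` guard against log(0) is a common implementation detail, and all names here are illustrative.

```python
import math

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true class under the
    predicted distribution; eps guards against log(0)."""
    total = 0.0
    for truth, pred in zip(p_true, p_pred):
        total -= sum(t * math.log(q + eps) for t, q in zip(truth, pred))
    return total / len(p_true)

labels = [[1, 0, 0], [0, 1, 0]]            # one-hot true labels
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # predicted distributions
loss = cross_entropy(labels, probs)
# only the probability assigned to the true class matters:
# -(ln 0.7 + ln 0.8) / 2, roughly 0.29
```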
Hyperparameter Tuning
The process of selecting hyperparameters (settings that govern training, such as learning rate) using the validation set; the test set must never be used for tuning.
Concept Drift
The change over time in the statistical properties of the target variable, which makes older data less representative of future behavior.
Evaluation Metrics
Measures of model performance, like MAE and RMSE, used to assess how well the model performs after training.
Non-IID Data
Data that is not independently and identically distributed, common in time-series due to the order and time dependency.