Vocabulary flashcards covering key concepts for evaluating machine learning models, including data splits, validation protocols, confusion matrices, performance metrics, and regression evaluation.
Training data
The subset of data used to train a machine learning model.
Test data
The subset of data used to evaluate the trained model's performance.
Hold-out validation
A validation method that randomly splits data into training and testing sets (e.g., 2/3 train, 1/3 test).
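A minimal sketch of a hold-out split, assuming scikit-learn is available; the toy arrays and the 1/3 test fraction are illustrative only.

```python
# Hold-out split: 2/3 of the data for training, 1/3 for testing.
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(30).reshape(15, 2)  # toy feature matrix (15 samples)
y = np.arange(15) % 2             # toy binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))  # 10 5
```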
K-fold cross-validation
A validation technique that partitions data into k folds, trains on k−1 folds, tests on the remaining fold, and averages results over k runs.
Leave-one-out cross-validation
A special case of k-fold where k equals the number of samples; each sample is used once as the test set.
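A sketch of both protocols with scikit-learn's built-in splitters (assumed available); the six-sample array is made up for illustration.

```python
# k-fold: each of the k folds serves once as the test set.
from sklearn.model_selection import KFold, LeaveOneOut
import numpy as np

X = np.arange(12).reshape(6, 2)
for train_idx, test_idx in KFold(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)

# Leave-one-out is k-fold with k equal to the sample count (here 6).
print("LOO folds:", sum(1 for _ in LeaveOneOut().split(X)))  # 6
```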
Final evaluation
Assessment of model performance on the designated test data after training and tuning.
Confusion matrix
A table that summarizes classifier performance by counting true positives, true negatives, false positives, and false negatives.
True Positive (TP)
A positive instance correctly predicted as positive.
True Negative (TN)
A negative instance correctly predicted as negative.
False Positive (FP)
A negative instance incorrectly predicted as positive.
False Negative (FN)
A positive instance incorrectly predicted as negative.
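A plain-Python tally of the four cells, using made-up binary labels (1 = positive, 0 = negative):

```python
y_true = [1, 1, 0, 0, 1, 0]  # illustrative ground truth
y_pred = [1, 0, 0, 1, 1, 0]  # illustrative predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(tp, tn, fp, fn)  # 2 2 1 1
```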
Accuracy
Proportion of correct predictions: (TP + TN) / (TP + TN + FP + FN).
Precision
Proportion of predicted positives that are actually positive: TP / (TP + FP).
Recall (Sensitivity)
Proportion of actual positives correctly identified: TP / (TP + FN).
F1 score
Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall).
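The four formulas above worked in plain Python; the counts are invented to show how accuracy can stay high while precision and recall diverge.

```python
tp, tn, fp, fn = 8, 85, 5, 2  # illustrative counts

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.93
precision = tp / (tp + fp)                          # ~0.615
recall = tp / (tp + fn)                             # 0.8
f1 = 2 * precision * recall / (precision + recall)  # ~0.696
print(accuracy, precision, recall, f1)
```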
Per-class accuracy
Accuracy for a specific class i: C[i,i] / sum_j C[i,j], where C is the confusion matrix.
Overall accuracy (multi-class)
Sum of diagonal entries divided by total samples: sum_i C[i,i] / sum_{i,j} C[i,j].
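Both accuracies computed from a toy 3×3 confusion matrix (the counts are made up):

```python
# C[i][j]: items of true class i predicted as class j.
C = [[10, 2, 0],
     [1, 12, 3],
     [0, 1, 11]]

per_class = [C[i][i] / sum(C[i]) for i in range(len(C))]
overall = sum(C[i][i] for i in range(len(C))) / sum(map(sum, C))
print(per_class)  # [0.833..., 0.75, 0.9166...]
print(overall)    # 33/40 = 0.825
```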
Imbalanced data
A dataset where one class is far more frequent than others, which can bias simple accuracy metrics.
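A quick illustration of the bias, using an invented 95/5 class split: always predicting the majority class scores 95% accuracy yet finds no positives.

```python
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # degenerate model: always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / 5
print(accuracy, recall)  # 0.95 0.0
```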
Positive vs Negative class
In binary classification, the 'positive' class is typically the class of interest; 'negative' is the opposite.
Type I error
False Positive: predicting positive when the actual class is negative.
Type II error
False Negative: predicting negative when the actual class is positive.
MAE (Mean Absolute Error)
Mean of absolute differences between predicted and actual values in regression.
MSE (Mean Squared Error)
Mean of squared differences between predicted and actual values.
RMSE (Root Mean Squared Error)
Square root of MSE; same units as the target variable.
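The three regression metrics on invented values, showing RMSE as the square root of MSE:

```python
import math

y_true = [3.0, 5.0, 2.0, 7.0]  # illustrative targets
y_pred = [2.5, 5.0, 4.0, 8.0]  # illustrative predictions

errors = [p - t for t, p in zip(y_true, y_pred)]
mae = sum(abs(e) for e in errors) / len(errors)  # 0.875
mse = sum(e * e for e in errors) / len(errors)   # 1.3125
rmse = math.sqrt(mse)                            # ~1.146
print(mae, mse, rmse)
```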
Given-N evaluation
A recommender-system evaluation protocol: for each user, N observed items form the Given set fed to the recommender, and the remaining observed items form the Test set against which its suggestions are scored.
All But One
A variant of Given-N where the Given set contains all but one item; the Test set contains that single item.
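A minimal sketch of both splits for a single user's observed items; the helper name and the item list are hypothetical.

```python
import random

def given_n_split(items, n, seed=0):
    # Keep n random items as the Given set; the rest form the Test set.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n], shuffled[n:]

items = ["a", "b", "c", "d", "e"]              # one user's observed items
print(given_n_split(items, n=3))               # Given-N with N = 3
print(given_n_split(items, n=len(items) - 1))  # All But One: one test item
```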
Confusion matrix (multi-class)
A matrix C where C[i,j] counts items of true class i predicted as class j; diagonal entries are correct predictions.
Iris dataset classes
Three-class example: Setosa, Versicolor, Virginica used for multi-class confusion-matrix demonstrations.
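A multi-class confusion matrix on iris, assuming scikit-learn is available; the choice of logistic regression as the classifier is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Rows are true classes (Setosa, Versicolor, Virginica); columns predicted.
print(confusion_matrix(y_te, clf.predict(X_te)))
```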
Cross-validation use-case
Used to estimate model performance when data are limited by rotating training/testing across folds.
Final evaluation on test data
The ultimate assessment of model performance on unseen data after development and tuning.