Term
Definition
Prediction metric
A number used to evaluate how well a machine learning model’s predictions match the true outcomes.
Training metric vs evaluation metric
"A training metric helps optimize or monitor learning during training
Loss function
The objective a model tries to minimize during training; it may or may not be the same as the metric used for evaluation.
Baseline model
A simple reference model used to decide whether a more complex model is actually adding value.
Holdout test set
"A dataset kept separate from training and tuning
Validation set
A dataset used during model development to tune hyperparameters and compare model choices.
Cross-validation
A method that splits the data into multiple train/test folds to estimate model performance more reliably.
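As a quick illustration, here is a minimal cross-validation sketch using scikit-learn; the dataset is synthetic and the choice of logistic regression is just an example.

```python
# A minimal k-fold cross-validation sketch; swap in your own X and y.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 train/test folds; each fold's held-out score estimates generalization.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```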
Data leakage
"When information from the future or target variable accidentally enters the training data
Confusion matrix
"A table that counts true positives
True positive
A case where the model predicts the positive class and the actual class is positive.
False positive
A case where the model predicts the positive class but the actual class is negative.
True negative
A case where the model predicts the negative class and the actual class is negative.
False negative
A case where the model predicts the negative class but the actual class is positive.
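The four counts above can be read straight off a confusion matrix; the sketch below uses scikit-learn with invented labels.

```python
# A small sketch of the four confusion-matrix counts; labels are made up.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, sklearn orders the matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```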
Accuracy
The fraction of predictions that are correct across all classes.
When accuracy is misleading
"Accuracy can be misleading when classes are imbalanced
Precision
"Of the cases predicted positive
Recall
"Of the actual positive cases
Sensitivity
Another name for recall; the true positive rate.
Specificity
"Of the actual negative cases
False positive rate
"Of the actual negative cases
False negative rate
"Of the actual positive cases
F1 score
"The harmonic mean of precision and recall
F-beta score
A version of F1 that weights recall more heavily when beta is greater than 1 and precision more heavily when beta is less than 1.
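A sketch computing precision, recall, F1, and an F-beta score, continuing the invented labels from the confusion-matrix example above.

```python
# Precision, recall, F1, and F-beta on the same hypothetical labels.
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))      # TP / (TP + FP)
print(recall_score(y_true, y_pred))         # TP / (TP + FN)
print(f1_score(y_true, y_pred))             # harmonic mean of the two
print(fbeta_score(y_true, y_pred, beta=2))  # beta > 1 favors recall
```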
Balanced accuracy
"The average of recall across classes
Macro average
"An average of a metric calculated separately for each class
Micro average
"An average of a metric calculated from total counts across all classes
Weighted average
An average of class-level metrics weighted by the number of examples in each class.
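The three averaging schemes are easiest to contrast side by side; the sketch below uses a hypothetical 3-class problem and scikit-learn's f1_score.

```python
# Macro, micro, and weighted F1 on invented 3-class labels.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

for avg in ("macro", "micro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg))
```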
ROC curve
A plot of true positive rate versus false positive rate across classification thresholds.
ROC-AUC
The area under the ROC curve; measures how well the model ranks positives above negatives across thresholds.
PR curve
A plot of precision versus recall across classification thresholds.
PR-AUC
The area under the precision-recall curve; often more useful than ROC-AUC for rare positive classes.
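A sketch scoring ranked probabilities rather than hard labels; the probabilities are invented, and average_precision_score is used here as a common PR-AUC estimate.

```python
# Threshold-free ranking metrics on hypothetical predicted probabilities.
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(roc_auc_score(y_true, y_prob))            # ROC-AUC
print(average_precision_score(y_true, y_prob))  # a common PR-AUC estimate
```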
Classification threshold
The probability cutoff used to convert predicted probabilities into class labels.
Threshold tuning
"Choosing a classification threshold based on business costs
Log loss
A classification metric that penalizes confident wrong probability predictions more heavily than uncertain wrong predictions.
Brier score
A metric for probabilistic classification that measures the mean squared difference between predicted probabilities and actual outcomes.
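A sketch of both probability metrics on the same invented predictions.

```python
# Log loss and Brier score on hypothetical binary probabilities.
from sklearn.metrics import log_loss, brier_score_loss

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(log_loss(y_true, y_prob))          # punishes confident mistakes hard
print(brier_score_loss(y_true, y_prob))  # mean squared probability error
```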
Calibration
"How closely predicted probabilities match observed frequencies; for example
Calibration curve
A plot comparing predicted probabilities to actual outcome rates across probability bins.
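A sketch of a binned calibration check; the probabilities here are generated to be well calibrated by construction, so the two columns should roughly agree.

```python
# Binned calibration: mean predicted probability vs. observed rate per bin.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)
y_true = (rng.uniform(size=1000) < y_prob).astype(int)  # calibrated by design

prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
print(np.c_[prob_pred, prob_true])  # the two columns should roughly match
```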
Top-k accuracy
A classification metric where a prediction is counted correct if the true class appears in the model’s top k predicted classes.
Regression metric
A metric used to evaluate predictions of continuous numeric values.
Residual
The difference between the actual value and the predicted value.
Mean Absolute Error
The average absolute difference between predicted and actual values; easier to interpret because it uses the target’s original units.
MAE
Mean Absolute Error; average size of prediction errors without regard to direction.
Mean Squared Error
The average squared difference between predicted and actual values; penalizes large errors more than MAE.
MSE
Mean Squared Error; useful when large errors should be punished strongly.
Root Mean Squared Error
The square root of MSE; penalizes large errors and is expressed in the target’s original units.
RMSE
Root Mean Squared Error; commonly used when larger errors are especially costly.
Median Absolute Error
The median absolute prediction error; more robust to outliers than MAE or RMSE.
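A sketch of the basic regression error metrics side by side on invented values; RMSE is taken as the square root of MSE.

```python
# MAE, MSE, RMSE, and median absolute error on hypothetical values.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # back in the target's original units
medae = median_absolute_error(y_true, y_pred)
print(mae, mse, rmse, medae)
```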
R-squared
The fraction of variance in the target explained by the model compared with predicting the mean.
Adjusted R-squared
"A version of R-squared that accounts for the number of predictors
Mean Absolute Percentage Error
The average absolute percentage error between predicted and actual values.
MAPE
Mean Absolute Percentage Error; easy to explain but problematic when actual values are zero or close to zero.
SMAPE
Symmetric Mean Absolute Percentage Error; a percentage-style error metric that tries to reduce some weaknesses of MAPE.
RMSLE
Root Mean Squared Logarithmic Error; useful when relative errors matter more than absolute errors and targets are nonnegative.
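A sketch of the scale-aware metrics on the same invented values; SMAPE is written by hand because scikit-learn does not ship it.

```python
# R-squared, MAPE, RMSLE, and a hand-rolled SMAPE on hypothetical values.
import numpy as np
from sklearn.metrics import (r2_score, mean_absolute_percentage_error,
                             mean_squared_log_error)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

print(r2_score(y_true, y_pred))
print(mean_absolute_percentage_error(y_true, y_pred))   # unstable near zero actuals
print(np.sqrt(mean_squared_log_error(y_true, y_pred)))  # RMSLE; targets must be nonnegative

smape = np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))
print(smape)
```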
Mean Bias Error
The average signed error; shows whether predictions are systematically too high or too low.
Prediction bias
A systematic tendency for a model to overpredict or underpredict.
Quantile loss
"A metric used to evaluate quantile predictions
Pinball loss
Another name for quantile loss; penalizes errors differently depending on the target quantile.
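A hand-rolled pinball-loss sketch at the 0.9 quantile (scikit-learn's mean_pinball_loss implements the same idea); the values are invented.

```python
# Pinball (quantile) loss: under-prediction is weighted by q,
# over-prediction by (1 - q).
import numpy as np

def pinball_loss(y_true, y_pred, q):
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([4.0, 6.0, 3.0, 8.0])
print(pinball_loss(y_true, y_pred, q=0.9))
```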
Prediction interval coverage
The percentage of true values that fall inside the model’s predicted interval.
Sharpness
How narrow prediction intervals are; useful only when intervals also have good coverage.
Negative log likelihood
A probabilistic metric that rewards assigning high probability to the observed outcome and penalizes confident wrong predictions.
Time series forecasting metric
"A metric used to evaluate predictions ordered over time
Forecast horizon
"How far into the future a forecast predicts
Backtesting
Evaluating a forecasting model on historical time periods in the same order the model would have seen them in production.
Walk-forward validation
A time series validation method where the training window moves forward through time and predictions are tested on future periods.
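A sketch of walk-forward splits using scikit-learn's TimeSeriesSplit on a synthetic series; note that each test fold lies strictly after its training window.

```python
# Walk-forward splits: the training window grows, the test fold follows it.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    print(f"train ends at {train_idx[-1]}, test = {test_idx}")
```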
Naive forecast
"A simple forecasting baseline
Seasonal naive forecast
"A forecasting baseline that uses the value from the same time in the previous season
MASE
Mean Absolute Scaled Error; compares forecast error to a naive baseline and works across different scales.
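A hand-rolled MASE sketch, scaling forecast error by the in-sample MAE of a one-step naive forecast; the series values are invented.

```python
# MASE: forecast MAE divided by the MAE of a naive "previous value" forecast.
import numpy as np

def mase(y_train, y_true, y_pred):
    naive_mae = np.mean(np.abs(np.diff(y_train)))  # in-sample naive error
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_true = np.array([14.0, 13.0])
y_pred = np.array([13.0, 13.5])
print(mase(y_train, y_true, y_pred))  # < 1 means better than the naive baseline
```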
WAPE
"Weighted Absolute Percentage Error; total absolute error divided by total actual value
SMAPE drawback
SMAPE can still behave strangely when actual and predicted values are both near zero.
Point forecast
A single predicted value for each observation.
Probabilistic forecast
"A forecast that gives a distribution
CRPS
Continuous Ranked Probability Score; evaluates the full predicted probability distribution for a continuous outcome.
Business metric
"A metric tied directly to real-world value
Offline metric
A metric calculated on historical validation or test data before the model is deployed.
Online metric
A metric measured after deployment using real user or production behavior.
A/B test metric
A metric used to compare model variants in production by randomly assigning users or cases to different versions.
Metric tradeoff
"The idea that improving one metric can worsen another
Choosing a metric
"The best metric depends on the cost of different errors
Cost-sensitive evaluation
Evaluation that weights errors by their real-world cost instead of treating every mistake equally.
Model monitoring metric
"A metric tracked after deployment to detect performance degradation
Data drift
A change in the input data distribution between training and production.
Concept drift
A change in the relationship between input features and the target over time.
Target leakage check
A review to ensure features would have been available at prediction time and do not contain future information.
Confidence interval for a metric
"A range that estimates uncertainty around a metric value
Statistical significance
Evidence that a metric difference is unlikely to be due to random chance alone.
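A closing sketch of a bootstrap confidence interval for accuracy, using a synthetic test set that is about 80% correct by construction.

```python
# Bootstrap: resample the test set to approximate the metric's uncertainty.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.uniform(size=200) < 0.8, y_true, 1 - y_true)  # ~80% accurate

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    boot.append(np.mean(y_true[idx] == y_pred[idx]))
print(np.percentile(boot, [2.5, 97.5]))  # 95% confidence interval
```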