Term
Definition
Prediction metric
A number used to evaluate how well a machine learning model’s predictions match the true outcomes.
Training metric vs evaluation metric
"A training metric helps optimize or monitor learning during training
Loss function
The objective a model tries to minimize during training; it may or may not be the same as the metric used for evaluation.
Baseline model
A simple reference model used to decide whether a more complex model is actually adding value.
Holdout test set
"A dataset kept separate from training and tuning
Validation set
A dataset used during model development to tune hyperparameters and compare model choices.
Cross-validation
A method that splits the data into multiple train/test folds to estimate model performance more reliably.
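As a quick illustration, here is a minimal cross-validation sketch using scikit-learn; the dataset is synthetic and the choice of logistic regression is just an example.

```python
# A minimal k-fold cross-validation sketch; swap in your own X and y.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 train/test folds; each fold's held-out score estimates generalization.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```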
Data leakage
"When information from the future or target variable accidentally enters the training data
Confusion matrix
"A table that counts true positives
True positive
A case where the model predicts the positive class and the actual class is positive.
False positive
A case where the model predicts the positive class but the actual class is negative.
True negative
A case where the model predicts the negative class and the actual class is negative.
False negative
A case where the model predicts the negative class but the actual class is positive.
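The four counts above can be read straight off a confusion matrix; the sketch below uses scikit-learn with invented labels.

```python
# A small sketch of the four confusion-matrix counts; labels are made up.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, sklearn orders the matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```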
Accuracy
The fraction of predictions that are correct across all classes.
When accuracy is misleading
"Accuracy can be misleading when classes are imbalanced
Precision
"Of the cases predicted positive
Recall
"Of the actual positive cases
Sensitivity
Another name for recall; the true positive rate.
Specificity
"Of the actual negative cases
False positive rate
"Of the actual negative cases
False negative rate
"Of the actual positive cases
F1 score
"The harmonic mean of precision and recall
F-beta score
A version of F1 that weights recall more heavily when beta is greater than 1 and precision more heavily when beta is less than 1.
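A sketch computing precision, recall, F1, and an F-beta score, continuing the invented labels from the confusion-matrix example above.

```python
# Precision, recall, F1, and F-beta on the same hypothetical labels.
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))      # TP / (TP + FP)
print(recall_score(y_true, y_pred))         # TP / (TP + FN)
print(f1_score(y_true, y_pred))             # harmonic mean of the two
print(fbeta_score(y_true, y_pred, beta=2))  # beta > 1 favors recall
```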
Balanced accuracy
"The average of recall across classes
Macro average
"An average of a metric calculated separately for each class
Micro average
"An average of a metric calculated from total counts across all classes
Weighted average
An average of class-level metrics weighted by the number of examples in each class.
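The three averaging schemes are easiest to contrast side by side; the sketch below uses a hypothetical 3-class problem and scikit-learn's f1_score.

```python
# Macro, micro, and weighted F1 on invented 3-class labels.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

for avg in ("macro", "micro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg))
```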
ROC curve
A plot of true positive rate versus false positive rate across classification thresholds.
ROC-AUC
The area under the ROC curve; measures how well the model ranks positives above negatives across thresholds.
PR curve
A plot of precision versus recall across classification thresholds.
PR-AUC
The area under the precision-recall curve; often more useful than ROC-AUC for rare positive classes.
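A sketch scoring ranked probabilities rather than hard labels; the probabilities are invented, and average_precision_score is used here as a common PR-AUC estimate.

```python
# Threshold-free ranking metrics on hypothetical predicted probabilities.
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(roc_auc_score(y_true, y_prob))            # ROC-AUC
print(average_precision_score(y_true, y_prob))  # a common PR-AUC estimate
```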
Classification threshold
The probability cutoff used to convert predicted probabilities into class labels.
Threshold tuning
"Choosing a classification threshold based on business costs
Log loss
A classification metric that penalizes confident wrong probability predictions more heavily than uncertain wrong predictions.
Brier score
A metric for probabilistic classification that measures the mean squared difference between predicted probabilities and actual outcomes.
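A sketch of both probability metrics on the same invented predictions.

```python
# Log loss and Brier score on hypothetical binary probabilities.
from sklearn.metrics import log_loss, brier_score_loss

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(log_loss(y_true, y_prob))          # punishes confident mistakes hard
print(brier_score_loss(y_true, y_prob))  # mean squared probability error
```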
Calibration
"How closely predicted probabilities match observed frequencies; for example
Calibration curve
A plot comparing predicted probabilities to actual outcome rates across probability bins.
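A sketch of a binned calibration check; the probabilities here are generated to be well calibrated by construction, so the two columns should roughly agree.

```python
# Binned calibration: mean predicted probability vs. observed rate per bin.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)
y_true = (rng.uniform(size=1000) < y_prob).astype(int)  # calibrated by design

prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
print(np.c_[prob_pred, prob_true])  # the two columns should roughly match
```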
Top-k accuracy
A classification metric where a prediction is counted correct if the true class appears in the model’s top k predicted classes.
Regression metric
A metric used to evaluate predictions of continuous numeric values.
Residual
The difference between the actual value and the predicted value.
Mean Absolute Error
The average absolute difference between predicted and actual values; easier to interpret because it uses the target’s original units.
MAE
Mean Absolute Error; average size of prediction errors without regard to direction.
Mean Squared Error
The average squared difference between predicted and actual values; penalizes large errors more than MAE.
MSE
Mean Squared Error; useful when large errors should be punished strongly.
Root Mean Squared Error
The square root of MSE; penalizes large errors and is expressed in the target’s original units.
RMSE
Root Mean Squared Error; commonly used when larger errors are especially costly.
Median Absolute Error
The median absolute prediction error; more robust to outliers than MAE or RMSE.
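A sketch of the basic regression error metrics side by side on invented values; RMSE is taken as the square root of MSE.

```python
# MAE, MSE, RMSE, and median absolute error on hypothetical values.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # back in the target's original units
medae = median_absolute_error(y_true, y_pred)
print(mae, mse, rmse, medae)
```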
R-squared
The fraction of variance in the target explained by the model compared with predicting the mean.
Adjusted R-squared
"A version of R-squared that accounts for the number of predictors
Mean Absolute Percentage Error
The average absolute percentage error between predicted and actual values.
MAPE
Mean Absolute Percentage Error; easy to explain but problematic when actual values are zero or close to zero.
SMAPE
Symmetric Mean Absolute Percentage Error; a percentage-style error metric that tries to reduce some weaknesses of MAPE.
RMSLE
Root Mean Squared Logarithmic Error; useful when relative errors matter more than absolute errors and targets are nonnegative.
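A sketch of the scale-aware metrics on the same invented values; SMAPE is written by hand because scikit-learn does not ship it.

```python
# R-squared, MAPE, RMSLE, and a hand-rolled SMAPE on hypothetical values.
import numpy as np
from sklearn.metrics import (r2_score, mean_absolute_percentage_error,
                             mean_squared_log_error)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

print(r2_score(y_true, y_pred))
print(mean_absolute_percentage_error(y_true, y_pred))   # unstable near zero actuals
print(np.sqrt(mean_squared_log_error(y_true, y_pred)))  # RMSLE; targets must be nonnegative

smape = np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))
print(smape)
```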
Mean Bias Error
The average signed error; shows whether predictions are systematically too high or too low.
Prediction bias
A systematic tendency for a model to overpredict or underpredict.
Quantile loss
"A metric used to evaluate quantile predictions
Pinball loss
Another name for quantile loss; penalizes errors differently depending on the target quantile.
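A hand-rolled pinball-loss sketch at the 0.9 quantile (scikit-learn's mean_pinball_loss implements the same idea); the values are invented.

```python
# Pinball (quantile) loss: under-prediction is weighted by q,
# over-prediction by (1 - q).
import numpy as np

def pinball_loss(y_true, y_pred, q):
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([4.0, 6.0, 3.0, 8.0])
print(pinball_loss(y_true, y_pred, q=0.9))
```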
Prediction interval coverage
The percentage of true values that fall inside the model’s predicted interval.
Sharpness
How narrow prediction intervals are; useful only when intervals also have good coverage.
Negative log likelihood
A probabilistic metric that rewards assigning high probability to the observed outcome and penalizes confident wrong predictions.
Time series forecasting metric
"A metric used to evaluate predictions ordered over time
Forecast horizon
"How far into the future a forecast predicts
Backtesting
Evaluating a forecasting model on historical time periods in the same order the model would have seen them in production.
Walk-forward validation
A time series validation method where the training window moves forward through time and predictions are tested on future periods.
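A sketch of walk-forward splits using scikit-learn's TimeSeriesSplit on a synthetic series; note that each test fold lies strictly after its training window.

```python
# Walk-forward splits: the training window grows, the test fold follows it.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    print(f"train ends at {train_idx[-1]}, test = {test_idx}")
```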
Naive forecast
"A simple forecasting baseline
Seasonal naive forecast
"A forecasting baseline that uses the value from the same time in the previous season
MASE
Mean Absolute Scaled Error; compares forecast error to a naive baseline and works across different scales.
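A hand-rolled MASE sketch, scaling forecast error by the in-sample MAE of a one-step naive forecast; the series values are invented.

```python
# MASE: forecast MAE divided by the MAE of a naive "previous value" forecast.
import numpy as np

def mase(y_train, y_true, y_pred):
    naive_mae = np.mean(np.abs(np.diff(y_train)))  # in-sample naive error
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_true = np.array([14.0, 13.0])
y_pred = np.array([13.0, 13.5])
print(mase(y_train, y_true, y_pred))  # < 1 means better than the naive baseline
```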
WAPE
"Weighted Absolute Percentage Error; total absolute error divided by total actual value
SMAPE drawback
SMAPE can still behave strangely when actual and predicted values are both near zero.
Point forecast
A single predicted value for each observation.
Probabilistic forecast
"A forecast that gives a distribution
CRPS
Continuous Ranked Probability Score; evaluates the full predicted probability distribution for a continuous outcome.
Business metric
"A metric tied directly to real-world value
Offline metric
A metric calculated on historical validation or test data before the model is deployed.
Online metric
A metric measured after deployment using real user or production behavior.
A/B test metric
A metric used to compare model variants in production by randomly assigning users or cases to different versions.
Metric tradeoff
"The idea that improving one metric can worsen another
Choosing a metric
"The best metric depends on the cost of different errors
Cost-sensitive evaluation
Evaluation that weights errors by their real-world cost instead of treating every mistake equally.
Model monitoring metric
"A metric tracked after deployment to detect performance degradation
Data drift
A change in the input data distribution between training and production.
Concept drift
A change in the relationship between input features and the target over time.
Target leakage check
A review to ensure features would have been available at prediction time and do not contain future information.
Confidence interval for a metric
"A range that estimates uncertainty around a metric value
Statistical significance
Evidence that a metric difference is unlikely to be due to random chance alone.
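A closing sketch of a bootstrap confidence interval for accuracy, using a synthetic test set that is about 80% correct by construction.

```python
# Bootstrap: resample the test set to approximate the metric's uncertainty.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.uniform(size=200) < 0.8, y_true, 1 - y_true)  # ~80% accurate

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    boot.append(np.mean(y_true[idx] == y_pred[idx]))
print(np.percentile(boot, [2.5, 97.5]))  # 95% confidence interval
```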