Evaluation of Machine Learning Algorithms

Vocabulary flashcards covering key concepts for evaluating machine learning models, including data splits, validation protocols, confusion matrices, performance metrics, and regression evaluation.

30 Terms

1. Training data

The subset of data used to train a machine learning model.

2. Test data

The subset of data used to evaluate the trained model's performance.

3. Hold-out validation

A validation method that randomly splits data into training and testing sets (e.g., 2/3 train, 1/3 test).
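
A minimal hold-out split sketch, assuming scikit-learn is installed and using the Iris data as a stand-in; the 2/3 : 1/3 ratio mirrors the example above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Random split: 2/3 of the 150 samples for training, 1/3 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))  # 100 50
```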

4. K-fold cross-validation

A validation technique that partitions data into k folds, trains on k−1 folds, tests on the remaining fold, and averages results over k runs.
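
A sketch of 5-fold cross-validation as an explicit loop, assuming scikit-learn; each fold serves once as the test set and the fold accuracies are averaged.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Train on k-1 folds, test on the remaining fold.
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(np.mean(scores))  # result averaged over the k runs
```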

5. Leave-one-out cross-validation

A special case of k-fold where k equals the number of samples; each sample is used once as the test set.
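
Leave-one-out is the k = n special case; scikit-learn's LeaveOneOut (assumed available) generates the n single-sample test sets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
hits = 0.0
for train_idx, test_idx in LeaveOneOut().split(X):
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    hits += model.score(X[test_idx], y[test_idx])  # 1.0 or 0.0 per left-out sample
print(hits / len(X))  # LOOCV accuracy estimate
```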

6. Final evaluation

Assessment of model performance on the designated test data after training and tuning.

7. Confusion matrix

A table that summarizes classifier performance by counting true positives, true negatives, false positives, and false negatives.
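
A plain-Python sketch with made-up labels, counting the four cells of the binary confusion matrix (1 = positive class):

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
print(tp, tn, fp, fn)  # 3 3 1 1
```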

8. True Positive (TP)

A positive instance correctly predicted as positive.

9. True Negative (TN)

A negative instance correctly predicted as negative.

10. False Positive (FP)

A negative instance incorrectly predicted as positive.

11. False Negative (FN)

A positive instance incorrectly predicted as negative.

12. Accuracy

Proportion of correct predictions: (TP + TN) / (TP + TN + FP + FN).

13. Precision

Proportion of predicted positives that are actually positive: TP / (TP + FP).

14. Recall (Sensitivity)

Proportion of actual positives correctly identified: TP / (TP + FN).

15. F1 score

Harmonic mean of precision and recall: 2 × (Precision × Recall) / (Precision + Recall).
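
The four metrics in cards 12-15 can be verified in plain Python; the counts TP=3, TN=3, FP=1, FN=1 are the made-up values from the confusion-matrix sketch above.

```python
tp, tn, fp, fn = 3, 3, 1, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 6/8 = 0.75
precision = tp / (tp + fp)                          # 3/4 = 0.75
recall = tp / (tp + fn)                             # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.75
print(accuracy, precision, recall, f1)
```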

16. Per-class accuracy

Accuracy for a specific class i: C[i,i] / sum_j C[i,j], where C is the confusion matrix.

17. Overall accuracy (multi-class)

Sum of diagonal entries divided by total samples: sum_i C[i,i] / sum_{i,j} C[i,j].
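
Both formulas in one NumPy sketch, on a made-up 3-class confusion matrix C (rows = true class i, columns = predicted class j):

```python
import numpy as np

C = np.array([[13,  1,  0],
              [ 2, 10,  3],
              [ 0,  2, 14]])

per_class = np.diag(C) / C.sum(axis=1)  # C[i,i] / sum_j C[i,j] for each class i
overall = np.trace(C) / C.sum()         # sum_i C[i,i] / sum_{i,j} C[i,j]
print(per_class, overall)
```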

18. Imbalanced data

A dataset where one class is far more frequent than others, which can bias simple accuracy metrics; e.g., with 95% negative samples, a classifier that always predicts 'negative' achieves 95% accuracy while detecting nothing.

19. Positive vs Negative class

In binary classification, the 'positive' class is typically the class of interest; 'negative' is the opposite.

20. Type I error

False Positive: predicting positive when the actual class is negative.

21. Type II error

False Negative: predicting negative when the actual class is positive.

22. MAE (Mean Absolute Error)

Mean of absolute differences between predicted and actual values in regression.

23. MSE (Mean Squared Error)

Mean of squared differences between predicted and actual values.

24. RMSE (Root Mean Squared Error)

Square root of MSE; same units as the target variable.
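
All three regression metrics (cards 22-24) in one NumPy sketch on made-up predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = np.mean(np.abs(y_pred - y_true))  # mean absolute error
mse = np.mean((y_pred - y_true) ** 2)   # mean squared error
rmse = np.sqrt(mse)                     # same units as the target variable
print(mae, mse, rmse)
```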

25. Given-N evaluation

A recommender-system evaluation method in which N observed items per user form the Given set shown to the model, and the remaining held-out items form the Test set against which its suggestions are measured.

26. All But One

A variant of Given-N where the Given set contains all but one item; the Test set contains that single item.
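
A minimal All-But-One split sketch; the function name all_but_one_split and the user -> item-list layout are illustrative assumptions, not a standard API.

```python
import random

def all_but_one_split(user_items, seed=0):
    rng = random.Random(seed)
    given, test = {}, {}
    for user, items in user_items.items():
        held_out = rng.choice(items)                       # the single test item
        given[user] = [i for i in items if i != held_out]  # Given set: all but one
        test[user] = [held_out]                            # Test set: that one item
    return given, test

given, test = all_but_one_split({"u1": [1, 2, 3], "u2": [4, 5]})
print(given, test)
```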

27. Confusion matrix (multi-class)

A matrix C where C[i,j] counts items of true class i predicted as class j; diagonal entries are correct predictions.
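
A quick multi-class demonstration, assuming scikit-learn, on the Iris data referenced in the next card:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Rows are true classes, columns are predicted classes; diagonal = correct.
print(confusion_matrix(y_test, model.predict(X_test)))
```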

28. Iris dataset classes

A three-class example (Setosa, Versicolor, Virginica) commonly used to demonstrate multi-class confusion matrices.

29. Cross-validation use-case

Used to estimate model performance when data are limited, by rotating the training/testing roles across folds.
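
For this use-case, scikit-learn's cross_val_score (assumed installed) performs the fold rotation in one call:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)  # one score per fold
print(scores.mean())  # performance estimate averaged over the folds
```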

30. Final evaluation on test data

The ultimate assessment of model performance on unseen data after development and tuning.