Classification Quality Metrics
Confusion Matrix
The confusion matrix is used to evaluate the performance of a classification model.
It compares the actual labels with the predicted labels.
The matrix is structured as follows:
- True Positive (TP): People who have COVID and are tested positive.
- True Negative (TN): People who are healthy and are tested negative.
- False Positive (FP): People who are healthy but are tested positive.
- False Negative (FN): People who have COVID but are tested negative.
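The four counts above can be tallied directly from paired actual/predicted labels. A minimal sketch with hypothetical data (1 = COVID / tested positive, 0 = healthy / tested negative):

```python
# Hypothetical labels: 1 = has COVID / tested positive, 0 = healthy / tested negative.
actual    = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 1]

# Count each cell of the confusion matrix by comparing actual vs predicted.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # COVID, tested positive
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # healthy, tested negative
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # healthy, tested positive
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # COVID, tested negative
```

For this made-up sample the counts are TP = 3, TN = 3, FP = 1, FN = 1.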
For different problems, different metrics might be more important.
False Positive Rate (FPR) or Fall-Out
- False positive rate is the percentage of healthy people who are tested positive: FPR = FP / (FP + TN).
Recall, Sensitivity, or True Positive Rate (TPR)
- True positive rate is the percentage of people who have COVID and are correctly identified: TPR = TP / (TP + FN).
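Both rates follow directly from the confusion-matrix counts. A minimal sketch (the sample counts are hypothetical):

```python
def tpr(tp: int, fn: int) -> float:
    """True positive rate (recall): fraction of actual positives caught."""
    return tp / (tp + fn)

def fpr(fp: int, tn: int) -> float:
    """False positive rate (fall-out): fraction of actual negatives misflagged."""
    return fp / (fp + tn)

# Hypothetical counts: 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
print(tpr(3, 1))  # 0.75
print(fpr(1, 3))  # 0.25
```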
Area Under the ROC Curve (AUC or AUROC)
- Plots a graph with FPR on the x-axis and TPR on the y-axis.
- Assumes the classifier assigns a probability that a given data point belongs to class 0 and a probability that it belongs to class 1, such that the two probabilities sum to 1: P(class 0) + P(class 1) = 1.
- AUC = 0.5 is equivalent to random guessing, and below 0.5 the classifier is worse than random. A high AUC (close to 1) is desirable.
Constructing the ROC Curve
Create a table showing:
- The data point's actual class.
- The data point's predicted class.
- Probability of the data point being class 0.
- Probability of the data point being class 1.
Sort the table in descending order of the last column (the predicted probability of being class 1).
Vary the threshold that determines whether a data point is predicted to be positive or not.
- For example, if the threshold = 0.9, only data points whose probability of class 1 is at least 0.9 are predicted positive; if no point reaches 0.9, all data points are predicted to belong to class 0.
- This prediction has a TPR and an FPR.
Example:
- Threshold = 0.9, TPR = 0, FPR = 0
- Threshold = 0.8, TPR = 0.2, FPR = 0
- Threshold = 0.7, TPR = 0.4, FPR = 0
- Threshold = 0.62, TPR = 0.6, FPR = 0
- Threshold = 0.5, TPR = 0.6, FPR = 0.333
- Threshold = 0.3, TPR = 0.8, FPR = 0.333
- Threshold = 0.25, TPR = 0.8, FPR = 0.667
- Threshold = 0.2, TPR = 0.8, FPR = 1
- Threshold = 0.1, TPR = 1, FPR = 1
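The AUC for the example above can be recovered from its (FPR, TPR) points with the trapezoid rule. A minimal sketch, assuming the listed FPR values 0.333 and 0.667 are exactly 1/3 and 2/3 (i.e., 3 actual negatives and 5 actual positives):

```python
# (FPR, TPR) points traced by the thresholds in the example, in order.
fpr = [0, 0, 0, 0, 1/3, 1/3, 2/3, 1, 1]
tpr = [0, 0.2, 0.4, 0.6, 0.6, 0.8, 0.8, 0.8, 1]

# Trapezoid rule: sum the area of each trapezoid between consecutive points.
auc = sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
          for i in range(len(fpr) - 1))

print(round(auc, 4))  # 0.7333
```

Segments where FPR does not change (vertical moves on the curve) contribute zero area, so only the three horizontal steps at TPR = 0.6, 0.8, and 0.8 matter here.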
ROC Curve
- A graphical representation of the trade-off between TPR and FPR.
- AUC (Area Under the Curve) is a measure of the performance of the classifier.
- Example AUC = 0.7333
Precision and Recall
- Recall: What percentage of people who have COVID are correctly identified?
- Precision: What percentage of people who are tested positive have COVID?
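In terms of the confusion-matrix counts, recall = TP / (TP + FN) and precision = TP / (TP + FP). A minimal sketch (the sample counts are hypothetical):

```python
def precision(tp: int, fp: int) -> float:
    """Of everyone tested positive, what fraction actually has COVID?"""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of everyone who has COVID, what fraction was correctly identified?"""
    return tp / (tp + fn)

# Hypothetical counts: 3 true positives, 1 false positive, 1 false negative.
print(precision(3, 1))  # 0.75
print(recall(3, 1))     # 0.75
```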
F1 Score
- F1 score: Combines precision and recall as their harmonic mean: F1 = 2 × (precision × recall) / (precision + recall).
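As a sketch, the harmonic mean of precision and recall (the sample values are hypothetical):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# A classifier that recalls everything (recall = 1.0) but with poor
# precision (0.5) is penalized: the harmonic mean sits closer to the
# smaller of the two values than the arithmetic mean (0.75) would.
print(round(f1_score(0.5, 1.0), 4))  # 0.6667
```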
Precision vs Recall
- There is usually a trade-off between precision and recall: raising the decision threshold tends to increase precision but lower recall, and lowering it does the opposite.