What is the purpose of the following equation? [image]
This equation calculates the risk of choosing the class Ci for a data point x. For a given point x, we can calculate the risk of choosing each class. Then, we choose the class with the lowest risk.
What is the role of λik?
This is the loss incurred when class Ci is chosen but the true class is Ck.
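For reference, the expected-risk equation in its usual Bayesian decision-theory form (assuming standard notation, since the image is not reproduced here): R(αi | x) = Σk λik * P(Ck | x), i.e., the risk of choosing class Ci is the posterior probability of each possible true class weighted by the corresponding loss λik.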
Consider the following loss functions λs and the three possible actions of choosing C1, C2, or rejecting. [image] Calculate the expected risk for each of the possible actions. How do we decide which class to choose? What actions should we take if we assume that P(C1 | x) = 0.3 and P(C2 | x) = 0.7?
R(α1 | x) = 0 * P(C1 | x) + 10 * P(C2 | x) = 10 * (1 - P(C1 | x))
R(α2 | x) = 5 * P(C1 | x) + 0 * P(C2 | x) = 5 * P(C1 | x)
R(αr | x) = 1 (the fixed loss for rejecting)
We choose the action with the lowest risk. Assume that P(C1 | x) = 0.3 and P(C2 | x) = 0.7.
R(α1 | x) = 10 * (1 - 0.3) = 7
R(α2 | x) = 5 * 0.3 = 1.5
R(αr | x) = 1
Since the reject action has the lowest risk, we choose the “reject” option.
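A minimal Python sketch of this calculation, using the loss values implied by the worked numbers above (loss 10 for choosing C1 when the truth is C2, loss 5 for choosing C2 when the truth is C1, and a flat loss of 1 for rejecting):

```python
# Expected risk of each action, using the posteriors and losses from the card above.
posteriors = {"C1": 0.3, "C2": 0.7}

# lam[(chosen, true)]: loss when `chosen` is picked and `true` is the actual class.
lam = {("C1", "C1"): 0, ("C1", "C2"): 10,
       ("C2", "C1"): 5, ("C2", "C2"): 0}
reject_loss = 1  # flat loss for the reject action

risks = {c: sum(lam[(c, k)] * p for k, p in posteriors.items()) for c in ("C1", "C2")}
risks["reject"] = reject_loss

print({a: round(r, 2) for a, r in risks.items()})  # {'C1': 7.0, 'C2': 1.5, 'reject': 1}
print(min(risks, key=risks.get))                   # reject
```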
We have the following decision-making problem:
A data point is called x, and we have two classes: C1 and C2
P(x|C1)=0.3, P(x|C2)=0.6, P(C1)=0.9, P(C2)=0.1
We have the following loss values:
λ11=λ22=0, λ12=10, λ21=1
If we decide based on the expected risk of choosing each class for this data point, which class should we choose?
R(choosing C1 | x) = P(C2 | x) λ12 ∝ P(x | C2) P(C2) λ12 = 0.6 × 0.1 × 10 = 0.6
R(choosing C2 | x) = P(C1 | x) λ21 ∝ P(x | C1) P(C1) λ21 = 0.3 × 0.9 × 1 = 0.27
(The common factor 1/P(x) is dropped, since it does not change the comparison.) Choosing C2 has the lower risk, so we classify x as C2.
We have the following decision-making problem:
A data point is called x, and we have two classes: C1 and C2
P(x|C1)=0.3, P(x|C2)=0.6, P(C1)=0.9, P(C2)=0.1
We have the following loss values: λ11=λ22=0, λ12=λ21=1. If we decide based on the expected risk of choosing each class for this data point, which class should we choose?
R(choosing C1 | x) = P(C2 | x) λ12 ∝ P(x | C2) P(C2) λ12 = 0.6 × 0.1 × 1 = 0.06
R(choosing C2 | x) = P(C1 | x) λ21 ∝ P(x | C1) P(C1) λ21 = 0.3 × 0.9 × 1 = 0.27
Choosing C1 has the lower risk, so we classify x as C1.
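A quick sketch of both comparisons in Python; the common factor 1/P(x) is dropped, so these are unnormalized risks, which is enough to pick the class with the lower risk:

```python
# Unnormalized risk of choosing a class: loss * likelihood * prior of the other (true) class.
p_x_given = {"C1": 0.3, "C2": 0.6}   # P(x | Ci)
prior     = {"C1": 0.9, "C2": 0.1}   # P(Ci)

def risk(chosen, lam):
    """Sum loss * P(x | Ck) * P(Ck) over the classes k other than the chosen one."""
    return sum(lam[(chosen, k)] * p_x_given[k] * prior[k] for k in prior if k != chosen)

lam_first  = {("C1", "C2"): 10, ("C2", "C1"): 1}  # first card: lambda_12 = 10, lambda_21 = 1
lam_second = {("C1", "C2"): 1,  ("C2", "C1"): 1}  # second card: lambda_12 = lambda_21 = 1

print(risk("C1", lam_first),  risk("C2", lam_first))   # ~0.6 vs ~0.27 -> choose C2
print(risk("C1", lam_second), risk("C2", lam_second))  # ~0.06 vs ~0.27 -> choose C1
```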
(True or False?) The risk of choosing a class is calculated by summing the misclassification costs multiplied by their respective Bayesian probabilities for that chosen class.
True
(True or False?) The optimal decision rule in the risk analysis framework is to always choose the class with the highest Bayesian probability P(Ci | x), regardless of the misclassification costs.
False
(True or False?) If the cost of misclassifying data as class C1 when it is actually class C2 is very high, the decision rule might favor choosing C2 even if P(C2 | x) is lower than P(C1 | x).
True
(True or False?) If the cost of all misclassifications is equal, then the risk-minimizing decision rule simplifies to choosing the class with the highest Bayesian probability P(Ci | x).
True
(True or False?) The misclassification cost λik acts as a weight that amplifies the impact of certain types of misclassification on the overall risk.
True
(True or False?) A risk of zero is achievable in this classification problem if and only if, for a given feature x, the Bayesian probability P(Ci | x) is 1 for some class Ci and 0 for all other classes, and there is no cost associated with correctly classifying that class.
True
(True or False?) The risk associated with choosing a particular class is always a value between 0 and 1, inclusive, as it is a weighted average of probabilities.
False
(True or False?) The goal of risk analysis in a classification scenario is to minimize the expected cost of misclassification over all possible feature values.
True
What are the main steps in classification?
Gather, clean, and preprocess the data.
Split the data into training and test sets.
Select a model.
Tune the model’s parameters through a training process.
Evaluate the model on the test set (a minimal end-to-end sketch is shown below).
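A minimal sketch of these steps with scikit-learn (synthetic data and a logistic-regression model are stand-ins here, not part of the original card):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# 1. Gather, clean, and preprocess (synthetic data stands in for a cleaned dataset).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# 2. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 3-4. Select a model and tune its parameters by training.
model = LogisticRegression().fit(X_train, y_train)

# 5. Evaluate the model on the held-out test set.
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```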
In the image below, red dots are negative cases, and green dots are positives. The circle shows a decision boundary. The data inside the circle are classified as positive cases. [image]
What is the model?
What are the model’s parameters that are to be learned through training?
The model is a circular decision boundary.
The parameters to be learned through training are the center and radius of the circle.
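A small sketch of this model in Python: the only learnable parameters are the center and the radius, and a point is classified as positive when it falls inside the circle (the numbers below are hypothetical, for illustration only):

```python
import numpy as np

# Parameters of the circular decision boundary, normally found by training (hypothetical values).
center = np.array([1.0, 2.0])
radius = 1.5

def predict(points):
    """Label points inside the circle as positive (1) and points outside as negative (0)."""
    distances = np.linalg.norm(points - center, axis=1)
    return (distances <= radius).astype(int)

print(predict(np.array([[1.2, 2.1], [4.0, 4.0]])))  # [1 0]
```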
Identify and count TP, TN, FP, and FN cases.
Accuracy is the ratio of correctly predicted observations to the total observations. Based on this definition, write the formula for accuracy using TP, TN, FP, and FN parameters.
[image]
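For reference, the standard formula (presumably what the image shows): Accuracy = (TP + TN) / (TP + TN + FP + FN).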
Under what circumstances is “accuracy” not a good criterion for evaluating a classification model?
When the positive and negative cases are imbalanced, high accuracy can be misleading: a model that predicts the majority class for every input can score high accuracy while failing to identify the minority class at all, so accuracy does not reveal how well the model extracts TP and TN cases.
Suppose TP= 3000, TN= 50, FP= 50, and FN= 100. What is the accuracy of the model? What percentage of negative cases is correctly identified?
[image]
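Working through the given numbers: Accuracy = (3000 + 50) / (3000 + 50 + 50 + 100) = 3050 / 3200 ≈ 95.3%, while the fraction of negative cases correctly identified (specificity) is TN / (TN + FP) = 50 / (50 + 50) = 50%. The high accuracy hides the model’s weak performance on negative cases.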
Consider the following matrix.
[image]
a) What is this matrix called? b) What does the sum of the top row of this matrix represent? c) What does the sum of the left column of this matrix represent?
d) How many cases were correctly identified by the algorithm? e) How many cases are incorrectly identified? f) What is the accuracy of this model? g) What percentage of positive cases are correctly identified?
a) Confusion matrix
b) Actual negative cases
c) Model’s negative outputs
d) Correctly identified cases = TP + TN = 15 + 950 = 965
e) Incorrectly identified cases = FP + FN = 23 + 12 = 35
f) Accuracy = (TP + TN) / total = (15 + 950) / 1000 = 96.5%
g) [image]
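A short check of parts d–g in Python, assuming the counts quoted in the answers (TP = 15, TN = 950, FP = 23, FN = 12, as in part e):

```python
# Counts taken from the answers above.
TP, TN, FP, FN = 15, 950, 23, 12

accuracy = (TP + TN) / (TP + TN + FP + FN)
recall   = TP / (TP + FN)  # fraction of actual positive cases correctly identified (part g)

print(f"accuracy = {accuracy:.1%}")  # 96.5%
print(f"recall   = {recall:.1%}")    # 55.6%
```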
What is a confusion matrix?
A) A table used to describe the performance of a classification model
B) A graph showing the relationship between features
C) A tool for visualizing data distributions
D) A metric for regression models
A) A table used to describe the performance of a classification model
In a binary classification problem, how many cells does a confusion matrix have?
A) 2, B) 4, C) 6, D) 8
B) 4
Which of the following is NOT a component of a confusion matrix?
A) True Positives (TP)
B) False Negatives (FN)
C) True Negatives (TN)
D) False Positives (FP)
E) Partial Positives (PP)
E) Partial Positives (PP)
What does “True Positive (TP)” represent in a confusion matrix?
A) The model correctly predicted the negative class
B) The model incorrectly predicted the positive class
C) The model correctly predicted the positive class
D) The model incorrectly predicted the negative class
C) The model correctly predicted the positive class
What does “False Negative (FN)” represent?
A) The model predicted a negative outcome, but the actual outcome was positive
B) The model predicted a positive outcome, but the actual outcome was negative
C) The model correctly predicted the negative class
D) The model correctly predicted the positive class
A) The model predicted a negative outcome, but the actual outcome was positive
What is the term for the sum of True Positives and True Negatives divided by the total number of predictions?
A) Precision
B) Recall
C) Accuracy
D) F1 Score
C) Accuracy
Given the following confusion matrix, what is the accuracy?
[image]
A) 85%, B) 90%, C) 80%, D) 75%
A) 85%
Using the same confusion matrix, what is the precision?
[image]
A) 90.9%
B) 83.3%
C) 87.5%
D) 92.5%
A) 90.9%
(Precision = TP / predicted positive cases = TP / (TP + FP))
What is the recall for the above confusion matrix?
[image]
A) 83.3%
B) 90.9%
C) 87.5%
D) 92.5%
A) 83.3%
(Recall, or sensitivity, = TP / actual positive cases = TP / (TP + FN))
What is the F1 score for the above confusion matrix?
[image]
A) 87.0%
B) 66.2%
C) 93.5%
D) 95.3%
A) 87.0%
(The F1 score is 2TP / (2TP + FP + FN), i.e., the accuracy formula with TN replaced by TP; it is used when true negatives are not essential.)
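The matrix itself is only in the image, but one set of counts consistent with all four quoted answers is TP = 50, FP = 5, FN = 10, TN = 35 (a hypothetical reconstruction, not taken from the image). The sketch below reproduces the percentages from those counts:

```python
# Hypothetical counts chosen to match the quoted answers; the actual matrix is in the image.
TP, FP, FN, TN = 50, 5, 10, 35

accuracy  = (TP + TN) / (TP + TN + FP + FN)                # 0.85   -> 85%
precision = TP / (TP + FP)                                  # ~0.909 -> 90.9%
recall    = TP / (TP + FN)                                  # ~0.833 -> 83.3%
f1        = 2 * precision * recall / (precision + recall)   # ~0.870 -> 87.0%

print(f"{accuracy:.1%} {precision:.1%} {recall:.1%} {f1:.1%}")
```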
In which scenario is recall more critical than precision?
A) Spam detection
B) Fraud detection
C) Customer churn prediction
D) Sentiment analysis
B) Fraud detection
Recall is more important when missing positive cases (false negatives) is costly. For example, in medical diagnosis the goal is to minimize missed positive cases, even if it means accepting some false positives.
Which metric would you prioritize if false positives are more costly than false negatives?
A) Recall
B) Precision
C) Accuracy
D) F1 Score
B) Precision
Ex: Loan approval, criminal justice
What is the relationship between precision and recall?
A) They are inversely proportional
B) They are directly proportional
C) They are independent of each other
D) They are complementary but not directly proportional
D) They are complementary but not directly proportional
What does a high F1 Score indicate?
A) High precision and low recall
B) Low precision and high recall
C) A balance between precision and recall
D) High accuracy but low precision
C) A balance between precision and recall
How does a confusion matrix help in multi-class classification?
A) It extends to multiple dimensions for each class
B) It becomes a 3D matrix
C) It is split into multiple binary confusion matrices
D) It remains the same as in binary classification
A) It extends to multiple dimensions for each class (for n classes, the matrix becomes n × n, with one row and column per class)
Consider the graph below. a) What is the name of this plot? b) What are graphs A, B, and C? c) Which of the three models performs better?
[image]
a) ROC: Receiver operating characteristic.
b) Graphs B and C represent the behavior of 2 classification models. Line A is a reference line (no-skill).
c) The curve farther from the no-skill line A (closer to the top-left corner) performs better; hence, C is better than B.
Can we draw an ROC curve for a multi-class classifier?
Yes. For example, the following ROC is drawn for a dataset with 3 classes. We have used a logistic regression classifier in a “one-versus-all” method. This means we used the classifier to distinguish class 1 from the other classes. The results of this classification are recorded, and an ROC curve is drawn. Then, we do the same process for the other 2 classes. Finally, as shown below, we will have 3 ROC curves and can draw all 3 in one frame.
[image]
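A minimal sketch of the one-versus-all procedure on a 3-class dataset (iris and logistic regression are used here as assumed stand-ins for the setup in the card); each class’s probability column is scored against a binary “this class vs. the rest” label, and the three resulting ROC curves are drawn in one frame:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One probability column per class; these serve as the one-vs-rest scores.
probs = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)
y_bin = label_binarize(y_test, classes=[0, 1, 2])  # binary labels: class i vs. the rest

# One ROC curve per class, all drawn in the same frame.
for i in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, i], probs[:, i])
    plt.plot(fpr, tpr, label=f"class {i} vs rest")
plt.plot([0, 1], [0, 1], "k--", label="no skill")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

Training three separate binary classifiers, as the card describes, works the same way; only the source of the per-class scores changes.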
What is the ‘dice’ or ‘F1 score’, and when is it used?
The Dice score concentrates on TP:
Dice = 2TP / (2TP + FN + FP).
It is used when the dataset is imbalanced and negative cases far outnumber positive ones; when identifying positives matters more than true negatives, the F1 (Dice) score is a better criterion than accuracy.
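A quick check that the Dice score and the F1 score are the same quantity: with precision P = TP / (TP + FP) and recall R = TP / (TP + FN), F1 = 2PR / (P + R) = 2TP² / (TP(2TP + FP + FN)) = 2TP / (2TP + FP + FN), which is exactly the Dice formula above.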
For the following dataset, assume that a circular discriminant will classify red and green dots. To draw an ROC curve, we must find the 2 extreme points where TPR = FPR = 0 and TPR = FPR = 1. What kind of circular discriminant would create each of these points?
[image]
A circular discriminant, shown below, covers the whole dataset and classifies all data as positive, so TPR = FPR = 1.
[image 1]
As shown below, a discriminant that does not contain any data points produces TPR = FPR = 0, since it yields no true positives and no false positives.
[image 2]
Other discriminants, which enclose some of the green (positive) points while leaving some of the red (negative) points outside, produce TPR and FPR values between 0 and 1. Sweeping over such discriminants gives the points needed to draw the ROC curve (see the sketch below).
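A sketch of that sweep on hypothetical 2-D data: as the radius of a circular discriminant grows from 0 to a value that encloses everything, (FPR, TPR) moves from (0, 0) toward (1, 1), and the intermediate points trace the ROC curve:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: positive (green) points clustered near the origin, negative (red) points spread wider.
pos = rng.normal(0.0, 1.0, size=(50, 2))
neg = rng.normal(0.0, 3.0, size=(50, 2))

center = np.zeros(2)
for radius in np.linspace(0.0, 12.0, 13):
    tp = np.sum(np.linalg.norm(pos - center, axis=1) <= radius)  # positives inside the circle
    fp = np.sum(np.linalg.norm(neg - center, axis=1) <= radius)  # negatives inside the circle
    print(f"radius={radius:4.1f}  FPR={fp / len(neg):.2f}  TPR={tp / len(pos):.2f}")
# Plotting TPR against FPR for these radii gives the ROC curve of the circular discriminant.
```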
Draw the ROC curve for the following classification model. Use the following three thresholds: 𝜃 = 0, 0.5, and 1.
[image]
The following 3 tuples are computed:
[image]