What is the purpose of the following equation? [image]
This equation calculates the risk of choosing the class Ci for a data point x. For a given point x, we can calculate the risk of choosing each class. Then, we choose the class with the lowest risk.
What is the role of λik?
This is the loss incurred when class Ci is chosen but the true class is Ck.
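For reference, the expected-risk equation in its usual Bayesian decision-theory form (assuming standard notation, since the image is not reproduced here): R(αi | x) = Σk λik * P(Ck | x), i.e., the risk of choosing class Ci is the posterior probability of each possible true class weighted by the corresponding loss λik.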
Consider the following loss functions λs and the three possible actions of choosing C1, C2, or rejecting. [image] Calculate the expected risk for each of the possible actions. How do we decide which class to choose? What actions should we take if we assume that P(C1 | x) = 0.3 and P(C2 | x) = 0.7?
R(α1 | x) = 0 * P(C1 | x) + 10 * P(C2 | x) = 10 * (1 - P(C1 | x))
R(α2 | x) = 5 * P(C1 | x) + 0 * P(C2 | x) = 5 * P(C1 | x)
R(αr | x) = 1 (the fixed loss for rejecting)
We choose the action with the lowest risk. Assume that P(C1 | x) = 0.3 and P(C2 | x) = 0.7.
R(α1 | x) = 10 * (1 - 0.3) = 7
R(α2 | x) = 5 * 0.3 = 1.5
R(αr | x) = 1
Since the reject action has the lowest risk, we choose the “reject” option.
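A minimal Python sketch of this calculation, using the loss values implied by the worked numbers above (loss 10 for choosing C1 when the truth is C2, loss 5 for choosing C2 when the truth is C1, and a flat loss of 1 for rejecting):

```python
# Expected risk of each action, using the posteriors and losses from the card above.
posteriors = {"C1": 0.3, "C2": 0.7}

# lam[(chosen, true)]: loss when `chosen` is picked and `true` is the actual class.
lam = {("C1", "C1"): 0, ("C1", "C2"): 10,
       ("C2", "C1"): 5, ("C2", "C2"): 0}
reject_loss = 1  # flat loss for the reject action

risks = {c: sum(lam[(c, k)] * p for k, p in posteriors.items()) for c in ("C1", "C2")}
risks["reject"] = reject_loss

print({a: round(r, 2) for a, r in risks.items()})  # {'C1': 7.0, 'C2': 1.5, 'reject': 1}
print(min(risks, key=risks.get))                   # reject
```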
We have the following decision-making problem:
A data point is called x, and we have two classes: C1 and C2
P(x|C1)=0.3, P(x|C2)=0.6, P(C1)=0.9, P(C2)=0.1
We have the following loss values:
λ11=λ22=0, λ12=10, λ21=1
If we decide based on the expected risk of choosing each class for this data point, which class should we choose?
R(choosing C1 | x) = P(C2 | x) λ12 ∝ P(x | C2) P(C2) λ12 = 0.6 × 0.1 × 10 = 0.6
R(choosing C2 | x) = P(C1 | x) λ21 ∝ P(x | C1) P(C1) λ21 = 0.3 × 0.9 × 1 = 0.27
(The common factor 1/P(x) is dropped, since it does not change the comparison.) Choosing C2 has the lower risk, so we classify x as C2.
We have the following decision-making problem:
A data point is called x, and we have two classes: C1 and C2
P(x|C1)=0.3, P(x|C2)=0.6, P(C1)=0.9, P(C2)=0.1
We have the following loss values: λ11=λ22=0, λ12=λ21=1. If we decide based on the expected risk of choosing each class for this data point, which class should we choose?
R(choosing C1 | x) = P(C2 | x) λ12 ∝ P(x | C2) P(C2) λ12 = 0.6 × 0.1 × 1 = 0.06
R(choosing C2 | x) = P(C1 | x) λ21 ∝ P(x | C1) P(C1) λ21 = 0.3 × 0.9 × 1 = 0.27
Choosing C1 has the lower risk, so we classify x as C1.
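A quick sketch of both comparisons in Python; the common factor 1/P(x) is dropped, so these are unnormalized risks, which is enough to pick the class with the lower risk:

```python
# Unnormalized risk of choosing a class: loss * likelihood * prior of the other (true) class.
p_x_given = {"C1": 0.3, "C2": 0.6}   # P(x | Ci)
prior     = {"C1": 0.9, "C2": 0.1}   # P(Ci)

def risk(chosen, lam):
    """Sum loss * P(x | Ck) * P(Ck) over the classes k other than the chosen one."""
    return sum(lam[(chosen, k)] * p_x_given[k] * prior[k] for k in prior if k != chosen)

lam_first  = {("C1", "C2"): 10, ("C2", "C1"): 1}  # first card: lambda_12 = 10, lambda_21 = 1
lam_second = {("C1", "C2"): 1,  ("C2", "C1"): 1}  # second card: lambda_12 = lambda_21 = 1

print(risk("C1", lam_first),  risk("C2", lam_first))   # ~0.6 vs ~0.27 -> choose C2
print(risk("C1", lam_second), risk("C2", lam_second))  # ~0.06 vs ~0.27 -> choose C1
```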
(True or False?) The risk of choosing a class is calculated by summing the misclassification costs multiplied by their respective Bayesian probabilities for that chosen class.
True
(True or False?) The optimal decision rule in the risk analysis framework is to always choose the class with the highest Bayesian probability P(Ci | x), regardless of the misclassification costs.
False
(True or False?) If the cost of misclassifying data as class C1 when it is actually class C2 is very high, the decision rule might favor choosing C2 even if P(C2 | x) is lower than P(C1 | x).
True
(True or False?) If the cost of all misclassifications is equal, then the risk-minimizing decision rule simplifies to choosing the class with the highest Bayesian probability P(Ci | x).
True
(True or False?) The misclassification cost λik acts as a weight that amplifies the impact of certain types of misclassification on the overall risk.
True
(True or False?) A risk of zero is achievable in this classification problem if and only if, for a given feature x, the Bayesian probability P(Ci | x) is 1 for some class Ci and 0 for all other classes, and there is no cost associated with correctly classifying that class.
True
(True or False?) The risk associated with choosing a particular class is always a value between 0 and 1, inclusive, as it is a weighted average of probabilities.
False
(True or False?) The goal of risk analysis in a classification scenario is to minimize the expected cost of misclassification over all possible feature values.
True
What are the main steps in classification?
Gather, clean, and preprocess the data.
Split the data into training and test sets.
Select a model.
Tune the model’s parameters through a training process.
Evaluate the model on the test set (a minimal end-to-end sketch is shown below).
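A minimal sketch of these steps with scikit-learn (synthetic data and a logistic-regression model are stand-ins here, not part of the original card):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# 1. Gather, clean, and preprocess (synthetic data stands in for a cleaned dataset).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# 2. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 3-4. Select a model and tune its parameters by training.
model = LogisticRegression().fit(X_train, y_train)

# 5. Evaluate the model on the held-out test set.
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```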
In the image below, red dots are negative cases, and green dots are positives. The circle shows a decision boundary. The data inside the circle are classified as positive cases. [image]
What is the model?
What are the model’s parameters that are to be learned through training?
The model is a circular decision boundary.
The parameters to be learned through training are the center and radius of the circle.
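A small sketch of this model in Python: the only learnable parameters are the center and the radius, and a point is classified as positive when it falls inside the circle (the numbers below are hypothetical, for illustration only):

```python
import numpy as np

# Parameters of the circular decision boundary, normally found by training (hypothetical values).
center = np.array([1.0, 2.0])
radius = 1.5

def predict(points):
    """Label points inside the circle as positive (1) and points outside as negative (0)."""
    distances = np.linalg.norm(points - center, axis=1)
    return (distances <= radius).astype(int)

print(predict(np.array([[1.2, 2.1], [4.0, 4.0]])))  # [1 0]
```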
Identify and count TP, TN, FP, and FN cases.
Accuracy is the ratio of correctly predicted observations to the total observations. Based on this definition, write the formula for accuracy using TP, TN, FP, and FN parameters.
[image]
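For reference, the standard formula (presumably what the image shows): Accuracy = (TP + TN) / (TP + TN + FP + FN).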
Under what circumstances is “accuracy” not a good criterion for evaluating a classification model?
When the positive and negative cases are imbalanced, high accuracy can be misleading: a model that predicts the majority class for every input can score high accuracy while failing to identify the minority class at all, so accuracy does not reveal how well the model extracts TP and TN cases.
Suppose TP= 3000, TN= 50, FP= 50, and FN= 100. What is the accuracy of the model? What percentage of negative cases is correctly identified?
[image]
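Working through the given numbers: Accuracy = (3000 + 50) / (3000 + 50 + 50 + 100) = 3050 / 3200 ≈ 95.3%, while the fraction of negative cases correctly identified (specificity) is TN / (TN + FP) = 50 / (50 + 50) = 50%. The high accuracy hides the model’s weak performance on negative cases.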
Consider the following matrix.
[image]
a) What is this matrix called? b) What does the sum of the top row of this matrix represent? c) What does the sum of the left column of this matrix represent?
d) How many cases were correctly identified by the algorithm? e) How many cases are incorrectly identified? f) What is the accuracy of this model? g) What percentage of positive cases are correctly identified?
a) Confusion matrix
b) Actual negative cases
c) Model’s negative outputs
d) Correctly identified cases = TP + TN = 15 + 950 = 965
e) Incorrectly identified cases = FP + FN = 23 + 12 = 35
f) Accuracy = (TP + TN) / total = (15 + 950) / 1000 = 96.5%
g) [image]
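A short check of parts d–g in Python, assuming the counts quoted in the answers (TP = 15, TN = 950, FP = 23, FN = 12, as in part e):

```python
# Counts taken from the answers above.
TP, TN, FP, FN = 15, 950, 23, 12

accuracy = (TP + TN) / (TP + TN + FP + FN)
recall   = TP / (TP + FN)  # fraction of actual positive cases correctly identified (part g)

print(f"accuracy = {accuracy:.1%}")  # 96.5%
print(f"recall   = {recall:.1%}")    # 55.6%
```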
What is a confusion matrix?
A) A table used to describe the performance of a classification model
B) A graph showing the relationship between features
C) A tool for visualizing data distributions
D) A metric for regression models
A) A table used to describe the performance of a classification model
In a binary classification problem, how many cells does a confusion matrix have?
A) 2, B) 4, C) 6, D) 8
B) 4
Which of the following is NOT a component of a confusion matrix?
A) True Positives (TP)
B) False Negatives (FN)
C) True Negatives (TN)
D) False Positives (FP)
E) Partial Positives (PP)
E) Partial Positives (PP)
What does “True Positive (TP)” represent in a confusion matrix?
A) The model correctly predicted the negative class
B) The model incorrectly predicted the positive class
C) The model correctly predicted the positive class
D) The model incorrectly predicted the negative class
C) The model correctly predicted the positive class
What does “False Negative (FN)” represent?
A) The model predicted a negative outcome, but the actual outcome was positive
B) The model predicted a positive outcome, but the actual outcome was negative
C) The model correctly predicted the negative class
D) The model correctly predicted the positive class
A) The model predicted a negative outcome, but the actual outcome was positive
What is the term for the sum of True Positives and True Negatives divided by the total number of predictions?
A) Precision
B) Recall
C) Accuracy
D) F1 Score
C) Accuracy
Given the following confusion matrix, what is the accuracy?
[image]
A) 85%, B) 90%, C) 80%, D) 75%
A) 85%
Using the same confusion matrix, what is the precision?
[image]
A) 90.9%
B) 83.3%
C) 87.5%
D) 92.5%
A) 90.9%
(Precision = TP / predicted positive cases = TP / (TP + FP))
What is the recall for the above confusion matrix?
[image]
A) 83.3%
B) 90.9%
C) 87.5%
D) 92.5%
A) 83.3%
(Recall, or sensitivity, = TP / actual positive cases = TP / (TP + FN))
What is the F1 score for the above confusion matrix?
[image]
A) 87.0%
B) 66.2%
C) 93.5%
D) 95.3%
A) 87.0%
(The F1 score is 2TP / (2TP + FP + FN), i.e., the accuracy formula with TN replaced by TP; it is used when true negatives are not essential.)
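The matrix itself is only in the image, but one set of counts consistent with all four quoted answers is TP = 50, FP = 5, FN = 10, TN = 35 (a hypothetical reconstruction, not taken from the image). The sketch below reproduces the percentages from those counts:

```python
# Hypothetical counts chosen to match the quoted answers; the actual matrix is in the image.
TP, FP, FN, TN = 50, 5, 10, 35

accuracy  = (TP + TN) / (TP + TN + FP + FN)                # 0.85   -> 85%
precision = TP / (TP + FP)                                  # ~0.909 -> 90.9%
recall    = TP / (TP + FN)                                  # ~0.833 -> 83.3%
f1        = 2 * precision * recall / (precision + recall)   # ~0.870 -> 87.0%

print(f"{accuracy:.1%} {precision:.1%} {recall:.1%} {f1:.1%}")
```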
In which scenario is recall more critical than precision?
A) Spam detection
B) Fraud detection
C) Customer churn prediction
D) Sentiment analysis
B) Fraud detection
Recall is more important when missing positive cases (false negatives) is costly. For example, in medical diagnosis the goal is to minimize missed positive cases, even if it means accepting some false positives.
Which metric would you prioritize if false positives are more costly than false negatives?
A) Recall
B) Precision
C) Accuracy
D) F1 Score
B) Precision
Ex: Loan approval, criminal justice
What is the relationship between precision and recall?
A) They are inversely proportional
B) They are directly proportional
C) They are independent of each other
D) They are complementary but not directly proportional
D) They are complementary but not directly proportional
What does a high F1 Score indicate?
A) High precision and low recall
B) Low precision and high recall
C) A balance between precision and recall
D) High accuracy but low precision
C) A balance between precision and recall
How does a confusion matrix help in multi-class classification?
A) It extends to multiple dimensions for each class
B) It becomes a 3D matrix
C) It is split into multiple binary confusion matrices
D) It remains the same as in binary classification
A) It extends to multiple dimensions for each class (for n classes, the matrix becomes n × n, with one row and column per class)
Consider the graph below. a) What is the name of this plot? b) What are graphs A, B, and C? c) Which of the three models performs better?
[image]
a) ROC: Receiver operating characteristic.
b) Graphs B and C represent the behavior of 2 classification models. Line A is a reference line (no-skill).
c) The curve farther from the no-skill line A (closer to the top-left corner) performs better; hence, C is better than B.
Can we draw an ROC curve for a multi-class classifier?
Yes. For example, the following ROC is drawn for a dataset with 3 classes. We have used a logistic regression classifier in a “one-versus-all” method. This means we used the classifier to distinguish class 1 from the other classes. The results of this classification are recorded, and an ROC curve is drawn. Then, we do the same process for the other 2 classes. Finally, as shown below, we will have 3 ROC curves and can draw all 3 in one frame.
[image]
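A minimal sketch of the one-versus-all procedure on a 3-class dataset (iris and logistic regression are used here as assumed stand-ins for the setup in the card); each class’s probability column is scored against a binary “this class vs. the rest” label, and the three resulting ROC curves are drawn in one frame:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One probability column per class; these serve as the one-vs-rest scores.
probs = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)
y_bin = label_binarize(y_test, classes=[0, 1, 2])  # binary labels: class i vs. the rest

# One ROC curve per class, all drawn in the same frame.
for i in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, i], probs[:, i])
    plt.plot(fpr, tpr, label=f"class {i} vs rest")
plt.plot([0, 1], [0, 1], "k--", label="no skill")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

Training three separate binary classifiers, as the card describes, works the same way; only the source of the per-class scores changes.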
What is the ‘dice’ or ‘F1 score’, and when is it used?
The Dice score concentrates on TP:
Dice = 2TP / (2TP + FN + FP).
It is used when the dataset is imbalanced and negative cases far outnumber positive ones; when identifying positives matters more than true negatives, the F1 (Dice) score is a better criterion than accuracy.
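A quick check that the Dice score and the F1 score are the same quantity: with precision P = TP / (TP + FP) and recall R = TP / (TP + FN), F1 = 2PR / (P + R) = 2TP² / (TP(2TP + FP + FN)) = 2TP / (2TP + FP + FN), which is exactly the Dice formula above.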
For the following dataset, assume that a circular discriminant will classify red and green dots. To draw an ROC curve, we must find the 2 extreme points where TPR = FPR = 0 and TPR = FPR = 1. What kind of circular discriminant would create each of these points?
[image]
A circular discriminant, shown below, covers the whole dataset and classifies all data as positive, so TPR = FPR = 1.
[image 1]
As shown below, a discriminant that does not contain any data points produces TPR = FPR = 0, since it yields no true positives and no false positives.
[image 2]
Other discriminants, which enclose some of the green (positive) points while leaving some of the red (negative) points outside, produce TPR and FPR values between 0 and 1. Sweeping over such discriminants gives the points needed to draw the ROC curve (see the sketch below).
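A sketch of that sweep on hypothetical 2-D data: as the radius of a circular discriminant grows from 0 to a value that encloses everything, (FPR, TPR) moves from (0, 0) toward (1, 1), and the intermediate points trace the ROC curve:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: positive (green) points clustered near the origin, negative (red) points spread wider.
pos = rng.normal(0.0, 1.0, size=(50, 2))
neg = rng.normal(0.0, 3.0, size=(50, 2))

center = np.zeros(2)
for radius in np.linspace(0.0, 12.0, 13):
    tp = np.sum(np.linalg.norm(pos - center, axis=1) <= radius)  # positives inside the circle
    fp = np.sum(np.linalg.norm(neg - center, axis=1) <= radius)  # negatives inside the circle
    print(f"radius={radius:4.1f}  FPR={fp / len(neg):.2f}  TPR={tp / len(pos):.2f}")
# Plotting TPR against FPR for these radii gives the ROC curve of the circular discriminant.
```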
Draw the ROC curve for the following classification model. Use the following three thresholds: 𝜃 = 0, 0.5, and 1.
[image]
The following 3 tuples are computed:
[image]