CS 412 Week 5


9 Terms

Card 1

Suppose we have built a classifier that identifies spam emails from regular emails. On the test set, we have the following confusion matrix:

Actual \ Predicted    Spam    Normal
Spam                    45         5
Normal                  20       930

We treat spam as the negative label and normal as the positive label.

What is the accuracy of this classifier?

0.975
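As a sanity check, the accuracy can be recomputed from the four cells of the matrix above; a minimal Python sketch (the variable names are mine, following the spam-as-negative convention):

```python
# Cells of the confusion matrix above; spam is the negative label,
# so the actual-spam / predicted-spam count is the true negatives.
TN = 45   # spam correctly flagged as spam
FP = 5    # spam that slipped through as normal
FN = 20   # normal mail wrongly flagged as spam
TP = 930  # normal mail correctly passed

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.975
```

That is (45 + 930) correct predictions out of 1000 test emails.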

Card 2

For evaluating the spam classifier from Question 1, which of the following metrics can capture how effectively the classifier identifies spam and can be derived from the confusion matrix?

Please select all that apply.

A) Sensitivity

B) Recall of the negative label

C) Accuracy

D) Specificity

C) Accuracy

D) Specificity
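Sensitivity and specificity can be read off the same four cells; a quick sketch (cell values taken from the matrix in Card 1, variable names mine):

```python
TN, FP, FN, TP = 45, 5, 20, 930  # spam = negative, normal = positive

sensitivity = TP / (TP + FN)  # recall of the positive (normal) label
specificity = TN / (TN + FP)  # recall of the negative (spam) label
print(round(specificity, 2))  # 0.9: 45 of the 50 spam emails are caught
```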

Card 3

Which of the following methods are suitable for datasets with imbalanced classes when the minority class is not very rare (for example, accounting for 10% of the data)?

A) Undersampling from the majority class.

B) Stratified cross-validation.

C) Oversampling from the minority class.

D) Moving the classification threshold towards the minority class.

A) Undersampling from the majority class.

B) Stratified cross-validation.

D) Moving the classification threshold towards the minority class.
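Stratified partitioning (option B) can be sketched in a few lines: group example indices by class, then deal each class out across the folds so every fold keeps roughly the same 10%/90% mix as the whole dataset. An illustrative sketch, not a library implementation:

```python
import random
from collections import Counter

def stratified_folds(labels, k, seed=0):
    """Assign each example index to one of k folds, keeping the
    class proportions of `labels` roughly equal in every fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):   # deal this class round-robin
            folds[j % k].append(i)
    return folds

# 10% minority class, as in the question.
labels = [1] * 10 + [0] * 90
for fold in stratified_folds(labels, k=5):
    print(Counter(labels[i] for i in fold))  # each fold: 2 minority, 18 majority
```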

Card 4

Which of the following statements about the ROC curve are correct?

Please select all that apply.

A) The ROC curve shows the tradeoff between sensitivity and specificity.

B) Different points on the ROC curve are obtained by changing the decision threshold or cutoff.

C) A single ROC curve can be used to represent the results of a multi-class classifier.

D) The closer the curve is to the top left corner, the more accurate the classifier is.

A) The ROC curve shows the tradeoff between sensitivity and specificity.

B) Different points on the ROC curve are obtained by changing the decision threshold or cutoff.

D) The closer the curve is to the top left corner, the more accurate the classifier is.
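Statement B can be made concrete: sweeping a decision threshold over classifier scores produces one (FPR, TPR) point per cutoff, and tracing those points out gives the ROC curve. The scores and labels below are made up for illustration:

```python
# Toy scores: higher score = more confident the example is positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   1,    0,   1,   0]   # hypothetical truth

P = sum(labels)            # number of positives
N = len(labels) - P        # number of negatives
points = []
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    points.append((fp / N, tp / P))  # (FPR, TPR) = (1 - specificity, sensitivity)
print(points)  # lowering the cutoff moves up and to the right along the curve
```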

Card 5

Which of the following statements about ensemble methods are correct?

Please select all that apply.

A) Using bagging can generally make your model more robust to noise.

B) We can only ensemble classifiers of the same type together.

C) Ensemble models can be applied to classification tasks as well as regression tasks.

D) Ensemble models can help alleviate the class imbalance problem.

A) Using bagging can generally make your model more robust to noise.

C) Ensemble models can be applied to classification tasks as well as regression tasks.

D) Ensemble models can help alleviate the class imbalance problem.
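A toy illustration of statement A: bagging trains one model per bootstrap sample and majority-votes their predictions, which smooths over a mislabeled (noisy) training point. The 1-NN "base model" and the 1-D data here are my own toy choices:

```python
import random
from collections import Counter

def bagged_1nn(train, x, n_models=25, seed=1):
    """Bagging sketch: fit one 1-NN 'model' per bootstrap sample of
    `train` (a list of (feature, label) pairs), then majority-vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in range(len(train))]
        nearest = min(sample, key=lambda p: abs(p[0] - x))
        votes.append(nearest[1])
    return Counter(votes).most_common(1)[0][0]

# Class 0 near 0, class 1 near 5, plus one mislabeled (noisy) point at 0.25.
train = [(0.1, 0), (0.3, 0), (0.2, 0), (4.9, 1), (5.2, 1), (5.0, 1), (0.25, 1)]
print(bagged_1nn(train, 4.8))  # 1
```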

Card 6

Which of the following evaluation methods are suitable for small datasets?

Please select all that apply.

A) Holdout evaluation

B) .632 bootstrap

C) Leave-one-out cross-validation

D) Stratified cross-validation

B) .632 bootstrap

C) Leave-one-out cross-validation
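Leave-one-out CV suits small datasets because every example is used for training n-1 times and for testing exactly once. A minimal sketch with a toy 1-NN classifier (data made up for illustration):

```python
def loocv_accuracy(data):
    """Leave-one-out CV sketch: each (feature, label) pair is the test
    set once, and a 1-NN classifier is 'trained' on the rest."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        nearest = min(rest, key=lambda p: abs(p[0] - x))
        correct += (nearest[1] == y)
    return correct / len(data)

# Tiny dataset, where a holdout split would waste too much training data.
data = [(0.0, 0), (0.4, 0), (0.6, 0), (5.0, 1), (5.3, 1), (5.8, 1)]
print(loocv_accuracy(data))  # 1.0
```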

Card 7

Which of the following statements about ensemble methods are correct?

Please select all that apply.

A) We can ensemble different types of classification models (such as decision trees, naive Bayes, and SVMs) together to improve accuracy.

B) If the data is noisy, boosting models may overfit to the noise and not generalize well.

C) Ensemble models must be trained iteratively.

D) Random forest classifiers perform worse than decision trees, since they may not split on the best attribute at each node.

A) We can ensemble different types of classification models (such as decision trees, naive Bayes, and SVMs) together to improve accuracy.

B) If the data is noisy, boosting models may overfit to the noise and not generalize well.

Card 8

Which of the following evaluation methods are suitable for large datasets?

Please select all that apply.

A) Leave-one-out cross-validation

B) 10-fold cross-validation

C) .632 bootstrap

D) Holdout evaluation

B) 10-fold cross-validation

D) Holdout evaluation
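Holdout evaluation is the cheap option for large datasets: one shuffle, one split. A sketch (the one-third test fraction is a common convention, not something fixed by the question):

```python
import random

def holdout_split(n, seed=0):
    """Holdout sketch: shuffle the n example indices once and reserve
    one third of them for testing; a single pass, so it stays cheap
    even for very large datasets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - n // 3           # keep two thirds for training
    return idx[:cut], idx[cut:]

train_idx, test_idx = holdout_split(900_000)
print(len(train_idx), len(test_idx))  # 600000 300000
```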

Card 9

Which of the following statements about the ROC curve are correct?

Please select all that apply.

A) The ROC curve displays the tradeoff between the true positive rate and the false positive rate.

B) Different points on the ROC curve correspond to different test tuples.

C) The larger the area under the ROC curve is, the more accurate the classifier is.

D) The diagonal represents a random classifier that determines the labels of the examples by flipping a coin.

A) The ROC curve displays the tradeoff between the true positive rate and the false positive rate.

C) The larger the area under the ROC curve is, the more accurate the classifier is.

D) The diagonal represents a random classifier that determines the labels of the examples by flipping a coin.
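The area under the curve in statement C has a useful pairwise reading: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A small sketch of that interpretation on made-up scores:

```python
def auc(scores, labels):
    """AUC sketch via its pairwise interpretation: the fraction of
    positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0  -- perfect ranking
print(auc([0.9, 0.2, 0.8, 0.4], [1, 0, 0, 1]))  # 0.75 -- one pair misranked
```

A coin-flip classifier sits near the diagonal of the ROC plot and scores around 0.5 by this measure.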