CS 412 Week 5


9 Terms

Card 1

Suppose we have built a classifier that identifies spam emails from regular emails. On the test set, we have the following confusion matrix:

Actual \ Predicted    Spam    Normal
Spam                    45         5
Normal                  20       930

We treat spam as the negative label and normal as the positive label.

What is the accuracy of this classifier?

0.975
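As a sanity check, the accuracy can be recomputed from the four cells of the matrix above; a minimal Python sketch (the variable names are mine, following the spam-as-negative convention):

```python
# Cells of the confusion matrix above; spam is the negative label,
# so the actual-spam / predicted-spam count is the true negatives.
TN = 45   # spam correctly flagged as spam
FP = 5    # spam that slipped through as normal
FN = 20   # normal mail wrongly flagged as spam
TP = 930  # normal mail correctly passed

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.975
```

That is (45 + 930) correct predictions out of 1000 test emails.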

Card 2

For evaluating the spam classifier from Question 1, which of the following metrics can capture how effectively the classifier identifies spam and can be derived from the confusion matrix?

Please select all that apply.

A) Sensitivity

B) Recall of the negative label

C) Accuracy

D) Specificity

C) Accuracy

D) Specificity
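Sensitivity and specificity can be read off the same four cells; a quick sketch (cell values taken from the matrix in Card 1, variable names mine):

```python
TN, FP, FN, TP = 45, 5, 20, 930  # spam = negative, normal = positive

sensitivity = TP / (TP + FN)  # recall of the positive (normal) label
specificity = TN / (TN + FP)  # recall of the negative (spam) label
print(round(specificity, 2))  # 0.9: 45 of the 50 spam emails are caught
```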

Card 3

Which of the following methods are suitable for datasets with imbalanced classes when the minority class is not very rare (for example, accounting for 10% of the data)?

A) Undersampling from the majority class.

B) Stratified cross-validation.

C) Oversampling from the minority class.

D) Moving the classification threshold towards the minority class.

A) Undersampling from the majority class.

B) Stratified cross-validation.

D) Moving the classification threshold towards the minority class.
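Stratified partitioning (option B) can be sketched in a few lines: group example indices by class, then deal each class out across the folds so every fold keeps roughly the same 10%/90% mix as the whole dataset. An illustrative sketch, not a library implementation:

```python
import random
from collections import Counter

def stratified_folds(labels, k, seed=0):
    """Assign each example index to one of k folds, keeping the
    class proportions of `labels` roughly equal in every fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):   # deal this class round-robin
            folds[j % k].append(i)
    return folds

# 10% minority class, as in the question.
labels = [1] * 10 + [0] * 90
for fold in stratified_folds(labels, k=5):
    print(Counter(labels[i] for i in fold))  # each fold: 2 minority, 18 majority
```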

Card 4

Which of the following statements about the ROC curve are correct?

Please select all that apply.

A) The ROC curve shows the tradeoff between sensitivity and specificity.

B) Different points on the ROC curve are obtained by changing the decision threshold or cutoff.

C) A single ROC curve can be used to represent the results of a multi-class classifier.

D) The closer the curve is to the top left corner, the more accurate the classifier is.

A) The ROC curve shows the tradeoff between sensitivity and specificity.

B) Different points on the ROC curve are obtained by changing the decision threshold or cutoff.

D) The closer the curve is to the top left corner, the more accurate the classifier is.
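Statement B can be made concrete: sweeping a decision threshold over classifier scores produces one (FPR, TPR) point per cutoff, and tracing those points out gives the ROC curve. The scores and labels below are made up for illustration:

```python
# Toy scores: higher score = more confident the example is positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   1,    0,   1,   0]   # hypothetical truth

P = sum(labels)            # number of positives
N = len(labels) - P        # number of negatives
points = []
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    points.append((fp / N, tp / P))  # (FPR, TPR) = (1 - specificity, sensitivity)
print(points)  # lowering the cutoff moves up and to the right along the curve
```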

Card 5

Which of the following statements about ensemble methods are correct?

Please select all that apply.

A) Using bagging can generally make your model more robust to noise.

B) We can only ensemble classifiers of the same type together.

C) Ensemble models can be applied to classification tasks as well as regression tasks.

D) Ensemble models can help alleviate the class imbalance problem.

A) Using bagging can generally make your model more robust to noise.

C) Ensemble models can be applied to classification tasks as well as regression tasks.

D) Ensemble models can help alleviate the class imbalance problem.
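A toy illustration of statement A: bagging trains one model per bootstrap sample and majority-votes their predictions, which smooths over a mislabeled (noisy) training point. The 1-NN "base model" and the 1-D data here are my own toy choices:

```python
import random
from collections import Counter

def bagged_1nn(train, x, n_models=25, seed=1):
    """Bagging sketch: fit one 1-NN 'model' per bootstrap sample of
    `train` (a list of (feature, label) pairs), then majority-vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in range(len(train))]
        nearest = min(sample, key=lambda p: abs(p[0] - x))
        votes.append(nearest[1])
    return Counter(votes).most_common(1)[0][0]

# Class 0 near 0, class 1 near 5, plus one mislabeled (noisy) point at 0.25.
train = [(0.1, 0), (0.3, 0), (0.2, 0), (4.9, 1), (5.2, 1), (5.0, 1), (0.25, 1)]
print(bagged_1nn(train, 4.8))  # 1
```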

Card 6

Which of the following evaluation methods are suitable for small datasets?

Please select all that apply.

A) Holdout evaluation

B) .632 bootstrap

C) Leave-one-out cross-validation

D) Stratified cross-validation

B) .632 bootstrap

C) Leave-one-out cross-validation
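Leave-one-out CV suits small datasets because every example is used for training n-1 times and for testing exactly once. A minimal sketch with a toy 1-NN classifier (data made up for illustration):

```python
def loocv_accuracy(data):
    """Leave-one-out CV sketch: each (feature, label) pair is the test
    set once, and a 1-NN classifier is 'trained' on the rest."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        nearest = min(rest, key=lambda p: abs(p[0] - x))
        correct += (nearest[1] == y)
    return correct / len(data)

# Tiny dataset, where a holdout split would waste too much training data.
data = [(0.0, 0), (0.4, 0), (0.6, 0), (5.0, 1), (5.3, 1), (5.8, 1)]
print(loocv_accuracy(data))  # 1.0
```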

Card 7

Which of the following statements about ensemble methods are correct?

Please select all that apply.

A) We can ensemble different types of classification models (such as decision trees, naive Bayes, and SVMs) together to improve accuracy.

B) If the data is noisy, boosting models may overfit to the noise and not generalize well.

C) Ensemble models must be trained iteratively.

D) Random forest classifiers perform worse than decision trees, since they may not split on the best attribute at each node.

A) We can ensemble different types of classification models (such as decision trees, naive Bayes, and SVMs) together to improve accuracy.

B) If the data is noisy, boosting models may overfit to the noise and not generalize well.

Card 8

Which of the following evaluation methods are suitable for large datasets?

Please select all that apply.

A) Leave-one-out cross-validation

B) 10-fold cross-validation

C) .632 bootstrap

D) Holdout evaluation

B) 10-fold cross-validation

D) Holdout evaluation
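Holdout evaluation is the cheap option for large datasets: one shuffle, one split. A sketch (the one-third test fraction is a common convention, not something fixed by the question):

```python
import random

def holdout_split(n, seed=0):
    """Holdout sketch: shuffle the n example indices once and reserve
    one third of them for testing; a single pass, so it stays cheap
    even for very large datasets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - n // 3           # keep two thirds for training
    return idx[:cut], idx[cut:]

train_idx, test_idx = holdout_split(900_000)
print(len(train_idx), len(test_idx))  # 600000 300000
```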

Card 9

Which of the following statements about the ROC curve are correct?

Please select all that apply.

A) The ROC curve displays the tradeoff between the true positive rate and the false positive rate.

B) Different points on the ROC curve correspond to different test tuples.

C) The larger the area under the ROC curve is, the more accurate the classifier is.

D) The diagonal represents a random classifier that determines the labels of the examples by flipping a coin.

A) The ROC curve displays the tradeoff between the true positive rate and the false positive rate.

C) The larger the area under the ROC curve is, the more accurate the classifier is.

D) The diagonal represents a random classifier that determines the labels of the examples by flipping a coin.
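The area under the curve in statement C has a useful pairwise reading: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A small sketch of that interpretation on made-up scores:

```python
def auc(scores, labels):
    """AUC sketch via its pairwise interpretation: the fraction of
    positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0  -- perfect ranking
print(auc([0.9, 0.2, 0.8, 0.4], [1, 0, 0, 1]))  # 0.75 -- one pair misranked
```

A coin-flip classifier sits near the diagonal of the ROC plot and scores around 0.5 by this measure.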