Flashcards covering key concepts in evaluating malware detection systems, including accuracy, precision, recall, F-score, ROC curves, and practical considerations.
What are common metrics used to evaluate malware detection systems?
Accuracy, Precision, Recall, F-score, and ROC Curves
What is a potential problem when evaluating malware detection algorithms on skewed datasets?
High accuracy can be misleading if the detector simply predicts 'clean' for all files, as malware files are typically a small fraction of all files.
What is a skewed dataset?
A dataset in which the proportions of positive and negative examples are far from equal (e.g., 95% clean, 5% malware).
Why is percentage classification accuracy not sufficient to evaluate classifier performance on a skewed dataset?
Because a classifier that predicts 'clean' for every file will almost always be correct on such data, despite being useless: it never identifies any malware.
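For illustration, a minimal Python sketch of this failure mode on a hypothetical dataset that is 95% clean: a trivial detector that answers 'clean' for every file scores 95% accuracy while catching no malware.

```python
# Labels: 1 = malware, 0 = clean (hypothetical data, 95% clean).
y_true = [1] * 5 + [0] * 95          # 5 malware files, 95 clean files
y_pred = [0] * 100                   # trivial detector: everything is "clean"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(f"accuracy: {accuracy:.2f}")   # 0.95 -- looks great
print(f"recall:   {recall:.2f}")     # 0.00 -- detects no malware at all
```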
Define True Positive in the context of malware detection.
A file is malware and the detector correctly predicts malware.
Define False Positive in the context of malware detection.
A file is clean, but the detector incorrectly predicts malware.
Define False Negative in the context of malware detection.
A file is malware, but the detector incorrectly predicts clean.
Define True Negative in the context of malware detection.
A file is clean, and the detector correctly predicts clean.
What is a Confusion Matrix?
A 2×2 table whose cells count the four combinations of each file's true label and the classifier's prediction: true positives, false positives, false negatives, and true negatives.
Why is a Confusion Matrix useful?
It separates the four outcome types, exposing aspects of the classifier's performance (such as false positives and false negatives) that a single accuracy number hides.
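A minimal sketch of computing the four confusion-matrix cells by counting label/prediction combinations (the labels and predictions here are hypothetical):

```python
# Build the 2x2 confusion matrix by counting the four
# label/prediction combinations (1 = malware, 0 = clean).
def confusion_matrix(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")   # TP=2 FP=1 FN=1 TN=2
```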
What is the 'positive case' when doing malware analysis?
Identifying malware.
In a multi-class confusion matrix, what must be designated to make sense of the terms?
A particular class as positive.
What is the formula for Accuracy?
Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
When is Accuracy a useful way to measure system performance?
When the numbers of positive and negative examples in the test set are roughly equal.
What is the formula for Precision?
Precision = True Positives / (True Positives + False Positives)
What does Precision measure?
Of all the files the classifier predicts to be malware, what fraction actually is malware.
What is the formula for Recall?
Recall = True Positives / (True Positives + False Negatives)
What does Recall measure?
Of all the files that are malware, what fraction did the detector correctly identify?
What is the formula for F1-Score?
F1-score = 2PR / (P + R) where P=Precision and R=Recall
Why is F1-Score useful?
It combines precision and recall into a single number (their harmonic mean), which is high only when both precision and recall are high.
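A minimal sketch applying the three formulas above to hypothetical confusion-matrix counts:

```python
# Compute precision, recall, and F1 from confusion-matrix counts.
tp, fp, fn = 80, 10, 20              # hypothetical counts

precision = tp / (tp + fp)           # 80 / 90  ~= 0.889
recall = tp / (tp + fn)              # 80 / 100 =  0.800
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```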
What do high precision and high recall indicate?
High precision: when the detector flags a file as malware, it is usually right. High recall: the detector finds most of the malware that exists.
In practice, why should the number of false-positives be kept very low in a malware detector?
Because false alarms on clean files annoy users, who may then disable the malware detector entirely.
What is the purpose of calibrating classifier sensitivity?
To trade false positives against false negatives by adjusting the threshold at which the classifier generates an alert.
What is plotted in a classifier output distribution?
A histogram of malware probability scores for all samples.
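A minimal sketch of such a plot, assuming matplotlib is available and using hypothetical probability scores; splitting the histogram by true label makes the overlap between clean and malware scores visible:

```python
import matplotlib.pyplot as plt

# Hypothetical predicted malware probabilities and true labels.
scores = [0.05, 0.10, 0.20, 0.30, 0.55, 0.60, 0.70, 0.85, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

clean = [s for s, l in zip(scores, labels) if l == 0]
malware = [s for s, l in zip(scores, labels) if l == 1]

plt.hist([clean, malware], bins=10, range=(0, 1), label=["clean", "malware"])
plt.xlabel("predicted malware probability")
plt.ylabel("number of samples")
plt.legend()
plt.show()
```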
What does a Precision-Recall (PR) curve show?
The classifier’s performance in terms of precision and recall as the detection threshold is varied.
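A minimal sketch of computing PR-curve points, assuming scikit-learn is available (its precision_recall_curve sweeps the detection threshold over the scores):

```python
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted malware probabilities.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.55, 0.60, 0.70, 0.85, 0.90, 0.95]

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```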
What does an ROC curve show?
The classifier’s performance as the detection threshold is varied, plotting the true positive rate (tpr) against the false positive rate (fpr).
What are the desired properties of a good classifier's ROC curve?
Low false-positive rate and a high true-positive rate; the ROC curve should bend towards the top-left corner of the graph.
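A minimal sketch of computing ROC-curve points with scikit-learn's roc_curve, again on hypothetical scores; here the two classes separate perfectly, so the area under the curve is 1.0:

```python
from sklearn.metrics import roc_curve, auc

# Hypothetical true labels and predicted malware probabilities.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.55, 0.60, 0.70, 0.85, 0.90, 0.95]

fpr, tpr, thresholds = roc_curve(y_true, scores)
print(f"area under ROC curve: {auc(fpr, tpr):.2f}")  # 1.00: perfectly separable
for f, t in zip(fpr, tpr):
    print(f"fpr={f:.2f}  tpr={t:.2f}")
```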
What is 'base rate' in the context of malware detection?
The fraction of files a system encounters that are actually malware.
What is the formula for Expected Precision?
Expected Precision = (True Positive Rate * Base Rate) / (True Positive Rate * Base Rate + False Positive Rate * (1 - Base Rate))
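A minimal worked example of this formula: even a detector with a true positive rate of 0.99 and a false positive rate of 0.01 has poor expected precision once the base rate drops far below 50% (the base-rate fallacy).

```python
# Expected precision as a function of tpr, fpr, and base rate.
def expected_precision(tpr, fpr, base_rate):
    return (tpr * base_rate) / (tpr * base_rate + fpr * (1 - base_rate))

for base_rate in (0.5, 0.05, 0.001):
    p = expected_precision(tpr=0.99, fpr=0.01, base_rate=base_rate)
    print(f"base rate {base_rate}: expected precision {p:.3f}")
# base rate 0.5:   0.990
# base rate 0.05:  0.839
# base rate 0.001: 0.090  -- most alerts are false alarms
```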
What key questions should be asked when evaluating a new malware detection method?
How do we know if the new system is better than the old system? How much better is the new system? What are the conditions where the system fails?