Flashcards covering key vocabulary and concepts related to data mining and model evaluation.
What are the two main types of data mining tasks?
Descriptive tasks and predictive tasks.
What do predictive data mining tasks involve?
They use known (past) data to predict unknown or future values, such as a class label or a numeric quantity.
What do descriptive data mining tasks involve?
They find patterns that describe or summarize the data without making predictions.
What is feature construction?
Creating new features from existing ones to improve model performance.
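A minimal Python sketch of feature construction, assuming hypothetical patient records with `weight_kg` and `height_m` fields: a new `bmi` feature is derived from the two existing ones.

```python
# Hypothetical patient records; the field names are illustrative only.
patients = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": 90.0, "height_m": 1.80},
]


def add_bmi(records):
    """Construct a new feature (BMI = weight / height^2) from two existing ones."""
    out = []
    for r in records:
        new = dict(r)  # copy so the original record is left unchanged
        new["bmi"] = r["weight_kg"] / r["height_m"] ** 2
        out.append(new)
    return out


enriched = add_bmi(patients)
for r in enriched:
    print(round(r["bmi"], 1))
```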
How is accuracy calculated in a classification model?
Accuracy = (TP + TN) / (TP + TN + FP + FN)
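The accuracy formula as a small Python function; the counts in the usage line are invented for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases the classifier got right: (TP + TN) / total."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no cases to evaluate")
    return (tp + tn) / total


# Made-up counts: 40 TP, 50 TN, 6 FP, 4 FN.
print(accuracy(40, 50, 6, 4))  # 0.9
```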
Why is accuracy alone often insufficient for healthcare applications?
Because it doesn't distinguish between types of errors, which can have different implications in healthcare contexts.
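A sketch of why one number is not enough, using two hypothetical screening models on the same 100 invented patients (10 diseased, 90 healthy). Both reach the same accuracy, yet model B misses every diseased patient.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


# Invented counts for illustration only.
model_a = dict(tp=8, tn=82, fp=8, fn=2)   # misses 2 diseased patients
model_b = dict(tp=0, tn=90, fp=0, fn=10)  # misses all 10 diseased patients

for name, m in (("A", model_a), ("B", model_b)):
    print(f"model {name}: accuracy={accuracy(**m):.2f}  FP={m['fp']}  FN={m['fn']}")
```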
What is a confusion matrix?
A table used to evaluate classification model performance by comparing predicted and actual results.
What is a True Positive (TP)?
Cases predicted as positive that are indeed positive.
What is a True Negative (TN)?
Cases predicted as negative that are indeed negative.
What is a False Positive (FP)?
Cases predicted as positive but actually negative.
What is a False Negative (FN)?
Cases predicted as negative but actually positive.
What is sensitivity?
True-Positive Rate (TPR): the likelihood that a diseased patient has a positive test; TPR = TP / (TP + FN).
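A minimal Python sketch that counts the four confusion-matrix cells from actual and predicted labels, assuming binary labels where `1` is the positive class (the `positive` parameter is an assumption). The labels are invented.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1  # predicted positive, really positive
        elif p != positive and a != positive:
            counts["tn"] += 1  # predicted negative, really negative
        elif p == positive:
            counts["fp"] += 1  # predicted positive, really negative
        else:
            counts["fn"] += 1  # predicted negative, really positive
    return counts


c = confusion_counts(actual=[1, 1, 0, 0, 1, 0], predicted=[1, 0, 0, 1, 1, 0])
# Lay the counts out as a 2x2 table (rows = actual, columns = predicted).
print("           pred +  pred -")
print(f"actual +  {c['tp']:>6}  {c['fn']:>6}")
print(f"actual -  {c['fp']:>6}  {c['tn']:>6}")
```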
What characterizes a desirable diagnostic test?
It has high sensitivity (TPR) and high specificity (TNR).
What is specificity?
True-Negative Rate (TNR): the likelihood that a healthy patient has a negative test; TNR = TN / (TN + FP).
Why are thresholds needed in most prediction models?
Because most tests and models produce a continuous output (such as a score or probability) that must be converted into a positive/negative decision.
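A minimal sketch of applying a threshold to continuous scores. It assumes a score at or above the threshold counts as positive; the scores are invented.

```python
def classify(scores, threshold=0.5):
    """Turn continuous scores into positive (1) / negative (0) labels."""
    return [1 if s >= threshold else 0 for s in scores]


scores = [0.10, 0.40, 0.55, 0.80, 0.95]
print(classify(scores))       # [0, 0, 1, 1, 1]
print(classify(scores, 0.9))  # [0, 0, 0, 0, 1]
```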
How does changing the threshold affect sensitivity and specificity?
Lowering the threshold typically increases sensitivity (catches more true positives) but decreases specificity (more false positives)
Raising the threshold typically increases specificity (fewer false positives) but decreases sensitivity (more false negatives)
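A sketch of the trade-off on invented data (4 diseased and 4 healthy patients), again assuming scores at or above the threshold are called positive. As the threshold rises, sensitivity falls and specificity rises.

```python
def sens_spec_at(scores, actual, threshold):
    """Return (sensitivity, specificity) when predicting positive for score >= threshold."""
    tp = fn = tn = fp = 0
    for s, a in zip(scores, actual):
        pred = 1 if s >= threshold else 0
        if a == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores: first four patients diseased (1), last four healthy (0).
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
actual = [1, 1, 1, 1, 0, 0, 0, 0]
for t in (0.25, 0.5, 0.75):
    sens, spec = sens_spec_at(scores, actual, t)
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```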
When would you prioritize sensitivity over specificity?
When the disease is serious and life-saving therapy is available (minimizing false negatives)
When would you prioritize specificity over sensitivity?
When the disease is not serious and the therapy has risks (minimizing false positives)
What are "black box" models?
Models that are not easily interpretable by humans, such as Artificial Neural Networks and Support Vector Machines.
What are "white box" models?
Models that provide clear reasoning for predictions, such as Decision Trees.
Give examples of predictive data mining tasks.
Classification (e.g., with an SVM) and regression.
Give an example of a descriptive data mining task.
Clustering
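A tiny 1-D k-means sketch as one example of clustering. The starting centers, iteration count, and ages are assumptions chosen for illustration; no labels are supplied, so the algorithm only describes structure in the data.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-center."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


ages = [22, 25, 27, 61, 65, 70]
centers, groups = kmeans_1d(ages, centers=[20, 80])
print(groups)  # [[22, 25, 27], [61, 65, 70]]
```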
Is TPR (True Positive Rate) specificity or sensitivity?
Sensitivity
Is TNR (True Negative Rate) specificity or sensitivity?
Specificity
How do you calculate Sensitivity?
TPR = TP / (TP + FN)
How do you calculate Specificity?
TNR = TN / (TN + FP)
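Both rates as small Python functions, with invented counts in the usage lines. The parentheses around each denominator matter: without them, `tp / tp + fn` would evaluate to `1 + fn`.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Invented counts: 40 TP, 10 FN, 45 TN, 5 FP.
print(sensitivity(40, 10))  # 0.8
print(specificity(45, 5))   # 0.9
```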