Flashcards covering key concepts related to classification, decision trees, and their applications in business data mining.
Classification
A supervised learning method that assigns records to pre-defined classes (categories); it requires labeled data to train the model.
Decision Tree
A flowchart-like tree structure in which each internal node tests an attribute, each branch represents an outcome of that test (a decision rule), and each leaf node represents an outcome (a class label in classification).
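To make the node, branch, and leaf roles concrete, here is a minimal sketch of a hand-built tree stored as nested dictionaries. The loan scenario, the attribute names (`income`, `has_collateral`), and the `predict` helper are all hypothetical and chosen only for illustration; a real tree would be learned from a training set.

```python
# Hypothetical toy tree: internal nodes name an attribute to test,
# branches are keyed by the attribute's value, leaves carry a class label.
tree = {
    "attribute": "income",
    "branches": {
        "high": {"label": "approve"},
        "low": {
            "attribute": "has_collateral",
            "branches": {
                "yes": {"label": "approve"},
                "no": {"label": "reject"},
            },
        },
    },
}


def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


print(predict(tree, {"income": "low", "has_collateral": "no"}))  # reject
```

Each path from the root to a leaf reads as one rule, for example "income is low and there is no collateral, so reject".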
Overfitting
A modeling error that occurs when a statistical model describes random error or noise instead of the underlying relationship, leading to poor performance on new data.
Pruning
The process of reducing the size of a decision tree by removing branches that add little predictive power, which reduces overfitting and helps the tree generalize to new data.
Training Set
A labeled data set used to train a classification model, allowing the algorithm to learn the patterns associated with each category.
True Positive Rate (Sensitivity)
The ratio of correctly predicted positive observations to all actual positives, TP / (TP + FN); it indicates the model's ability to identify positive instances. Also called recall.
Confusion Matrix
A table used to evaluate a classification model by showing the counts of true positives, false positives, true negatives, and false negatives.
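As a rough illustration, the four cells can be counted directly from paired actual and predicted labels. The function name `confusion_matrix`, the `positive` parameter, and the sample labels below are assumptions made up for this sketch, not taken from any particular library or data set.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count the four outcomes for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Made-up labels for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(actual, predicted)
sensitivity = cm["TP"] / (cm["TP"] + cm["FN"])  # true positive rate

print(cm)           # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
print(sensitivity)  # 0.75
```

Other metrics, such as the false positive rate FP / (FP + TN), come from the same four counts.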
ROC Curve
A plot of a classifier's performance that shows the true positive rate against the false positive rate as the classification threshold varies.
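The points on such a curve can be sketched by sweeping a threshold over the model's scores and recomputing both rates each time. The helper `roc_points` and the sample scores are hypothetical; the sketch assumes binary labels coded 0 and 1, with at least one example of each class.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # A record is predicted positive when its score reaches the threshold.
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up labels and scores for illustration only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

for fpr, tpr in roc_points(actual, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always runs from (0, 0) to (1, 1); the closer it bends toward the top-left corner, the better the classifier separates the two classes.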
C4.5
An extension of the ID3 algorithm used for generating decision trees, capable of handling missing values and both categorical and continuous data.
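One commonly cited difference from ID3 is that C4.5 ranks candidate splits by gain ratio (information gain divided by split information) rather than by raw information gain. The sketch below computes that measure for a single categorical attribute; the function names and the sample data are assumptions for illustration, and it leaves out the handling of missing values, continuous attributes, and pruning.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


# Made-up data: the first attribute separates the classes perfectly, the second not at all.
labels = ["yes", "yes", "no", "no"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
print(gain_ratio(["a", "b", "a", "b"], labels))  # 0.0
```

Dividing by split information penalizes attributes that fragment the data into many small groups, which raw information gain tends to favor.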
Cross-Validation
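A technique for estimating how well a model will generalize to an independent data set. In k-fold cross-validation, the data is partitioned into k subsets; the model is trained on k-1 of them and tested on the remaining one, rotating until each subset has served once as the test set.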
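The partitioning step can be sketched as follows. The generator `k_fold_indices` is a hypothetical helper written for this example; it splits row indices in order without shuffling, whereas in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Each record lands in the test set exactly once, so averaging the k test scores gives a performance estimate that does not depend on a single train/test split.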