Ch5: Decision trees

24 Terms

Card 1

What is the difference between supervised and unsupervised data mining?

  • Supervised data mining uses known labels to build predictive models (classification/regression).

  • Unsupervised data mining finds patterns or structure in data without predefined labels (e.g., clustering, association rules).

→ the results of an unsupervised method can be used as input to a supervised method

Card 2

What is a supervised method in data mining?

A learning method that uses a training set of instances (input data) with known target values to learn a function mapping features to outputs.

Card 3

What are the two main subclasses in supervised learning, and how do they differ?

Classification assigns inputs to discrete categories (e.g., purchase or not), while regression predicts continuous numeric outcomes (e.g., customer spend).

Card 4

In a decision tree, what do nodes, edges, and leaves represent?

  • Nodes: features (attributes)

  • Edges: the distinct values of that attribute

  • Leaves: predicted class labels (decisions)

Card 5

Which functions can decision trees represent?

Any Boolean function of the input attributes:

  • AND, OR, and even XOR (see the sketch below)
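Not part of the original card, just an illustration: a minimal Python sketch (with made-up attribute names x1, x2) showing XOR expressed as a depth-2 tree of attribute tests:

```python
# Hypothetical illustration: XOR of two binary attributes expressed as a
# depth-2 decision tree (each internal node tests one attribute).
def xor_tree(x1: int, x2: int) -> int:
    if x1 == 0:                        # root node tests x1
        return 1 if x2 == 1 else 0     # left subtree tests x2
    else:
        return 0 if x2 == 1 else 1     # right subtree tests x2

# Truth-table check: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", xor_tree(a, b))
```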

Card 6

How do we choose the best attribute for a decision tree?

We want the resulting groups to be as pure as possible:

  • Homogeneous with respect to the target variable

  • If every member of a group has the same value for the target, then the group is pure

→ minimize uncertainty, maximize information gain

Card 7

Why do we use formulas based on a purity measure?

When choosing the best attribute, technical problems arise:

  • Real data rarely splits into pure groups

  • Some attributes are not binary

  • Some attributes are continuous

→ all of these are addressed by a purity measure

Card 8

Entropy

H = −∑ₖ pₖ log₂(pₖ), where pₖ is the proportion of examples in class k.

  • It measures the impurity or disorder of a single dataset.

  • High entropy: close-to-uniform distribution & flat histogram (more uncertainty)

  • Low entropy: peaked distribution & histogram with clear highs and lows (less uncertainty)

→ we want low entropy!

Other purity measures exist, like Gini impurity (see the sketch below).
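A minimal sketch of the formula above (my own illustration, assuming base-2 logarithms, as is conventional for decision trees):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H = -sum_k p_k * log2(p_k) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0: 50/50 split, maximum uncertainty
print(entropy(["yes"] * 9 + ["no"] * 1))  # ~0.47: skewed, lower uncertainty
print(entropy(["yes"] * 10))              # 0.0: a pure group
```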

Card 9

Information gain

IG(parent, children) = entropy(parent) − Σⱼ p(cⱼ) · entropy(cⱼ), where p(cⱼ) is the proportion of the parent's instances that fall into child cⱼ.

→ we want the attribute that maximally decreases uncertainty (see the sketch below)
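A companion sketch (again my own illustration) weighting each child's entropy by its share of the parent's instances; entropy() is the same helper as in the previous card's sketch:

```python
import math
from collections import Counter

def entropy(labels):  # same helper as in the entropy card's sketch
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = entropy(parent) - sum_j p(c_j) * entropy(c_j)."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

# Splitting a 50/50 parent into two pure children gains one full bit:
parent = ["yes"] * 5 + ["no"] * 5
print(information_gain(parent, [["yes"] * 5, ["no"] * 5]))  # 1.0
```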

Card 10

What is the ID3 algorithm in decision trees?

The ID3 (Iterative Dichotomiser 3) algorithm builds a decision tree by:

  1. Computing the entropy of the data and the information gain of each candidate attribute.

  2. Selecting the attribute with the highest information gain (i.e., the one that reduces uncertainty the most).

  3. Splitting the data on that attribute and creating a decision node.

  4. Recursively repeating the process on the subsets (a compact sketch follows).
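A compact, illustrative ID3 sketch (toy data; assumes categorical attributes and no missing values; the helper names and the example rows are mine, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """Return a nested dict {attribute: {value: subtree-or-leaf}}."""
    # Stop when the node is pure or no attributes remain: majority-class leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(a):  # information gain of splitting on attribute a
        g = entropy(labels)
        for v in set(r[a] for r in rows):
            sub = [y for r, y in zip(rows, labels) if r[a] == v]
            g -= len(sub) / len(labels) * entropy(sub)
        return g

    best = max(attributes, key=gain)          # highest information gain
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for v in set(r[best] for r in rows):      # split and recurse on subsets
        sub_r = [r for r in rows if r[best] == v]
        sub_y = [y for r, y in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3(sub_r, sub_y, rest)
    return tree

rows = [{"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "sunny"}]
print(id3(rows, ["yes", "no", "yes"], ["outlook"]))
# {'outlook': {'sunny': 'yes', 'rain': 'no'}}
```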

Card 11

Termination criteria: when do we stop?

When all instances in a node belong to the same class, when there are no more attributes to be selected, or when no instances are left.

Card 12

What is the problem with using information gain alone when selecting attributes?

Attributes with many values tend to have high information gain simply because they split the data very finely, which leads to overfitting.

Especially for ID3:

  • can overfit

  • no guarantee of globally optimal tree (local optima)

  • can fail to generalize

Card 13

Solutions to overfitting with decision trees

  • Limit the depth of the tree

  • Limit the number of leaves

  • Stop growing when the split is not statistically significant

  • Grow the full tree & then prune subtrees

(see the scikit-learn sketch below)
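As a hedged sketch, the four strategies map roughly onto scikit-learn's DecisionTreeClassifier parameters (assuming scikit-learn is installed; min_samples_split is only a rough stand-in for a significance test):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

shallow = DecisionTreeClassifier(max_depth=3)           # limit tree depth
small = DecisionTreeClassifier(max_leaf_nodes=8)        # limit number of leaves
early = DecisionTreeClassifier(min_samples_split=20)    # crude early stopping
pruned = DecisionTreeClassifier(ccp_alpha=0.01)         # grow full, then cost-complexity pruning

for clf in (shallow, small, early, pruned):
    clf.fit(X, y)
```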

Card 14

Differences between C4.5 and ID3?

C4.5:

  • Is an improvement of ID3

  • Handles both continuous and discrete attributes, selecting splits with the gain ratio rather than raw information gain

  • Handles training data w/ missing values

  • Prunes the tree after creation

Card 15

What are alternatives to a single decision tree?

Tree Bagging: builds several trees on random (bootstrap) training samples; predictions are averaged or voted.

Random Forest: like bagging, but also selects a random subset of features at each split to reduce correlation between trees (both sketched below).
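An illustrative scikit-learn sketch (assumes it is installed; BaggingClassifier uses decision trees as its default base estimator):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Tree bagging: many trees on bootstrap samples, predictions by majority vote.
bagging = BaggingClassifier(n_estimators=100).fit(X, y)

# Random forest: bagging plus a random feature subset at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt").fit(X, y)
```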

Card 16

What are the pros & cons of decision trees and tree ensembles?

Advantages

  • Ensembles handle noisy training data well

  • A single tree is interpretable

  • They are fast & robust

Limitations

  • A single decision tree will overfit

  • Less useful for continuous targets (regression)

Card 17

How do we evaluate performance (for classification)?

  • Training set: Trains the model using input-output pairs.

  • Validation set: Separate set, tunes hyper-parameters and evaluates model during training.

  • Test set: Assesses final model performance on unseen data (a split sketch follows).
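A minimal sketch of carving out the three sets with scikit-learn's train_test_split (the 60/20/20 proportions are my assumption, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off the test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% = 20%, so the result is 60% train / 20% val / 20% test.
```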

Card 18

Define a confusion matrix and its components.

  • True Positive (TP): Correctly predicted positive instances

  • True Negative (TN): Correctly predicted negative instances

  • False Positive (FP): Incorrectly predicted positives (actually negative)

  • False Negative (FN): Incorrectly predicted negatives (actually positive) (see the counting sketch below)
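A small self-contained counting helper for the four cells in the binary case (my own sketch, not from the deck):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
```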

Card 19

Define and explain precision.

Precision = TP / (TP + FP).

It measures the proportion of predicted positives that are actually positive.

Card 20

Define and explain recall.

Recall = TP / (TP + FN).

It measures the proportion of actual positives correctly identified by the model.

Card 21

What is the F1 score and when is it useful?

F1 = 2 · (Precision · Recall) / (Precision + Recall).

It balances precision and recall; useful when classes are imbalanced (see the sketch below).
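A quick sketch computing precision, recall, and F1 from confusion-matrix counts (toy numbers chosen for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """Derive the three metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# With TP=2, FP=1, FN=1: precision = recall = F1 = 2/3.
print(precision_recall_f1(2, 1, 1))
```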

Card 22

What is accuracy, and why might it be misleading?

Accuracy = (TP + TN) / (TP + TN + FP + FN).

Fraction of instances correctly classified by the model.

It may be misleading in imbalanced datasets because it ignores the distribution of false positives/negatives.
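A worked toy example (the 95/5 class balance is made up) of accuracy hiding a useless classifier:

```python
# 95 negatives, 5 positives; a classifier that always predicts "negative".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 accuracy, yet recall on the positive class is 0/5 = 0.0
```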

Card 23

What does the ROC curve represent?

A curve showing the trade-off between the true positive rate and the false positive rate at various classification thresholds.

Card 24

What is AUC and how is it interpreted?

AUC (Area Under the Curve) quantifies the overall ability of the classifier to discriminate between classes; 1.0 = perfect, 0.5 = random (see the sketch below).
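An illustrative sketch with scikit-learn's roc_curve and roc_auc_score (the toy scores are an assumption):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]            # true labels
y_score = [0.1, 0.4, 0.35, 0.8]  # hypothetical classifier scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
print(roc_auc_score(y_true, y_score))              # 0.75: better than random (0.5)
```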