Decision Trees

15 Terms

1

DT learning

Method for learning discrete-valued target functions in which the function to be learned is represented by a decision tree

Can have continuous or discrete features

2

Continuous features

  • Check if the feature is greater than or less than some threshold

  • The decision boundary is made up of axis-aligned planes
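As a minimal sketch (the feature index and threshold below are made-up illustrations, not values from the cards), a single internal node's test on a continuous feature looks like:

```python
def route(x, feature_index, threshold):
    """Send an example down the left or right branch of one internal node."""
    return "left" if x[feature_index] <= threshold else "right"

# The implied decision boundary is the axis-aligned plane x[0] = 2.5
print(route([1.7, 4.0], feature_index=0, threshold=2.5))  # left
print(route([3.2, 4.0], feature_index=0, threshold=2.5))  # right
```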

3

Internal nodes

test a feature

4

Branching

Determined by the feature value

5

Leaf nodes

outputs (predictions)

6

Classification Tree

Discrete output

7

Goal with decision trees

We expect good generalization if we can find a small decision tree that explains the data well

8

Choosing a good split

We can compute the entropy of our training samples to choose the splits when generating the tree.
Entropy formula (over the class probabilities p(y)):

H(Y) = -Σ p(y) log2 p(y)
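As a minimal sketch of this formula, computing entropy in bits from per-class counts (the counts 5 and 2 are the same ones used in the worked split example further down):

```python
import math

def entropy(counts):
    """Entropy in bits of a label distribution given per-class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            continue  # 0 * log2(0) is treated as 0
        p = c / total
        h -= p * math.log2(p)
    return h

# 5 examples of one class and 2 of the other, as in the split example below
print(entropy([5, 2]))  # ~0.863 bits
```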
9

Entropy Rule of Thumb

High Entropy

  • Uniform-like distribution over many outcomes

  • Flat histogram

  • Values sampled from it are less predictable

Low Entropy

  • Distribution is concentrated on only a few outcomes

  • Histogram is concentrated in a few areas

  • Values sampled from it are more predictable

If all examples are negative, entropy = 0

If all examples are positive, entropy = 0

If 50/50 positive and negative, entropy = 1
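A quick check of these rules of thumb, using the same entropy-from-counts idea (the counts are illustrative):

```python
import math

def entropy(counts):
    """Entropy in bits, with 0 * log2(0) treated as 0."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(entropy([10, 0]))           # all negative (or all positive) -> 0.0
print(entropy([5, 5]))            # 50/50 positive and negative    -> 1.0
print(entropy([25, 25, 25, 25]))  # flat/uniform over 4 outcomes   -> 2.0 (high)
print(entropy([97, 1, 1, 1]))     # concentrated on one outcome    -> ~0.24 (low)
```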

10

Information Gain

  • The entropy of the output labels minus the entropy of the labels given the attribute, with each attribute value weighted by the proportion of examples that take that value:

    IG(Y, A) = H(Y) - Σ P(A = v) H(Y | A = v)   (sum over the values v of A)

  • The higher the IG for a particular attribute, the higher the precedence given to it when choosing a split
11

What is the Information Gain for this split? (Split A)

Steps:

  1. H(Y) = -(5/7)log2(5/7)-(2/7)log2(2/7)

  2. H(Orange|Left) = -(2/2)log2(2/2)-(0/2)log2(0/2)

  3. H(Orange|Right) = -(3/5)log2(3/5)-(2/5)log2(2/5)

  4. I.G = H(Y) - ((2/7)H(Orange|Left) + (5/7)H(Orange|Right))

    Explanation: H(Y) is the entropy of the labels before the split, H(Orange|Left) and H(Orange|Right) are the entropies of the left and right regions, and I.G. is H(Y) minus the sum of the region entropies, each weighted by the fraction of all examples that fall in that region. (By convention, 0·log2(0) is treated as 0, so H(Orange|Left) = 0.)
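A minimal sketch that carries out these four steps numerically (the 5-vs-2 class counts and the 2/5 left/right split sizes are taken from the steps above):

```python
import math

def entropy(counts):
    """Entropy in bits, with 0 * log2(0) treated as 0."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

h_y = entropy([5, 2])      # step 1: entropy of the labels before the split
h_left = entropy([2, 0])   # step 2: left region, 2 of 2 orange -> 0.0
h_right = entropy([3, 2])  # step 3: right region, 3 of 5 orange

# step 4: subtract the size-weighted region entropies from H(Y)
ig = h_y - ((2 / 7) * h_left + (5 / 7) * h_right)
print(round(h_y, 3), round(h_right, 3), round(ig, 3))  # 0.863 0.971 0.17
```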

12

Decision tree construction algorithm
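A minimal sketch of a standard greedy construction (ID3-style: at each node, split on the feature and threshold with the highest information gain, recurse, and stop when a node is pure or no split improves the entropy). The `best_split` helper and the list-of-rows data layout are assumptions for illustration, not the exact algorithm from the original card:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Assumed helper: return the (feature_index, threshold) pair with the
    highest information gain, or None if no split improves on H(labels)."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) / len(labels)) * entropy(left) \
                        - (len(right) / len(labels)) * entropy(right)
            if gain > best_gain:
                best, best_gain = (f, t), gain
    return best

def build_tree(rows, labels):
    """Leaf = majority label; internal node = (feature, threshold, left, right)."""
    split = best_split(rows, labels)
    if split is None:  # node is pure or no informative split exists: make a leaf
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

# Tiny example: one feature separates the classes at a threshold between 2 and 7
print(build_tree([[1.0], [2.0], [7.0], [8.0]], ["blue", "blue", "orange", "orange"]))
```

This recursion stops only when a node is pure or cannot be split further; in practice a depth limit or pruning step is added, for the overfitting reasons listed in the following cards.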

13

What makes a good tree?

  • Not too small: Need to handle subtle distinctions in data

  • Not too big

    • Computational efficiency

    • Avoid Overfitting

  • We desire small trees with informative nodes near the root

14

Problems

  • You have exponentially less data at the lower levels of the tree

  • The bigger the tree, the greater the risk of overfitting

  • The greedy algorithm is not guaranteed to find the optimal tree

15

What do we consider overfitting?