Flashcards about decision trees, entropy, information gain, overfitting, and pruning.
What are Decision Trees (DTs)?
Supervised classifiers; a popular machine-learning and data-mining (ML/DM) technique.
What is the DT representation?
Each non-leaf node tests the values of an attribute; each branch corresponds to an attribute value; each leaf node assigns a class.
How do you predict the class for a new example using a Decision Tree?
Start from the root and test the values of the attributes until you reach a leaf node; return the class of the leaf node.
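As a concrete illustration, here is a minimal sketch of that traversal, assuming a tree stored as nested dicts (the representation and node names are illustrative, not from the cards):

```python
# Minimal prediction sketch. Assumed representation: an internal node is a
# dict mapping one attribute name to {attribute value: subtree}; a leaf is
# a plain class label. (Illustrative only.)
def predict(tree, example):
    if not isinstance(tree, dict):
        return tree  # reached a leaf: return its class
    attribute, branches = next(iter(tree.items()))
    # Follow the branch matching the example's value for the tested attribute.
    return predict(branches[example[attribute]], example)

# Usage: a tiny hypothetical weather tree.
tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rain": "no"}}
print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes
```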
When constructing Decision Trees (ID3 algorithm), when do you stop?
When all examples have the same class; then make a leaf node labelled with this class (if no attributes remain, make a leaf with the majority class).
How can Decision Trees be expressed?
As a set of mutually exclusive rules: each path from the root to a leaf gives one rule, a conjunction of attribute tests, and the rules are connected by disjunction.
What is the main idea behind finding the best attribute for a decision tree?
Recursively choose the best attribute as the root of the sub-tree.
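A sketch of that recursion under the same nested-dict representation; `best_attribute` is a hypothetical helper that returns the attribute with the highest information gain (a gain sketch appears after the information-gain card below):

```python
from collections import Counter

def id3(examples, attributes, target="class"):
    """Simplified ID3 sketch: examples are dicts, attributes a list of names."""
    classes = [e[target] for e in examples]
    if len(set(classes)) == 1:
        return classes[0]  # stop: all examples share one class -> leaf
    if not attributes:
        return Counter(classes).most_common(1)[0][0]  # leaf: majority class
    # Recursively choose the best attribute as the root of the sub-tree.
    a = best_attribute(examples, attributes, target)  # hypothetical helper
    branches = {}
    for v in set(e[a] for e in examples):
        subset = [e for e in examples if e[a] == v]
        branches[v] = id3(subset, [x for x in attributes if x != a], target)
    return {a: branches}
```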
What does Entropy measure?
A measure of the homogeneity (purity) of a set of examples with respect to their class.
How does the amount of entropy relate to the purity of a set?
The smaller the entropy, the greater the purity of the set.
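For reference, the standard definition, for a set S whose examples have class proportions p₁, …, p_k:

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
```

H(S) = 0 for a pure set (some p_i = 1) and reaches its maximum, log₂ k, when all classes are equally frequent.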
How does entropy apply to information theory?
A Sender transmits information to a Receiver about the outcome of an event; entropy is the expected number of bits needed to encode that outcome.
How do you calculate the entropy for a coin toss with equal probability?
H(coin toss) = I(1/2, 1/2) = -(1/2)log₂(1/2) - (1/2)log₂(1/2) = 1 bit.
Describe high entropy.
The values of X are all over the place; the histogram is flat.
Describe low entropy.
The values of X are more predictable; histogram has many lows and one or two highs.
What does Information Gain measure?
The reduction in entropy achieved by using an attribute to partition the set of training examples.
What is the best attribute?
The best attribute is the one with the highest information gain.
What is the information gain of an attribute A?
The reduction in entropy caused by the partitioning of the set of examples S using A.
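A self-contained sketch of the computation, under the same dict-based example representation as above (helper names are my own):

```python
from collections import Counter
from math import log2

def entropy(examples, target="class"):
    """H(S) = -sum_i p_i * log2(p_i), over the class proportions of S."""
    counts = Counter(e[target] for e in examples)
    return -sum(c / len(examples) * log2(c / len(examples))
                for c in counts.values())

def gain(examples, attribute, target="class"):
    """Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e for e in examples if e[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset, target)
    return entropy(examples, target) - remainder
```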
What happens if you use ID code as one of the attributes?
It splits the training data into 1-example subsets; each subset is perfectly pure, so the information gain is maximal, yet the resulting tree cannot generalize to new examples.
What is Gain Ratio?
A modification of the Gain that reduces its bias towards highly branching attributes.
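The usual form of this correction (as in C4.5): divide the gain by the split information, a term that grows with the number and evenness of the branches:

```latex
\mathrm{SplitInfo}(S,A) = -\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|},
\qquad
\mathrm{GainRatio}(S,A) = \frac{\mathrm{Gain}(S,A)}{\mathrm{SplitInfo}(S,A)}
```

An ID-code attribute splitting n examples into n singleton subsets has SplitInfo = log₂ n, so its inflated gain is heavily penalized.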
How do you decide when to stop pruning?
Separate the available data into a training set, a validation set, and test data; stop pruning when further pruning reduces accuracy on the validation set.
How do you deal with Numeric Attributes?
Convert numerical attributes into nominal ones through a discretization procedure.
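One common discretization sketch for a binary split: sort the numeric values, evaluate a candidate threshold at each midpoint between adjacent distinct values, and keep the threshold with the highest information gain (illustrative code, reusing the entropy idea above):

```python
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def best_threshold(values, labels):
    """Return (gain, threshold) for the best binary split 'x <= t' / 'x > t'."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values: no midpoint to test
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        g = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if g > best[0]:
            best = (g, t)
    return best

# Usage with made-up temperature readings and classes:
print(best_threshold([64, 65, 68, 69, 70, 71], ["yes", "no", "yes", "yes", "yes", "no"]))
```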
How do you handle attributes with different costs?
Avoid overfitting while learning a DT that prefers low-cost attributes, e.g. by modifying the attribute-selection measure to penalize costly attributes.
What are some additional topics related to Decision Trees?
Missing attribute values, different costs, highly-branching attributes, and numeric attributes.
What is Overfitting?
The error on training data is very small, but the error on test data is high.
What are two main approaches to avoid overfitting in Decision Trees?
Stop growing the tree earlier or fully grow the tree and then prune it.
Why use a validation set?
The training set may contain random errors (noise) and coincidental regularities that the tree can fit; the validation set gives an independent estimate of accuracy.
Describe Tree Post-Pruning by Sub-Tree Replacement.
Start from the leaves and work towards the root; replace each candidate node with a leaf having the majority class, keeping the replacement only if it does not reduce accuracy on the validation set.
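A rough sketch of one replacement step, reusing the nested-dict tree and `predict` helper from the earlier sketches; because `branches` is part of the full tree, reassigning `branches[key]` tentatively splices the leaf in:

```python
from collections import Counter

def accuracy(tree, examples, target="class"):
    # Uses the `predict` sketch from earlier.
    return sum(predict(tree, e) == e[target] for e in examples) / len(examples)

def replace_if_helpful(full_tree, branches, key, reaching, validation, target="class"):
    """Try replacing the sub-tree at branches[key] with a leaf labelled by the
    majority class of the training examples reaching it (`reaching`); keep the
    replacement only if validation accuracy does not decrease."""
    subtree = branches[key]
    majority = Counter(e[target] for e in reaching).most_common(1)[0][0]
    before = accuracy(full_tree, validation, target)
    branches[key] = majority                      # tentative replacement
    if accuracy(full_tree, validation, target) < before:
        branches[key] = subtree                   # it hurt accuracy: revert
```

Applied bottom-up, from the leaves towards the root, until no replacement helps.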
Describe Rule Post-Pruning.
Convert the tree into an equivalent set of rules, then prune each rule by removing any precondition whose removal does not worsen the rule's estimated accuracy.
Back to DTs, what does entropy measure?
It measures the disorder of a set of training examples with respect to their class Y.