Flashcards about decision trees, entropy, information gain, overfitting, and pruning.
What are Decision Trees (DTs)?
Supervised classifiers; a popular machine-learning and data-mining (ML/DM) technique.
What is the DT representation?
Each non-leaf node tests the values of an attribute; each branch corresponds to an attribute value; each leaf node assigns a class.
How do you predict the class for a new example using a Decision Tree?
Start from the root and test the values of the attributes until you reach a leaf node; return the class of the leaf node.
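As a concrete illustration, here is a minimal sketch of that traversal, assuming a tree stored as nested dicts (the representation and node names are illustrative, not from the cards):

```python
# Minimal prediction sketch. Assumed representation: an internal node is a
# dict mapping one attribute name to {attribute value: subtree}; a leaf is
# a plain class label. (Illustrative only.)
def predict(tree, example):
    if not isinstance(tree, dict):
        return tree  # reached a leaf: return its class
    attribute, branches = next(iter(tree.items()))
    # Follow the branch matching the example's value for the tested attribute.
    return predict(branches[example[attribute]], example)

# Usage: a tiny hypothetical weather tree.
tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rain": "no"}}
print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes
```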
When constructing Decision Trees (ID3 algorithm), when do you stop?
When all examples have the same class; then make a leaf node labelled with this class (if no attributes remain, make a leaf with the majority class).
How can Decision Trees be expressed?
As a set of mutually exclusive rules: each path from the root to a leaf gives one rule, a conjunction of attribute tests, and the rules are connected by disjunction.
What is the main idea behind finding the best attribute for a decision tree?
Recursively choose the best attribute as the root of the sub-tree.
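A sketch of that recursion under the same nested-dict representation; `best_attribute` is a hypothetical helper that returns the attribute with the highest information gain (a gain sketch appears after the information-gain card below):

```python
from collections import Counter

def id3(examples, attributes, target="class"):
    """Simplified ID3 sketch: examples are dicts, attributes a list of names."""
    classes = [e[target] for e in examples]
    if len(set(classes)) == 1:
        return classes[0]  # stop: all examples share one class -> leaf
    if not attributes:
        return Counter(classes).most_common(1)[0][0]  # leaf: majority class
    # Recursively choose the best attribute as the root of the sub-tree.
    a = best_attribute(examples, attributes, target)  # hypothetical helper
    branches = {}
    for v in set(e[a] for e in examples):
        subset = [e for e in examples if e[a] == v]
        branches[v] = id3(subset, [x for x in attributes if x != a], target)
    return {a: branches}
```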
What does Entropy measure?
A measure of the homogeneity (purity) of a set of examples with respect to their class.
How does the amount of entropy relate to the purity of a set?
The smaller the entropy, the greater the purity of the set.
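For reference, the standard definition, for a set S whose examples have class proportions p₁, …, p_k:

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
```

H(S) = 0 for a pure set (some p_i = 1) and reaches its maximum, log₂ k, when all classes are equally frequent.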
How does entropy apply to information theory?
A Sender transmits information to a Receiver about the outcome of an event; entropy is the expected number of bits needed to encode that outcome.
How do you calculate the entropy for a coin toss with equal probability?
H(coin toss) = I(1/2, 1/2) = -(1/2)log₂(1/2) - (1/2)log₂(1/2) = 1 bit.
Describe high entropy.
The values of X are all over the place; the histogram is flat.
Describe low entropy.
The values of X are more predictable; histogram has many lows and one or two highs.
What does Information Gain measure?
The reduction in entropy achieved by using an attribute to partition the set of training examples.
What is the best attribute?
The best attribute is the one with the highest information gain.
What is the information gain of an attribute A?
The reduction in entropy caused by the partitioning of the set of examples S using A.
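A self-contained sketch of the computation, under the same dict-based example representation as above (helper names are my own):

```python
from collections import Counter
from math import log2

def entropy(examples, target="class"):
    """H(S) = -sum_i p_i * log2(p_i), over the class proportions of S."""
    counts = Counter(e[target] for e in examples)
    return -sum(c / len(examples) * log2(c / len(examples))
                for c in counts.values())

def gain(examples, attribute, target="class"):
    """Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e for e in examples if e[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset, target)
    return entropy(examples, target) - remainder
```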
What happens if you use ID code as one of the attributes?
It splits the training data into 1-example subsets; each subset is perfectly pure, so the information gain is maximal, yet the resulting tree cannot generalize to new examples.
What is Gain Ratio?
A modification of the Gain that reduces its bias towards highly branching attributes.
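The usual form of this correction (as in C4.5): divide the gain by the split information, a term that grows with the number and evenness of the branches:

```latex
\mathrm{SplitInfo}(S,A) = -\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|},
\qquad
\mathrm{GainRatio}(S,A) = \frac{\mathrm{Gain}(S,A)}{\mathrm{SplitInfo}(S,A)}
```

An ID-code attribute splitting n examples into n singleton subsets has SplitInfo = log₂ n, so its inflated gain is heavily penalized.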
How do you decide when to stop pruning?
Separate the available data into a training set, a validation set, and test data; stop pruning when further pruning reduces accuracy on the validation set.
How do you deal with Numeric Attributes?
Convert numerical attributes into nominal ones through a discretization procedure.
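One common discretization sketch for a binary split: sort the numeric values, evaluate a candidate threshold at each midpoint between adjacent distinct values, and keep the threshold with the highest information gain (illustrative code, reusing the entropy idea above):

```python
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def best_threshold(values, labels):
    """Return (gain, threshold) for the best binary split 'x <= t' / 'x > t'."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values: no midpoint to test
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        g = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if g > best[0]:
            best = (g, t)
    return best

# Usage with made-up temperature readings and classes:
print(best_threshold([64, 65, 68, 69, 70, 71], ["yes", "no", "yes", "yes", "yes", "no"]))
```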
How do you handle attributes with different costs?
Avoid overfitting while learning a DT that prefers low-cost attributes, e.g. by modifying the attribute-selection measure to penalize costly attributes.
What are some additional topics related to Decision Trees?
Missing attribute values, different costs, highly-branching attributes, and numeric attributes.
What is Overfitting?
The error on training data is very small, but the error on test data is high.
What are two main approaches to avoid overfitting in Decision Trees?
Stop growing the tree earlier or fully grow the tree and then prune it.
Why use a validation set?
The training set may contain random errors (noise) and coincidental regularities that the tree can fit; the validation set gives an independent estimate of accuracy.
Describe Tree Post-Pruning by Sub-Tree Replacement.
Start from the leaves and work towards the root; replace each candidate node with a leaf having the majority class, keeping the replacement only if it does not reduce accuracy on the validation set.
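A rough sketch of one replacement step, reusing the nested-dict tree and `predict` helper from the earlier sketches; because `branches` is part of the full tree, reassigning `branches[key]` tentatively splices the leaf in:

```python
from collections import Counter

def accuracy(tree, examples, target="class"):
    # Uses the `predict` sketch from earlier.
    return sum(predict(tree, e) == e[target] for e in examples) / len(examples)

def replace_if_helpful(full_tree, branches, key, reaching, validation, target="class"):
    """Try replacing the sub-tree at branches[key] with a leaf labelled by the
    majority class of the training examples reaching it (`reaching`); keep the
    replacement only if validation accuracy does not decrease."""
    subtree = branches[key]
    majority = Counter(e[target] for e in reaching).most_common(1)[0][0]
    before = accuracy(full_tree, validation, target)
    branches[key] = majority                      # tentative replacement
    if accuracy(full_tree, validation, target) < before:
        branches[key] = subtree                   # it hurt accuracy: revert
```

Applied bottom-up, from the leaves towards the root, until no replacement helps.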
Describe Rule Post-Pruning.
Convert the tree into an equivalent set of rules, then prune each rule by removing any precondition whose removal does not worsen the rule's estimated accuracy.
Back to DTs, what does entropy measure?
It measures the disorder of a set of training examples with respect to their class Y.