Flashcards based on Data Mining Lecture 5, focusing on classification methods like ID3, KNN, and ensemble learning.
What are the two main styles of machine learning?
Predictive/Supervised and Descriptive/Unsupervised
What is the goal of classification techniques in machine learning?
Predicting a discrete attribute/class.
What is the OneR algorithm?
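A simple rule-based classifier in which one attribute does all the work: it builds one rule per attribute and keeps the attribute whose rule makes the fewest errors on the training data.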
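As a rough illustration of the OneR idea, here is a minimal sketch for nominal attributes. The function name `one_r`, the row-as-dict layout, and the toy weather-style data are assumptions made for this example, not something taken from the lecture.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (best_attribute, rule, errors) for a list of dict rows."""
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        # Count the class labels seen with each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors made by this one-attribute rule.
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
attr, rule, errors = one_r(rows, "play")
```

On this toy data the rule built on `outlook` makes one training error and the rule built on `windy` makes two, so `outlook` is chosen.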
What is the strategy for constructing decision trees?
Top-down, in a recursive divide-and-conquer fashion. First select an attribute for the root node and create a branch for each of its values, then split the instances into subsets (one per branch), and repeat recursively for each branch.
What criterion is used for attribute selection in decision trees?
Choose the attribute that produces the “purest” nodes, that is, the one with the greatest information gain.
How is the information value of a node measured?
Using entropy.
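For reference, the entropy of a node with class proportions p_1, ..., p_n is -(p_1 log2 p_1 + ... + p_n log2 p_n), and information gain is the parent's entropy minus the size-weighted entropy of the child nodes. A minimal sketch follows; the function names `entropy` and `information_gain` are chosen for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, subsets):
    """Entropy of the parent node minus the weighted entropy of its subsets."""
    n = len(labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - remainder

# A node with 9 "yes" and 5 "no" instances has entropy of about 0.940 bits.
mixed = entropy(["yes"] * 9 + ["no"] * 5)

# A split that separates the classes perfectly gains the full parent entropy.
gain = information_gain(
    ["yes", "yes", "no", "no"],
    [["yes", "yes"], ["no", "no"]],
)
```

A pure node has entropy 0 and an evenly mixed two-class node has entropy 1 bit, which is why lower entropy after a split means purer nodes.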
What is ID3?
An algorithm for top-down induction of decision trees, developed by Ross Quinlan.
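The recursive construction described in the cards above can be sketched roughly as follows. This is a simplified ID3-style sketch for nominal attributes only, with no pruning or missing-value handling; the name `build_tree`, the nested-dict tree representation, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Stop when the node is pure or no attributes remain: return the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def remainder(attr):
        # Weighted entropy of the subsets produced by splitting on attr.
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    # The lowest remaining entropy corresponds to the highest information gain.
    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    # One branch per attribute value, built recursively on the matching subset.
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = build_tree(subset, rest, target)
    return tree

# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
```

Here `outlook` becomes the root because it leaves two pure branches, and only the `rainy` branch needs a further split on `windy`.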
What is Random Forest?
An ensemble machine learning algorithm that uses bagging and feature randomness to create an uncorrelated forest of decision trees.
What is ensemble learning?
A technique in which a group of base ML models work collectively to achieve better predictive performance than a single model.
What are two main types of ensemble learning methods?
Bagging and boosting.
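To make bagging concrete, here is a minimal sketch: each base model is trained on a bootstrap sample (drawn with replacement) and the ensemble predicts by majority vote. It covers bagging only, not boosting or the feature randomness used by Random Forest. The base learner (a 1-nearest-neighbor lookup), the function names, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter

def nearest_label(sample, query):
    # Hypothetical base learner: 1-nearest-neighbor on squared Euclidean distance.
    closest = min(sample, key=lambda p: sum((x - y) ** 2 for x, y in zip(p[0], query)))
    return closest[1]

def bagging_predict(train, query, n_models=25, seed=0):
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) instances with replacement.
        sample = rng.choices(train, k=len(train))
        votes[nearest_label(sample, query)] += 1
    # The ensemble's answer is the class with the most votes.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
prediction = bagging_predict(train, (1.0, 1.1))
```

An individual bootstrap sample can miss a class entirely, so a single base model may be wrong; the vote across many samples is what makes the combined prediction stable.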
What is Instance-Based Learning?
The training instances are searched for the k instances that most closely resemble a new instance, which is then classified by a majority vote among them.
What is another name for Instance-Based Learning?
Rote learning, lazy learning, or k-nearest-neighbor (k-NN, kNN, KNN).
What does the distance function define in instance-based learning?
What is learned.
What distance metric is commonly used with numeric attributes in Instance-Based Learning?
Euclidean distance.
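The nearest-neighbor vote and the Euclidean distance from the cards above can be sketched together as follows. The function names, the default `k=3`, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """Classify query by majority vote among its k nearest training instances.

    train is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
label = knn_classify(train, (1.1, 1.0))
```

Nothing is fitted ahead of time: all the work happens when a query arrives, which is why the method is also called lazy learning.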
What are some ways to deal with kNN's assumption that all features are equally important?
Feature selection or attribute weights.
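One way to express both remedies is a weighted Euclidean distance, where a weight of 0 drops a feature entirely (feature selection) and larger weights make a feature count for more. This is a minimal sketch; the function name and the example weights are assumptions, and choosing good weights is a separate problem not covered here.

```python
import math

def weighted_euclidean(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Equal weights reproduce the ordinary Euclidean distance.
plain = weighted_euclidean((0, 0), (3, 4), (1, 1))

# A weight of 0 removes the second feature from the comparison.
first_only = weighted_euclidean((0, 0), (3, 4), (1, 0))
```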