Data Mining Lecture 5: Classification Basics

Description and Tags

Flashcards based on Data Mining Lecture 5, focusing on classification methods like ID3, KNN, and ensemble learning.

15 Terms

1

What are the two main styles of machine learning?

Predictive/Supervised and Descriptive/Unsupervised

2

What is the goal of classification techniques in machine learning?

To predict a discrete (categorical) class attribute for each instance.

3

What is the OneR algorithm?

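A simple classifier ("One Rule") in which one attribute does all the work: for each attribute, build one rule per value that predicts the majority class, then keep the attribute whose rules make the fewest errors.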
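
A minimal sketch of how OneR could be implemented, assuming instances are Python dicts. The function name `one_r` and the toy weather rows are hypothetical, invented for illustration rather than taken from the lecture.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (best_attribute, rules, errors) for a list of dict rows."""
    best = None
    for attr in attributes:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per value: predict the most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the instances that disagree with the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical toy data, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
attr, rules, errors = one_r(rows, ["outlook", "windy"], "play")
# attr is "outlook" with 1 error; "windy" would make 2 errors.
```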

4

What is the strategy for constructing decision trees?

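Top-down, in a recursive divide-and-conquer fashion. First select an attribute for the root node and create a branch for each of its values, then split the instances into subsets (one per branch), and repeat recursively for each branch until all instances in a subset have the same class.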
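
The recursive steps can be sketched roughly as below. This is an illustration under assumptions, not the lecture's code: `build_tree` and its `choose` parameter are hypothetical names, the default `choose` simply picks the first remaining attribute, and the rows are invented toy data.

```python
from collections import Counter

def build_tree(rows, attributes, target,
               choose=lambda rows, attrs, target: attrs[0]):
    classes = [row[target] for row in rows]
    # Stop when the subset is pure or no attributes remain: return a leaf label.
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]
    attr = choose(rows, attributes, target)      # select attribute for this node
    remaining = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in sorted({row[attr] for row in rows}):          # one branch per value
        subset = [row for row in rows if row[attr] == value]   # split the instances
        tree[attr][value] = build_tree(subset, remaining, target, choose)  # recurse
    return tree

# Hypothetical toy data, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
```

A real learner would pass a `choose` function that ranks attributes by a purity measure instead of taking the first one.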

5

What criterion is used for attribute selection in decision trees?

Choose the attribute that produces the “purest” child nodes, i.e. the one with the largest information gain.
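
One way to make "purest" concrete is to compute the information gain of each candidate attribute. The sketch below assumes dict-based instances; `entropy`, `information_gain`, and the toy rows are illustrative names and data, not part of the lecture.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    after = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row[target] for row in rows if row[attr] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

# Hypothetical toy data, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
```

On this toy data "outlook" has the higher gain, so it would be chosen for the root.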

6

How is the information value of a node measured?

Using entropy.
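
For reference, the usual definition of entropy for a node whose classes occur with proportions p_1, ..., p_n is given below. The node with 2 "yes" and 3 "no" instances is a made-up worked example, not one from the lecture.

```latex
\mathrm{entropy}(p_1, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i
\qquad\text{e.g.}\qquad
\mathrm{entropy}\left(\tfrac{2}{5}, \tfrac{3}{5}\right)
  = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5}
  \approx 0.971 \text{ bits}
```

A pure node (all instances in one class) has entropy 0, and a two-class node split evenly has entropy 1 bit.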

7

What is ID3?

An algorithm for the top-down induction of decision trees, developed by Ross Quinlan.

8

What is Random Forest?

An ensemble machine learning algorithm that uses bagging and feature randomness to create an uncorrelated forest of decision trees.
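
A rough sketch of the two ingredients named above, bagging and feature randomness, combined by majority vote. Everything here is an assumption made for illustration: the function names, the use of Gini impurity to pick splits, the default parameters, and the toy rows. It is not how any particular library implements Random Forest.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def build_tree(rows, attributes, target, n_features, rng):
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return majority(labels)  # leaf
    # Feature randomness: only a random subset of attributes is considered here.
    candidates = rng.sample(attributes, min(n_features, len(attributes)))

    def weighted_impurity(attr):
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row[target])
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())

    attr = min(candidates, key=weighted_impurity)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        branches[value] = build_tree(subset, remaining, target, n_features, rng)
    return {"attr": attr, "branches": branches, "default": majority(labels)}

def predict_tree(tree, row):
    # Walk down until a leaf label is reached; unseen values fall back to the
    # majority class stored at that node.
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

def random_forest(rows, attributes, target, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree is trained on a bootstrap sample (with replacement).
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, attributes, target, n_features, rng))
    return forest

def predict_forest(forest, row):
    # The ensemble prediction is a majority vote over the individual trees.
    return majority([predict_tree(tree, row) for tree in forest])

# Hypothetical toy data, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
forest = random_forest(rows, ["outlook", "windy"], "play")
prediction = predict_forest(forest, {"outlook": "overcast", "windy": "no"})
```

Because each tree sees different rows and different candidate attributes, the trees make different mistakes, which is what the vote relies on.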

9

What is ensemble learning?

A technique in which a group of base ML models works collectively to achieve better predictive performance than any single model.

10

What are two main types of ensemble learning methods?

Bagging and boosting.

11

What is Instance-Based Learning?

The training instances are searched for the k instances that most closely resemble a new instance, and the new instance is then classified by a majority vote among them.
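
A minimal sketch of this search-and-vote procedure, assuming numeric feature vectors and Euclidean distance (via `math.dist`, available from Python 3.8). The name `knn_classify` and the training points are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """train is a list of (feature_vector, label) pairs."""
    # No model is built in advance: all the work happens at query time.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training points, invented for illustration.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
label_near_a = knn_classify(train, (1.1, 1.0))  # "a"
label_near_b = knn_classify(train, (5.0, 4.8))  # "b"
```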

12

What are other names for Instance-Based Learning?

Rote learning, lazy learning, or k-nearest-neighbor (k-NN, kNN, KNN).

13

What does the distance function define in instance-based learning?

What is learned, since it determines which training instances count as most similar to a new instance.

14

What distance metric is commonly used with numeric attributes in Instance-Based Learning?

Euclidean distance.
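
As a small illustration, Euclidean distance between two numeric instances could be computed as below; the function name `euclidean` and the example points are assumptions for this sketch.

```python
from math import sqrt

def euclidean(a, b):
    """Square root of the summed squared differences, attribute by attribute."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean((0.0, 0.0), (3.0, 4.0))  # 5.0
```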

15

What are some ways to deal with kNN's assumption that all features are equally important?

Feature selection or attribute weighting.
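
Both ideas can be sketched with one weighted distance function: a weight of 0 drops an attribute (feature selection), while other weights scale its influence. The name `weighted_euclidean` and the weight values are hypothetical, and how the weights would be chosen is left open here.

```python
from math import sqrt

def weighted_euclidean(a, b, weights):
    """Euclidean distance where each squared difference is scaled by a weight."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

a, b = (0.0, 0.0), (3.0, 4.0)
equal = weighted_euclidean(a, b, (1.0, 1.0))    # 5.0, plain Euclidean distance
selected = weighted_euclidean(a, b, (1.0, 0.0))  # 3.0, second attribute ignored
```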