CAP 4770 - Lecture 7: Classification

Description and Tags

Flashcards covering key concepts, definitions, and algorithms related to classification tasks in data mining, including evaluation metrics, cross-validation, K-NN, and Naïve Bayes.

27 Terms

1

Classification Task

The task of mapping an input attribute set (x) into its discrete class label (y).

2

Class Label

A discrete attribute that a classification model aims to predict.

3

Training Set

A collection of records with known class labels used to find or build a classification model.

4

Test Set

A set of previously unseen records used to determine the accuracy of a classification model, validating its performance.

5

Confusion Matrix

A table used to evaluate the performance of a classification model by summarizing correct and incorrect predictions for each class.

6

True Positive (TP)

Instances correctly predicted as positive (Class=Yes).

7

False Negative (FN)

Instances incorrectly predicted as negative when they are actually positive (Class=Yes, but predicted Class=No).

8

False Positive (FP)

Instances incorrectly predicted as positive when they are actually negative (Class=No, but predicted Class=Yes).

9

True Negative (TN)

Instances correctly predicted as negative (Class=No).

10

Accuracy

The proportion of correct predictions out of the total predictions, calculated as (TP+TN) / (TP+TN+FP+FN).

11

Error Rate

The proportion of wrong predictions out of the total predictions, calculated as (FP+FN) / (TP+TN+FP+FN).

12

Precision

The proportion of true positive predictions among all positive predictions, calculated as TruePos / (TruePos + FalsePos).

13

Recall

The proportion of true positive predictions among all actual positive instances, calculated as TruePos / (TruePos + FalseNeg).
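
The four confusion-matrix counts above combine directly into these metrics. A minimal sketch in Python, using made-up counts for illustration:

```python
# Hypothetical confusion-matrix counts for a binary classifier
# (actual Yes: tp predicted Yes, fn predicted No; actual No: fp predicted Yes, tn predicted No)
tp, fn = 40, 10
fp, tn = 5, 45

total = tp + tn + fp + fn
accuracy   = (tp + tn) / total   # (TP+TN) / (TP+TN+FP+FN)
error_rate = (fp + fn) / total   # (FP+FN) / (TP+TN+FP+FN)
precision  = tp / (tp + fp)      # TP / (TP+FP)
recall     = tp / (tp + fn)      # TP / (TP+FN)

print(accuracy, error_rate, precision, recall)
```

Note that accuracy and error rate always sum to 1, while precision and recall trade off against each other as the decision threshold moves.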

14

Training Error (Re-substitution error)

The error rate of a model when evaluated on the training data used to build it.

15

Generalization Error

The error rate of a model when evaluated on unseen testing data, indicating its ability to generalize to new, unseen records.

16

Holdout Method

A technique where the original dataset is split into a training set and a test set (e.g., 2/3 for training, 1/3 for testing) to evaluate model performance.
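
The holdout split described above can be sketched in a few lines; `holdout_split` and its seed are hypothetical names chosen for this illustration:

```python
import random

def holdout_split(records, train_frac=2/3, seed=0):
    """Shuffle records and split them into a training set and a test set
    (e.g. 2/3 for training, 1/3 for testing)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Toy dataset of 9 record ids -> 6 for training, 3 held out for testing
train, test = holdout_split(list(range(9)))
```

Shuffling first matters: if the records are ordered by class, a plain slice would give the test set a very different class distribution from the training set.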

17

Cross Validation

A technique to evaluate model performance by partitioning data into k disjoint subsets, training on k-1 subsets, and testing on the remaining one, repeating k times.

18

k-fold Cross Validation

A specific type of cross-validation where data is partitioned into k disjoint subsets, and each subset is used as a test set once while the others form the training set.

19

Leave-one-out Cross Validation

A type of k-fold cross validation where k is equal to the number of data points, meaning each data point serves as the test set once.
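
The k-fold partitioning described in these cards can be sketched as follows (`k_fold_splits` is a hypothetical helper; round-robin fold assignment is one simple choice):

```python
def k_fold_splits(records, k):
    """Partition record indices into k disjoint folds; each fold serves as
    the test set once while the remaining k-1 folds form the training set."""
    n = len(records)
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

# Toy data: 6 records, k = 3 -> three train/test splits
data = ['a', 'b', 'c', 'd', 'e', 'f']
splits = list(k_fold_splits(data, 3))
```

Setting k equal to `len(records)` turns this into leave-one-out cross validation: every record is the test set exactly once.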

20

Instance-Based Classifiers

Classifiers that store the training records and use them directly to predict the class label of unseen cases, rather than building an explicit model.

21

K-Nearest Neighbor (K-NN)

An instance-based classification algorithm that classifies an unknown record based on the majority class of its 'k' closest training records.
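
A minimal K-NN sketch under the definition above, using Euclidean distance and majority vote (the training points and labels are invented for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training
    records. `train` is a list of (point, label) pairs."""
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D training set with two well-separated classes
train = [((0, 0), 'No'),  ((1, 0), 'No'),  ((0, 1), 'No'),
         ((5, 5), 'Yes'), ((6, 5), 'Yes'), ((5, 6), 'Yes')]
```

Because all training records must be stored and scanned at query time, each prediction costs O(n) distance computations, which is exactly why K-NN is called a lazy learner on the next card.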

22

Lazy Learner

A classification system (like K-NN) that does not build a model explicitly during training but delays generalization until a classification query is made, making classification relatively expensive.

23

Bayes Classifier

A probabilistic framework for solving classification problems based on conditional probability and Bayes' Theorem.

24

Conditional Probability

The probability of an event A occurring given that another event B has already occurred, denoted P(A|B).

25

Bayes Theorem

A statistical formula that calculates conditional probability: P(C|A) = (P(A|C) * P(C)) / P(A).
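
A quick numeric check of the formula, with made-up probabilities:

```python
# Made-up probabilities for illustration: C = "Class=Yes", A = observed attribute value
p_c = 0.2           # prior P(C)
p_a_given_c = 0.9   # likelihood P(A|C)
p_a = 0.3           # evidence P(A)

# Bayes' Theorem: P(C|A) = P(A|C) * P(C) / P(A)
p_c_given_a = p_a_given_c * p_c / p_a
```

Here the posterior P(C|A) = 0.9 × 0.2 / 0.3 = 0.6: observing A raises the probability of C from the prior 0.2 to 0.6.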

26

Naïve Bayes Classifier

A Bayesian classifier that assumes independence among attributes given the class, simplifying the calculation of P(A1, A2, …, An | C) as a product of individual conditional probabilities P(Ai | C).

27

m-estimate

A smoothing technique used in Naïve Bayes to prevent zero conditional probabilities when the count for an attribute value in a given class is zero; it augments the observed counts with a prior estimate weighted by an equivalent sample size m.
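
The m-estimate formula commonly given for this smoothing is P(Ai|C) = (n_c + m·p) / (n + m), where n is the number of training records in class C, n_c the number of those with attribute value Ai, p a prior estimate of the probability, and m the equivalent sample size. A sketch under that formula (the function name and toy numbers are illustrative):

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of P(attribute value | class): (n_c + m*p) / (n + m).
    n_c: count of the value within the class; n: class size;
    p: prior probability estimate; m: equivalent sample size weighting p."""
    return (n_c + m * p) / (n + m)

# Toy case: a value never observed with this class (n_c = 0) still gets a
# nonzero smoothed probability, so it cannot zero out the Naive Bayes product.
prob = m_estimate(n_c=0, n=10, p=0.5, m=2)
```

With m = 0 the estimate reduces to the raw relative frequency n_c / n, which is exactly what produces the zero-probability problem the m-estimate is meant to fix.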