Flashcards covering key concepts, definitions, and algorithms related to classification tasks in data mining, including evaluation metrics, cross-validation, K-NN, and Naïve Bayes.
Classification Task
The task of mapping an input attribute set (x) into its discrete class label (y).
Class Label
A discrete attribute that a classification model aims to predict.
Training Set
A collection of records with known class labels used to find or build a classification model.
Test Set
A set of previously unseen records used to determine the accuracy of a classification model, validating its performance.
Confusion Matrix
A table used to evaluate the performance of a classification model by summarizing correct and incorrect predictions for each class.
True Positive (TP)
Instances correctly predicted as positive (Class=Yes).
False Negative (FN)
Instances incorrectly predicted as negative when they are actually positive (Class=Yes, but predicted Class=No).
False Positive (FP)
Instances incorrectly predicted as positive when they are actually negative (Class=No, but predicted Class=Yes).
True Negative (TN)
Instances correctly predicted as negative (Class=No).
Accuracy
The proportion of correct predictions out of the total predictions, calculated as (TP+TN) / (TP+TN+FP+FN).
Error Rate
The proportion of wrong predictions out of the total predictions, calculated as (FP+FN) / (TP+TN+FP+FN).
Precision
The proportion of true positive predictions among all positive predictions, calculated as TP / (TP + FP).
Recall
The proportion of true positive predictions among all actual positive instances, calculated as TP / (TP + FN).
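The four confusion-matrix metrics above can be sketched with hypothetical counts (the numbers here are illustrative assumptions, not from the cards):

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, FN, FP, TN = 40, 10, 5, 45
total = TP + TN + FP + FN

accuracy = (TP + TN) / total   # correct predictions over all predictions
error_rate = (FP + FN) / total  # wrong predictions over all predictions
precision = TP / (TP + FP)      # how many predicted positives are real
recall = TP / (TP + FN)         # how many real positives were found

print(accuracy, error_rate, precision, recall)
```

Note that accuracy and error rate always sum to 1, while precision and recall trade off against each other as the decision threshold moves.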
Training Error (Re-substitution error)
The error rate of a model when evaluated on the training data used to build it.
Generalization Error
The error rate of a model when evaluated on unseen testing data, indicating its ability to generalize to new, unseen records.
Holdout Method
A technique where the original dataset is split into a training set and a test set (e.g., 2/3 for training, 1/3 for testing) to evaluate model performance.
Cross Validation
A technique to evaluate model performance by partitioning data into k disjoint subsets, training on k-1 subsets, and testing on the remaining one, repeating k times.
k-fold Cross Validation
A specific type of cross-validation where data is partitioned into k disjoint subsets, and each subset is used as a test set once while the others form the training set.
Leave-one-out Cross Validation
A type of k-fold cross validation where k is equal to the number of data points, meaning each data point serves as the test set once.
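The k-fold partitioning described above can be sketched over index lists (a minimal illustration, not a library implementation):

```python
def k_fold_splits(n, k):
    """Partition indices 0..n-1 into k disjoint folds;
    yield (train_indices, test_indices) for each of the k rounds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test

# Each index appears in exactly one test fold across the k rounds.
splits = list(k_fold_splits(10, 5))
```

Setting k equal to the number of records (here, `k_fold_splits(10, 10)`) gives leave-one-out cross-validation: every record is the test set exactly once.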
Instance-Based Classifiers
Classifiers that store the training records and use them directly to predict the class label of unseen cases, rather than building an explicit model.
K-Nearest Neighbor (K-NN)
An instance-based classification algorithm that classifies an unknown record based on the majority class of its 'k' closest training records.
Lazy Learner
A classification system (like K-NN) that does not build a model explicitly during training but delays generalization until a classification query is made, making classification relatively expensive.
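K-NN's lazy, majority-vote behavior can be sketched as follows (the 2-D toy points and labels are assumed for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of ((x1, x2), label) pairs.
    Return the majority class among the k records closest to query."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "No"), ((1, 0), "No"),
         ((5, 5), "Yes"), ((6, 5), "Yes"), ((5, 6), "Yes")]
```

No model is built up front: all the work (distance computation and voting) happens at query time, which is exactly what makes classification relatively expensive for a lazy learner.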
Bayes Classifier
A probabilistic framework for solving classification problems based on conditional probability and Bayes' Theorem.
Conditional Probability
The probability of an event A occurring given that another event B has already occurred, denoted P(A|B).
Bayes Theorem
A statistical formula that calculates conditional probability: P(C|A) = (P(A|C) * P(C)) / P(A).
Naïve Bayes Classifier
A Bayesian classifier that assumes independence among attributes given the class, simplifying the calculation of P(A1, A2, …, An | C) as a product of individual conditional probabilities P(Ai | C).
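The conditional-independence assumption lets the classifier score each class as P(C) times a product of per-attribute conditionals, as in this unsmoothed sketch over categorical attributes (the toy weather data is an assumption for illustration):

```python
from collections import Counter

def naive_bayes_predict(records, query):
    """records: list of (attrs_tuple, label).
    Return the class maximizing P(C) * prod_i P(Ai | C)."""
    class_counts = Counter(label for _, label in records)
    best, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / len(records)          # prior P(C)
        for i, value in enumerate(query):
            match = sum(1 for attrs, label in records
                        if label == c and attrs[i] == value)
            score *= match / n_c            # P(Ai | C), unsmoothed
        if score > best_score:
            best, best_score = c, score
    return best

data = [(("sunny", "hot"), "No"), (("sunny", "mild"), "No"),
        (("rain", "mild"), "Yes"), (("rain", "cool"), "Yes")]
```

Because the conditionals are unsmoothed, a single attribute value never seen with a class drives that class's whole product to zero, which motivates the m-estimate below.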
m-estimate
A smoothing technique used in Naïve Bayes to avoid zero conditional probability estimates P(Ai | C) when an attribute value never occurs with a given class in the training data.