DM UNIT 7: Classification & Evaluation

Data Mining

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

26 Terms

New cards

K-Nearest Neighbors (K-NN)

A lazy learning algorithm (no model is built in advance; the work happens at prediction time) that classifies a data point by the majority label among its k closest training examples.
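
A minimal K-NN sketch in plain Python, assuming numeric feature tuples and Euclidean distance. The function name `knn_predict` and the toy points are hypothetical. There is no training step: each prediction sorts the stored examples by distance to the query and takes a majority vote over the k nearest.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. As a lazy learner,
    nothing is precomputed: all work happens here, at prediction time.
    """
    # Sort stored examples by Euclidean distance to the query; keep the k closest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 5.5), "B")]
print(knn_predict(train, (1.2, 1.4), k=3))  # A
print(knn_predict(train, (6.2, 6.8), k=3))  # B
```

Ties in the vote go to whichever tied label appears first among the sorted neighbors, because `Counter.most_common` orders equal counts by first occurrence; an odd k avoids ties in two-class problems.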

Voronoi Diagram

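A partition of the space into regions, one per training example, where each region contains all points closer to that example than to any other. Its boundaries form the decision boundary of a 1-nearest-neighbor classifier.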

Euclidean Distance

The straight-line distance between two points, d(x, y) = √(Σᵢ (xᵢ − yᵢ)²); a commonly used proximity measure for computing the distance between data points in K-NN.
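
The Euclidean distance written out as a small Python function, for illustration (`euclidean` is a hypothetical name; the standard library's `math.dist` computes the same quantity).

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```

Because the squared differences are summed directly, attributes with large numeric ranges dominate the distance, which is why K-NN inputs are usually normalized first.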

Naïve Bayes Classifier

A probabilistic classifier that applies Bayes' theorem under the "naïve" assumption that features are conditionally independent given the class.
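
A minimal sketch of Naïve Bayes for categorical attributes, assuming add-one (Laplace) smoothing of the conditional probabilities. The function names and the weather-style toy data are hypothetical. The classifier picks the class y that maximizes P(y) · Πᵢ P(xᵢ | y), which is where the independence assumption enters.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute index -> distinct values observed
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen

def predict_nb(model, row):
    """Return the class maximizing P(y) * prod_i P(x_i | y), with Laplace smoothing."""
    class_counts, value_counts, values_seen = model
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for y, ny in class_counts.items():
        score = ny / n  # prior P(y)
        for i, v in enumerate(row):
            # Naive assumption: attributes are independent given the class,
            # so the joint likelihood is a product of per-attribute terms.
            # Adding 1 (Laplace smoothing) keeps unseen values from zeroing it out.
            score *= (value_counts[(i, y)][v] + 1) / (ny + len(values_seen[i]))
        if score > best_score:
            best, best_score = y, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))  # yes
print(predict_nb(model, ("sunny", "hot")))   # no
```

With many attributes the product of small probabilities can underflow, so practical implementations usually sum log-probabilities instead.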

Support Vector Machine (SVM)

A classifier that finds the separating hyperplane that maximizes the margin between two classes.
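
A sketch of how a learned linear SVM is used, assuming the weight vector `w` and bias `b` are already known (the values below are made up for illustration, not the result of training). With the usual scaling in which the support vectors satisfy |w·x + b| = 1, the margin width is 2/‖w‖, so maximizing the margin is equivalent to minimizing ‖w‖ subject to yᵢ(w·xᵢ + b) ≥ 1.

```python
import math

def decision(w, b, x):
    """Signed score w·x + b; its sign gives the predicted class (+1 or -1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the hyperplanes w·x + b = +1 and w·x + b = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 3 = 0 (chosen by hand, not learned).
w, b = (1.0, 1.0), -3.0
print(decision(w, b, (3, 2)) > 0)    # True  -> class +1
print(decision(w, b, (1, 0.5)) > 0)  # False -> class -1
print(round(margin_width(w), 4))     # 1.4142
```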

Kernel Trick

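A technique that lets an SVM operate in a higher-dimensional feature space, where the data may become linearly separable, by computing inner products in that space with a kernel function instead of explicitly transforming the data.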
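
A small numerical check of the kernel trick, using the degree-2 polynomial kernel K(x, z) = (x·z)² on 2-D inputs as an illustrative example. For this kernel the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²) is known, so the code can confirm that K(x, z) equals φ(x)·φ(z) even though K never builds the 3-D vectors.

```python
import math

def phi(x):
    """Explicit 3-D feature map matching the kernel (x·z)^2 for 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z))    # 1.0
print(dot(phi(x), phi(z)))  # same value up to floating-point rounding
```

The saving grows with dimension: for higher-degree or RBF kernels the implicit feature space is huge (infinite for RBF), yet the kernel still costs about one inner product in the input space.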

Rule-Based Classifier

A model that classifies data using a set of “If...Then...” rules.
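
A minimal sketch of applying a rule set, assuming an ordered rule list: the first rule whose condition ("If" part) holds fires, and a default class is returned when no rule covers the record (one common design). The rules, attribute names, and default class are hypothetical.

```python
# Each rule is (condition, class); a condition maps attribute -> required value.
# Hypothetical rules for illustration only.
RULES = [
    ({"gives_birth": "no", "can_fly": "yes"}, "bird"),
    ({"gives_birth": "yes", "blood": "warm"}, "mammal"),
    ({"lives_in_water": "yes"}, "fish"),
]
DEFAULT = "reptile"

def matches(condition, record):
    """True if the record satisfies every attribute test in the condition."""
    return all(record.get(attr) == val for attr, val in condition.items())

def classify(record, rules=RULES, default=DEFAULT):
    # Ordered rules: the first matching rule fires and its
    # conclusion ("Then" part) is the prediction.
    for condition, label in rules:
        if matches(condition, record):
            return label
    return default  # no rule covers the record

print(classify({"gives_birth": "yes", "blood": "warm", "can_fly": "no"}))  # mammal
print(classify({"gives_birth": "no", "blood": "cold", "can_fly": "no"}))   # reptile
```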

Coverage (Rule)

The fraction of all records in the dataset that satisfy the condition (antecedent) of a rule.

Accuracy (Rule)

Among the records that satisfy the condition of a rule, the fraction that also satisfy its conclusion (i.e., actually have the predicted class).
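
Both rule measures computed on a small dataset. For a rule A → y over dataset D, coverage = |A| / |D| and accuracy = |A ∩ y| / |A|, where A is the set of records satisfying the condition. The function name and the ten records are hypothetical; returning `None` for a rule that covers nothing is a design choice of this sketch.

```python
def coverage_and_accuracy(records, condition, conclusion, target="class"):
    """Coverage = |A| / |D|; accuracy = |A and y| / |A| for a rule A -> y."""
    covered = [r for r in records
               if all(r.get(a) == v for a, v in condition.items())]
    correct = [r for r in covered if r.get(target) == conclusion]
    coverage = len(covered) / len(records)
    # Accuracy is undefined when the rule covers no records.
    accuracy = len(correct) / len(covered) if covered else None
    return coverage, accuracy

records = ([{"status": "single", "class": "no"}] * 2
           + [{"status": "single", "class": "yes"}] * 2
           + [{"status": "married", "class": "no"}] * 4
           + [{"status": "divorced", "class": "yes"}] * 2)

# Rule: (status = single) -> class = no
print(coverage_and_accuracy(records, {"status": "single"}, "no"))  # (0.4, 0.5)
```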

Confusion Matrix

A table used to evaluate a classification model by cross-tabulating actual against predicted classes; for two classes its cells are the counts TP, FP, FN, and TN.
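
A sketch that tallies the four cells of a binary confusion matrix and derives accuracy, the True Positive Rate TP/(TP + FN), and the False Positive Rate FP/(FP + TN), the two rates plotted on a ROC curve. It assumes labels coded as 1 (positive) and 0 (negative); the function name and label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
tpr = tp / (tp + fn)  # true positive rate (recall)
fpr = fp / (fp + tn)  # false positive rate
print(tp, fp, fn, tn)      # 3 1 1 3
print(accuracy, tpr, fpr)  # 0.75 0.75 0.25
```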

ROC Curve

A graph of the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classifier's decision threshold varies, showing the trade-off between the two.

AUC (Area Under Curve)

The area under the ROC curve, summarizing the whole curve in a single number: 1 = perfect classifier, 0.5 = random guessing.
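
A minimal sketch of building a ROC curve and its AUC from classifier scores, assuming binary labels 1/0 with both classes present and higher scores meaning "more likely positive". The threshold is lowered through every distinct score, each step yields one (FPR, TPR) point, and the area is taken with the trapezoidal rule. Function names and data are hypothetical.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from lowering the threshold through every distinct score."""
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive whenever score >= t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points

def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
print(round(auc(pts), 4))  # 0.8889
```

The result, 8/9, matches the probability that a randomly chosen positive is scored above a randomly chosen negative (8 of the 9 positive/negative pairs here), which is a common way to interpret AUC.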

Bagging

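Bootstrap Aggregating: trains multiple classifiers, each on a bootstrap sample (drawn with replacement) of the training data, and aggregates their predictions by majority vote (or by averaging, for regression).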
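
A minimal bagging sketch, assuming a hypothetical base learner (a one-threshold "stump" on a single numeric feature) and an odd number of models so the majority vote cannot tie. Each model sees its own bootstrap sample, the same size as the data and drawn with replacement; the seeded `random.Random` only makes runs repeatable.

```python
import random
from collections import Counter

def train_stump(sample):
    """Hypothetical base learner: best single threshold on a 1-D feature."""
    best = None
    for t, _ in sample:
        for left, right in ((0, 1), (1, 0)):
            errors = sum(1 for x, y in sample if (left if x <= t else right) != y)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging(data, n_models=11, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))

    def predict(x):
        # Aggregate by majority vote over the ensemble.
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

data = [(x, 0 if x <= 5 else 1) for x in range(1, 11)]
predict = bagging(data)
print(predict(1), predict(10))
```

Random Forest follows the same pattern with decision trees as the base learner plus random feature selection.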

Boosting

An ensemble method that trains classifiers sequentially, increasing the weights of misclassified instances after each round so that later classifiers focus on them.

AdaBoost

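A boosting algorithm in which each weak learner receives an importance weight α computed from its weighted error ε, α = ½ ln((1 − ε)/ε), and the instance weights are updated each round; the final prediction is an α-weighted vote of the learners.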
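
One round of the AdaBoost weight update, as a sketch assuming labels in {−1, +1} and a weak learner with weighted error 0 < ε < 0.5 (better than chance). Each weight is multiplied by exp(−α·y·h), which scales misclassified instances up by e^α and correct ones down by e^−α, then renormalized. The function name and the four-instance example are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost update for labels in {-1, +1}; returns (alpha, new_weights).

    Assumes the weighted error eps satisfies 0 < eps < 0.5.
    """
    # Weighted error of this round's weak learner.
    eps = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h)
    # Learner importance: larger when the weighted error is small.
    alpha = 0.5 * math.log((1 - eps) / eps)
    # y*h = -1 (wrong) multiplies by e^alpha; y*h = +1 (right) by e^-alpha.
    new = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    z = sum(new)  # normalization so the weights sum to 1
    return alpha, [w / z for w in new]

alpha, w = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, 1, -1])
print(round(alpha, 4))            # 0.5493
print([round(x, 4) for x in w])   # [0.1667, 0.1667, 0.5, 0.1667]
```

With ε = 0.25, the single misclassified instance ends up carrying half of the total weight, so the next weak learner is pushed hard toward getting it right.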

Random Forest

A collection of decision trees, each trained on a bootstrap sample of the data using random subsets of the features (typically chosen at each split); combining their votes improves accuracy and reduces overfitting.

Gradient Boosting

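Builds models sequentially, fitting each new model to the negative gradient of the loss with respect to the current ensemble's predictions (simply the residuals, for squared-error loss); adding it amounts to a gradient-descent step that corrects the previous models' errors.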
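
A minimal gradient boosting sketch, assuming squared-error loss, a hypothetical 1-D regression stump as the base learner, and a learning rate (shrinkage) of 0.1. Under squared error the negative gradient is the residual y − F(x), so each round fits a stump to the current residuals and adds a scaled copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Hypothetical base learner: 1-D regression stump minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    base = sum(ys) / len(ys)  # initial model: the constant mean
    stumps = []

    def predict(x):
        return base + lr * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        # Negative gradient of squared error = residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
model = gradient_boost(xs, ys)
print(round(model(1), 2), round(model(8), 2))  # 1.01 4.99
```

Here each round removes 10% of the remaining residual, so the fit approaches the targets geometrically; a smaller learning rate needs more rounds but is usually less prone to overfitting.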

Apriori Algorithm

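A classic algorithm for mining frequent itemsets level by level with a generate-and-test approach, pruning candidates via the Apriori principle: every subset of a frequent itemset must also be frequent.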
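
A compact Apriori sketch in plain Python, assuming transactions are sets of item names and `min_support` is a fraction of all transactions. Each level generates k-itemset candidates from the frequent (k−1)-itemsets, prunes any candidate with an infrequent (k−1)-subset, then tests the survivors against the data. The function name and the market-basket transactions are hypothetical, and support is recounted by a full scan for simplicity.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Generate: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent k-itemset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Test: keep candidates that meet the support threshold.
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent

T = [{"bread", "milk"}, {"bread", "diaper", "beer", "eggs"},
     {"milk", "diaper", "beer", "cola"}, {"bread", "milk", "diaper", "beer"},
     {"bread", "milk", "diaper", "cola"}]
freq = apriori(T, min_support=0.6)
print(sorted(tuple(sorted(s)) for s in freq if len(s) == 2))
# [('beer', 'diaper'), ('bread', 'diaper'), ('bread', 'milk'), ('diaper', 'milk')]
```

The 3-itemset {bread, milk, diaper} survives pruning but appears in only 2 of 5 transactions, so it fails the test step and the search stops.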

FP-Growth Algorithm

A fast frequent-itemset mining algorithm that compresses the transactions into a compact FP-tree and mines it directly, avoiding Apriori-style candidate generation.

Support (Frequent Patterns)

Proportion of transactions that contain a particular itemset.

Expectation-Maximization (EM)

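An iterative method for estimating model parameters when data are missing or depend on latent variables. It alternates an E-step (compute the expected values of the latent variables under the current parameters) and an M-step (re-estimate the parameters to maximize the likelihood given those expectations).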
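
A minimal EM sketch, using a two-component 1-D Gaussian mixture as an illustrative model (the choice of model, the function names, and the data are assumptions). The latent variable is which component generated each point: the E-step computes each component's responsibility for each point, and the M-step re-estimates means, variances, and mixing weights from those soft assignments. For simplicity it runs a fixed number of iterations and has no guard against a variance collapsing to zero.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mu=(0.0, 1.0), var=(1.0, 1.0), pi=(0.5, 0.5), iters=50):
    mu, var, pi = list(mu), list(var), list(pi)
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component per point.
        resp = []
        for x in xs:
            p = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: weighted maximum-likelihood re-estimates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            pi[k] = nk / len(xs)
    return mu, var, pi

xs = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
mu, var, pi = em_gmm_1d(xs, mu=(0.0, 6.0))
print([round(m, 2) for m in mu])  # [1.0, 5.0]
```

EM only finds a local optimum, so results depend on the starting values; the initial means here were chosen to sit near the two groups.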

Smart Technology Ethics

Concerns around data collection (e.g., Siri recordings) and user consent when such data is used to improve AI systems.

Facial Recognition Ethics

Ethical issues regarding bias, surveillance, and privacy violations from facial data.

Social Media Algorithm Ethics

Issues of manipulation, emotional impact, and lack of transparency in content curation.

Replika Chatbot

An AI companion chatbot that raises concerns about emotional dependency, data use, and mental health in human-AI relationships.
