Data Mining
K-Nearest Neighbors (K-NN)
A lazy learning algorithm that classifies a data point based on the majority label of its k closest neighbors.
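Below is a minimal K-NN sketch. It assumes numeric feature tuples and Euclidean distance. The function name `knn_predict` and the toy points are illustrative only. Because the learner is lazy, all of the work happens at query time: sort the training points by distance, then take a majority vote over the first k labels.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs (hypothetical format).
    """
    # Lazy learning: no model is built in advance; sort by Euclidean distance now
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns [(label, count)]; ties go to the label counted first
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B")]
print(knn_predict(train, (1.2, 1.5), k=3))  # A
```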
Voronoi Diagram
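A partition of the feature space into regions, one per training example, where each region contains all points closer to that example than to any other. It describes the decision boundary of a 1-nearest-neighbor classifier.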
Euclidean Distance
A commonly used proximity metric to compute distance between data points in K-NN.
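Take two records x and y with n numeric attributes. The Euclidean distance between them, with a small worked case, is:

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad \text{e.g. } d\big((1, 2), (4, 6)\big) = \sqrt{3^2 + 4^2} = 5
```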
Naïve Bayes Classifier
A probabilistic classifier that applies Bayes' theorem under the assumption that features are conditionally independent given the class.
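Here is a small sketch for categorical features. The independence assumption shows up as the product of per-feature probabilities P(x_i | C). Laplace smoothing (`alpha`) is an added assumption that keeps unseen values from zeroing out the product. The helper names and the toy weather-style data are hypothetical.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Collect the counts needed for P(C) and P(x_i | C) from categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> Counter of values
    distinct = defaultdict(set)           # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct

def predict_nb(model, row, alpha=1.0):
    """Return the class maximizing P(C) * prod_i P(x_i | C) (alpha = Laplace smoothing)."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total                    # prior P(C)
        for i, value in enumerate(row):        # naive assumption: multiply per-feature terms
            count = value_counts[(i, label)][value]
            score *= (count + alpha) / (n_c + alpha * len(distinct[i]))
        scores[label] = score
    return max(scores, key=scores.get)

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))  # yes
```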
Support Vector Machine (SVM)
A classifier that finds the optimal hyperplane that maximizes the margin between two classes.
Kernel Trick
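A technique in SVMs that uses a kernel function to compute inner products in a higher-dimensional feature space directly from the original inputs. Data that is not linearly separable in the original space can then be separated there, without explicitly transforming it.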
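The check below illustrates the idea with one common choice: the homogeneous degree-2 polynomial kernel K(x, y) = (x · y)^2 on 2-D inputs. It gives the same value as an inner product after the explicit 3-D map φ(x) = (x1², √2·x1·x2, x2²), even though φ is never evaluated inside the kernel. The function names are illustrative.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # inner product computed in the 3-D feature space
implicit = poly_kernel(x, y)     # same quantity computed from the 2-D inputs only
print(explicit, implicit)
```

An SVM only ever needs these inner products, so it can train with `poly_kernel` and skip `phi` entirely.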
Rule-Based Classifier
A model that classifies data using a set of “If...Then...” rules.
Coverage (Rule)
The fraction of records in the dataset that satisfy the condition of a rule.
Accuracy (Rule)
Among the records that satisfy the condition of a rule, the fraction that also satisfy its conclusion (i.e., have the predicted class).
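The sketch below computes rule coverage and accuracy for a rule of the form condition → class. It assumes records are dicts with a `class` field. The records and the rule are made up for illustration. Note that the accuracy denominator is the set of covered records, not the whole dataset.

```python
def rule_stats(records, condition, predicted_class, label_key="class"):
    """Return (coverage, accuracy) of the rule `condition -> predicted_class`.

    coverage = |records satisfying condition| / |all records|
    accuracy = |covered records with predicted_class| / |covered records|
    """
    covered = [r for r in records if condition(r)]
    coverage = len(covered) / len(records)
    correct = sum(1 for r in covered if r[label_key] == predicted_class)
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy

records = [
    {"refund": "no", "status": "single", "class": "cheat"},
    {"refund": "no", "status": "single", "class": "no_cheat"},
    {"refund": "no", "status": "married", "class": "no_cheat"},
    {"refund": "yes", "status": "single", "class": "no_cheat"},
    {"refund": "yes", "status": "married", "class": "no_cheat"},
]
# Hypothetical rule: (refund = no) AND (status = single) -> cheat
cov, acc = rule_stats(records, lambda r: r["refund"] == "no" and r["status"] == "single", "cheat")
print(cov, acc)  # 0.4 0.5
```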
Confusion Matrix
A table used to evaluate a classification model by counting true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
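Here is a minimal way to fill in the four cells for a binary problem and derive common metrics from them. It assumes label `1` is the positive class. The function name and the data are illustrative.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # also the True Positive Rate plotted on ROC curves
print(tp, fp, fn, tn, accuracy)  # 3 1 1 3 0.75
```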
ROC Curve
A graph plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies, showing the trade-off between them.
AUC (Area Under Curve)
The area under the ROC curve, which summarizes the whole curve as a single number: 1 = perfect classifier, 0.5 = random guessing.
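The sketch below traces an ROC curve by lowering the threshold through each distinct score. It then measures the area with the trapezoidal rule. It assumes binary labels in {0, 1}, with both classes present, and real-valued scores where higher means "more positive". The names and data are illustrative.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever score >= threshold
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))  # about 0.889 (= 8/9)
```

For this data, 8 of the 9 positive/negative pairs are ranked correctly, which matches the area of 8/9.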
Bagging
Bootstrap Aggregating: trains multiple classifiers on bootstrap samples (random samples drawn with replacement) and aggregates their predictions, e.g., by majority vote.
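This is a bagging sketch with an assumed toy base learner (1-nearest neighbor on 1-D points), chosen only to keep the example short. Any learner with the same `fit(sample) -> predict` shape would work. The function names, the seed, and the data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) records uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def fit_1nn(sample):
    """Toy base learner (assumption): 1-nearest neighbor on 1-D (x, label) points."""
    def predict(x):
        return min(sample, key=lambda item: abs(item[0] - x))[1]
    return predict

def bagging_predict(train, query, fit=fit_1nn, n_models=25, seed=0):
    """Fit n_models base learners on bootstrap samples; return their majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        model = fit(bootstrap_sample(train, rng))
        votes[model(query)] += 1
    return votes.most_common(1)[0][0]

train = [(1.0, "A"), (1.2, "A"), (1.9, "A"), (3.1, "B"), (3.5, "B"), (4.0, "B")]
print(bagging_predict(train, 1.5))
```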
Boosting
An ensemble method that trains classifiers sequentially, giving more weight to instances misclassified in the previous round.
AdaBoost
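A boosting algorithm in which each weak learner receives a weight alpha computed from its weighted error rate (lower error, higher alpha). Instance weights are updated each round to emphasize misclassified records.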
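A single AdaBoost round can be sketched as follows. It assumes labels in {-1, +1} and the standard formulas α = ½ ln((1 − ε) / ε) and w_i ← w_i · exp(−α · y_i · h(x_i)), followed by normalization. The function name and the example predictions are illustrative.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost update for labels in {-1, +1}; assumes 0 < weighted error < 1.

    Returns (alpha, new_weights).
    """
    # Weighted error rate of this round's weak learner
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    # Learner weight: large when error is small, zero when error = 0.5
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified points (t * p = -1) grow by e^alpha; correct ones shrink by e^-alpha
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]

weights = [0.25, 0.25, 0.25, 0.25]
alpha, weights = adaboost_round(weights, [1, 1, -1, -1], [1, 1, -1, 1])
print(round(alpha, 4), [round(w, 4) for w in weights])
```

In this example, one of four records is misclassified (ε = 0.25). After normalization, that record carries half of the total weight, so the next weak learner is pushed to get it right.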
Random Forest
A collection of decision trees, each trained on a bootstrap sample of the data and choosing splits from random subsets of features; it typically improves accuracy and reduces overfitting compared with a single tree.
Gradient Boosting
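Builds models sequentially, fitting each new model to the negative gradient of the loss (the pseudo-residuals) of the current ensemble so that it corrects the previous models' errors; this amounts to gradient descent in function space.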
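The sketch below assumes squared loss L = (y − F)² / 2. For this loss, the negative gradient with respect to F(x) is simply the residual y − F(x), so each round fits a regression stump to the current residuals. The stump learner, `learning_rate`, `n_rounds`, and the 1-D data are assumptions chosen for brevity.

```python
def fit_stump(xs, residuals):
    """Fit a 1-D regression stump: pick the split minimizing squared error of the residuals."""
    best = None
    for split in sorted(set(xs))[:-1]:           # needs at least two distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fit to the pseudo-residuals."""
    base = sum(ys) / len(ys)          # F_0: the constant that minimizes squared loss
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]   # negative gradient
        stumps.append(fit_stump(xs, residuals))
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```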
Apriori Algorithm
A classic algorithm for mining frequent itemsets using a level-wise generate-and-test approach, pruning any candidate that has an infrequent subset.
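Here is a compact, unoptimized Apriori sketch. Transactions are Python sets and `min_support` is a fraction. Support is computed as defined under "Support (Frequent Patterns)". Each level generates k-itemset candidates by joining frequent (k−1)-itemsets, prunes any candidate with an infrequent (k−1)-subset, and then tests the survivors against the data. The market-basket transactions are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Generate: join frequent (k-1)-itemsets into k-itemset candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Test: keep candidates whose support meets the threshold
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(transactions, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```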
FP-Growth Algorithm
A frequent pattern mining technique that compresses the transactions into a compact FP-tree and mines it directly, avoiding candidate generation.
Support (Frequent Patterns)
Proportion of transactions that contain a particular itemset.
Expectation-Maximization (EM)
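An iterative method for estimating model parameters when data are missing or the model has latent variables. The E-step computes the expected values (or probabilities) of the latent variables under the current parameters. The M-step then re-estimates the parameters to maximize the resulting expected likelihood.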
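Below is a classic EM example: fitting a 1-D mixture of two Gaussians, where the latent variable is which component generated each point. The starting means (0 and 6), the iteration count, the small variance floor (`1e-6`), and the data are all assumptions made for this sketch.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mu=(0.0, 1.0), var=(1.0, 1.0), weight=0.5, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture; returns (means, variances, weight of comp. 1)."""
    mu1, mu2 = mu
    var1, var2 = var
    for _ in range(n_iter):
        # E-step: responsibility r_i = P(component 1 | x_i) under the current parameters
        resp = []
        for x in data:
            p1 = weight * normal_pdf(x, mu1, var1)
            p2 = (1 - weight) * normal_pdf(x, mu2, var2)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters as responsibility-weighted statistics
        n1 = sum(resp)
        n2 = len(data) - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        # Small floor (assumption) keeps a variance from collapsing to zero
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1 + 1e-6
        var2 = sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2 + 1e-6
        weight = n1 / len(data)
    return (mu1, mu2), (var1, var2), weight

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weight = em_two_gaussians(data, mu=(0.0, 6.0))
print([round(m, 3) for m in means], round(weight, 3))
```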
Smart Technology Ethics
Concerns about data collection (e.g., Siri recordings) and whether users consent to that data being used to improve AI systems.
Facial Recognition Ethics
Ethical issues of bias, surveillance, and privacy violations arising from the collection and use of facial data.
Social Media Algorithm Ethics
Issues of manipulation, emotional impact, and lack of transparency in content curation.
Replika Chatbot
An AI companion chatbot that raises concerns about emotional dependency, data use, and mental health in human-AI relationships.