This set of flashcards covers key concepts from the Data Warehousing with Mining Techniques lecture notes, focusing on definitions and the processes involved in data mining and clustering.
Knowledge Discovery in Databases (KDD)
The broader process of extracting useful knowledge from large datasets, which includes several systematic steps such as data cleaning, integration, and mining.
Data Mining
A step in the KDD process that focuses on extracting patterns and knowledge from large amounts of data.
Data Cleaning
The process of fixing or removing incorrect and incomplete data to improve data quality.
Data Integration
The merging of data from multiple heterogeneous sources to create a unified dataset.
Frequent Itemset
A set of items that appear together in a dataset often enough that their support meets or exceeds a minimum support threshold.
Support (in data mining)
The fraction of transactions (often expressed as a percentage) that contain a given itemset; it is compared against the minimum support threshold to determine frequent itemsets.
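As an illustration of support and frequent itemsets, here is a minimal Python sketch. The transactions, the item names, and the threshold are hypothetical toy values, not taken from the lecture notes, and the brute-force enumeration is only meant for tiny examples (it is not the Apriori algorithm).

```python
from itertools import combinations

# Hypothetical toy transactions (assumption, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets by brute force and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


frequent = frequent_itemsets(transactions, min_support=0.5)
```

With these toy values, {bread, milk} appears in 2 of 4 transactions, so its support is 0.5 and it just meets the threshold.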
Association Rule
An implication of the form A → B, where A and B are disjoint itemsets, indicating that transactions containing A are also likely to contain B (for example, customers who purchase A are also likely to purchase B).
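How "likely" is usually quantified is not stated on the card; a common choice is the rule's confidence, the fraction of transactions containing A that also contain B. The sketch below assumes that measure and uses hypothetical toy transactions.

```python
# Hypothetical toy transactions (assumption, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Computed as count(A and B together) / count(A).
    """
    a = set(antecedent)
    both = a | set(consequent)
    count_a = sum(a <= t for t in transactions)
    count_both = sum(both <= t for t in transactions)
    if count_a == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return count_both / count_a


# Rule {bread} -> {milk}: bread occurs in 3 transactions, bread and milk together in 2.
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```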
Cluster Analysis
An unsupervised learning technique that groups data items so that items in the same cluster are similar to one another and dissimilar to items in other clusters.
K-Means Clustering
A partitioning method that divides data into k non-overlapping clusters, each represented by a centroid (the mean of its points), so as to minimize the distance between points and the centroid of their cluster.
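The alternating assign-and-update loop commonly used for k-means (Lloyd's algorithm) can be sketched as follows. The points and the initial centroids are hypothetical, and passing the initial centroids explicitly is a simplification made here so the result is deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid as-is
        if new_centroids == centroids:
            break  # converged: centroids no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

Real implementations usually choose the initial centroids randomly or with a seeding heuristic and may run several restarts, since the result depends on initialization.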
Hierarchical Clustering
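A method that builds a tree-like structure (dendrogram) of nested clusters, either agglomeratively (bottom-up, by repeatedly merging clusters) or divisively (top-down, by repeatedly splitting them).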
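A minimal sketch of the agglomerative (bottom-up) variant follows. It assumes single linkage (cluster distance is the smallest pairwise distance) and hypothetical 1-D points; the card does not specify a linkage criterion. The recorded merge sequence is the information a dendrogram would display.

```python
def agglomerative_single_linkage(points):
    """Repeatedly merge the two closest clusters; return the list of merges.

    Each merge is recorded as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical 1-D data with two tight pairs and one distant point.
merges = agglomerative_single_linkage([1.0, 1.2, 5.0, 5.1, 9.0])
```

Cutting the merge sequence at a chosen distance yields a flat clustering; for example, stopping before any merge with distance above 1.0 leaves the clusters {1.0, 1.2}, {5.0, 5.1}, and {9.0}.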
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
A clustering method that forms clusters based on high-density areas and can identify noise or outliers.
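The density idea can be sketched in plain Python as below. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a point to be a core point) follow common usage but are assumptions here, as are the toy points.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE if it belongs to no dense region."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep growing the cluster
    return labels


# Hypothetical points: two dense groups and one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Unlike k-means, the number of clusters is not given in advance; it follows from the density parameters.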
EM Algorithm (Expectation-Maximization)
A model-based clustering algorithm that fits a mixture of probability distributions to the data, yielding soft cluster assignments (each point receives a probability of belonging to each cluster).
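As a sketch of the idea, the code below runs EM on a one-dimensional mixture of two Gaussians. The choice of Gaussian components, the data, the starting parameters, and the small variance floor are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture; returns fitted parameters and responsibilities."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing component
            weights[j] = nj / len(data)
    return means, variances, weights, resp


# Hypothetical data drawn around two centers near 1 and 5.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

Each row of `resp` holds the soft assignment of one point: its probabilities of belonging to each component, which sum to 1.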