Data Warehousing with Mining Techniques – Unit Test Notes

Description and Tags

This set of flashcards covers key concepts from the Data Warehousing with Mining Techniques lecture notes, focusing on definitions and the processes involved in data mining and clustering.

12 Terms

Knowledge Discovery in Databases (KDD)

The end-to-end process of extracting useful knowledge from large datasets. Data mining is only one of its steps; the others include data cleaning, integration, selection, transformation, pattern evaluation, and knowledge presentation.

Data Mining

The step in the KDD process that applies algorithms to extract patterns from large amounts of data.

Data Cleaning

The process of fixing or removing incorrect and incomplete data to improve data quality.
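
As an illustration, the sketch below cleans a small list of records. The records, the field names, and the rules (drop rows with no city, replace missing or impossible ages with the mean of the valid ages) are assumptions made up for this example, not rules from the lecture notes.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},   # missing age
    {"id": 3, "age": -5, "city": "Mumbai"},    # impossible age
    {"id": 4, "age": 28, "city": None},        # missing city
]

def is_valid_age(age):
    """Assumed validity rule: an age must be present and between 0 and 120."""
    return age is not None and 0 <= age <= 120

def clean(rows):
    """Drop rows with no city; replace missing or invalid ages with the mean valid age."""
    fill = mean(r["age"] for r in rows if is_valid_age(r["age"]))
    cleaned = []
    for r in rows:
        if r["city"] is None:
            continue  # remove: the record cannot be repaired
        age = r["age"] if is_valid_age(r["age"]) else fill  # fix: impute
        cleaned.append({**r, "age": age})
    return cleaned

cleaned = clean(records)
```

Imputing with the mean is only one possible repair; the right rule depends on the data and the analysis.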

Data Integration

The merging of data from multiple heterogeneous sources to create a unified dataset.
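
A minimal sketch of integration, assuming two made-up sources (`crm_rows` and `billing_rows`) that identify the same customers under different field names. It maps both onto one schema and keeps every customer that appears in either source.

```python
# Two hypothetical sources describing the same customers with different field names.
crm_rows = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
billing_rows = [{"customer": 2, "total": 120.0}, {"customer": 3, "total": 75.5}]

def integrate(crm, billing):
    """Outer-join both sources on customer id into one unified record per customer."""
    unified = {}
    for row in crm:
        unified[row["cust_id"]] = {"id": row["cust_id"], "name": row["name"], "total": None}
    for row in billing:
        # Create a placeholder record if the customer is unknown to the CRM source.
        rec = unified.setdefault(
            row["customer"], {"id": row["customer"], "name": None, "total": None}
        )
        rec["total"] = row["total"]
    return [unified[key] for key in sorted(unified)]

unified = integrate(crm_rows, billing_rows)
```

Fields that one source cannot supply stay `None`, which leaves the gap visible for a later cleaning step.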

Frequent Itemset

A set of items that appear together in transactions often enough that the set's support meets or exceeds a chosen minimum support threshold.

Support (in data mining)

The fraction (often expressed as a percentage) of transactions that contain a given itemset. An itemset is frequent when its support meets the minimum support threshold.
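
To make support and frequent itemsets concrete, here is a small sketch over four made-up transactions. It checks every possible itemset by brute force, which is only workable for tiny examples; it is not the Apriori algorithm or any other optimized method.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

frequent = frequent_itemsets(transactions, min_support=0.5)
```

With a threshold of 0.5, `{bread, milk}` is frequent (2 of 4 transactions) while `{butter, milk}` is not (1 of 4).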

Association Rule

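An implication of the form A → B, where A and B are disjoint itemsets, indicating that transactions containing A are also likely to contain B. The strength of a rule is usually measured by its support and confidence.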
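
The sketch below scores a rule A → B on four made-up transactions. Confidence is support(A ∪ B) / support(A), the fraction of transactions containing A that also contain B. Lift, confidence divided by support(B), is a common additional measure included here as an assumption; the notes themselves do not mention it.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75
lift_bread_milk = lift({"bread"}, {"milk"}, transactions)        # (2/3) / 0.75
```

A lift below 1, as in this toy data, suggests that buying bread makes milk slightly less likely than its baseline rate, so a high-looking confidence alone can mislead.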

Cluster Analysis

An unsupervised learning technique that groups data items so that items in the same cluster are similar to one another and dissimilar to items in other clusters.

K-Means Clustering

A partitioning method that divides data into k non-overlapping clusters, each represented by a centroid (the mean of its points), chosen to minimize the sum of squared distances between points and their cluster centroid.
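
A minimal sketch of the usual iterative procedure (Lloyd's algorithm): assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. The sample points, the random initialization from the data, and the fixed seed are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster tuples of floats into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

The result can depend on the initial centroids, so practical implementations often run several initializations and keep the best one.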

Hierarchical Clustering

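A clustering method that builds a tree-like structure (dendrogram) of nested clusters. It can be agglomerative (bottom-up, repeatedly merging the closest clusters) or divisive (top-down, repeatedly splitting clusters).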
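
The agglomerative variant can be sketched as follows: start with every point in its own cluster and repeatedly merge the two closest clusters. Single linkage (distance between the closest pair of points) is an assumption here; complete and average linkage are equally common choices. The recorded merge list is the information a dendrogram would draw.

```python
import math

def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering; returns (clusters, merges)."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []                       # (cluster_a, cluster_b, distance) per merge
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, num_clusters=2)
```

Stopping at `num_clusters` corresponds to cutting the dendrogram at a chosen level; running down to one cluster would record the full tree.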

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

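A density-based clustering method that groups points lying in high-density regions into clusters and labels points in low-density regions as noise (outliers). It does not require the number of clusters to be specified in advance.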
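
A compact sketch of the idea, assuming the two standard parameters: `eps` (neighborhood radius) and `min_pts` (minimum number of points, counting the point itself, within `eps` for a point to count as a core point). The sample points and parameter values are made up, and the quadratic neighbor search is for readability only.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels

points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (8.0, 8.0), (8.1, 8.2), (7.9, 8.0), (4.0, 15.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Here the isolated point `(4.0, 15.0)` has no neighbors within `eps`, so it is labeled as noise instead of being forced into a cluster.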

EM Algorithm (Expectation-Maximization)

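An iterative algorithm used for model-based clustering. It fits a mixture of probability distributions (for example, Gaussians) to the data by alternating an expectation step and a maximization step, and yields soft clustering: each point receives a probability of belonging to each cluster.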
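
A minimal sketch of EM for a one-dimensional Gaussian mixture, written under several assumptions: the data and the initial means are made up, all components start with variance 1 and equal weights, and a small floor keeps the variances from collapsing to zero. The E-step computes each point's membership probabilities (responsibilities); the M-step re-estimates weights, means, and variances from them.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture; returns (weights, means, variances, responsibilities)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k      # assumed starting variance
    weights = [1.0 / k] * k    # assumed equal starting weights
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a degenerate component
    return weights, means, variances, resp

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
weights, means, variances, resp = em_gmm_1d(data, initial_means=[0.0, 6.0])
```

Unlike k-means, which assigns each point to exactly one cluster, each row of `resp` holds probabilities that sum to 1 across the components.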