Chapter 11: Data Mining Vocabulary

Vocabulary flashcards covering key terms from the lecture notes on data mining, CRISP-DM, similarity measures, and related concepts.

26 Terms

1

Data mining

The process of applying analytical techniques to extract insights from data, uncover hidden patterns, and support decisions; a building block of machine learning and AI.

2

Artificial intelligence

Computer systems that exhibit human-like intelligence and cognitive abilities, including deduction, pattern recognition, and complex data interpretation.

3

Machine learning

Techniques that enable computers to learn automatically from data using self-learning algorithms, improving performance over time and revealing hidden patterns.

4

CRISP-DM

Cross-Industry Standard Process for Data Mining; a six-phase methodology: Business understanding, Data understanding, Data preparation, Modeling, Evaluation, Deployment; emphasizes business goals first.

5

SEMMA

A five-step data mining methodology developed by SAS Institute: Sample, Explore, Modify, Model, Assess.

6

KDD

Knowledge Discovery in Databases; the overall process of extracting useful knowledge from large data sets, in which data mining is the pattern-discovery step.

7

Supervised learning

Learning where the target variable is known; used to build predictive models; examples include regression and classification.

8

Unsupervised learning

Learning without a target variable; used for exploration, dimension reduction, and pattern recognition.

9

Classification

A supervised learning task where the target is categorical; assigns new cases to classes.

10

Regression

A supervised learning task where the target is numerical; predicts continuous values; the model is trained on cases with known outcomes.

11

Dimension reduction

Reducing high-dimensional data to fewer dimensions while preserving important information; helps reduce redundancy and improve model stability.

12

Pattern recognition

Identifying recurring sequences, frequent itemsets, or recognizable features in data.

13

Similarity measures

Quantitative methods to assess how similar or dissimilar observations are, typically based on pairwise distances.

14

Euclidean distance

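The straight-line distance between two points, computed as the square root of the sum of squared differences across dimensions; widely used; sensitive to outliers because large differences are squared.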
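
As an illustration, a minimal Python sketch of this definition follows. The function name `euclidean_distance` and the sample points are chosen for this example only and do not come from the lecture notes.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square each per-dimension difference, sum them, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical example: differences of 3 and 4 give a distance of 5.
print(euclidean_distance([1, 2], [4, 6]))  # 5.0
```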

15

Manhattan distance

The sum of absolute differences across dimensions; often called taxicab distance; less sensitive to outliers than Euclidean.
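
A short Python sketch of the same idea, assuming points are plain lists of numbers. The name `manhattan_distance` and the sample values are illustrative only.

```python
def manhattan_distance(a, b):
    """Sum of absolute per-dimension differences between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Differences are not squared, so one large gap counts only in proportion to its size.
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical example: differences of 3 and 4 add up to 7.
print(manhattan_distance([1, 2], [4, 6]))  # 7
```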

16

Standardization (z-score)

Transforming data to z-scores by subtracting the mean and dividing by the standard deviation; makes variables unit-free.
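
The transformation can be sketched in Python as follows. This version assumes the sample standard deviation (`statistics.stdev`); a course or tool may use the population version instead, which gives slightly different z-scores. The function name `standardize` is made up for this example.

```python
import statistics

def standardize(values):
    """Convert a list of numbers to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / sd for v in values]

# Hypothetical data: mean is 4 and sample standard deviation is 2.
print(standardize([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```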

17

Min-max normalization

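Rescaling values to the 0–1 range by subtracting the minimum and dividing by the range (maximum minus minimum); preserves the relative ordering and spacing of values and makes variables unit-free.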
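
A minimal Python sketch of the rescaling, with a made-up function name `min_max_normalize` and sample data chosen for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers to the 0-1 range: (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data: the smallest value maps to 0 and the largest to 1.
scaled = min_max_normalize([10, 20, 40])
print(scaled)  # first element is 0.0, last is 1.0, middle is about 0.333
```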

18

Binary variable

A categorical variable with only two possible values (e.g., yes/no).

19

Matching coefficient

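A similarity measure for categorical data: the proportion of attributes on which two observations have the same value; higher values indicate more similarity; equals 1 for a perfect match; does not differentiate positive vs negative outcomes, so shared 0s count as much as shared 1s.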
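
One way to sketch this in Python, assuming each observation is a list of category values (here 0/1) of equal length. The name `matching_coefficient` and the two sample records are hypothetical.

```python
def matching_coefficient(a, b):
    """Share of attributes on which two observations have the same value."""
    if len(a) != len(b) or not a:
        raise ValueError("observations must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Hypothetical records: they agree on attributes 1, 2 and 4 (one shared 1, two shared 0s).
print(matching_coefficient([1, 0, 1, 0, 0], [1, 0, 0, 0, 1]))  # 0.6
```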

20

Jaccard coefficient

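A similarity measure for binary/categorical data that ignores shared negatives (0-0 matches): the number of attributes where both observations are positive divided by the number where at least one is positive.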
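
A small Python sketch for binary (0/1) observations follows. The function name `jaccard_coefficient` and the sample records are illustrative, and returning 0.0 when neither observation has any positives is an assumption made here, since the ratio is undefined in that case.

```python
def jaccard_coefficient(a, b):
    """Shared positives divided by attributes where at least one observation is positive."""
    if len(a) != len(b):
        raise ValueError("observations must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # assumption: treat the undefined 0/0 case as no similarity
    return both / either

# Hypothetical records: one shared positive, three attributes with at least one positive.
print(jaccard_coefficient([1, 0, 1, 0, 0], [1, 0, 0, 0, 1]))  # about 0.333
```

For the same pair of records the matching coefficient is 0.6, because it also counts the two shared 0s as matches.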

21

Business understanding (CRISP-DM phase)

CRISP-DM phase focusing on clarifying objectives, context, schedule, and deliverables.

22

Data understanding (CRISP-DM phase)

CRISP-DM phase involving collection and exploration of raw data, initial insights, and hypotheses.

23

Data preparation (CRISP-DM phase)

CRISP-DM phase consisting of record/variable selection, cleaning, wrangling, and transformation.

24

Modeling (CRISP-DM phase)

CRISP-DM phase where modeling techniques are selected and applied, data is transformed as needed, and cross-validation is documented.

25

Evaluation (CRISP-DM phase)

CRISP-DM phase to assess model performance, compare alternatives, interpret results, and develop recommendations.

26

Deployment (CRISP-DM phase)

CRISP-DM phase to translate insights into actionable deliverables and establish deployment/monitoring/feedback.