Crafting Data Mining Problem Statements & Data-Mining Workflow

Description and Tags

Vocabulary flashcards covering central concepts, methods and ethical issues from the lecture on crafting data-mining problem statements and the seven-step analytical workflow.

30 Terms

1

Practical Motivation and Problem Identification

Stage where one verifies that the challenge is data-related and solvable with available or collectable data (e.g., reducing customer churn).

2

Data Collection

Process of gathering the relevant, representative, valid and reliable data required to address the formulated problem.

3

Relevance (in data collection)

The alignment of the collected data’s purpose, scope and specificity with the analytical question being asked.

4

Representativeness

Extent to which a sample reflects the diversity of the entire population, helping to avoid selection bias.

5

Validity and Reliability

Qualities that ensure data measures what it claims to measure (validity) and does so consistently (reliability).

6

Simple Random Sample

Sampling technique where every member of the population has an equal chance of selection and every possible sample of a given size is equally likely.

7

Systematic Sample

Sampling method that selects every k-th item from an ordered list to create the sample.

8

Stratified Sample

Sampling approach that divides the population into non-overlapping subgroups (strata) and draws a random sample from each, typically in proportion to stratum size, so that every subgroup is represented.

9

Cluster Sample

Sampling technique that randomly selects entire groups or clusters, then studies all or a subset of elements within those clusters.
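
The four sampling schemes in cards 6 to 9 can be sketched with Python's standard `random` module. This is a minimal illustration, not a reference implementation: the function names, the `stratum_of` callback, and the `fraction` parameter are assumptions introduced here, and the stratified version assumes proportional allocation with at least one item per stratum.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely (sampling without replacement).
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting offset, then take every k-th item of the ordered list.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, fraction, rng):
    # Group items by stratum, then sample the same fraction from each group.
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # assumption: at least 1 per stratum
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep every element inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))
srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, lambda x: x % 2, 0.1, rng)
clusters = {"north": [1, 2, 3], "south": [4, 5], "east": [6], "west": [7, 8]}
clustered = cluster_sample(clusters, 2, rng)
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which helps when a sample has to be audited later.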

10

Problem Formulation (Data Mining)

Crafting a clear, specific, answerable analytical question that guides the mining process (e.g., “Can usage patterns predict churn?”).

11

Data Preparation

Cleaning, integrating, structuring and formatting raw data so it becomes suitable for mining and modeling tasks.

12

Data from Different Sources

Integrated datasets (e.g., demographics, usage, feedback) that give a comprehensive view but may require complex merging.
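
As a rough sketch of what merging such sources involves, the snippet below joins records from several lists on a shared key. The `merge_by_key` helper, the `customer_id` key, and all field names and values are hypothetical, chosen only to mirror the demographics, usage, and feedback example.

```python
def merge_by_key(key, *sources):
    # Combine records that share the same key value into one dictionary per entity.
    merged = {}
    for records in sources:
        for record in records:
            merged.setdefault(record[key], {}).update(record)
    return merged

# Hypothetical records from three separate sources.
demographics = [{"customer_id": 1, "age": 34}, {"customer_id": 2, "age": 51}]
usage = [{"customer_id": 1, "logins_per_week": 5}]
feedback = [{"customer_id": 2, "satisfaction": 2}]

customers = merge_by_key("customer_id", demographics, usage, feedback)
```

Customer 1 ends up without a `satisfaction` field and customer 2 without `logins_per_week`, which shows one reason merging can be complex: sources rarely cover exactly the same entities.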

13

Data in a Grid Format

Tabular (rows-columns) organization of data that simplifies analysis but may limit capture of complex relationships.

14

Exploratory Data Analysis (EDA)

Initial analytical step aimed at discovering patterns and anomalies and at computing summary statistics such as the mean, median and variance, along with the overall distribution of the data.

15

Mean

Arithmetic average of a numerical data set.

16

Median

Middle value of an ordered data set, splitting it into two equal halves.

17

Variance

Statistical measure of spread, computed as the average squared deviation of the data points from the mean.
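
The mean, median and variance from cards 15 to 17 can be computed with Python's standard `statistics` module. The data values below are made up for illustration. Note that `pvariance` divides by n (population variance) while `variance` divides by n - 1 (sample variance).

```python
from statistics import mean, median, pvariance, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the lecture

average = mean(values)               # sum of values divided by their count
middle = median(values)              # average of the two central values when the count is even
population_var = pvariance(values)   # mean squared deviation, dividing by n
sample_var = variance(values)        # same sum of squares, dividing by n - 1
```

Here the mean is 5, the median is 4.5, and the population variance is 4, so the sample variance (32 / 7) is slightly larger.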

18

Distribution

Overall pattern of values, indicating the shape, center and spread of the data points.

19

Pattern Recognition

Process of detecting meaningful trends, correlations or structures within data.
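
One simple instance of detecting a correlation is the Pearson coefficient between two numeric variables. The sketch below computes it by hand; the function name and the sample data are assumptions for illustration, and the coefficient only captures linear association.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    # Covariance of the two variables divided by the product of their spreads.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

hours_used = [1, 2, 3, 4, 5]         # hypothetical usage figures
support_calls = [10, 8, 6, 4, 2]     # hypothetical, falls as usage rises
r = pearson_correlation(hours_used, support_calls)
```

A value near -1 or +1 signals a strong linear relationship, while a value near 0 signals little or none.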

20

Analytical Visualization

Use of charts (e.g., scatter plots, bar charts, heat maps) to display statistical characteristics and support pattern recognition and decisions.

21

Descriptive Analytics

Techniques that summarize and describe historical data to reveal what has happened, including existing patterns and trends.

22

Inferential Analytics

Statistical techniques that draw conclusions about a larger population based on a sample, often via hypothesis testing or regression.

23

Statistical Inference

Discipline of drawing conclusions about a population or process from sample data while quantifying the uncertainty of those conclusions.

24

Generalization (of a model)

Ability of a predictive model to perform accurately on new, unseen data rather than just the training set.

25

Cross-Validation

Model-evaluation strategy that splits the data into k folds, repeatedly trains on k - 1 folds and tests on the held-out fold, and averages the results to estimate out-of-sample performance.
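
A minimal k-fold sketch using only the standard library is shown below. The helper names are assumptions, and the "model" is deliberately trivial (it predicts the training mean) so that the splitting logic stays in focus. In practice the data would usually be shuffled before splitting.

```python
from statistics import mean

def k_fold_splits(n_samples, k):
    # Yield (train_indices, test_indices) so each index is tested exactly once.
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validated_mse(values, k):
    # Fit on the training folds, score on the held-out fold, average over folds.
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(values), k):
        prediction = mean(values[i] for i in train_idx)  # trivial stand-in model
        fold_errors.append(mean((values[i] - prediction) ** 2 for i in test_idx))
    return mean(fold_errors)

splits = list(k_fold_splits(10, 3))
score = cross_validated_mse([3.0, 5.0, 4.0, 6.0, 5.0, 7.0], 3)
```

Because every observation is held out exactly once, the averaged error is an estimate of performance on data the model has not seen.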

26

Confidence Interval

Range of values computed from sample data by a procedure that, over repeated sampling, would contain the true parameter of interest at a specified confidence level (e.g., 95%).
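
As an illustration, an interval for a sample mean can be sketched with the standard `statistics` module. This version assumes a normal approximation, which is reasonable for larger samples. For small samples a t-distribution would normally be used instead. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    # Normal-approximation interval: mean +/- z * (standard error of the mean).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return centre - half_width, centre + half_width

monthly_spend = [42.0, 38.5, 51.0, 47.2, 39.9, 44.1, 49.3, 40.8]  # made-up values
low_95, high_95 = mean_confidence_interval(monthly_spend, 0.95)
low_99, high_99 = mean_confidence_interval(monthly_spend, 0.99)
```

Raising the confidence level from 95% to 99% widens the interval, reflecting the trade-off between certainty and precision.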

27

Actionable Intelligence

Insights derived from analysis that directly inform strategic, real-world decisions (e.g., targeting retention campaigns).

28

Ethical Considerations in Data Mining

Practices ensuring privacy, fairness and compliance with regulations during data handling and analysis.

29

GDPR (General Data Protection Regulation)

EU legislation governing how personal data must be collected, processed and protected.

30

Customer Churn Prediction

Data-mining application that identifies customers likely to leave, enabling proactive retention strategies.