Intro to Data Mining

47 Terms

1

What is KDD?

Knowledge Discovery in Databases - the broader process of discovering useful knowledge from data

2

What is the relationship between AI, ML, and Data Mining?

AI is the broadest field (creating intelligent machines), ML is a subset of AI (learning from data), and Data Mining is presented as a subset of ML (extracting patterns from large datasets), although it also draws on statistics and database systems

3

What is Machine Learning?

A field of study that gives computers the ability to learn without being explicitly programmed

4

How does Data Mining differ from Machine Learning?

Data Mining focuses on discovering patterns in large-scale datasets, while ML focuses on algorithms that learn from data regardless of size

5

Why has data growth accelerated?

Advances in data generation and collection technologies (sensors, social media, e-commerce, simulations)

6

What is the new data collection mantra?

Gather whatever data you can whenever and wherever possible

7

What is expected of gathered data?

That it will have value either for its initial purpose or for other purposes not originally envisioned

8

Give examples of large-scale data sources

Social Networking (Twitter), Sensor Networks, Traffic Patterns, Cyber Security, E-Commerce, Computational Simulations

9

From a commercial viewpoint, why do we need data mining?

Lots of data is being collected (web, social media, e-commerce), computers have become cheaper and more powerful, competitive pressure is strong, and companies need a strategic advantage through better, customized services

10

Give an example of strategic advantage through data mining

Customer Relationship Management (CRM) - offering better, customized services

11

From a scientific viewpoint, why do we need data mining?

Rapid data accumulation (NASA petabytes, telescope sky surveys), biological insights (high-throughput data), simulating the unseen (scientific simulations generating terabytes)

12

How does data mining empower scientists?

Automates analysis of massive datasets and aids hypothesis formation

13

Give 5 great opportunities where data mining can help society

Improving healthcare and reducing costs, Finding alternative/green energy sources, Predicting impact of climate change, Reducing hunger and poverty by increasing agriculture production, Enhancing education

14

What is Data Mining? (Definition 1)

Extract non-trivial, implicit, previously unknown and potentially useful information from data

15

What is Data Mining? (Definition 2)

Exploration and analysis, by automatic or semi-automatic means, of large quantities of data to discover meaningful patterns

16

What fields contribute ideas to Data Mining?

Machine learning/AI, Pattern recognition, Statistics, Database systems

17

Why are traditional techniques unsuitable for modern data?

Because modern data is large-scale, high-dimensional, heterogeneous, complex, and distributed, which traditional techniques were not designed to handle

18

What is Data Mining a key component of?

Data science and data-driven discovery

19

What are the steps in the Data Mining implementation process?

1. Data collection
2. Data preprocessing
3. Data mining
4. Pattern evaluation
5. Knowledge presentation

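
As a rough illustration, the five steps can be chained as plain Python functions. This is a minimal sketch under assumptions: the purchase records, the function names, and the choice of simple item-frequency counting as the "mining" step are all hypothetical placeholders, not part of the course material.

```python
from collections import Counter

def collect():
    # 1. Data collection: hypothetical raw purchase records (None = missing value)
    return [["milk", "bread"], ["milk", None], ["bread", "butter"], ["milk", "bread", "butter"]]

def preprocess(records):
    # 2. Data preprocessing: drop missing values from each record
    return [[item for item in r if item is not None] for r in records]

def mine(records):
    # 3. Data mining: count how many records contain each item (a very simple pattern)
    return Counter(item for r in records for item in r)

def evaluate(counts, n_records, min_share=0.6):
    # 4. Pattern evaluation: keep only items found in at least min_share of the records
    return {item: c for item, c in counts.items() if c / n_records >= min_share}

def present(patterns):
    # 5. Knowledge presentation: format the surviving patterns for a human reader
    return [f"{item}: {count} records" for item, count in sorted(patterns.items())]

records = preprocess(collect())
report = present(evaluate(mine(records), len(records)))
print("\n".join(report))  # bread: 3 records / milk: 3 records
```

Here "butter" appears in only 2 of 4 records, so the evaluation step filters it out before presentation.
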
20

What are the 2 main categories of Data Mining tasks?

Prediction Methods (predict unknown/future values) and Description Methods (find human-interpretable patterns)

21

What are Prediction Methods in data mining?

Use some variables to predict unknown or future values of other variables (e.g., sales forecasting)

22

What are Description Methods in data mining?

Find human-interpretable patterns that describe the data (e.g., analyzing historical criminal data for profiling)

23

What are the 4 main Data Mining tasks?

Classification, Clustering, Association Rules, Anomaly Detection

24

What is Classification in data mining?

Finding a model for the class attribute as a function of the values of the other attributes
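
To make the definition concrete, here is a minimal classifier for a tumor-diagnosis style task. It uses k-nearest neighbors, which is one technique among many and not one the cards name; the training records and the two features (tumor size, irregularity score) are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled records: (tumor size in mm, irregularity score) -> class attribute
TRAIN = [
    ((5.0, 1.0), "benign"),
    ((6.0, 2.0), "benign"),
    ((7.0, 1.5), "benign"),
    ((20.0, 8.0), "malignant"),
    ((22.0, 7.0), "malignant"),
    ((18.0, 9.0), "malignant"),
]

def knn_predict(train, x, k=3):
    """Predict the class of x by majority vote among its k nearest training records."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(TRAIN, (6.5, 1.2)))   # benign
print(knn_predict(TRAIN, (19.0, 8.5)))  # malignant
```

The "model" here is simply the stored training set plus a distance function; other classifiers (decision trees, logistic regression) build an explicit model instead.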

25

Give 5 examples of classification tasks

Credit card fraud detection (legitimate vs fraudulent), Satellite land cover classification, News story categorization, Cyberspace security (intruder identification), Medical diagnosis (tumor benign vs malignant)

26

What is Regression in data mining?

Predicting a continuous variable by considering relationships with other variables using linear or nonlinear models
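
A single-predictor linear regression can be fitted with ordinary least squares in a few lines. This sketch assumes one predictor (advertising spend) and hypothetical numbers chosen to lie exactly on a line, so the result is easy to check by hand; real data would be noisier and would usually go through a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: advertising spend vs. sales (both in thousands)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = fit_line(ad_spend, sales)
print(a, b)          # 1.0 2.0
print(a + b * 6.0)   # 13.0, the forecast for a spend of 6.0
```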

27

Give 3 examples of regression tasks

Forecasting sales (analyzing advertising expenditure), Wind velocity prediction (using temperature, humidity, air pressure), Stock market forecast (time series analysis)

28

What is Clustering in data mining?

Finding groups of objects such that objects in a group are similar to one another and different from objects in other groups
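
The clustering idea can be sketched with basic k-means, one common algorithm (the card does not name a specific one). The points, the choice of k = 2, the fixed iteration count, and the naive initialization from the first k points are assumptions for illustration; practical implementations use smarter initialization and a convergence check.

```python
import math

def kmeans(points, k, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster; keep the old centroid if the cluster is empty
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.17, 1.17), (8.5, 8.5)]
```

Points within a cluster end up close to their shared centroid and far from the other one, which matches the "similar within, different between" definition.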

29

What are some applications of clustering for understanding?

Customer profiling for targeted marketing, Group related documents for browsing, Group genes with similar functionality, Group stocks with similar price fluctuations

30

What is a clustering application for summarization?

Reduce the size of large data sets

31

What are Association Rules?

Given a set of records, each containing some items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items
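
Rules are commonly scored by support (the fraction of records containing every item in the rule) and confidence (how often the consequent appears among records that contain the antecedent). The sketch below computes both on a small set of hypothetical market-basket transactions and brute-forces single-item rules; it is not the Apriori algorithm and only scales to toy data.

```python
from itertools import permutations

# Hypothetical market-basket transactions
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def one_to_one_rules(transactions, min_support, min_conf):
    """Brute-force every rule {a} -> {b} that meets both thresholds (toy data only)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s >= min_support:
            c = confidence({a}, {b}, transactions)
            if c >= min_conf:
                rules.append((a, b, s, c))
    return rules

print(support({"milk", "diapers", "beer"}, TRANSACTIONS))                  # 0.4
print(round(confidence({"milk", "diapers"}, {"beer"}, TRANSACTIONS), 3))   # 0.667
print(one_to_one_rules(TRANSACTIONS, min_support=0.6, min_conf=0.8))       # [('beer', 'diapers', 0.6, 1.0)]
```

Read the last result as the rule {beer} -> {diapers}: every basket with beer also has diapers, which is the kind of pattern market-basket analysis uses for shelf placement.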

32

Give 3 applications of Association Rules

Market-basket analysis (sales promotion, shelf management, inventory), Telecommunication alarm diagnosis (find combination of alarms), Medical informatics (find combination of symptoms/test results with diseases)

33

What is Anomaly/Deviation Detection?

Detecting significant deviations from normal behavior
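
One simple way to detect deviations from normal behavior is a z-score test: flag any value that lies more than a chosen number of standard deviations from the mean. The sensor readings and the threshold of 2.0 below are hypothetical, and this is only one of many anomaly-detection techniques.

```python
import statistics

def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than threshold standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

# Hypothetical temperature readings from one sensor; index 5 is a spike
readings = [20.1, 20.4, 19.8, 20.0, 20.3, 35.0, 20.2, 19.9]
print(zscore_anomalies(readings))  # [(5, 35.0)]
```

Because an extreme value also inflates the mean and standard deviation, this test can miss anomalies in small samples; median-based scores are a common, more robust alternative.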

34

Give 4 applications of Anomaly Detection

Detecting changes in global forest cover, Credit card fraud detection, Network intrusion detection, Identify anomalous behavior from sensor networks

35

What are 5 motivating challenges in data mining?

Scalability (managing massive datasets), High dimensionality (complex structures with numerous attributes), Heterogeneous and complex data, Data ownership and distribution, Non-traditional analysis (traditional statistical methods cannot cope with the scale and nature of current data)

36

What are challenges with data ownership and distribution?

Data owned by multiple entities, geographically distributed, need to minimize communication, consolidate results, ensure security and privacy

37

Is Classification descriptive or predictive?

Predictive (predicts class labels for new instances)

38

Is Clustering descriptive or predictive?

Descriptive (describes structure and patterns in data)

39

Are Association Rules descriptive or predictive?

Descriptive (describes relationships and patterns)

40

Is Anomaly Detection descriptive or predictive?

Can be both (describes anomalies but can predict future anomalies)

41

To detect defective products in a production line, which technique?

Anomaly Detection (detecting deviations from normal)

42

To predict customers likely to cancel subscriptions, which technique?

Classification (predicting a categorical outcome)

43

To analyze viewing habits on a streaming platform for patterns, which technique?

Association Rules (finding patterns in behavior)

44

To group similar products based on features, which technique?

Clustering (finding similar groups)

45

To categorize emails into Spam/not Spam, which technique?

Classification (predicting predefined categories)

46

To discover patterns in user navigation without predefined groups, which technique?

Clustering (finding natural groupings)

47

To analyze patient vital signs for sudden changes, which technique?

Anomaly Detection (detecting deviations from normal)