What is KDD?
Knowledge Discovery in Databases - the broader process of discovering useful knowledge from data
What is the relationship between AI, ML, and Data Mining?
AI is the broadest (creating intelligent machines), ML is a subset of AI (learning from data), Data Mining is a subset of ML (extracting patterns from large datasets)
What is Machine Learning?
A field of study that gives computers the ability to learn without being explicitly programmed
How does Data Mining differ from Machine Learning?
Data Mining focuses on discovering patterns in large-scale datasets, while ML focuses on algorithms that learn from data regardless of size
Why has data growth accelerated?
Advances in data generation and collection technologies (sensors, social media, e-commerce, simulations)
What is the new data collection mantra?
Gather whatever data you can whenever and wherever possible
What are expectations about gathered data?
That it will have value either for the initial purpose or other purposes not envisioned
Give examples of large-scale data sources
Social Networking (Twitter), Sensor Networks, Traffic Patterns, Cyber Security, E-Commerce, Computational Simulations
From a commercial viewpoint, why do we need data mining?
Lots of data is being collected (web data, social media, e-commerce purchases), computers have become cheaper and more powerful, and competitive pressure is strong, so companies need a strategic advantage through better, customized services
Give an example of strategic advantage through data mining
Customer Relationship Management (CRM) - offering better, customized services
From a scientific viewpoint, why do we need data mining?
Rapid data accumulation (NASA petabytes, telescope sky surveys), biological insights (high-throughput data), simulating the unseen (scientific simulations generating terabytes)
How does data mining empower scientists?
Automates analysis of massive datasets and aids hypothesis formation
Give 5 great opportunities where data mining can help society
Improving healthcare and reducing costs, Finding alternative/green energy sources, Predicting impact of climate change, Reducing hunger and poverty by increasing agriculture production, Enhancing education
What is Data Mining? (Definition 1)
Extraction of non-trivial, implicit, previously unknown, and potentially useful information from data
What is Data Mining? (Definition 2)
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data to discover meaningful patterns
What fields contribute ideas to Data Mining?
Machine learning/AI, Pattern recognition, Statistics, Database systems
Why are traditional techniques unsuitable for modern data?
Because modern data is large-scale, high-dimensional, heterogeneous, complex, and distributed
What is Data Mining a key component of?
Data science and data-driven discovery
What are the steps in the Data Mining implementation process?
What are the 2 main categories of Data Mining tasks?
Prediction Methods (predict unknown/future values) and Description Methods (find human-interpretable patterns)
What are Prediction Methods in data mining?
Use some variables to predict unknown or future values of other variables (e.g., sales forecasting)
What are Description Methods in data mining?
Find human-interpretable patterns that describe the data (e.g., analyzing historical crime data to build criminal profiles)
What are the 4 main Data Mining tasks?
Classification, Clustering, Association Rules, Anomaly Detection
What is Classification in data mining?
Finding a model for the class attribute as a function of the values of the other attributes
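A minimal Python sketch of this idea, assuming a 1-nearest-neighbor model (just one of many possible classifiers): the class attribute of a new record is predicted from its other attribute values by copying the label of the closest training record. The function name `nearest_neighbor_classify` and the transaction records are hypothetical, made up for illustration.

```python
import math

def nearest_neighbor_classify(train, query):
    """Predict the class label of `query` as the label of its closest training record.

    `train` is a list of (features, label) pairs; features are tuples of numbers.
    (Hypothetical helper for illustration only.)
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Made-up training records: (transaction amount, hour of day) -> class attribute
training_set = [
    ((20.0, 14.0), "legitimate"),
    ((35.0, 10.0), "legitimate"),
    ((900.0, 3.0), "fraudulent"),
    ((750.0, 2.0), "fraudulent"),
]

print(nearest_neighbor_classify(training_set, (800.0, 4.0)))  # fraudulent
print(nearest_neighbor_classify(training_set, (25.0, 12.0)))  # legitimate
```

The attributes are not rescaled here, so the large-valued amount dominates the distance; real classifiers usually normalize attributes first.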
Give 5 examples of classification tasks
Credit card fraud detection (legitimate vs fraudulent), Satellite land cover classification, News story categorization, Cyberspace security (intruder identification), Medical diagnosis (tumor benign vs malignant)
What is Regression in data mining?
Predicting a continuous variable by considering relationships with other variables using linear or nonlinear models
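A minimal Python sketch of the linear case, assuming a single predictor and an ordinary least-squares fit of `y = intercept + slope * x`. The advertising/sales numbers are made up (chosen to lie exactly on a line so the fit is easy to check), and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: advertising expenditure vs. sales (arbitrary units)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly sales = 1 + 2 * ad_spend

intercept, slope = fit_simple_linear_regression(ad_spend, sales)
print(intercept, slope)         # 1.0 2.0
print(intercept + slope * 6.0)  # 13.0, the forecast for ad_spend = 6
```

Nonlinear models follow the same pattern of fitting parameters to observed data, just with a different functional form.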
Give 3 examples of regression tasks
Forecasting sales (analyzing advertising expenditure), Wind velocity prediction (using temperature, humidity, air pressure), Stock market forecast (time series analysis)
What is Clustering in data mining?
Finding groups of objects such that objects in a group are similar to one another and different from objects in other groups
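A minimal Python sketch of one common clustering algorithm, k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Initializing with the first `k` points is a simplifying assumption (practical implementations typically use random or k-means++ starts), and the points are made up.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of numbers; returns (centroids, clusters).

    Assumption: initial centroids are simply the first k points.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: centroid = mean of members (keep the old one if a cluster is empty)
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (8, 8), (1.5, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
```

Points within each resulting group are close to each other and far from the other group, which is exactly the intra-/inter-cluster criterion in the definition.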
What are some applications of clustering for understanding?
Customer profiling for targeted marketing, Grouping related documents for browsing, Grouping genes with similar functionality, Grouping stocks with similar price fluctuations
What is a clustering application for summarization?
Reduce the size of large data sets
What are Association Rules?
Given a set of records, each containing some items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items
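A minimal Python sketch of rule generation, restricted to single-item rules `{a} -> {b}` and using brute-force counting rather than a real algorithm such as Apriori. It uses the standard measures: support = fraction of records containing both sides, confidence = count(both) / count(left side). The baskets, thresholds, and function name are made up for illustration.

```python
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of transactions containing every item in `itemset`
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        if both / n < min_support:
            continue  # the pair is too rare to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both / n, confidence))
    return rules

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rules = pair_rules(baskets, min_support=0.6, min_confidence=0.8)
print(rules)  # [('beer', 'diapers', 0.6, 1.0)]
```

Every basket with beer also contains diapers (confidence 1.0), but not the reverse (3 of 4, or 0.75), which is why rules are directional.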
Give 3 applications of Association Rules
Market-basket analysis (sales promotion, shelf management, inventory), Telecommunication alarm diagnosis (find combination of alarms), Medical informatics (find combination of symptoms/test results with diseases)
What is Anomaly/Deviation Detection?
Detecting significant deviations from normal behavior
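A minimal Python sketch of one simple way to detect deviations from normal behavior in a stream of sensor readings, assuming a robust "modified z-score" rule: a value is flagged when `0.6745 * |x - median| / MAD` exceeds a cutoff (3.5 is a commonly used choice), where MAD is the median absolute deviation. The readings and function name are made up for illustration; real systems often use model-based or learned detectors instead.

```python
import statistics

def mad_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Median and MAD define "normal behavior"; unlike mean and standard
    deviation, they are barely moved by the few extreme values being hunted.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure deviations against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]

# Made-up temperature readings with one sudden spike at index 5
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.7, 20.1, 19.8]
print(mad_anomalies(readings))  # [5]
```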
Give 4 applications of Anomaly Detection
Detecting changes in global forest cover, Credit card fraud detection, Network intrusion detection, Identifying anomalous behavior from sensor networks
What are 5 motivating challenges in data mining?
Scalability (managing massive datasets), High dimensionality (complex structures with numerous attributes), Heterogeneous and complex data, Data ownership and distribution, Non-traditional analysis (traditional statistical methods cannot handle the nature of current data)
What are challenges with data ownership and distribution?
Data is owned by multiple entities and geographically distributed, so analysis must minimize communication, consolidate results from different sources, and ensure security and privacy
Is Classification descriptive or predictive?
Predictive (predicts class labels for new instances)
Is Clustering descriptive or predictive?
Descriptive (describes structure and patterns in data)
Are Association Rules descriptive or predictive?
Descriptive (describes relationships and patterns)
Is Anomaly Detection descriptive or predictive?
Can be both (describes anomalies but can predict future anomalies)
To detect defective products in a production line, which technique?
Anomaly Detection (detecting deviations from normal)
To predict customers likely to cancel subscriptions, which technique?
Classification (predicting a categorical outcome)
To analyze viewing habits on a streaming platform for patterns, which technique?
Association Rules (finding patterns in behavior)
To group similar products based on features, which technique?
Clustering (finding similar groups)
To categorize emails into Spam/not Spam, which technique?
Classification (predicting predefined categories)
To discover patterns in user navigation without predefined groups, which technique?
Clustering (finding natural groupings)
To analyze patient vital signs for sudden changes, which technique?
Anomaly Detection (detecting deviatio