Data Mining (Prelim Module 1) Reviewer

Module 1

95 Terms

1
Machine Learning (ML)
A branch of Artificial Intelligence (AI) in which programs learn from data instead of following hand-written rules. Websites use it extensively to classify users and address them appropriately.
2
Examples of tasks difficult for rule-based programming
Photo tagging, spam classification, web page ranking.
3
Training data
Examples provided to a program so it can generate its own rules.
4
Origin of the term "machine learning"
Coined by Arthur Samuel in 1959.
5
Arthur Samuel’s definition of Machine Learning
"The field of study that gives computers the ability to learn without being explicitly programmed."
6
Tom Mitchell’s formal definition of Machine Learning (1997)
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
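Study note: one way to make E, T, and P concrete is the toy sketch below. It is only an illustration under made-up assumptions: the task (deciding whether a number is at least 50), the midpoint learner, and the example values are all hypothetical and do not come from the module.

```python
def learn_threshold(examples):
    """Place the cutoff midway between the largest 'no' and the smallest 'yes' seen so far."""
    largest_no = max(x for x, label in examples if not label)
    smallest_yes = min(x for x, label in examples if label)
    return (largest_no + smallest_yes) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test items classified correctly."""
    correct = sum((x >= threshold) == label for x, label in test_set)
    return correct / len(test_set)


# Task T: decide whether a number is at least 50 (the learner is never told the 50).
test_set = [(x, x >= 50) for x in range(100)]

# Experience E: labeled examples, revealed two at a time (hypothetical values).
experience = [(10, False), (60, True), (30, False), (58, True), (48, False), (52, True)]

scores = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_set))

print(scores)  # accuracy after 2, 4, and 6 examples
```

Each extra pair of examples narrows the gap around the true cutoff, so P rises as E grows, which is what the definition calls learning.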
7
Applications of Machine Learning
Virtual Personal Assistants, Traffic Predictions, Online Transportation Networks, Video Surveillance, Social Media Services, Email Spam and Malware Filtering, Online Customer Support, Search Engine Result Refining, Product Recommendations, Online Fraud Detection, Medicine, Computational Biology, Handwriting Recognition, Machine Translation, Driverless Cars and Autonomous Helicopters.
8
Virtual Personal Assistants
Assistants like Siri, Alexa, and Google Assistant use ML to discover information via voice commands, answer queries, and perform tasks like setting alarms or reminders.
9
Smart speakers as ML products
Amazon Echo and Google Home are smart speakers built around ML-driven virtual personal assistants.
10
Traffic Predictions using ML
Google Maps utilizes ML to predict estimated arrival times and model real-time traffic congestion.
11
Online Transportation Networks and ML
Apps like Ola and Uber employ ML to estimate ride prices and predict rider demand for surge pricing.
12
ML in Video Surveillance
ML-based systems help in monitoring CCTV cameras by training computers to detect unusual behavior and alert human attendants.
13
Social Media Services and ML personalization
ML personalizes news feeds and improves advertisement targeting.
14
Email Spam Filtering with ML
Identifies spam based on previously marked emails.
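Study note: the sketch below shows, in a very reduced form, how previously marked emails can act as training data from which a program derives its own rule. The word-counting approach, the "two hits" cutoff, and the sample messages are assumptions made for illustration, not how any real mail provider works.

```python
from collections import Counter


def learn_spam_words(examples):
    """Derive a rule (a set of spam-indicating words) from labeled messages."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        target = spam_counts if is_spam else ham_counts
        target.update(set(text.lower().split()))
    # Keep a word only if it appears in more spam messages than non-spam ones.
    return {word for word, count in spam_counts.items() if count > ham_counts[word]}


def predict_spam(text, spam_words):
    """Flag a message when at least two of its distinct words are in the learned set."""
    hits = sum(1 for word in set(text.lower().split()) if word in spam_words)
    return hits >= 2


# Hypothetical messages that a user has already marked as spam (True) or not (False).
training_data = [
    ("win free prize now", True),
    ("free money claim prize", True),
    ("meeting notes for monday", False),
    ("lunch on monday now", False),
]

spam_words = learn_spam_words(training_data)
print(predict_spam("claim your free prize", spam_words))
print(predict_spam("notes for lunch", spam_words))
```

No rule such as "block the word prize" was typed in by hand; the set of suspicious words comes entirely from the marked examples.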
15
Malware detection using ML
Security programs learn coding patterns, so they can detect new malware, which is typically 90-98% similar to older versions.
16
Online Customer Support with ML chatbots
Chatbots extract information and answer customer queries without a live agent.
17
Search Engine Result Refining with ML
Improves search results by observing user responses to displayed results.
18
E-commerce Product Recommendations with ML
Amazon and Flipkart recommend products based on user behavior, past purchases, liked items, cart additions, and brand preferences.
19
Online Fraud Detection with ML
PayPal compares millions of transactions to distinguish between legitimate and illegitimate ones.
20
ML in Medicine
Algorithms help doctors understand diseases better by transforming electronic medical records into medical knowledge.
21
ML in Computational Biology
Assists in understanding and identifying relationships between genes and human features from vast human DNA data.
22
ML in Handwriting Recognition
Recognizes and reads different handwritings, useful in postal mail routing.
23
Machine Translation with ML
Technologies like Google Translate instantly translate text between numerous human languages.
24
ML in Driverless Cars
Crucial for self-driving cars where defining explicit rules is challenging.
25
ML in Autonomous Helicopters
Enables autonomous helicopters to learn through experience.
26
Supervised Learning definition
The algorithm is given labeled data (inputs with their corresponding outputs) so it can learn to predict outputs for unseen data.
27
Regression problems definition
Predict a value on a continuous scale.
28
Examples of regression problems
Predicting student marks, cricket match scores, tomorrow's temperature.
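Study note: a minimal regression sketch for the student-marks example, using ordinary least squares to fit a straight line. The choice of a straight line, the "hours studied" feature, and all numbers are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: hours studied (input) and marks obtained (output).
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 67, 72]

slope, intercept = fit_line(hours, marks)
predicted_marks = slope * 6 + intercept  # continuous prediction for 6 hours of study
print(round(predicted_marks, 1))
```

The prediction can be any real number, which is what makes this a regression problem rather than a classification problem.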
29
Classification problems definition
Predict which of a fixed set of discrete categories an input belongs to.
30
Examples of classification problems
Predicting student grades/divisions, whether a cricket team will win or lose, or if tomorrow will be cooler or hotter.
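Study note: a minimal classification sketch using a 1-nearest-neighbor rule, one of many possible classifiers. The two features (exam score and attendance percentage) and every value are hypothetical.

```python
def nearest_neighbor(train, point):
    """Return the label of the training example closest to `point`."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    _, label = min(train, key=lambda row: squared_distance(row[0], point))
    return label


# Hypothetical labeled data: (exam score, attendance %) with a pass/fail label.
train = [
    ((35, 40), "fail"),
    ((45, 50), "fail"),
    ((70, 65), "pass"),
    ((85, 90), "pass"),
]

print(nearest_neighbor(train, (80, 75)))
print(nearest_neighbor(train, (40, 42)))
```

The output is always one of the known labels ("pass" or "fail"), never a value in between, which is the discrete output the definition refers to.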
31
Unsupervised Learning definition
Deals with problems with no labeled data, aiming to derive structure via grouping or clustering.
32
Google News
Clusters news items based on content attributes like word frequency, sentence length, and page count.
33
Social Network Analysis
Clusters users into groups based on interests, connections, and communication patterns.
34
Market Segmentation
Groups customers based on purchase history to target promotions.
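Study note: a small k-means sketch of grouping customers by spending with no labels at all. The one-dimensional data, the choice of two groups, and the evenly spaced starting centers are assumptions; real segmentation would use more features and a library implementation.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Cluster numbers into k groups (k >= 2) by repeatedly re-centering on group means."""
    lo, hi = min(values), max(values)
    # Start with centers spread evenly between the smallest and largest value.
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer; no labels are provided.
monthly_spend = [12, 15, 14, 80, 85, 90, 13, 88]
centers, clusters = kmeans_1d(monthly_spend, k=2)
print(centers)
print(clusters)
```

The algorithm is never told which customers are "low spenders" or "high spenders"; the two groups emerge from the structure of the data, and a promotion could then target each group differently.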
35
Astronomical Data Analysis
Uses light patterns to identify galaxies, planets, and satellites.
36
Reinforcement Learning definition
Works through trial and error to maximize specified rewards over time.
37
Key components of reinforcement learning
Agent, Environment, Actions.
38
Agent in reinforcement learning
Responsible for learning and decision-making.
39
Environment in reinforcement learning
External world with which the agent interacts.
40
Actions in reinforcement learning
Tasks performed by the agent.
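Study note: the agent, environment, and actions from the cards above can be sketched as a tiny trial-and-error loop. This uses an epsilon-greedy strategy, which is one simple approach among many; the class names, the two actions, and the reward values are all hypothetical.

```python
import random


class Environment:
    """External world: hands back a reward for each action (made-up values)."""
    REWARDS = {"left": 0.2, "right": 1.0}

    def step(self, action):
        return self.REWARDS[action]


class Agent:
    """Learner and decision-maker: keeps a running average reward per action."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.value = {a: 0.0 for a in actions}
        self.count = {a: 0 for a in actions}

    def choose(self, rng):
        if rng.random() < self.epsilon:
            return rng.choice(self.actions)  # explore: try a random action
        return max(self.actions, key=lambda a: self.value[a])  # exploit the best estimate

    def learn(self, action, reward):
        self.count[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        self.value[action] += (reward - self.value[action]) / self.count[action]


rng = random.Random(0)
env = Environment()
agent = Agent(["left", "right"])

for _ in range(1000):
    action = agent.choose(rng)
    reward = env.step(action)
    agent.learn(action, reward)

best_action = max(agent.value, key=agent.value.get)
print(best_action, agent.value)
```

The agent is never told which action is better. It finds out by occasionally exploring, then favors the action with the higher average reward, which is the "maximize rewards over time" idea in the definition.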
41
Data Mining
The result of the natural evolution of information technology, specifically through the use of data management systems.
42
Data Mining
The process of digging through a collection of data to discover hidden connections and patterns that help predict future trends.
43
Data mining
Also referred to as knowledge discovery from data. Data mining is an interdisciplinary subfield.
44
Name for data mining in business analytics
Predictive analytics.
45
Applications of Data Mining: Target Marketing
Data mining is used to understand customers better and segment market groups for customized promotional campaigns.
46
Applications of Data Mining: Credit Risk Management
Used by financial institutions to predict a borrower’s ability to repay debt and assign interest rates based on risk.
47
Applications of Data Mining: Fraud Detection
Applied to detect and stop fraudulent transactions automatically by tracking spending habits.
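Study note: one simple way to "track spending habits" is to compare a new transaction with the customer's own history using a z-score. The three-standard-deviation cutoff and the sample amounts are assumptions; production fraud systems use far richer features.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations from past spending."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
history = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8]

print(is_suspicious(history, 49.0))
print(is_suspicious(history, 900.0))
```

A purchase close to the usual range passes, while one far outside it is flagged for review or blocked automatically.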
48
Applications of Data Mining: Healthcare
Predicts a patient’s likelihood of different health conditions based on risk factors like demographic, family, and genetic data.
49
Applications of Data Mining: Sentiment Analysis
Gathers understanding of how a group feels towards a topic, often using social media data.
50
Applications of Data Mining: Recommender Systems
Recommends products and services based on consumer behavior.
51
Applications of Data Mining: Spam Filtering
Analyzes the characteristics of malicious messages so they can be filtered out.
52
Applications of Data Mining: Education
Predicts students’ future learning behavior and analyzes outcomes for teaching strategy improvement.
53
Applications of Data Mining: Criminal Investigation
Detects crimes and uncovers their links to the criminals involved.
54
Benefits of Data Mining: Automated Decision-Making
Enables continual analysis and automation of routine and critical decisions without human judgment delay.
55
Benefits of Data Mining: Accurate Prediction and Forecasting
Provides reliable forecasts based on past trends and current conditions.
56
Benefits of Data Mining: Cost Reduction
Allows efficient resource allocation for maximum cost reduction.
57
Benefits of Data Mining: Customer Insights
Creates customer personas and personalizes touchpoints to improve experience.
58
Data Quality
Challenges of Data Mining: Accuracy, completeness, and consistency of data affect results.
59
Causes of data quality issues
Data entry errors, storage issues, integration problems, transmission errors.
60
Methods to address data quality issues
Data cleaning (detecting/correcting errors) and data preprocessing (transforming data for mining).
61
Data Complexity
Challenges of Data Mining: Large, varied data from sources like sensors, social media, IoT, often in different formats.
62
Techniques for handling data complexity
Clustering, classification, association rule mining.
63
Data Privacy and Security
Challenges of Data Mining: Increased risk of breaches and cyber-attacks with more data collection.
64
Scalability
Challenges of Data Mining: Algorithms must handle large datasets and streaming data efficiently.
65
Interpretability
Challenges of Data Mining: Complex models can be difficult to understand.
66
Ethics
Challenges of Data Mining: Concerns about discrimination, privacy violation, perpetuating bias.
67
Volume
Six Vs of Data: Refers to the huge amount of data; size determines if it qualifies as Big Data.
68
Example of projected Volume in 2020
Almost 40,000 exabytes (about 40 zettabytes) of data.
69
Velocity
Six Vs of Data: High speed of data accumulation from sources like machines, networks, social media, mobile phones.
70
Variety
Six Vs of Data: Refers to structured, semi-structured, and unstructured data from heterogeneous sources.
71
Structured data
Organized data with defined length and format.
72
Semi-structured data
Semi-organized data not conforming to formal structures; example: log files.
73
Unstructured data
Unorganized data not fitting into relational database tables; examples: text, pictures, videos.
74
Veracity
Six Vs of Data: Inconsistencies and uncertainty in data quality and accuracy.
75
Value
Six Vs of Data: Most important “V”; bulk data is useless unless turned into something valuable.
76
Variability
Six Vs of Data: Refers to changes in data structure or meaning over time.
77
KDD (Knowledge Discovery in Databases)
Process of extracting useful, unknown, and valuable information from large datasets.
78
KDD Step 1: Data Selection
Identifying and gathering relevant data from available sources, filtering out irrelevant data or noise.
79
KDD Step 2: Data Cleaning
Improving data quality by correcting errors, handling missing values, removing outliers, fixing duplicates.
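Study note: the cleaning actions named above (fixing duplicates, removing outliers, handling missing values) can be sketched on a tiny record set. The field names, the 0 to 120 plausibility range for age, and the median fill are assumptions chosen for illustration.

```python
from statistics import median


def clean(records):
    """Deduplicate, drop implausible ages, then fill missing ages with the median."""
    # 1. Fix duplicates: keep the first copy of each (id, age) pair.
    seen, unique = set(), []
    for record in records:
        key = (record["id"], record["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))

    # 2. Remove outliers: an age outside 0..120 is treated as a data entry error.
    plausible = [r for r in unique if r["age"] is None or 0 <= r["age"] <= 120]

    # 3. Handle missing values: replace None with the median of the known ages.
    fill_value = median(r["age"] for r in plausible if r["age"] is not None)
    for r in plausible:
        if r["age"] is None:
            r["age"] = fill_value
    return plausible


# Hypothetical raw records with a duplicate, a missing value, and an entry error.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 1, "age": 34},
    {"id": 3, "age": 29},
    {"id": 4, "age": 430},
    {"id": 5, "age": 41},
]

cleaned = clean(raw)
print(cleaned)
```

Outliers are dropped before the median is computed so that the entry error does not distort the value used to fill the gap.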
80
KDD Step 3: Data Integration
Combining data from multiple sources into a unified dataset, resolving mismatches and redundancies.
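Study note: a sketch of integrating two sources that describe the same customers but use different key names and contain repeated rows. The source names (CRM and billing), the field names, and the values are hypothetical.

```python
def integrate(crm_rows, billing_rows):
    """Merge two sources into one record per customer."""
    merged = {}
    for row in crm_rows:
        cid = row["customer_id"]
        merged[cid] = {"customer_id": cid, "name": row["name"], "total_spent": 0.0}
    for row in billing_rows:
        cid = row["cust_no"]  # mismatch resolved: same key under a different field name
        entry = merged.setdefault(cid, {"customer_id": cid, "name": None, "total_spent": 0.0})
        entry["total_spent"] += row["amount"]  # redundancy resolved: repeated rows are summed
    return [merged[cid] for cid in sorted(merged)]


# Hypothetical source 1: customer master data.
crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
# Hypothetical source 2: billing lines, keyed by "cust_no" instead of "customer_id".
billing = [
    {"cust_no": 1, "amount": 120.0},
    {"cust_no": 1, "amount": 30.0},
    {"cust_no": 3, "amount": 75.0},
]

unified = integrate(crm, billing)
print(unified)
```

Customer 3 appears only in billing, so the unified record keeps a missing name rather than inventing one; that gap would be handled in a cleaning pass.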
81
KDD Step 4: Data Selection and Transformation
Selecting relevant data and preparing it by converting into suitable formats.
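Study note: one common transformation is min-max scaling, which rescales a numeric column to the range 0 to 1 so that large-valued fields do not dominate a mining algorithm. The card does not name a specific transformation, so this choice and the income values are assumptions.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column of annual incomes.
incomes = [20000, 35000, 50000, 80000]
scaled = min_max_scale(incomes)
print(scaled)
```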
82
KDD Step 5: Data Mining
Applying algorithms like classification, clustering, association rule mining, regression, anomaly detection.
83
KDD Step 6: Pattern Evaluation
Assessing discovered patterns for relevance and usefulness.
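Study note: for association rules, relevance is often judged with support, confidence, and lift. The sketch below computes them for a rule "bread implies milk"; the card does not say which measures the module uses, so these three measures and the shopping baskets are assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one transaction.
    """
    n = len(transactions)
    has_antecedent = sum(1 for t in transactions if antecedent <= t)
    has_consequent = sum(1 for t in transactions if consequent <= t)
    has_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = has_both / n                      # how often the full pattern occurs
    confidence = has_both / has_antecedent      # how often the rule holds when it applies
    lift = confidence / (has_consequent / n)    # confidence relative to the baseline rate
    return support, confidence, lift


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(round(support, 2), round(confidence, 2), round(lift, 4))
```

A lift below 1 means bread buyers are slightly less likely than the average shopper to buy milk, so this pattern would probably be discarded as not useful even though its confidence looks high.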
84
KDD Step 7: Knowledge Representation
Presenting knowledge in charts, reports, dashboards for decision-making.
85
Improves decision-making
Advantages of KDD: Provides valuable insights to help organizations make better decisions.
86
Increased efficiency
Advantages of KDD: Automates repetitive tasks and prepares data for analysis.
87
Better customer service
Advantages of KDD: Helps understand customer needs and preferences.
88
Fraud detection
Advantages of KDD: Identifies patterns or anomalies indicating fraud.
89
Predictive modeling
Advantages of KDD: Builds models to forecast trends.
90
Privacy concerns
Disadvantages of KDD: May involve sensitive personal information.
91
Complexity
Disadvantages of KDD: Requires specialized skills and knowledge.
92
Unintended consequences
Disadvantages of KDD: May perpetuate bias or discrimination.
93
Data Quality
Disadvantages of KDD: Relies heavily on data quality; inaccurate data produces misleading results.
94
High cost
Disadvantages of KDD: Requires large investments in resources.
95
Overfitting
Disadvantages of KDD: Model learns noise in training data, reducing performance on new data.
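Study note: the sketch below contrasts a model that memorizes its training data (including two mislabeled points) with a simpler model that ignores them. Both models, the noisy labels, and the true rule "x is at least 50" are hypothetical and chosen only to make the effect visible.

```python
def nearest_label(train, x):
    """Memorizing model: copy the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_threshold(train):
    """Simple model: cut halfway between the mean of class 0 and the mean of class 1."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


# Training data follows "label is 1 when x >= 50", except for two noisy points (45 and 55).
train = [(10, 0), (20, 0), (30, 0), (40, 0), (45, 1),
         (55, 0), (60, 1), (70, 1), (80, 1), (90, 1)]
# New, noise-free data generated from the true rule.
test = [(x, int(x >= 50)) for x in (5, 15, 25, 35, 44, 47, 53, 56, 65, 75, 85, 95)]

threshold = fit_threshold(train)


def memorizer(x):
    return nearest_label(train, x)


def simple(x):
    return int(x >= threshold)


print("memorizer train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("simple    train/test:", accuracy(simple, train), accuracy(simple, test))
```

The memorizing model scores perfectly on the training data because it has also learned the two noisy labels, and those same labels cause its mistakes on new data. The simpler model does worse on the training data but better on new data, which is the trade-off the card describes.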