Modules 1A and 1B
Data Mining (DM)
Process of analyzing large datasets to discover patterns and insights. Uses AI and statistics. Examples include fraud detection and customer behavior analysis.
Business Analytics (BA)
Transforms data into actionable insights that improve business decisions. Combines statistical methods and DM techniques.
Business Intelligence (BI)
Converts data into meaningful information for executives and managers.
Supervised Learning
Has a DEPENDENT variable.
Supervised Learning
Used for prediction and classification.
Supervised Learning
Examples: Regression, Logistic Regression, Decision Trees.
Unsupervised Learning
NO dependent variable.
Unsupervised Learning
Used to find hidden patterns and groups.
Unsupervised Learning
Examples: Clustering and Association Analysis
CRISP-DM
Cross Industry Standard Process for Data Mining
How many phases are there in CRISP-DM?
Six
What are the phases of CRISP-DM?
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Business Understanding
Define business objectives, assess the situation, convert business goals into technical DM goals, and create an initial hypothesis. Neglecting this phase can result in solving the wrong problem.
Data Understanding
Collect data, identify relevant variables, explore datasets, and understand available data sources. Sources include internal systems, external data providers, and surveys/research.
Data Preparation
Tasks include cleaning data, handling missing values, handling outliers, standardizing formats, removing redundancies, and transforming variables.
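The preparation tasks on this card can be illustrated with a minimal sketch. The records, field names, the median fill, and the cap of 1000 are all assumptions made up for illustration; real projects choose these rules during Data Understanding.

```python
from statistics import median

# Hypothetical raw records; names and values are invented for illustration.
raw = [
    {"customer": " Alice ", "spend": "120.50"},
    {"customer": "bob", "spend": ""},            # missing value
    {"customer": "ALICE", "spend": "120.50"},    # redundant record
    {"customer": "Carla", "spend": "110"},
    {"customer": "Dan", "spend": "99999"},       # suspicious outlier
]

def prepare(records, cap=1000.0):
    # Standardize formats: trim whitespace and use one capitalization style.
    cleaned = [
        {"customer": r["customer"].strip().title(), "spend": r["spend"].strip()}
        for r in records
    ]
    # Remove redundancies: keep the first record per customer.
    seen, unique = set(), []
    for r in cleaned:
        if r["customer"] not in seen:
            seen.add(r["customer"])
            unique.append(r)
    # Handle missing values: fill blanks with the median of the known values.
    fill = median(float(r["spend"]) for r in unique if r["spend"])
    for r in unique:
        r["spend"] = float(r["spend"]) if r["spend"] else fill
    # Handle outliers: cap values at an assumed business limit.
    for r in unique:
        r["spend"] = min(r["spend"], cap)
    return unique

prepared = prepare(raw)
for row in prepared:
    print(row)
```

The median is used for the fill because, unlike the mean, it is not pulled upward by the outlier.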
ETL Process
Extract, Transform, Load
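As a rough sketch of the three ETL steps, the example below reads a small CSV string, cleans it, and loads it into an in-memory SQLite table. The data, column names, and the `orders` table are hypothetical stand-ins, not part of the course material.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an internal system.
SOURCE_CSV = """order_id,amount,region
1,19.99,north
2,5.00,South
3,42.50,NORTH
"""

def extract(text):
    # Extract: read rows from the source into dictionaries.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: convert types and standardize the region format.
    return [
        (int(r["order_id"]), float(r["amount"]), r["region"].strip().lower())
        for r in rows
    ]

def load(rows, conn):
    # Load: write the cleaned rows into the target table.
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)

# After loading, the data can be queried consistently by region.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(totals)
```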
Modeling
Choose models based on the problem type.
Continuous dependent variable
Regression, Forecasting
Categorical dependent variable
Logistic Regression, Classification Trees
No dependent variable
Clustering, Association Models
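The three cards above amount to a lookup from the type of dependent variable to a family of models. A small sketch of that lookup follows; the function name and the string labels are hypothetical.

```python
def suggest_models(dependent_variable):
    """Return the model families from the cards for a dependent variable type.

    `dependent_variable` is a hypothetical label: "continuous", "categorical",
    or None when there is no dependent variable.
    """
    choices = {
        "continuous": ["Regression", "Forecasting"],
        "categorical": ["Logistic Regression", "Classification Trees"],
        None: ["Clustering", "Association Models"],
    }
    if dependent_variable not in choices:
        raise ValueError(f"Unknown dependent variable type: {dependent_variable!r}")
    return choices[dependent_variable]

print(suggest_models("continuous"))
print(suggest_models(None))
```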
Descriptive analysis type
WHAT happened?
Predictive analysis type
What WILL happen?
Prescriptive analysis type
What SHOULD happen?
Training set
Build the model
Validation set
Tune and refine the model
Test set
Measure final performance
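One way to picture the three sets is a single shuffled dataset cut into three parts. The sketch below assumes a 60/20/20 split and a fixed seed; both are illustrative choices rather than a rule from the course.

```python
import random

def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    # Shuffle a copy so the original order does not bias any subset.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                  # training set: build the model
        shuffled[n_train:n_train + n_val],   # validation set: tune and refine
        shuffled[n_train + n_val:],          # test set: measure final performance
    )

train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

The test set is held back until the end so that the final measurement is not influenced by tuning decisions.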
Overfitting
Model performs extremely well on training data but poorly on new data.
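A toy illustration of overfitting, using invented data: the true rule is "label is 1 when x > 5", but two training labels are noisy. The `memorizer` model (a nearest-neighbor lookup, chosen here only for illustration) reproduces the training data perfectly, noise included, while a simpler threshold rule generalizes better.

```python
# Hypothetical 1-D data: two training labels (x=3 and x=8) are noisy.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
new_data = [(1.4, 0), (2.9, 0), (3.2, 0), (6.6, 1), (7.8, 1), (8.1, 1)]

def memorizer(x):
    # Overfit model: copy the label of the nearest training point, noise included.
    return min(train, key=lambda point: abs(point[0] - x))[1]

def threshold_rule(x):
    # Simpler model: a single cut-off that ignores the noisy labels.
    return 1 if x > 5 else 0

def accuracy(model, data):
    # Share of examples where the prediction matches the label.
    return sum(model(x) == y for x, y in data) / len(data)

memorizer_train = accuracy(memorizer, train)
memorizer_new = accuracy(memorizer, new_data)
rule_train = accuracy(threshold_rule, train)
rule_new = accuracy(threshold_rule, new_data)

print("memorizer:", memorizer_train, memorizer_new)
print("threshold:", rule_train, rule_new)
```

The memorizer scores higher on the training data but lower on the new data, which is the pattern this card describes.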
Evaluation
Determines whether the model answers the business question and provides value.
Deployment
Includes implementing the model, reporting findings, monitoring performance, and maintaining systems.
Success Metric
Predictive accuracy, balanced with Cost-Benefit Analysis
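To see why accuracy is balanced against cost and benefit, consider a sketch with two hypothetical fraud models. All counts and dollar values below are invented for illustration; `gain_per_tp`, `cost_per_fp`, and `cost_per_fn` are assumed parameters.

```python
def accuracy(tp, tn, fp, fn):
    # Share of all cases that were classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def net_benefit(tp, fp, fn, gain_per_tp, cost_per_fp, cost_per_fn):
    # Value of caught cases minus the cost of false alarms and missed cases.
    return tp * gain_per_tp - fp * cost_per_fp - fn * cost_per_fn

# Hypothetical results on 1,000 transactions, 50 of which are fraudulent.
model_a = {"tp": 40, "tn": 900, "fp": 50, "fn": 10}   # flags aggressively
model_b = {"tp": 20, "tn": 945, "fp": 5, "fn": 30}    # flags cautiously

costs = {"gain_per_tp": 500, "cost_per_fp": 10, "cost_per_fn": 500}

for name, m in (("A", model_a), ("B", model_b)):
    acc = accuracy(**m)
    value = net_benefit(m["tp"], m["fp"], m["fn"], **costs)
    print(name, acc, value)
```

Under these assumed costs, model B has the higher accuracy but model A delivers the higher net benefit, because missed fraud is far more expensive than a false alarm.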
BA turns data into ___
Business Insights
DM finds ___ in large data sets.
Patterns
Supervised Learning has a ___ variable.
Dependent
Unsupervised Learning has ___ dependent variable.
No
Regression predicts ___ outcomes.
Continuous
Logistic Regression predicts ___ outcomes.
Categorical
Clustering is ___
Unsupervised
Association Analysis finds ___.
Product Relationships
CRISP-DM has ___ phases.
Six
Business Understanding creates the ___.
Initial Hypothesis
ETL
Extract, Transform, Load
Training ___ the model
Builds
Validation ___ the model.
Fine-tunes
Test data measures ___.
Final Accuracy
Overfitting ___ performance on new data.
Hurts
Which learning method has a dependent variable?
Supervised Learning
What does ETL stand for?
Extract, Transform, Load
Which technique is used when no dependent variable exists?
Unsupervised Learning
What is the most common DM success metric?
Predictive Accuracy