Phase 3: Model Planning Theory and Techniques

Description and Tags

These flashcards cover the key concepts, activities, and techniques involved in Phase 3 (Model Planning) of the analytics lifecycle based on the provided lecture notes.

Last updated 7:50 PM on 5/16/26

33 Terms

1

What is the Model Planning Phase in the analytics lifecycle?

Phase 3 of the analytics lifecycle. It sits between data preparation and model building, and it is where decisions are made about what to build and how success will be evaluated, before any modeling code is written.

2

What is the primary benefit of pre-specifying evaluation criteria and techniques during Phase 3?

It prevents a common modeling failure: choosing metrics after seeing the results. Fixing the criteria in advance reduces bias and ensures that the outcomes do not influence how they are judged.

3

According to the transcript, what are the three specific purposes of the Model Planning Phase?

  1. Select the right analytical technique for the problem type.
  2. Define how the model will be evaluated before results are seen.
  3. Establish the structure of the dataset used for modeling.

4

In Activity 1, how do the Exploratory Data Analysis (EDA) findings directly shape technique selection?

EDA findings such as a heavily imbalanced target variable, feature distributions, correlations, and missing-data patterns determine which algorithms are appropriate and which evaluation metrics matter.
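Below is a minimal sketch of how one EDA finding, class imbalance, could feed into metric selection. The function names and the 0.2 cut-off for "imbalanced" are hypothetical choices made for illustration. They are not rules from the lecture notes.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the target column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def suggest_metrics(labels, imbalance_threshold=0.2):
    """Hypothetical rule of thumb: if the rarest class falls below the
    threshold, accuracy is misleading, so favor AUC, precision and recall."""
    shares = class_balance(labels)
    if min(shares.values()) < imbalance_threshold:
        return ["AUC", "precision", "recall"]
    return ["accuracy", "AUC"]


# Example target: 95 customers who stayed, 5 who churned
target = ["no"] * 95 + ["yes"] * 5
print(class_balance(target))    # {'no': 0.95, 'yes': 0.05}
print(suggest_metrics(target))  # ['AUC', 'precision', 'recall']
```

On this target, a model that predicts "no" for everyone would score 95% accuracy. That is why an imbalanced target pushes the plan toward AUC, precision, and recall.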

5

Match the following problem types to their target characteristics: Classification, Regression, Clustering, and Association.

  • Target is a category → Classification

  • Target is a continuous number → Regression

  • No target, finding natural groups → Clustering

  • Finding co-occurrence patterns → Association
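The matching above can be written as a small decision function. This is a sketch with a hypothetical signature. In practice the choice also depends on the business question, and a numeric column that only encodes categories is still a classification target.

```python
def problem_type(has_target, target_is_numeric=False, goal="groups"):
    """Map target characteristics to a problem type.

    goal is consulted only when there is no target:
    "groups" -> Clustering, "co-occurrence" -> Association.
    """
    if has_target:
        return "Regression" if target_is_numeric else "Classification"
    return "Association" if goal == "co-occurrence" else "Clustering"


print(problem_type(True))                          # Classification (e.g., will the customer churn?)
print(problem_type(True, target_is_numeric=True))  # Regression (e.g., next month's spend)
print(problem_type(False))                         # Clustering (e.g., customer segments)
print(problem_type(False, goal="co-occurrence"))   # Association (e.g., items bought together)
```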

6

How do you apply constraint filters in Activity 3 of the Model Planning Phase?

  • Interpretability required? Eliminates black-box methods. Regulated industries (banking, insurance) often require explainable decisions.

  • How much data? Very small datasets need simple models. Complex models overfit on small data.

  • Real-time scoring needed? Large ensemble models may be too slow for millisecond decisions.

  • Feature types? A mix of numeric and categorical features favors tree-based methods.
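The four filters can be sketched as successive eliminations over a list of candidate techniques. The attribute table, including the `min_rows` cut-offs, is a hypothetical set of illustrative judgments. It is not a benchmark result or a rule from the notes.

```python
# Hypothetical attribute table: the values are illustrative judgments,
# not benchmark results or rules from the lecture notes.
CANDIDATES = {
    "Logistic Regression": {"interpretable": True,  "large_ensemble": False, "tree_based": False, "min_rows": 100},
    "Decision Tree":       {"interpretable": True,  "large_ensemble": False, "tree_based": True,  "min_rows": 100},
    "Random Forest":       {"interpretable": False, "large_ensemble": True,  "tree_based": True,  "min_rows": 1_000},
    "Gradient Boosting":   {"interpretable": False, "large_ensemble": True,  "tree_based": True,  "min_rows": 1_000},
}


def apply_constraint_filters(n_rows, need_interpretability, need_real_time, mixed_feature_types):
    """Return the techniques that survive each constraint filter."""
    survivors = []
    for name, traits in CANDIDATES.items():
        if need_interpretability and not traits["interpretable"]:
            continue  # interpretability required: drop black-box methods
        if n_rows < traits["min_rows"]:
            continue  # too little data: complex models would overfit
        if need_real_time and traits["large_ensemble"]:
            continue  # millisecond scoring: large ensembles may be too slow
        survivors.append(name)
    if mixed_feature_types:
        # Stable sort that moves tree-based methods to the front.
        survivors.sort(key=lambda name: not CANDIDATES[name]["tree_based"])
    return survivors


print(apply_constraint_filters(n_rows=50_000, need_interpretability=True,
                               need_real_time=False, mixed_feature_types=True))
# ['Decision Tree', 'Logistic Regression']
```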

7

In Activity 3 of the Model Planning Phase, what constraint often limits the use of 'black-box' methods in regulated industries like banking and insurance?

The requirement for interpretability. Regulated industries (banking, insurance) often require explainable decisions, so models and their results must be understandable to stakeholders; decisions based on these models can significantly affect individuals and organizations.
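One common way an interpretable model supports an explanation is through its coefficients. Below is a minimal sketch on synthetic, hypothetical credit data, with feature names invented for illustration. In logistic regression, each coefficient gives the direction and size of a feature's effect on the log-odds of the outcome.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical credit data: income (thousands), debt ratio, late payments.
feature_names = ["income_k", "debt_ratio", "late_payments"]
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(60, 15, 500),
    rng.uniform(0, 1, 500),
    rng.poisson(1.0, 500),
])
# Synthetic rule for illustration only: risk rises with debt and late payments.
logit = -3 + 3 * X[:, 1] + 0.8 * X[:, 2] - 0.02 * (X[:, 0] - 60)
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-logit))).astype(int)

X_std = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_std, y)

# Each coefficient: change in log-odds of default per one standard deviation
# of the feature, which can be explained to a reviewer or regulator.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name:>14}: {coef:+.2f}")
```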

8

What strategy is suggested when selecting candidate models in Activity 4 of the Model Planning Phase?

Select 2–3 candidate models, and always start with a simple baseline (e.g., logistic regression or linear regression) before building more complex models.
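The baseline-first strategy can be sketched with scikit-learn. Logistic regression serves as the baseline, and two more complex candidates are judged against it. The synthetic data, the choice of candidates, and AUC as the metric are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

candidates = {
    "baseline: logistic regression": LogisticRegression(max_iter=1_000),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated AUC for each candidate.
scores = {name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          for name, model in candidates.items()}

baseline = scores["baseline: logistic regression"]
for name, s in scores.items():
    print(f"{name}: mean AUC = {s:.3f} ({s - baseline:+.3f} vs. baseline)")
```

A complex model earns its place only if its gain over the baseline justifies the extra cost in interpretability and maintenance.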

9

What types of variables are typically excluded during Activity 5 (variable selection) of the Model Planning Phase?

Variables that are redundant (highly correlated with each other), represent leakage (unavailable at prediction time), or add noise (no plausible relationship to the target).
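Only the redundancy check lends itself to automation. Leakage and noise usually need domain knowledge, so in the sketch below the leakage columns are passed in by hand. The 0.9 correlation cut-off and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def select_variables(df, target, leakage_cols, corr_threshold=0.9):
    """Drop known leakage columns, then one column from each highly correlated pair."""
    features = df.drop(columns=[target, *leakage_cols])
    corr = features.corr().abs()
    # Upper triangle only (k=1 skips the diagonal), so each pair is checked once
    # and the first column of a redundant pair is the one kept.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return features.drop(columns=redundant).columns.tolist()


rng = np.random.default_rng(0)
income = rng.normal(50_000, 10_000, 200)
df = pd.DataFrame({
    "income": income,
    "income_thousands": income / 1_000 + rng.normal(0, 0.1, 200),  # near-duplicate of income
    "tenure_months": rng.integers(1, 120, 200),
    "cancellation_reason_code": rng.integers(0, 5, 200),  # hypothetical: only known after churn
    "churned": rng.integers(0, 2, 200),
})
print(select_variables(df, "churned", leakage_cols=["cancellation_reason_code"]))
# ['income', 'tenure_months']
```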

10

What happens when you pre-specify evaluation metrics in Activity 6 of the Model Planning Phase?

Decide how models will be compared before running any of them.

For classification: AUC, accuracy, precision, recall.
For regression: RMSE, MAE, R².

Setting the metric in advance prevents unconsciously picking the metric that makes your preferred model win.
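The sketch below records the evaluation plan before any model runs and then scores every model with that metric alone. The plan (recall), the two models' predictions, and the function names are all hypothetical.

```python
# Hypothetical evaluation plan, written down before any model is run.
EVALUATION_PLAN = {"metric": "recall", "higher_is_better": True}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Share of actual positives that the model caught.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


METRICS = {"accuracy": accuracy, "recall": recall}


def pick_winner(y_true, predictions, plan=EVALUATION_PLAN):
    """Score every model with the pre-specified metric only."""
    score = METRICS[plan["metric"]]
    results = {name: score(y_true, preds) for name, preds in predictions.items()}
    choose = max if plan["higher_is_better"] else min
    return choose(results, key=results.get), results


y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # never flags a positive
    "model_b": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # catches both positives, 3 false alarms
}
winner, results = pick_winner(y_true, predictions)
print(winner, results)  # model_b {'model_a': 0.0, 'model_b': 1.0}
```

Judged by accuracy instead, model_a would win (0.8 vs. 0.7). That is exactly the kind of after-the-fact metric switch a fixed plan guards against.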

11

What is 'Step 4' of the Model Planning Phase according to Linoff & Berry (Activity 7 of the Model Planning Phase)?

Construct the dataset that will be used for modeling:

  • Define the eligible population — which records are valid training examples?

  • Define the observation date — features measured before this date, outcomes after

  • Plan the train/test split structure

  • Plan handling of class imbalance (oversampling, class weights)
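Here is a minimal sketch of three of these steps on hypothetical customer records. It defines the eligible population, uses features dated before the observation date and an outcome after it, and plans class weights. The cut-off date, records, and function names are assumptions. The train/test split is left out to keep the example short.

```python
from collections import Counter
from datetime import date

OBSERVATION_DATE = date(2024, 1, 1)  # hypothetical cut-off chosen during planning

# Hypothetical raw records: (customer_id, signup_date, cancel_date or None, purchase_dates)
RAW = [
    ("c1", date(2022, 5, 1), None, [date(2023, 3, 1), date(2023, 9, 1)]),
    ("c2", date(2023, 2, 1), date(2024, 3, 15), [date(2023, 4, 1), date(2024, 2, 1)]),
    ("c3", date(2024, 2, 1), None, [date(2024, 2, 10)]),  # joined after the cut-off
    ("c4", date(2021, 1, 1), date(2023, 6, 1), []),        # already cancelled
]


def build_modeling_rows(records, obs_date):
    rows = []
    for cid, signup, cancel, purchases in records:
        # Eligible population: signed up before the observation date and still active on it.
        if signup >= obs_date or (cancel is not None and cancel < obs_date):
            continue
        rows.append({
            "customer_id": cid,
            # Features: only information dated before the observation date.
            "purchases_before": sum(1 for d in purchases if d < obs_date),
            "tenure_days": (obs_date - signup).days,
            # Target: an outcome observed on or after the observation date.
            "churned": int(cancel is not None and cancel >= obs_date),
        })
    return rows


def balanced_class_weights(labels):
    # Same formula scikit-learn uses for class_weight="balanced": n_samples / (n_classes * count)
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


rows = build_modeling_rows(RAW, OBSERVATION_DATE)
print([r["customer_id"] for r in rows])  # ['c1', 'c2']
print(balanced_class_weights([0] * 9 + [1]))
```

With 9 negatives and 1 positive, the rare class receives 9 times the weight of the common one.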

12

What is dataset structure, and what does it mean in the Model Planning Phase?

Dataset structure refers to the arrangement and organization of data used in the analysis process. In model planning this means:

  • What is the grain? (one row = one customer, one transaction, one store?)

  • What is the observation date for each record?

  • Which columns are features (inputs) and which is the target (output)?

  • How is the data split for training vs. evaluation?
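The four questions can be captured as a small, checkable plan. The `DatasetStructure` class, its fields, and the sample rows below are hypothetical illustrations, not a standard API.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetStructure:
    grain: str                   # column identifying one row, e.g. one customer
    observation_date: str        # column holding each record's observation date
    target: str                  # output column
    features: list = field(default_factory=list)  # input columns
    split: str = "random 80/20 train/test"        # planned evaluation split

    def validate(self, rows):
        """Raise ValueError if the rows do not match the planned structure."""
        if self.target in self.features:
            raise ValueError("the target column must not also be listed as a feature")
        required = {self.grain, self.observation_date, self.target, *self.features}
        for row in rows:
            missing = required - set(row)
            if missing:
                raise ValueError(f"row is missing columns: {sorted(missing)}")
        keys = [row[self.grain] for row in rows]
        if len(keys) != len(set(keys)):
            raise ValueError(f"more than one row per {self.grain!r}: the grain is violated")


plan = DatasetStructure(grain="customer_id", observation_date="obs_date",
                        target="churned", features=["tenure_days", "purchases_before"])
rows = [
    {"customer_id": "c1", "obs_date": "2024-01-01", "tenure_days": 610, "purchases_before": 2, "churned": 0},
    {"customer_id": "c2", "obs_date": "2024-01-01", "tenure_days": 334, "purchases_before": 1, "churned": 1},
]
plan.validate(rows)  # passes silently; a duplicated customer_id would raise ValueError
```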

13

In the context of dataset structure, what does 'grain' refer to?

The unit that one row of the dataset represents, such as one customer, one transaction, or one store.
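Changing the grain usually means aggregating. The sketch below rolls transaction-grain rows (one row per purchase) up to customer grain (one row per customer). The data and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction-grain data: one row = one purchase.
transactions = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 35.0},
    {"customer_id": "c2", "amount": 10.0},
]


def to_customer_grain(txns):
    """Roll transaction rows up to one row per customer."""
    per_customer = defaultdict(lambda: {"n_transactions": 0, "total_spend": 0.0})
    for t in txns:
        row = per_customer[t["customer_id"]]
        row["n_transactions"] += 1
        row["total_spend"] += t["amount"]
    return dict(per_customer)


print(to_customer_grain(transactions))
# {'c1': {'n_transactions': 2, 'total_spend': 55.0}, 'c2': {'n_transactions': 1, 'total_spend': 10.0}}
```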

14

Distinguish between structured and unstructured data based on the transcript.

Structured data is organized in a specific format or schema (like rows and columns).

Unstructured data (text, images, audio, video) lacks a specific format and requires specialized processing like NLP or computer vision to extract meaningful information from it.

15

What are analytical techniques?

Analytical techniques are the methods and tools used to analyze and process data to achieve business objectives.

16

In model planning, which analytical techniques are used for classification?

For Classification (predicting categories):

  • Logistic Regression — interpretable, good baseline

  • Decision Trees — visual, interpretable, handles mixed feature types

  • Random Forest — ensemble of trees, high accuracy, feature importance

  • Gradient Boosting (XGBoost) — often highest accuracy on tabular data
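The four classification techniques share one scikit-learn interface, so they can be tried side by side. This is a sketch on synthetic data. XGBoost is a separate library, so scikit-learn's own `GradientBoostingClassifier` stands in for it here. The `max_depth=4` tree setting is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    # scikit-learn's own gradient boosting, standing in for XGBoost here.
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                     # same call for every technique
    accuracies[name] = clf.score(X_test, y_test)  # test-set accuracy
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```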

17

In model planning, which analytical techniques are used for regression?

For Regression (predicting continuous values):

  • Linear Regression — interpretable, fast, assumes linearity

  • Ridge/Lasso — linear regression with regularization for many features

  • Random Forest Regression — handles nonlinearity

  • Gradient Boosting Regression — high accuracy
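The same pattern works for regression, using RMSE (one of the metrics listed in Activity 6) as the comparison. This is a sketch on synthetic data. The `alpha` values for Ridge and Lasso are arbitrary defaults, not tuned settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data: 30 features, only 5 of which actually drive the target.
X, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

regressors = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),  # L2 penalty shrinks coefficients
    "Lasso": Lasso(alpha=1.0),  # L1 penalty can set coefficients exactly to zero
    "Random Forest Regression": RandomForestRegressor(random_state=0),
    "Gradient Boosting Regression": GradientBoostingRegressor(random_state=0),
}

rmse = {}
for name, reg in regressors.items():
    # scikit-learn scorers follow "higher is better", so RMSE comes back negated.
    scores = cross_val_score(reg, X, y, cv=5, scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()
    print(f"{name}: mean RMSE = {rmse[name]:.2f}")
```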

18

Which classification technique is described as an ensemble of trees that provides high accuracy and feature importance?

Random Forest.
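In scikit-learn, the feature importance of a fitted random forest is exposed as `feature_importances_`. These are impurity-based scores that sum to 1. Below is a minimal sketch on synthetic data, where the feature indices stand in for real column names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: how much each feature reduced impurity across all trees.
importances = forest.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for idx, imp in ranked:
    print(f"feature_{idx}: {imp:.3f}")
```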

19

In model planning, which analytical techniques are used for clustering?

For Clustering (finding natural groups):

  • K-Means — partitions records into k groups by minimizing within-cluster distance

  • Hierarchical Clustering — builds a tree of nested clusters

20

In model planning, which analytical techniques are used for association?

For Association (finding co-occurrences):

  • Apriori Algorithm — finds frequent itemsets and association rules
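Below is a minimal from-scratch sketch of the textbook Apriori idea, followed by rule generation by confidence:

- Join frequent (k-1)-itemsets into k-item candidates.
- Prune any candidate that has an infrequent subset.
- Keep the candidates that meet a minimum support.

The basket data, thresholds, and function names are hypothetical. Production work would typically rely on a tested library implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs above min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[lhs]  # every subset of a frequent set is frequent
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, confidence))
    return rules


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, supp, conf in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)} (support={supp:.2f}, confidence={conf:.2f})")
# ['beer'] -> ['diapers'] (support=0.60, confidence=1.00)
```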

21

What is the specific purpose of the Apriori Algorithm?

It is an Analytical Technique for Association used to find frequent itemsets and association rules.

22

Which R package is described as providing a unified interface where the same code structure works for hundreds of different algorithms?

R with caret.

23

Which Python library is described as the industry-standard ML library, with a consistent .fit() / .predict() interface across all algorithms and pipelines that chain preprocessing and modeling steps?

Python scikit-learn
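Here is a minimal sketch of both features the card mentions: a `Pipeline` that chains scaling and a model, and the shared `.fit()` / `.predict()` interface. The data is synthetic, and the swap to a random forest is an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Preprocessing and modeling chained into one estimator: the scaler is fit on
# training data only, then reused unchanged on the test data.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1_000)),
])
pipe.fit(X_train, y_train)
lr_preds = pipe.predict(X_test)
lr_acc = pipe.score(X_test, y_test)

# Same .fit() / .predict() calls, different algorithm: swap the named final step.
pipe.set_params(model=RandomForestClassifier(random_state=0))
pipe.fit(X_train, y_train)
rf_acc = pipe.score(X_test, y_test)
print(f"logistic regression: {lr_acc:.3f}, random forest: {rf_acc:.3f}")
```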

24

Define the analytical technique 'K-Means'.

A clustering method that partitions records into k groups by minimizing within-cluster distance.
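As a formula, the K-Means objective is shown below. The notation is introduced here rather than taken from the lecture notes: C_j is the j-th cluster, mu_j is its centroid, and x_i is a record.

```latex
\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

The usual search procedure (Lloyd's algorithm) alternates two steps. It assigns each record to its nearest centroid, then recomputes each centroid as the mean of its records. Neither step increases the objective, so the procedure converges, though only to a local minimum that depends on the starting centroids.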

25

What modern alternative to caret in R includes recipes (feature engineering), parsnip (model specification), rsample (data splitting), and yardstick (evaluation metrics)?

R with tidymodels.

26

Why are SAS and SPSS highlighted as tools for model planning?

They are enterprise tools common in regulated industries (finance, pharma) due to their strong audit trail and validation capabilities.

27

How does dataset size influence model selection in Phase 3?

Very small datasets require simple models because complex models tend to overfit on small data.
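The overfitting risk can be seen on a deliberately tiny synthetic dataset. An unpruned decision tree memorizes its training rows, so its training accuracy is 1.0. Its held-out accuracy is usually noticeably lower, and the exact numbers depend on the random seed. The model choices and the `C=0.1` regularization strength are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A deliberately tiny dataset: 40 rows, 20 features.
X, y = make_classification(n_samples=40, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1, stratify=y)

models = {
    "unpruned decision tree (complex)": DecisionTreeClassifier(random_state=1),
    "regularized logistic regression (simple)": LogisticRegression(C=0.1, max_iter=1_000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```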

28

What is the term for the potential models (for clustering, classifying, or finding relationships in data) that are selected during planning, before any model is built?

Candidate models

29

Data Structure

The arrangement and organization of data used in the analysis process.

30

Analytical techniques

Methods and tools used to analyze and process data to achieve business objectives.

31

Variable selection

The process of identifying essential predictors and variables to include in the model.

32

Structured data

Data organized in a specific format or schema, making it easier to analyze.

33

Unstructured data

Data that lacks a specific format or structure, often requiring additional processing before analysis.