CAPSTONE class.
Classification is…
the process of assigning data points to predefined categories or classes
Supervised learning task
the model learns from labeled data where each data point already belongs to a specific class
Goal of Classification
build a model that can then accurately predict the class of new, unseen data points based on their features and characteristics
What are categories?
clearly defined, distinct groups that data points can be assigned to
Examples:
classifying emails as spam or not spam, predicting whether a medical image shows a tumor, or grouping customers into buying segments
What are features?
characteristics or attributes of the data points that the model uses to make predictions
Examples:
features for classifying emails might include keywords, sender info, text length
What is model training?
the model is trained on a set of labeled data points where the true class of each point is known. The model learns from these examples to identify patterns and relationships between features and class labels
What are the predictions?
once trained, the model is used to predict the class of new, unseen data points based on their features. The specific algorithm used for classification varies depending on the type of data and the complexity of the problem
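The train-then-predict workflow can be sketched in a few lines; this is a minimal illustration using scikit-learn (assumed installed), and the spam-detection features and data are invented:

```python
from sklearn.linear_model import LogisticRegression

# Invented labeled training data: features = [spam_keyword_count, exclamation_count]
X_train = [[5, 3], [7, 4], [6, 5], [0, 0], [1, 0], [0, 1]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

model = LogisticRegression()
model.fit(X_train, y_train)  # learn patterns linking features to class labels

# Predict the class of new, unseen data points from their features
X_new = [[8, 4], [0, 0]]
print(model.predict(X_new))
```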
Logistic Regression
popular algorithm for binary classification problems (two classes)
Examples: customer churn (yes/no), email spam (spam/not spam)
Support Vector Machines (SVMs)
Effective for both binary and multi-class problems; good for handling high-dimensional data
Examples: classifying handwritten digits, text classification
Decision Trees
easy to interpret, good for visualizing the decision-making process, can handle both binary and multi-class problems
Examples: loan approval (income, marital status, debt), medical triage, customer purchase behavior
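A decision tree on a toy loan-approval problem can be sketched with scikit-learn (assumed installed); the applicants and features below are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Invented loan-approval data: features = [income_in_thousands, has_existing_debt]
X = [[80, 0], [30, 1], [60, 0], [25, 1], [90, 1], [20, 0]]
y = [1, 0, 1, 0, 1, 0]  # 1 = approved

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The fitted tree encodes an interpretable sequence of yes/no questions
print(tree.predict([[70, 0], [22, 1]]))
```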
Random Forests
ensemble method combining multiple decision trees, often leads to improved accuracy and robustness
Examples: employee attrition, credit risk classification
Neural Networks
powerful and flexible for complex problems, especially with large amounts of data
Examples: image classification, speech recognition, deep text classification
Main difference between regression and classification
classification predicts discrete categories, with each data point belonging to a specific predefined group, e.g., spam/not spam, healthy/unhealthy
regression predicts continuous numerical values that can fall anywhere within a given range, like predicting housing prices, stock market trends, or temperature changes
Classification model types
logistic regression, decision trees, SVMs, random forests
Regression model types
linear regression, polynomial regression, ridge regression, lasso regression, neural networks
Classification evaluation metrics
accuracy, precision, recall, F1 score
Regression evaluation metrics
mean squared error (MSE), root mean squared error (RMSE), R-squared coefficient
Model selection for classification (part 1)
understanding the data and problem:
types of features: numerical, categorical, text, images, or mixed?
data quality: missing values, outliers, imbalance, noise
problem complexity: linearly separable or complex relationships
desired interpretability: need to understand model decisions?
Model selection for classification (part 2)
choose candidate models:
start with simpler models: logistic regression, decision trees, Naive Bayes
Consider powerful options: random forests, SVMs, neural networks
match model capabilities to data: tree-based models for mixed data, linear models for numerical data, SVMs for high-dimensional data
Model selection for classification (Part 3)
split data into training and testing sets:
use robust splitting method like stratified sampling to ensure representative class proportions in both sets
Common split proportions: 80% for training, 20% for testing
Model selection for classification (part 4)
train and evaluate models
train each model on the training set, then evaluate each on the held-out test set
What is accuracy?
the proportion of all predictions that are correct
What is precision?
proportion of true positives among predicted positives
What is recall?
proportion of actual positives correctly identified
What is F1-score?
balances precision and recall
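The four metric definitions above can be checked with a short pure-Python calculation; the confusion-matrix counts are invented for illustration:

```python
# Invented confusion-matrix counts
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)          # overall correct predictions
precision = tp / (tp + fp)                          # true positives among predicted positives
recall = tp / (tp + fn)                             # true positives among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean: balances both
print(accuracy, precision, round(recall, 3), round(f1, 3))
```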
What’s an ROC curve?
plots true positive rate vs false positive rate
What’s an AUC?
the area under the ROC curve; a higher AUC indicates better performance
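AUC can be computed directly from predicted probabilities; a minimal sketch with scikit-learn (assumed installed) and invented scores:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # invented predicted probabilities of class 1

# AUC = chance that a random positive is scored above a random negative;
# here 3 of the 4 positive/negative pairs are ranked correctly
auc = roc_auc_score(y_true, y_score)
print(auc)
```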
Model selection for classification (part 5)
optimize and tune hyperparameters:
adjust model parameters to improve performance (tree depth, regularization strength, kernel type)
Use techniques like grid search or randomized search to explore hyperparameter combinations effectively
Model selection for classification (part 6)
consider ensemble methods:
combine multiple models for potentially better performance:
Random forests: ensemble of decision trees
Gradient boosting: improves models sequentially
Model selection for classification (part 7)
Address Data Issues:
handle missing values: imputation, deletion, or model-specific strategies
balance imbalanced classes: resampling, cost-sensitive learning, or specialized algorithms
feature selection: remove irrelevant or redundant features for efficiency and better generalization
Model selection for classification (part 8)
validate and select the final model:
use cross-validation for more robust evaluation: train and test multiple times on different data folds
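The fold-by-fold training and testing above can be sketched with scikit-learn's `cross_val_score` (library assumed installed); the toy data is invented:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10

# Train and test 5 times, each time holding out a different fold
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(len(scores), scores.mean())
```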
assess performance on a separate validation set if available
choose the model with the best performance on unseen data, considering interpretability, computational cost, and deployment requirements.
Why is linear regression usually not appropriate for classification?
Because it can produce predictions below 0 or above 1, and for multiclass problems it imposes an artificial numeric ordering on categories.
What does logistic regression model?
The probability that an observation belongs to a particular class.
Why is logistic regression better than linear regression for binary classification?
Because logistic regression keeps predicted probabilities between 0 and 1
What shape does the logistic function have?
An S-shaped curve
What are the odds in logistic regression?
Odds = p / (1-p), where p is the probability of the event.
What is the logit?
The log-odds, or log(p/(1-p)).
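The odds and logit definitions above can be verified numerically in pure Python; the probability value is arbitrary:

```python
import math

p = 0.8                 # probability of the event (arbitrary example value)
odds = p / (1 - p)      # p/(1-p): about 4, the event is ~4x as likely as not
logit = math.log(odds)  # log-odds, the quantity logistic regression models linearly

# Inverting the logit with the logistic (sigmoid) function recovers p
p_back = 1 / (1 + math.exp(-logit))
print(odds, p_back)
```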
In logistic regression, how is a coefficient interpreted?
A one-unit increase in a predictor changes the log-odds by the coefficient.
How are logistic regression coefficients estimated?
By maximum likelihood, not least squares.
What does a positive logistic regression coefficient mean?
Increasing that predictor increases the probability of the event/class of interest.
What is multiple logistic regression?
Logistic regression using more than one predictor.
Why can a variable’s sign change between simple and multiple logistic regression?
Because of confounding or correlation among predictors.
What is confounding?
When the relationship between a predictor and the response is distorted because another related predictor is left out.
What is the main idea of LDA?
Instead of modeling P(Y|X) directly like logistic regression, LDA models the distribution of predictors within each class and then uses Bayes’ theorem for classification.
When can LDA be better than logistic regression?
When classes are approximately Gaussian, sample size is small, or classes are well separated.
What key assumption does LDA make?
Each class has a normal distribution and all classes share a common covariance matrix.
Why is LDA called “linear”?
Because it produces linear decision boundaries
What is QDA?
Quadratic Discriminant Analysis, a discriminant method like LDA but with more flexibility.
What key assumption differs between LDA and QDA?
LDA assumes a common covariance matrix, while QDA allows each class to have its own covariance matrix.
Why is QDA more flexible than LDA?
Because it can produce quadratic/nonlinear decision boundaries.
What is the tradeoff between LDA and QDA?
LDA has lower variance but can have more bias; QDA has lower bias but higher variance.
When is LDA usually preferred over QDA?
When the training set is small or the common covariance assumption is reasonable.
When is QDA usually preferred over LDA?
When the training set is large and the true boundary is nonlinear or class covariances differ.
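The LDA/QDA contrast can be sketched with scikit-learn's discriminant analysis classes (library assumed installed); the toy data is invented, with class 1 given a much larger spread so QDA's per-class covariance is meaningful:

```python
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Class 0 is tightly clustered; class 1 is widely spread
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [9, 9], [5, 9], [9, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance -> linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariance -> curved boundary

print(lda.predict([[0.5, 0.5], [7, 7]]))
print(qda.predict([[0.5, 0.5], [7, 7]]))
```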
What is KNN?
K-Nearest Neighbors, a nonparametric classifier that assigns a class based on the majority class among the K closest training observations.
Why is KNN called nonparametric?
Because it does not assume a specific functional form for the decision boundary
When does KNN tend to perform well?
When the true decision boundary is highly nonlinear.
What is one downside of KNN compared with logistic regression or LDA?
It does not give easy coefficient-based interpretation of which predictors matter.
What happens when K in KNN is very small?
The model is more flexible, with low bias and high variance.
What happens when K in KNN is large?
The model is smoother, with higher bias and lower variance.
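The effect of K can be seen in a from-scratch sketch of the majority vote on invented 1-D data; a small K chases a noisy neighbor, a larger K smooths it out:

```python
from collections import Counter

def knn_predict(x, train, k):
    """Majority vote among the k training points closest to x."""
    neighbors = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Invented 1-D data: class 0 on the left, class 1 on the right,
# plus one "noisy" class-1 point at 3
train = [(1, 0), (2, 0), (3, 1), (4, 0), (8, 1), (9, 1), (10, 1)]

print(knn_predict(3.1, train, k=1))  # 1: follows the noisy neighbor (high variance)
print(knn_predict(3.1, train, k=3))  # 0: larger K averages the noise away (higher bias)
```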
What is a confusion matrix?
A table comparing predicted classes to true classes.
What does the diagonal of a confusion matrix represent?
Correct classifications.
What is sensitivity?
The proportion of actual positives correctly identified.
What is specificity?
The proportion of actual negatives correctly identified.
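Sensitivity and specificity follow directly from the confusion matrix; a pure-Python check with invented counts:

```python
# Invented 2x2 confusion matrix:
#                 predicted 0   predicted 1
# actual 0 (neg)       45            10
# actual 1 (pos)        5            40
tn, fp, fn, tp = 45, 10, 5, 40

sensitivity = tp / (tp + fn)  # actual positives correctly identified (recall)
specificity = tn / (tn + fp)  # actual negatives correctly identified
print(round(sensitivity, 3), round(specificity, 3))
```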
Why can overall error rate be misleading?
Because a classifier can have low overall error but still perform poorly on the class you care most about, especially with class imbalance.
What happens when you lower the classification threshold?
You usually catch more positives, increasing sensitivity, but also create more false positives.
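The threshold tradeoff can be demonstrated in pure Python; the predicted probabilities and labels below are invented:

```python
# Invented predicted probabilities and true labels
probs = [0.1, 0.42, 0.45, 0.6, 0.8, 0.9]
labels = [0, 1, 0, 1, 1, 1]

def sensitivity_and_false_positives(threshold):
    """Return (sensitivity, false positive count) at a given threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(yhat == 1 and y == 1 for yhat, y in zip(preds, labels))
    fp = sum(yhat == 1 and y == 0 for yhat, y in zip(preds, labels))
    return tp / sum(labels), fp

print(sensitivity_and_false_positives(0.5))  # (0.75, 0): misses one positive
print(sensitivity_and_false_positives(0.4))  # (1.0, 1): catches it, at the cost of a false positive
```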
What kind of decision boundary do logistic regression and LDA usually produce?
Linear decision boundaries
What kind of decision boundary can QDA produce?
Quadratic/nonlinear decision boundaries.
Which methods are easiest to interpret: logistic regression/LDA or KNN?
Logistic regression and LDA are easier to interpret
Which method is more likely to win when the boundary is strongly nonlinear?
KNN or sometimes QDA, depending on the situation.
Which methods tend to do better when data are limited and the boundary is close to linear?
Logistic regression or LDA.
Main comparison to memorize?
Logistic regression and LDA are similar and often good for simpler/linear problems; QDA is more flexible but higher variance; KNN is most flexible and can do well for nonlinear boundaries but is less interpretable.