Continuous vs Class Response Variables
Continuous responses are numerical (like income or height), while class responses are categorical (like “yes/no” or “spam/not spam”).
Why Logistic Regression Improves Over Linear Regression:
Logistic regression models probabilities between 0 and 1 for classification instead of predicting continuous values.
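A minimal sketch of the contrast, assuming scikit-learn and synthetic, purely illustrative data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic binary data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Linear regression treats 0/1 labels as numbers and can predict values outside [0, 1]
lin = LinearRegression().fit(X, y)
print(lin.predict([[3.0]]))

# Logistic regression returns a probability that always lies between 0 and 1
log = LogisticRegression().fit(X, y)
print(log.predict_proba([[3.0]])[:, 1])
```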
Linear Discriminant Analysis (LDA):
LDA assumes every class has the same covariance (spread) of predictors and finds a straight-line boundary that best separates the classes.
Quadratic Discriminant Analysis (QDA):
QDA allows each class to have its own covariance, creating curved (quadratic) decision boundaries.
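A small sketch of both classifiers, assuming scikit-learn and synthetic data from make_classification:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Synthetic two-class, two-predictor data (illustrative only)
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# LDA: shared covariance across classes -> linear boundary
lda = LinearDiscriminantAnalysis().fit(X, y)

# QDA: per-class covariance -> curved (quadratic) boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)

print(lda.score(X, y), qda.score(X, y))
```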
Naive Bayes:
Naive Bayes uses probability rules assuming predictors are independent, making it simple and fast for text or categorical data.
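A tiny text-classification sketch, assuming scikit-learn; the four example sentences and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up spam / not-spam examples
texts = ["win money now", "meeting at noon", "cheap money offer", "lunch tomorrow"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(texts)            # word counts as predictors

# Naive Bayes treats each word count as independent given the class
model = MultinomialNB().fit(X, labels)
print(model.predict(vec.transform(["free money"])))
```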
K-Nearest Neighbors (KNN):
KNN classifies a new point based on the majority class of its closest neighbors in the data.
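A minimal sketch, assuming scikit-learn and its built-in iris dataset; k = 5 is just an example value:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Classify each new point by majority vote among its 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:3]))
```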
Confusion Matrix:
A table that shows how many predictions were correct or incorrect (True/False Positives and Negatives).
Area Under the Curve (AUC):
AUC measures how well a model separates classes — higher AUC means better overall classification performance.
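A sketch that produces both metrics, assuming scikit-learn and its built-in breast cancer dataset as a stand-in for any binary classification problem:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_te, model.predict(X_te)))

# AUC is computed from predicted probabilities, not hard class labels
print(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```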
Linear Regression
Fits a straight line by minimizing the residual sum of squares (RSS), the total squared difference between actual and predicted values.
Linear Regression - Relationship
Assumes a linear relationship between predictors and the response variable.
Linear Regression - Flexibility
Lowest flexibility; fits only straight lines, no curves.
Linear Regression - Coefficients Meaning
Coefficients show how much Y changes when a predictor increases by 1 unit, holding the other predictors fixed.
Linear Regression - Key Feature
Assumes a straight-line relationship — no curves allowed.
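A minimal fit-and-interpret sketch, assuming scikit-learn and synthetic data where the true slope is about 3:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): Y = 2 + 3*X + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 + 3 * X[:, 0] + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # coef_ near 3: Y rises ~3 per 1-unit increase in X
```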
Ridge Regression
Minimizes RSS plus a penalty on the squared coefficient values (L2 penalty).
Ridge Regression - Effect
Shrinks coefficients toward zero to reduce overfitting but never makes them exactly zero.
Ridge Regression - Key Feature
Simplifies the model by shrinking coefficients but keeps all predictors.
Lasso Regression
Minimizes RSS plus a penalty on the absolute values of coefficients (L1 penalty).
Lasso Regression - Effect
Can shrink some coefficients all the way to zero, removing less important predictors.
Lasso Regression - Key Feature
Performs variable selection automatically by keeping only the most important predictors.
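A sketch comparing ordinary least squares, ridge, and lasso, assuming scikit-learn; the penalty strengths (alpha = 10.0 and alpha = 0.5) are arbitrary example values, and the data is synthetic with only two predictors that truly matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: only the first 2 of 10 predictors affect Y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=1.0, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty: can set coefficients exactly to zero

print(np.round(ols.coef_, 2))
print(np.round(ridge.coef_, 2))   # smaller, but none exactly zero
print(np.round(lasso.coef_, 2))   # unimportant predictors driven to exactly 0.0
```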
Polynomial Regression
Adds powers of X (like X², X³) to capture curved relationships while the model stays linear in its coefficients.
Polynomial Regression - Key Feature
Fits one smooth curved line across all data instead of a straight line.
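A minimal sketch, assuming scikit-learn; degree 3 and the synthetic curved data are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved relationship (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=100)

# Adding X^2 and X^3 columns keeps the model linear in its coefficients
poly = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)
print(poly.predict([[2.0]]))
```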
Splines
Combine polynomials and step functions to fit flexible curves that change shape at specific points called knots.
Splines - Key Feature
Fits piecewise polynomial curves that join together smoothly at the knots.
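A sketch using scikit-learn's SplineTransformer (available in scikit-learn 1.0 and later); the knot count and the sine-shaped synthetic data are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer  # scikit-learn >= 1.0

# Synthetic wiggly relationship (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# Cubic spline basis with several knots, fit by ordinary least squares
spline = make_pipeline(SplineTransformer(n_knots=6, degree=3), LinearRegression())
spline.fit(X, y)
print(spline.predict([[2.5]]))
```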
Mean Squared Error (MSE)
Measures average squared prediction error; smaller is better; units are the square of Y.
R² (R-squared)
Shows how much of Y’s variation is explained by the model; closer to 1 means better fit; unitless.
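A sketch computing both metrics on held-out data, assuming scikit-learn and its built-in diabetes dataset:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)

print(mean_squared_error(y_te, pred))  # in squared units of Y; smaller is better
print(r2_score(y_te, pred))            # unitless; closer to 1 is better
```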
β (Beta) in Linear Regression
β shows how much Y changes when X increases by 1 (the slope).
β (Beta) in Ridge/Lasso Regression
β still means the change in Y per unit increase in X, but it is typically smaller in magnitude because the penalty shrinks coefficients.
β (Beta) in Polynomial Regression
β controls how curved the line is — β₁ affects slope, β₂ and higher bend the curve up or down.
β (Beta) in Splines
β controls the curve’s shape in one section of X, describing local slope between knots.
How to Explain β in Linear Regression
If β = 3, when X goes up by 1, Y goes up by 3.
How to Explain β in Ridge or Lasso Regression
β is smaller than in linear regression (e.g., 2.5 instead of 3) to reduce overfitting.
How to Explain β in Polynomial Regression
β₁ sets the direction, and β₂ (on X²) bends the curve up if positive or down if negative.
How to Explain β in Splines
Each β shapes how Y changes with X in one range — different β’s for different sections.
Resampling
A method to estimate model accuracy by repeatedly training and testing the model on different splits of the data.
Leave-One-Out Cross Validation (LOOCV)
Trains on all data except one point, tests on that point, and repeats for every observation.
Pros: Uses almost all data for training (low bias).
Cons: Very slow to compute and can vary a lot (high variance).
K-Fold Cross Validation
Splits data into K parts, trains on K–1 parts, and tests on the remaining one, repeating K times.
Pros: Faster and gives more stable results (lower variance).
Cons: Uses less data each time, so slightly higher bias.
LOOCV vs K-Fold Summary
LOOCV = slow, low bias, high variance.
K-Fold = faster, slightly higher bias, lower variance.
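A sketch of both resampling schemes, assuming scikit-learn and its built-in diabetes dataset; K = 10 is just a common choice:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# LOOCV: one model fit per observation (slow on large datasets)
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")

# 10-fold CV: only 10 fits, usually a similar error estimate
kf = KFold(n_splits=10, shuffle=True, random_state=0)
kfold_scores = cross_val_score(model, X, y, cv=kf,
                               scoring="neg_mean_squared_error")

print(-loo_scores.mean(), -kfold_scores.mean())
```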
Basics of Decision Trees
Decision Trees split data into regions using predictor values to make predictions or classifications. They are easy to interpret but can overfit and have high variance.
Method of Splitting and Building Trees
At each step the tree chooses the predictor and cutoff that best separate the data (the split that most reduces RSS or impurity), splitting recursively until further splits give little improvement or a stopping rule is reached.
Improvements from Bagging and Random Forests
Bagging builds many trees on different data samples and averages them to reduce variance, while Random Forests also randomize predictor selection at each split to make trees less correlated and more accurate.
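A sketch comparing a single tree, bagged trees, and a random forest, assuming scikit-learn and its built-in breast cancer dataset; 100 trees is an arbitrary example setting:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)                   # single tree: high variance
bag = BaggingClassifier(n_estimators=100, random_state=0)       # averages trees on bootstrap samples
rf = RandomForestClassifier(n_estimators=100, random_state=0)   # also randomizes predictors per split

for name, model in [("tree", tree), ("bagging", bag), ("random forest", rf)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```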
Tree Sketching
Each region in a predictor-space plot corresponds to a split in the tree; you can draw one from the other by matching each split to a region boundary.
Difference Between Parametric and Non-Parametric Methods
Parametric methods assume a specific equation or shape for the model (like linear regression), while non-parametric methods make fewer assumptions and adapt more flexibly to the data (like KNN or trees).
Issues with High Collinearity Between Predictors
When predictors are highly correlated, it becomes difficult to separate their individual effects, leading to unstable or unreliable coefficient estimates.
Purpose of and Methods for Dimension Reduction or Feature Selection
Methods like PCA or Lasso simplify models by reducing the number of predictors while keeping the most relevant information.
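A minimal PCA sketch, assuming scikit-learn and its built-in breast cancer dataset; keeping 95% of the variance is just an example threshold:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)

# Standardize first, then keep enough components to explain ~95% of the variance
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95).fit(X_scaled)

print(X.shape[1], "->", pca.n_components_)       # 30 predictors reduced to far fewer components
print(pca.explained_variance_ratio_.sum())
```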
How Degrees of Freedom Relate to Model Complexity
More degrees of freedom mean a more flexible and complex model that can capture data patterns better but is more likely to overfit.
Major Assumptions Various Methods May Make About the Input Data
Different models assume certain properties (like linearity, independence, or normality); breaking these assumptions can reduce model accuracy.
Attributes of “Sound Models” (No Data Leakage Between Train and Test Data)
A sound model keeps training and testing data separate so no information from the test set influences the training process.
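A sketch of a leakage-free workflow, assuming scikit-learn; the key point is that the split happens first and the scaler is fit on training data only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Split first, so the test set never influences any fitted step
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler on the training data only, then applies it to the test data
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```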