Vocabulary flashcards covering fundamental terms and concepts from the lecture on regression algorithms.
Regression
Task of predicting a continuous target variable by learning an approximation of an unknown function from data.
Predictor Variables (Features)
The input variables x that describe each case and are used to predict the target.
Target Variable
The continuous outcome Y that the model aims to predict.
Regression Model
A function hθ(x) that maps a vector of predictor values x to a real-valued prediction y.
Residual
The prediction error εi = yi − hθ(xi) for instance i, i.e., the difference between the observed and the predicted value.
Bias
Systematic error introduced by approximating the true function with a simpler model; error due to wrong assumptions.
Variance
Amount by which a model’s predictions would vary if it were trained on different data sets.
Bias-Variance Trade-off
Relationship where decreasing bias often increases variance and vice versa; for squared loss, expected error decomposes into squared bias plus variance plus irreducible noise.
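A standard statement of the decomposition behind this card (f denotes the unknown true function, σ² the irreducible noise variance):

```latex
\mathbb{E}\!\left[(y - \hat{h}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{h}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\!\left[\hat{h}(x)\right]}_{\text{variance}}
  + \sigma^2
```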
Loss Function
A function L(y,ŷ) that quantifies the cost of predicting ŷ when the true value is y.
Squared Loss
Loss defined as (y − ŷ)²; heavily penalizes large errors.
Absolute Loss
Loss defined as |y − ŷ|; treats all errors linearly.
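A minimal sketch contrasting the two losses on toy values (NumPy assumed; data illustrative):

```python
import numpy as np

def squared_loss(y, y_hat):
    # (y - y_hat)^2: an error of 6 costs 36, so outliers dominate
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    # |y - y_hat|: cost grows linearly with the error
    return np.abs(y - y_hat)

y_true = np.array([10.0, 10.0, 10.0])
y_pred = np.array([9.0, 8.0, 4.0])    # errors of 1, 2, 6

print(squared_loss(y_true, y_pred))   # [ 1.  4. 36.] -- large error dominates
print(absolute_loss(y_true, y_pred))  # [1. 2. 6.]    -- errors stay linear
```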
Zero-One Loss
Loss equal to 0 if prediction equals true value, 1 otherwise; mainly for classification.
Expected Loss
The average loss E[L(y,ŷ)] over the data distribution; used to assess models.
Mean Squared Error (MSE)
Average of squared prediction errors across N cases; measured in squared units of Y.
Root Mean Squared Error (RMSE)
Square root of MSE; expressed in the same units as Y.
Mean Absolute Error (MAE)
Average of absolute prediction errors; same units as Y.
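A minimal sketch computing the three metrics with NumPy on toy predictions (values illustrative):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])

mse  = np.mean((y_true - y_pred) ** 2)   # squared units of Y
rmse = np.sqrt(mse)                      # back in the units of Y
mae  = np.mean(np.abs(y_true - y_pred))  # also in the units of Y

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```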
Relative Error Metrics
Unit-less scores obtained by comparing a model’s error to a baseline model’s error.
Normalized Mean Squared Error (NMSE)
Ratio of model SSE to the SSE of the constant mean predictor; values below 1 mean the model beats the baseline (lower is better).
Normalized Mean Absolute Error (NMAE)
Ratio of model SAE to the SAE of the constant mean predictor; values below 1 mean the model beats the baseline (lower is better).
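Continuing the toy example above, a sketch of both normalized metrics against the constant-mean baseline:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])

baseline = np.mean(y_true)  # constant predictor: always the mean of Y

# NMSE: model SSE over baseline SSE; < 1 means we beat the mean predictor
nmse = np.sum((y_true - y_pred) ** 2) / np.sum((y_true - baseline) ** 2)
# NMAE: same idea with absolute errors
nmae = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - baseline))

print(f"NMSE={nmse:.3f}  NMAE={nmae:.3f}")
```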
Correlation Coefficient
Statistic ρŷ,y measuring linear association between predictions and true values; ranges −1 to 1.
Coefficient of Determination (R²)
Proportion of the variance in Y explained by the model; ranges 0–1 for least-squares fits, higher is better.
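A sketch of both statistics on the same toy data; note that with the mean baseline, R² = 1 − NMSE:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])

rho = np.corrcoef(y_pred, y_true)[0, 1]  # Pearson correlation, in [-1, 1]
r2  = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

print(f"rho={rho:.3f}  R^2={r2:.3f}")    # here R^2 = 1 - NMSE
```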
Simple Linear Regression
Regression with one predictor; model of the form y = β0 + β1x + ε.
Multiple Linear Regression
Regression with several predictors; model of the form y = β0 + Σj βjxj + ε.
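A minimal fit of a multiple linear model, assuming scikit-learn is available (data and coefficients are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y ≈ 2 + 3*x1 - 1*x2 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)  # estimates coefficients by minimizing SSE
print(model.intercept_, model.coef_)  # ≈ 2.0, [3.0, -1.0]
```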
Sum of Squared Errors (SSE)
Total of squared residuals; minimized to estimate linear regression coefficients.
Multicollinearity
Situation where predictors are highly correlated, leading to unstable and hard-to-interpret coefficients.
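One common diagnostic is the variance inflation factor; a sketch using statsmodels (the near-duplicate predictor is contrived for illustration):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)  # nearly a copy of x1
X = np.c_[np.ones(100), x1, x2]             # constant + two correlated predictors

# VIF >> 10 flags predictors whose coefficients will be unstable
for j in (1, 2):
    print(f"VIF x{j}: {variance_inflation_factor(X, j):.1f}")
```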
Regularization
Technique that adds a penalty term to the loss to keep coefficients small and reduce overfitting.
Ridge Regression
L2-regularized linear regression that adds λΣβj² to the loss; shrinks coefficients toward zero but never exactly to zero.
Lasso Regression
L1-regularized linear regression that adds λΣ|βj|; can shrink some coefficients to zero, performing feature selection.
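A sketch contrasting the two penalties with scikit-learn, where the λ of these cards is the alpha parameter (toy data; only the first two features are truly relevant):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the rest are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero some out entirely

print("ridge:", np.round(ridge.coef_, 2))  # all nonzero, just smaller
print("lasso:", np.round(lasso.coef_, 2))  # irrelevant features driven to 0
```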
Gradient Descent
Iterative optimization algorithm that updates parameters in the negative gradient direction to minimize loss.
Learning Rate
Step-size parameter α controlling how far each gradient descent update moves.
Batch Gradient Descent
Variant that computes gradients using the whole training set before each update.
Stochastic Gradient Descent
Variant that updates parameters after each training example.
Mini-batch Gradient Descent
Variant that updates parameters after processing a small batch of examples.
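A single sketch covering the three variants: with batch_size equal to the training-set size this is batch gradient descent, with batch_size 1 it is stochastic, and anything in between is mini-batch (NumPy only; data illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=200)]  # bias column + one feature
y = 4 + 3 * X[:, 1] + rng.normal(scale=0.5, size=200)

theta, alpha = np.zeros(2), 0.1                # parameters and learning rate
batch_size = 16                                # len(y) -> batch GD, 1 -> SGD

for epoch in range(100):
    idx = rng.permutation(len(y))              # shuffle each pass over the data
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        grad = 2 / len(b) * X[b].T @ (X[b] @ theta - y[b])  # MSE gradient
        theta -= alpha * grad                  # step against the gradient

print(theta)  # ≈ [4.0, 3.0]
```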
Regression Tree
Decision tree whose leaves output continuous values; built by recursively partitioning predictor space.
Recursive Partitioning
Process of recursively splitting the data into subsets via logical tests on the predictors, used to build a decision or regression tree.
Pre-pruning
Stopping tree growth early using criteria like minimum cases or maximum depth to prevent overfitting.
Post-pruning
Growing a large tree then cutting back branches using error estimates (e.g., cross-validation).
Error-Complexity Pruning
CART method that generates a sequence of nested sub-trees and selects among them via cross-validation and the x-SE rule (commonly x = 1).
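A sketch of both pruning styles using scikit-learn's regression trees, whose ccp_alpha implements CART-style cost-complexity (error-complexity) pruning; hyperparameter values are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# Pre-pruning: stop growth early via depth / minimum-cases limits
pre = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10).fit(X, y)

# Post-pruning: grow a large tree, then cut back with cost-complexity alpha
post = DecisionTreeRegressor(ccp_alpha=0.01).fit(X, y)

print(pre.get_n_leaves(), post.get_n_leaves())
```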
Support Vector Regression (SVR)
Support vector machine adapted for regression; fits a function within an ε-insensitive tube while keeping it as flat as possible (minimizing the weight norm).
Epsilon-insensitive Loss
SVR loss that ignores errors smaller than ε and penalizes only the excess.
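A minimal SVR fit with scikit-learn, where the epsilon parameter sets the width of the insensitive tube (toy data; C and epsilon values illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + rng.normal(scale=0.05, size=200)

# Residuals smaller than epsilon contribute zero loss;
# C trades flatness against errors exceeding the tube
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print(svr.predict([[0.0]]))  # ≈ 1, since sinc(0) = 1
```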
LOESS
Locally Estimated Scatterplot Smoothing; non-parametric method that fits locally weighted least-squares models.
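A sketch using statsmodels, which ships the closely related LOWESS smoother (frac controls the share of the data used in each local weighted fit; values illustrative):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)

smoothed = lowess(y, x, frac=0.3)  # returns sorted (x, fitted y) pairs
print(smoothed[:3])
```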
MARS
Multivariate Adaptive Regression Splines; non-parametric technique using piecewise linear basis functions to model nonlinearities and interactions.
k-Nearest Neighbors Regression
Instance-based method that predicts the average target value of the k closest training instances.
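A minimal k-NN regression with scikit-learn (k = 5 is illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# Prediction = average target of the k nearest training instances
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(knn.predict([[5.0]]))  # ≈ sin(5)
```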
Artificial Neural Network (Regression)
Model composed of layers of neurons with activation functions used to approximate complex continuous functions.
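A small feed-forward network for regression, sketched with scikit-learn's MLPRegressor (architecture and iteration budget are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.1, size=500)

# Two hidden layers of ReLU units, trained by gradient-based optimization
net = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                   max_iter=2000, random_state=0).fit(X, y)
print(net.predict([[0.5]]))  # ≈ sin(1.0)
```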