Linear Regression

Last updated 11:20 PM on 6/17/26

27 Terms

1

Definition: Model Formulation

Intuition: How do different variables relate to an outcome

Definition: The core assumption is a linear relationship between a dependent variable Y and one or more independent variables X_i

Components:

  • Y: Dependent variable

  • X_i: Independent variables (predictors)

  • B_0, B_i, e: Intercept, coefficients, error term (residual)

Used: The basis for models like CAPM, factor models, and many trading strategies
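
Written out with the symbols from this card, and taking p as the number of predictors (the same p used in the adjusted R^2 card), the model can be sketched as:

```latex
Y = B_0 + \sum_{i=1}^{p} B_i X_i + e
```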

2

Definition: Ordinary Least Squares (OLS) Estimation - Matrix Form

Intuition: Picks the coefficients (B) that make the total squared gap between observed and predicted values as small as possible

Definition: Finds the coefficients B that minimize the Residual Sum of Squares

Components:

  • Data matrix X

  • Response vector y

  • Closed-form solution built from a transpose and a matrix inverse: B = (X^T X)^{-1} X^T y

Uses: Supplies the coefficient estimates B that the linear regression model needs
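
A minimal sketch of the matrix-form estimate, assuming NumPy and a small noiseless toy dataset. The function name `ols_fit` and the data are illustrative, not part of the card.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients [intercept, slopes...] via the normal equations."""
    # Prepend a column of ones so the first coefficient is the intercept B_0.
    X1 = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) B = X^T y; solving the system avoids forming the inverse explicitly.
    return np.linalg.solve(X1.T @ X1, X1.T @ y)

# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

beta_hat = ols_fit(X, y)
print(beta_hat)  # approximately [1, 2, -3]
```

Because the toy data has no noise, the estimate recovers the generating coefficients up to floating-point error.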

3

Definition: OLS Assumption - Linearity

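Intuition: Each coefficient enters the model as a simple multiplier, so a one-unit change in a predictor always shifts the expected outcome by that predictor's coefficient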

Definition: The model is linear in the parameters B

Components:

  • Coefficients B

  • Unobservable random error ε

Uses: Rules out specifications that are non-linear in the parameters

4

Definition: OLS Assumption - No multicollinearity

Intuition: No predictor is an exact linear combination of the other predictors

Definition: X^TX is invertible. There is no perfect linear relationship between predictors.

Components:

  • Data matrix X

  • Data matrix X transposed (^T)

  • Matrix multiplication between the two is invertible

Uses: Perfect collinearity leaves the OLS solution undefined, and near-collinearity inflates standard errors and makes coefficient estimates unstable
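
One way to check the invertibility condition is to compare the rank of X with its number of columns. The sketch below assumes NumPy; the helper name `has_perfect_multicollinearity` and the toy columns are illustrative.

```python
import numpy as np

def has_perfect_multicollinearity(X):
    """True if some column of X is an exact linear combination of the others."""
    # X^T X is invertible only when X has full column rank.
    return np.linalg.matrix_rank(X) < X.shape[1]

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])

X_ok = np.column_stack([x1, x2])
X_bad = np.column_stack([x1, x2, x1 + x2])  # third column is redundant

print(has_perfect_multicollinearity(X_ok))   # False
print(has_perfect_multicollinearity(X_bad))  # True
```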

5

Definition: OLS Assumption - Homoscedasticity

Intuition: The spread of the errors around the regression line is the same for every observation, regardless of the predictor values

Definition: The error variance is constant across all observations

Components:

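  1. Conditional variance of each error term given X: Var(ε_i | X)

  2. (1) equals the same constant σ^2 for every observation i

Uses: Often violated in finance, where high-return periods also have high volatility. OLS stays unbiased under this violation, but the usual standard errors are no longer reliable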

6

Definition: OLS Assumption - No autocorrelation

Intuition: Error terms of different observations are not correlated with each other

Definition: Errors are uncorrelated across observations

Components:

  1. Covariance between two error terms given X: Cov(ε_i, ε_j | X)

  2. (1) is equal to zero

  3. The two error terms belong to different observations (i ≠ j)

Uses: Violations are common in time series data (e.g., momentum strategies)

7

Definition: OLS Assumption - Normality (OLS as Maximum Likelihood Estimator)

Intuition: If the errors follow a bell curve centered at 0, the OLS estimator coincides with the maximum likelihood estimator (MLE)

Definition: If we add the assumption that ε ~ N(0, σ^2), the OLS estimator is also the MLE

Components:

  • First five assumptions (linearity, strict exogeneity, no multicollinearity, homoscedasticity, no autocorrelation)

  • Error term is normally distributed

Uses: Shows the OLS B is also the maximum likelihood B, and justifies exact t-tests and F-tests in finite samples; without normality these tests hold only approximately in large samples

8

Definition: OLS Assumption - Strictly Exogenous

Intuition: The error term has an expected value of 0 no matter what values the independent variables take

Definition: E[ε | X] = 0, which implies the error term is uncorrelated with the predictors

Components:

  • The expectation of each error term given the data set X is equal to zero

Uses: Violations are a crucial problem in finance because they lead to biased estimators

9

Definition: R^2

Intuition: How much variance in the result is explained by the data matrix

Definition: Proportion of the variance in Y that is predictable from X

Components:

1 - (RSS / TSS)

  • RSS: Residual Sum of Squares (variation in error between observed data and modeled values)

  • TSS: Total Sum of Squares (variation in the observed data)

Uses: Compare models with same number of predictors
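
The formula 1 - (RSS / TSS) can be computed directly from observed and fitted values. This is a small sketch assuming NumPy; `r_squared` and the four data points are made up for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS for observed values y and fitted values y_hat."""
    rss = np.sum((y - y_hat) ** 2)        # variation left unexplained by the model
    tss = np.sum((y - np.mean(y)) ** 2)   # total variation in the observed data
    return 1.0 - rss / tss

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.5])

# Here RSS = 1.0 and TSS = 5.0, so R^2 is 0.8 up to floating-point rounding.
print(r_squared(y, y_hat))
```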

10

Definition: Adjusted R^2

Intuition: How much variance in the result is explained by the data matrix, with a penalty that favors models with fewer irrelevant predictors

Definition: Proportion of the variance in Y that is predictable from X, adjusted for the number of predictors

Components:

1 - ( (RSS / (m - p - 1)) / ( TSS / (m - 1) ) )

  • RSS and TSS

  • Number of observations (m)

  • Number of predictors (p)

Uses: Compare models with different numbers of predictors, higher is better
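
The adjustment can be sketched as a one-line function, with m as the number of observations and p as the number of predictors. The name `adjusted_r_squared` and the example numbers are illustrative assumptions.

```python
def adjusted_r_squared(rss, tss, m, p):
    """Adjusted R^2 = 1 - (RSS / (m - p - 1)) / (TSS / (m - 1))."""
    return 1.0 - (rss / (m - p - 1)) / (tss / (m - 1))

# Same fit quality (RSS and TSS), but the second model spends more predictors.
one_predictor = adjusted_r_squared(rss=1.0, tss=5.0, m=12, p=1)
three_predictors = adjusted_r_squared(rss=1.0, tss=5.0, m=12, p=3)

print(one_predictor, three_predictors)  # the second value is smaller
```

With RSS and TSS held fixed, adding predictors lowers the score, which is the penalty for predictors that do not improve the fit.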

11

Definition: Standard Error (SE) of β_i

Intuition: How far the estimated β_i is likely to land from the true value because of sampling noise

Definition: Estimated standard deviation of a parameter estimate

Components:

  • Square root of the estimated variance of the coefficient estimate

Uses: Construct confidence intervals and perform hypothesis tests on individual coefficients

12

Definition: t-statistic of β_i

Intuition: If the true β_i were actually 0, how far away is our estimate of β_i from 0? That distance, measured in standard errors, is the t-statistic, and a large absolute value is strong evidence against β_i = 0.

Definition: The ratio of the difference between a coefficient's estimated value and its assumed value (0) to its standard error

Components:

  • t = β_i / SE(β_i)

Uses: Test the null hypothesis H_0: β_i = 0. Under H_0, the statistic follows a t-distribution with m - p - 1 degrees of freedom
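
A sketch of how standard errors and t-statistics fall out of an OLS fit, assuming NumPy, homoscedastic errors, and the usual variance estimate RSS / (m - p - 1). The function name `ols_t_stats` and the toy data are illustrative.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return (coefficients, standard errors, t-statistics) for OLS with an intercept."""
    m = len(y)
    X1 = np.column_stack([np.ones(m), X])   # add the intercept column
    p = X1.shape[1] - 1                     # number of predictors, excluding the intercept
    xtx_inv = np.linalg.inv(X1.T @ X1)
    beta = xtx_inv @ X1.T @ y
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (m - p - 1)    # estimated error variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv)) # square root of each coefficient's variance
    return beta, se, beta / se

# Toy data: y is roughly 1 + 2x with a small alternating disturbance.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])

beta, se, t = ols_t_stats(x, y)
print(beta, se, t)
```

The slope's t-statistic comes out large here because the disturbance is tiny relative to the trend.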

13

Definition: F-statistic

Intuition: Does regression model explain a meaningful amount of variation in the dependent variable compared to noise

Definition: Ratio that compares explained variance per parameter to unexplained variance per remaining degree of freedom

Components:

  • Numerator: Explained variation per predictor, (TSS - RSS) / p

  • Denominator: Unexplained variation per remaining degree of freedom, RSS / (m - p - 1), which accounts for sampling variability

Uses: tests the null hypothesis that all slope coefficients are jointly equal to zero
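
The ratio in the definition can be sketched from RSS and TSS, with m observations and p predictors as in the adjusted R^2 card. The name `f_statistic` and the numbers are illustrative.

```python
def f_statistic(rss, tss, m, p):
    """F = ((TSS - RSS) / p) / (RSS / (m - p - 1))."""
    explained_per_predictor = (tss - rss) / p
    unexplained_per_df = rss / (m - p - 1)
    return explained_per_predictor / unexplained_per_df

# 80 of 100 units of variation explained by 2 predictors over 23 observations.
f_value = f_statistic(rss=20.0, tss=100.0, m=23, p=2)
print(f_value)  # 40.0
```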

14

Definition: Ridge Regression

Intuition: Improves prediction by shrinking coefficient magnitudes to reduce variance at the cost of introducing some bias

Definition: Regularized linear regression that minimizes squared errors plus an L2 penalty on the coefficients

Components:

  • Loss function: RSS measuring fit to the data

  • L2 penalty: Squared magnitude of coefficients that discourages large weights

  • Regularization parameter (lambda): Controls strength of coefficient shrinkage

Uses: Handles multicollinearity and improves out-of-sample performance in high-dimensional regressions
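
Ridge has a closed form similar to OLS, with lambda added to the diagonal of X^T X. The sketch below assumes NumPy and centered data (so no intercept is fitted); `ridge_fit` and the toy data are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients; assumes X and y are centered (no intercept)."""
    p = X.shape[1]
    # Minimizes RSS + lam * sum(B_j^2); lam = 0 reduces to plain OLS.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Two nearly collinear predictors.
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
y = np.array([2.0, 4.1, 6.2, 7.9])
X = X - X.mean(axis=0)
y = y - y.mean()

b_ols = ridge_fit(X, y, lam=0.0)
b_ridge = ridge_fit(X, y, lam=1.0)

# The penalty shrinks the overall size of the coefficient vector.
print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```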

15

Definition: Lasso Regression

Intuition: performs both shrinkage and variable selection by forcing some coefficients exactly to zero

Definition: regularized linear regression that minimizes squared errors plus an L1 penalty on the coefficients

Components:

  • Loss function: Residual sum of squares capturing model fit

  • L1 penalty: Absolute values of coefficients that promote sparsity

  • Regularization parameter (lambda): Determines shrinkage and variable elimination

Uses: Feature selection when many predictors may be irrelevant
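
Lasso has no closed form in general, but cyclic coordinate descent with soft-thresholding is one common way to solve it. The sketch assumes NumPy, the objective 0.5 * RSS + lambda * L1 penalty, no intercept, and non-zero predictor columns; the function names are illustrative.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, and set it to exactly zero if |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Partial residual: add back predictor j's own contribution.
            r_j = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r_j, lam) / (X[:, j] @ X[:, j])
    return b

# Orthonormal toy design, so each coefficient is just a soft-thresholded X^T y entry.
X = np.eye(4)[:, :3]
y = np.array([3.0, 0.2, -1.0, 0.0])

b_lasso = lasso_cd(X, y, lam=0.5)
print(b_lasso)  # coefficients 2.5, 0 and -0.5: the weak predictor is dropped entirely
```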

16

Definition: Bias

Intuition: Error from approximating a real-world function with a simpler model

Definition: Error when the expected value of an estimator does not equal true parameter value

Components:

  • True parameter: The actual coefficient values generating the data

  • Estimator expectation: Average value of the estimated coefficients across samples

  • Model constraints: Assumptions or regularization that distort the estimator toward simpler models

Uses: Understand the bias-variance trade off and to justify regularization methods like ridge and lasso

17

Definition: Variance

Intuition: Error from model being too sensitive to training data

Definition: Expected squared deviation of a model’s prediction from its own average prediction across different training datasets

Components:

  • Training sample randomness: Different datasets drawn from the same process lead to different fitted models.

  • Estimator instability: Sensitivity of coefficients or prediction to changes in the data

Uses: Understand overfitting risk and to motivate regularization methods that stabilize model estimates

18

Definition: Bias-Variance Tradeoff

Intuition: Finding the middle ground for how complex or simple a model should be

Definition: Choosing the model complexity that balances bias against variance so that expected prediction error is as low as possible

Components:

  • Complex model (high degree polynomial): low bias but high variance (overfitting)

  • Simpler model (OLS): high bias but low variance (underfitting)

Uses: Obtain the least amount of prediction error

19

Definition: Decision Tree (Regression)

Intuition: A sequence of if-else rules that split the data into groups, where each group predicts the average outcome of the observations inside it.

Definition: Partitions the feature space into a set of non-overlapping regions. For any given observation, the model predicts the mean of the response values of the training points that fall into the same region.

Components:

  • Splits: At each node, the algorithm chooses a feature and a split point. The goal is to make the response values in the resulting child nodes as similar as possible.

  • Splitting criterion: For regression, splits are chosen to minimize squared error, usually measured by RSS or MSE. In short, we split it where it reduces variability the most.

  • Leaves: Each terminal node outputs a constant prediction, equal to the mean of the response values in that node.

  • Training: The tree is built greedily, one split at a time, choosing the locally best split at each step

Uses: Nonlinear regression, baseline model before moving onto more complex ones
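
The splitting criterion can be sketched for a single feature: try every midpoint between sorted values and keep the one with the lowest combined RSS. This assumes NumPy; `best_split` and the toy data are illustrative, and a full tree would apply the same search recursively inside each child node.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, rss) of the single split on x that minimizes total RSS."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_rss = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical feature values
        left, right = y[:i], y[i:]
        # Each side predicts its own mean, so its error is the RSS around that mean.
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if rss < best_rss:
            best_threshold, best_rss = (x[i - 1] + x[i]) / 2, rss
    return best_threshold, best_rss

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.1, 4.9])

threshold, rss = best_split(x, y)
print(threshold)  # 6.5, the midpoint between the two clusters
```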

20

Decision Tree Pros

  • Easy to interpret (white-box model)

    • Start at root, ask a series of yes/no questions, end in a leaf with a fixed numerical prediction (the mean)

    • You can interpret locally and don’t need to understand entire model, just path your observation took

  • Can handle non-linear relationships

    • Represents the response function as a piece-wise constant function over a partitioned feature space

      • Instead of fitting one smooth curve everywhere, the tree says “in this part of the space, behave this way; in another part, behave differently.”

21

Decision Tree Cons

  • High variance (small changes in data can lead to a very different tree)

  • Prone to overfitting

  • Generally lower predictive accuracy than ensemble methods

22

Ensemble Methods

Intuition: Combining many trees reduces variance (bagging, random forests) or bias (boosting) relative to a single tree

Definition: Combines multiple individual decision trees

Types:

  • Bagging (bootstrap aggregating)

  • Random forests (improve bagging)

  • Boosting (sequentially building trees)

Uses: Improve overall predictive performance and robustness

23

Bagging (Bootstrap Aggregating)

Intuition: Decision trees are noisy - small changes in the data can change them a lot. Bagging reduces that noise by training many trees on slightly different versions of the data and then averaging their predictions

Definition: A technique that reduces the variance of a learning algorithm by repeatedly resampling the training data with replacement, fitting the model on each resample, and aggregating the predictions

Components:

  • Bootstrapped samples: Each model is trained on a dataset created by sampling with replacement from the original data

  • Base learner (decision trees): Uses full, unpruned decision trees, which are high-variance learners

  • Aggregation: predictions are averaged (regression)

  • Out-of-bag error: Each tree sees about 2/3 of the distinct observations; the remaining 1/3 or so can be used to estimate test error without cross-validation

Uses: Used primarily to reduce variance in unstable models like decision trees
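
The "about 2/3 in the bag" figure can be checked by drawing one bootstrap sample and counting distinct observations. This sketch assumes NumPy; the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# One bootstrap sample: n row indices drawn with replacement.
idx = rng.integers(0, n, size=n)

# Fraction of distinct observations the tree would see; tends to 1 - 1/e (about 0.632).
in_bag_fraction = len(np.unique(idx)) / n

# Observations never drawn are "out of bag" and can serve as held-out data for this tree.
oob_mask = np.ones(n, dtype=bool)
oob_mask[idx] = False

print(in_bag_fraction, oob_mask.mean())
```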

24

Random Forests

Intuition: Improve on bagging by making individual trees less similar. They do this by randomly limiting which features a tree is allowed to consider at each split, so different trees learn different structures.

Definition: An ensemble of decision trees trained on bootstrapped samples, where each split considers only a random subset of predictors. The final prediction is the average of the individual tree predictions.

Components:

  • Bootstrap sampling: same as bagging - each tree is trained on a resampled dataset

  • Random feature selection: At each split, only m out of p predictors are considered

  • Hyperparameter m: controls how aggressive the decorrelation is

    • Smaller m → more randomness → lower correlation → lower variance

  • Aggregation: Predictions are averaged over trees

  • Feature importance: Random forests naturally produce variable importance measures by tracking how much splits reduce error across trees

Uses: When you need a strong, general-purpose regression model that works well when relationships are nonlinear, interactions are important, and interpretability is secondary to performance

25

Boosting

Intuition: Reduces bias by fitting weak learners sequentially, where each learner is trained to correct the residual errors of the current ensemble

Definition: Builds models sequentially by fitting new trees to the residuals of the current model, with each tree’s contribution scaled by a learning rate to control overfitting.

Components:

  • Residuals: The errors made by the current ensemble

  • Weak learners: Typically small, shallow trees

  • Learning rate (gamma): controls how much each new tree contributes to the ensemble

    • Smaller learning rate → slower learning → better generalization (but more trees are needed)

  • Number of trees: controls model complexity

Uses:

  • Often used when maximum predictive accuracy is required and interpretability is secondary
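
A minimal sketch of boosting for squared error, assuming NumPy, one feature, and one-split trees (stumps) as the weak learners. All function names, the sine-curve data, and the settings are illustrative.

```python
import numpy as np

def fit_stump(x, r):
    """Best one-split stump for residuals r: (threshold, left_mean, right_mean)."""
    best = (np.inf, None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:  # skip the maximum so the right side is never empty
        left, right = r[x <= t], r[x > t]
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if rss < best[0]:
            best = (rss, t, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    if t is None:  # no valid split was found, so predict a constant
        return np.full(len(x), left_mean)
    return np.where(x <= t, left_mean, right_mean)

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially to the current residuals; return predictions and MSE path."""
    pred = np.full(len(y), y.mean())  # start from the mean prediction
    mse_path = [np.mean((y - pred) ** 2)]
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)  # new tree targets the current residuals
        pred = pred + learning_rate * predict_stump(stump, x)
        mse_path.append(np.mean((y - pred) ** 2))
    return pred, mse_path

x = np.linspace(0.0, 6.0, 40)
y = np.sin(x)

pred, mse_path = boost(x, y)
print(mse_path[0], mse_path[-1])  # training error falls as trees are added
```

Training error never increases from one tree to the next here, which is why the number of trees and the learning rate have to be tuned against held-out data rather than training error.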

26

Definition: p-value

Intuition: If the true slope were actually zero, how surprising would this result be?

Definition: The probability, assuming the true coefficient is zero, of observing a t-statistic at least as extreme as the one actually obtained.

Components:

  • Null Hypothesis H_0: Typically β_1 = 0, meaning no linear relationship between predictor and response

  • Test statistic (t-statistic): how many standard errors away from zero the estimate is

Uses: If the p-value is less than 0.05, the observed effect would be unlikely if the coefficient were 0, so we can reject the null hypothesis
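
For large samples the t-distribution is close to the standard normal, so a two-sided p-value can be approximated with only the standard library. This is a sketch under that large-sample assumption; the function name is illustrative, and an exact value would use the t-distribution with m - p - 1 degrees of freedom.

```python
import math

def two_sided_p_value_normal(t_stat):
    """Approximate two-sided p-value, using the standard normal in place of the t-distribution."""
    # P(|Z| >= |t|) for a standard normal Z equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

print(two_sided_p_value_normal(1.96))  # close to 0.05
print(two_sided_p_value_normal(3.0))   # well below 0.05, so H_0 would be rejected
```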

27

Definition: Stationarity

Intuition: A time series is stationary if its statistical behavior does not change over time.

Definition: A time series is stationary if its mean, variance, and autocovariance structure do not depend on time.

Components:

  • Constant mean: no drift/trend

  • Constant volatility: variability stays the same and won’t explode

  • Autocovariance depends only on lag: Dependence between today and tomorrow is the same regardless of when today occurs (yesterday affects today the same way in 2010 as in 2025)

Uses: ARMA models assume this, ARIMA models reach it by differencing first, and it supports stable forecasting
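
A random walk is a standard example of a non-stationary series, and first differencing turns it back into its stationary shocks. The sketch assumes NumPy; the seed, length, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Independent shocks with constant mean and variance: a stationary series.
shocks = rng.normal(0.0, 1.0, size=500)

# Cumulative sum of the shocks is a random walk, whose variance grows with time.
random_walk = np.cumsum(shocks)

# First differencing recovers the shocks (all but the first one).
differenced = np.diff(random_walk)

print(np.allclose(differenced, shocks[1:]))  # True
```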