multiple linear regression
more than one independent variable (x) to predict dependent variable (y)
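A minimal sketch of fitting a multiple linear regression, assuming a made-up dataset with two predictors x1 and x2 (statsmodels is one common choice; all names and values are illustrative only):

```python
# Sketch: multiple linear regression with two hypothetical predictors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 3 + 2 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=100)  # noise plays the role of epsilon

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept plus two predictors
model = sm.OLS(y, X).fit()
print(model.params)  # estimated coefficients (B)
```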
simple linear regression
one x
epsilon
the error term: variation in y not accounted for by the predictors
explanatory modeling
quantify average effect of inputs on an output variable (x changes y by how much)
- explanatory: causation
- descriptive: correlation
fits data closely and focuses on coefficients (B)
predictive modeling
predict new individual observations
predicts data accurately and focuses on predictions (y_hat)
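A hedged sketch of the predictive focus: fit on training data, then judge the model by y_hat on held-out data rather than by its coefficients (the data and split here are assumptions for illustration):

```python
# Sketch: predictive modeling judged by out-of-sample error, not coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)
reg = LinearRegression().fit(X_train, y_train)

print(reg.coef_)                                    # explanatory view: coefficients
y_hat = reg.predict(X_valid)                        # predictive view: y_hat on new data
print(np.sqrt(mean_squared_error(y_valid, y_hat)))  # out-of-sample RMSE
```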
mult linear regression assumptions
1. errors (and y values) follow a normal distribution
2. the choice of variables and their form is correct (the relationship is linear)
3. the cases are independent of each other
4. the variability in y values for a given set of x values is the same regardless of those x values
even if assumption #1 is violated, the resulting estimates may still be good for prediction
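A small sketch of checking assumptions 1 and 4 on the residuals; the Shapiro-Wilk and Breusch-Pagan tests used here are one assumed way to do the checks, and the data are synthetic:

```python
# Sketch: rough residual checks for normality (assumption 1) and constant variance (assumption 4).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(150, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.4, size=150)

resid = sm.OLS(y, X).fit().resid
print(stats.shapiro(resid))        # normality of errors
print(het_breuschpagan(resid, X))  # constant variance across predictor values
```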
large number of predictor variables
- expensive
- higher chance of missing data
- multicollinearity: high correlation among predictors (not good)
- rule of thumb: need at least 5(p + 2) records for p predictors, so fewer variables make accurate estimation easier
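One common way to spot multicollinearity is the variance inflation factor (VIF); this sketch assumes a made-up pair of nearly identical predictors:

```python
# Sketch: VIFs flag predictors that are highly correlated with the others.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)   # almost a copy of x1 -> multicollinearity
X = sm.add_constant(np.column_stack([x1, x2]))

for i in range(1, X.shape[1]):             # skip the constant column
    print(variance_inflation_factor(X, i))  # large values indicate redundancy
```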
occam's razor
prefer simpler models, all else equal
bias variance tradeoff
prefer models with relatively high bias in training so that we get less variability in predictions on new data
variable selection problem
100 x variables -- how to arrive at 10 variables?
How to reduce num of predictors
- domain knowledge to pick relevant predictors
- frequency/corr tables, summary stats, graphs and missing value counts
- computational power and statistical significance
variable selection methods
- exhaustive search: evaluates all possible subsets and picks the best model (see the sketch after this card)
- subsets compared with adjusted r^2 and mallows C_p
- for p predictors there are (2^p - 1) possible models
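A minimal exhaustive-search sketch, assuming a small synthetic dataset: fit every non-empty subset (2^p - 1 models) and keep the one with the highest adjusted r^2:

```python
# Sketch: exhaustive subset search scored by adjusted R^2.
from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
p = 4
X = rng.normal(size=(120, p))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=120)

best_cols, best_adj_r2 = None, -np.inf
for k in range(1, p + 1):
    for cols in combinations(range(p), k):
        fit = sm.OLS(y, sm.add_constant(X[:, list(cols)])).fit()
        if fit.rsquared_adj > best_adj_r2:
            best_cols, best_adj_r2 = cols, fit.rsquared_adj
print(best_cols, best_adj_r2)  # best subset and its adjusted R^2
```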
adjusted r^2
- r^2: proportion of variation in y explained by x
- r^2 doesn't account for the number of predictors in the model
- adj r^2 applies a penalty for the number of variables used (see formula below)
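The usual form of that penalty, for n records and p predictors (standard formula, stated here for reference):

```latex
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```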
mallows c_p
c_p is closer to p+1 for models that fit well
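A common definition, assuming the full model's error variance estimate is used as the benchmark (p predictors plus intercept, matching the card above):

```latex
C_p = \frac{SSE_p}{\hat{\sigma}^2_{\text{full}}} + 2(p + 1) - n
% for a well-specified subset model, E[SSE_p] \approx (n - p - 1)\hat{\sigma}^2, so C_p \approx p + 1
```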
AIC and BIC
- if the model is too simplistic and doesn't include important predictors, it's underfit
- measure info lost by fitting a given model
- smaller = better
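A quick sketch of reading AIC and BIC off a fitted model (statsmodels exposes both on OLS results; the data are made up):

```python
# Sketch: AIC and BIC reported by a fitted OLS model; smaller = less information lost.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(100, 3)))
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(scale=0.3, size=100)

fit = sm.OLS(y, X).fit()
print(fit.aic, fit.bic)
```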
use of metrics
- higher adj r^2
- lower RMSE
- c_p is closer to p+1
- lower AIC
- lower BIC
forward selection
- starts with no predictors and adds them one by one (add the one with the largest contribution); see the sketch after this card
- stop at p-value threshold: no remaining predictor has a statistically significant contribution
- max validation r^2: stop when r^2 on validation set stops improving when predictors are added
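A sketch of forward selection with a p-value stopping rule; the 0.05 threshold, column names, and synthetic data are all assumptions for illustration:

```python
# Sketch: forward selection, adding the most significant candidate until none clears the threshold.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = pd.DataFrame(rng.normal(size=(150, 5)), columns=[f"x{i}" for i in range(5)])
y = 2 * X["x0"] - X["x2"] + rng.normal(scale=0.5, size=150)

selected, remaining = [], list(X.columns)
while remaining:
    # p-value of each candidate when added to the current model
    pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c] for c in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:   # stop: no remaining predictor contributes significantly
        break
    selected.append(best)
    remaining.remove(best)
print(selected)
```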
backward elimination
- starts with all predictors and eliminates least useful predictor one by one based on statistical significance
- stop at p-value threshold or max validation r^2
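The mirror-image sketch for backward elimination, under the same assumed data and 0.05 threshold:

```python
# Sketch: backward elimination, dropping the least significant predictor until all remaining are significant.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = pd.DataFrame(rng.normal(size=(150, 5)), columns=[f"x{i}" for i in range(5)])
y = 2 * X["x0"] - X["x2"] + rng.normal(scale=0.5, size=150)

selected = list(X.columns)
while selected:
    pvals = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop("const")
    worst = pvals.idxmax()
    if pvals[worst] < 0.05:   # everything left is significant -> stop
        break
    selected.remove(worst)
print(selected)
```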
mixed stepwise regression
- like forward selection but drop non-significant predictors at each step
- stop at p-value threshold
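A combined sketch of mixed stepwise regression: a forward step, then a backward check at each iteration (the enter/stay thresholds of 0.05 and 0.10 are illustrative assumptions):

```python
# Sketch: mixed stepwise regression = forward additions plus backward drops.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = pd.DataFrame(rng.normal(size=(150, 5)), columns=[f"x{i}" for i in range(5)])
y = 2 * X["x0"] - X["x2"] + rng.normal(scale=0.5, size=150)

selected, remaining = [], list(X.columns)
while remaining:
    # forward step: add the most significant candidate if it clears the entry threshold
    pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c] for c in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:
        break
    selected.append(best)
    remaining.remove(best)
    # backward step: drop any earlier predictor that is no longer significant
    stay = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop("const")
    for col in stay[stay >= 0.10].index:
        selected.remove(col)
        remaining.append(col)
print(selected)
```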