Linear Regression
Predicts a quantitative response
Assumes a linear relationship between predictor variables and the response variable
Parametric method
Estimates the parameters by minimizing the residual sum of squares (RSS)
𝛽0
Intercept
Where the fitted line crosses the y-axis: the expected value of Y when X = 0
Unknown; estimated from the data
𝛽1
Slope
Unknown; estimated from the data
Simple Linear Regression
Predicts a quantitative (numeric) response Y using a single predictor (independent variable) X
^
Indicates an estimated or predicted value (e.g., $\hat{y}$ is the predicted response)
ϵ
Error term: the part of Y that the linear model cannot explain
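Putting these pieces together, the simple linear regression model and its fitted counterpart are:

$$Y = \beta_0 + \beta_1 X + \epsilon, \qquad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$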
Estimating Residuals
Residual = actual Y − predicted $\hat{Y}$, i.e. $e_i = y_i - \hat{y}_i$
Residual Sum of Squares
We square the residuals so negatives don’t cancel out and to penalize large errors more heavily
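Written out, the residual sum of squares is:

$$\mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$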
Sample Mean
Average of the observed sample; directly computable, and generally a good estimate of the population mean
Population Mean
Average over the entire population; usually not directly measurable
Standard Error (SE)
Statistic that measures the uncertainty of using the sample mean to estimate the population mean
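For the sample mean, the standard formula (assuming the n observations are uncorrelated) is:

$$\mathrm{SE}(\hat{\mu})^2 = \frac{\sigma^2}{n}$$

where σ is the standard deviation of each observation; the uncertainty shrinks as n grows.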
Confidence Intervals
Range of values that, with a stated level of confidence, contains the true (unknown) value of the parameter
Typically the confidence level is 95%
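For the slope, the approximate 95% confidence interval takes the standard form:

$$\hat{\beta}_1 \pm 2 \cdot \mathrm{SE}(\hat{\beta}_1)$$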
Null Hypothesis (H0)
No relationship between X and Y
$H_0 : \beta_1 = 0$
Alternative Hypothesis
There exists a relationship between X and Y
$H_a : \beta_1 \neq 0$
T-statistic
Measures the number of standard deviations $\hat{\beta}_1$ is away from 0
Essentially a ratio: the estimate divided by its standard error
The larger the ratio, the further the estimate is from 0 (stronger evidence against $H_0$)
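Concretely, the standard form is:

$$t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)}$$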
P-value
Probability of observing a t-statistic at least as extreme as the one computed, assuming $H_0$ is true
The smaller the p-value, the less likely it is that the observed association between X and Y occurred by chance
Can reject the null hypothesis (i.e., claim there is a relationship) if the p-value is small enough
Typical cutoffs are 5% or 1% (written p < 0.05 or p < 0.01)
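As a minimal sketch, here is how these quantities can be read off a fitted model with statsmodels (one common Python choice; the data below is synthetic and the variable names are placeholders):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: true beta0 = 2.0, true beta1 = 0.5
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

X = sm.add_constant(x)   # adds the intercept column (beta0)
fit = sm.OLS(y, X).fit()

print(fit.params)    # [beta0_hat, beta1_hat]
print(fit.tvalues)   # t-statistic for each coefficient
print(fit.pvalues)   # p-values; small => reject H0: beta = 0
```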
Residual Standard Error (RSE)
Estimate of the standard deviation of the error term ϵ
A measure of the model's "lack of fit"
Expressed in the units of Y (e.g., number of units sold)
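For simple linear regression the standard formula is (with p predictors, the denominator becomes $n - p - 1$):

$$\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - 2}}$$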
R squared statistic
Proportion of variance explained
[0,1], 1 is perfect fit
TSS (total sum of squares): the total variance in Y before the regression is fit
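In formulas:

$$R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad \mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$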
Coefficient interpretation
Average effect on Y of a one-unit increase in $X_j$, holding all other predictors fixed
Check all coefficients
$H_0$ : all coefficients are zero, i.e. $\beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_a$ : at least one of $\beta_1, \beta_2, \ldots, \beta_p$ is non-zero
Check subset of coefficients
$H_0$ : a particular subset of q coefficients is zero, i.e. $\beta_{p-q+1} = \beta_{p-q+2} = \cdots = \beta_p = 0$
$H_a$ : at least one of $\beta_{p-q+1}, \beta_{p-q+2}, \ldots, \beta_p$ is non-zero
F-Statistic
F close to 1: no evidence of a relationship between the predictors and Y
F far greater than 1: at least one of the predictors is related to Y
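For the all-coefficients test above, the standard formula is:

$$F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}$$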
p >> n
Too many coefficients to estimate, not enough samples
Example: you want to predict exam scores (Y) from 100 predictors (study time, sleep hours, diet habits, stress levels, etc.), but you only have data on 10 students. Here p = 100 and n = 10; since p > n, the model has too many parameters and not enough data to estimate them reliably.
Forward Selection
Begin with the null model (intercept $\beta_0$ only). Fit a simple linear regression for each predictor and add only the predictor with the lowest RSS; the model now has two coefficients. Continue until a stopping rule is met.
Can always be used
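A minimal sketch of forward selection, assuming scikit-learn and a fixed-size stopping rule (the function name and both array names are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_selection(X, y, max_features):
    """Greedily add the predictor that lowers RSS the most."""
    remaining = list(range(X.shape[1]))
    selected = []
    for _ in range(max_features):          # stopping rule: fixed model size
        best_rss, best_j = np.inf, None
        for j in remaining:
            cols = selected + [j]
            pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
            rss = np.sum((y - pred) ** 2)  # residual sum of squares
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```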
Backward Selection
Begin with the model containing all predictors/coefficients. Remove the predictor with the largest p-value, refit, and repeat
Cannot be used when p > n (the full model cannot be fit)
Mixed Selection
Combination of the two
Continue until every predictor in the model has a low p-value, and every predictor outside the model would have a high p-value if added
Qualitative Variables
Predictors with two levels (binary)
○ Create a new dummy variable that captures information
○ One hot encoding
Predictors with more than two levels/n levels (categorical)
○ Create n-1 dummy variables
○ Select one class to be the baseline (the level that gets no dummy variable)
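A minimal sketch of creating n−1 dummy variables with pandas (the column name and levels are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"region": ["East", "West", "South", "East"]})

# drop_first=True keeps n-1 dummies; the dropped level ("East" here)
# becomes the baseline class absorbed into the intercept.
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies)   # columns: South, West (indicator variables)
```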
Additive Assumption
○ The association between a predictor Xj and the response Y does not depend on the values of the other predictors
○ Can relax the additive assumption by adding interaction terms ("synergy"); see the sketch below
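A minimal sketch of adding an interaction term with statsmodels' formula API (the file and column names y, x1, x2 are placeholders):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")   # assumed to contain columns y, x1, x2

# 'x1 * x2' expands to x1 + x2 + x1:x2, where x1:x2 is the interaction term
fit = smf.ols("y ~ x1 * x2", data=df).fit()
print(fit.summary())
```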
Linear Assumption
○ The change in the response Y associated with a one-unit change in Xj is constant, regardless of the value of Xj
○ Can relax the linear assumption by adding polynomial terms
○ Still technically a linear model (linear in the coefficients), but the fitted curve has a quadratic shape
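A minimal sketch of adding a quadratic term with the same formula API (file and column names again placeholders):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")   # assumed to contain columns y and x

# I(x**2) adds a squared term; the model stays linear in beta0, beta1, beta2
fit = smf.ols("y ~ x + I(x**2)", data=df).fit()
print(fit.summary())
```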