What is supervised learning
groups methods that learn the relationship between variables that can be split into two categories, explanatory variables (X) and outcomes (Y), using data where the outcome is observed
What are X variables
explanatory variables, predictors, regressors, independent
What are Y variables
outcomes, response variables, labels, dependent
what is a fitted regression equation
quantifies a linear relationship between two variables, ŷ = intercept + slope × X
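A minimal sketch using NumPy's `np.polyfit` with degree 1, which returns the slope and intercept of the least-squares line. The hours/grades data are hypothetical, generated from an assumed line 60 + 5·X plus noise so the fit can be checked.

```python
import numpy as np

# Hypothetical data: hours studied (X) and grades (Y), generated from an
# assumed true line y = 60 + 5 * x plus small random noise.
rng = np.random.default_rng(0)
hours = np.arange(0, 10, dtype=float)
grades = 60 + 5 * hours + rng.normal(0, 1, size=hours.size)

# np.polyfit with deg=1 returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(hours, grades, deg=1)

def predict(x):
    """Fitted regression equation: y_hat = intercept + slope * x."""
    return intercept + slope * x

print(f"y_hat = {intercept:.2f} + {slope:.2f} * X")
```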
Log Likelihood, Equation? When is it used? Higher or lower?
higher is better, used when Y is discrete, log P(Y test | X test), i.e. the sum over test points of log P(Yi | Xi)
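A minimal sketch for a binary Y, assuming the model outputs the probability that each test point equals 1 (the labels and probabilities below are made up). The log-likelihood adds log P(Yi | Xi) over the test set, so it is always ≤ 0 and closer to 0 is better.

```python
import math

def bernoulli_log_likelihood(y_test, p_test):
    """Sum of log P(Y_i | X_i) for binary outcomes.

    y_test: observed labels (0 or 1).
    p_test: predicted probabilities that Y_i = 1, strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_test, p_test):
        # Probability the model gave to the label that actually happened.
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total

y_test = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]  # hypothetical predictions of a good model
unsure = [0.5, 0.5, 0.5, 0.5]     # hypothetical coin-flip model

print(bernoulli_log_likelihood(y_test, confident))
print(bernoulli_log_likelihood(y_test, unsure))
```

The confident model scores higher (closer to 0) than the coin-flip model, matching "higher is better".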
Mean Absolute Error, Equation? Lower or Higher?
lower, (1/n) Σ |Yi − Ŷi|
Mean Absolute Percentage Error, Equation? Lower or Higher?
lower, (1/n) Σ |(Yi − Ŷi) / Yi|
Root Mean Square Error, higher or lower, outliers, equation?
lower, heavily penalized by outliers, √((1/n) Σ (Yi − Ŷi)²)
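A plain-Python sketch of the three metrics; the helper names `mae`, `mape`, `rmse` and the toy numbers are illustrative. The two prediction sets have the same total absolute error, spread out in one case and concentrated on a single point in the other, which shows why RMSE reacts more to outliers than MAE.

```python
import math

def mae(y, y_hat):
    """Mean Absolute Error: (1/n) * sum |y_i - y_hat_i|."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean Absolute Percentage Error: (1/n) * sum |(y_i - y_hat_i) / y_i|.
    Undefined when any y_i is 0."""
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root Mean Square Error: sqrt((1/n) * sum (y_i - y_hat_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

y = [10.0, 20.0, 30.0, 40.0]
small_errors = [11.0, 19.0, 31.0, 39.0]  # every prediction off by 1
one_outlier = [10.0, 20.0, 30.0, 36.0]   # same total error (4), all on one point

for name, preds in [("small errors", small_errors), ("one outlier", one_outlier)]:
    print(name, mae(y, preds), mape(y, preds), rmse(y, preds))
```

Both prediction sets have an MAE of 1, but the RMSE doubles from 1 to 2 when the whole error sits on one point.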
what is R²
the share of the variation in the outcome variable that we can explain with the explanatory variables, R² = 1 − SSErr/SSTot
what is SSErr
sum of squared errors from the regression; represents the total amount of variation that we can’t explain with our regression. If the trend line were plotted flat at the average of Y, its SSE would equal SST (Total Sum of Squares)
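A plain-Python sketch computing R² from SSErr and SSTot; the function name and toy values are illustrative. SSTot is computed as the squared error of a flat line at the mean of Y, as described in the SSErr card.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSErr / SSTot."""
    mean_y = sum(y) / len(y)
    ss_err = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # SSE of a flat line at the mean
    return 1 - ss_err / ss_tot

y = [2.0, 4.0, 6.0, 8.0]
perfect = [2.0, 4.0, 6.0, 8.0]
flat_at_mean = [5.0, 5.0, 5.0, 5.0]
close = [3.0, 4.0, 6.0, 7.0]

print(r_squared(y, perfect), r_squared(y, flat_at_mean), r_squared(y, close))
```

A perfect fit gives R² = 1, and predicting the mean everywhere gives SSErr = SSTot, hence R² = 0.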
how do you maximize R²
minimize the SSE loss
What is the range of R²
Between 0 and 1 (for OLS with an intercept, on the training data)
Closer to 1 = we explain a lot of the variation in Y with our regression
Closer to 0 = our regression explains the variation in Y no better than the average of Y does
how do you interpret the slope
On average, an increase in study time by 1 hour is associated with an increase in grade by 5.2 points, everything else being equal.
how do you interpret the intercept
When a student spends 0 hours studying and skips 0 classes, we expect their grade to be 57 points on average.
what are p-values
the probability of observing data at least as extreme as ours if there were truly no effect/relationship; low p-value = stronger evidence of an effect
What is OLS and what does it assume?
ordinary least squares regression; assumes the relationship between X & Y is linear; estimates and predictions are denoted with a hat; coefficients are obtained by minimizing the sum of squared residuals
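A sketch of OLS with two explanatory variables using NumPy's `np.linalg.lstsq`, which returns the coefficients that minimize the sum of squared residuals. The data are synthetic, generated from assumed coefficients (intercept 57, +5.2 per study hour, −3 per skipped class) so the estimates can be checked against known values; they are not real grades.

```python
import numpy as np

# Synthetic data from assumed coefficients (illustration values only).
rng = np.random.default_rng(42)
n = 200
hours = rng.uniform(0, 8, n)
skipped = rng.integers(0, 6, n).astype(float)
grades = 57 + 5.2 * hours - 3 * skipped + rng.normal(0, 4, n)

# Design matrix: a column of ones for the intercept, then the two Xs.
X = np.column_stack([np.ones(n), hours, skipped])

# OLS: beta_hat minimizes ||grades - X @ beta||^2 (sum of squared residuals).
beta_hat, *_ = np.linalg.lstsq(X, grades, rcond=None)
intercept_hat, hours_hat, skipped_hat = beta_hat

grades_hat = X @ beta_hat       # predictions carry the "hat"
residuals = grades - grades_hat

print(f"grade_hat = {intercept_hat:.1f} + {hours_hat:.2f}*hours + {skipped_hat:.2f}*skipped")
```

Each estimated slope reads like the slope card: the expected change in grade for one more unit of that X, holding the other X fixed. Because the model includes an intercept, the residuals sum to (numerically) zero.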
what do you do when x=0 doesn’t make sense
x = 0 could be outside the range of the data, unrealistic, or both; interpreting the intercept there means extrapolating, so don’t read it literally
when are p-values significant?
statistically significant at significance level α if p-value < α (e.g. α = 0.05 for a 95% confidence level)
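The notes do not say how the p-value is computed (regression software usually reports a t-test). As an assumption-light illustration, the sketch below swaps in a permutation test for the slope: shuffling Y simulates "no relationship", and the p-value is the share of shuffled slopes at least as large as the observed one. The data and helper names are hypothetical.

```python
import numpy as np

def slope(x, y):
    """OLS slope of y on x."""
    x_c = x - x.mean()
    return float((x_c * (y - y.mean())).sum() / (x_c ** 2).sum())

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for H0: no relationship between x and y."""
    rng = np.random.default_rng(seed)
    observed = abs(slope(x, y))
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any link with x, so this slope reflects "no effect".
        if abs(slope(x, rng.permutation(y))) >= observed:
            extreme += 1
    # +1 on both sides counts the observed data as one of the permutations.
    return (extreme + 1) / (n_permutations + 1)

alpha = 0.05
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y_related = 2 * x + rng.normal(size=50)  # hypothetical: real effect
y_noise = rng.normal(size=50)            # hypothetical: no effect

p_related = permutation_p_value(x, y_related)
significant = p_related < alpha
print(p_related, significant)
```

With a real effect the p-value is tiny and falls below α = 0.05; rerunning on `y_noise` typically gives a much larger p-value.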
Generalized Linear Models
Extends the linear regression approach by allowing the distribution of Y to be non-normal
for change of units, when the variable is in log the change becomes ____? and if the variable is standardized?
becomes a % change, and a change in standard deviations
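A short worked derivation of these rules, assuming a simple regression with one X (the symbols β₀, β₁, γ₀, γ₁, s_X are introduced here for illustration; β₁ is the slope on the unstandardized X):

```latex
% Outcome in logs: a one-unit increase in X multiplies Y by e^{\beta_1}
\log(Y) = \beta_0 + \beta_1 X
\;\Rightarrow\;
\frac{Y(X+1)}{Y(X)} = e^{\beta_1} \approx 1 + \beta_1 \quad (\text{small } \beta_1),
\text{ i.e. a change of about } 100\,\beta_1\,\% \text{ in } Y

% Predictor in logs: a 1% increase in X changes Y by about \beta_1 / 100 units
Y = \beta_0 + \beta_1 \log(X)
\;\Rightarrow\;
\Delta Y = \beta_1 \log(1.01) \approx \beta_1 / 100

% Standardized predictor: the slope is per one standard deviation of X
Z = \frac{X - \bar{X}}{s_X}, \quad Y = \gamma_0 + \gamma_1 Z
\;\Rightarrow\; \gamma_1 = \beta_1 s_X
```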
how do you interpret R²
we can explain 24.5% of the variation in grades by looking at the variation in both the number of hours of study and the number of classes skipped
what is LINE?
the assumptions of linear regression: linearity, independence (of the errors), normality (of the errors), equal variance (of the errors)
how does GLM extend linear regression?
allows the distribution of Y to be non-normal, and the mean of Y to be a function of a linear combination of the Xs
what is the inverse of the mean function?
link function
what is the identity link used for
linear relationships
what is the log link used for
when the mean needs to be positive
what is the power link used for
curved relationships
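A sketch pairing each link g with its inverse (the mean function), assuming a single predictor; the dictionary layout and the exponent 0.5 for the power link are illustrative choices. It shows why the log link keeps the mean positive: whatever the linear predictor, exp(·) is > 0.

```python
import math

# Each entry: (link g: mean -> linear predictor, inverse g^-1: linear predictor -> mean).
links = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    # Power link g(mu) = mu ** p with an assumed p = 0.5; its inverse squares eta.
    "power_0.5": (lambda mu: mu ** 0.5, lambda eta: eta ** 2),
}

def mean_from_linear_predictor(link_name, b0, b1, x):
    """Mean of Y implied by the GLM: mu = g^-1(b0 + b1 * x)."""
    _, inverse = links[link_name]
    return inverse(b0 + b1 * x)

for name in links:
    print(name, mean_from_linear_predictor(name, 1.0, 0.5, 2.0))
```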
choosing the right distribution for continuous Y: what is the normal distribution
suits averages of many things, bell-shaped, can be negative
choosing the right distribution for continuous Y: what is the gamma distribution
suits times/durations, potentially skewed, always positive
choosing the right distribution for binary Y: what is the Bernoulli distribution
probability of an event happening, binary, either 0 or 1
choosing the right distribution for count Y: what is the Poisson distribution
used for counts, non-negative integers (0, 1, 2, …)
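A quick NumPy sketch that draws from each distribution to show its support; the parameter values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Parameter values are arbitrary choices for illustration.
samples = {
    "normal": rng.normal(loc=0.0, scale=1.0, size=n),   # bell-shaped, can be negative
    "gamma": rng.gamma(shape=2.0, scale=1.5, size=n),   # right-skewed, strictly positive
    "bernoulli": rng.binomial(n=1, p=0.3, size=n),      # only 0 or 1
    "poisson": rng.poisson(lam=3.0, size=n),            # counts: 0, 1, 2, ...
}

for name, s in samples.items():
    print(f"{name:9s} min={float(s.min()):7.2f} mean={float(s.mean()):6.2f}")
```

Normal draws go below 0, gamma draws stay positive with a mean above the median (right skew), Bernoulli draws are only 0 or 1, and Poisson draws are non-negative integers.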
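what is the Akaike information criterion
a model-comparison score, AIC = 2k − 2 log(L), where k is the number of parameters and L the likelihood; used for cases with a different number of variables across models; lower is better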
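A sketch of AIC for OLS fits, assuming normally distributed errors so that log L has a closed form using the maximum-likelihood variance SSE/n; k counts the coefficients plus that variance. The data and the five noise predictors are synthetic, and `gaussian_aic` is a hypothetical helper.

```python
import math
import numpy as np

def gaussian_aic(X, y):
    """AIC = 2k - 2 log L for an OLS fit with normal errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(((y - X @ beta) ** 2).sum())
    sigma2 = sse / n                                    # maximum-likelihood error variance
    log_l = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    k = X.shape[1] + 1                                  # coefficients + the variance
    return 2 * k - 2 * log_l

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
noise_vars = rng.normal(size=(n, 5))                    # 5 hypothetical useless predictors

X_intercept_only = np.ones((n, 1))
X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, noise_vars])

aic_intercept_only = gaussian_aic(X_intercept_only, y)
aic_small = gaussian_aic(X_small, y)
aic_big = gaussian_aic(X_big, y)
print(aic_intercept_only, aic_small, aic_big)
```

For nested models, adding variables always lowers −2 log L, but each one costs 2 in the penalty, so the larger model wins only if the fit improves enough. Dropping the real predictor (intercept-only model) gives a much higher AIC.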
what is overfitting?
the model is too flexible, great fit on training data, poor fit on new data
what is underfitting?
not flexible enough, poor fit on training and new data
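A sketch of over- and underfitting with polynomial regressions of increasing degree, using NumPy's `Polynomial.fit` on synthetic sine-shaped data (the degrees and the data are assumptions for illustration).

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical data: a sine-shaped pattern plus noise."""
    x = rng.uniform(0, 3, n)
    return x, np.sin(2 * x) + rng.normal(0, 0.3, n)

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

results = {}
for degree in (1, 4, 15):  # too rigid, moderate, very flexible
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, test MSE {results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, while test error typically drops from degree 1 to 4 and then rises sharply at degree 15.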
consequences of underfitting
bias, poor prediction performance, inability to capture the complexity of some patterns
what is regularization?
restricting the flexibility of a model
how do you choose how much to regularize a model
estimate on a training set, adjust the regularization on a validation set, test prediction performance with a test set.
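A sketch of the train/validation/test workflow, using polynomial degree as the flexibility knob being tuned (a stand-in for a regularization strength such as the lasso's 𝜆). The 60/20/20 split and the data are assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 3, n)
y = np.sin(2 * x) + rng.normal(0, 0.3, n)  # hypothetical data

# Shuffle once, then split 60% / 20% / 20% into train / validation / test.
idx = rng.permutation(n)
train, val, test = idx[:180], idx[180:240], idx[240:]

def mse(model, rows):
    return float(np.mean((y[rows] - model(x[rows])) ** 2))

# 1) Estimate each candidate on the training set only.
candidates = {d: Polynomial.fit(x[train], y[train], deg=d) for d in range(1, 11)}

# 2) Adjust: keep the flexibility with the lowest validation error.
best_degree = min(candidates, key=lambda d: mse(candidates[d], val))

# 3) Measure prediction performance once, on the untouched test set.
test_error = mse(candidates[best_degree], test)
print(best_degree, test_error)
```

Keeping the test set out of steps 1 and 2 is what makes the final error an honest estimate of performance on new data.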
what do you do with too many variables?
use dimension reduction (solves overfitting issues, but interpretation is still difficult) or variable selection (keep only the useful variables)
what is lasso?
Method where variable selection is performed through regularization. It shrinks the coefficients towards 0, setting some exactly to 0
what does 𝜆 control?
the strength of regularization; if 𝜆 is large, the coefficients shrink toward 0 (many become exactly 0); if 𝜆 = 0, lasso is the same as OLS
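A minimal lasso sketch using coordinate descent with soft-thresholding, assuming the objective (1/(2n))·||y − Xb||² + 𝜆·||b||₁ (the scaling used by, e.g., scikit-learn's `Lasso`) and standardized X with centered y, so no intercept is needed. The data are synthetic: only the first two of five variables matter.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward 0 by lam; return exactly 0 if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes standardized X columns and centered y (no intercept fitted).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every variable's contribution except j's.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize (lasso is scale-sensitive)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)  # only the first 2 variables matter
y = y - y.mean()

for lam in (0.0, 0.1, 1.0, 5.0):
    print(lam, np.round(lasso(X, y, lam), 2))
```

At 𝜆 = 0 the result matches OLS; at 𝜆 = 1 the three irrelevant coefficients are exactly 0 and the two real ones are shrunk; at 𝜆 = 5 every coefficient is 0.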
what are the drawbacks of lasso?
sensitive to correlated Xs, issues with small datasets, sensitive to the scale of the Xs (standardize them first), loss of interpretability, biased (shrunken) coefficients
decision trees
create groups based on thresholds on X values
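A minimal sketch of how a regression tree picks one threshold on X (a single split); a full tree repeats this search recursively inside each group. The helper name and the hours/grades numbers are hypothetical.

```python
def best_split(x, y):
    """Find the threshold on x that splits y into two groups with the lowest total SSE.

    Each group predicts its own mean, which is how a regression tree scores a split.
    Returns (threshold, total_sse, left_mean, right_mean), or None if x has one value.
    """
    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # candidate cut halfway between consecutive x values
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total, sum(left) / len(left), sum(right) / len(right))
    return best

hours = [1, 2, 3, 4, 6, 7, 8, 9]
grades = [50, 52, 51, 53, 80, 82, 81, 83]
threshold, total_sse, left_mean, right_mean = best_split(hours, grades)
print(threshold, total_sse, left_mean, right_mean)
```

The split lands between 4 and 6 hours, giving two groups whose predictions are their mean grades.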
what are the advantages of decision trees?
don’t need to specify the relation between x and y, works for regression and classification, very easy to explain, mirrors human decision making, can be shown as a graph
what are the disadvantages of decision trees?
don’t have the same prediction accuracy as other methods