Supervised Learning
Groups together methods that attempt to learn about the conditional distribution of Y given X.
Conditional Distribution equation
$P(Y \mid X) = P(y_1, y_2, \dots \mid x_1, x_2, \dots)$
Y variables
The outcomes, response variables, or labels.
In the case of randomized experiments, also called dependent variables.
X variables
The explanatory variables, predictors, or regressors.
In the case of randomized experiments, also called independent variables.
The linear regression method is based on:
The relationship between the expected value of Y and Xs is assumed to be linear
Estimates and predictions are denoted with a hat
The coefficients are obtained by minimizing the Sum of Squared Residuals (or “errors”)
expected value equation
$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
Estimates equation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_n X_n$
Sum of Squared Residuals Equation
$SS_{Err} = \sum_i (Y_i - \hat{y}_i)^2$
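To make the minimization concrete, here is a minimal Python sketch for the one-predictor case, where the least-squares coefficients have a closed form. The data and the function names (`fit_simple_ols`, `sum_squared_residuals`) are made up for illustration and do not come from the cards.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared gaps between observed y and the fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: hours studied and grades for five students.
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
grades = [55.0, 63.0, 68.0, 72.0, 79.0]

b0, b1 = fit_simple_ols(hours, grades)
sse = sum_squared_residuals(hours, grades, b0, b1)
print(b0, b1, sse)
```

Moving either coefficient away from the fitted value can only increase the sum of squared residuals, which is what "minimizing" means here.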
$\text{grade} = 57 + 5.2 \times \text{StudyTimeH} - 8.7 \times \text{ClassSkipped}$
• Intercept: Expected ŷ when every 𝑿 = 𝟎
“On average, when a student spent 0 hours studying and skipped 0 classes, we expect their grade to be 57 points, everything else being equal.”
“On average, an increase in study time by 1 hour is associated with an increase in grade by 5.2 points, everything else being equal.”
If 𝑋 = 0 makes no sense or is not in the range of the data (out-of-domain) for at least one explanatory variable, do not interpret the intercept. Say instead:
• “A study time of 0 is not in the range of our data and we shouldn’t extrapolate”
• “A study time of 0 is unrealistic and we shouldn’t extrapolate”
• “A study time of 0 is not in the range of our data and unrealistic and we shouldn’t extrapolate”
Slope
Expected change in ŷ for a change in the corresponding 𝑋, while every single other 𝑋 stays the same
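The intercept and slope readings above can be checked by plugging numbers into the grade equation. This is a small sketch; the function name `predict_grade` is hypothetical, and it assumes the last term of the equation is −8.7 times the number of classes skipped.

```python
def predict_grade(study_hours, classes_skipped):
    """Predicted grade from the fitted equation on the card."""
    return 57 + 5.2 * study_hours - 8.7 * classes_skipped


# Intercept: prediction when every explanatory variable is 0.
intercept = predict_grade(0, 0)

# Slope: change in the prediction for one more hour of study,
# holding the number of classes skipped fixed.
study_effect = predict_grade(3, 1) - predict_grade(2, 1)

# Same idea for one more class skipped, holding study time fixed.
skip_effect = predict_grade(2, 2) - predict_grade(2, 1)

print(intercept, study_effect, skip_effect)
```

Whatever fixed value is chosen for the other variable, the difference stays the same. That is the "everything else being equal" part of the interpretation.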
$\widehat{\text{Life Exp}} = -2.92 + 8.1 \times \ln(\text{GDP})$
On average, an increase of GDP by 1% is associated with an increase in Life Expectancy by 0.081 years, everything else being equal.
$\widehat{\ln(\text{Life Exp})} = 1.23 + 0.02 \times \text{GDP}$
On average, an increase of GDP by 1 Million USD is associated with an increase in Life Expectancy by about 2%, everything else being equal.
$\widehat{\ln(\text{Life Exp})} = 1.23 + 2.1 \times \ln(\text{GDP})$
On average, an increase of GDP by 1% is associated with an increase in Life Expectancy by about 2.1%, everything else being equal.
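The percentage readings of the three log models are approximations that work well for small changes. The sketch below compares each rule of thumb with the exact change implied by the coefficients on the cards; the variable names are illustrative only.

```python
import math

# Level-log model: Life Exp = -2.92 + 8.1 * ln(GDP).
# Exact change in years when GDP rises by 1% (rule of thumb: 8.1 / 100).
level_log_change = 8.1 * math.log(1.01)

# Log-level model: ln(Life Exp) = 1.23 + 0.02 * GDP.
# Exact relative change when GDP rises by 1 unit (rule of thumb: 2%).
log_level_change = math.exp(0.02) - 1

# Log-log model: ln(Life Exp) = 1.23 + 2.1 * ln(GDP).
# Exact relative change when GDP rises by 1% (rule of thumb: 2.1%).
log_log_change = 1.01 ** 2.1 - 1

print(round(level_log_change, 4))   # close to 0.081 years
print(round(log_level_change, 4))   # close to 0.02, i.e. 2%
print(round(log_log_change, 4))     # close to 0.021, i.e. 2.1%
```

The gap between the exact value and the rule of thumb grows with the size of the change, so the shortcut is best kept for small increments.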
If a variable is standardized (mean 0 and standard deviation 1)
The change is expressed in standard deviations of that variable
$\text{Std Life Exp} = 1.23 + 0.1 \times \text{GDP}$
On average, an increase of GDP by 1 Million USD is associated with an increase in Life Expectancy by 0.1 standard deviations, everything else being equal.
R^2
The share of the variation in Y that the model explains, given the values of all the explanatory variables.
Example: R^2 = 0.245
“With this model, we can explain 24.5% of the variation in grades by looking at the variation in both the number of hours of study and the number of classes skipped.”
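As a sketch of how this share is computed, R^2 can be written as one minus the ratio of the unexplained variation (sum of squared residuals) to the total variation of Y around its mean. The data and the function name `r_squared` below are hypothetical.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the predictions y_hat."""
    y_bar = sum(y) / len(y)
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total
    return 1 - ss_err / ss_tot


# Hypothetical observed grades and model predictions.
grades = [55.0, 63.0, 68.0, 72.0, 79.0]
predicted = [56.0, 61.7, 67.4, 73.1, 78.8]

r2 = r_squared(grades, predicted)
print(round(r2, 3))
```

Predicting the mean for everyone gives an R^2 of 0, and predicting every value perfectly gives 1.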
p-value
The probability of finding an estimated coefficient at least that far from the population value, if the population value were the one in H0 (usually 0).
Coefficient | P-value ($H_0: \beta = 0$)
---|---
Intercept | < 0.001
StudyTimeH | < 0.001
ClassSkipped | 0.042
“If the true population coefficient 𝜷StudyTimeH = 𝟎, there is a probability of less than 0.1% that the estimated coefficient for the study time in hours is at least that far from 0.”
“If the true population coefficient 𝜷ClassSkipped = 𝟎, there is a probability of 4.2% that the estimated coefficient for the number of classes skipped is at least that far from 0.”
If p-value < α
The result is statistically significant at significance level α (that is, at confidence level 1 − α)
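The decision rule can be sketched by comparing the p-values from the table above with a chosen threshold. The thresholds 0.05 and 0.01 are assumptions for illustration, and the two "< 0.001" entries are stored as their upper bound 0.001, so the true values are smaller.

```python
# P-values from the table; 0.001 is an upper bound for the "< 0.001" entries.
p_values = {"Intercept": 0.001, "StudyTimeH": 0.001, "ClassSkipped": 0.042}


def is_significant(p_value, threshold):
    """True when the p-value falls below the chosen threshold."""
    return p_value < threshold


at_5_percent = {name: is_significant(p, 0.05) for name, p in p_values.items()}
at_1_percent = {name: is_significant(p, 0.01) for name, p in p_values.items()}

print(at_5_percent)
print(at_1_percent)
```

The coefficient for ClassSkipped passes at the 5% threshold but not at the 1% threshold, which is why the threshold should be stated along with the conclusion.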
𝑅𝑒𝑣 = 11,671,521,696 − 5,816,822 × 𝑌𝑒𝑎𝑟 + 3 × 𝐵𝑢𝑑𝑔𝑒𝑡
• What is the expected revenue for a film released in 1992 without a budget?
A film without a budget is unrealistic, so we should not extrapolate.
• What is the expected revenue for a film produced in 1970 with a $10MM budget?
11,671,521,696 − 5,816,822 × 1970 + 3 × 10,000,000 = 242,382,356
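The arithmetic above can be reproduced in a few lines of Python. The function name `predict_revenue` is hypothetical; the coefficients are the ones in the revenue equation.

```python
def predict_revenue(year, budget):
    """Predicted revenue from the fitted equation on the card."""
    return 11_671_521_696 - 5_816_822 * year + 3 * budget


# Film produced in 1970 with a $10MM budget.
revenue_1970 = predict_revenue(1970, 10_000_000)
print(f"{revenue_1970:,}")
```

The same function would return a number for a budget of 0, which shows why the out-of-range check has to be made by the analyst: the equation itself does not refuse unrealistic inputs.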
Incremental Value Isn’t Constant
In linear models, a given change in X always results in the same change in Y, wherever you start.
In non-linear models, the effect of changing X varies depending on where you are in the model. For example, in a quadratic function like Y = X^2, increasing X from 1 to 2 has a smaller effect than increasing X from 5 to 6.
Local Incremental Changes Matter
Since the effect of X is not uniform, we need to consider local changes—the impact of a small increase in X at a specific point.
This is crucial in economics, business, and data analysis, where small changes can have different effects depending on the situation.
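A short numerical sketch of the quadratic example: the one-unit jumps differ depending on the starting point, and so does the local slope. The helper `local_change` is a hypothetical name; it approximates the derivative with a central difference.

```python
def f(x):
    """The quadratic example Y = X^2."""
    return x ** 2


def local_change(func, x, step=1e-6):
    """Approximate slope of func at x using a small symmetric step."""
    return (func(x + step) - func(x - step)) / (2 * step)


jump_low = f(2) - f(1)    # effect of going from X = 1 to X = 2
jump_high = f(6) - f(5)   # effect of going from X = 5 to X = 6

slope_at_1 = local_change(f, 1.0)   # close to 2
slope_at_5 = local_change(f, 5.0)   # close to 10

print(jump_low, jump_high, slope_at_1, slope_at_5)
```

The local slope is what the derivative measures: the effect of a very small increase in X at one specific point.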
Derivative equation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 \log(X_1) + \hat{\beta}_3 X_1^2 + \hat{\beta}_4 X_1 X_2 + \hat{\beta}_5 X_2$
Derivative of a sum is the sum of the derivatives
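$\frac{\partial \hat{y}}{\partial X_1} = \frac{\partial \hat{\beta}_0}{\partial X_1} + \frac{\partial (\hat{\beta}_1 X_1)}{\partial X_1} + \frac{\partial (\hat{\beta}_2 \log(X_1))}{\partial X_1} + \dots$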
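Carrying the term-by-term differentiation through to the end gives the local effect of X1. This worked step assumes that log is the natural logarithm and that the fourth term of the equation is the square of X1. Terms that do not contain X1 have derivative 0.

```latex
\frac{\partial \hat{y}}{\partial X_1}
  = \underbrace{0}_{\hat{\beta}_0}
  + \underbrace{\hat{\beta}_1}_{\hat{\beta}_1 X_1}
  + \underbrace{\frac{\hat{\beta}_2}{X_1}}_{\hat{\beta}_2 \log(X_1)}
  + \underbrace{2 \hat{\beta}_3 X_1}_{\hat{\beta}_3 X_1^2}
  + \underbrace{\hat{\beta}_4 X_2}_{\hat{\beta}_4 X_1 X_2}
  + \underbrace{0}_{\hat{\beta}_5 X_2}
  = \hat{\beta}_1 + \frac{\hat{\beta}_2}{X_1} + 2 \hat{\beta}_3 X_1 + \hat{\beta}_4 X_2
```

The result depends on both X1 and X2, so the effect of a small increase in X1 has to be evaluated at specific values of the explanatory variables.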
For all values of 𝑋 the population errors
• Have mean zero (Linearity)
• Are statistically independent (Independence)
• Are normally distributed (Normality)
• Have equal variance (Equal Variance)
Can be remembered using the acronym L.I.N.E.
Even though we can’t know the population errors
we can guess their behavior by looking at the sample residuals
Mean Zero Population Errors (Linearity)
• Population errors are assumed to have mean zero for any given 𝑋s.
• There should be roughly as many points below and above the straight line.
Independent Population Errors
• Knowing the value of the errors for any (set of) 𝑋 value(s) provides no information on the value of the other errors
No relation between X and e
No relation between different e
• Commonly violated with time series data
Errors may display trends over time, called autocorrelation
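One common numerical check for autocorrelation in time-ordered residuals is the Durbin-Watson statistic, which the cards do not mention; it is added here only as an illustration. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residual series below are made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals ordered in time.
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]      # long runs of same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(round(dw_trending, 3), round(dw_alternating, 3))
```

A plot of the residuals against time remains the first thing to look at; the statistic only summarizes what such a plot shows.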
Normally Distributed Population Errors
• Distribution of the errors is normal
• Note: This is not the same as saying that Y is normally distributed
• Check by plotting the residuals
Histogram
QQ Plot
• Histogram should be (roughly) bell shaped
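The points of a QQ plot can be computed with the Python standard library alone. This is a sketch under stated assumptions: the residuals are hypothetical, the helper name `qq_points` is made up, and the plotting positions (i + 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair theoretical normal quantiles with sorted standardized residuals."""
    n = len(residuals)
    center, spread = mean(residuals), stdev(residuals)
    ordered = sorted((e - center) / spread for e in residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical sample residuals.
residuals = [-1.0, 1.3, 0.6, -1.1, 0.2]
points = qq_points(residuals)

for theoretical_q, sample_q in points:
    print(round(theoretical_q, 3), round(sample_q, 3))
```

If the residuals are roughly normal, the pairs fall close to a straight line when plotted, which is the visual check the QQ plot provides.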
Equal Error Variance
• The variance of the errors does not depend on the values of 𝑋
Variance of 𝑒 constant across 𝑋 ➔ homoskedasticity
Variance of 𝑒 depends on 𝑋 ➔ heteroskedasticity
• Can be seen by plotting 𝑒 against 𝑋
(Relatively) consistent ➔ homoskedastic
Fan shape ➔ heteroskedastic
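As a rough numerical companion to the plot, one can compare the variance of the residuals for low values of X with the variance for high values. This is an informal check loosely in the spirit of the Goldfeld-Quandt test, not a formal test; the data and the function name `variance_ratio` are hypothetical.

```python
from statistics import variance


def variance_ratio(x, residuals):
    """Residual variance in the upper half of x over the lower half."""
    pairs = sorted(zip(x, residuals))          # order residuals by x
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return variance(high) / variance(low)


x = [1, 2, 3, 4, 5, 6, 7, 8]
fan_shaped = [0.1, -0.1, 0.2, -0.2, 1.0, -1.2, 1.5, -1.8]   # spread grows
steady = [0.5, -0.5, 0.4, -0.6, 0.5, -0.4, 0.6, -0.5]       # spread stable

ratio_fan = variance_ratio(x, fan_shaped)
ratio_steady = variance_ratio(x, steady)
print(round(ratio_fan, 1), round(ratio_steady, 1))
```

A ratio far from 1 points toward heteroskedasticity, while a ratio near 1 is what a (relatively) consistent spread would produce.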
If assumptions are violated
• Bias: Estimates and predictions might not be equal to the true value on average.
• Wrong uncertainty estimation: The standard errors could be off.
• Inefficient: Even if there were neither bias nor wrong standard errors, there might exist a more accurate method to perform estimation and prediction.