week 2
linear regression
examining the association between at least 1 predictor variable and an outcome variable
Y = a + bX
X relates to Y in an additive, linear way
line of best fit
key components of linear regression
must be able to measure the predictor and outcome variables
eg. predicting attendance from level of motivation
outcome should be continuous and measured on interval/ratio scale
linear vs multiple
linear = 1 predictor
multiple = 2+ predictors
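a minimal sketch of the two model types using R's lm(); the variable names and simulated data are illustrative, not from the notes:

```r
# illustrative simulated data (names are assumptions)
set.seed(1)
hours      <- runif(50, 0, 10)     # predictor 1
attendance <- runif(50, 50, 100)   # predictor 2
score      <- 40 + 3 * hours + 0.2 * attendance + rnorm(50, sd = 5)

simple_model   <- lm(score ~ hours)                # linear: 1 predictor
multiple_model <- lm(score ~ hours + attendance)   # multiple: 2+ predictors

coef(simple_model)    # intercept (a) plus one slope (b)
coef(multiple_model)  # intercept plus one slope per predictor
```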
why use multiple linear regression?
lets us acknowledge and statistically control for the contribution of other variables when examining our measure of interest
can look for a relationship between a predictor and the outcome while accounting for other factors that might affect it
allows us to know:
what predictors are associated with the outcome variable
to what extent they predict the outcome, while controlling for the other predictor variables
how to predict scores on the outcome measure if scores on all predictors are known
line of best fit
Y = a + bX
Y = outcome/DV
a = intercept
b = slope
X = specific value on predictor/IV
the regression line won’t pass through every data point exactly, because the data contain variability
an e term is added for the error (the residuals)
Y = a + bX + e
the intercept
a
value of outcome variable when predictor is 0
where the line of best fit intercepts the Y axis
slope
b
the change in Y for each one-unit increase in X
gradient of the line
a straight line is drawn through the data so that the sum of squared residuals is minimal
most regressions use an ordinary least squares (OLS) approach
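the OLS idea can be checked by hand: with one predictor, the least-squares slope is cov(x, y) / var(x) and the intercept is mean(y) − b·mean(x). a small R sketch with simulated data (names are illustrative):

```r
# hand-rolled OLS vs lm(), to show they agree (illustrative data)
set.seed(2)
x <- rnorm(100)
y <- 5 + 2 * x + rnorm(100)

b <- cov(x, y) / var(x)     # slope that minimises the sum of squared residuals
a <- mean(y) - b * mean(x)  # intercept

fit <- lm(y ~ x)
c(a = a, b = b)
coef(fit)                   # same values, to floating-point precision
```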
multiple linear regression - f-test
used to evaluate the overall significance of the regression model
compares our model against a baseline model
which assumes none of the predictors have an effect on the outcome
aka a model with just the intercept
tells us if the full model explains significantly more variance in the outcome than a model that explains nothing
should expect a decent model to outperform a model with no predictors
if the p-value for the test is <0.05, the overall model is significant
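a sketch of reading the F-test from R's summary() output, with simulated data (the variable names are assumptions); the p-value is recomputed from the stored F statistic and its degrees of freedom:

```r
set.seed(3)
x1 <- rnorm(80)
x2 <- rnorm(80)
y  <- 1 + 0.8 * x1 + 0.5 * x2 + rnorm(80)

fit   <- lm(y ~ x1 + x2)
fstat <- summary(fit)$fstatistic   # F value, numerator df, denominator df

# p-value for the overall model vs the intercept-only baseline
p_value <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
p_value   # < 0.05 here, so the model beats the intercept-only baseline
```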
R2
statistic used for evaluating model fit in linear regression
tells us the proportion of variance in the DV that is accounted for by the model
R2 of 0
model explains none of the variation in the DV
R2 of 1
model perfectly explains the variation in the DV
R2 of 0.7
70% of variation in the DV is explained by the model
the other 30% is due to factors not captured by the model eg. error/unmeasured variables
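R2 can be pulled from summary() or computed directly as 1 minus the residual sum of squares over the total sum of squares; a small R sketch with simulated data (names are illustrative):

```r
set.seed(4)
x <- rnorm(60)
y <- 2 + 1.5 * x + rnorm(60)

fit <- lm(y ~ x)
r2  <- summary(fit)$r.squared

# R2 = 1 - (residual variation / total variation in y)
manual_r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
all.equal(r2, manual_r2)   # TRUE
```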
adjusted R2
modified version that adjusts for the number of predictors in the model
takes number of IVs and sample size into account
contrasts with R2, which can artificially inflate as more variables are added
unlike R2, the adjusted value can be negative, so it can’t be cleanly read as a percentage of variance explained
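a sketch of the inflation problem: adding a pure-noise predictor can never lower R2, but adjusted R2 penalises it. simulated data; the junk variable is illustrative:

```r
set.seed(5)
x    <- rnorm(100)
junk <- rnorm(100)   # pure noise, unrelated to the outcome
y    <- 3 + 2 * x + rnorm(100)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + junk)

summary(fit1)$r.squared      # R2 for the honest model
summary(fit2)$r.squared      # never lower than fit1's: R2 can only rise
summary(fit2)$adj.r.squared  # penalised for the extra, useless predictor
```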
predictor variables
each one has a:
p-value - significance
estimate - relationship between the predictor and outcome
unstandardised coefficients
produced by R’s lm() function
the original output from a regression analysis
expressed in the same units as the variables in the model
represents the actual change in the DV for each unit increase in the IV
eg. when predicting exam scores based on hours studied and attendance, the coefficient may indicate that for each hour studied, the final exam score increases by 0.5 points
standardised coefficients
expressed in terms of SDs, rather than original units
lets us compare the relative impact of different variables, even when they’re measured on different scales
indicate how many SDs the DV will change for a one SD change in the IV
holding other variables constant
the predictor with the largest absolute standardised coefficient has the most influence
compare the magnitudes; ignore the sign
to compute standardised coefficients
both IV and DV are standardised
transformed to have a mean of 0 and SD of 1
typically done by subtracting the mean of the variable from each value, then dividing by the SD
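one common way to get standardised coefficients in R is to standardise each variable with scale() (which centres to mean 0 and divides by the SD) before fitting; the simulated data below reuse the exam-score example names from the notes:

```r
set.seed(6)
hours      <- runif(50, 0, 10)
attendance <- runif(50, 50, 100)
score      <- 40 + 3 * hours + 0.2 * attendance + rnorm(50, sd = 5)

# scale() transforms each variable to mean 0, SD 1
std_fit <- lm(scale(score) ~ scale(hours) + scale(attendance))
coef(std_fit)   # slopes are now in SD units and directly comparable
```

with every variable standardised, the intercept is (up to rounding) zero, so only the slopes matter for comparison.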
relationship between predictors and outcome variable
given the slope and intercept, we can predict a specific value of Y
using the model’s full-precision coefficients (rather than rounded, hand-copied ones) keeps the estimate accurate
so use the predict() function rather than calculating by hand
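a sketch of predict() against the manual a + bX calculation, with simulated data (names are illustrative); when the full-precision coefficients are used, the two agree exactly:

```r
set.seed(7)
hours <- runif(40, 0, 10)
score <- 50 + 4 * hours + rnorm(40, sd = 3)

fit <- lm(score ~ hours)

# predicted score for a student who studied 6 hours
new_data <- data.frame(hours = 6)
predict(fit, newdata = new_data)

# same value as the manual a + b * X calculation
sum(coef(fit) * c(1, 6))
```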