what is regression
a statistical technique that allows researchers to predict a dependent variable (DV) from one or more independent variables (IVs)
what does a regression model show
whether, and by how much, changes observed in the DV are associated with changes in one or more IVs, by determining a best fit line and looking at how the data are dispersed around it
linear regression
analyzes the linear relationship between one or more IVs and one DV (interval or ratio level)
2 types of linear regression
simple regression and multiple regression
simple regression
type of linear regression
One independent variable (simple linear regression)
multiple regression
type of linear regression
used when there are multiple independent variables (multiple linear regression)
linearity
the relationship between the IV and the DV is linear
assumptions of linear regression
linearity, independence, normality, equal variance (LINE)
simple linear regression
one continuous outcome (DV) at the interval or ratio level
one continuous or categorical predictor (IV)
B1 in regression formula
the slope - the amount of change in Y for each unit change in X
what does a steep slope indicate
a strong relationship, because X is quickly affecting Y (a large change in Y for each unit change in X)
positive slope (B1)
as the predictor increases the dependent variable also increases
negative slope (B1)
as the predictor increases the dependent variable decreases
slope (B1) magnitude
the strength of the relationship between the predictor and the outcome
a larger coefficient means a stronger impact on the dependent variable
slope (B1) units
the coefficient is expressed in units of the dependent variable per one unit change in the predictor (IV)
mathematical slope formula
Y = B0 + B1X: assumes no variability or error; the relationship between X and Y is fixed
statistical slope formula
Y = B0 + B1X + e: includes an error term (e)
properties of a best fit line
a straight line that minimizes the discrepancy between the observed data points and the predicted values
passes through the mean of X and Y
residuals sum to zero
residual
the difference between where a data point actually falls and where the regression line predicts it will fall (the vertical distance from each point to the line)
residual formula
residual = Yi - Yhat_i
Yi - the actual value of the outcome (DV) for the ith observation
Yhat_i - the predicted value of the outcome for the ith observation, based on the regression line
how do we determine the best fit line
by going through the ordinary least squares (OLS) method
how does the ordinary least squares (OLS) method work
it minimizes the sum of the squared residuals (SSR)
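To make the OLS cards concrete, here is a minimal sketch of a simple linear regression fit using the closed-form least squares solution. The function name `fit_simple_ols` and the `hours`/`score` data are made up for illustration. It also checks two best fit line properties: the residuals sum to zero and the line passes through the mean of X and Y.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1), the intercept and slope that minimize the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = co-variation of X and Y divided by the variation of X
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # The intercept is chosen so the line passes through (mean of X, mean of Y)
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up illustrative data (assumption, not from the source)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_ols(hours, score)
predicted = [b0 + b1 * xi for xi in hours]
# residual = Yi - Yhat_i for each observation
residuals = [yi - yhat for yi, yhat in zip(score, predicted)]
ssr = sum(r ** 2 for r in residuals)

print(f"b0={b0:.2f}, b1={b1:.2f}, SSR={ssr:.2f}")
print(f"sum of residuals is about {sum(residuals):.10f}")
```

Any other straight line through these points would give a larger SSR than the fitted one.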
multiple linear regression
deals with one continuous outcome (DV) and two or more predictors (IVs - continuous or categorical)
what does the B1 indicate in multiple linear regression
since there are multiple slopes, each slope indicates the amount of change in Y for each unit change in its X, holding the other predictors constant
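As a rough illustration of "holding the other predictors constant", the sketch below fits a regression with two predictors using the closed-form least squares solution for that case. The function name `fit_two_predictor_ols` is hypothetical, and the data are made up so that Y = 1 + 2*X1 + 3*X2 exactly, which lets the recovered slopes be checked by eye.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Return (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 by least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center each variable around its mean
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # det is zero when X1 and X2 are perfectly correlated
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up data generated from Y = 1 + 2*X1 + 3*X2 (assumption for illustration)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictor_ols(x1, x2, y)
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
```

Here the slope for X1 is read as: a one-unit increase in X1 changes Y by about 2 when X2 is held fixed.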
what does r2 = 0 mean in linear regression
the regression model explains none of the variance in Y
what does r2 = 1 mean in linear regression
the model explains all of the variance in Y
why do we want a higher r2
it generally indicates a better fit of the model to the data
what does r2 show in simple linear regression
how well X predicts the Y
what does r2 show in multiple linear regression
shows how well all Xs explain the Y
what is the adjusted r2
used in multiple regression
accounts for the number of predictors in the model so that we don't overestimate the model's explanatory power
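A minimal sketch of how r2 and adjusted r2 can be computed from observed and predicted values, assuming the standard definitions (r2 = 1 - SS_residual / SS_total, and the usual adjustment using n observations and k predictors). The function names and the data are made up for illustration.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in Y explained by the model."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize r2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up observed values and predictions from a one-predictor model (assumption)
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(observed), k=1)
print(f"r2={r2:.3f}, adjusted r2={adj_r2:.3f}")
```

The adjusted value comes out lower than the plain r2, and the gap grows as predictors are added without improving the fit.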
what does r2 indicate
the overall performance of the model, not significance. Even a high r2 may not be statistically significant
if the Confidence interval (CI) crosses zero is the test statistically significant
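NO (for a regression coefficient, where the null value is 0; for an odds ratio the null value is 1, so the check is whether the CI crosses 1)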
what is the DV value in logistic regression
since it's binary (yes/no), it only takes a value of 0 or 1, no matter the value of the IV
goal of logistic regression
to explain the probability of the event occurrence
how does logistic regression work
it uses a logit transformation, which ensures that the predicted probabilities lie between 0 and 1
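A small sketch of the logit idea, assuming the usual definitions: the logit is the log of the odds, and its inverse turns any linear predictor B0 + B1*X into a probability between 0 and 1. The coefficients `b0` and `b1` below are hypothetical values chosen only for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Map any real-valued linear predictor z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients (assumption, not estimated from data)
b0, b1 = -3.0, 0.8

probabilities = [inverse_logit(b0 + b1 * x) for x in range(0, 11)]
for x, p in zip(range(0, 11), probabilities):
    print(f"x={x:2d}  predicted probability={p:.3f}")
```

However large or small the linear predictor gets, the predicted probability stays strictly between 0 and 1.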
Odds ratio (OR) in logistic regression
compares the odds of an event occurring between two groups or for different values of a predictor
odds
compares the probability of an event happening (P) to the probability of it not happening (1-P)
odds formula
odds = P / (1 - P), where P is the probability of the event happening and 1 - P is the probability of it not happening
odds ratio formula
odds of event in group 1/odds of event in group 2
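The odds and odds ratio formulas can be sketched in a few lines. The function names and the two group probabilities are made up for illustration.

```python
def odds(p):
    """Odds of an event: P / (1 - P)."""
    return p / (1 - p)


def odds_ratio(p_group1, p_group2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return odds(p_group1) / odds(p_group2)


# Hypothetical event probabilities in two groups (assumption)
p_exposed = 0.8
p_unexposed = 0.5

print(f"odds (exposed)   = {odds(p_exposed):.2f}")
print(f"odds (unexposed) = {odds(p_unexposed):.2f}")
print(f"odds ratio       = {odds_ratio(p_exposed, p_unexposed):.2f}")
```

With these numbers the exposed group's odds are about 4 to 1 and the unexposed group's are 1 to 1, so the odds ratio is about 4.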
what gives the odds ratio
the exponential of the coefficient Bi (shown as Exp(B))
OR > 1
the odds of the event increase as the predictor increases
increased odds of developing the outcome (DV) given exposure
OR = 1
the predictor has no effect on the odds of the event
OR < 1
the odds of the event decrease as the predictor increases
decreased odds of developing the outcome (DV) given exposure
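A short sketch tying the last few cards together: exponentiating a logistic regression coefficient B gives the odds ratio, which is then read as above, equal to, or below 1. The helper name `interpret_odds_ratio` and the coefficient values are hypothetical.

```python
import math


def interpret_odds_ratio(b, tol=1e-9):
    """Convert a logistic coefficient B to an odds ratio and describe its direction."""
    or_value = math.exp(b)
    if abs(or_value - 1) <= tol:
        direction = "no effect on the odds"
    elif or_value > 1:
        direction = "odds increase as the predictor increases"
    else:
        direction = "odds decrease as the predictor increases"
    return or_value, direction


# Hypothetical coefficients (assumption): positive, zero, and negative B
for b in (math.log(2), 0.0, -math.log(2)):
    or_value, direction = interpret_odds_ratio(b)
    print(f"B={b:+.3f}  Exp(B)={or_value:.2f}  -> {direction}")
```

A positive B always gives OR > 1, B = 0 gives OR = 1, and a negative B gives OR < 1.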
multinomial logistic regression
used when the dependent variable is a categorical variable with more than two categories
what can we use to determine significance
p value or confidence interval
multicollinearity
occurs in regression analysis when two or more IVs (predictors) are highly correlated with each other
what happens if two or more IVs (predictors) are highly correlated
it becomes difficult to separate their individual effects on the outcome, which makes the interpretation of the coefficients less reliable
this is multicollinearity
what do we use to deal with multicollinearity
VIF (variance inflation factor)
it measures how much the variance of a coefficient estimate is inflated due to multicollinearity
what would a high VIF (variance inflation factor) indicate
that predictor is highly correlated with the other predictors, so we would need to take out some variables or otherwise change the model
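A minimal sketch of the VIF idea for the special case of exactly two predictors, where VIF = 1 / (1 - r^2) and r is the correlation between the two predictors. With more predictors, r^2 would be replaced by the R^2 from regressing one predictor on all the others. The function names and data are made up for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(a, b)
    return 1 / (1 - r ** 2)


# Made-up predictors (assumption): x2 tracks x1 fairly closely, x3 does not
x1 = [-2, -1, 0, 1, 2]
x2 = [-1, -2, 1, 0, 3]
x3 = [4, 1, 0, 1, 4]

vif_correlated = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors(x1, x3)
print(f"VIF with a correlated predictor:    {vif_correlated:.2f}")
print(f"VIF with an uncorrelated predictor: {vif_uncorrelated:.2f}")
```

A VIF of 1 means no inflation at all, and the value climbs as the predictors become more strongly correlated.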