Multiple Regression Formula
y = b0 + b1x1 + b2x2 + e
MR Formula — y =
outcome
MR Formula — e =
error/residuals
MR Formula — x1 =
predictor 1
MR Formula — x2 =
predictor 2
MR Formula — b1 (and b2) =
partial regression coefficient
MR Formula — b0 =
y-intercept
value of y when all predictors (x1 and x2) are 0
The Best Fitting Line
intersection of plane with y-axis = intercept (b0)
slope of the plane with respect to x1 defines b1
slope of the plane with respect to x2 defines b2
Use of Residual Plots
graph residuals against predicted values
if the assumptions are tenable, then the residuals should scatter randomly about a horizontal line of 0
any systematic pattern or clustering of residuals suggests a violation of the assumptions (see the sketch below)
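A minimal sketch of a residual plot in Python; the simulated data, column names (x1, x2, y), and library choices (statsmodels, matplotlib) are illustrative assumptions, not part of the original material.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical data: two predictors and a noisy linear outcome
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 1.0 + 0.5 * df["x1"] + 2.0 * df["x2"] + rng.normal(size=100)

# Fit y = b0 + b1*x1 + b2*x2 + e
X = sm.add_constant(df[["x1", "x2"]])
model = sm.OLS(df["y"], X).fit()

# Residuals vs. predicted values: should scatter randomly about the 0 line
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```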
Hypotheses
Overall Model
Specific Predictors
Comparison Among Predictors
Hypotheses — Overall Model
testing all predictor variables:
examines whether a model including all of the predictor variables is better than the model with none of the predictor variables
y = b0 + b1x1 + b2x2 + e vs. y = b0 + e
the F reported in MR tests this hypothesis
H0: R2 = 0
R2
proportion of variance accounted for by the specified model
R
the multiple correlation coefficient: the correlation between the observed values of y and the values of y predicted by the model (large values indicate a strong correlation between observed and predicted values)
if R = 1, the model perfectly predicts the observed data (R is a gauge of how well the model predicts)
R2
represents the amount of variation in the outcome variable (y) that is accounted for by the model
Adjusted R2
tells us the amount of variation in the outcome variable that would be accounted for if the model had been derived from the population from which the sample was taken
considers the number of predictors in the model and penalizes excessive variables, providing a more accurate measure of the model’s goodness of fit, especially with multiple predictors
adjusted R2 gives a more conservative, and therefore more accurate, estimate
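Continuing the hypothetical sketch above (same simulated df and fitted model), the overall-model statistics can be read directly from the statsmodels results; the adjusted R2 formula is checked in the last line for reference.

```python
# Continuing the sketch above (same hypothetical df and fitted model)
print(model.fvalue, model.f_pvalue)   # F test of the overall model (H0: R2 = 0)
print(model.rsquared)                 # R2: variance in y accounted for by the model
print(np.sqrt(model.rsquared))        # R: correlation between observed and predicted y

# Adjusted R2 penalizes extra predictors: 1 - (1 - R2)(n - 1)/(n - k - 1)
n, k = len(df), 2
print(model.rsquared_adj, 1 - (1 - model.rsquared) * (n - 1) / (n - k - 1))
```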
Hypotheses — Specific Predictors
testing of individual predictor variables
examines whether the inclusion of each predictor variable improves prediction
H0: b1 = 0
H0: b2 = 0
Partial Regression Coefficient
can be thought of as the predicted change in y for each 1-unit change in the independent variable when the values of all other independent variables in the model are held constant
b1 = .50 and b2 = 2.00
holding x1 constant, there is on average a 2.00-point increase in the outcome for every 1-unit increase in x2
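Continuing the same hypothetical sketch, the partial regression coefficients and the t tests of the individual-predictor hypotheses (H0: b1 = 0, H0: b2 = 0) come straight from the fitted model.

```python
# Continuing the sketch above: individual predictors
print(model.params)    # b0, b1, b2; each slope is the change in y per 1-unit change
                       # in that predictor, holding the other predictor constant
print(model.pvalues)   # t tests of H0: b1 = 0 and H0: b2 = 0 (and the intercept)
```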
Confidence Intervals
for any partial regression coefficient (slope) we can calculate a confidence interval
the CI for a regression coefficient is calculated and interpreted in the same way as it is in simple linear regression
the interpretation is that we’re 95% confident that the population regression coefficient falls within that range
when a value of 0 falls within the range, the statistical test will not be significant (fail to reject the null)
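Continuing the sketch, 95% confidence intervals for each coefficient are available from conf_int(); the link to significance is noted in the comments.

```python
# Continuing the sketch above: 95% CIs for each regression coefficient
print(model.conf_int(alpha=0.05))   # one row per coefficient: [lower, upper]
# If 0 falls inside a coefficient's interval, that predictor's test is not
# significant at the .05 level (fail to reject H0: bj = 0).
```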
Hypotheses — Comparison Among Predictors
standardized partial regression coefficients: the partial regression coefficients obtained after the IVs and the DV have been standardized; because they share a common scale, they allow comparison among predictors
H0: b1 = b2
H0: b2 = b3
H0: b1 = b3
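One way to obtain standardized partial regression coefficients (a sketch of one common approach, not the only one): z-score the IVs and the DV, refit, and compare the resulting slopes.

```python
# Continuing the sketch above: standardized (beta) coefficients
z = (df - df.mean()) / df.std()          # z-score all variables
Xz = sm.add_constant(z[["x1", "x2"]])
beta_model = sm.OLS(z["y"], Xz).fit()
print(beta_model.params)                 # betas are on a common scale, so predictors can be compared
```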
Assumptions of Multiple Regression
independence of residuals
assumptions of linearity
homoscedasticity
normal distribution: multivariate normality
Assumptions of Multiple Regression — Independence of Residuals
errors are independent of each other
to test:
plot residuals and look for patterns
Assumptions of Multiple Regression — Linearity
linear relationship between each predictor and the outcome (and no perfect linear relationship among the predictors)
to test:
check correlations (predictors individually with the outcome)
create scatterplot to compare IV and DV
residual plot
Assumptions of Multiple Regression — Homoscedasticity
variance of errors should be similar across values
to test:
plot data (cone shape indicates an issue)
Assumptions of Multiple Regression — Multivariate Normality
each variable is normally distributed on its own, and the variables are also jointly normally distributed when considered together
Issues in Multiple Regression
number of predictors
multicollinearity
outliers and influential cases
sample size
Issues in Multiple Regression — Number of Predictors
too many predictors can increase the chances of multicollinearity, add noise, and make it difficult to determine what is predicting what
overfitting
too many predictors can lead to an inflated R2
Overfitting
if you throw enough predictors in, something will appear to stick, but you may just be fitting the noise in the sample
Issues in Multiple Regression — Multicollinearity
redundancy among predictors
when 2 predictors are 100% related they have perfect collinearity (this makes it impossible to get an accurate estimate of the variance in the outcome accounted for by each predictor)
can be tested using VIF or tolerance
Testing Collinearity
if
VIF > 10
Tolerance < 0.1
then there is an issue with multicollinearity
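A sketch of the VIF/tolerance check using statsmodels, continuing the same hypothetical df; tolerance is simply 1/VIF.

```python
# Continuing the sketch above: collinearity diagnostics for each predictor
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["x1", "x2"]])
for i, name in enumerate(["x1", "x2"], start=1):   # column 0 is the constant
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF =", vif, "tolerance =", 1 / vif)
# Rule of thumb from above: VIF > 10 or tolerance < 0.1 signals multicollinearity
```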
Issues in Multiple Regression — Outliers and Influential Cases
can pull the regression line
Issues in Multiple Regression — Sample Size
if the sample size is too small, it is harder to find an effect even when one is there
if you do find an effect, it may be hard to generalize to a larger population
Determining Ideal Sample Size
overall fit: 50 + 8(k)
individual fit: 104 + (k)
k = number of predictors
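e.g., with k = 3 predictors: testing overall fit needs at least 50 + 8(3) = 74 cases, and testing individual predictors needs at least 104 + 3 = 107 cases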
Categorical Variables in Multiple Regression
variables whose levels have no meaningful order; different levels do not reflect equal distances from one another
e.g., religion, ethnicity, gender, academic major
Representing Categorical Variables
use dummy coding
Categorical Variables with only 2 groups/levels in Simple Regression
the constant = mean of y for the group designated as 0
the regression coefficient = difference between the 2 means
Categorical Variables
any categorical variable can be included in a regression analysis
if a categorical variable has more than 2 levels, the number of predictor variables needed will always be:
# of groups - 1
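A sketch of dummy coding with pandas, using a hypothetical 3-level categorical predictor named "major"; drop_first=True keeps # of groups - 1 dummy columns, with the dropped level serving as the reference group (coded 0 on every dummy).

```python
import pandas as pd

# Hypothetical 3-level categorical predictor
data = pd.DataFrame({"major": ["psych", "bio", "math", "psych", "bio"]})

# drop_first=True yields (# of groups - 1) = 2 dummy columns;
# the dropped level ("bio", first in sorted order) is the reference group
dummies = pd.get_dummies(data, columns=["major"], drop_first=True)
print(dummies)
```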
Methods of Multiple Regression
Simultaneous
Hierarchical
Stepwise (usually avoided because the results are hard to replicate)
forward
bidirectional
backward
Strategies for Selecting Predictor Variables — Hierarchical Analyses
order determined by
causal priority
research relevance
Hierarchical Multiple Regression
specifies a series of equations (a priori)
Hierarchical Multiple Regression — Hypotheses
overall model
M1: R2 = 0
M2: R2 = 0
comparison among predictors
M1: b1 = b2
M2: b1 = b2
specific predictors
M1: b1 = 0
M1: b2 = 0
M2: b1 = 0
M2: b2 = 0
difference between models
ΔF = 0 (the predictors added in the later model do not improve prediction)
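A sketch of the model-comparison (ΔF) test using statsmodels' compare_f_test, with the same hypothetical df as above; here Model 1 enters x1 only and Model 2 adds x2, an illustrative ordering rather than a prescribed one.

```python
# Continuing the sketch above: hierarchical (nested) model comparison
m1 = sm.OLS(df["y"], sm.add_constant(df[["x1"]])).fit()        # Model 1: x1 only
m2 = sm.OLS(df["y"], sm.add_constant(df[["x1", "x2"]])).fit()  # Model 2: adds x2

f_change, p_value, df_diff = m2.compare_f_test(m1)   # tests whether the added predictor improves the model
print(m2.rsquared - m1.rsquared, f_change, p_value)  # R2 change and its F test
```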