Multiple Regression

40 Terms

1

Multiple Regression Formula

y = b0 + b1x1 + b2x2 + e
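
A minimal sketch of fitting this kind of model in Python with statsmodels; the data and the coefficient values (b0 = 1.0, b1 = 0.5, b2 = 2.0) are made up for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)                          # predictor 1
    x2 = rng.normal(size=n)                          # predictor 2
    e = rng.normal(size=n)                           # error/residuals
    y = 1.0 + 0.5 * x1 + 2.0 * x2 + e                # outcome built from b0, b1, b2

    X = sm.add_constant(np.column_stack([x1, x2]))   # adds the intercept (b0) column
    model = sm.OLS(y, X).fit()
    print(model.params)                              # estimated b0, b1, b2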

2

MR Formula — y =

  • outcome

3

MR Formula — e =

  • error/residuals

4

MR Formula — x1 =

  • predictor 1

5

MR Formula — x2 =

  • predictor 2

6

MR Formula — b1 (and b2) =

  • partial regression coefficient

7

MR Formula — b0 =

  • y-intercept

  • value of y when all predictors are 0

8

The Best Fitting Line

  • intersection of plane with y-axis = intercept (b0)

  • slope of the plane with respect to x1 defines b1

  • slope of the plane with respect to x2 defines b2

9

Use of Residual Plots

  • graph residuals against predicted values

  • if the assumptions are tenable, the residuals should scatter randomly around a horizontal line at 0

  • any systematic pattern or clustering of residuals suggests a violation of the assumptions
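
A minimal sketch of such a residual plot, assuming the fitted statsmodels result named model from the earlier sketch:

    import matplotlib.pyplot as plt

    fitted = model.fittedvalues     # predicted values of y
    resid = model.resid             # residuals (observed - predicted)

    plt.scatter(fitted, resid)
    plt.axhline(0, color="black")   # horizontal line at 0
    plt.xlabel("Predicted values")
    plt.ylabel("Residuals")
    plt.show()                      # random scatter around 0 supports the assumptions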

10

Hypotheses

  • Overall Model

  • Specific Predictors

  • Comparison Among Predictors

11

Hypotheses — Overall Model

  • testing all predictor variables:

    • examines whether a model including all of the predictor variables is better than the model with none of the predictor variables

      • y = b0 + b1x1 + b2x2 + e vs. y = b0 + e

  • the F reported in MR tests this hypothesis

    • H0: r2 = 0
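
In statsmodels, the overall-model F test can be read directly off a fitted result; a short sketch assuming the model object from the first example:

    print(model.fvalue)      # F statistic for the overall model
    print(model.f_pvalue)    # its p-value: small values reject H0 (model explains no variance)
    print(model.rsquared)    # R2 for the full model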

12

r2

  • proportion of variance accounted for by the specified model

13

R

  • the multiple correlation coefficient: the correlation between the observed values of y and the values of y predicted by the model (large values represent a large correlation between observed and predicted values)

    • if R = 1: the model perfectly predicts the observed data (gauge of how well the model predicts)

14

R2

  • represents the amount of variation in the outcome variable (y) that is accounted for by the model

15

Adjusted R2

  • tells us the amount of variation in the outcome variable that would be accounted for if the model had been derived from the population from which the sample was taken

  • considers the number of predictors in the model and penalizes excessive variables, providing a more accurate measure of the model’s goodness of fit, especially with multiple predictors

    • adjusted R2 gives a more conservative (and typically more accurate) estimate of the variance explained
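
A small sketch of the standard adjustment (not written on the card), which penalizes R2 by the number of predictors k relative to the sample size n; statsmodels reports the same quantity as model.rsquared_adj:

    def adjusted_r2(r2, n, k):
        # adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)

    print(adjusted_r2(r2=0.40, n=100, k=2))   # about 0.388, slightly below the raw R2 of 0.40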

16

Hypotheses — Specific Predictors

  • testing of individual predictor variables

    • examines whether the inclusion of each predictor variable improves prediction

      • H0: b1 = 0

      • H0: b2 = 0

17

Partial Regression Coefficient 

  • can be thought of as the predicted change in y for each one-unit change in an independent variable when the values of all other independent variables in the model are held constant

    • e.g., b1 = .50 and b2 = 2.00

      • holding x1 constant, there is on average a 2.00 point increase in the outcome for every 1 unit increase in x2

18

Confidence Intervals

  • for any partial regression coefficient (slope) we can calculate a confidence interval

  • the CI for a regression coefficient is calculated and interpreted in the same way as it is in simple linear regression

  • the interpretation is that we’re 95% confident that the population regression coefficient (slope) falls within that range

  • when a value of 0 falls within the range, the statistical test will not be significant (fail to reject the null)
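
In statsmodels the intervals come from conf_int(); a short sketch assuming the fitted model from the first example:

    ci = model.conf_int(alpha=0.05)   # 95% CI for b0, b1, b2 (one row per coefficient)
    print(ci)
    # a coefficient whose interval contains 0 will not be statistically significant at the .05 level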

19

Hypotheses — Comparison Among Predictors

  • partial standardized regression coefficients: the partial regression coefficients obtained after the IVs and DV have been standardized; these allow for comparison among predictors

    • H0: b1 = b2

    • H0: b2 = b3

    • H0: b1 = b3
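
One common way to obtain standardized coefficients is to z-score the variables and refit the model; a sketch assuming a hypothetical pandas DataFrame df with numeric columns y, x1, x2:

    import statsmodels.formula.api as smf

    # df is an assumed DataFrame holding the DV (y) and the IVs (x1, x2)
    z = (df - df.mean()) / df.std()                 # standardize the DV and all IVs
    std_fit = smf.ols("y ~ x1 + x2", data=z).fit()
    print(std_fit.params)                           # standardized (beta) weights, comparable across predictors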

20

Assumptions of Multiple Regression

  • independence of residuals

  • assumptions of linearity

  • homoscedasticity

  • normal distribution: multivariate normality

21

Assumptions of Multiple Regression — Independence of Residuals

  • errors are independent of each other

  • to test:

    • plot residuals and look for patterns

22

Assumptions of Multiple Regression — Linearity

  • linear relationship between predictors and outcome (no linear relationship between predictors)

  • to test:

    • check correlations (predictors individually with the outcome)

    • create scatterplot to compare IV and DV

    • residual plot

23

Assumptions of Multiple Regression — Homoscedasticity

  • variance of the errors should be similar across all values of the predicted outcome

  • to test:

    • plot residuals against predicted values (a cone/funnel shape indicates an issue)

24

Assumptions of Multiple Regression — Multivariate Normality

  • each variable is normally distributed on its own, and the variables are jointly normally distributed when taken together

25

Issues in Multiple Regression

  • number of predictors

  • multicollinearity

  • outliers and influential cases

  • sample size

26

Issues in Multiple Regression — Number of Predictors

  • too many predictors increase the chances of multicollinearity, add noise, and can make it difficult to determine what is predicting what

  • overfitting

  • too many predictors can lead to an inflated R2

27

Overfitting

  • if you throw in many predictors, something will appear to stick, but you may just be fitting the noise in the sample rather than a real relationship

28

Issues in Multiple Regression — Multicollinearity

  • redundancy among predictors

  • when 2 predictors are 100% related they have perfect collinearity (this makes it impossible to get an accurate measure of the variance in the outcome attributable to each predictor)

  • can be tested using VIF or tolerance

29

Testing Collinearity

  • if 

    • VIF > 10

    • Tolerance < 0.1

  • then there is an issue with multicollinearity
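
A sketch of checking this with statsmodels, assuming X is the predictor matrix with a constant column built in the first example:

    from statsmodels.stats.outliers_influence import variance_inflation_factor

    for i in range(1, X.shape[1]):                   # skip the constant column at index 0
        vif = variance_inflation_factor(X, i)
        print(f"predictor {i}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")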

30

Issues in Multiple Regression — Outliers and Influential Cases

  • extreme cases can pull the regression line toward them, distorting the estimated coefficients

31

Issues in Multiple Regression — Sample Size

  • if the sample size is too small, it is more difficult to detect an effect even if one exists

    • if an effect is found, it may be hard to generalize it to a larger population

32

Determining Ideal Sample Size

  • overall fit: 50 + 8(k)

  • individual fit: 104 + k

    • k = number of predictors
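
For example, with k = 4 predictors these rules of thumb give 50 + 8(4) = 82 cases for testing overall fit and 104 + 4 = 108 cases for testing individual predictors; when both are of interest, the larger value is typically used.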

33

Categorical Variables in Multiple Regression

  • variables where no meaningful order can be defined for the levels and different levels do not reflect equal distances from one another

    • e.g., religion, ethnicity, gender, academic major

34

Representing Categorical Variables

  • use dummy coding

35

Categorical Variables with only 2 groups/levels in Simple Regression

  • the constant = mean of y for the group designated as 0

  • the regression coefficient = difference between the 2 means
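
A made-up example: if the group coded 0 has a mean of 10 on y and the group coded 1 has a mean of 14, the fitted equation is y = 10 + 4x, where the constant 10 is the mean of the 0 group and the coefficient 4 is the difference between the two means.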

36

Categorical Variables

  • any categorical variable can be included in a regression analysis

  • if a categorical variable has more than 2 levels, the number of dummy (predictor) variables needed will always be:

    • # of groups - 1
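
A short sketch of dummy coding a hypothetical 3-level variable (major) with pandas, which produces # of groups - 1 = 2 dummy columns:

    import pandas as pd

    df = pd.DataFrame({"major": ["psych", "bio", "math", "psych", "bio"]})
    dummies = pd.get_dummies(df["major"], drop_first=True)   # one level is dropped as the reference group
    print(dummies)   # two indicator columns represent the three groups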

37

Methods of Multiple Regression

  • Simultaneous

  • Hierarchical

  • Stepwise (usually avoided because results are hard to replicate)

    • forward

    • bidirectional

    • backward

38

Strategies for Selecting Predictor Variables — Hierarchical Analyses

  • order determined by

    • causal priority

    • research relevance

39

Hierarchical Multiple Regression

  • specifies a series of regression equations in advance (a priori), entering predictors in blocks

40

Hierarchical Multiple Regression — Hypotheses

  • overall model

    • H0: R2 for Model 1 = 0

    • H0: R2 for Model 2 = 0

  • comparison among predictors

    • M1: b1 = b2

    • M2: b1 = b2

  • specific predictors

    • M1: b1 = 0

    • M1: b2 = 0

    • M2: b1 = 0

    • M2: b2 = 0

  • difference between models

    • H0: delta R2 = 0 (no improvement from Model 1 to Model 2; tested with the F-change statistic, deltaF)
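
A sketch of testing the difference between two hierarchical models in statsmodels, assuming a hypothetical DataFrame df with columns y, x1, x2; the F-change test here corresponds to the difference-between-models hypothesis:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    m1 = smf.ols("y ~ x1", data=df).fit()        # Model 1
    m2 = smf.ols("y ~ x1 + x2", data=df).fit()   # Model 2 adds x2
    print(m2.rsquared - m1.rsquared)             # delta R2: additional variance explained by Model 2
    print(sm.stats.anova_lm(m1, m2))             # F-change test comparing the nested models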