Linear regression


33 Terms

1

the regression model

Y=a + b1X1 + b2X2 + … + bnXn + e

a - constant (intercept)

bn - regression coefficients (they represent the change in the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant)
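A minimal Python sketch (coefficient and predictor values are assumed for illustration) of how the model produces a predicted value:

```python
# Hypothetical illustration of Y = a + b1X1 + ... + bnXn; the error term e
# is the part of Y the model does not capture, so the prediction omits it.
a = 2.0                   # constant (intercept): assumed value
b = [0.5, -1.2, 3.0]      # regression coefficients b1..b3: assumed values
x = [4.0, 1.0, 2.0]       # one case's scores on X1..X3

y_hat = a + sum(bi * xi for bi, xi in zip(b, x))
print(round(y_hat, 2))    # 2.0 + 2.0 - 1.2 + 6.0 = 8.8
```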

2

R2

the proportion of variance in the dependent variable explained by the model

  • R2=SSR/SST
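A toy Python sketch (made-up data) that fits a simple least-squares line and then computes R2 = SSR/SST:

```python
# Toy data (assumed). With least-squares fitted values, SST = SSR + SSE,
# so R2 = SSR/SST is the proportion of variance explained by the model.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 4.0]

x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)          # slope
a = y_bar - b * x_bar                             # intercept
y_hat = [a + b * xi for xi in x]                  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)      # regression (explained) SS
r_squared = ssr / sst
print(round(r_squared, 2))                        # 0.64
```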

3

multiple R

the correlation between the observed and model-predicted values of the dependent variable; quantifies the strength of the linear relationship between the criterion and the full set of predictors

4

Factors determining the SE of the model’s coefficients

  • worse model fit → larger residuals + standard errors

  • too many predictors or too few cases collected

  • large collinearity → larger standard errors

5

worse model fit 

when the model does not fit the data well (R2 is low), residuals are larger and, consequently, so are the standard errors of the coefficient estimates

6

collecting more cases relative to the number of predictors

can help improve the standard errors of the coefficient estimates

7

high collinearity

makes it challenging to isolate the individual effects of each predictor

8

standardized regression coefficients (beta coefficients/weights)

obtained when both the dependent and independent variables are standardized

→ larger the absolute value of the standardized coefficient, the more impact the corresponding predictor has

  • Beta = b*[(SD X)/(SD Y)]

  • Y = a + b1X1 + b2X2 + … + bkXk

  • Z(Y) = beta1Z1 + beta2Z2 + … + betakZk
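A one-line Python sketch (all values assumed) of the Beta = b*[(SD X)/(SD Y)] conversion:

```python
# Converting an unstandardized coefficient b to a standardized beta weight.
b = 2.5        # unstandardized coefficient for one predictor (assumed)
sd_x = 4.0     # standard deviation of that predictor (assumed)
sd_y = 10.0    # standard deviation of the dependent variable (assumed)

beta = b * (sd_x / sd_y)
print(beta)    # 1.0
```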

9

Collinearity

refers to high correlations between independent variables

  • inflates standard error

  • leads to multicollinearity issues

Solutions:

  • removing one or more of the highly correlated variables

  • combining highly correlated variables into a composite variable

10

Beta Weights

are useful and interpretable if the independent variables are noncollinear

  • depend on the method used to include the predictors in the equation

  • the size of the beta weights depends strictly on the variables in the equation

  • beta weights cannot be compared between different studies because they are very sensitive to sample statistics

11

Variance Inflation Factor

measures how much the variance of an estimated regression coefficient increases if your predictors are correlated

  • > 4 - collinearity is a concern

  • > 10 - severe collinearity

12

Tolerance

  • ~ 1 - little collinearity

  • ~ 0 - high collinearity

  • < 0.25 - multicollinearity might exist
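A small Python sketch (the R2 of predictor j regressed on the other predictors is assumed) linking the two diagnostics, using the standard relation VIF = 1 / tolerance:

```python
# Tolerance = 1 - R2_j, where R2_j comes from regressing predictor j on
# the remaining predictors; VIF is its reciprocal.
r2_j = 0.8                 # assumed R2 of predictor j on the others

tolerance = 1.0 - r2_j     # 0.2 < 0.25 -> multicollinearity might exist
vif = 1.0 / tolerance      # 5.0 > 4  -> collinearity is a concern
print(round(tolerance, 2), round(vif, 2))   # 0.2 5.0
```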

13

solutions for collinearity

  • removing one or more of the highly correlated variables

  • combining highly correlated variables

14

model building methods 

Standard Method (jamovi)

Backward Method 

Forward Method

Stepwise Method (SPSS)

All Possible Sub-Sets Method

Sequential/Hierarchical Regression (optional in jamovi)

Blockwise/Factor Scores Method 

15

noncollinear predictors

there is no difference in the results obtained with the different methods

16

standard method (forced entry)

including all available predictors in a regression model without considering their correlation with the criterion variable or with each other

17

standard method - advantages

  • no subjectivity (about excluding or including predictors)

  • comparability across samples (the same set of predictors is used in every analysis)

18

standard method - disadvantages

  • collinearity (if there is a high correlation among predictors, it can lead to inflated standard errors → challenging to identify the individual contribution of each predictor)

  • inclusion of irrelevant predictors 

  • reduced statistical power (presence of non-informative/collinear predictors) 

19

sequential/hierarchical method

systematic approach of entering predictor variables based on predefined criteria

→ the data analyst uses prior knowledge, domain expertise, or conceptual analyses to determine which predictors should be included in the regression model, and in which order

  • ΔR2

  • incremental F-statistics

20

ΔR2 = R2(k+1) - R2(k)

quantifies the improvement in the goodness of fit of a regression model when an additional predictor or group of predictors is added

  • assesses increase in the explained variance in the dependent variable due to the inclusion of new variables

21

incremental F-statistic 

  • assess the significance of adding a predictor or group

  • determine whether the inclusion of a set of predictors improves the overall fit of the model in a statistically significant way
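A Python sketch (all numbers assumed) of ΔR2 and the incremental F-statistic for a block of added predictors:

```python
# Incremental F-test sketch: does adding m predictors significantly improve
# the model?  F = (delta_R2 / m) / ((1 - R2_full) / (n - k - 1))
r2_reduced = 0.30   # R2 with the first block of predictors (assumed)
r2_full = 0.42      # R2 after adding the new block (assumed)
m = 2               # number of predictors added
n = 100             # sample size (assumed)
k = 5               # total predictors in the full model

delta_r2 = r2_full - r2_reduced
f_inc = (delta_r2 / m) / ((1.0 - r2_full) / (n - k - 1))
print(round(delta_r2, 2))   # 0.12 -> the new block explains 12% more variance
print(round(f_inc, 2))      # compare to an F(m, n - k - 1) distribution
```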

22

Incorporating categorical predictors into regression analysis

  • dummy coding (binary) - nominal categories

  • ordinal coding - assigning numerical values to predictors based on order 

23

the constant in dummy coding

the mean of the reference category
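A Python sketch (made-up group scores) showing why the constant equals the reference-category mean when a single dummy predictor is used:

```python
# With one dummy D (0 = reference group, 1 = comparison group), the fitted
# model is Y = a + b*D: the constant a is the reference-group mean and the
# coefficient b is the difference between the group means.
reference = [4.0, 5.0, 6.0]     # reference-group scores (assumed)
comparison = [7.0, 8.0, 9.0]    # comparison-group scores (assumed)

a = sum(reference) / len(reference)           # constant = reference mean
b = sum(comparison) / len(comparison) - a     # dummy coefficient = mean diff
print(a, b)                                   # 5.0 3.0
```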

24

polynomial regression

allows for a more flexible relationship by introducing polynomial terms

  • Y = a + b1X + b2X^2 + b3X^3 + … + bnX^n + e

common in developmental psychology

25

choosing the appropriate degree of polynomial regression

  1. exploratory data analysis (scatter plots, correlation analysis, other visualizations)

  2. deciding on a reasonable range of powers

  3. start with simplest model → add higher-order terms

  4. evaluate its performance

    • use statistical tests (e.g. the incremental F-test) to assess whether adding a higher-order power significantly increases the R-squared value

  5. stop when a significant improvement in R-squared is no longer observed

  • recommendation: centering the X

26

centering X

prevents collinearity

  • X → X - Mean

the centered variable has a mean of 0
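A Python sketch (assumed scores) of centering:

```python
# Centering X (X -> X - mean): the transformed variable has mean 0, which
# reduces the collinearity between X and its powers in polynomial regression.
x = [2.0, 4.0, 6.0, 8.0]                  # raw scores (assumed)

mean_x = sum(x) / len(x)
x_centered = [xi - mean_x for xi in x]
print(x_centered)                         # [-3.0, -1.0, 1.0, 3.0]
print(sum(x_centered) / len(x_centered))  # 0.0
```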

27

assumptions of regression analysis

  • linearity of the relationship

  • homogeneity of variances 

  • no outliers 

  • normal distribution

  • prediction errors are independent and distributed randomly 

  • no multicollinearity 

28

high correlations between variables in regression analysis

lead to unstable coefficient estimates and inflated standard errors

29

residuals

the differences between the observed values and the values predicted by the regression model

→ key role in checking assumptions of regression analysis

30

linearity assumption

supported when residuals are roughly 50% positive and 50% negative at each level of the fitted values

31

homogeneity assumption

supported when the residuals form a roughly rectangular band of constant spread across the fitted values (in a linear relationship)

32

normality assumption

  • histogram 

  • Q-Q plot 

  • Shapiro-Wilk test

deviations suggest possible outliers or a need to transform variables

33

independence assumption

violated when there is a systematic pattern in the residuals

  • analysing the autocorrelation function of residuals

  • Durbin-Watson Test