the regression model
Y = a + b1X1 + b2X2 + … + bnXn + e
a - constant
bn - regression coefficients (they represent the change in the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant)
R2
the proportion of variance in the dependent variable explained by the model
R2 = SSR / SST (regression sum of squares divided by the total sum of squares)
multiple R
the correlation between the observed values of the dependent variable and the values predicted by the model; quantifies the strength of the linear relationship between the criterion and the set of predictors
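The notes refer to jamovi/SPSS; purely as an illustration, here is a minimal Python (statsmodels) sketch of fitting such a model and reading off the constant, the b coefficients, R2 and multiple R. The data and the variable names (x1, x2, y) are simulated and made up for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=100)

model = smf.ols("y ~ x1 + x2", data=df).fit()
print(model.params)             # a (Intercept) and the b coefficients
print(model.rsquared)           # R2 = SSR / SST
print(np.sqrt(model.rsquared))  # multiple R: correlation between observed and predicted y
```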
Factors determining the SE of the model’s coefficients
worse model fit → larger residuals + standard errors
too many variables or too few cases collected
large collinearity → larger standard errors
worse model fit
if the model does not fit the data well (R2 is low), this results in larger residuals and, consequently, larger standard errors for the coefficient estimates
collecting more cases relative to the number of predictors
can help improve the standard errors of the coefficient estimates
high collinearity
makes it challenging to isolate the individual effects of each predictor
standardized regression coefficients (beta coefficients/weights)
obtained when both the dependent and independent variables are standardized
→ the larger the absolute value of the standardized coefficient, the more impact the corresponding predictor has
Beta = b*[(SD X)/(SD Y)]
Y = a + b1X1 + b2X2 + … + bkXk
Z(Y) = beta1*Z1 + beta2*Z2 + … + betak*Zk
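A hedged sketch of the two equivalent ways to obtain beta weights: refit the model on z-scored variables, or convert the raw slopes with beta = b * (SD_X / SD_Y). The data and variable names are simulated, as in the sketch above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=100)

# (1) beta weights from a regression on z-scored variables (the constant becomes ~0)
z = (df - df.mean()) / df.std()
betas = smf.ols("y ~ x1 + x2", data=z).fit().params

# (2) equivalent conversion of the unstandardized slopes: beta = b * (SD_X / SD_Y)
b = smf.ols("y ~ x1 + x2", data=df).fit().params
print(betas[["x1", "x2"]])
print(b[["x1", "x2"]] * df[["x1", "x2"]].std() / df["y"].std())
```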
Collinearity
refers to high correlations between independent variables
inflates standard errors
leads to multicollinearity issues
Solutions:
removing one or more of the highly correlated variables
combining highly correlated variables into a composite variable
Beta Weights
are useful and interpretable if the independent variables are noncollinear
depend on the method used to include the predictors in the equation
the size of the beta weights depends strictly on the variables in the equation
beta weights cannot be compared between different studies because they are very sensitive to sample statistics
Variance Inflation Factor
measures how much the variance of an estimated regression coefficient increases if your predictors are correlated
> 4 - indicates collinearity
> 10 - indicates severe collinearity
Tolerance
~ 1 - little collinearity
~ 0 - high collinearity
< 0.25 - multicollinearity might exist
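An illustrative sketch of computing VIF and Tolerance in Python (statsmodels); Tolerance is simply 1 / VIF. The predictors x1 and x2 are simulated and deliberately correlated so the diagnostics flag collinearity.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=100)     # deliberately correlated with x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

# VIF for each predictor (the constant is skipped); Tolerance = 1 / VIF
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF =", round(vif, 2), "Tolerance =", round(1 / vif, 2))
```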
solutions for collinearity
removing one or more of the highly correlated variables
combining highly correlated variables
model building methods
○ Standard Method (jamovi)
○ Backward Method
○ Forward Method
○ Stepwise Method (SPSS)
○ All Possible Sub-Sets Method
○ Sequential/Hierarchical Regression (optional in jamovi)
○ Blockwise/Factor Scores Method
non-collinear predictors
there is no difference in the results obtained with the different methods
standard method (forced entry)
including all available predictors in a regression model without considering their correlation with the criterion variable or with each other
standard method - advantages
no subjectivity (about excluding or including predictors)
comparability across samples (the same set of predictors is used in every analysis)
standard method - disadvantages
collinearity (if there is a high correlation among predictors, it can lead to inflated standard errors → challenging to identify the individual contribution of each predictor)
inclusion of irrelevant predictors
reduced statistical power (presence of non-informative/collinear predictors)
sequential/hierarchical method
systematic approach of entering predictor variables based on predefined criteria
→ the data analyst uses prior knowledge, domain expertise, or conceptual analyses to determine which predictors should be included in the regression model, and in which order
ΔR2
incremental F-statistics
ΔR2 = R2(k+1) - R2(k)
quantifies the improvement in the goodness of fit of a regression model when an additional predictor or group of predictors is added
assesses increase in the explained variance in the dependent variable due to the inclusion of new variables
incremental F-statistic
assess the significance of adding a predictor or group
determine whether the inclusion of a set of predictors improves the overall fit of the model in a statistically significant way
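A minimal sketch of a two-step hierarchical entry with ΔR2 and the incremental F-test (the notes describe the same procedure in jamovi/SPSS). Data and variable names are simulated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100),
                   "x2": rng.normal(size=100),
                   "x3": rng.normal(size=100)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.8 * df["x2"] + 0.4 * df["x3"] + rng.normal(size=100)

# step 1: predictors entered first on theoretical grounds
step1 = smf.ols("y ~ x1 + x2", data=df).fit()
# step 2: the additional predictor whose contribution is being tested
step2 = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

delta_r2 = step2.rsquared - step1.rsquared               # ΔR2
f_stat, p_value, df_diff = step2.compare_f_test(step1)   # incremental F-test
print(delta_r2, f_stat, p_value)
```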
Incorporating categorical predictors into regression analysis
dummy coding (binary) - nominal categories
ordinal coding - assigning numerical values to predictors based on order
the constant in dummy coding
the mean of the dependent variable in the reference category
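A small sketch illustrating dummy coding and the meaning of the constant, with a made-up two-group variable; the intercept reproduces the mean of the dependent variable in the reference category.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
group = np.repeat(["control", "treatment"], 50)          # nominal predictor
y = np.where(group == "treatment", 5.0, 3.0) + rng.normal(size=100)
df = pd.DataFrame({"group": group, "y": y})

# C(group) applies dummy (treatment) coding; the first level ("control") is the reference
model = smf.ols("y ~ C(group)", data=df).fit()
print(model.params)                                   # constant + dummy coefficient
print(df.loc[df["group"] == "control", "y"].mean())   # equals the constant (intercept)
```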
polynomial regression
allows for a more flexible relationship by introducing polynomial terms
Y = a + b1X + b2X^2 + b3X^3 + … + bnX^n + e
common in developmental psychology
choosing the appropriate degree of polynomial regression
exploratory data analysis (scatter plots, correlation analysis, other visualizations)
deciding on a reasonable range of powers
start with simplest model → add higher-order terms
evaluate its performance
use statistical tests (e.g., the incremental F-test) to assess whether the addition of a higher-order power significantly increases the R-squared value (see the sketch below)
until a significant improvement in R-squared is no longer observed
recommendation: centering X
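An illustrative sketch of this procedure: fit increasingly higher-order polynomials and test each added power with an incremental F-test, stopping when R-squared no longer improves significantly. The data (a truly quadratic relationship) and names are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=150)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(size=150)   # a truly quadratic relationship
df = pd.DataFrame({"x": x, "y": y})

# start with the simplest (linear) model, then add higher-order powers one at a time
previous = smf.ols("y ~ x", data=df).fit()
for degree in (2, 3, 4):
    formula = "y ~ " + " + ".join(f"I(x**{d})" for d in range(1, degree + 1))
    current = smf.ols(formula, data=df).fit()
    f_stat, p_value, _ = current.compare_f_test(previous)  # incremental F-test
    print(degree, round(current.rsquared - previous.rsquared, 4), round(p_value, 4))
    previous = current
# stop once adding the next power no longer yields a significant gain in R-squared
```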
centering X
reduces collinearity between X and its higher-order terms (X^2, X^3, …)
X → X - Mean
the new (centered) variable has a mean of 0
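A short sketch showing why centering helps: with simulated data whose values lie far from zero, the correlation between X and X^2 drops sharply once X is centered.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(2, 10, size=200)      # a predictor whose values are far from zero
xc = x - x.mean()                     # centered version (mean of 0)

# correlation between the linear and the quadratic term, before and after centering
print(np.corrcoef(x, x**2)[0, 1])     # typically close to 1 (strong collinearity)
print(np.corrcoef(xc, xc**2)[0, 1])   # much closer to 0 after centering
```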
assumptions of regression analysis
linearity of the relationship
homogeneity of variances
no outliers
normal distribution
prediction errors are independent and distributed randomly
no multicollinearity
high correlations between independent variables in regression analysis
lead to unstable coefficient estimates and inflated standard errors
residuals
the differences between the observed values and the values predicted by the regression model
→ key role in checking assumptions of regression analysis
linearity assumption
to support linearity, we anticipate having 50% positive and 50% negative residuals at each level of the fitted values
homogeneity assumption
supported if the residuals form a roughly rectangular band of constant spread across the fitted values (in a linear relationship)
normality assumption
histogram
Q-Q plot
Shapiro-Wilk test
deviations suggest possible outliers or the need to transform variables
independence assumption
violated if there is a systematic pattern in the residuals
analysing the autocorrelation function of residuals
Durbin-Watson Test
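A sketch of checking the normality and independence assumptions on the residuals of a fitted model; the Shapiro-Wilk and Durbin-Watson statistics correspond to the checks listed above, and the data and variable names are simulated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=100)
model = smf.ols("y ~ x1 + x2", data=df).fit()

resid = model.resid
# normality: Shapiro-Wilk test on the residuals (also inspect a histogram / Q-Q plot)
print(stats.shapiro(resid))
# independence: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
print(durbin_watson(resid))
# homogeneity: plot resid against model.fittedvalues and look for a band of constant spread
```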