Multiple regression rests on several key assumptions that must be checked to ensure the reliability of the analysis.
Importance of Sample Size: A larger sample size is preferred for stable estimates and evaluating statistical significance.
Larger samples yield more stable estimates of the coefficients and of the overall model.
Adequate sample size is also essential for meaningful significance tests: the F-test of the overall model (R-squared) and t-tests of individual parameters.
Statistical Power: Refers to the ability to detect effects when they exist.
Guidelines suggest 10-15 cases per predictor as a minimum, but more is better.
Minimum Acceptable Sample Size:
Field's suggestion: a minimum of 50 + 8 times the number of predictors for testing the overall model.
For testing individual predictors, a minimum of 104 + the number of predictors is recommended (see the sketch after this list).
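As a quick illustration of these two rules of thumb, here is a minimal Python sketch; the function name is hypothetical, not from the source.

```python
def minimum_n(n_predictors: int) -> int:
    """Field's rules of thumb for multiple regression sample size."""
    overall_model = 50 + 8 * n_predictors   # testing the overall model
    individual = 104 + n_predictors         # testing individual predictors
    return max(overall_model, individual)

# With 5 predictors: the overall model needs 90 cases, individual tests need 109
print(minimum_n(5))  # 109
```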
Realistic Expectations: Collect as many cases as possible, ideally around 200 if feasible.
Ensure participants are a representative sample, avoiding biased convenience samples.
Research Area Norms: Check what peers are collecting to gauge reasonable sample sizes in your field.
Constraints on Participant Availability: Sometimes bound by pre-existing datasets or specialized populations (e.g., rare conditions).
Studies with smaller samples can still be valuable if they yield insights, especially in fields like mental health where participants may be hard to recruit.
Power Analysis: Crucial for determining required sample size based on expected relationships between predictors and outcomes.
Aim for power of 0.8, i.e., an 80% probability of correctly rejecting a false null hypothesis; a sketch of such a calculation follows.
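As one way to run a power analysis for the overall F-test, the sketch below uses SciPy's noncentral F distribution with Cohen's f-squared effect size; the function name and the chosen effect size are illustrative assumptions, not prescriptions from the source.

```python
from scipy import stats

def overall_f_test_power(n: int, k: int, f2: float, alpha: float = 0.05) -> float:
    """Power of the overall F-test in a multiple regression.

    f2 is Cohen's f-squared effect size, R2 / (1 - R2).
    """
    u = k                   # numerator df: number of predictors
    v = n - k - 1           # denominator df: residual
    lam = f2 * (u + v + 1)  # noncentrality parameter (Cohen, 1988)
    f_crit = stats.f.ppf(1 - alpha, u, v)
    return 1 - stats.ncf.cdf(f_crit, u, v, lam)

# Smallest n reaching 0.8 power for 5 predictors and a medium effect (f2 = 0.15)
n = 7  # must exceed k + 1 so the residual df is positive
while overall_f_test_power(n, k=5, f2=0.15) < 0.8:
    n += 1
print(n)  # roughly 90 for these inputs
```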
Strength of Relationships: Gauge likely effect sizes through bivariate correlations in pilot data or effect sizes reported in existing research.
Definition of Multicollinearity: Occurs when predictors in the model are highly correlated with one another, causing problems in estimating the model.
Strong correlations obscure individual predictor effects, complicating interpretations.
Assessing Severity of Multicollinearity:
Inflated standard errors of the coefficient estimates, and limited gains in R-squared because correlated predictors share explained variance.
Ideal vs. Problematic Cases:
An ideal regression would show no multicollinearity; predictors explain unique variance in outcomes.
Perfect multicollinearity makes it impossible to estimate the predictors' separate effects.
Correlation Matrix: Use simple correlations to check for high correlations (e.g., >0.8 to 0.9).
Even moderate correlations among several predictors can compound into problems in the regression (see the sketch below).
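A minimal pandas sketch of that correlation-matrix check; the function name and the 0.8 default threshold (taken from the guideline above) are illustrative.

```python
import pandas as pd

def high_correlations(predictors: pd.DataFrame, threshold: float = 0.8):
    """Return predictor pairs whose absolute pairwise correlation exceeds threshold."""
    corr = predictors.corr().abs()
    cols = corr.columns
    return [
        (cols[i], cols[j], corr.iloc[i, j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if corr.iloc[i, j] > threshold
    ]
```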
Beta Weights and Diagnostics: Utilize diagnostics such as tolerances and variance inflation factors (VIF) to assess multicollinearity.
VIF: A VIF greater than 10 signals potential multicollinearity.
Tolerances: Values lower than 0.2 indicate concern; tolerance is simply the reciprocal of VIF (1/VIF), as in the sketch below.
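Both diagnostics can be computed with statsmodels; a minimal sketch, assuming the predictors sit in a pandas DataFrame X (a hypothetical name).

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_diagnostics(X: pd.DataFrame) -> pd.DataFrame:
    """VIF and tolerance (1 / VIF) for each predictor in X."""
    exog = sm.add_constant(X)  # include an intercept, as the regression itself does
    rows = []
    for i, name in enumerate(exog.columns):
        if name == "const":
            continue  # the intercept is not a predictor of interest
        vif = variance_inflation_factor(exog.values, i)
        rows.append({"predictor": name, "VIF": vif, "tolerance": 1 / vif})
    return pd.DataFrame(rows)

# Flag VIF > 10 or tolerance < 0.2, per the guidelines above
```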
Identification of Outliers: Outliers can skew regression results and violate normal distribution assumptions.
Univariate outliers are extreme on a single variable's distribution, while multivariate outliers are extreme on a combination of variables and can distort the model's predictions.
Residuals and Diagnosing Outliers:
Analyzing residuals can uncover extreme prediction errors.
In a normal distribution, only about 5% of standardized residuals should lie more than two standard deviations from the mean (see the sketch after this list).
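One way to run that residual check with statsmodels, as a sketch; X (predictors) and y (outcome) are hypothetical objects standing in for your data.

```python
import numpy as np
import statsmodels.api as sm

# Fit the model; X and y are placeholders for your predictors and outcome
results = sm.OLS(y, sm.add_constant(X)).fit()

# Internally studentized residuals: each residual scaled by its standard error
std_resid = results.get_influence().resid_studentized_internal

# In a well-behaved model, only about 5% of cases should fall beyond |2|
extreme = np.abs(std_resid) > 2
print(f"{extreme.mean():.1%} of residuals beyond 2 SDs")
print("cases to investigate:", np.where(extreme)[0])
```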
Investigation of Outliers: Understand why outliers occurred before deciding to retain or remove them.
Possible causes include coding errors or response biases.
Analyses can be run with and without the outliers to gauge their impact on the results, as in the sketch below.
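A sketch of that sensitivity check, reusing the hypothetical X, y, and extreme flags from the residual example above.

```python
import statsmodels.api as sm

# Refit the model without the flagged cases and compare headline results
results_all = sm.OLS(y, sm.add_constant(X)).fit()
results_trim = sm.OLS(y[~extreme], sm.add_constant(X[~extreme])).fit()

print("R-squared with outliers:   ", round(results_all.rsquared, 3))
print("R-squared without outliers:", round(results_trim.rsquared, 3))
print(results_all.params, results_trim.params, sep="\n")
```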
Best Practices: Always be transparent about how outliers were treated in analysis.
Report findings and methodologies clearly to maintain integrity.