PSCH 443 Multiple Regression 3 Evaluating Assumptions Part 1

Assumptions of Multiple Regression

  • Multiple regression rests on several key assumptions; each must be checked to ensure the analysis is reliable.

Sample Size Considerations

  • Importance of Sample Size: A larger sample size is preferred for stable estimates and evaluating statistical significance.

    • Encourages stability in estimates of coefficients and the overall model.

    • Essential for assessing the significance of the overall model (R-squared) and of individual coefficients (t-tests).

  • Statistical Power: Refers to the ability to detect effects when they exist.

    • Guidelines suggest 10-15 cases per predictor as a minimum, but more is better.

  • Minimum Acceptable Sample Size:

    • Field's suggestion: a minimum of 50 + 8k cases (where k is the number of predictors) for testing the overall model.

    • For testing individual predictors, a minimum of 104 + k cases is recommended (see the sketch after this list).

  • Realistic Expectations: Collect as many cases as possible, ideally around 200 if feasible.

    • Ensure participants are a representative sample, avoiding biased convenience samples.

  • Research Area Norms: Check what peers are collecting to gauge reasonable sample sizes in your field.
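
A minimal sketch of these rules of thumb as a helper function; the function name is our own, and the formulas are just the guidelines above:

```python
def minimum_n(k: int) -> dict:
    """Rule-of-thumb minimum sample sizes for a model with k predictors."""
    return {
        "overall_model": 50 + 8 * k,       # testing the overall model (R-squared)
        "individual_predictors": 104 + k,  # testing individual coefficients
    }

# e.g., with 5 predictors:
print(minimum_n(5))  # {'overall_model': 90, 'individual_predictors': 109}
```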

Situations with Limited Participants

  • Constraints on Participant Availability: Researchers are sometimes bound by pre-existing datasets or by specialized populations (e.g., rare conditions).

    • It remains valuable to conduct studies with smaller samples when they yield real insight, especially in fields like mental health where participants may be limited.

Role of Power Analysis

  • Power Analysis: Crucial for determining required sample size based on expected relationships between predictors and outcomes.

    • Aim for power of 0.8, i.e., an 80% probability of correctly rejecting a false null hypothesis (see the sketch after this list).

  • Strength of Relationships: Gauge expected effect sizes from bivariate correlations or from prior research in the area.
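
As one illustration, power for the overall model F test can be computed from the noncentral F distribution using Cohen's effect size f^2 = R^2 / (1 - R^2); this is a sketch of that calculation, and dedicated tools such as G*Power do the same job:

```python
from scipy import stats

def overall_f_power(n, k, f2, alpha=0.05):
    """Power of the overall F test in multiple regression.
    n: sample size; k: number of predictors; f2: Cohen's f^2 = R^2 / (1 - R^2)."""
    dfn, dfd = k, n - k - 1
    nc = f2 * (dfn + dfd + 1)                  # noncentrality parameter (Cohen, 1988)
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)  # critical F under the null
    return stats.ncf.sf(f_crit, dfn, dfd, nc)  # P(reject | effect of size f2 exists)

# e.g., a medium effect (f^2 = 0.15) with 5 predictors and n = 100:
print(overall_f_power(100, 5, 0.15))
```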

Multicollinearity

  • Definition: Occurs when predictors in the model are highly correlated, causing issues in model estimation.

    • Strong correlations obscure individual predictor effects, complicating interpretations.

  • Assessing Severity of Multicollinearity:

    • Multicollinearity inflates the standard errors of coefficient estimates and limits the gain in R-squared from adding correlated predictors.

  • Ideal vs. Problematic Cases:

    • In an ideal regression, predictors are uncorrelated with one another and each explains unique variance in the outcome.

    • Perfect multicollinearity (one predictor an exact linear function of another) makes it impossible to separate the predictors' effects (illustrated below).
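
A small demonstration with synthetic data (our own, purely for illustration) of why perfect multicollinearity makes coefficients non-identifiable:

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = 2 * x1                          # x2 is an exact linear function of x1
X = np.column_stack([np.ones(100), x1, x2])

# The design matrix is rank-deficient, so X'X is singular and the normal
# equations have infinitely many solutions for the coefficients:
print(np.linalg.matrix_rank(X))      # 2 rather than 3
```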

Evaluating Multicollinearity

  • Correlation Matrix: Inspect the bivariate correlations among predictors for very high values (e.g., above 0.8 to 0.9).

    • Even moderate correlations among several predictors can compound into a problem.

  • Beta Weights and Diagnostics: Beyond the correlation matrix, use diagnostics such as tolerance and the variance inflation factor (VIF) to assess multicollinearity (see the sketch after this list).

    • VIF: A VIF greater than 10 signals potential multicollinearity.

    • Tolerances: Tolerance is the reciprocal of VIF; values lower than 0.2 indicate concern.
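
A sketch of these checks using pandas and statsmodels; the file name, variable names, and predictor list below are placeholders:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("data.csv")                    # hypothetical dataset
predictors = ["x1", "x2", "x3"]                 # placeholder predictor names

print(df[predictors].corr())                    # flag pairs with |r| > 0.8-0.9

X = sm.add_constant(df[predictors])
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    # Rule-of-thumb cutoffs from above: VIF > 10 or tolerance < 0.2
    print(name, "VIF:", round(vif, 2), "tolerance:", round(1 / vif, 2))
```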

Outliers in Data

  • Identification of Outliers: Outliers can skew regression results and violate normal distribution assumptions.

    • Univariate outliers are extreme on a single variable's distribution; multivariate outliers are unusual combinations of values across variables that can distort the fitted model.

  • Residuals and Diagnosing Outliers:

    • Analyzing residuals can uncover extreme prediction errors.

    • In a normal distribution, only about 5% of standardized residuals should fall beyond two standard deviations from the mean (see the sketch after this list).
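
A sketch of that residual check with statsmodels; the data file and model formula are placeholders:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")                       # hypothetical dataset
model = smf.ols("y ~ x1 + x2", data=df).fit()      # placeholder formula

# Internally studentized residuals; roughly 5% should exceed |2| if errors are normal
resid = model.get_influence().resid_studentized_internal
flagged = np.abs(resid) > 2
print(f"{flagged.mean():.1%} of residuals beyond +/-2 SD")
print(df[flagged])                                  # inspect the flagged cases
```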

Handling Outliers

  • Investigation of Outliers: Understand why outliers occurred before deciding to retain or remove them.

    • Possible causes include coding errors or respondent biases.

    • Running the analysis with and without the outliers gauges their impact on the results (see the sketch after this list).

  • Best Practices: Always be transparent about how outliers were treated in analysis.

    • Report findings and methodologies clearly to maintain integrity.
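
A minimal sketch of the with-and-without comparison mentioned above; again, the data file and formula are placeholders:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")                       # hypothetical dataset
full = smf.ols("y ~ x1 + x2", data=df).fit()       # placeholder formula

# Drop cases whose standardized residuals exceed +/-2 SD, then refit
flagged = np.abs(full.get_influence().resid_studentized_internal) > 2
trimmed = smf.ols("y ~ x1 + x2", data=df[~flagged]).fit()

print(pd.DataFrame({"with_outliers": full.params,
                    "without_outliers": trimmed.params}))
# Large shifts suggest influential outliers; report both sets of results
# and the exclusion rule either way
```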
