PN2002 Methodology Workshop 3 - Regression

Regression Workshop Overview

In this workshop on regression methodology, Dr. Mike Oram leads attendees through regression's purpose, applications, and the key assumptions underlying different regression techniques. The session covers an introduction to regression, simple regression, and multiple regression, with practical insights drawn from JASP output.

Introduction to Regression

What is Regression?
Regression is a family of statistical methods used to predict an outcome variable from one or more predictor variables, and it is often applied where a causal association is suspected. Linear regression applies specifically when the variables are thought to be linearly related, so it is essential to check this linearity with a scatterplot before fitting a regression model.

Why Not Just Use Correlation?
Correlation measures the strength of association between two variables, but it neither implies causation nor provides a way to predict one variable from another. Regression, by contrast, can tell us how much change in an outcome to expect for a given change in a predictor, for instance the likely return on additional funding for education or healthcare initiatives.

Regression Formula
The linear regression equation is expressed as:
[ y = a + bx ]
where:

  • y is the predicted outcome.
  • x is the predictor variable.
  • a is the y-intercept of the regression line.
  • b is the slope of the regression line, indicating how much y changes for a unit increase in x.
For example, if b = 2, each additional unit of the predictor is associated with a two-unit increase in the predicted outcome, giving a precise quantitative basis for deciding how much intervention is needed to reach a desired result.
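The least-squares estimates of a and b can be computed directly from data. The following is a minimal sketch in Python using hypothetical toy data (the workshop itself uses JASP for this):

```python
# Minimal sketch: estimating a (intercept) and b (slope) by least squares.
# The data below are hypothetical, chosen to lie exactly on y = 1 + 2x.

def fit_simple_regression(x, y):
    """Return (a, b) for the line y = a + b*x minimising squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # b = sum of cross-deviations / sum of squared x-deviations
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) \
        / sum((xi - x_mean) ** 2 for xi in x)
    a = y_mean - b * x_mean  # the fitted line passes through (x_mean, y_mean)
    return a, b

a, b = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0 for this perfectly linear toy data
```

JASP reports the same a and b in its coefficients table; the hand calculation simply makes the formula concrete.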

Key Assumptions in Regression

The accuracy of regression results relies on several key assumptions, including:

  • Linearity: A linear relationship between predictor and outcome variables.
  • Data Type: Use interval/ratio (scale) data for valid results.
  • Normality: Outcome variable residuals should follow a normal distribution.
  • Homoscedasticity: Residuals should show constant variance across all levels of a predictor. Heteroscedasticity occurs when this assumption does not hold true, leading to less reliable interpretations.
  • Independence: Each predictor should contribute unique variance and should not correlate excessively with the others (excessive overlap is known as multicollinearity). This is particularly important in multiple regression analyses.

Residuals, or the deviations of observed outcomes from predicted outcomes, are crucial for checking linearity and homoscedasticity in regression models. Visual aids like plots from JASP can help identify these conditions in data.
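As an illustrative sketch (not JASP output), residuals can be computed from a fitted line and their spread compared across low and high predictor values, a crude numerical stand-in for inspecting a homoscedasticity plot. The data and fitted coefficients below are hypothetical:

```python
# Residuals = observed y minus predicted y. If their spread is similar
# across the range of x, the homoscedasticity assumption looks plausible.

def residuals(x, y, a, b):
    """Observed minus predicted outcomes for the line y = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
res = residuals(x, y, a=0.0, b=2.0)  # hypothetical fitted coefficients

# Compare residual spread in the lower vs upper half of x values.
half = len(res) // 2
spread_low = max(res[:half]) - min(res[:half])
spread_high = max(res[half:]) - min(res[half:])
print(spread_low, spread_high)  # similar spreads suggest constant variance
```

In practice JASP's residuals-vs-predicted plot conveys the same information visually, and is the recommended check.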

Simple Linear Regression

Simple linear regression analyzes how a single predictor variable influences an outcome variable. The formula remains:
[ y = a + bx ]
In practice, JASP output provides crucial components like homoscedasticity plots, residual histograms, and Q-Q plots that validate the regression model's assumptions and graphically show how the model performs relative to the data.

Reporting Regression Results

When reporting results from simple regression, crucial elements to include are:

  • Descriptive statistics for both predictor and outcome variables, along with variance inflation factors (VIF) to assess multicollinearity (relevant when there is more than one predictor).
  • The proportion of variance explained by the model (R²) and results from ANOVA, which compares the regression model's effectiveness against a mean-based approach.
  • It is vital to assess and report the significance and coefficients of predictors in context, using standardized tables and graphs to summarize findings effectively.
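The two headline quantities in the list above, R² and the ANOVA F ratio comparing the regression line with a mean-only model, can be sketched as follows. The data and fitted coefficients are hypothetical toy values:

```python
# R-squared = proportion of outcome variance explained by the line.
# F = (explained variance per model df) / (residual variance per residual df),
# which is what the ANOVA table in JASP reports for simple regression.

def r_squared_and_f(x, y, a, b):
    n = len(y)
    y_mean = sum(y) / n
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)                    # mean-only model
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # regression model
    ss_reg = ss_tot - ss_res                                        # explained by the line
    r2 = 1 - ss_res / ss_tot
    f = (ss_reg / 1) / (ss_res / (n - 2))  # df = 1 (slope) and n - 2 (residual)
    return r2, f

r2, f = r_squared_and_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], a=2.2, b=0.6)
print(round(r2, 2), round(f, 2))  # 0.6 4.5 for this toy fit
```

The F ratio is exactly the comparison described above: how much better the regression line does than simply predicting the mean for everyone.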

Multiple Linear Regression (MLR)

MLR extends the concept of simple regression by allowing predictions based on multiple predictors. The formula can be expressed as:
[ y = a + b1x1 + b2x2 + … + bnxn ]
Predictors can be entered hierarchically (in blocks defined by prior theory) or stepwise (selected by statistical criteria such as their correlation with the outcome). However, simultaneous entry, referred to as the "enter" method in JASP, is the most commonly used approach, as it evaluates all predictors at once.
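For the two-predictor case, the "enter" idea can be sketched by solving the least-squares normal equations on centred variables. The data below are hypothetical, chosen to lie exactly on the plane y = 1 + 2·x1 + 3·x2; JASP performs an equivalent computation internally:

```python
# Fit y = a + b1*x1 + b2*x2 by least squares. Centring the variables
# reduces the problem to a 2x2 system for the slopes b1 and b2.

def fit_two_predictor_regression(x1, x2, y):
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]  # centred predictors
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]   # centred outcome
    s11 = sum(v * v for v in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))  # predictor overlap
    s1y = sum(u * v for u, v in zip(c1, cy))
    s2y = sum(u * v for u, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # near zero => severe multicollinearity
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

print(fit_two_predictor_regression([1, 2, 3, 4], [0, 1, 0, 1],
                                   [3, 8, 7, 12]))  # (1.0, 2.0, 3.0)
```

Note that the determinant term shrinks towards zero as the predictors become more correlated, which is the numerical face of the multicollinearity problem flagged in the assumptions section.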

Reporting MLR Results
MLR result reporting follows a similar structure to simple regression, emphasizing validation through homoscedasticity plots, descriptive statistics, multicollinearity checks, and summarized coefficients from regression tables. A clear explanation is required, detailing the contribution of each predictor to the model while acknowledging any limitations such as heteroscedasticity.

Concluding Remarks

Overall, regression analysis is a powerful statistical tool for understanding and predicting relationships between variables. Whether in simple or multiple form, its value depends on rigorously testing the underlying assumptions and reporting the results transparently.