Mediation in Regression: Comprehensive Study Notes
Introduction to Mediation in Regression
Background:
Mediation in regression is an advanced topic building on simple regression, which focuses on the relationship between two variables.
It extends the concept from mere correlation to an attempt to determine causation between variables, even though "correlation doesn't equal causation."
This involves structural models, or Structural Equation Modeling (SEM), which fit correlational data to an explicitly hypothesized causal structure.
Path Analysis:
A simpler version of SEM that uses only observed (directly measured) variables, such as scores on a psychological scale, rather than latent (unobserved) constructs.
Path analysis models where a variable (predictor) affects or influences another variable (outcome) directly, and also potentially indirectly through another variable called the mediator (M).
This involves examining 'paths' between variables – a direct path from the predictor (X) to the outcome (Y), and an indirect path through a mediator (X \to M \to Y).
Formal Definition of Mediators
Andrew Hayes' Definition: A simple mediation model is any causal system in which at least one causal antecedent (X variable) is proposed as influencing a Y outcome through a single intervening M (mediator) variable.
Simple Model Diagram: Shows a predictor (X) connected to an outcome (Y), with a separate variable, the mediator (M), also connecting to X and Y, indicating separate relationships through M.
Example of a Complicated Model (Path Analysis):
Predictors: Coping Humor, Negative Life Events.
Outcome: Depression Score.
Mediators (two): Positive Affect, Negative Affect.
Direct Effects: Paths where predictors directly influence the outcome.
Coping Humor \to Depression (negative relationship, e.g., more humor means less depression).
Negative Life Events \to Depression (positive relationship, e.g., more negative events means more depression).
Indirect Effects: Paths where predictors influence the outcome through a mediator.
Negative Life Events \to Negative Affect \to Depression (positive relationships along this path).
Coping Humor \to Positive Affect \to Depression (a positive path from Coping Humor to Positive Affect, then a negative path from Positive Affect to Depression).
Significance: Without the mediators, the full interplay of factors contributing to depression might be missed; including them gives a more complete understanding.
Traditional Path Analysis Mechanics
Traditional Requirement: Traditionally, for path analysis, significant correlations were needed between your predictors (X) and outcomes (Y).
Regression-Based Approach: Path analysis is essentially an extension of standard multiple regression, where multiple regression is used to determine each relationship:
Mediators (M) are regressed onto predictors (X).
The outcome (Y) is regressed onto both predictors (X) and mediators (M).
Example Regression Paths in a Complex Model:
Positive Affect (Mediator) is regressed onto both Coping Humor and Negative Life Events.
Negative Affect (Mediator) is regressed onto both Coping Humor and Negative Life Events.
Depression (Outcome) is regressed onto Positive Affect, Negative Affect, Coping Humor, and Negative Life Events.
Path Coefficients: These represent the strength and direction of the relationships between variables along each path. These coefficients can be independently tested for significance (e.g., with p-values).
Direct Effects: The direct relationship between X and Y. For example, one direct path might have a coefficient of +0.25 with a significant p-value, while another has a coefficient of -0.09 that is not significant.
Indirect Effects: Calculated by multiplying the individual path coefficients along the indirect pathway (e.g., (X \to M coefficient) * (M \to Y coefficient)). The product's significance is then assessed.
Total Effect: The overall relationship between a predictor (X) and the outcome (Y), expressed as a single number equal to the direct effect plus the indirect effect(s). It is what a regression of Y on the predictors alone, with the mediators left out of the model, would estimate; splitting it into direct and indirect effects offers a more nuanced understanding.
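The regression steps and path coefficients described above can be sketched for the simplest case (one X, one M, one Y). This is a minimal illustration on simulated data, not output from any study: the generating values 0.5, 0.4, and 0.2 and all function names are assumptions made for the example, and only the Python standard library is used.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Sample covariance with an n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def simple_slope(x, y):
    """OLS slope from regressing y on x alone (with intercept)."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x1, x2, y):
    """OLS slopes for x1 and x2 from regressing y on both (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data; the effect sizes below are hypothetical.
rng = random.Random(42)
n = 500
X = [rng.gauss(0, 1) for _ in range(n)]
M = [0.5 * x + rng.gauss(0, 1) for x in X]
Y = [0.2 * x + 0.4 * m + rng.gauss(0, 1) for x, m in zip(X, M)]

a = simple_slope(X, M)                       # M regressed on X
c_prime, b = two_predictor_slopes(X, M, Y)   # Y regressed on X and M
c = simple_slope(X, Y)                       # Y regressed on X alone
indirect = a * b                             # product of the two path coefficients

print(f"a={a:.3f}  b={b:.3f}  c'={c_prime:.3f}  ab={indirect:.3f}  c={c:.3f}")
```

With OLS and continuous variables, the printed total effect equals the direct effect plus the indirect effect up to floating-point error.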
Historical Context: Path analysis originated with Sewall Wright in the 1920s. It became more popular in psychology from the 1970s as computing power increased, allowing for complex calculations. Modern software has standardized and simplified this process, now commonly referred to as "mediation analysis."
Constructing a Mediation Model in Modern Context
The Idea: Before testing mediation, there must be a plausible effect to mediate. This often stems from prior research indicating a significant direct relationship between your predictor (X) and outcome (Y).
Theoretical Construction: A mediation model is built upon existing empirical research and a strong theoretical basis:
Established X-Y Relationship: Previous research suggests X predicts Y (e.g., Parental Care \to Self-Efficacy).
X-M Relationship: Research also suggests X is related to M (e.g., Parental Care \to Self-Esteem).
M-Y Relationship: Separate research shows M is related to Y (e.g., Self-Esteem \to Self-Efficacy).
Combining these suggests that M (Self-Esteem) might mediate the X-Y relationship, meaning X influences Y partly through M.
Causal Aspect: Mediation lets researchers propose and test causal processes rather than only describe correlations (e.g., Parental Care increases Self-Efficacy partly through Self-Esteem), although the causal claim rests on theory and study design, not on the statistics alone.
Simple Mediation Model (Example):
X: Environmental Exploration (predictor)
Y: Perceived Employability (outcome)
M: Career Adaptability (mediator)
Path A: X \to M (e.g., high environmental exploration leads to greater career adaptability).
Path B: M \to Y (e.g., high career adaptability leads to greater perceived employability).
Standard Terminology:
Predictor = X
Outcome/Criterion = Y
Mediator = M
Total Effect (overall effect of X on Y, including M's influence) = c
Direct Effect (of X on Y, controlling for M) = c' (c-prime)
Indirect Effect = a \times b (product of path A and path B coefficients)
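The terminology above corresponds to three regression equations. The following is a sketch in the notation of these notes; the intercept symbols (i_M, i_Y, i_T) and error terms (e_M, e_Y, e_T) are labels assumed here for illustration.

```latex
\begin{align}
M &= i_M + aX + e_M            && \text{(path a)} \\
Y &= i_Y + c'X + bM + e_Y      && \text{(direct effect } c' \text{ and path b)} \\
Y &= i_T + cX + e_T            && \text{(total effect } c\text{)} \\
c &= c' + ab                   && \text{(total = direct + indirect)}
\end{align}
```

The last line holds exactly when all three equations are estimated by OLS on the same cases with continuous M and Y.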
Baron and Kenny's Four Steps (Traditional 1986 Approach)
This traditional method for establishing mediation, while now having some criticisms, provides a fundamental understanding:
Step 1: Show that X is correlated with Y:
The total effect (c path) must be statistically significant.
This step is considered necessary to establish that there is an effect to be mediated in the first place.
Traditional Implication: If this correlation is not significant, analysis stops.
Step 2: Show that X is correlated with M:
The relationship between the predictor (X) and the mediator (M) must be significant (Path A).
This is treated as a simple regression where M is the outcome variable.
Step 3: Show that M predicts Y independent of X:
The relationship between the mediator (M) and the outcome (Y) must be significant, while controlling for the predictor (X) (Path B).
Traditionally, this involved a hierarchical regression, entering X in the first block and then adding M in the second, to isolate M's unique contribution to Y.
Step 4: Determine whether mediation is full or partial:
Full (Complete) Mediation: Occurs when the direct effect of X on Y (c') becomes non-significant (or zero) after controlling for M. This implies M completely explains the X to Y relationship.
Partial Mediation: Occurs when both the direct effect of X on Y (c') and the indirect effect (a \times b) are statistically significant. M plays a role, but X still has a direct, significant influence on Y.
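The traditional decision sequence above can be written as a small function. This is only a sketch of the logic as described in these notes: the function name and return strings are made up for illustration, and the significance of paths a and b is used as a stand-in for the indirect effect, since the causal-steps approach does not test ab directly.

```python
def baron_kenny(p_c, p_a, p_b, p_c_prime, alpha=0.05):
    """Classify a simple mediation result using the traditional four steps.

    p_c, p_a, p_b, p_c_prime are the p-values for the total effect,
    path a, path b, and the direct effect, respectively.
    """
    if p_c >= alpha:
        return "stop: no significant total effect to mediate"
    if p_a >= alpha or p_b >= alpha:
        return "no mediation: path a or path b is not significant"
    if p_c_prime >= alpha:
        return "full mediation"
    return "partial mediation"

# p-values below .001 are entered as 0.0005 purely as placeholders.
print(baron_kenny(p_c=0.0005, p_a=0.0005, p_b=0.0005, p_c_prime=0.0527))  # full mediation
print(baron_kenny(p_c=0.0005, p_a=0.0005, p_b=0.0005, p_c_prime=0.01))    # partial mediation
```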
Modern Perspectives on Baron and Kenny's Steps
Step 1 (X-Y correlation) - Increasingly Questioned:
Modern mediation analysis, as articulated by Andrew Hayes, no longer imposes evidence of a simple association between X and Y as a precondition.
It's possible for the total effect (c) to be non-significant, but the indirect effect (a \times b) to be significant (e.g., if there are multiple indirect paths with opposing influences that cancel out the total effect).
The focus is now primarily on the significance of the indirect effect itself.
Step 4 (Full vs. Partial Mediation) - Less Critical:
The distinction between full and partial mediation is less emphasized.
Partial mediation is much more common and accepted in social sciences, as rarely does a single mediator completely explain a relationship.
Modern practice often focuses simply on whether an indirect effect is significant, acknowledging that X often has both direct and indirect influences on Y.
Summary of Modern Essential Steps: Steps 2 (Path A) and 3 (Path B) remain highly important for establishing mediation in a contemporary setting.
Ethical/Practical Implications: Adhering strictly to traditional rules (like significant X-Y total effect) might cause researchers to miss genuine indirect effects, leading to an incomplete understanding of complex phenomena.
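The point about Step 1 can be made concrete with a small worked example. The numbers are hypothetical and chosen only so that two indirect paths cancel; the subscripted symbols (a_1, b_1, a_2, b_2) are assumed notation for a model with two mediators.

```latex
\begin{align}
c &= c' + a_1 b_1 + a_2 b_2 \\
  &= 0 + (0.60)(0.50) + (0.50)(-0.60) \\
  &= 0 + 0.30 - 0.30 = 0
\end{align}
```

Here both indirect effects are substantial, yet the total effect is zero, so a researcher who stopped at a non-significant X-Y relationship would miss them.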
Testing the Indirect Effect: Sobel Test vs. Bootstrapping
The Goal: To test the statistical significance of the indirect effect (a \times b), which can also be understood as the difference between the total effect (c) and the direct effect (c'): ab = c - c'.
1. Sobel Test (Traditional Method, 1982):
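A "normal theory approach" that relies on the assumption that the sampling distribution of the indirect effect (the product ab) is normal.
Calculates a Z-score for the indirect effect using the formula: Z = \frac{ab}{\sqrt{b^2 SE_a^2 + a^2 SE_b^2 + SE_a^2 SE_b^2}} (where SE_a and SE_b are the standard errors of path A and path B, respectively). The final term, SE_a^2 SE_b^2, belongs to the Aroian (second-order) variant; Sobel's original first-order formula omits it, and the two give nearly identical results in large samples.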
Interpretation: If the absolute value of the calculated Z-score is greater than 1.96, the indirect effect is considered statistically significant at p < 0.05.
Limitations:
Low power: The Sobel test is conservative, making it harder to detect a significant indirect effect even when one exists.
Requires large samples: Its effectiveness diminishes with smaller sample sizes.
Normality Assumption: Even when a and b are each normally distributed, their product ab generally is not (its sampling distribution tends to be skewed), so the assumption often fails in practice, especially in small samples.
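The Z calculation described above can be sketched in a few lines. The function and parameter names are assumptions for illustration; include_product_term=True reproduces the formula with the extra SE product term, and False drops that term to give the first-order form. The standard errors in the usage example are hypothetical, since the notes do not report them.

```python
import math

def sobel_z(a, b, se_a, se_b, include_product_term=True):
    """Z statistic for the indirect effect a*b under the normal-theory approach."""
    variance = b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2
    if include_product_term:
        variance += se_a ** 2 * se_b ** 2
    return (a * b) / math.sqrt(variance)

def two_tailed_p(z):
    """Two-tailed p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Path coefficients from the worked example in these notes; SEs are hypothetical.
z = sobel_z(a=0.41, b=0.75, se_a=0.05, se_b=0.10)
print(f"z = {z:.2f}, p = {two_tailed_p(z):.4g}, significant = {abs(z) > 1.96}")
```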
2. Bootstrapping (Modern, Preferred Method):
A non-parametric resampling method that makes no assumption about the shape of the sampling distribution of the indirect effect.
Process:
Repeatedly draws a large number of random samples (e.g., 5,000 or 10,000), each the same size as the original dataset, with replacement from that dataset.
For each sample, the indirect effect (a \times b) is calculated.
These thousands of indirect effect values form an empirical sampling distribution.
Confidence Intervals (CIs): From this distribution, a confidence interval (e.g., 95% CI) is constructed around the indirect effect.
Interpretation: The indirect effect is considered statistically significant if the 95% CI does not contain zero.
Example: A CI of [0.24, 0.51] is significant (no zero).
Example: A CI of [-0.24, 0.51] is not significant (zero is included).
Advantages: More robust and more powerful than the Sobel test, usable with smaller samples, and free of the normality assumption for ab. Modern software facilitates this process.
Note: Bootstrapped results may vary slightly across multiple runs due to the random sampling, but with a sufficient number of samples (e.g., 5,000), these variations are usually negligible.
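The resampling process above can be sketched as a percentile bootstrap. This is a minimal standard-library illustration on simulated data, not a reimplementation of any particular software: the function names, seeds, and generating values (0.6, 0.6, 0.2) are assumptions, and 1,000 resamples are used only to keep the example quick.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Sample covariance with an n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """a*b, with a from M ~ X and b the slope of M in Y ~ X + M."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample whole cases (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[round(tail * n_boot)]
    upper = estimates[round((1 - tail) * n_boot) - 1]
    return lower, upper

# Simulated data with a hypothetical true indirect effect of 0.6 * 0.6 = 0.36.
rng = random.Random(2024)
X = [rng.gauss(0, 1) for _ in range(200)]
M = [0.6 * x + rng.gauss(0, 1) for x in X]
Y = [0.2 * x + 0.6 * m + rng.gauss(0, 1) for x, m in zip(X, M)]

lower, upper = bootstrap_ci(X, M, Y)
print(f"ab = {indirect_effect(X, M, Y):.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
print("significant:", lower > 0 or upper < 0)
```

Changing the seed shifts the interval limits slightly, which is the run-to-run variation mentioned in the note above.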
Performing Mediation Analysis with SPSS PROCESS Macro
Hayes' PROCESS Macro (Version 4.2+): A widely used, free add-on for SPSS that automates mediation and moderation analysis, providing bootstrapped confidence intervals.
Example Model for Analysis:
Predictor (X): Environmental Exploration (continuous)
Mediator (M): Career Adaptability (continuous)
Outcome (Y): Perceived Employability (continuous)
Sample Size (N): 272
This is Model 4 in PROCESS (the standard simple mediation model with one X, one M, and one Y).
Steps in SPSS:
Initial Correlations: Generate a bivariate correlation table for X, M, and Y (Analyze \to Correlate \to Bivariate). This checks initial relationships and is important for reporting.
Access PROCESS: Go to Analyze \to Regression \to Process (once the macro is installed).
Input Variables: Drag and drop your variables into the designated fields for X, Y, and M.
Select Model: Choose "Model 4" (for single mediator).
Set Bootstrapping Options: Ensure the 95% Confidence Interval is selected.
The default number of bootstrap samples is 5,000, which is generally sufficient.
Ensure options to report bootstrap confidence intervals are selected.
Options: Further options can be selected, such as effect sizes and standardized coefficients (often left at default for basic analysis).
Click "Continue" and then "OK" to run the analysis and generate the output.
Interpreting SPSS PROCESS Output (Example Analysis)
The PROCESS output is structured to show the relationships between variables, including direct, indirect, and total effects.
Path A: X \to M (e.g., Environmental Exploration \to Career Adaptability):
Output Section: Typically an initial regression block where the mediator (M) is the outcome variable, and the predictor (X) is the independent variable.
Interpretation: Look at the regression coefficient, t-value, and p-value. A significant positive coefficient (e.g., coefficient +0.41, p < 0.001) indicates that higher Environmental Exploration predicts significantly higher Career Adaptability.
Note: This part of the output uses ordinary least squares (OLS) regression, not bootstrapping.
Path B: M \to Y controlling for X (e.g., Career Adaptability \to Perceived Employability controlling for Environmental Exploration):
Output Section: A subsequent regression block where the outcome variable (Y) is the dependent variable, and both X and M are independent variables.
Interpretation: Focus on the coefficient, t-value, and p-value for the mediator (M). A significant positive coefficient (e.g., p < 0.001, coefficient +0.75) indicates that Career Adaptability significantly predicts Perceived Employability, even after controlling for Environmental Exploration.
Note: This also uses OLS regression.
Direct Effect (c'): X \to Y controlling for M (e.g., Environmental Exploration \to Perceived Employability controlling for Career Adaptability):
Output Section: Found within the same regression block as Path B, but focusing on the predictor (X) when Y is the outcome.
Interpretation: Examine the coefficient, t-value, p-value, and OLS-based confidence intervals for X. If the p-value is > 0.05 (e.g., p = 0.0527) and the CI includes zero (e.g., [-0.01, 0.40]), the direct effect is not significant. Together with a significant indirect effect, this is consistent with full mediation, where the mediator accounts for the entire X-Y relationship.
Total Effect (c): X \to Y (e.g., Environmental Exploration \to Perceived Employability):
Output Section: PROCESS often provides a separate summary for the total effect. It represents a simple regression of Y on X without explicitly accounting for M.
Interpretation: Look at the coefficient, t-value, and p-value. A significant p-value (e.g., p < 0.001, coefficient +0.53) and CI not containing zero (e.g., [0.29, 0.77]) indicates a significant overall relationship between X and Y.
Note: It's possible to have a significant total effect but a non-significant direct effect (as seen in this example), implying the mediator is doing all the work.
Bootstrapped Indirect Effect (ab) - The Key Finding:
Output Section: This is usually presented in a dedicated section at the very end of the output, often titled "Total, Direct, and Indirect Effects of X on Y".
Interpretation: Look for the "Effect" value (the product ab, e.g., 0.31) and, crucially, the lower and upper limits of the bootstrap confidence interval (BootLLCI and BootULCI).
Significance Rule: If the 95% CI does not contain zero (e.g., BootLLCI = 0.18, BootULCI = 0.43), then the indirect effect is statistically significant.
Example Conclusion: Since 0.18 and 0.43 are both positive, zero is not between them, hence the indirect effect is significant. Coupled with the non-significant direct effect, this indicates full mediation.
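The example values reported in this section can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in these notes; the implied direct effect is derived from c = c' + ab and is not a value the notes report, and the helper name is made up for illustration.

```python
a, b = 0.41, 0.75        # Path A and Path B coefficients from the example
c = 0.53                 # total effect from the example
boot_ci = (0.18, 0.43)   # BootLLCI, BootULCI from the example

indirect = a * b
implied_direct = c - indirect   # derived from c = c' + ab, not reported in the notes

def excludes_zero(lower, upper):
    """True when a confidence interval lies entirely above or below zero."""
    return lower > 0 or upper < 0

print(round(indirect, 2))           # 0.31, matching the reported Effect
print(round(implied_direct, 2))     # 0.22
print(excludes_zero(*boot_ci))      # True  -> indirect effect significant
print(excludes_zero(-0.24, 0.51))   # False -> not significant
```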
Bootstrapped Direct Effect (Optional but Recommended):
The PROCESS macro also provides bootstrapped confidence intervals for the direct effect (c').
Potential Discrepancy: Occasionally, the bootstrapped direct effect might be significant (CI not containing zero) even if the OLS-based direct effect was not (e.g., p = 0.0527 changing to significance with bootstrapping).
Reporting: Bootstrapped values are generally considered more accurate. If a discrepancy occurs, it's often advisable to report the bootstrapped findings. In the example, if the bootstrapped direct effect became significant, it would suggest partial mediation instead of full mediation.
Reporting Mediation Findings
General Principles: Follow APA style guidelines for reporting statistical results. Maintain clarity and precision.
Narrative Write-up Example:
State the analysis performed (non-parametric bootstrapping via PROCESS v4.2).
Specify the model (simple mediation, Model 4) and variables (predictor, mediator, outcome).
Define the criterion for mediation (e.g., 95% bootstrap CI for indirect effect not containing zero).
Report the number of bootstrap samples (e.g., 5,000).
Key Findings:
Report the significant indirect effect (ab effect value, with BootLLCI and BootULCI).
Report the significance of Path A and Path B (coefficients and p-values).
Report the significance (or non-significance) of the direct effect (c' path), including the decision to use OLS or bootstrapped values.
Interpret the meaning of the mediation in plain language related to your hypothesis.
Refer to supporting figures and tables.
Example Interpretation: "People who engage in greater environmental exploration were more likely to be more adaptable in relation to their career and in turn more likely to perceive greater employability."
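One way to keep the reported numbers consistent across a write-up is to generate the key sentence from the output values. The wording below is just one possible phrasing, not an APA-mandated template, and the function name and parameters are hypothetical.

```python
def report_indirect(effect, boot_llci, boot_ulci, n_boot=5000, level=95):
    """Build a one-sentence summary of a bootstrapped indirect effect."""
    verdict = "does not include" if (boot_llci > 0 or boot_ulci < 0) else "includes"
    return (f"Indirect effect ab = {effect:.2f}, {level}% bootstrap CI "
            f"[{boot_llci:.2f}, {boot_ulci:.2f}] ({n_boot:,} resamples); "
            f"the interval {verdict} zero.")

# Values taken from the example analysis in these notes.
print(report_indirect(0.31, 0.18, 0.43))
```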
Visual Aids:
Figure (Path Diagram): Highly recommended. Visually represents the model (X, Y, M) and displays the unstandardized regression coefficients and p-values for Path A, Path B, Direct Effect (c'), and optionally the Total Effect (c). This immediately clarifies the model and results.
Table (Regression Coefficients): A standard APA-style table typically lists unstandardized coefficients (B), standard errors (SE), t-values, p-values, and confidence intervals for all paths to provide comprehensive details.
Final Notes on Reporting:
Clearly explain the rationale for testing the mediation model.
Report all criteria (Path A, B, direct effect, total effect) before discussing the indirect effect.
Always report bootstrap confidence intervals for the indirect effect.
Interpret findings contextually and refer back to your original hypotheses.
Advanced Considerations in Mediation
Covariates:
The PROCESS macro allows inclusion of covariates (control variables) in the mediation model.
Confounding Variables: These are third variables that causally affect both the predictor (X) and the outcome (Y), potentially creating a spurious relationship or obscuring the true one.
If a variable is not of primary interest as a mediator but might influence the X-Y relationship, it can be entered as a covariate to statistically control for its effect.
Example: If studying TV watching (X) and obesity (Y), parental behavior (e.g., encouraging healthy lifestyle) could be a confounding covariate.
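The effect of controlling for a confounder can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated so that C drives both X and Y while X has no effect on Y at all, and the coefficient 0.7, the seed, and the function names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Sample covariance with an n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def simple_slope(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x1, x2, y):
    """OLS slopes for x1 and x2 from regressing y on both."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(7)
n = 2000
C = [rng.gauss(0, 1) for _ in range(n)]       # confounder
X = [0.7 * c + rng.gauss(0, 1) for c in C]    # predictor, driven by C
Y = [0.7 * c + rng.gauss(0, 1) for c in C]    # outcome, driven by C only

naive = simple_slope(X, Y)                    # ignores the confounder
adjusted, _ = two_predictor_slopes(X, C, Y)   # C entered as a covariate

print(f"X slope without covariate: {naive:.3f}")
print(f"X slope with covariate:    {adjusted:.3f}")
```

The unadjusted slope is clearly positive even though X has no causal effect here, while the adjusted slope is close to zero.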
Multiple Mediators: PROCESS can handle more complex models.
Parallel Multiple Mediation: Involves multiple mediators (e.g., M1, M2) all operating simultaneously between X and Y (X \to M1 \to Y and X \to M2 \to Y).
Serial Multiple Mediation: Involves mediators arranged in a sequence, where one mediator influences the next (X \to M1 \to M2 \to Y).
These models require extensive theoretical justification and are more advanced than simple mediation (Model 4).
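The two multiple-mediator structures above can be written out as equations. This is a sketch for two mediators; the subscripted coefficient names (a_1, a_2, b_1, b_2, d_{21}) and the intercept and error symbols are assumed notation, and the symbols for the serial model are defined by that model's own equations.

```latex
% Parallel multiple mediation
\begin{align}
M_1 &= i_1 + a_1 X + e_1 \\
M_2 &= i_2 + a_2 X + e_2 \\
Y   &= i_Y + c'X + b_1 M_1 + b_2 M_2 + e_Y \\
c   &= c' + a_1 b_1 + a_2 b_2
\end{align}

% Serial multiple mediation (M_1 also predicts M_2)
\begin{align}
M_1 &= i_1 + a_1 X + e_1 \\
M_2 &= i_2 + a_2 X + d_{21} M_1 + e_2 \\
Y   &= i_Y + c'X + b_1 M_1 + b_2 M_2 + e_Y \\
c   &= c' + a_1 b_1 + a_2 b_2 + a_1 d_{21} b_2
\end{align}
```

The term a_1 d_{21} b_2 is the serial indirect effect, the path X \to M1 \to M2 \to Y; each product on the right-hand side is a specific indirect effect that can be tested with its own bootstrap confidence interval.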
Conclusion
Mediation analysis provides powerful tools for understanding the mechanisms by which one variable influences another (the "how"), taking regression analysis beyond simple two-variable relationships.
This lecture sets the stage for the next topic: Moderation, which explores when or for whom an effect occurs, offering another layer of complexity to regression models.