Regression

Regression Analysis Overview

  • Definition: Statistical technique for finding the straight line that best fits a dataset (the regression line).

  • Purpose: Allows for predictions based on correlations between variables.

  • Linear Relationship: Expressed mathematically as Y = bX + a.

Correlation vs. Regression

  • Correlation: Measures the strength and direction of the relationship between two variables.

  • Regression: Uses that relationship to predict one variable from the other.

  • X and Y Variables: Each individual has an x-coordinate (X) and a corresponding y-coordinate (Y).

The Linear Equation: Y = bX + a

  • X: Any score on X.

  • Y: Corresponding score on Y based on the value of X.

  • a (Y-intercept): Value of Y when X = 0.

  • b (slope): The change in Y for each 1-point increase in X.

Example: Netflix Costs

  • Weekly Membership: $5.00 fee + $2.00 per movie streamed.

  • Equation: Y = 2X + 5.

    • Scenarios:

      • X = 0: Y = 2(0) + 5 = 5

      • X = 3: Y = 2(3) + 5 = 11

      • X = 8: Y = 2(8) + 5 = 21
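
As a quick check of the arithmetic above, here is a minimal Python sketch of the cost equation (the function name weekly_cost is illustrative):

```python
def weekly_cost(movies_streamed):
    """Weekly Netflix cost: $5.00 fee plus $2.00 per movie (Y = 2X + 5)."""
    return 2 * movies_streamed + 5

for x in (0, 3, 8):
    print(f"X = {x}: Y = {weekly_cost(x)}")  # 5, 11, 21
```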

The Regression Line

  • Definition: Line that best fits the data, assuming a linear relationship.

  • Best Fit Line: Minimizes the total squared vertical distance from the data points to the line (the least-squares criterion).

  • Key Functionality:

    • Identifies the central tendency of the relationship between X and Y.

    • Provides best prediction of Y values.

  • Error Calculation: Error = Y - Ŷ (actual Y minus predicted Y).

  • Total Squared Error: Σ(Y − Ŷ)², the sum of squared errors across all data points; the regression line is the line that makes this total smallest (sketched below).
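
A minimal sketch of fitting a least-squares line and computing the total squared error with NumPy; the data points are made up for illustration:

```python
import numpy as np

# Made-up (X, Y) pairs for illustration
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 5, 4, 8, 9], dtype=float)

# Least-squares fit: slope b and intercept a of Y = bX + a
b, a = np.polyfit(x, y, 1)

y_hat = b * x + a                          # predicted Y (Ŷ) for each X
error = y - y_hat                          # error = Y - Ŷ
total_squared_error = np.sum(error ** 2)   # sum of squared distances

print(f"Y = {b:.2f}X + {a:.2f}; total squared error = {total_squared_error:.2f}")
```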

Key Concepts in Regression

  • Standard Error of Estimate (SEE): Measures prediction accuracy; indicates the standard distance between the regression line and actual data points, similar to standard deviation.

  • Variability: r² (coefficient of determination) indicates the proportion of variability in Y predicted by X.

    • Example: If r = 0.80, r² = 0.64 (64% variability accounted for).
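
A minimal sketch of computing the SEE and r² by hand, continuing the made-up data from the previous sketch; the formula SEE = √(SS_residual / (n − 2)) is the standard one for simple regression:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 5, 4, 8, 9], dtype=float)
b, a = np.polyfit(x, y, 1)
y_hat = b * x + a

n = len(y)
ss_residual = np.sum((y - y_hat) ** 2)

# Standard error of estimate: standard distance between the line and the points
see = np.sqrt(ss_residual / (n - 2))

# r²: proportion of variability in Y accounted for by X
r = np.corrcoef(x, y)[0, 1]
print(f"SEE = {see:.2f}, r = {r:.2f}, r² = {r ** 2:.2f}")
```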

Types of Regression

  • Simple Linear Regression: Predicts the dependent variable (DV) from one independent variable (IV).

  • Multiple Linear Regression: Predicts DV from 2 or more IVs.

    • IV Characteristics: IVs should be orthogonal (each contributes independently to the prediction) and non-collinear (no multicollinearity among them).

  • Hierarchical Multiple Regression: Specific order of IV entry based on hypotheses.

  • Stepwise Multiple Regression: IVs entered or removed in an order the software chooses based on statistical criteria (e.g., which predictor adds the most explained variance); see the sketch after this list for simple vs. multiple regression.
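
A minimal sketch contrasting simple and multiple linear regression using statsmodels; variable names and simulated data are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
stress = rng.normal(size=n)        # illustrative IV 1
adjustment = rng.normal(size=n)    # illustrative IV 2
success = 0.5 * adjustment - 0.2 * stress + rng.normal(size=n)  # DV

# Simple linear regression: one IV predicts the DV
simple = sm.OLS(success, sm.add_constant(stress)).fit()

# Multiple linear regression: 2+ IVs predict the DV
X = sm.add_constant(np.column_stack([stress, adjustment]))
multiple = sm.OLS(success, X).fit()

print(simple.params)    # intercept a and slope b
print(multiple.params)  # intercept plus one B per IV
```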

Regression Assumptions

  • Variable Type:

    • DV: Continuous.

    • IV: Continuous or dichotomous (dummy-coded 0/1).

  • Linearity: DV must be linearly related to IV; check residual plots for curvilinear patterns.

  • Sample Size:

    • Correlation: Minimum n = 30.

    • Regression: Minimum n = 20 per predictor.

  • Normal Distribution: Residuals should be normally distributed; use Shapiro-Wilk test or Q-Q plots.

  • Homoscedasticity: Variance of residuals remains constant across levels of the IV; residual plots should not display a cone (funnel) shape.

  • Independence of Residuals: Uncorrelated residual terms; check with Durbin-Watson test (ideal score close to 2).

  • No Multicollinearity: IVs must not be highly correlated (r < .80); check Tolerance (> 0.20) and VIF (< 10).
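
Several of these checks can be run directly in Python; a minimal sketch with statsmodels and SciPy on simulated data, using the thresholds listed above:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(1)
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))      # two illustrative IVs
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
fit = sm.OLS(y, X).fit()

# Normality of residuals: Shapiro-Wilk (p > .05 suggests normality)
w_stat, p_value = stats.shapiro(fit.resid)

# Independence of residuals: Durbin-Watson (ideal score close to 2)
dw = durbin_watson(fit.resid)

# Multicollinearity: VIF per IV (< 10), skipping column 0 (the constant)
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]

print(f"Shapiro-Wilk p = {p_value:.3f}, Durbin-Watson = {dw:.2f}, VIFs = {vifs}")
```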

Reporting in APA Format

  • Include:

    • F-value (ANOVA for model fit).

    • Degrees of freedom.

    • p-value.

    • r² (coefficient of determination).

    • B coefficients for each predictor (report each one when there are 2+ IVs).

  • Example: "The regression model was statistically significant. Students’ level of Anxiety predicted Test-Taking Efficacy, F(2, 97) = 215.24, p < .001. Adjusted r² = 0.81 indicates a strong effect size (81% of variability explained)."

Regression Coefficients

  • Unstandardized Coefficients: B, used for predicting Y; cannot be compared across predictors unless the IVs are on the same scale.

  • Standardized Coefficients (β): For relative strength comparisons of predictors.

    • Indicates strongest predictor.

    • In simple linear regression, β equals Pearson's r; in multiple regression, β = r only when the IVs are uncorrelated (see the sketch below).
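
One common way to obtain standardized β coefficients is to z-score the DV and all IVs before fitting; a minimal sketch on simulated data:

```python
import numpy as np
import statsmodels.api as sm

def zscore(a):
    """Convert scores to z-scores (mean 0, SD 1)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

rng = np.random.default_rng(3)
n = 200
ivs = rng.normal(size=(n, 2))                         # two illustrative IVs
dv = ivs @ np.array([0.5, -0.1]) + rng.normal(size=n)

# OLS on z-scored variables: the slopes are the standardized β weights
betas = sm.OLS(zscore(dv), sm.add_constant(zscore(ivs))).fit().params[1:]
print(betas)  # larger |β| marks the stronger predictor
```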

Example Interpretation

  • Pearson’s r: Moderate, inverse relationship between Student Success and Student-Life Stress, r(397) = -0.29, p < .001.

  • Regression Model: Student-Life Stress predicted Success, F(1, 397) = 36.8, p < .001.

    • Adjusted r² = 0.08 (8% of variability in Success explained by Stress).

    • Coefficients: Each 1-point decrease in Stress predicted a 0.26-point increase in Success.

    • Academic Adjustment (β = .50, p < .001) was a stronger predictor than Stress (β = -0.11, p = .02).

Example Variable Assignment

  • Y-Axis: Dependent variable, such as length of stay after surgery.

  • X-Axis: Independent variable, such as age.
