Regression

Week Four Lecture: Linear Regression

Overview of the Lecture

  • Focus on the concept of linear regression, specifically multiple linear regression.

Correlation

  • Correlation Coefficient Definition: A statistical measure that describes the extent to which two variables are related.

  • Forms of Correlation:

    • Bivariate Correlation: Involves two sets of numbers (data set one and data set two).

  • Key Characteristics:

    1. Strength: Measured between -1 and 1.

    • Perfect Correlation: +1 or -1 (very rare in psychology).

    2. Direction: Positive or negative.

    • Zero Correlation: A coefficient of 0 indicates no linear relationship between the variables.
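
As a minimal sketch, here is how a bivariate correlation can be computed in Python (the data and variable names are hypothetical, not from the lecture):

```python
import numpy as np
from scipy import stats

# Hypothetical data set one and data set two
hours_studied = np.array([2, 4, 5, 7, 8, 10])
exam_score = np.array([55, 60, 62, 70, 74, 80])

# Pearson correlation coefficient: strength lies between -1 and +1,
# and the sign gives the direction of the relationship
r, p_value = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```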

Regression

  • Concept of Regression: A regression line, or line of best fit, is drawn through data points in a scatter plot to relate the independent variable (X) to the dependent variable (Y).

  • Components of a Simple Regression Equation:

    • Predicted score on Y (denoted as ( \hat{y} ))

    • Independent variable (X)

    • Intercept ( b_0 ): the point where the line crosses the Y-axis.

    • Slope ( b_1 ): the change in Y per unit change in X (see the sketch below).
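
A minimal Python sketch of these components on hypothetical data (the lecture's practical work is in SPSS; NumPy's least-squares fit stands in here):

```python
import numpy as np

# Hypothetical X (independent) and Y (dependent) scores
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Least-squares line of best fit; polyfit returns [slope, intercept]
b1, b0 = np.polyfit(x, y, deg=1)

# Predicted scores on Y: y_hat = b0 + b1 * x
y_hat = b0 + b1 * x
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```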

Coefficient of Determination

  • Definition: It indicates the proportion of variance shared by the two variables in a bivariate correlation; calculated as the square of the correlation coefficient (r).

  • Cohen’s Standards:

    • Strong correlations: above 0.5

    • Moderate correlations: around 0.3

    • Weak correlations: around 0.1

  • Example Calculation:

    • If ( r = 0.75 ): ( r^2 = (0.75)^2 = 0.5625 ) (56.25% variance shared).

Residual Variance

  • Definition: The amount of variance the two variables do not share.

  • If 56.25% is shared, then the remaining 43.75% ( 1 - 0.5625 = 0.4375 ) is the residual variance.
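
The same two calculations as a quick check in Python:

```python
r = 0.75
r_squared = r ** 2         # proportion of variance shared
residual = 1 - r_squared   # proportion of variance not shared
print(f"shared: {r_squared:.2%}, residual: {residual:.2%}")
# shared: 56.25%, residual: 43.75%
```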

Regression Equation Components

  • Equation Framework: ( y = b_0 + b_1 x + e ); dropping the error term gives the predicted score, ( \hat{y} = b_0 + b_1 x ).

    • Where (e) is the error term that accounts for variance not explained by the model.

  • Graph Representation: Plotting the fitted line shows the intercept where it crosses the Y-axis and the slope as its steepness, summarizing the relationship between X and Y.

Bivariate and Multiple Regression

  • Bivariate Regression: Predicts Y from one X (a single independent variable).

  • Multiple Regression: Predicts Y from multiple predictors (more than one independent variable).

Implementation Example: Predicting Academic Satisfaction

  • Context: Predicting Y (academic satisfaction) from a student’s X score (mark).

    • Example Calculation:

    • Given ( b_0 = 2 ), ( b_1 = 0.33 ), and ( x = 80 ):
      ( \hat{y} = 2 + 0.33 \times 80 = 28.4 )

    • Interpretation: Higher marks predict higher academic satisfaction.
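
The worked example again, as a short Python check:

```python
b0, b1 = 2.0, 0.33    # intercept and slope from the lecture example
x = 80                # student's mark
y_hat = b0 + b1 * x   # predicted academic satisfaction
print(y_hat)          # 28.4
```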

Moving to Multiple Regression

  • Definition: A multiple regression assesses the impact of several predictors on a single criterion.

  • Purpose: Understanding how different predictors (X's) affect one outcome (Y).

  • Research Example: Predicting test scores based on GPA, assistance seeking, and study time.

Multiple Regression Equation Structure

  • Equation Structure: ( Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e )

    • Each predictor ( X_i ) enters with its own slope coefficient ( b_i ).
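
As an illustrative sketch, a model of this form can be fitted with statsmodels on simulated data (all variable names and coefficient values below are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100

# Hypothetical predictors: GPA, assistance seeking, study time
gpa = rng.normal(3.0, 0.5, n)
help_seeking = rng.normal(5.0, 2.0, n)
study_time = rng.normal(10.0, 3.0, n)

# Simulated outcome built from known coefficients plus random error (e)
test_score = 40 + 5 * gpa + 1.5 * help_seeking + 2 * study_time + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([gpa, help_seeking, study_time]))  # adds b0
model = sm.OLS(test_score, X).fit()
print(model.params)  # estimates of b0, b1, b2, b3
```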

Significance Testing Process

  • Overall Model Significance: The F statistic tests whether the model as a whole explains a significant proportion of the variance in the outcome.

  • Individual Predictors: Once the overall model is significant, t-tests identify which individual predictors contribute significantly.
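
A self-contained statsmodels sketch (simulated data) showing where both tests come from:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 3)))   # intercept + 3 predictors
y = X @ np.array([1.0, 2.0, 0.0, 0.5]) + rng.normal(size=100)

fit = sm.OLS(y, X).fit()
print(f"F = {fit.fvalue:.2f}, p = {fit.f_pvalue:.4f}")  # overall model significance
print(fit.pvalues)  # per-coefficient t-test p-values (individual predictors)
```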

Calculating Goodness of Fit

  • Sum of Squares in Regression:

    • Model Sum of Squares (SSM): Variance explained by the regression model.

    • Residual Sum of Squares (SSR): Variance not explained by the model.

    • Total Sum of Squares (SST): The total variance in the dependent variable.
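
These three quantities can be computed directly from a fitted line; a sketch on hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 12.0, 14.0, 15.0, 19.0])

b1, b0 = np.polyfit(x, y, deg=1)   # least-squares fit
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # SST: total variance in Y
ssm = np.sum((y_hat - y.mean()) ** 2)  # SSM: variance the model explains
ssr = np.sum((y - y_hat) ** 2)         # SSR: variance left unexplained
print(sst, ssm + ssr)                  # SST = SSM + SSR for an OLS fit
print("R^2 =", ssm / sst)              # proportion of variance explained
```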

Criteria for Model Effectiveness

  • Evaluate model effectiveness using F-statistics and R-squared values.

  • A high R-squared indicates a better model fit.

  • For individual coefficient significance, check the p-values for predictors.

Assumptions for Multiple Regression

  1. Linearity: Each predictor must have a linear relationship with the outcome.

  2. Normality: The distribution of residuals should be normal.

  3. Homoscedasticity: Residuals should maintain constant variance across values.

  4. No Multicollinearity: Predictors should not correlate too highly with one another.
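
A sketch of two of these checks in Python on simulated data (the Shapiro-Wilk test and variance inflation factors are common choices, though the lecture may use other diagnostics):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy import stats

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=100)
resid = sm.OLS(y, X).fit().resid

# Normality: Shapiro-Wilk test on the residuals
print(stats.shapiro(resid))

# Multicollinearity: VIF per predictor (skip the constant column);
# values well above ~10 are a common warning sign
for i in range(1, X.shape[1]):
    print(variance_inflation_factor(X, i))
```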

Outlier Detection Techniques

  • Mahalanobis Distance: Measures each case’s distance from the multivariate mean of the predictors, identifying multivariate outliers.

  • Cook’s Distance: Evaluates influence of individual cases on the regression model.
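
Both measures can be computed in Python; a sketch on simulated data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)

# Mahalanobis distance: each case's distance from the predictors' multivariate mean
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

# Cook's distance: each case's influence on the fitted regression
fit = sm.OLS(y, sm.add_constant(X)).fit()
cooks_d, _ = OLSInfluence(fit).cooks_distance

print(mahalanobis.max(), cooks_d.max())
```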

Conclusion and Future Reading Recommendations

  • Suggested readings:

    • Andy Field’s works on statistics

    • Additional statistical literature covering multiple regression.

Practical Application

  • SPSS files with album sales data are available for hands-on practice with the multiple regression concepts covered.

Next Lecture Preview

  • A look ahead at advanced regression analysis concepts such as mediation and moderation.