Regression
Week Four Lecture: Linear Regression
Overview of the Lecture
Focus on the concept of linear regression, specifically multiple linear regression.
Correlation
Correlation Coefficient Definition: A statistical measure that describes the extent to which two variables are related.
Forms of Correlation:
Bivariate Correlation: Relates two variables, each measured on the same cases (data set one and data set two).
Key Characteristics:
Strength: The coefficient ranges from -1 to +1; the closer its absolute value is to 1, the stronger the relationship.
Perfect Correlation: +1 or -1 (very rare in psychology).
Direction: Positive or negative.
Zero Correlation: A correlation coefficient of 0, meaning no linear relationship between the two variables.
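As an illustration of strength and direction, the following is a minimal Python sketch that computes a Pearson correlation coefficient by hand. The function name pearson_r and all of the data are hypothetical, invented only to show a perfect positive, a perfect negative, and a zero correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]    # perfectly positive
falls_with_x = [10, 8, 6, 4, 2]    # perfectly negative
unrelated = [1, 3, 5, 3, 1]        # no linear trend

r_pos = pearson_r(x, rises_with_x)
r_neg = pearson_r(x, falls_with_x)
r_zero = pearson_r(x, unrelated)
print(round(r_pos, 3), round(r_neg, 3), round(r_zero, 3))
```

The sign of the result gives the direction and its absolute value gives the strength. The third data set rises and then falls, so it has a clear pattern but no linear relationship.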
Regression
Concept of Regression: A regression line, or line of best fit, is drawn through data points in a scatter plot to relate the independent variable (X) to the dependent variable (Y).
Components of a Simple Regression Equation:
Predicted score on Y (denoted \( \hat{y} \))
Independent variable (X)
Intercept (\( b_0 \)): the point where the line crosses the Y-axis, i.e. the predicted Y when X = 0.
Slope (\( b_1 \)): the change in predicted Y per one-unit change in X.
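The components above can be sketched in Python. This assumes the line of best fit is obtained by ordinary least squares, which the lecture does not state explicitly; the function name fit_simple_regression and the data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: change in y_hat per unit of x
    b0 = mean_y - b1 * mean_x    # intercept: y_hat when x = 0
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_simple_regression(xs, ys)
y_hat = b0 + b1 * 6   # predicted score on Y for a new case with x = 6
print(b0, b1, y_hat)
```

Because the invented points fall exactly on a line, the fit recovers an intercept of 1 and a slope of 2. With real data the points scatter around the line and the same formulas give the best-fitting compromise.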
Coefficient of Determination
Definition: It indicates the proportion of variance shared by the two variables in a bivariate correlation; calculated as the square of the correlation coefficient (r).
Cohen’s Standards (for the size of the correlation coefficient r, not \( r^2 \)):
Strong correlations: above 0.5
Moderate correlations: around 0.3
Weak correlations: around 0.1
Example Calculation:
If \( r = 0.75 \): \( r^2 = (0.75)^2 = 0.5625 \) (56.25% of variance shared).
Residual Variance
Definition: The amount of variance the two variables do not share.
If 56.25% is shared, then 43.75% (1 - 0.5625 = 0.4375) is the residual variance.
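The same arithmetic can be written as a small Python sketch. The function name variance_split is hypothetical; the values of r are the lecture's example plus Cohen's three benchmark values.

```python
def variance_split(r):
    """Return (shared, residual) proportions of variance for a correlation r."""
    shared = r ** 2          # coefficient of determination
    residual = 1 - shared    # variance the two variables do not share
    return shared, residual

# The lecture's example (0.75) followed by Cohen's benchmark values
for r in (0.75, 0.5, 0.3, 0.1):
    shared, residual = variance_split(r)
    print(f"r = {r:.2f}: shared = {shared:.2%}, residual = {residual:.2%}")
```

Squaring shrinks the benchmarks quickly: a correlation of 0.5 corresponds to only a quarter of the variance being shared, and a correlation of 0.1 to about 1%.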
Regression Equation Components
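Equation Framework: \( y = b_0 + b_1 x + e \), with the predicted score given by \( \hat{y} = b_0 + b_1 x \)
Where \( e \) is the error term (residual): the difference between the observed \( y \) and the predicted \( \hat{y} \), which captures the variance not explained by the model.
Graph Representation: On a scatter plot, the intercept is where the regression line crosses the Y-axis and the slope is the steepness of the line; together they describe the relationship between x and y.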
Bivariate and Multiple Regression
Bivariate Regression: Predicts Y from one X (a single independent variable).
Multiple Regression: Predicts Y from multiple predictors (more than one independent variable).
Implementation Example: Predicting Academic Satisfaction
Context: Predicting Y (academic satisfaction) from a student’s X score (mark).
Example Calculation:
Given \( b_0 = 2 \), \( b_1 = 0.33 \), and \( x = 80 \):
\[ \hat{y} = 2 + 0.33 \times 80 = 28.4 \]
Interpretation: The slope is positive, so higher marks predict higher academic satisfaction.
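A minimal Python sketch of the same prediction, using the coefficients from the example above. The function name predict_satisfaction is hypothetical, and the second prediction (a mark of 81) is added only to show what the slope means.

```python
def predict_satisfaction(mark, b0=2.0, b1=0.33):
    """Predicted academic satisfaction (y_hat) for a given mark (x)."""
    return b0 + b1 * mark

y_hat_80 = predict_satisfaction(80)
y_hat_81 = predict_satisfaction(81)

print(round(y_hat_80, 2))              # prediction for a mark of 80
print(round(y_hat_81 - y_hat_80, 2))   # change for one extra mark
```

Raising the mark by one point raises the predicted satisfaction by the slope, 0.33, whatever the starting mark.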
Moving to Multiple Regression
Definition: A multiple regression assesses the impact of several predictors on a single criterion.
Purpose: Understanding how different predictors (X's) affect one outcome (Y).
Research Example: Predicting test scores based on GPA, assistance seeking, and study time.
Multiple Regression Equation Structure
Equation Structure: \( Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e \)
Includes one term per predictor, each with its own coefficient.
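To make the equation concrete, the following Python sketch estimates the coefficients for two predictors. It assumes ordinary least squares solved through the normal equations, which is one common approach and not necessarily what statistical software does internally. The function names and the data are hypothetical; the outcome values were generated from Y = 1 + 2·X1 + 3·X2 with no error so that the result is easy to check.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pivot on the largest remaining entry in this column for stability
        pivot = max(range(col, n), key=lambda row_idx: abs(m[row_idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upwards
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(rows, ys):
    """Return [b0, b1, ..., bn]; rows holds one list of predictor values per case."""
    # Prepend a constant 1 to each case so that b0 is estimated as the intercept
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical cases: [X1, X2] per row, with Y = 1 + 2*X1 + 3*X2 exactly
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
outcome = [9, 8, 19, 18, 29]

coefficients = fit_multiple_regression(predictors, outcome)
print([round(b, 3) for b in coefficients])
```

Each coefficient is the expected change in Y for a one-unit change in its predictor while the other predictor is held constant.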
Significance Testing Process
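Overall Model Significance: An F-test assesses whether the model as a whole predicts the outcome significantly better than using the mean of Y alone.
Individual Predictors: Once the overall model has been assessed, a t-test on each coefficient indicates whether that predictor contributes significantly.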
Calculating Goodness of Fit
Sum of Squares in Regression:
Model Sum of Squares (SSM): Variance explained by the regression model.
Residual Sum of Squares (SSR): Variance not explained.
Total Sum of Squares (SST): The total variance in the dependent variable, so that SST = SSM + SSR.
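The three sums of squares can be computed directly, as in the Python sketch below. It assumes a least-squares fit with a single predictor to keep the code short; the function name sums_of_squares and the data are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (ss_m, ss_r, ss_t) for a least-squares line fitted to xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    predicted = [b0 + b1 * x for x in xs]

    ss_m = sum((p - mean_y) ** 2 for p in predicted)        # explained by the model
    ss_r = sum((y - p) ** 2 for y, p in zip(ys, predicted)) # left unexplained
    ss_t = sum((y - mean_y) ** 2 for y in ys)               # total around the mean
    return ss_m, ss_r, ss_t

# Hypothetical data, invented for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ss_m, ss_r, ss_t = sums_of_squares(xs, ys)
print(round(ss_m, 2), round(ss_r, 2), round(ss_t, 2))
```

For a least-squares fit that includes an intercept, the model and residual sums of squares add up to the total.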
Criteria for Model Effectiveness
Evaluate model effectiveness using F-statistics and R-squared values.
A higher R-squared means the model accounts for a larger proportion of the variance in Y, i.e. a better fit.
For individual coefficient significance, check the p-values for predictors.
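As a rough sketch of how these criteria are calculated, the Python function below derives R-squared and the F-statistic from the model and residual sums of squares. The function name model_fit and the input values are hypothetical. It stops at the F-statistic: turning F into a p-value needs the F distribution with (k, N - k - 1) degrees of freedom, which statistical software normally reports.

```python
def model_fit(ss_m, ss_r, n_cases, n_predictors):
    """Return (r_squared, f_stat) from the model and residual sums of squares."""
    ss_t = ss_m + ss_r
    r_squared = ss_m / ss_t                       # proportion of variance explained
    ms_m = ss_m / n_predictors                    # mean square for the model
    ms_r = ss_r / (n_cases - n_predictors - 1)    # mean square for the residuals
    f_stat = ms_m / ms_r
    return r_squared, f_stat

# Hypothetical sums of squares for 5 cases and 1 predictor
r_squared, f_stat = model_fit(ss_m=3.6, ss_r=2.4, n_cases=5, n_predictors=1)
print(round(r_squared, 2), round(f_stat, 2))
```

Both numbers rise as the model explains more of the variance, but F also takes the number of cases and predictors into account, which R-squared on its own does not.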
Assumptions for Multiple Regression
Linearity: The relationship between each predictor and the outcome should be linear.
Normality: The residuals should be approximately normally distributed.
Homoscedasticity: Residuals should have constant variance at all levels of the predicted values.
No Multicollinearity: Predictors should not correlate too highly with one another.
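One simple screen for the multicollinearity assumption is to correlate each pair of predictors and flag the high ones, as in the Python sketch below. The 0.8 cutoff is an assumption chosen for illustration, not a value given in the lecture, and the function names and data are hypothetical. Pairwise correlations can also miss collinearity that involves three or more predictors at once.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def flag_collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| above threshold.

    predictors maps a predictor name to its list of values.
    The default threshold is an assumed cutoff for illustration only.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical scores for five students, invented for illustration
predictors = {
    "gpa": [2.0, 2.5, 3.0, 3.5, 4.0],
    "study_hours": [4, 5, 6, 7, 8.5],
    "help_seeking": [3, 1, 4, 1, 3],
}

flagged = flag_collinear_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.2f}")
```

In this invented data only GPA and study hours are flagged, which would suggest that the two carry largely overlapping information as predictors.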
Outlier Detection Techniques
Mahalanobis Distance: Measures how far a case lies from the mean (centroid) of the predictor variables; used to identify multivariate outliers.
Cook’s Distance: Evaluates influence of individual cases on the regression model.
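The idea behind Cook's distance can be sketched in Python for the single-predictor case, using the standard formula based on each case's residual and leverage. The function names and data are hypothetical, and Mahalanobis distance is not covered by this sketch.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def cooks_distances(xs, ys):
    """Return Cook's distance for every case in a one-predictor regression."""
    n = len(xs)
    p = 2  # parameters estimated: intercept and slope
    b0, b1 = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        # Leverage: how unusual this case's x value is
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical data: the last case sits well above the trend of the others
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 4, 5, 6, 15]

distances = cooks_distances(xs, ys)
most_influential = distances.index(max(distances))
print([round(d, 2) for d in distances], most_influential)
```

A case gets a large Cook's distance when it combines a large residual with high leverage, meaning that removing it would noticeably change the fitted line. Here the last case is the most influential.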
Conclusion and Future Reading Recommendations
Suggested readings:
Andy Field’s works on statistics
Additional statistical literature covering multiple regression.
Practical Application
SPSS files with album sales data are available for hands-on practice with the multiple regression concepts covered.
Next Lecture Preview
A look ahead at advanced regression analysis concepts such as mediation and moderation.