Course: PY2501 Research Methods & Data Analysis
University: Aston University, Birmingham, UK
Lecturer: Dr. Ryan Blything
Week 10 Lecture: Focus on Multiple Regression
Part 1: Recap of Simple Regression
Part 2: Introduction to Multiple Regression
Definition and Purpose
Reporting & Interpreting Results
Part 3: Assumptions of Multiple Regression
Linearity, Normality, Homoscedasticity, No Outliers, No Multicollinearity
Part 4: Regression Formula Review
Mathematical Representation of Regression
Definition: Explores the relationship between two continuous variables.
Examples:
Exam scores vs. Exam anxiety
Running distance vs. Sweat produced
Album sales vs. Advertising budget
Goal: Establish a predictive relationship in which X predicts Y (regression shows prediction, not proof of cause and effect).
Method: Fitting a line to data to predict Y from X.
Example Prediction: If Exam Anxiety = 39, then Y (Exam Performance) is predicted to be 80%.
Beta Estimate: Indicates the slope of the regression line.
t-Statistic: Tests whether the predictor's b-value differs reliably from zero.
p-value: The probability of a t-statistic this large if the predictor had no effect; conventionally significant when p < 0.05.
F-Statistic: Tests the overall significance of the regression model.
R² Value: Coefficient of determination, indicating the proportion of variance explained by the model.
Adjusted R²: R² adjusted for the number of predictors in the model.
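To make the recap concrete, here is a minimal Python sketch (statsmodels on invented exam-anxiety data; every number is hypothetical) showing where each statistic above appears in a fitted simple regression:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical exam-anxiety data (all values invented for illustration)
    rng = np.random.default_rng(0)
    anxiety = rng.uniform(10, 90, size=60)
    performance = 100 - 0.5 * anxiety + rng.normal(0, 8, size=60)

    X = sm.add_constant(anxiety)           # adds the intercept column (b0)
    model = sm.OLS(performance, X).fit()

    print(model.params)                    # b0 (intercept) and b1 (beta estimate / slope)
    print(model.tvalues, model.pvalues)    # t-statistic and p-value per predictor
    print(model.fvalue, model.f_pvalue)    # F-statistic for the model as a whole
    print(model.rsquared, model.rsquared_adj)  # R² and adjusted R²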
Difference Between Correlation & Regression: Correlation quantifies the association between X and Y; regression additionally predicts Y from X.
Beta Value Interpretation: The b-value tells you how much Y changes for a unit change in X.
Interpretation of Regression Results: Use practical examples such as advertising budget predicting Spotify subscriptions.
Expansion of Simple Regression: Used when more than one independent variable (IV) predicts a dependent variable (DV).
General Formula: Y = b0 + b1X1 + b2X2 + ... + bnXn
Example: Y = Album sales = b0 + (b1 * Advertising Budget) + (b2 * Airplay).
How it Works: Each IV receives its own regression coefficient (b-value) indicating its unique influence on Y while holding the other IVs constant.
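A minimal sketch of the album-sales example in Python with statsmodels; the data and coefficients below are invented for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented album-sales data: advertising budget and radio airplay as IVs
    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "advertising": rng.uniform(0, 1000, size=100),  # hypothetical £k
        "airplay": rng.uniform(0, 60, size=100),        # hypothetical plays/week
    })
    df["sales"] = 50 + 0.1 * df["advertising"] + 3.0 * df["airplay"] + rng.normal(0, 20, size=100)

    # Y = b0 + b1*Advertising + b2*Airplay: each IV gets its own b-value
    model = smf.ols("sales ~ advertising + airplay", data=df).fit()
    print(model.summary())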
Linearity: Relationship must be linear.
Normality of Residuals: Deviations from the regression line should be normally distributed.
Homoscedasticity: Residuals should have constant variance across all levels of the IVs.
No Outliers: No data points should significantly deviate from the main data trend.
No Multicollinearity: IVs should not be highly correlated with one another; the Variance Inflation Factor (VIF) helps assess this.
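One common way to screen for multicollinearity is the VIF; a sketch with statsmodels, reusing the hypothetical album-sales frame from the previous sketch:

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # df is the hypothetical album-sales frame from the sketch above
    X = sm.add_constant(df[["advertising", "airplay"]])

    # VIF per predictor (column 0 is the constant, so start at 1);
    # values near 1 are ideal, values above roughly 10 are commonly flagged
    for i, name in enumerate(X.columns[1:], start=1):
        print(name, variance_inflation_factor(X.values, i))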
Pre-Experiment: Design studies with adequate sample sizes (N > 50; a common rule of thumb is at least 50 + 8 × the number of predictors).
Post-Experiment: Potentially remove outliers or transform data.
Formula: Yi = b0 + b1Xi
Interpretations:
b0: Y-intercept
b1: Slope indicating change in Y for a unit change in X.
Example Application: Plugging values into the regression formula to predict album sales.
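For instance, with invented coefficients (b0 = 134.1 and b1 = 0.096 here are hypothetical, not taken from the lecture data), the prediction is plain arithmetic:

    # Hypothetical simple-regression coefficients for album sales from advertising
    b0, b1 = 134.1, 0.096

    def predict_sales(advertising_budget):
        """Predicted album sales (thousands) for a budget in £k: Yi = b0 + b1*Xi."""
        return b0 + b1 * advertising_budget

    print(predict_sales(100))   # 134.1 + 0.096 * 100 = 143.7 (thousand sales)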
Forced Entry: All predictors added at once.
Hierarchical Entry: Known predictors (chosen from past research) are entered first, followed by the new predictors (a nested-model sketch follows this list).
Stepwise Entry: Predictors are added or dropped automatically on statistical criteria, so only IVs that meaningfully improve prediction of the DV are retained.
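Hierarchical entry can be mimicked by fitting nested models and testing the change in fit; a sketch, again using the hypothetical album-sales frame from the earlier sketch:

    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Step 1: the known predictor only; Step 2: add the new predictor
    step1 = smf.ols("sales ~ advertising", data=df).fit()
    step2 = smf.ols("sales ~ advertising + airplay", data=df).fit()

    # F-test on the change in fit: does airplay add predictive power?
    print(anova_lm(step1, step2))
    print(step2.rsquared - step1.rsquared)   # the R² change between steps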
Reporting Guidelines:
Report b-values, t-statistics (& p-values) for individual predictors.
Report the F-statistic (& its p-value) for collective prediction significance.
Report R-squared to convey variance explained by the model.
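All of these quantities can be pulled straight off a fitted statsmodels result; a sketch that assembles report-style lines (step2 is the hypothetical multiple-regression fit from the hierarchical sketch above):

    # Pulling report-ready numbers off a fitted statsmodels result
    m = step2

    # Overall model: F-statistic, its p-value, and R²
    print(f"R² = {m.rsquared:.2f}, "
          f"F({int(m.df_model)}, {int(m.df_resid)}) = {m.fvalue:.2f}, "
          f"p = {m.f_pvalue:.3f}")

    # Individual predictors: b-value, t-statistic, p-value
    for name in m.params.index:
        print(f"{name}: b = {m.params[name]:.3f}, "
              f"t = {m.tvalues[name]:.2f}, p = {m.pvalues[name]:.3f}")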
Contact: Dr. Ryan Blything via email r.blything@aston.ac.uk
Assignment Guidance: Check Blackboard for specifics on submissions.
Quiz Updates: Practicals on questionnaires and multiple regression will be assessed in upcoming quizzes.