Research Methods Lecture 7 Multiple Regression

Page 2: Basics of Multiple Regression

Definition

  • Multiple regression is a statistical method used to study the relationship between one dependent variable (DV) and two or more independent or predictor variables (IVs).

Key Points

  • Dependent variable must be continuous.

  • Independent variables can be continuous or categorical.

  • If all IVs are categorical, ANOVA is preferred, which will be covered in Week 8.


Page 3: Motivation for Multiple Regression

Why Use Multiple Regression?

  • Simple regression examines the relationship between two variables only (one IV and one DV).

  • Real-world scenarios often involve multiple factors influencing outcomes.

  • Example: To understand blood pressure (BP), consider stress, genetic factors, and other variables; hence, multiple IVs provide a more realistic model.


Page 4: Examples of Multiple Regression

Situations to Apply

  1. Predicting income based on gender and level of education.

    • IVs: Gender, Years of Education, DV: Income

  2. Predicting final grades using study time and pre-test scores.

    • IVs: Study Time, Pre-Test Scores, DV: Final Grades

  3. Examining the effects of exercise and dietary salt on blood pressure.

    • IVs: Exercise, Dietary Salt, DV: BP


Page 5: Data Display

Example Data Points

  • Showcases distribution of hours studied and assignment completion rate over time.

  • Key figures: 100 assignments at 1 hour studied, scaling down to lower performance for fewer hours studied.


Page 6: Multiple Regression - Residuals

Understanding Residuals

  • Points on the scatter plot (e.g., (1,2), (2,1)...(5,4)) reflect actual data.

  • The regression points (1, 1.4), (2, 2.1)...(5, 4.2) represent predicted values.

  • Red line segments indicate the residuals: the vertical distance between each actual data point and the value the regression line predicts for it.

  • Definition: Residuals reflect unexplained variance in the dependent variable.


Page 7: Hypothesis Testing in Regression

Null and Alternative Hypotheses

  • Null Hypothesis (H0): No effect of IVs on DV.

  • Alternative Hypothesis (H1): There is an effect of IVs on DV.

Interpretation

  • Rejecting the null indicates evidence of a significant relationship between IVs and DV.


Page 8: p-value in Multiple Regression

Using SPSS for Analysis

  • Each regression coefficient p-value is evaluated.

  • A p-value < 0.05 leads to rejection of the null hypothesis for that coefficient, indicating a significant predictor.


Page 9: Research Example Setup

Study Focus

  • Objective: Predict final grades based on two IVs.

    1. Hours Spent Studying per Week (X1): Total study hours.

    2. Class Attendance (X2): Percentage of attended lectures.

  • Dependent Variable (Y): Final Grade out of 100.


Page 10: Research Data

Participant Data Summary

  • Data table listing, e.g.:

    • Participant 1: Hours Studied = 10, Attendance = 95%, Grade = 95

    • Participant 2: Hours Studied = 10, Attendance = 94%, Grade = 99

    • Various other participant statistics detailed in table format.


Page 11: Data Visualization

Scatterplot Representation

  • Displaying relationship between hours studied and grades, further illustrating dataset.


Page 12: Assumptions of Multiple Regression

Checking for Outliers

  • Importance of identifying outliers in data analysis.

  • Outliers can be identified using scatterplots and may need removal to refine analysis.


Page 13: Linearity Assumption

Assessing Linear Relationships

  • Create scatterplots of every IV against the DV.

  • Check for linear patterns:

    • Hours Studied (X1) vs. Final Grade (Y) and

    • Lecture Attendance (X2) vs. Final Grade (Y).


Page 14: Continuing Linearity Check

Scatterplot Analysis

  • Reiteration of scatterplot analysis for Hours Studied, ensuring clear visibility of linear trends.


Page 15: Further Linearity Check

Attendance vs. Final Grades

  • Additional scatterplot checks for Lecture Attendance providing insights on relationships as per linearity assumption.


Page 16: Statistical Descriptives

Descriptive Statistics Summary

  • Reporting means, medians, and standard deviations of studied variables:

    • Hours Studied: Mean = 7.1

    • Percent Attendance: Mean = 72.4

    • Grades: Mean = 77.7


Page 17: Correlation Analysis

Bivariate Correlation Matrix

  • Displays correlations between the variables studied:

    • E.g., Pearson's r for Hours Studied and Grades, indicating significant relationships.


Page 18: Importance of Multicollinearity

Multicollinearity in Regression

  • Assumption: IVs must not be too highly correlated with one another (a correlation above 0.80 is the usual warning sign); checked using the correlation matrix.

  • VIF and Tolerance Statistics will be discussed in seminars to refine understanding for reports.


Page 19: Understanding R-squared

Model Explanation Metrics

  • R-squared indicates how much variance in the DV is explained by the model.

  • Adjusted R-squared corrects for the number of IVs, giving a fairer estimate than R-squared when several predictors are included.


Page 20: Reliability Assessment

Using Cronbach's Alpha

  • Cronbach’s Alpha assesses internal consistency/reliability of multi-item scales.

  • Recommended benchmark: ≥0.7; check that each scale meets it before including the scale in the analysis.


Page 21: Research Report Structure

Crafting the Research Question

  • Formulate your question clearly, outlining predictor and dependent variables.

Focus Areas for Operationalization

  • Mindfulness, Sense of Humor, Growth Mindset, etc., and their corresponding scales or questionnaires to be used.


Page 22: Research Report Timeline

Steps to Follow

  1. Submit Ethics Form.

  2. Receive approval on Ethics Form.

  3. Collect Data post-approval.

  4. Analyze Data in SPSS, ensuring thorough checks for errors.

  5. Write the Research Report following guidelines.

  6. Submit by deadline (29 NOV at 15:00).


Page 23: Important Notes

Ethics Compliance

  • Failure to receive approval on the ethics form leads to a mark of “0” on Assessment 1.

  • Ensure timely and complete submissions for successful evaluation.

Multiple Regression Lecture Notes - Week 7

Basics of Multiple Regression

Definition

Multiple regression is a statistical method that analyzes the relationship between one dependent variable (DV) and two or more independent or predictor variables (IVs). It aims to predict the DV based on variations in the IVs.

Key Points

  • The dependent variable must be continuous (measured on a numeric scale, such as a grade out of 100).

  • Independent variables can be continuous, categorical, or a mix of both, offering flexibility in modeling real-world data scenarios.

  • When all IVs are categorical, Analysis of Variance (ANOVA) is preferred; it will be covered in Week 8.

Motivation for Multiple Regression

Why Use Multiple Regression?

Simple regression is limited to examining the relationship between only two variables: one IV and one DV. Most real-world situations involve multiple factors influencing outcomes. For instance, to understand what affects blood pressure (BP), it is essential to consider several IVs, such as stress, genetic predisposition, physical activity, and dietary choices. By including multiple IVs, researchers can build a more comprehensive and realistic model of the dependent variable.

Examples of Multiple Regression

Situations to Apply

  1. Predicting Income: Analyze how gender and education level affect income.

    • IVs: Gender, Years of Education;

    • DV: Income.

  2. Final Grades Prediction: Understanding how study time and pre-test scores contribute to final grades.

    • IVs: Study Time, Pre-Test Scores;

    • DV: Final Grades.

  3. Health and Lifestyle Impact: Examining the influence of exercise frequency and dietary salt intake on blood pressure levels.

    • IVs: Exercise, Dietary Salt;

    • DV: Blood Pressure.

Data Display

Example Data Points

Illustrates the distribution of hours studied against assignment completion rates over a specified timeline.

  • Key figures show a total of 100 assignments for 1 hour studied, with performance declining as study time decreases. This visual aids in recognizing the importance of consistent study habits on academic success.

Multiple Regression - Residuals

Understanding Residuals

Residuals represent the discrepancies between actual data points and the regression line's predictions. For any given point on the scatter plot (e.g., (1,2), (2,1), ..., (5,4)), the regression points (e.g., (1, 1.4), (2, 2.1), ..., (5, 4.2)) represent predicted values based on the chosen IVs.

  • Red line segments in the scatter plot mark the residuals: the vertical distance between each actual data point and the value the regression line predicts for it.

  • Understanding residuals is crucial, as they reflect unexplained variance in the dependent variable, guiding researchers in model refinement and evaluation.
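
As a small worked illustration of the definition above, the sketch below computes residuals (actual minus predicted) for the three points that the notes state explicitly. The points hidden behind "..." are left out rather than guessed, and the function and variable names are illustrative only.

```python
# Observed and predicted values for the three points quoted in the notes.
# The points elided by "..." in the notes are deliberately omitted.
observed = {1: 2.0, 2: 1.0, 5: 4.0}     # x -> actual y
predicted = {1: 1.4, 2: 2.1, 5: 4.2}    # x -> y predicted by the regression line


def residuals(observed, predicted):
    """Return x -> (actual y - predicted y), rounded to remove float noise."""
    return {x: round(observed[x] - predicted[x], 10) for x in observed}


res = residuals(observed, predicted)
for x, e in res.items():
    # A positive residual means the point lies above the regression line.
    print(f"x = {x}: residual = {e:+.1f}")

# Sum of squared residuals for these three points only.
sse = sum(e ** 2 for e in res.values())
print(f"sum of squared residuals = {sse:.2f}")
```

Least-squares regression chooses the line that makes this sum of squared residuals as small as possible across all data points.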

Hypothesis Testing in Regression

Null and Alternative Hypotheses

  • Null Hypothesis (H0): The IVs have no effect on the DV; any relationship observed in the sample is due to chance.

  • Alternative Hypothesis (H1): The IVs have an effect on the DV; the observed relationship reflects a true association.

Interpretation

Rejecting the null hypothesis indicates evidence of a statistically significant relationship between the IVs and the DV. It does not by itself show that the relationship is strong or causal.

p-value in Multiple Regression

Using SPSS for Analysis

Each regression coefficient's p-value is assessed to determine significance. A p-value below 0.05 typically signifies rejection of the null hypothesis, thus identifying significant predictors impacting the DV.

Research Example Setup

Study Focus

  • Objective: To predict final grades based on contributions from two IVs.

    • Hours Spent Studying per Week (X1): Total hours allocated for studying.

    • Class Attendance (X2): Percentage of lectures attended.

    • Dependent Variable (Y): Final Grade (on a scale of 0-100).

Research Data

Participant Data Summary

A comprehensive data table is essential for clarity, showcasing the following:

  • Participant 1: Hours Studied = 10, Attendance = 95%, Grade = 95

  • Participant 2: Hours Studied = 10, Attendance = 94%, Grade = 99

  • Additional participants’ data summarized in the table format to allow for easy comparative analysis.
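
To make the setup concrete, the model for this example can be written as Y = b0 + b1*X1 + b2*X2 + error. The sketch below estimates b0, b1, and b2 by ordinary least squares using only the Python standard library. It is an illustration, not the SPSS procedure used in the module: rows 1 and 2 are the participants quoted above, while every other row is a hypothetical value invented for the example, and the function name is an assumption.

```python
from statistics import mean

# Rows 1-2 are the two participants quoted in the notes. All remaining rows
# are HYPOTHETICAL values made up for illustration only.
hours      = [10, 10, 8, 6, 5, 3, 2, 7]          # X1: hours studied per week
attendance = [95, 94, 80, 70, 60, 50, 40, 85]    # X2: % of lectures attended
grade      = [95, 99, 82, 74, 65, 55, 48, 80]    # Y: final grade out of 100


def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Sums of squares and cross-products of the mean-centred variables.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero when the two IVs are perfectly correlated
    if det == 0:
        raise ValueError("IVs are perfectly collinear; slopes are not unique")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(hours, attendance, grade)
print(f"grade = {b0:.2f} + {b1:.2f}*hours + {b2:.2f}*attendance")

# Residual for each participant: actual grade minus predicted grade.
resid = [g - (b0 + b1 * h + b2 * a)
         for h, a, g in zip(hours, attendance, grade)]
```

Each slope is interpreted with the other IV held constant: b1 is the expected change in grade for one extra study hour at a fixed attendance level.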

Data Visualization

Scatterplot Representation

Visually presenting the relationship between hours studied and grades further elucidates the dataset, helping in identifying patterns and trends essential for a clear understanding of the correlations.

Assumptions of Multiple Regression

Checking for Outliers

Identifying outliers is an important step in the data analysis process because outliers can disproportionately influence regression results. They can be detected using scatterplots and may need to be removed, or otherwise handled, before the model is interpreted.

Linearity Assumption

Assessing Linear Relationships

Create scatterplots for each IV against the DV to assess for linearity. Key relationships to evaluate include:

  • Hours Studied (X1) vs. Final Grade (Y)

  • Lecture Attendance (X2) vs. Final Grade (Y)

A linear relationship between each IV and the DV is required for multiple regression to be an appropriate method of analysis.

Statistical Descriptives

Descriptive Statistics Summary

Reporting essential statistics such as mean, median, and standard deviations for each variable:

  • Hours Studied: Mean = 7.1

  • Percent Attendance: Mean = 72.4

  • Grades: Mean = 77.7

These statistics provide a foundational understanding of the data distribution and characteristics.
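
For readers who want to see how such a summary is produced outside SPSS, here is a minimal sketch using Python's standard `statistics` module. The values are hypothetical and are not the lecture dataset, so the output will not match the means reported above.

```python
from statistics import mean, median, stdev

# HYPOTHETICAL weekly study hours; not the lecture dataset.
hours_studied = [10, 10, 8, 6, 5, 3, 2, 7]


def describe(values):
    """Mean, median, and sample standard deviation (n - 1 denominator)."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


summary = describe(hours_studied)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

`stdev` uses the sample (n - 1) formula, which is the standard deviation SPSS reports in its descriptives output.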

Correlation Analysis

Bivariate Correlation Matrix

Displays Pearson's correlations between studied variables:

  • For instance, the correlation coefficient indicates significant relationships between Hours Studied and Grades, essential for establishing predictive validity in the regression model.
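
The sketch below shows how Pearson's r is computed for each pair of variables and arranged into a correlation matrix. The data are hypothetical and the function name is illustrative; SPSS additionally reports a p-value for each coefficient, which this sketch does not compute.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data for illustration only.
variables = {
    "hours":      [10, 10, 8, 6, 5, 3, 2, 7],
    "attendance": [95, 94, 80, 70, 60, 50, 40, 85],
    "grade":      [95, 99, 82, 74, 65, 55, 48, 80],
}

# Bivariate correlation matrix: one r for every pair of variables.
matrix = {a: {b: pearson_r(variables[a], variables[b]) for b in variables}
          for a in variables}

for a, row in matrix.items():
    print(f"{a:>10}", "  ".join(f"{r:6.3f}" for r in row.values()))
```

The matrix is symmetric with 1s on the diagonal, so only the values below (or above) the diagonal need to be reported.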

Importance of Multicollinearity

Multicollinearity in Regression

The IVs should not be too highly correlated with one another (a correlation above 0.80 between two IVs is the usual warning sign). Multicollinearity is checked using the correlation matrix. Variance Inflation Factor (VIF) and Tolerance statistics will be discussed in seminars to support report writing.
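
As a preview of those statistics: Tolerance for an IV is 1 minus the R-squared obtained when that IV is predicted from the other IVs, and VIF is 1 divided by Tolerance. In a model with exactly two IVs, that R-squared is simply the squared correlation between them. The sketch below covers only this two-IV case, and the function name is illustrative.

```python
def tolerance_and_vif(r):
    """Tolerance and VIF for a model with exactly two IVs correlated at r."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    tolerance = 1 - r ** 2   # share of an IV's variance not shared with the other IV
    vif = 1 / tolerance      # how much the slope's sampling variance is inflated
    return tolerance, vif


# Evaluate the 0.80 correlation threshold mentioned in the notes.
tol, vif = tolerance_and_vif(0.80)
print(f"r = 0.80 -> tolerance = {tol:.2f}, VIF = {vif:.2f}")
```

With more than two IVs, Tolerance and VIF can reveal multicollinearity that no single pairwise correlation shows, which is why they complement the correlation matrix.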

Understanding R-squared

Model Explanation Metrics

R-squared is the proportion of variance in the DV that is explained by the regression model. Adjusted R-squared corrects for the number of IVs in the model; because adding an IV can never lower R-squared, the adjusted value gives a fairer estimate of how well the model explains the DV.
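
The two metrics can be sketched with their standard formulas: R-squared = 1 - SS_residual / SS_total, and Adjusted R-squared = 1 - (1 - R-squared)(n - 1) / (n - k - 1), where n is the sample size and k the number of IVs. The function names and the example figures (R-squared of 0.80, n = 20) are hypothetical.

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - my) ** 2 for a in y)                # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for sample size n and number of IVs k."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than IVs plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# HYPOTHETICAL example: R-squared of 0.80 from 20 participants and 2 IVs.
adj = adjusted_r_squared(0.80, n=20, k=2)
print(f"adjusted R-squared = {adj:.3f}")
```

The adjustment matters most with small samples and many IVs; with a large n the two values are nearly identical.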

Reliability Assessment

Using Cronbach's Alpha

Cronbach's Alpha assesses the internal consistency (reliability) of multi-item scales. The recommended benchmark is ≥0.7; check that each scale meets it before including the scale in the analysis. Note that alpha measures reliability only and does not establish validity.
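
For reference, alpha for a scale with k items is k / (k - 1) multiplied by (1 - sum of item variances / variance of total scores). The sketch below applies that formula to a hypothetical three-item scale answered by five respondents; the data and the function name are illustrative, and in the module itself the value comes from SPSS.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k lists, one per scale item,
    each holding that item's scores across the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's scale total
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# HYPOTHETICAL responses (1-5 scale) from five respondents to three items.
scale_items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]

alpha = cronbach_alpha(scale_items)
print(f"Cronbach's alpha = {alpha:.2f}",
      "(meets 0.7 benchmark)" if alpha >= 0.7 else "(below 0.7 benchmark)")
```

Items that are reverse-worded must be reverse-scored before alpha is computed, otherwise they lower the value artificially.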

Research Report Structure

Crafting the Research Question

Formulate the research question clearly, outlining the IVs and DV to guide the investigation.

Focus Areas for Operationalization

Each construct, such as mindfulness, sense of humor, or growth mindset, must be well defined and matched to the scale or questionnaire used to measure it.

Research Report Timeline

Steps to Follow

  • Submit Ethics Form.

  • Obtain ethics approval prior to data collection.

  • Collect data following ethical guidelines.

  • Analyze the data using SPSS, ensuring meticulous checks for errors.

  • Write and structure the Research Report according to prescribed guidelines.

  • Submit the report by the deadline (29 NOV at 15:00).

Important Notes

Ethics Compliance

Non-compliance with ethics approval processes results in a mark of “0” for Assessment 1. Adherence to submission timelines and comprehensive documentation is vital for successful evaluation.
