Multiple regression is a statistical method used to study the relationship between one dependent variable (DV) and two or more independent or predictor variables (IVs).
Dependent variable must be continuous.
Independent variables can be continuous or categorical.
If all IVs are categorical, ANOVA is preferred, which will be covered in Week 8.
Simple regression examines the relationship between two variables only (one IV and one DV).
Real-world scenarios often involve multiple factors influencing outcomes.
Example: To understand blood pressure (BP), consider stress, genetic factors, and other variables; hence, multiple IVs provide a more realistic model.
Predicting income based on gender and level of education.
IVs: Gender, Years of Education, DV: Income
Predicting final grades using study time and pre-test scores.
IVs: Study Time, Pre-Test Scores, DV: Final Grades
Examining the effects of exercise and dietary salt on blood pressure.
IVs: Exercise, Dietary Salt, DV: BP
Showcases distribution of hours studied and assignment completion rate over time.
Key figures: 100 assignments at 1 hour studied, scaling down to lower performance for fewer hours studied.
Points on the scatter plot (e.g., (1,2), (2,1)...(5,4)) reflect actual data.
The regression points (1, 1.4), (2, 2.1)...(5, 4.2) represent predicted values.
Red line segments indicate the residuals: the vertical distances between the actual data points and the values predicted by the regression line.
Definition: Residuals reflect unexplained variance in the dependent variable.
Null Hypothesis (H0): No effect of IVs on DV.
Alternative Hypothesis (H1): There is an effect of IVs on DV.
Rejecting the null indicates evidence of a significant relationship between IVs and DV.
Each regression coefficient p-value is evaluated.
A p-value < 0.05 leads to the rejection of the null hypothesis, indicating significant predictors.
Objective: Predict final grades based on two IVs.
Hours Spent Studying per Week (X1): Total study hours.
Class Attendance (X2): Percentage of attended lectures.
Dependent Variable (Y): Final Grade out of 100.
Data table listing, e.g.:
Participant 1: Hours Studied = 10, Attendance = 95%, Grade = 95
Participant 2: Hours Studied = 10, Attendance = 94%, Grade = 99
Various other participant statistics detailed in table format.
Displaying relationship between hours studied and grades, further illustrating dataset.
Importance of identifying outliers in data analysis.
Outliers can be identified using scatterplots and may need removal to refine analysis.
Create scatterplots of every IV against the DV.
Check for linear patterns:
Hours Studied (X1) vs. Final Grade (Y) and
Lecture Attendance (X2) vs. Final Grade (Y).
Reiteration of scatterplot analysis for Hours Studied, ensuring clear visibility of linear trends.
Additional scatterplot checks for Lecture Attendance providing insights on relationships as per linearity assumption.
Reporting means, medians, and standard deviations of studied variables:
Hours Studied: Mean = 7.1
Percent Attendance: Mean = 72.4
Grades: Mean = 77.7
Displays correlations between the variables studied:
E.g., Pearson's r for Hours Studied and Grades, indicating significant relationships.
Multicollinearity: IVs must not be too highly correlated with each other (r > 0.80); checked using the correlation matrix.
VIF and Tolerance Statistics will be discussed in seminars to refine understanding for reports.
R-squared indicates how much variance in the DV is explained by the model.
Adjusted R-squared corrects for the number of IVs, giving a more conservative estimate of explained variance than R-squared.
Cronbach’s Alpha assesses internal consistency/reliability of multi-item scales.
Recommended benchmark: alpha ≥ 0.7, checked before the scale is included in the analysis.
Formulate your question clearly, outlining predictor and dependent variables.
Example constructs: Mindfulness, Sense of Humor, Growth Mindset, etc., each with the corresponding scale or questionnaire to be used.
Submit Ethics Form.
Receive approval on Ethics Form.
Collect Data post-approval.
Analyze Data in SPSS, ensuring thorough checks for errors.
Write the Research Report following guidelines.
Submit by deadline (29 NOV at 15:00).
Failure to receive approval on the ethics form leads to a mark of “0” on Assessment 1.
Ensure timely and complete submissions for successful evaluation.
Multiple regression is a statistical method that analyzes the relationship between one dependent variable (DV) and two or more independent or predictor variables (IVs). It aims to predict the DV based on variations in the IVs.
The dependent variable must be continuous so that variation in it can be modeled and predicted.
Independent variables can be continuous, categorical, or a mix of both, offering flexibility in modeling real-world data scenarios.
When all IVs are categorical, Analysis of Variance (ANOVA) is preferred; it will be covered in Week 8. Selecting the statistical method that matches the variable types matters.
Simple regression is limited to examining the relationship between two variables: one IV and one DV. Most real-world situations involve multiple factors influencing outcomes. For instance, to understand how blood pressure (BP) is affected, it is essential to consider various IVs such as stress, genetic predispositions, physical activity, and dietary choices. By including multiple IVs, researchers can build a more comprehensive and realistic model of the dependent variable, leading to more robust conclusions.
Predicting Income: Analyze how gender and education level affect income.
IVs: Gender, Years of Education;
DV: Income.
Final Grades Prediction: Understanding how study time and pre-test scores contribute to final grades.
IVs: Study Time, Pre-Test Scores;
DV: Final Grades.
Health and Lifestyle Impact: Examining the influence of exercise frequency and dietary salt intake on blood pressure levels.
IVs: Exercise, Dietary Salt;
DV: Blood Pressure.
Illustrates the distribution of hours studied against assignment completion rates over a specified timeline.
Key figures show a total of 100 assignments for 1 hour studied, with performance declining as study time decreases. This visual aids in recognizing the importance of consistent study habits on academic success.
Residuals are the differences between the actual data points and the values the regression line predicts. For the actual points on the scatter plot (e.g., (1,2), (2,1), ..., (5,4)), the corresponding regression points (e.g., (1, 1.4), (2, 2.1), ..., (5, 4.2)) are the predicted values based on the chosen IVs.
Red line segments in the scatter plot mark the residuals, showing the size of each prediction error.
Understanding residuals is crucial because they reflect unexplained variance in the dependent variable and so guide model refinement and evaluation.
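As a minimal sketch of the residual calculation, the snippet below uses only the three actual/predicted pairs that the notes state explicitly. The intermediate points are elided in the notes and are not filled in here; the variable names are illustrative.

```python
# Actual and predicted values for the points given explicitly in the notes.
# The intermediate points are elided in the notes, so they are left out.
actual = {1: 2.0, 2: 1.0, 5: 4.0}
predicted = {1: 1.4, 2: 2.1, 5: 4.2}

# Residual = actual value minus predicted value (rounded to avoid float noise).
residuals = {x: round(actual[x] - predicted[x], 2) for x in actual}

for x, e in residuals.items():
    print(f"x = {x}: residual = {e:+.2f}")
```

A positive residual means the actual point lies above the regression line; a negative residual means it lies below.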
Null Hypothesis (H0): The IVs have no effect on the DV; any relationship observed in the sample is due to chance.
Alternative Hypothesis (H1): The IVs have an effect on the DV; the observed relationship reflects a true association.
Rejecting the null hypothesis provides evidence of a statistically significant relationship between the IVs and the DV.
Each regression coefficient's p-value is assessed to determine significance. A p-value below 0.05 conventionally leads to rejecting the null hypothesis for that coefficient, identifying it as a significant predictor of the DV.
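For a model with two predictors X1 and X2, the hypotheses can be written formally as follows. This is a sketch in standard regression notation; the symbols for the coefficients and the error term do not appear in the notes and are introduced here as assumptions.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon

% Overall model test
H_0 : \beta_1 = \beta_2 = 0
\qquad
H_1 : \beta_j \neq 0 \ \text{for at least one } j \in \{1, 2\}

% Test for an individual coefficient
H_0 : \beta_j = 0
\qquad
H_1 : \beta_j \neq 0
```

Each coefficient's p-value refers to the individual test: if p < 0.05 for a given coefficient, that IV is treated as a significant predictor.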
Objective: To predict final grades based on contributions from two IVs.
Hours Spent Studying per Week (X1): Total hours allocated for studying.
Class Attendance (X2): Percentage of lectures attended.
Dependent Variable (Y): Final Grade (on a scale of 0-100).
The data table lists each participant's values, for example:
Participant 1: Hours Studied = 10, Attendance = 95%, Grade = 95
Participant 2: Hours Studied = 10, Attendance = 94%, Grade = 99
Additional participants’ data summarized in the table format to allow for easy comparative analysis.
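The notes run the analysis in SPSS. Purely to illustrate what fitting a two-predictor model involves, here is a minimal least-squares sketch in plain Python. The data are hypothetical and noise-free (built from grade = 30 + 2*hours + 0.5*attendance), not the lecture dataset, and the function name is an assumption.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    # Design-matrix rows: an intercept column of 1s, then the two IVs.
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Normal equations (X'X) b = X'y, stored as an augmented 3x4 matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(3):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


# Hypothetical data, constructed so the true coefficients are known.
hours = [2, 4, 6, 8, 10]             # X1: hours studied per week
attendance = [60, 90, 70, 100, 80]   # X2: percent of lectures attended
grades = [64, 83, 77, 96, 90]        # Y: 30 + 2*hours + 0.5*attendance

b0, b1, b2 = fit_two_predictor_ols(hours, attendance, grades)
print(f"grade = {b0:.2f} + {b1:.2f}*hours + {b2:.2f}*attendance")

# Predicted grade for a student with 10 hours and 95% attendance.
predicted_grade = b0 + b1 * 10 + b2 * 95
print(f"predicted grade: {predicted_grade:.1f}")
```

Because the hypothetical grades contain no noise, the fit recovers the coefficients used to build them. Real data would leave residuals and would need the significance tests described above.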
Plotting hours studied against grades illustrates the dataset further and helps identify patterns and trends.
Identifying outliers is an important step in data analysis. Outliers can disproportionately influence regression results, so it is crucial to check for them. They can be detected using scatterplots and may need to be removed or handled separately to keep the model output accurate.
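The notes identify outliers visually from scatterplots. As a numeric complement, and not a method the notes prescribe, one common screen flags values whose z-score exceeds a cutoff. The sketch below assumes a cutoff of 3 and uses hypothetical grades; the function name is illustrative.

```python
import statistics


def flag_outliers(values, cutoff=3.0):
    """Return (index, value) pairs whose absolute z-score exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [
        (i, v) for i, v in enumerate(values)
        if abs((v - mean) / sd) > cutoff
    ]


# Hypothetical final grades; the last value is deliberately extreme.
grades = [70, 72, 68, 75, 71, 69, 73, 74, 70, 72, 71, 15]
outliers = flag_outliers(grades)
print(outliers)
```

A flagged value is a prompt to inspect the case (data-entry error, unusual participant), not an automatic reason to delete it. In very small samples a z-score can never reach 3, so the cutoff has to be chosen with the sample size in mind.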
Create scatterplots for each IV against the DV to assess for linearity. Key relationships to evaluate include:
Hours Studied (X1) vs. Final Grade (Y)
Lecture Attendance (X2) vs. Final Grade (Y)
A linear relationship is pivotal for validating the use of multiple regression as the analysis method.
Report essential statistics such as the mean, median, and standard deviation for each variable:
Hours Studied: Mean = 7.1
Percent Attendance: Mean = 72.4
Grades: Mean = 77.7
These statistics provide a foundational understanding of the data distribution and characteristics.
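Descriptive statistics like those listed above can be computed as in the following sketch. The values are hypothetical and chosen for easy arithmetic; they are not the lecture dataset and will not reproduce the reported means.

```python
import statistics

# Hypothetical hours-studied values for illustration only.
hours_studied = [4, 6, 7, 8, 10]

summary = {
    "mean": statistics.mean(hours_studied),
    "median": statistics.median(hours_studied),
    "sd": statistics.stdev(hours_studied),  # sample SD (n - 1 denominator)
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```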
The correlation matrix displays Pearson's correlations between the studied variables:
For instance, Pearson's r indicates a significant relationship between Hours Studied and Grades, which supports using Hours Studied as a predictor in the regression model.
The IVs must not be too highly correlated with each other (a typical threshold is r > 0.80). Multicollinearity is checked using the correlation matrix. Additionally, Variance Inflation Factor (VIF) and Tolerance statistics will be discussed in seminars to support report writing.
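The multicollinearity check can be sketched for the two-IV case as follows. The data are hypothetical, and the function and variable names are assumptions. With exactly two IVs, Tolerance equals 1 minus the squared correlation between them and VIF is its reciprocal; with more IVs, Tolerance instead uses the R-squared from regressing each IV on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for the two IVs.
hours = [2, 4, 6, 8, 10]
attendance = [60, 90, 70, 100, 80]

r = pearson_r(hours, attendance)
tolerance = 1 - r ** 2   # two-IV case only
vif = 1 / tolerance
too_correlated = abs(r) > 0.80   # threshold from the notes

print(f"r = {r:.2f}, tolerance = {tolerance:.2f}, VIF = {vif:.2f}")
print("multicollinearity concern:", too_correlated)
```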
R-squared reveals the proportion of variance in the DV that is explained by the regression model, providing insight into model effectiveness. The Adjusted R-squared accounts for the number of IVs in the model, offering a more accurate reflection of predictor efficacy.
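Both statistics follow directly from the actual and predicted values. The sketch below uses the standard formulas with hypothetical grades and predictions; n is the number of participants and k the number of IVs, and all names are illustrative.

```python
def r_squared(actual, predicted):
    """Proportion of variance in the DV explained: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for the number of IVs (k) given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical actual grades and model predictions.
actual = [60, 70, 80, 90, 100]
predicted = [65, 68, 80, 92, 95]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(actual), k=2)
print(f"R-squared = {r2:.3f}, adjusted R-squared = {adj_r2:.3f}")
```

The adjusted value is lower than R-squared here because two IVs are being estimated from only five observations; the gap shrinks as the sample grows relative to the number of IVs.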
Cronbach's Alpha assesses the internal consistency (reliability) of multi-item scales. The recommended benchmark is ≥0.7, indicating that the items are reliable enough for the scale to be included in the analysis.
Formulate the research question clearly, outlining the IVs and DV to guide the investigation. Each construct must be operationalized with a defined measure: for example, mindfulness, sense of humor, and growth mindset, each with its respective scale or questionnaire.
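As an illustration of what the reliability check computes, here is a minimal sketch of the standard Cronbach's Alpha formula: k/(k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The three-item scale and its responses are hypothetical, and the function name is an assumption.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses from five participants to a three-item scale.
item_scores = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 5],
    [1, 3, 3, 5, 5],
]

alpha = cronbach_alpha(item_scores)
print(f"alpha = {alpha:.3f}, meets 0.7 benchmark: {alpha >= 0.7}")
```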
Submit Ethics Form.
Obtain ethics approval prior to data collection.
Collect data following ethical guidelines.
Analyze the data using SPSS, ensuring meticulous checks for errors.
Write and structure the Research Report according to prescribed guidelines.
Submit the report by the deadline (29 NOV at 15:00).
Failure to obtain approval on the ethics form results in a mark of “0” for Assessment 1. Submitting on time and with complete documentation is vital for successful evaluation.