Focus on understanding the relationship between two numerical variables.
Dependent Variable (DV): The variable being predicted or explained.
Independent Variable (IV): The variable used to predict the DV.
Simple linear regression (SLR) is based on concepts from Pearson's correlation.
Correlation measures only the strength and direction of a relationship, without designating independent and dependent variables.
Simple Linear Regression aims to predict the DV based on the IV, indicating one variable's role in influencing another.
Utilizes scatter plots to visualize relationships between variables (see the Stata sketch after the definitions below).
The line of best fit is a foundational concept, represented by the equation y = mx + b, where:
y = dependent variable
m = slope (change in y for a one-unit change in x)
x = independent variable
b = y-intercept (value of y when x = 0)
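A minimal sketch of the scatter-plot-plus-best-fit-line idea in Stata, using the built-in auto dataset as a stand-in for real study variables:
    sysuse auto, clear                            // load Stata's example dataset
    twoway (scatter price mpg) (lfit price mpg)   // scatter plot with the line of best fit overlaid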
Regression enables the distinction between IVs and DVs in prediction.
A common goal in psychology is predicting human behavior, aligning well with regression analysis goals.
Predictors (independent variables) are used for forecasting changes in the dependent variable.
Distinction between correlation and regression is crucial, especially when considering causal relationships.
Regression can suggest relationships, but does not imply causation.
Confounding factors may influence the DV, so careful design is needed before inferring causation.
Causal inferences require controlled experimental designs alongside regression analysis.
Simple linear regression analysis is primarily observational, using either:
Cross-sectional studies (data collected at one point in time).
Longitudinal studies (data collected over time).
The primary goal is to explain variability in the DV (y or outcome variable).
Variability in the DV can be divided into:
Explained Variability: Variability due to the IV.
Unexplained Variability (Residuals/Error): Random factors or measurement errors not captured by the model.
The regression equation quantifies this relationship, calculating total variability, explained variability, and residuals.
Simple linear regression quantifies total variability using three components:
Total Sum of Squares (SST): Total variability in the observations.
Regression Sum of Squares (SSR): Variability explained by the model.
Residual Sum of Squares (SSE): Variability not explained by the model.
Relationship: SST = SSR + SSE.
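This identity can be checked directly in Stata after fitting a model (a sketch using the built-in auto dataset; the variables are placeholders for real study data):
    sysuse auto, clear
    quietly regress price mpg                      // fit the model silently
    display "SSR (explained) = " e(mss)            // model sum of squares
    display "SSE (residual)  = " e(rss)            // residual sum of squares
    display "SST (total)     = " e(mss) + e(rss)   // equals the Total SS in the ANOVA table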
Regression Line: Represents the best predictive relationship; note the shift from y = mx + b to the statistical notation ŷ = a + bx.
Intercept (a): Starting point of the regression line; the expected score on y when x = 0.
Slope (b): The rate of change in y for a one-unit change in x.
Example: If b = 5, then for every additional quiz completed, the predicted grade increases by 5 points.
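Continuing that example with a hypothetical intercept: if a = 60 and b = 5, a student who completes x = 4 quizzes has a predicted grade of ŷ = 60 + 5(4) = 80.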
Independence of Observations: No participant influences another.
Normality of Residuals: Residuals must be normally distributed. Tested using:
Shapiro-Wilk Test
Visualizations such as histograms and P-P plots.
Homoscedasticity: Equal spread of residuals across all levels of the IV.
Linearity: A linear relationship exists between the IV and DV. Verified through residual plots (diagnostic checks for all four assumptions are sketched below).
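A minimal sketch of these diagnostic checks in Stata (auto dataset and variable names as placeholders; estat hettest is one common Breusch-Pagan check for homoscedasticity among several options):
    sysuse auto, clear
    quietly regress price mpg
    predict res, residuals    // store the residuals
    swilk res                 // Shapiro-Wilk test of residual normality
    histogram res, normal     // histogram with a normal curve overlaid
    pnorm res                 // standardized normal probability (P-P) plot
    rvfplot                   // residuals vs. fitted values: checks linearity and equal spread
    estat hettest             // Breusch-Pagan test for heteroskedasticity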
Syntax used: regress DV IV
Model as a Whole and Individual Predictors: Test statistics and significance are calculated.
Model output includes:
F value: Tests overall model significance.
p-value: Determines statistical significance (p < 0.05 is the conventional criterion).
R-Squared: Proportion of variance in the DV explained by the model (0 to 1 scale).
Interpretation of size: Small (0-12%), Medium (13-25%), Large (26%+).
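A sketch of where these values appear in Stata after estimation (auto dataset as placeholder data):
    sysuse auto, clear
    regress price mpg    // output header reports F, Prob > F, and R-squared
    display e(F)         // overall model F statistic, stored after regress
    display e(r2)        // R-squared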
Standard summary: report how the predictor relates to changes in the DV and which findings are significant.
Example write-up format:
X significantly predicts Y
Statistical values included: F, p-value, R-squared, confidence intervals for beta coefficients.
Unstandardized coefficients express the effect in the original units of measurement of the variables.
Standardized coefficients (in standard-deviation units) allow comparison across different measures or scales.
Stata syntax for standardized coefficients: regress DV IV, beta
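For example (a sketch; auto dataset as placeholder data):
    sysuse auto, clear
    regress price mpg, beta    // adds a Beta column: the coefficient in SD units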
The regression equation provides predictions for different values of the IV.
Example calculations using the regression formula to predict DV values based on various IV inputs.
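Such calculations can be scripted in Stata using the stored coefficients (a sketch; the auto dataset and the input value 20 are placeholders):
    sysuse auto, clear
    quietly regress price mpg
    display _b[_cons] + _b[mpg]*20    // predicted price for a hypothetical car with mpg = 20
    predict yhat, xb                  // fitted values for every observation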
The lecture covered key components of simple linear regression, interpretation of outputs, and application for predicting outcomes based on one predictor variable.