Recording-2025-03-10T00:02:53.056Z

Overview of Simple Linear Regression

  • Focus on understanding the relationship between two numerical variables.

  • Dependent Variable (DV): The variable being predicted or explained.

  • Independent Variable (IV): The variable used to predict the DV.

  • Simple linear regression (SLR) is based on concepts from Pearson's correlation.

Similarities and Differences Between Correlation and Regression

  • Correlation measures the strength and direction of a relationship without designating independent and dependent variables.

  • Simple linear regression aims to predict the DV from the IV, so one variable is designated the predictor and the other the outcome.

Scatter Plots and the Line of Best Fit

  • Utilizes scatter plots to visualize relationships between variables.

  • The line of best fit is a foundational concept, represented by the equation y = mx + b, where:

    • y = dependent variable

    • m = slope (change in y for a one-unit change in x)

    • x = independent variable

    • b = y-intercept (value of y when x = 0)
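
The slope and intercept of the line of best fit can be computed directly from the data with the ordinary least-squares formulas. The sketch below is a minimal illustration, not the lecture's own procedure; the function name fit_line and the quiz/grade numbers are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Return (slope m, intercept b) of the least-squares line y = m*x + b."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    b = y_bar - m * x_bar
    return m, b

# Hypothetical data: quizzes completed (x) and final grade (y).
quizzes = [0, 1, 2, 3, 4]
grades = [52, 55, 63, 66, 74]

m, b = fit_line(quizzes, grades)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
# prints: slope m = 5.50, intercept b = 51.00
```

For these made-up numbers, each additional quiz is associated with a 5.5-point higher predicted grade, and the predicted grade with zero quizzes is 51.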

Research Design in Simple Linear Regression

  • Regression enables the distinction between IVs and DVs in prediction.

  • A common goal in psychology is predicting human behavior, aligning well with regression analysis goals.

  • Predictors (independent variables) are used for forecasting changes in the dependent variable.

  • Distinction between correlation and regression is crucial, especially when considering causal relationships.

Causation vs. Prediction

  • Regression can suggest relationships, but does not imply causation.

  • Confounding factors may influence the DV, so inferring causation requires careful design.

  • Causal inferences require controlled experimental designs alongside regression analysis.

Research Design Types

  • Simple linear regression is most often applied to observational data, from either:

    • Cross-sectional studies (data collected at one point in time).

    • Longitudinal studies (data collected over time).

Purpose and Functionality of the Regression Model

  • The primary goal is to explain variability in the DV (y or outcome variable).

  • Variability in the DV can be divided into:

    • Explained Variability: Variability due to the IV.

    • Unexplained Variability (Residuals/Error): Random factors or measurement errors not captured by the model.

  • The regression model quantifies this relationship by partitioning total variability into explained variability and residual (unexplained) variability.

Understanding Variability Through Sum of Squares

  • Simple linear regression partitions variability using three sums of squares:

    • Total Sum of Squares (SST): Total variability in the observations.

    • Regression Sum of Squares (SSR): Variability explained by the model.

    • Residual Sum of Squares (SSE): Variability not explained by the model.

  • Relationship: SST = SSR + SSE.
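
The decomposition can be checked numerically. The sketch below fits a least-squares line to a small hypothetical data set and computes the three sums of squares; the function name sums_of_squares and the data are assumptions made for illustration.

```python
from statistics import mean

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line fitted to x and y."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]

    # Total: observed scores around the mean of y.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # Explained: predicted scores around the mean of y.
    ssr = sum((p - y_bar) ** 2 for p in predicted)
    # Unexplained: observed scores around their predicted scores.
    sse = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    return sst, ssr, sse

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(hours, scores)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```

For these numbers SST is 6.00, split into SSR = 3.60 and SSE = 2.40. The identity holds for a least-squares line that includes an intercept.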

Statistical Interpretation of the Regression Model

  • Regression Line: Represents the best predictive relationship.

  • Intercept (a): Starting point of the regression line; the expected score on y when x = 0. This is the term written as b in y = mx + b.

  • Slope (b): The rate of change in y for a one-unit change in x. This is the term written as m in y = mx + b.

    • Example: If b = 5, then for every additional quiz completed, the predicted grade increases by 5 points.

Assumptions of Regression Analysis

  • Independence of Observations: No participant influences another.

  • Normality of Residuals: Residuals should be approximately normally distributed. Checked using:

    • Shapiro-Wilk Test

    • Visualizations such as histograms and P-P plots.

  • Homoscedasticity: Equal spread of residuals across all levels of IV.

  • Linearity: A linear relationship exists between IV and DV. Verified through residual plots.
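
Most of these checks operate on the residuals (observed minus predicted scores). The sketch below shows how residuals and fitted values are obtained, followed by a deliberately crude comparison of residual spread in the lower and upper halves of the IV. It is an illustration under assumed data, not a substitute for the Shapiro-Wilk test or residual plots; the function name and the numbers are hypothetical.

```python
from statistics import mean, pstdev

def residuals_and_fitted(x, y):
    """Fit a least-squares line and return (residuals, fitted values)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return resid, fitted

# Hypothetical data, already sorted by x (the split below relies on that).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 4, 7, 8, 8, 11, 10]

resid, fitted = residuals_and_fitted(x, y)

# Pairs of (fitted, residual) are what a residual plot displays.
for f, r in zip(fitted, resid):
    print(f"fitted = {f:6.2f}   residual = {r:6.2f}")

# Rough homoscedasticity check: is the residual spread similar
# at low and high values of the IV?
half = len(x) // 2
lower_spread = pstdev(resid[:half])
upper_spread = pstdev(resid[half:])
print(f"residual SD, lower half of x: {lower_spread:.2f}")
print(f"residual SD, upper half of x: {upper_spread:.2f}")
```

A residual plot with no curve and a roughly even band around zero is consistent with linearity and homoscedasticity; a funnel shape or a clear bend suggests a violation.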

Running and Interpreting Regression Analysis in Stata

  • Syntax used: regress DV IV (the DV is listed first, followed by the IV).

  • Test statistics and significance are reported both for the model as a whole and for the individual predictor.

  • Model output includes:

    • F value: Tests overall model significance.

    • p-value: Indicates statistical significance (by convention, p < 0.05 is treated as significant).

    • R-Squared: Proportion of variance in the DV explained by the model (0 to 1 scale).

      • Interpretation of size: Small (0-12%), Medium (13-25%), Large (26%+).
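
Both R-squared and the F value follow from the sums of squares. The sketch below assumes a model with one predictor (1 model degree of freedom, n - 2 residual degrees of freedom) and labels R-squared with the size bands given above. The input values and function names are hypothetical, and the p-value is omitted because it requires the F distribution, which Stata reports directly.

```python
def model_summary(sst, ssr, sse, n):
    """Return (R-squared, F) for a simple linear regression with n observations."""
    r_squared = ssr / sst                   # proportion of variance explained
    f_value = (ssr / 1) / (sse / (n - 2))   # explained vs. unexplained mean squares
    return r_squared, f_value

def size_label(r_squared):
    """Label R-squared using the lecture's bands (assumed cut-offs at 13% and 26%)."""
    pct = r_squared * 100
    if pct >= 26:
        return "large"
    if pct >= 13:
        return "medium"
    return "small"

# Hypothetical sums of squares from a sample of 5 observations.
r2, f = model_summary(sst=6.0, ssr=3.6, sse=2.4, n=5)
print(f"R-squared = {r2:.2f} ({size_label(r2)}), F(1, 3) = {f:.2f}")
# prints: R-squared = 0.60 (large), F(1, 3) = 4.50
```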

Data Interpretation for Reporting Results

  • Standard summary: State how the predictor relates to changes in the DV and which findings are significant.

  • Example write-up format:

    • X significantly predicts Y

    • Statistical values included: F, p-value, R-squared, confidence intervals for beta coefficients.

Standardized vs. Unstandardized Regression Coefficients

  • Unstandardized coefficients express the change in the DV, in its original units, for a one-unit change in the IV.

  • Standardized coefficients are expressed in standard deviation units, which allows comparison across different measures or scales.

  • Stata syntax for standardized coefficients: regress DV IV, beta
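
The difference between the two kinds of coefficient shows up when the IV is re-expressed in another unit. The sketch below, using hypothetical study-time data, converts the unstandardized slope to a standardized one by multiplying by SD(x) / SD(y); the function name slope_and_beta is an assumption for illustration.

```python
from statistics import mean, stdev

def slope_and_beta(x, y):
    """Return (unstandardized slope b, standardized slope beta) for y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                    # change in y per one unit of x
    beta = b * stdev(x) / stdev(y)   # change in y (in SDs) per one SD of x
    return b, beta

# Hypothetical data: study time and test score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
minutes = [h * 60 for h in hours]    # same study time, different unit

b_hours, beta_hours = slope_and_beta(hours, scores)
b_minutes, beta_minutes = slope_and_beta(minutes, scores)

print(f"per hour:   b = {b_hours:.3f}, beta = {beta_hours:.3f}")
print(f"per minute: b = {b_minutes:.3f}, beta = {beta_minutes:.3f}")
```

The unstandardized slope changes with the unit (0.600 per hour, 0.010 per minute) while the standardized slope stays at about 0.775. With a single predictor, the standardized slope equals Pearson's r.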

Using the Regression Equation for Predictions

  • The regression equation provides predictions for different values of the IV.

  • Predicted DV values are calculated by substituting different IV values into the regression formula.
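
Substituting IV values into the equation can be sketched as follows. The slope of 5 points per quiz comes from the example earlier in these notes; the intercept of 50 is a hypothetical value chosen only to make the arithmetic concrete.

```python
def predict_grade(quizzes, intercept=50.0, slope=5.0):
    """Predicted grade = intercept + slope * quizzes completed.

    intercept=50.0 is an assumed value; slope=5.0 follows the notes' example.
    """
    return intercept + slope * quizzes

for q in (0, 2, 4):
    print(f"{q} quizzes -> predicted grade {predict_grade(q):.1f}")
# prints:
# 0 quizzes -> predicted grade 50.0
# 2 quizzes -> predicted grade 60.0
# 4 quizzes -> predicted grade 70.0
```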

Conclusion

  • The lecture covered key components of simple linear regression, interpretation of outputs, and application for predicting outcomes based on one predictor variable.
