Regression is a statistical method used to model relationships between variables.
Bivariate regression focuses on the effect of one independent variable on one dependent variable.
Multiple regression extends this to study the effect of multiple independent variables on one dependent variable.
Dependent Variable (Y): The outcome variable we want to explain or predict.
Independent Variable (X): The variable used to predict or explain changes in the dependent variable.
Variability in Y is critical. We're interested in why some individuals may score high while others may score low on the dependent variable.
Examples include academic performance, mental health measures, etc.
The aim is to explain the variance of the dependent variable (Y).
We explore questions such as:
Why do some people excel in standardized tests while others do not?
Statistical modeling helps explain this variability in terms of changes in predictor variables.
Outcome Variable: College GPA
Predictor Variable: High School GPA
We can model college success using high school performance as a predictor.
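A minimal sketch of this prediction, assuming made-up intercept and slope values (the function name and the A = 0.5, B = 0.8 parameters are hypothetical, not estimates from real data):

```python
# Hypothetical illustration: predicting college GPA from high school GPA
# with the linear model Y = A + B * X. The parameter values are made up.
def predict_college_gpa(hs_gpa, a=0.5, b=0.8):
    """Predicted college GPA (Y-hat) for a given high school GPA (X)."""
    return a + b * hs_gpa

print(predict_college_gpa(3.5))  # → 3.3 (predicted college GPA for a 3.5 HS GPA)
```

In practice A and B would be estimated from observed data rather than chosen by hand.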
From simple regression (one predictor) to multiple regression (multiple predictors), the fundamental principles remain similar.
Scatter plots illustrate the variability of Y based on changes in X.
A simple regression model defines a linear relationship.
Example equation: Y = A + B * X
A: Y-intercept (value of Y when X=0)
B: Slope (indicates how much Y changes as X increases)
We need to define what a good model looks like and how it explains the data.
Two approaches to regression:
Best Predictor Approach: Identify which independent variables significantly explain the dependent variable.
Causal Modeling Approach: A broader exploration of how a system of variables predicts an outcome, incorporating more complex (e.g., non-linear) relationships.
A good model is one that explains as much of the variability in Y as possible.
Simple Linear Model: Y = A + B * X
Parameters:
A (Intercept): Changes the position of the line up or down on the Y-axis.
B (Slope): Reflects how steep the line is, indicating the strength and direction of the relationship.
Y-hat (Ŷ): Represents predicted values of Y based on our model.
The slope (B) indicates how much Y changes for each one-unit increase in X.
Adjusting B changes the steepness of the line:
Positive B: Y increases as X increases.
Negative B: Y decreases as X increases.
B = 0: a flat line, indicating no linear relationship.
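The effect of the slope's sign can be sketched directly (the function name and parameter values here are illustrative only):

```python
# Sketch: how the sign of the slope B shapes predictions from Y = A + B * X.
def predict(x, a, b):
    return a + b * x

xs = [0, 1, 2, 3]
print([predict(x, a=1.0, b=2.0) for x in xs])   # positive B: Y rises with X
print([predict(x, a=1.0, b=-2.0) for x in xs])  # negative B: Y falls with X
print([predict(x, a=1.0, b=0.0) for x in xs])   # B = 0: flat line, no relationship
```

The intercept A = 1.0 stays fixed in all three lines; only the steepness and direction change with B.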
Observed Values: Actual measurements collected in the study.
Predicted Values (Ŷ): These values are derived from the regression model based on chosen parameters.
We will evaluate how well the predicted values match the observed values to assess the model's effectiveness.
The process of determining A and B for the best predictive equation is crucial.
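The standard choice is ordinary least squares: pick the A and B that minimize the sum of squared residuals. A self-contained sketch using the closed-form formulas (the data here are made up to lie exactly on Y = 2 + 3X, so the fit recovers those values):

```python
# Sketch: ordinary least squares estimates for the simple linear model
# Y = A + B * X, using the closed-form formulas:
#   B = covariance(X, Y) / variance(X),  A = mean(Y) - B * mean(X)
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [5, 8, 11, 14]          # exactly Y = 2 + 3 * X
a, b = fit_line(xs, ys)
print(a, b)  # → 2.0 3.0
```

With real (noisy) data the fitted line will not pass through every point; it is simply the line with the smallest sum of squared residuals.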