Definition: Statistical technique to find the best-fitting straight line for a dataset.
Purpose: Allows for predictions based on correlations between variables.
Linear Relationship: Expressed mathematically as Y = bX + a.
Correlation: Analysis of the relationship between two variables.
Regression: Uses one variable to make predictions about another.
X and Y Variables: Each individual has an x-coordinate (X) and a corresponding y-coordinate (Y).
X: Any score on X.
Y: Corresponding score on Y based on the value of X.
a (Y-intercept): Value of Y when X = 0.
b (slope): The change in Y for each 1-point increase in X.
Weekly Membership: $5.00 fee + $2.00 per movie streamed.
Equation: Y = 2X + 5.
Scenarios:
X = 0: Y = 2(0) + 5 = 5
X = 3: Y = 2(3) + 5 = 11
X = 8: Y = 2(8) + 5 = 21
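The membership example fits Y = bX + a with b = 2 (price per movie, the slope) and a = 5 (weekly fee, the intercept). Below is a minimal Python sketch that reproduces the three scenarios. The function name `weekly_cost` and its parameter names are illustrative only.

```python
def weekly_cost(movies: int, fee: float = 5.00, per_movie: float = 2.00) -> float:
    """Y = bX + a, where b = per_movie (slope) and a = fee (Y-intercept)."""
    return per_movie * movies + fee

for x in (0, 3, 8):
    print(f"X = {x}: Y = {weekly_cost(x):.2f}")
```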
Definition: Line that best fits the data, assuming a linear relationship.
Best Fit Line: Minimizes the total squared vertical distance (Y − Ŷ) between the data points and the line (the least-squares criterion).
Key Functionality:
Identifies the central tendency of the relationship.
Provides best prediction of Y values.
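A least-squares line can be computed from deviation scores. The slope is b = SP / SS_X, where SP is the sum of products of X and Y deviations and SS_X is the sum of squared X deviations. The intercept is a = M_Y − b·M_X. The following is a minimal sketch in plain Python. The data and the function name `least_squares_line` are hypothetical.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (b, a) for the line Ŷ = bX + a that minimizes Σ(Y - Ŷ)²."""
    mx, my = fmean(xs), fmean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products of deviations
    ss_x = sum((x - mx) ** 2 for x in xs)                   # sum of squared X deviations
    b = sp / ss_x        # slope
    a = my - b * mx      # intercept: the line passes through (M_X, M_Y)
    return b, a

xs = [1, 2, 3, 4, 5]     # hypothetical X scores
ys = [2, 4, 5, 4, 5]     # hypothetical Y scores
b, a = least_squares_line(xs, ys)
print(f"Ŷ = {b:.2f}X + {a:.2f}")
```

With these made-up data the line is Ŷ = 0.60X + 2.20.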
Error Calculation: Error = Y - Ŷ (actual Y minus predicted Y).
Total Squared Error: Σ(Y − Ŷ)², the sum of the squared errors across all data points.
Standard Error of Estimate (SEE): Measures prediction accuracy. It is the standard distance between the regression line and the actual data points, analogous to a standard deviation. In simple regression, SEE = √(Σ(Y − Ŷ)² / (n − 2)).
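The error, total squared error, and SEE definitions can be chained together in code. This sketch fits the line, computes each error Y − Ŷ, sums the squares, and divides by df = n − 2 (two parameters, b and a, are estimated). It is a minimal illustration on hypothetical data. The helper names are made up.

```python
import math
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

def standard_error_of_estimate(xs, ys):
    b, a = fit_line(xs, ys)
    errors = [y - (b * x + a) for x, y in zip(xs, ys)]  # Y - Ŷ for each point
    ss_residual = sum(e ** 2 for e in errors)            # total squared error
    df = len(xs) - 2                                     # n - 2 for simple regression
    return math.sqrt(ss_residual / df), errors, ss_residual

see, errors, ss_res = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"SS_residual = {ss_res:.2f}, SEE = {see:.3f}")
```

The least-squares errors always sum to zero, so only their squared total carries information about fit.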
Variability: r² (coefficient of determination) indicates the proportion of variability in Y predicted by X.
Example: If r = 0.80, r² = 0.64 (64% of the variability in Y is accounted for by X).
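In simple regression, r² equals the proportion of Y's variability explained by the line, 1 − SS_residual / SS_Y. The sketch below computes both quantities on hypothetical data so they can be compared. It then reproduces the r = 0.80 example. The function name is illustrative.

```python
from statistics import fmean

def r_squared_two_ways(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    r = sp / (ss_x * ss_y) ** 0.5                 # Pearson correlation
    b = sp / ss_x
    a = my - b * mx
    ss_res = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - ss_res / ss_y              # squared r vs. proportion of SS_Y explained

r2_from_r, r2_from_ss = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{r2_from_r:.3f} {r2_from_ss:.3f}")
print(f"r = 0.80 -> r² = {0.80 ** 2:.2f}")
```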
Simple Linear Regression: Predicts the dependent variable (DV) from one independent variable (IV).
Multiple Linear Regression: Predicts DV from 2 or more IVs.
IV Characteristics: Ideally orthogonal (each IV contributes independently) and non-collinear (no multicollinearity).
Hierarchical Multiple Regression: IVs are entered in a researcher-specified order based on theory or hypotheses.
Stepwise Multiple Regression: The software enters or removes IVs based on statistical criteria rather than a researcher-specified order.
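Hierarchical regression usually focuses on ΔR², the change in R² when a later IV (or block of IVs) is added to the earlier ones. The sketch below assumes exactly two predictors, entered one per step. It uses the closed-form two-predictor formula R² = (r_Y1² + r_Y2² − 2·r_Y1·r_Y2·r_12) / (1 − r_12²) in place of a general least-squares solver. The data and helper names are hypothetical. Stepwise selection, where the software picks the order, is not shown.

```python
from statistics import fmean

def pearson_r(u, v):
    mu, mv = fmean(u), fmean(v)
    sp = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    return sp / (sum((a - mu) ** 2 for a in u) * sum((c - mv) ** 2 for c in v)) ** 0.5

def r2_two_predictors(y, x1, x2):
    """R² for Y regressed on X1 and X2, from the three pairwise correlations."""
    ry1, ry2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

# Hypothetical data: X1 is entered at Step 1 (e.g., on theoretical grounds), X2 at Step 2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 7, 6, 10, 9]

r2_step1 = pearson_r(y, x1) ** 2           # Step 1: X1 only
r2_step2 = r2_two_predictors(y, x1, x2)    # Step 2: X1 + X2
print(f"Step 1 R² = {r2_step1:.3f}, Step 2 R² = {r2_step2:.3f}, ΔR² = {r2_step2 - r2_step1:.3f}")
```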
Variable Type:
DV: Continuous.
IV: Continuous or dichotomous (dummy-coded 0/1).
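A dichotomous IV enters the model as a single 0/1 variable. With this coding, the B for that IV is the predicted difference in the DV between the group coded 1 and the reference group coded 0, holding any other IVs constant. The following is a minimal sketch. The helper `dummy_code` and the group labels are hypothetical.

```python
def dummy_code(values, reference):
    """Return 0 for the reference category and 1 for the other category."""
    categories = set(values)
    if len(categories) != 2 or reference not in categories:
        raise ValueError("expected exactly two categories, one of them the reference")
    return [0 if v == reference else 1 for v in values]

group = ["control", "treatment", "treatment", "control"]
print(dummy_code(group, reference="control"))  # [0, 1, 1, 0]
```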
Linearity: The DV must be linearly related to each IV; check residual plots for curvilinear patterns.
Sample Size:
Correlation: Minimum n = 30.
Regression: Minimum n = 20 per predictor.
Normal Distribution: Residuals should be normally distributed; use Shapiro-Wilk test or Q-Q plots.
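A Q-Q plot pairs each sorted, standardized residual with the quantile a normal distribution would predict at that rank. Points lying close to a straight line suggest approximately normal residuals. This sketch computes only the plot coordinates, using the plotting position (i − 0.5)/n, which is one common choice among several. The Shapiro-Wilk test itself is available in statistics packages (for example, `scipy.stats.shapiro`) and is not reimplemented here. The residual values and function name are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def qq_points(residuals):
    """Pair each sorted standardized residual with its expected normal quantile."""
    n = len(residuals)
    m, s = fmean(residuals), stdev(residuals)
    observed = sorted((e - m) / s for e in residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

pts = qq_points([-0.8, 0.6, 1.0, -0.6, -0.2])
for t, obs in pts:
    print(f"{t:6.2f} {obs:6.2f}")
```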
Homoscedasticity: The variance of the residuals remains constant across predicted values; residual plots should not display a cone (funnel) shape.
Independence of Residuals: Uncorrelated residual terms; check with Durbin-Watson test (ideal score close to 2).
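The Durbin-Watson statistic is DW = Σ(e_t − e_{t−1})² / Σe_t². It is computed on the residuals in the order the observations were collected and ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The following is a minimal sketch on hypothetical residuals.

```python
def durbin_watson(residuals):
    """DW = Σ(e_t - e_{t-1})² / Σ e_t², with residuals in collection order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

dw = durbin_watson([-0.8, 0.6, 1.0, -0.6, -0.2])
print(f"DW = {dw:.2f}")
```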
No Multicollinearity: IVs must not be highly correlated (r < .80); check Tolerance (> 0.20) and VIF (< 10).
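Tolerance for an IV is 1 − R², where R² comes from regressing that IV on all the other IVs, and VIF = 1 / tolerance. With exactly two IVs, that R² is simply r₁₂², which keeps the sketch short. For more IVs, each R² would come from a full regression. Because VIF = 1 / tolerance, a tolerance cut-off of 0.20 corresponds to VIF < 5, which is stricter than VIF < 10. The two thresholds above are therefore separate rules of thumb rather than equivalent ones. The data and function names are hypothetical.

```python
from statistics import fmean

def pearson_r(u, v):
    mu, mv = fmean(u), fmean(v)
    sp = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    return sp / (sum((a - mu) ** 2 for a in u) * sum((c - mv) ** 2 for c in v)) ** 0.5

def tolerance_and_vif(x1, x2):
    """With exactly two IVs, both IVs share the same tolerance and VIF."""
    r12 = pearson_r(x1, x2)
    tolerance = 1 - r12 ** 2
    return r12, tolerance, 1 / tolerance

x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 6, 2, 5]
r12, tol, vif = tolerance_and_vif(x1, x2)
print(f"r12 = {r12:.2f}, Tolerance = {tol:.2f}, VIF = {vif:.2f}")
```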
Reporting Results: Include the following.
F-value (ANOVA for model fit).
Degrees of freedom.
p-value.
r² (coefficient of determination).
B coefficients for each predictor (for 2+ IVs).
Example: "The regression model was statistically significant. Students’ level of Anxiety predicted Test-Taking Efficacy, F(2, 97) = 215.24, p < .001. Adjusted r² = 0.81 indicates a strong effect size (81% of variability explained)."
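The F-value and adjusted R² in a report like this can be derived from R², the sample size n, and the number of predictors k. The standard formulas are F = (R²/k) / ((1 − R²)/(n − k − 1)) with df = (k, n − k − 1), and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). Below is a minimal sketch using hypothetical inputs. The p-value requires the F distribution's CDF from a statistics package and is not computed here.

```python
def model_f_and_adjusted_r2(r2, n, k):
    """Overall-model F test and adjusted R² for k predictors and n cases."""
    df_model, df_error = k, n - k - 1
    f = (r2 / df_model) / ((1 - r2) / df_error)
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_error
    return f, (df_model, df_error), adj_r2

f, (df1, df2), adj = model_f_and_adjusted_r2(r2=0.30, n=50, k=2)
print(f"F({df1}, {df2}) = {f:.2f}, adjusted R² = {adj:.2f}")
```

As a rough consistency check, plugging in r = −.29, n = 399, and k = 1 from the Student Success example gives F ≈ 36.5 and adjusted R² ≈ .08. These are close to the reported values. The small gap in F is plausibly due to r being rounded.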
Unstandardized Coefficients: B used for predicting Y; cannot compare across predictors unless IVs are on the same scale.
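Standardized Coefficients (β): Used to compare the relative strength of predictors.
The predictor with the largest absolute β is the strongest predictor.
β = r in simple linear regression; in multiple regression, a predictor's β equals its r with the DV only when the IVs are uncorrelated.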
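β re-expresses B in standard-deviation units: β = B × (SD_X / SD_Y). The sketch below computes both coefficients for a single predictor on hypothetical data. It confirms that β equals r in simple regression. The function names are illustrative.

```python
from statistics import fmean, stdev

def slope_and_beta(xs, ys):
    """Return the unstandardized slope B and the standardized coefficient β."""
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    beta = b * stdev(xs) / stdev(ys)  # rescale B into standard-deviation units
    return b, beta

def pearson_r(u, v):
    mu, mv = fmean(u), fmean(v)
    sp = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    return sp / (sum((a - mu) ** 2 for a in u) * sum((c - mv) ** 2 for c in v)) ** 0.5

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
b, beta = slope_and_beta(xs, ys)
print(f"B = {b:.2f}, beta = {beta:.3f}, r = {pearson_r(xs, ys):.3f}")
```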
Pearson’s r: Moderate, inverse relationship between Student Success and Student-Life Stress, r(397) = -0.29, p < .001.
Regression Model: Student-Life Stress predicted Success, F(1, 397) = 36.8, p < .001.
Adjusted r² = 0.08 (8% of variability in Success explained by Stress).
Coefficients: Each 1-point decrease in Stress predicts a 0.26-point increase in Success.
Academic Adjustment (β = .50, p < .001) was a stronger predictor of Success than Stress (β = -.11, p = .02).
Y-Axis: Dependent variable, such as length of stay after surgery.
X-Axis: Independent variable, such as age.