PSCH 443 Midterm Study Guide

Overview of Key Statistical Concepts in Regression Analysis

1. Covariance

  • Formula:

    • Cov(X,Y) = (1/(n-1)) * Σ(Xi - X̄)(Yi - Ȳ)

    • Where:

      • Xi, Yi = Data points for variables X and Y

      • X̄, Ȳ = Mean of variables X and Y

      • n = Number of data points

  • Interpretation:

    • Measures how two variables change together.

    • Positive covariance: both variables increase together.

    • Negative covariance: one variable increases while the other decreases.

    • Zero covariance: no linear relationship (a nonlinear relationship may still exist).
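The formula can be checked numerically; a minimal sketch using a small made-up dataset (the values are illustrative, not from the course):

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

n = len(x)
# Sample covariance with the n-1 denominator, as in the formula above.
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov also uses an n-1 denominator by default, so the two agree.
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
```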

2. Correlation (Pearson’s r)

  • Formula:

    • r = Cov(X,Y) / (sX * sY)

  • Interpretation:

    • Measures strength and direction of a linear relationship between two variables.

    • Ranges from -1 to +1:

      • r = 1: Perfect positive correlation

      • r = -1: Perfect negative correlation

      • r = 0: No linear relationship

    • Standardized measure, easier to interpret than covariance.
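Dividing the covariance by both standard deviations gives the standardized measure; a quick sketch with made-up data:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

n = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
s_x = x.std(ddof=1)  # sample standard deviations (n-1 denominator)
s_y = y.std(ddof=1)

r = cov_xy / (s_x * s_y)

# Matches NumPy's built-in Pearson correlation.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```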

3. Slope (b_1) for Bivariate Linear Regression

  • Formula:

    • b_1 = Cov(X,Y) / Var(X)

  • Interpretation:

    • Represents change in Y for a unit change in X

    • Indicates the rate of change in the dependent variable with respect to the independent variable.

4. Intercept (b_0) for Bivariate Linear Regression

  • Formula:

    • b_0 = Ȳ - b_1 * X̄

  • Interpretation:

    • Predicted value of Y when X = 0

    • Point where regression line crosses Y-axis.
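Both formulas above can be computed directly; a minimal sketch with made-up data, cross-checked against NumPy's least-squares fit:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# b1 = Cov(X,Y)/Var(X); the (n-1) factors cancel, leaving sums of deviations.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check against NumPy's degree-1 least-squares polynomial fit.
slope, intercept = np.polyfit(x, y, 1)
assert np.isclose(b1, slope) and np.isclose(b0, intercept)
```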

5. Error Variance

  • Formula:

    • σ²_error = (1/(n-2)) * Σ(Yi - Ŷi)²

  • Interpretation:

    • Variability in Y not explained by the model

    • Smaller error variance indicates better fit.

6. SSM (Sum of Squares for Model)

  • Formula:

    • SSM = Σ(Ŷi - Ȳ)²

  • Interpretation:

    • Variation in Y explained by the model

    • Larger SSM indicates better explanation of variability.

7. SSRes (Sum of Squares for Residuals)

  • Formula:

    • SSRes = Σ(Yi - Ŷi)²

  • Interpretation:

    • Unexplained variation in Y

    • Smaller SSRes indicates better fit.

8. SST (Total Sum of Squares)

  • Formula:

    • SST = Σ(Yi - Ȳ)²

  • Interpretation:

    • Total variation in Y, combining explained and unexplained variation.

9. R² (Coefficient of Determination)

  • Formula:

    • R² = SSM / SST or R² = 1 - (SSRes / SST)

  • Interpretation:

    • Proportion of variance in Y explained by X

    • R² range: 0 (no variance explained) to 1 (all variance explained).
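The three sums of squares and both R² formulas can be verified together; a sketch on made-up data:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_model = np.sum((y_hat - y.mean()) ** 2)   # SSM: explained variation
ss_res = np.sum((y - y_hat) ** 2)            # SSRes: unexplained variation
ss_total = np.sum((y - y.mean()) ** 2)       # SST: total variation

# For least squares, SST = SSM + SSRes exactly, so both R² formulas agree.
assert np.isclose(ss_total, ss_model + ss_res)
r_squared = ss_model / ss_total
assert np.isclose(r_squared, 1 - ss_res / ss_total)
```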

10. F-statistic

  • Formulas:

    • MSM = SSM / df_Model

    • MSRes = SSRes / df_Res

    • F = MSM / MSRes

  • Interpretation:

    • Tests overall significance of the regression model

    • High F-value indicates at least one predictor is significantly related to Y.

11. Degrees of Freedom for F-test in Multiple Regression

  • df_Model = p (number of predictors)

  • df_Res = n - p - 1 (residual degrees of freedom)

  • df_Total = n - 1 (total degrees of freedom).
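The F-statistic and its degrees of freedom can be computed together; a sketch for the bivariate case (p = 1) with made-up data:

```python
import numpy as np

# Hypothetical bivariate data, so p = 1 predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
n, p = len(x), 1

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_model = np.sum((y_hat - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)

ms_model = ss_model / p          # df_Model = p
ms_res = ss_res / (n - p - 1)    # df_Res = n - p - 1
f_stat = ms_model / ms_res
```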

12. Coefficients (b_j) in Multiple Regression

  • Formula (matrix form, all coefficients at once):

    • b̂ = (XᵀX)⁻¹ Xᵀ y

    • Note: the bivariate-style formula Cov(Xj,Y)/Var(Xj) equals b̂j only when the predictors are mutually uncorrelated.

  • Interpretation:

    • Change in Y for a one-unit change in Xj, holding the other predictors constant.
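The matrix solution can be sketched with simulated data (values are made up for illustration); solving the normal equations matches NumPy's least-squares solver:

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])

# Normal-equations solution b̂ = (XᵀX)⁻¹ Xᵀ y.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b_hat, b_lstsq)
```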

Additional Concepts in Regression Analysis

Brute Force Methods of Parameter Estimation

  • Explanation:

    • Computational approach that evaluates a grid of candidate parameter values and keeps the set with the smallest error.

    • Rarely used due to computational expense.

Least-Squares Estimation (LSE)

  • Formula:

    • β̂1 = Σ(Xi - X̄)(Yi - Ȳ)/Σ(Xi - X̄)²

  • Explanation:

    • Minimizes differences between observed and predicted values to find the best-fitting line.

Partial Correlation

  • Formula:

    • rXY.Z = (rXY - rXZ * rYZ) / sqrt((1 - rXZ²)(1 - rYZ²))

  • Explanation:

    • Measures the relationship between X and Y while controlling for Z.

Semipartial Correlation

  • Formula:

    • rY(X.Z) = (rXY - rXZ * rYZ) / sqrt(1 - rXZ²)

  • Explanation:

    • Measures the unique contribution of X to Y; Z is partialed out of X only, not out of Y.
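Both coefficients can be evaluated directly from pairwise correlations; a sketch using hypothetical values (not from the course):

```python
import numpy as np

# Hypothetical pairwise correlations among X, Y, and a control variable Z.
r_xy, r_xz, r_yz = 0.60, 0.40, 0.50

# Partial: Z is removed from both X and Y.
r_partial = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Semipartial: Z is removed from X only, so only X's term is in the denominator.
r_semipartial = (r_xy - r_xz * r_yz) / np.sqrt(1 - r_xz**2)

# The semipartial can never exceed the partial in absolute value.
assert abs(r_semipartial) <= abs(r_partial)
```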

Multiple Correlation

  • Formula:

    • R = √(1 - SSRes/SSTotal)

  • Explanation:

    • Represents correlation between dependent variable and multiple predictors.

R² Change (ΔR²)

  • Formula:

    • ΔR² = R²_new - R²_old

  • Explanation:

    • Assesses the increase in explained variance with new predictors.
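ΔR² can be sketched by fitting nested models; the data below are simulated for illustration, and the helper `r_squared` is a hypothetical name, not a library function:

```python
import numpy as np

# Simulated data: Y depends on both predictors (values are made up).
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

def r_squared(y, predictors):
    """R² from an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# ΔR²: gain in explained variance from adding x2 to a model with only x1.
delta_r2 = r_squared(y, [x1, x2]) - r_squared(y, [x1])
assert delta_r2 >= 0  # R² never decreases when predictors are added
```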

ANOVA (F-test)

  • Formula:

    • F = MSM / MSRes

  • Explanation:

    • Tests overall significance of regression model.

Interpretation of Regression Coefficients

  • b (unstandardized): change in Y, in its original units, per one-unit change in X

  • β (standardized): change in Y in standard deviations per one-standard-deviation change in X.

Standard Error (SE)

  • Formula:

    • SE(b̂1) = sqrt( MSRes / Σ(Xi - X̄)² ),  where MSRes = SSRes / (n - 2)  (bivariate case)

  • Explanation:

    • Variability of coefficient estimate; smaller SE implies more precise estimates.

Significance Testing (t-tests)

  • Formula:

    • t = b̂ / SE(b̂)

  • Explanation:

    • Tests if a coefficient is significantly different from zero; large t-statistic indicates significant predictor.
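The standard error and t-statistic for the bivariate slope can be sketched together on made-up data:

```python
import numpy as np

# Hypothetical bivariate data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - (b0 + b1 * x)) ** 2)

# Bivariate case: SE(b1) = sqrt(MSRes / SSX) with df = n - 2.
ms_res = ss_res / (n - 2)
se_b1 = np.sqrt(ms_res / np.sum((x - x.mean()) ** 2))

# With a single predictor, t² equals the model F-statistic.
t_stat = b1 / se_b1
```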

Multicollinearity

  • Explanation:

    • Occurs when predictors are highly correlated; makes coefficient estimation difficult.

    • Detection: Use Variance Inflation Factor (VIF).
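The VIF can be sketched from its definition, VIF_j = 1 / (1 - R²_j); the data are simulated, and the helper `vif` is a hypothetical name, not a library function:

```python
import numpy as np

# Simulated predictors; x2 is deliberately built to overlap with x1.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.5, size=n)

def vif(target, others):
    """VIF_j = 1 / (1 - R²_j), where R²_j regresses predictor j on the rest."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

vif_x2 = vif(x2, [x1])  # well above 1, signaling shared variance with x1
```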

Outliers

  • Explanation:

    • Observations that deviate markedly from the rest of the data; can disproportionately influence regression estimates.

Normality, Linearity, and Homoscedasticity

  • Assumptions that must hold for regression inferences (t- and F-tests, confidence intervals) to be valid.

  • Residual plots (e.g., residuals vs. fitted values, Q-Q plots) help check adherence to these assumptions.
