Formula (Covariance):
Cov(X,Y) = (1/(n-1)) * Σ(Xi - X̄)(Yi - Ȳ)
Where:
Xi, Yi = Data points for variables X and Y
X̄, Ȳ = Mean of variables X and Y
n = Number of data points
Interpretation:
Measures how two variables change together.
Positive covariance: both variables increase together.
Negative covariance: one variable increases while the other decreases.
Zero covariance: no linear relationship.
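As a minimal sketch, the sample covariance can be computed directly from this formula (the data below are illustrative only):

```python
def covariance(x, y):
    """Sample covariance: sum of paired deviation products, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))  # positive: the variables tend to increase together
```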
Formula (Pearson Correlation Coefficient):
r = Cov(X,Y) / (sX * sY)
Interpretation:
Measures strength and direction of a linear relationship between two variables.
Ranges from -1 to +1:
r = 1: Perfect positive correlation
r = -1: Perfect negative correlation
r = 0: No linear relationship
Standardized measure, easier to interpret than covariance.
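A minimal sketch of that standardization in code: the covariance divided by the two sample standard deviations (illustrative data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
assert -1 <= r <= 1  # always bounded, unlike covariance
print(r)
```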
Formula (Regression Slope):
b_1 = Cov(X,Y) / Var(X)
Interpretation:
Represents change in Y for a unit change in X
Indicates the rate of change in the dependent variable with respect to the independent variable.
Formula (Regression Intercept):
b_0 = Ȳ - b_1 * X̄
Interpretation:
Predicted value of Y when X = 0
Point where regression line crosses Y-axis.
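The slope and intercept formulas combine into a complete line fit; a minimal sketch with illustrative data:

```python
def fit_line(x, y):
    """Least-squares line: b1 = Cov(X,Y)/Var(X), b0 = mean(Y) - b1 * mean(X)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)
```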
Formula (Error Variance):
σ²_error = (1/(n-2)) * Σ(Yi - Ŷi)²
Interpretation:
Variability in Y not explained by the model
Smaller error variance indicates better fit.
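A sketch of this estimate for a simple regression, averaging the squared residuals over n - 2 degrees of freedom (illustrative data):

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
      / sum((a - mean_x) ** 2 for a in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

# Residual variance: squared prediction errors divided by n - 2
error_variance = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / (n - 2)
print(error_variance)
```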
Formula (Model Sum of Squares, SSM):
SSM = Σ(Ŷi - Ȳ)²
Interpretation:
Variation in Y explained by the model
Larger SSM indicates better explanation of variability.
Formula (Residual Sum of Squares, SSRes):
SSRes = Σ(Yi - Ŷi)²
Interpretation:
Unexplained variation in Y
Smaller SSRes indicates better fit.
Formula (Total Sum of Squares, SST):
SST = Σ(Yi - Ȳ)²
Interpretation:
Total variation in Y, combining explained and unexplained variation.
Formula (Coefficient of Determination, R²):
R² = SSM / SST or R² = 1 - (SSRes / SST)
Interpretation:
Proportion of variance in Y explained by X
R² range: 0 (no variance explained) to 1 (all variance explained).
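The three sums of squares and R² fit together in one computation; a sketch with illustrative data that also checks the decomposition SST = SSM + SSRes:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
      / sum((a - mean_x) ** 2 for a in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

ssm = sum((fi - mean_y) ** 2 for fi in y_hat)             # explained variation
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)                 # total variation
assert abs(sst - (ssm + ss_res)) < 1e-9  # decomposition holds

r_squared = ssm / sst
print(r_squared)
```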
Formulas (F-Test for Model Significance):
MSM = SSM / df_Model
MSRes = SSRes / df_Res
F = MSM / MSRes
Interpretation:
Tests overall significance of the regression model
High F-value indicates at least one predictor is significantly related to Y.
df_Model = p (number of predictors)
df_Res = n - p - 1 (residual degrees of freedom)
df_Total = n - 1 (total degrees of freedom).
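A sketch of the F computation from the sums of squares and degrees of freedom; the values here come from an illustrative simple regression (p = 1 predictor, n = 5 observations):

```python
n, p = 5, 1          # illustrative sample size and predictor count
ssm, ss_res = 3.6, 2.4

msm = ssm / p                  # mean square for the model (df_Model = p)
ms_res = ss_res / (n - p - 1)  # mean square for the residuals (df_Res = n - p - 1)
F = msm / ms_res
print(F)
```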
Formula (Multiple Regression Coefficients):
β̂ = (XᵀX)⁻¹Xᵀy (matrix form; the simple ratio Cov(Xj,Y) / Var(Xj) gives β̂j only when the predictors are uncorrelated)
Interpretation:
Change in Y for a one-unit change in Xj, holding other predictors constant.
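One way to see "holding other predictors constant" in code is to residualize one predictor on the other and then apply the simple Cov/Var slope formula to those residuals (the Frisch–Waugh–Lovell idea). The data are illustrative, built so that y = 2·x1 + x2 exactly:

```python
def simple_slope(x, y):
    """Cov(X,Y)/Var(X); the n-1 factors cancel, so raw sums are used."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 * a + b for a, b in zip(x1, x2)]  # exact relationship: beta1 = 2, beta2 = 1

# Residualize x1 on x2 (remove the part of x1 explained by x2)
s = simple_slope(x2, x1)
c = sum(x1) / len(x1) - s * sum(x2) / len(x2)
e1 = [a - (c + s * b) for a, b in zip(x1, x2)]

# Slope of y on the residualized predictor = multiple-regression coefficient.
# The residuals have zero mean, so Cov/Var reduces to this ratio of raw sums.
beta1 = sum(ei * yi for ei, yi in zip(e1, y)) / sum(ei ** 2 for ei in e1)
print(beta1)
```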
Explanation (Brute-Force Estimation):
Computational approach that evaluates many candidate parameter values (e.g., over a grid) and keeps the set with the smallest error.
Rarely used because the search is computationally expensive.
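A toy version of this idea: score every intercept/slope pair on a coarse grid by its sum of squared errors and keep the best one (illustrative data and grid ranges):

```python
def sse(b0, b1, x, y):
    """Sum of squared errors for a candidate line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Search intercepts 0.0..4.0 and slopes 0.0..2.0 in steps of 0.1
best = min(((i / 10, j / 10) for i in range(41) for j in range(21)),
           key=lambda p: sse(p[0], p[1], x, y))
print(best)  # matches the closed-form least-squares solution for this grid
```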
Formula (Least-Squares Slope):
β̂1 = Σ(Xi - X̄)(Yi - Ȳ)/Σ(Xi - X̄)²
Explanation:
Minimizes differences between observed and predicted values to find the best-fitting line.
Formula (Partial Correlation):
rXY.Z = (rXY - rXZ * rYZ) / sqrt((1 - rXZ²)(1 - rYZ²))
Explanation:
Measures the relationship between X and Y while controlling for Z.
Formula (Semi-Partial Correlation):
rY(X.Z) = (rXY - rXZ * rYZ) / sqrt(1 - rXZ²)
Explanation:
Measures the unique contribution of X to Y, controlling Z.
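Both formulas need only the three pairwise correlations; a minimal sketch using hypothetical r values:

```python
import math

def partial_r(rxy, rxz, ryz):
    """Correlation of X and Y with Z partialled out of both."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def semipartial_r(rxy, rxz, ryz):
    """Correlation of X and Y with Z partialled out of X only."""
    return (rxy - rxz * ryz) / math.sqrt(1 - rxz ** 2)

rxy, rxz, ryz = 0.5, 0.3, 0.4  # hypothetical pairwise correlations
print(partial_r(rxy, rxz, ryz), semipartial_r(rxy, rxz, ryz))
```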
Formula (Multiple Correlation Coefficient, R):
R = √(1 - SSRes/SST)
Explanation:
Represents correlation between dependent variable and multiple predictors.
Formula (Change in R²):
ΔR² = R²_new - R²_old
Explanation:
Assesses the increase in explained variance with new predictors.
Formula (F-Statistic):
F = MSM / MSRes
Explanation:
Tests overall significance of regression model.
Unstandardized (b) vs. Standardized (β) Coefficients:
b: change in Y, in original units, per one-unit change in X
β: change in Y, in standard deviations, per one-standard-deviation change in X.
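The conversion between the two is b scaled by the ratio of the standard deviations; a sketch with illustrative data (in simple regression the standardized β equals Pearson's r):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))

b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
beta = b1 * sx / sy  # standardized: SD change in Y per SD change in X
print(b1, beta)
```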
Formula (Standard Error of a Coefficient):
SE(β̂₁) = sqrt( (SSRes / (n - p - 1)) / Σ(Xi - X̄)² )
Explanation:
Variability of coefficient estimate; smaller SE implies more precise estimates.
Formula (t-Statistic):
t = β̂ / SE(β̂)
Explanation:
Tests if a coefficient is significantly different from zero; large t-statistic indicates significant predictor.
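A sketch combining the two formulas above for a simple regression (p = 1, illustrative data): the residual sum of squares feeds the standard error, and the t-statistic is the coefficient divided by that SE.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, p = len(x), 1
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx

ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# SE of the slope: residual variance divided by the predictor's sum of squares
se_b1 = math.sqrt((ss_res / (n - p - 1)) / sum((a - mx) ** 2 for a in x))
t = b1 / se_b1
print(se_b1, t)
```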
Explanation (Multicollinearity):
Occurs when predictors are highly correlated; this inflates the variance of the coefficient estimates and makes them unstable.
Detection: Use Variance Inflation Factor (VIF).
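With only two predictors, the VIF reduces to 1/(1 − r²), where r is their correlation; a minimal sketch with illustrative data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation from raw deviation sums (n-1 factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)  # VIF above ~5-10 is a common multicollinearity flag
print(r, vif)
```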
Explanation (Outliers):
Data points that deviate substantially from the rest of the data; they can skew regression results.
Assumptions (linearity, independence of errors, homoscedasticity, normality of residuals) must hold for regression results to be valid.
Examining the residuals (e.g., with residual plots) can clarify whether these assumptions are met.