Population Mean (μ): The true average of a variable in the entire population.
Sample Mean (X̄): The average computed from a sample, used to estimate the population mean.
Conditional Expectation (E(Y|X)): The expected value of a variable Y given a certain value of X.
Regression Equation: $Y = \beta_0 + \beta_1 X + u$, where:
$Y$ = Dependent Variable
$X$ = Independent Variable
$u$ = Error Term
$\beta_0$ = Intercept
$\beta_1$ = Slope
Residual ($\hat{u}$): The difference between observed and predicted values ($Y - \hat{Y}$).
Ordinary Least Squares (OLS): A method to estimate regression parameters by minimizing the sum of squared residuals.
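Standard Error of Regression (SER): An estimator of the standard deviation of the regression error; it measures the typical size of a residual, in the units of the dependent variable.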
R-Squared ($R^2$): Measures the proportion of variance in $Y$ explained by $X$.
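As a rough illustration of the residual and SER definitions above, here is a minimal Python sketch. The data are made up, `np.polyfit` is used only as a convenient way to get a fitted line, and the divisor n - 2 is the usual degrees-of-freedom convention for a regression with one slope and one intercept (an assumption not stated in these notes).

```python
import numpy as np

# Made-up illustrative data (not from the notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a straight line; np.polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

y_hat = intercept + slope * x      # predicted values
residuals = y - y_hat              # observed minus predicted

ssr = np.sum(residuals ** 2)       # sum of squared residuals
n = len(y)
ser = np.sqrt(ssr / (n - 2))       # n - 2: two estimated coefficients (assumed convention)

print(f"SSR = {ssr:.2f}, SER = {ser:.3f}")
```

The SER comes out in the same units as Y, which is what makes it readable as a typical prediction miss.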
Q: What is the conditional expectation formula in a simple regression model?
A: $E(Y \mid X) = \beta_0 + \beta_1 X$, which holds when the error term satisfies $E(u \mid X) = 0$.
Q: How do we estimate $\beta_0$ and $\beta_1$ in a regression?
A: Using OLS, we minimize the sum of squared residuals to find:
$\hat{\beta}_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
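The two OLS formulas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ols_simple` and the data are hypothetical, and the covariance and variance are the sample versions (their common 1/(n-1) factor cancels in the ratio).

```python
import numpy as np

def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for a simple regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    # Slope: sample Cov(X, Y) / sample Var(X); the 1/(n-1) factors cancel
    beta1_hat = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
    # Intercept: forces the fitted line through the point (X-bar, Y-bar)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    return beta0_hat, beta1_hat

# Made-up illustrative data (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta0_hat, beta1_hat = ols_simple(x, y)
print(f"intercept = {beta0_hat:.2f}, slope = {beta1_hat:.2f}")
```

For this toy data the deviations give a slope of 6/10 and an intercept of 4 - 0.6 × 3, so the fitted line is roughly Ŷ = 2.2 + 0.6X.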
Q: How do you interpret $\beta_0$ (intercept) in a regression equation?
A: It is the expected value of $Y$ when $X = 0$, but it may not be meaningful if $X = 0$ is not a realistic scenario.
Q: How do you interpret $\beta_1$ (slope) in a regression equation?
A: It represents the expected change in $Y$ for a one-unit increase in $X$.
Q: What is the Sum of Squared Residuals (SSR)?
A: The sum of squared differences between observed and predicted values.
Q: What is Total Sum of Squares (TSS)?
A: The total variation in the dependent variable ($Y$), measured as the sum of squared deviations of $Y$ from its sample mean.
Q: What is Explained Sum of Squares (ESS)?
A: The portion of TSS explained by the regression model.
Q: How is $R^2$ calculated?
A: $R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$.
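A short Python sketch can check that the two expressions for R-squared agree. The data are made up and `np.polyfit` stands in for any OLS fit; the equality relies on the fit being an OLS line with an intercept, in which case TSS = ESS + SSR.

```python
import numpy as np

# Made-up illustrative data (not from the notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS line with an intercept (np.polyfit returns slope first, then intercept)
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)      # total variation in Y
ess = np.sum((y_hat - y.mean()) ** 2)  # variation captured by the fitted line
ssr = np.sum((y - y_hat) ** 2)         # variation left in the residuals

r2_from_ess = ess / tss
r2_from_ssr = 1 - ssr / tss

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, SSR = {ssr:.2f}")
print(f"R^2 = {r2_from_ess:.2f} (ESS/TSS), {r2_from_ssr:.2f} (1 - SSR/TSS)")
```

If the line were not fitted by OLS, or had no intercept, the two ratios could differ, so the second form is the safer one to compute in general.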
Q: What does an $R^2$ value close to 1 indicate?
A: A high proportion of the variance in $Y$ is explained by $X$, meaning the regression line fits the data closely.
Q: What does an $R^2$ value close to 0 indicate?
A: The model explains very little of the variance in $Y$.
Q: What criterion is used to determine the best regression line?
A: The one that minimizes the sum of squared residuals (OLS method).
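To see the criterion at work, the sketch below compares the SSR of the OLS line with the SSR of a few nearby lines. The data, the helper name `ssr_of_line`, and the size of the nudges are all illustrative assumptions; this is a spot check on a handful of candidates, not a proof.

```python
import numpy as np

# Made-up illustrative data (not from the notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def ssr_of_line(b0, b1):
    """Sum of squared residuals for the candidate line b0 + b1 * x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# OLS fit (np.polyfit returns slope first, then intercept)
b1_ols, b0_ols = np.polyfit(x, y, deg=1)
ssr_ols = ssr_of_line(b0_ols, b1_ols)

# Nudge the intercept and slope away from the OLS values
candidates = [
    (b0_ols + d0, b1_ols + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
    if (d0, d1) != (0.0, 0.0)
]
all_worse = all(ssr_of_line(b0, b1) > ssr_ols for b0, b1 in candidates)

print(f"SSR at the OLS line: {ssr_ols:.2f}")
print("Every nudged line has a larger SSR:", all_worse)
```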