P-Value for Slopes
Use the p-value to evaluate the null hypothesis that a slope coefficient is equal to zero.
The p-value is the smallest level of significance for which the null hypothesis can be rejected. We test the significance of coefficients by comparing the p-value to the chosen significance level:
If the p-value is less than the significance level, the null hypothesis can be rejected.
If the p-value is greater than the significance level, the null hypothesis cannot be rejected.
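A minimal sketch of this decision rule, assuming statsmodels is available; the data and variable names are simulated for illustration, not from the source:

```python
# Fit an OLS regression and compare each slope's p-value to a chosen
# significance level to decide whether to reject H0: slope = 0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))                    # two hypothetical independent variables
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)   # only the first slope is truly nonzero

model = sm.OLS(y, sm.add_constant(X)).fit()

alpha = 0.05  # chosen significance level
for name, pval in zip(model.model.exog_names, model.pvalues):
    decision = "reject H0 (slope != 0)" if pval < alpha else "fail to reject H0"
    print(f"{name}: p-value = {pval:.4f} -> {decision}")
```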
Assumptions underlying a multiple regression model include:
A linear relationship exists between the dependent and independent variables.
The residuals are normally distributed.
The variance of the error terms is constant for all observations.
The residual for one observation is not correlated with that of another observation.
The independent variables are not random, and there is no exact linear relation between any two or more independent variables.
Goal of QQ plot
Check if residuals are normally distributed
Points deviating from the Q–Q line indicate that residuals are not normally distributed. The plot assesses normality, not correlation. Points below the line indicate that observed values are more negative than expected under normality, while points above the line indicate more positive values than expected.
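A minimal sketch of drawing a Q-Q plot of regression residuals, assuming statsmodels and matplotlib; the data are simulated:

```python
# Plot residual quantiles against normal quantiles to assess normality.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 1)))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=200)

resid = sm.OLS(y, X).fit().resid

# fit=True standardizes the residuals; line="45" draws the reference Q-Q line.
sm.qqplot(resid, line="45", fit=True)
plt.title("Q-Q plot of regression residuals")
plt.show()
```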
Functional Form Model Misspecifications
| Misspecification | Description | Effect |
|---|---|---|
| Omission of important independent variable(s) | Based on economic theory, one or more variables that should have been included are omitted. | Biased and inconsistent regression parameters; may lead to serial correlation or heteroskedasticity in the residuals |
| Inappropriate variable form | The relationship between the dependent and independent variables may be non-linear. | May lead to heteroskedasticity in the residuals |
| Inappropriate variable scaling | Variables may need to be transformed before estimating the regression. | May lead to heteroskedasticity in the residuals or multicollinearity |
| Data improperly pooled | Sample has periods of dissimilar economic environments (that should not be pooled). | May lead to heteroskedasticity or serial correlation in the residuals |
More on Misspecifications
| Problem | What goes wrong | Why it matters |
|---|---|---|
| Omitted variable (correlated) | Bias + inconsistency (other variables take credit) | Worst case |
| Omitted variable (uncorrelated) | Intercept bias only | Less severe |
| Wrong transformation | Nonlinear → forced linear | Poor fit |
| Bad scaling | Units distort meaning | Misleading comparisons |
| Pooling data | Different regimes mixed | Wrong conclusions |
robust standard errors
Correcting Heteroskedasticity
To correct for conditional heteroskedasticity of regression residuals, we can calculate robust standard errors (also called White-corrected standard errors or heteroskedasticity-consistent standard errors). These robust standard errors are then used to recalculate the t-statistics using the original regression coefficients for hypothesis testing.
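A minimal sketch of this correction, assuming statsmodels; the data are simulated with deliberately heteroskedastic errors:

```python
# Refit with White-corrected (heteroskedasticity-consistent) standard errors.
# The coefficients are identical; only the standard errors and t-stats change.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 1.0 + 0.8 * x + rng.normal(size=300) * (1 + np.abs(x))  # heteroskedastic errors

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # ordinary standard errors
robust = sm.OLS(y, X).fit(cov_type="HC0")   # White-corrected standard errors

print(ols.bse, robust.bse)   # same coefficients, different standard errors
print(robust.tvalues)        # t-stats recomputed with the original coefficients
```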
Serial correlation
Serial correlation, also known as autocorrelation, refers to a situation in which regression residual terms are correlated with one another; that is, they are not independent. Serial correlation can pose serious problems in regressions using time series data.
Positive serial correlation
Positive serial correlation exists when a positive residual in one time period increases the probability of observing a positive residual in the next time period.
Negative serial correlation
Negative serial correlation occurs when a positive residual in one period increases the probability of observing a negative residual in the next period.
Lagged values and positive serial correlation
Lagged values
- If the model includes lagged values of the dependent variable as independent variables, serial correlation makes the coefficient estimates biased and inconsistent.
- If the model uses only external (exogenous) variables and no lagged dependent variables, the coefficient estimates remain consistent; only the standard errors are affected.
Positive serial correlation
- Understates standard errors, inflating t-statistics and increasing the risk of a Type I error (a false positive).
Durbin-Watson (DW) statistic
Residual serial correlation at a single lag can be detected using the Durbin-Watson (DW) statistic.
Breusch-Godfrey (BG) test
A more general test, which can accommodate serial correlation at multiple lags, is the Breusch-Godfrey (BG) test.
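A minimal sketch of both tests, assuming statsmodels; the data are simulated with AR(1) errors so that serial correlation is present:

```python
# Durbin-Watson detects serial correlation at a single lag;
# Breusch-Godfrey can accommodate multiple lags.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                  # AR(1) errors -> positive serial correlation
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()

print("DW statistic:", durbin_watson(res.resid))   # values near 2 suggest no serial correlation
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print("BG test p-value (4 lags):", lm_pval)        # small p-value -> serial correlation
```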
robust standard errors
To correct for serial correlation in regression residuals, we can calculate robust standard errors (also called Newey-West corrected standard errors or heteroskedasticity- and autocorrelation-consistent (HAC) standard errors). These robust standard errors are then used to recalculate the t-statistics using the original regression coefficients.
most common sign of multicollinearity
The most common sign of multicollinearity is when t-tests indicate that none of the individual coefficients is significantly different from zero, but the F-test indicates that at least one of the coefficients is statistically significant, and the R² is high. This suggests that the variables together explain much of the variation in the dependent variable, but the individual independent variables don't. This can happen when the independent variables are highly correlated with each other: while their common source of variation explains the dependent variable, the high degree of correlation also "washes out" the individual effects.
What test to check heteroskedasticity?
Breusch-Pagan chi-square test
Regress the squared residuals on the original independent variables
BP = n * R², where R² is from that residual regression
df = k
H0: homoskedasticity; Ha: heteroskedasticity
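A minimal sketch of the Breusch-Pagan test, assuming statsmodels; the manual n * R² computation mirrors the formula above, and the data are simulated:

```python
# Breusch-Pagan: regress squared residuals on the independent variables,
# then compare BP = n * R-squared to a chi-square with df = k.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = 1.0 + 0.5 * x + rng.normal(size=300) * (1 + np.abs(x))  # heteroskedastic errors

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Manual version: regress squared residuals on the independent variables.
aux = sm.OLS(res.resid ** 2, X).fit()
bp_manual = len(y) * aux.rsquared

# Built-in version for comparison.
bp_stat, bp_pval, _, _ = het_breuschpagan(res.resid, X)
print(bp_manual, bp_stat, bp_pval)   # small p-value -> reject H0 (homoskedasticity)
```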
How to fix heteroskedasticity?
White-corrected/robust standard errors
What test for serial correlation?
Durbin-Watson: single lag
Breusch-Godfrey: multiple lags
Regress the residuals on the original independent variables plus lagged residuals
Test whether the lagged-residual slopes are statistically significant
H0: no serial correlation; Ha: serial correlation
How to fix serial correlation?
Newey-West/robust standard errors
Multicollinearity
Inflated standard errors
Type II errors (false negatives)
Regress each independent variable on the other independent variables, one at a time
VIF = 1 / (1 − R²), using the R² from that regression
Low VIF is good; high is bad (VIF above 5 warrants investigation, and above 10 indicates serious multicollinearity)
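A minimal sketch of computing VIFs, assuming statsmodels; the data are simulated so that two variables are nearly collinear:

```python
# VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing variable j
# on the remaining independent variables.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j, name in enumerate(["x1", "x2", "x3"], start=1):  # skip the constant column
    print(name, variance_inflation_factor(X, j))        # x1 and x2 should show high VIFs
```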
high-leverage points
High-leverage points are the extreme observations of the independent (or 'X') variables.
The sum of the individual leverages for all observations is k + 1 (with an intercept). If an observation's leverage is higher than three times the average, 3(k + 1) / n, it is considered potentially influential.
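A minimal sketch of flagging high-leverage observations with the 3(k + 1)/n rule above, assuming statsmodels; the data are simulated with one extreme X observation:

```python
# Leverage values are the diagonal of the hat matrix; with an intercept
# they sum to k + 1, so the average leverage is (k + 1) / n.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 2))
X[0] = [8.0, -8.0]                     # plant one extreme X observation
y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)

res = sm.OLS(y, sm.add_constant(X)).fit()
leverage = res.get_influence().hat_matrix_diag

n, k = 100, 2
cutoff = 3 * (k + 1) / n               # three times the average leverage
print("flagged observations:", np.where(leverage > cutoff)[0])
```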
studentized residuals
We can identify outliers using the studentized residuals. The following steps outline the procedure:
1. Estimate the regression model using the original sample of size n. Delete one observation and re-estimate the regression using (n − 1) observations. Perform this sequentially, for all observations, deleting one observation at a time.
2. Compare the actual Y value of the deleted observation i to the predicted Y value using the model parameters estimated with that observation deleted:
e*_i = Y_i − Ŷ*_i
3. The studentized residual is the residual in Step 2 divided by its standard deviation:
t*_i = e*_i / s_{e*}
We can then compare this studentized residual to critical values from a t-distribution with n − k − 2 degrees of freedom (because we now only have n − 1 observations), to determine if the observation is influential.
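A minimal sketch of this procedure, assuming statsmodels, which computes externally studentized residuals directly (equivalent to the delete-one steps above); the data are simulated with one planted outlier:

```python
# Externally studentized residuals are compared to a t critical value
# with n - k - 2 degrees of freedom to flag outliers.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
y[0] += 6.0                            # plant one outlier

res = sm.OLS(y, sm.add_constant(X)).fit()
t_star = res.get_influence().resid_studentized_external

crit = stats.t.ppf(0.975, df=n - k - 2)  # two-tailed 5% critical value
print("outliers:", np.where(np.abs(t_star) > crit)[0])
```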
qualitative dependent variable
Financial analysis often calls for the use of a model that has a qualitative dependent variable—a categorical variable, usually a binary variable, which takes on a value of either zero or one. An example of an application requiring the use of a qualitative dependent variable is a model that attempts to estimate the probability of default for a bond issuer. In this case, the dependent variable may take on a value of one in the event of default and zero in the event of no default.
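A minimal sketch of such a model, a logit estimating default probability; the data and the leverage_ratio variable are simulated assumptions, not from the source:

```python
# Logistic regression with a binary dependent variable:
# default = 1 in the event of default, 0 otherwise.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
leverage_ratio = rng.normal(size=500)          # hypothetical issuer characteristic
log_odds = -1.0 + 1.5 * leverage_ratio
default = (rng.uniform(size=500) < 1 / (1 + np.exp(-log_odds))).astype(int)

X = sm.add_constant(leverage_ratio)
logit = sm.Logit(default, X).fit(disp=0)
print(logit.params)                            # coefficients on the log-odds scale
print(logit.predict(X)[:5])                    # fitted default probabilities
```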