Quantitative Methods (Full)

Last updated 11:18 AM on 4/11/26

144 Terms

1

Why are linear models not ideal for use in economic situations?

The residuals of a linear trend model are likely to be serially correlated

  • Ideally, residual terms are unpredictable and uncorrelated with each other

2

What do you do if the value being modelled grows exponentially instead of linearly?

Take the natural log of the dependent variable and fit a log-linear model (then exponentiate to recover the level)

ln(y_t) = b0 + b1·t + ε_t, so

y_t = e^(b0 + b1·t + ε_t)
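
A minimal sketch of the log-linear trend above, assuming a hypothetical series that grows 5% per period. The function name and the data are illustrative only; the fit is ordinary least squares of ln(y_t) on t.

```python
import math

def fit_log_linear_trend(y):
    """OLS fit of ln(y_t) = b0 + b1*t for t = 1..T; returns (b0, b1)."""
    t = list(range(1, len(y) + 1))
    ln_y = [math.log(v) for v in y]
    t_bar = sum(t) / len(t)
    ln_bar = sum(ln_y) / len(ln_y)
    sxy = sum((ti - t_bar) * (li - ln_bar) for ti, li in zip(t, ln_y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b1 = sxy / sxx
    b0 = ln_bar - b1 * t_bar
    return b0, b1

# Hypothetical data: starts near 100 and grows 5% every period
series = [100 * 1.05 ** t for t in range(1, 11)]
b0, b1 = fit_log_linear_trend(series)

# Exponentiate to turn the fitted log value back into a level forecast
forecast_11 = math.exp(b0 + b1 * 11)
```

Here b1 recovers ln(1.05), the constant growth rate in log terms, which is why the log-linear form suits exponential growth.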

3

When would a linear model be appropriate, as opposed to a log-linear model?

When the series grows by an approximately constant amount each period

4

When would a log-linear model be appropriate, as opposed to a linear model?

When the series grows at an approximately constant rate (exponential growth), so that ln(y) is approximately linear in time

5

What are the three requirements for a time series to be covariance stationary?

  • Constant and finite:

    • expected values in all periods

    • variance in all periods

    • covariance with leading or lagged values of the time series in all periods

6

What happens to time series without covariance stationarity?

  • Results are economically invalid

  • regression will lead to spurious results

  • Estimate of b will be biased

  • Hypothesis tests will be invalid

7

What is an autoregressive model?

A model in which the independent variables are lagged (past) values of the dependent variable

8

Within AR, what does it mean for a model to be incomplete?

There is information within the data (for example, serially correlated residuals) that the model is not capturing

9

How do you correct an AR model with significant serial correlation (autocorrelation) in its residuals, i.e. an incomplete model?

  • Increase the number of lags until there is no significant residual autocorrelation

  • Testing for autocorrelation in an AR model:

  • Test each residual autocorrelation with a t-test

The Durbin-Watson test is not valid for AR models (it is otherwise the usual test for serial correlation)
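
A rough sketch of the t-test on residual autocorrelations mentioned above. It assumes the common textbook approximation that the standard error of each autocorrelation is 1/sqrt(T), with T the number of residuals; the function name and the sample residuals are hypothetical.

```python
import math

def residual_autocorr_t_stats(residuals, max_lag):
    """Return (lag, autocorrelation, t-stat) for lags 1..max_lag."""
    T = len(residuals)
    mean = sum(residuals) / T
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    results = []
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
        # Assumed standard error of 1/sqrt(T), so t = r_k * sqrt(T)
        t_stat = r_k * math.sqrt(T)
        results.append((k, r_k, t_stat))
    return results

# Hypothetical residuals that flip sign every period (strong negative autocorrelation)
example_residuals = [1.0, -1.0] * 50
stats = residual_autocorr_t_stats(example_residuals, max_lag=2)
```

A lag whose |t-stat| exceeds the critical t-value signals remaining autocorrelation, which suggests adding that lag to the AR model and re-testing.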

10

When is a time series mean-reverting?

  • It tends to fall when its level is above the mean

  • It tends to rise when its level is below the mean

11

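How does the AR(1) regression formula change to give the mean-reverting level?

x_t = b0 + b1·x_(t-1)

At the mean-reverting level the series is expected to stay where it is (x_t = x_(t-1)), so this becomes:

x_t = b0 / (1 - b1)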
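
A small illustration of the mean-reverting level for an AR(1) model. The coefficient values are hypothetical, and the sketch assumes |b1| < 1 so that a finite mean-reverting level exists.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level b0 / (1 - b1) of x_t = b0 + b1 * x_(t-1)."""
    if abs(b1) >= 1:
        raise ValueError("no finite mean-reverting level when |b1| >= 1")
    return b0 / (1 - b1)

def next_value(b0, b1, x_prev):
    """One-step AR(1) prediction, ignoring the error term."""
    return b0 + b1 * x_prev

b0, b1 = 2.0, 0.5                           # hypothetical coefficients
level = mean_reverting_level(b0, b1)        # 2 / (1 - 0.5)
from_above = next_value(b0, b1, 6.0)        # starts above the level
from_below = next_value(b0, b1, 2.0)        # starts below the level
```

With these numbers the level is 4: a value of 6 is predicted to fall to 5, and a value of 2 is predicted to rise to 3.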

12

Can you have multiple Dependent or Independent variables?

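A regression has one dependent variable but can have multiple independent variables (multiple regression). Independent variables are those that can be manipulated, while the dependent variable is influenced by changes in the independent variables in a given study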

13

What are the 5 assumptions of regression?

  1. Linearity - linear relationship between the dependent and independent variables

  2. Homoskedasticity - Unchanging variance of the error term

  3. Error Independence - Regression errors are uncorrelated across observations

  4. Normality - Residuals are normally distributed

  5. Variable Independence - no exact linear relationships between two or more independent variables

14

What are the 5 corresponding violations of the regression assumptions?

  1. Nonlinearity

  2. Heteroskedasticity

  3. Serial correlation or autocorrelation

  4. non-normality

  5. Multicollinearity

15

What does the error term represent?

The stochastic or random part of the model, capturing any unexplained variation in the dependent variable due to randomness, measurement error, or unobserved factors

16

What do the independent variables represent?

The deterministic part of the model, quantifying the observed relationship between the independent variables and the dependent variable

17

What is the coefficient of determination and what does it do?

Also known as R-squared, it measures the goodness of fit of an estimated regression to the data. It can also be defined as the proportion of the variation of the dependent variable that is explained by the independent variables.

18

Quick formula for R-Squared:

R² = SSR / SST, the regression sum of squares over the total sum of squares (think alphabet)
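
A short sketch of the sums of squares behind this ratio. The data are hypothetical, and the fitted values are assumed to come from an OLS regression with an intercept, which is what makes SSR + SSE equal SST.

```python
def sums_of_squares(actual, fitted):
    """Return (SST, SSR, SSE) for actual values and their fitted values."""
    y_bar = sum(actual) / len(actual)
    sst = sum((y - y_bar) ** 2 for y in actual)                # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)                # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))    # unexplained variation
    return sst, ssr, sse

# Hypothetical data: y observed at x = 1, 2, 3, 4, with OLS fit y_hat = 0.5 + 1.6x
actual = [2.0, 4.0, 5.0, 7.0]
fitted = [2.1, 3.7, 5.3, 6.9]

sst, ssr, sse = sums_of_squares(actual, fitted)
r_squared = ssr / sst
```

For an OLS fit the same number also comes out of 1 - SSE/SST.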

19

Do you want more or fewer variables for multiple linear regression?

Usually you want fewer, to avoid overfitting (less is more). As you add more and more independent variables, R-squared will never decrease and will usually increase, even if the added variables explain little

20

Why is adjusted R² a little bit better than R²?

It adjusts for the number of independent variables, so it does not automatically go up when more independent variables are added

21

How do you determine what effect the addition of a new variable will have on adjusted R-squared?

If the new coefficient's |t-stat| > 1.0, then adjusted R² will go up

If the new coefficient's |t-stat| < 1.0, then adjusted R² will go down
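
A quick numeric sketch of adjusted R², using the standard formula 1 - [(n - 1) / (n - k - 1)] × (1 - R²), where n is the number of observations and k the number of independent variables. The R² values below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Hypothetical: a 4th variable lifts R-squared only slightly, from 0.800 to 0.805
adj_3_vars = adjusted_r_squared(0.800, n=30, k=3)
adj_4_vars = adjusted_r_squared(0.805, n=30, k=4)

# The penalty for the extra variable outweighs the small gain in fit
new_variable_helped = adj_4_vars > adj_3_vars
```

In this example R² rises but adjusted R² falls, which is the situation the |t-stat| < 1.0 rule describes.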

22

What does a lower Akaike’s information criterion (AIC) indicate?

A lower AIC indicates a better-fitting model (you want it to be as low as possible)

23

What does a lower Bayesian information criterion (BIC) indicate?

A lower BIC indicates a better-fitting model

24

When do we prefer AIC to BIC or vice versa?

We prefer AIC when the model is used for prediction; we prefer BIC if all we are interested in is the best goodness of fit
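
A sketch of how the two criteria can be computed and compared. It assumes the sum-of-squared-errors form often used in regression texts, AIC = n·ln(SSE/n) + 2(k + 1) and BIC = n·ln(SSE/n) + ln(n)(k + 1); other sources define them through the likelihood. The model figures are hypothetical.

```python
import math

def aic(n, k, sse):
    """AIC in the assumed SSE form: n*ln(SSE/n) + 2*(k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n, k, sse):
    """BIC in the assumed SSE form: n*ln(SSE/n) + ln(n)*(k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical models fitted to the same 100 observations
aic_a, bic_a = aic(100, 3, 50.0), bic(100, 3, 50.0)   # 3 variables
aic_b, bic_b = aic(100, 4, 49.5), bic(100, 4, 49.5)   # 4 variables, slightly lower SSE

prefer_a_by_aic = aic_a < aic_b
prefer_a_by_bic = bic_a < bic_b
```

Because ln(n) exceeds 2 once n is 8 or more, BIC penalises each extra variable more heavily than AIC does.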

25

In terms of adjusted R², which model would we want to use?

If only R² and adjusted R² are given, you ideally want the model with the highest adjusted R². If AIC and BIC values are also given, the preferred model could change

26

What is the coefficient?

The slope of an independent variable: the expected change in the dependent variable for a 1-unit change in that independent variable, holding all other variables constant (this is really key to remember)

27

What does a coefficient of 0 mean?

The independent variable has no significance in explaining the dependent variable and can probably be excluded from the regression

28

What are the degrees of freedom?

For multiple regression: number of data points - number of estimated regression coefficients

n - (k + 1), where k is the number of independent variables

29

What is the really key thing to remember about the coefficient of independent variables?

This is based on the change that would occur for a 1 unit change HOLDING ALL OTHER VARIABLES CONSTANT

30

How do we interpret the hypothesis test and rejecting/ not rejecting the null hypothesis?

If the absolute value of the calculated t-statistic is greater than the critical t-value, we can reject the null hypothesis. If it is less than the critical t-value, we cannot reject the null hypothesis

31

What is an unrestricted model?

A model that includes ALL the variables in the initial specification

32

What is a restricted (or nested) model?

It restricts the slope coefficient to 0 for one or more independent variables, so not all of them are used. It is nested in the unrestricted model

33

What is the rejection criterion for the F-test in a joint test of slope coefficients?

Reject the null hypothesis when the calculated F-statistic exceeds the critical F-value for the selected significance level
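
A sketch of the joint F-test that compares a restricted model with the unrestricted one. It assumes the usual form F = [(SSE_restricted - SSE_unrestricted) / q] / [SSE_unrestricted / (n - k - 1)], where q is the number of restrictions and k the number of independent variables in the unrestricted model. The inputs are hypothetical, and the critical value (about 3.18 for 2 and 50 degrees of freedom at 5%) is read from an F-table rather than computed.

```python
def joint_f_statistic(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for testing q slope restrictions jointly."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical: dropping 2 of 3 variables raises SSE from 100 to 120 (n = 54)
f_stat = joint_f_statistic(sse_restricted=120.0, sse_unrestricted=100.0,
                           q=2, n=54, k=3)

F_CRITICAL = 3.18   # assumed table value for (2, 50) df at the 5% level
reject_null = f_stat > F_CRITICAL
```

Rejecting the null here means at least one of the dropped variables has a slope different from zero, so the restricted model is too restrictive.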

34

What is model error?

Error between a predicted value and the actual value for a dependent variable within the data set

35

What is sampling error?

errors created by forecasting independent variables for use in forecasting a dependent variable

36

What is a logistic regression (logit) model?

It models the natural logarithm of the odds, ln(p / (1 - p)), as a function of the independent variables, which confines the implied probability p to a range between 0 and 1
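
A minimal illustration of the log-odds transformation and its inverse. The function names are illustrative; the point is that any real-valued score maps back to a probability strictly between 0 and 1.

```python
import math

def logit(p):
    """Log odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1 - p))

def logistic(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

even_odds = logit(0.5)                 # odds of 1 give log odds of 0
round_trip = logistic(logit(0.8))      # recovers the original probability
low, high = logistic(-10), logistic(10)
```

However extreme the score, the implied probability stays inside the 0 to 1 range, which a plain linear regression on a 0/1 outcome cannot guarantee.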

37

When should a logistic regression (logit) model be used?

When the dependent variable is discrete (e.g. a binary 0/1 outcome) rather than continuous

38

What is the stochastic part of a model?

The error term

39

What is the next step after estimating the regression model?

Analyse scatterplots of variables and residuals

40

What is the next step after analysing the scatterplots of variables and residuals?

Seeing if the regression assumptions are satisfied

41

What is the next step after seeing if the regression assumptions are satisfied (and they are)?

Checking if the goodness of fit is satisfactory/ significant

42

What is the next step after seeing if the regression assumptions are satisfied (and they are not)?

Adjust the model

43

What is the next step after checking if the goodness of fit is satisfactory/ significant? (and it is)

Test the model with out-of-sample data

44

What is the next step after checking if the goodness of fit is satisfactory/ significant? (and it is not)

Adjust the model

45

In terms of interpreting (scatterplot) relationships between independent variables, do we want to have little or no correlation, negative correlation, or positive correlation?

We want to have little or no correlation because it suggests low multicollinearity of those variables, which is a desirable characteristic. This tells us that each variable provides unique information, which leads to more stable and reliable coefficient estimates, simplifies model interpretation, and enhances performance by avoiding redundancy among predictors

46

Based on p-values, when would it be correct to reject the null hypothesis?

Reject the null if the p-value for the independent variable is less than the level of significance. Do not reject the null if the p-value is greater than the level of significance

47

How many dummy variables should be used to incorporate qualitative independent variables into a regression model?

n - 1 dummy variables, where n is the number of categories
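
A small sketch of the n - 1 rule, using quarters as a hypothetical qualitative variable. The first category is treated as the base case and gets no dummy of its own; the function name is illustrative.

```python
def dummy_encode(values, categories):
    """Encode each value as n - 1 zero/one dummies; categories[0] is the base case."""
    non_base = categories[1:]
    return [[1 if v == c else 0 for c in non_base] for v in values]

quarters = ["Q1", "Q2", "Q3", "Q4"]          # n = 4 categories
observations = ["Q1", "Q3", "Q4", "Q2"]      # hypothetical data

encoded = dummy_encode(observations, quarters)
dummies_per_observation = len(encoded[0])    # n - 1 = 3
```

The base quarter is represented by all dummies being zero. Adding a fourth dummy would make the dummies sum to one in every row, an exact linear relationship with the intercept.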

48

If we had a concern that a model might have an artificially large R² and t-statistics that are understated, what regression assumption is likely violated?

Multicollinearity, which violates the assumption that there is no exact linear relationship among the independent variables. Standard errors for each coefficient become inflated, which results in understated t-statistics, which in turn leads to coefficients being incorrectly classified as not statistically significant. The model would also have inflated R² and F-statistic values, and seem to be a better fit than it actually is

49

When does multicollinearity occur?

when at least two independent variables are highly correlated

50

What does the standard error of the forecast do?

It quantifies the uncertainty around a prediction; it does NOT improve the forecast of the dependent variable

51

What is model specification?

The set of variables included in the regression and the regression equation's functional form

52

What does it mean to have a sound economic basis for your model?

Economic reasoning behind the choice of variables and their interactions

53

What does parsimony mean?

Less is more: each variable plays an essential role, and no additional variables are included that add only spurious accuracy

54

What does good in-sample but bad out-of-sample performance mean?

This would be an example of overfitting: an overfit model explains the data used to fit it, but may not work well with data outside that set

55

What does appropriate functional form mean?

A model should incorporate non-linear forms, if appropriate

56

What is Homoskedasticity?

(The ONE you want) Constant variance of the error term across observations; one of the assumptions for valid regression

57

What is Heteroskedasticity?

(Not the one you want) Nonconstant variance of the error term across observations; violates the regression assumptions

58

What are the types of heteroskedasticity?

Unconditional (the error variance is not related to the independent variables; not a problem in linear regression) and Conditional (the size of the error terms is related to the value of the independent variables; a problem in linear regression)

59

How do you detect Heteroskedasticity?

Breusch-Pagan (BP) test - one-tail chi-square test
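
A sketch of the Breusch-Pagan test statistic in its common textbook form: regress the squared residuals on the independent variables and compute n × R² of that auxiliary regression, which is compared with a chi-square critical value with k degrees of freedom (k = number of independent variables). The inputs are hypothetical; 5.991 is the standard 5% chi-square critical value for 2 degrees of freedom.

```python
def breusch_pagan_statistic(n, r2_residual_regression):
    """BP statistic: n times the R-squared of squared residuals regressed on the X variables."""
    return n * r2_residual_regression

# Hypothetical: 60 observations, 2 independent variables, auxiliary R-squared of 0.15
bp_stat = breusch_pagan_statistic(n=60, r2_residual_regression=0.15)

CHI2_CRITICAL_5PCT_2DF = 5.991   # table value, 2 degrees of freedom
# One-tailed test: only a large statistic is evidence against homoskedasticity
reject_homoskedasticity = bp_stat > CHI2_CRITICAL_5PCT_2DF
```

Rejection here points to conditional heteroskedasticity, since the squared residuals are being explained by the independent variables.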

60

What is positive serial correlation?

Residuals tend to go in groups of the same sign (a positive residual is likely to be followed by another positive one), which violates the assumptions

61

What is negative serial correlation?

Residuals tend to bounce back and forth between positive and negative, which violates the assumptions

62

Are coefficient estimates largely affected or unaffected for positive/ negative serial correlation?

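Largely unaffected in both cases: the coefficient estimates remain consistent, and it is the standard errors that are distorted. The exception is when an independent variable is a lagged value of the dependent variable, in which case the coefficient estimates are also invalid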

63

In terms of serial correlation, what does an F-stat that is too large indicate? Too small?

Too large indicates positive serial correlation; too small indicates negative serial correlation

64

Are standard errors too high or too low for positive/ negative serial correlation?

too low for positive, too high for negative

65

Are there more Type I or Type II errors for positive/ negative serial correlation?

More Type I in positive, More Type II in negative

66

Is false significance/ false insignificance associated with positive/ negative serial correlation?

False insignificance is associated with negative serial correlation, false significance with positive serial correlation

67

How to test for serial correlation?

Durbin-Watson test and Breusch-Godfrey test
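
A minimal sketch of the Durbin-Watson statistic, DW = Σ(e_t - e_(t-1))² / Σ e_t², computed on two hypothetical residual series. Values near 2 suggest no first-order serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a list of regression residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that come in groups of the same sign
grouped = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that bounce back and forth
bouncing = [1.0, -1.0] * 50

dw_grouped = durbin_watson(grouped)      # well below 2
dw_bouncing = durbin_watson(bouncing)    # well above 2
```

The formal test compares the statistic with tabulated lower and upper bounds, which this sketch does not reproduce.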

68

Can serial correlation be eliminated?

No, serial correlation cannot be eliminated; instead, the standard errors are adjusted to account for it (serial-correlation-consistent, or robust, standard errors)

69

What is multicollinearity?

Two or more independent variables are highly correlated with each other

70

What are the effects of multicollinearity?

  • Model estimates of the dependent variable are unaffected

  • Standard errors of coefficients are too large, so t-stats are too small

71

How do you detect multicollinearity?

Visually, it will look absolutely fine on the scatter plot, so a plot alone will not reveal it.

However, a high R-squared and a significant F-stat combined with insignificant (very low) t-stats for all slope coefficients is evidence of multicollinearity

Multicollinearity may also exist even when the F-stat is insignificant or the t-statistics are significant

72

What is the Variance Inflation Factor (VIF)?

A VIF exists for each independent variable in a multiple regression. Each independent variable j is regressed against the other independent variables, and the R² of that regression (R_j²) gives:

VIF_j = 1 / (1 - R_j²)

VIF > 5 warrants further investigation of the given independent variable

VIF > 10 indicates serious multicollinearity requiring correction
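
A short sketch of the VIF calculation and the two rule-of-thumb thresholds from this card. The R² inputs are hypothetical values from regressing one independent variable on the others; the function names and labels are illustrative.

```python
def variance_inflation_factor(r2_j):
    """VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other X variables."""
    if not 0 <= r2_j < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r2_j)

def vif_flag(vif):
    """Apply the card's rules of thumb: > 5 investigate, > 10 serious."""
    if vif > 10:
        return "serious multicollinearity"
    if vif > 5:
        return "investigate further"
    return "no flag"

# Hypothetical auxiliary R-squared values for three independent variables
flags = [vif_flag(variance_inflation_factor(r2)) for r2 in (0.50, 0.85, 0.95)]
```

An R_j² of 0.50 gives a VIF of 2, while 0.95 gives a VIF of about 20, so the VIF climbs quickly as one variable becomes predictable from the others.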

73

How do you correct for multicollinearity?

  • Exclude one or more independent variables from the model until multicollinearity is no longer present, or

  • Use a different proxy for one of the variables

  • Increase the sample size

74

Where does serial correlation typically occur?

time-series data sets

75

What does the Breusch-Godfrey test check for?

Checks the regression for serial correlation

76

What does Variance Inflation Factor (VIF) test for?

Multicollinearity

77

What does the Breusch-Pagan test for?

Tests for conditional heteroskedasticity

78

What is a potential consequence of omitted variables?

Heteroskedasticity or serial correlation

79

What is a potential consequence of inappropriate variable form?

Heteroskedasticity

80

What is a potential consequence of inappropriate variable scaling?

Heteroskedasticity or Multicollinearity

81

What is a potential consequence of inappropriate data pooling?

heteroskedasticity or serial correlation

82

Can patterns in serially correlated residuals contain information that has the potential to be exploited?

yes

83

Does conditional or unconditional heteroskedasticity cause errors in statistical inference?

Conditional

84

What does good out-of-sample performance mean?

Model generalises well (low risk of overfitting or underfitting)

85

What are the examples of potentially influential data points? (not violations of assumptions)

  • High-leverage points

  • Outliers

  • Influential Observations

86

What is a high-leverage point?

An extreme value of an independent variable

87

What is an outlier?

An extreme value of a dependent variable

88

What is an influential observation?

An observation whose inclusion may significantly alter regression results

89

What is the Measure of Leverage?

Leverage (h) measures the distance between the i-th observation's value of an independent variable and the mean value of that variable across all n observations:

0 < h < 1

90

How do you look for high-leverage points?

Measure of Leverage

91

What does a high measure of leverage mean? A low one?

The higher the leverage, the more distant the observation is from the mean of the variable; the lower the leverage, the closer it is to the mean

92

How do we determine if a point has a high measure of leverage?

h > 3((k + 1) / n)
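
A sketch of the leverage check for the simple-regression case (k = 1), assuming the standard formula h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². With several independent variables the leverage comes from the regression's hat matrix instead, which this sketch does not cover. The data are hypothetical.

```python
def leverages(x):
    """Leverage of each observation in a simple regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

def high_leverage_flags(x, k=1):
    """Flag observations whose leverage exceeds 3 * (k + 1) / n."""
    cutoff = 3 * (k + 1) / len(x)
    return [h > cutoff for h in leverages(x)]

# Hypothetical independent variable with one extreme value
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
h_values = leverages(x_values)
flags = high_leverage_flags(x_values)    # cutoff is 3 * 2 / 10 = 0.6
```

Only the observation at 50 is flagged, because it sits far from the mean of 9.5 while the others are clustered together.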

93

In what scenario may multicollinearity not be a major issue?

If the goal of the analysis is to predict the dependent variable, rather than to understand the roles of the independent variables

94

What is my story prompt for remembering the regression process?

Eager Captains ESpecially Study Sailors Guarding The Buried Past

95

What is a studentized residual?

The quotient of a residual divided by an estimate of its standard deviation; a form of Student's t-statistic, with the estimate of error varying between points

t = e / s, where e is the residual and s is the estimate of its standard deviation

96

What is Cook’s distance?

A measure of how much the estimated values of the regression change if observation i is deleted from the sample

97

What does it say about the observation if Cook’s Distance (D) is > 0.5?

May be influential and merits further investigation

98

What does it say about the observation if Cook’s Distance (D) is > 1.0?

Highly likely to be an influential data point

99

What does it say about the observation if Cook’s Distance (D) is > 2 × (k/n)^0.5?

Highly likely to be an influential data point (k = number of independent variables, n = number of observations)
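
A sketch that applies the three Cook's distance rules of thumb from these cards together. It reads the last threshold as 2 × sqrt(k/n), with k the number of independent variables and n the number of observations, which is an assumption about the card's notation. The function name and labels are illustrative.

```python
import math

def cooks_distance_flag(d, k, n):
    """Classify an observation from its Cook's distance d."""
    if d > 1.0 or d > 2 * math.sqrt(k / n):
        return "highly likely influential"
    if d > 0.5:
        return "may be influential"
    return "no flag"

# Hypothetical cases with k = 3 independent variables
large_sample = cooks_distance_flag(0.4, k=3, n=100)   # cutoff 2*sqrt(0.03) is about 0.35
small_value = cooks_distance_flag(0.2, k=3, n=100)
small_sample = cooks_distance_flag(0.6, k=3, n=20)    # cutoff 2*sqrt(0.15) is about 0.77
very_large = cooks_distance_flag(1.2, k=3, n=20)
```

The sample-size rule becomes the binding one in large samples, where its cutoff can drop below 0.5.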

100

Does the measure of Leverage apply to Dependent or Independent variables?

Independent