Lvl2: Quantitative Methods

Last updated 12:46 AM on 7/3/26

60 Terms

1

Dependent variable is continuous vs discrete

Continuous: the traditional regression model

Discrete: logistic regression

2

Regression Process

  1. Analyze the residuals

  2. Examine the goodness of fit - significance of fit

3

Assumptions in a multiple regression

  1. Linearity - the relationship between the dependent variable and the independent variables is linear

  2. Homoskedasticity - the variance of the regression residuals is the same for all observations

  3. Independence of errors - observations are independent; regression residuals are uncorrelated

  4. Normality - regression residuals are normally distributed

  5. Independence of independent variables - independent variables are not random, and there is no exact linear relation between two or more of them

4

normal Q-Q plot

visualize the distribution of a variable (regression residual) by comparing it to a normal distribution

5

coefficient of determination (R-squared)

ratio of the variation of the dependent variable explained by the independent variables (sum of squares regression) to the total variation of the dependent variable (sum of squares total)
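As a rough illustration, the sketch below fits a one-variable OLS line and computes R² as sum of squares regression divided by sum of squares total. The function names and the five data points are made up for the example; it is plain Python with no libraries.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the OLS line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def r_squared(y, y_hat):
    """R^2 = SSR / SST (valid for an OLS fit that includes an intercept)."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)      # sum of squares total
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)  # sum of squares regression
    return ssr / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
print(round(r_squared(y, y_hat), 4))  # 0.6
```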

6

Disadvantages of R-squared

  1. cannot provide information on whether the coefficients are statistically significant

  2. cannot indicate whether there are biases in the estimated coefficients and predictions

  3. cannot tell whether the model fit is good - a bad model may have a high R² due to overfitting and biases in the model

7

overfitting

model is too complex - too many independent variables relative to the number of observations in the sample

8

adjusted R-squared

does not automatically increase when another independent variable is added to a regression

R² is greater than or equal to adjusted R²

adjusted R² may be negative, whereas R² has a minimum of zero
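A minimal sketch of the usual adjusted R² formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with n observations and k independent variables. The inputs are illustrative only; the second call shows how a weak fit with several regressors can push adjusted R² below zero.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r_squared(0.60, n=5, k=1), 4))   # 0.4667
print(round(adjusted_r_squared(0.10, n=10, k=3), 4))  # -0.35
```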

9

Akaike’s information criterion (AIC)

lower AIC indicates a better-fitting model

10

Schwarz’s Bayesian information criterion (BIC)

BIC assesses a greater penalty than AIC for having more parameters in a model

11

AIC vs BIC

AIC is preferred if the model is used for prediction purposes

BIC is preferred when the best goodness of fit is desired
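For an OLS model, AIC and BIC are often written as AIC = n ln(SSE/n) + 2(k + 1) and BIC = n ln(SSE/n) + ln(n)(k + 1). The sketch below assumes those forms; because ln(n) > 2 once n ≥ 8, BIC's per-parameter penalty is larger than AIC's in practice. The SSE values and model sizes are hypothetical.

```python
import math


def aic(sse, n, k):
    """AIC for an OLS model with k independent variables (k + 1 parameters)."""
    return n * math.log(sse / n) + 2 * (k + 1)


def bic(sse, n, k):
    """BIC: the per-parameter penalty is ln(n) instead of 2."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)


# Hypothetical models fitted to the same n = 100 observations.
n = 100
print("small (k=2):", round(aic(52.0, n, 2), 2), round(bic(52.0, n, 2), 2))
print("large (k=5):", round(aic(46.0, n, 5), 2), round(bic(46.0, n, 5), 2))
```

With these made-up SSEs the larger model has the lower AIC, while the smaller model has the lower BIC, because a penalty of ln(100) ≈ 4.6 per parameter outweighs the drop in SSE.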

12

Test whether a variable is significant in explaining the dependent variable’s variation

H0: bj = 0 and Ha: bj ≠ 0

13

F-distributed test statistic

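F = [(SSEr − SSEu)/q] / [SSEu/(n − k − 1)]

q is the number of restrictions; SSEr and SSEu are the sums of squared errors of the restricted and unrestricted models, and k is the number of independent variables in the unrestricted model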
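A minimal sketch of the joint F-statistic comparing a restricted and an unrestricted model, assuming the standard form F = [(SSEr − SSEu)/q] / [SSEu/(n − k − 1)], where k counts the independent variables in the unrestricted model. The numbers are hypothetical; the critical value would come from an F-table with q and n − k − 1 degrees of freedom.

```python
def joint_f_statistic(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for q restrictions; k = independent variables in the unrestricted model."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


f_stat = joint_f_statistic(sse_restricted=120.0, sse_unrestricted=100.0, q=2, n=53, k=4)
print(round(f_stat, 2))  # 4.8
```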

14

How to test for significance

  1. Define hypothesis

  2. Find critical value

  3. Reject the null if calculated statistic exceeds critical value

  4. If the calculated statistic does not exceed the critical value, fail to reject the null; this does not prove that the null is true
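The steps above, sketched for a two-sided t-test of H0: bj = 0. The critical value is passed in rather than computed (the Python standard library has no t-distribution), so the value used here, about 2.00 for roughly 60 degrees of freedom at the 5% level, is an assumption of the example, as are the coefficient and standard error.

```python
def t_test(b_hat, std_error, t_critical, b_null=0.0):
    """Two-sided t-test of H0: b = b_null. Returns (t_stat, reject_null)."""
    t_stat = (b_hat - b_null) / std_error      # test statistic under H0
    reject_null = abs(t_stat) > t_critical     # compare with the critical value
    return t_stat, reject_null                 # False means "fail to reject", not "H0 is true"


t_stat, reject = t_test(b_hat=0.45, std_error=0.15, t_critical=2.00)
print(round(t_stat, 2), reject)  # 3.0 True
```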

15

general linear F-test

test the null hypothesis that slope coefficients on all variables are equal to zero

16

Omitted variable bias

  • omission of an important independent variable

  • If the omitted variable is uncorrelated with X1, the coefficient for X1 will still be estimated correctly

17

Misspecified Regression

18

Unconditional heteroskedasticity

error variance is not correlated with the regression’s independent variables - no major problems for statistical inference

19

Conditional heteroskedasticity

error variance is correlated with (conditional on) the values of the independent variables

  • standard errors are underestimated, so t-statistics will be inflated

  • tend to find significant relationships where none actually exist

  • more Type I errors (rejecting the null hypothesis when it is actually true)

20

Breusch-Pagan (BP) test

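test for conditional heteroskedasticity: regress the squared residuals on the independent variables; BP = n × R² of that regression, chi-square distributed with k degrees of freedom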
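One way to sketch the BP test, assuming a single independent variable to keep the arithmetic short: regress the squared residuals on x, then BP = n × R² of that auxiliary regression, which is chi-square distributed with k (here 1) degrees of freedom under the null of no conditional heteroskedasticity. The residuals are invented so that their size grows with x; 3.84 is the 5% chi-square critical value for 1 degree of freedom.

```python
def breusch_pagan_one_regressor(x, residuals):
    """BP statistic n * R^2 from regressing squared residuals on one regressor x."""
    n = len(x)
    e2 = [e ** 2 for e in residuals]
    x_bar, e2_bar = sum(x) / n, sum(e2) / n
    s_xy = sum((xi - x_bar) * (ei - e2_bar) for xi, ei in zip(x, e2))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((ei - e2_bar) ** 2 for ei in e2)
    r2_aux = s_xy ** 2 / (s_xx * s_yy)  # R^2 of a one-regressor OLS fit
    return n * r2_aux


x = [1, 2, 3, 4, 5]
residuals = [1.0, -1.0, 2.0, -2.0, 3.0]
bp = breusch_pagan_one_regressor(x, residuals)
print(round(bp, 3), bp > 3.84)  # 4.217 True
```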

21

heteroskedasticity-consistent standard errors

robust standard errors

  • adjust the standard errors of the regression’s estimated coefficients to account for the heteroskedasticity

22

serial correlation or autocorrelation

regression errors are correlated across observations

  • incorrect estimate of the regression coefficients’ standard errors

  • if none of the regressors is a lagged value of the dependent variable, the coefficient estimates remain consistent and only the standard errors need adjusting

  • more Type I errors

23

positive vs negative serial correlation

Positive: a positive residual for one observation increases the chance of a positive residual in the subsequent observation

Negative: a positive residual for one observation increases the chance of a negative residual in the subsequent observation

24

Durbin-Watson (DW) test

  • measure of autocorrelation

  • compares the squared differences of successive residuals with the sum of the squared residuals

  • applies only to testing for first-order serial correlation

  • ranges from 0 to 4 (≈ 2: no autocorrelation; < 2: positive; > 2: negative)
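A direct translation of the DW statistic described above: the sum of squared differences of successive residuals divided by the sum of squared residuals. The two residual series are made up to show values above and below 2.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1, -1, 1, -1]))                   # 3.0 (> 2: negative)
print(round(durbin_watson([1, 1, 1, -1, -1, -1]), 3))  # 0.667 (< 2: positive)
```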

25

Breusch-Godfrey (BG) test

  • can detect autocorrelation up to a pre-designated order p, where the error in period t is correlated with the error in period t − p

  • F-distributed with p and n − p − k − 1 degrees of freedom, where p is the number of lags

26

serial-correlation consistent standard errors

adjust the coefficient standard errors to account for the serial correlation

27

multicollinearity

when two or more independent variables are highly correlated or when there is an approximate linear relationship among independent variables

  • difficult or impossible to distinguish the individual impacts of the independent variables

  • diminished t-statistics, so t-tests of coefficients have little power (ability to reject the null hypothesis)

28

Variance inflation factor (VIF)

  • VIFj > 5 warrants further investigation of the given independent variable

  • VIFj >10 indicates serious multicollinearity requiring correction
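VIFj is usually computed as 1 / (1 − Rj²), where Rj² comes from regressing independent variable j on the other independent variables. The sketch takes Rj² as given and applies the 5 and 10 thresholds from the card; the Rj² values are illustrative.

```python
def vif(r2_j):
    """Variance inflation factor from the R^2 of regressing X_j on the other regressors."""
    return 1.0 / (1.0 - r2_j)


def vif_flag(v):
    """Classify a VIF using the > 5 and > 10 rules of thumb."""
    if v > 10:
        return "serious multicollinearity"
    if v > 5:
        return "investigate"
    return "ok"


for r2_j in (0.50, 0.85, 0.95):
    v = vif(r2_j)
    print(r2_j, round(v, 2), vif_flag(v))
```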

29

correct for multicollinearity

  • excluding one or more of the regression variables

  • using a different proxy for one of the variables

  • increasing the sample size

30

influential observation

an observation whose inclusion may significantly alter regression results

31

high-leverage point

data point having an extreme value of an independent variable

32

outlier data point

data point having an extreme value of the dependent variable

33

leverage (hii) - detecting high-leverage point

  • distance between the value of the ith observation of that independent variable and the mean value of that variable across all n observations

  • value between 0 and 1

  • if an observation’s leverage exceeds 3(k + 1)/n, it is a potentially influential observation
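For a regression with a single independent variable, leverage has the closed form hii = 1/n + (xi − x̄)² / Σ(xj − x̄)². The sketch uses that formula and flags observations above 3(k + 1)/n, a commonly cited rule of thumb; both the one-regressor restriction and the data are assumptions of the example.

```python
def leverages_one_regressor(x):
    """h_ii for OLS with an intercept and one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # last value is extreme
k = 1
h = leverages_one_regressor(x)
threshold = 3 * (k + 1) / len(x)
flagged = [i for i, hi in enumerate(h) if hi > threshold]
print(round(h[-1], 3), threshold, flagged)  # 0.954 0.6 [9]
```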

34

studentized residuals - detecting outliers

  • compared to the critical value of the t-distributed statistic with (n − k − 2) degrees of freedom

  • |ti*| > 3 - outlier

  • |ti*| > critical value of t-statistic - potentially influential

35

dummy variable

  • takes on a value of 1 if a particular condition is true and 0 if that condition is false

  • to distinguish among n categories, we need n − 1 dummy variables - the category not assigned becomes the “base” or “control” group

36

Logistic regression (logit)

The natural logarithm (ln) of the odds of an event happening (the log odds, or logit) is modeled as a linear function of the independent variables
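Logistic regression models ln(p / (1 − p)) as b0 + b1X1 + ...; inverting that gives p = 1 / (1 + e^−(b0 + b1X1 + ...)). The sketch uses one independent variable, and the coefficients are hypothetical, not estimates from any data.

```python
import math


def log_odds(p):
    """Logit: natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


def probability(b0, b1, x):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


p = probability(b0=-1.0, b1=0.5, x=4.0)  # log odds = -1 + 2 = 1
print(round(p, 4), round(log_odds(p), 4))  # 0.7311 1.0
```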

37

maximum likelihood estimation (MLE) method

  • estimates logistic regression coefficients

  • model fit is assessed with a chi-square-distributed test statistic (likelihood ratio test)

38

likelihood ratio (LR) test

  • to assess the fit of logistic regression models

  • LR = −2 × (Log-likelihood restricted model − Log-likelihood unrestricted model)

  • chi-squared with q degrees of freedom

  • log-likelihood metric is always negative, so higher values (closer to 0) indicate a better-fitting model
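The LR statistic from the card, computed from two log-likelihoods. The log-likelihood values and q = 2 are hypothetical; 5.991 is the 5% chi-square critical value for 2 degrees of freedom.

```python
def likelihood_ratio(ll_restricted, ll_unrestricted):
    """LR = -2 * (LL_restricted - LL_unrestricted); non-negative for nested models."""
    return -2 * (ll_restricted - ll_unrestricted)


lr = likelihood_ratio(ll_restricted=-120.5, ll_unrestricted=-112.3)
print(round(lr, 2), lr > 5.991)  # 16.4 True
```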

39

Problems with a time series

  • when the independent variable is a lagged value of the dependent variable, serial correlation in the error term causes estimates of the intercept (b0) and slope coefficient (b1) to be inconsistent

  • The mean or variance of the time series changes over time

40

log-linear model

ln yt = b0 + b1(t) + εt, t = 1, 2, ..., T

41

Covariance-Stationary

properties, such as mean and variance, do not change over time

  • the expected value of the time series must be constant and finite in all periods

  • variance of the time series must be constant and finite in all periods

  • covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods

42

standard error of the residual correlation

standard error = 1/√T, where T is the number of observations; t-statistic = residual autocorrelation / (1/√T)
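A sketch of the residual autocorrelation at lag k and its t-statistic, assuming the approximate standard error 1/√T for T observations. The residual series and the autocorrelation of 0.25 with T = 64 are illustrative only.

```python
import math


def residual_autocorrelation(e, lag):
    """Sample autocorrelation of residuals e at the given lag."""
    mean = sum(e) / len(e)
    num = sum((e[t] - mean) * (e[t - lag] - mean) for t in range(lag, len(e)))
    den = sum((v - mean) ** 2 for v in e)
    return num / den


def autocorrelation_t_stat(rho, T):
    """t = rho / (1 / sqrt(T))."""
    return rho / (1 / math.sqrt(T))


print(residual_autocorrelation([1, -1, 1, -1], lag=1))  # -0.75
print(autocorrelation_t_stat(0.25, T=64))               # 2.0
```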
43

mean-reverting level

xt = b0/(1 − b1); the series tends to fall when above this level and rise when below it
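For an AR(1) model xt = b0 + b1xt−1 + εt, the mean-reverting level is b0/(1 − b1). The sketch computes it and shows a one-step forecast moving toward that level; the coefficients are hypothetical, and b1 = 1 is rejected because the level is undefined for a unit root.

```python
def mean_reverting_level(b0, b1):
    """Level b0 / (1 - b1) of an AR(1) model; undefined for a unit root."""
    if b1 == 1:
        raise ValueError("undefined when b1 = 1 (unit root)")
    return b0 / (1 - b1)


def ar1_forecast(b0, b1, x_prev):
    """One-step-ahead AR(1) forecast b0 + b1 * x_{t-1}."""
    return b0 + b1 * x_prev


b0, b1 = 1.2, 0.4
level = mean_reverting_level(b0, b1)
print(round(level, 4), ar1_forecast(b0, b1, 5.0))  # 2.0 3.2
```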
44

root mean squared error (RMSE)

  • compare the out-of-sample forecasting performance

  • square root of the average squared error

  • smallest RMSE is judged the most accurate
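RMSE as described above, used to compare two forecasting models on the same out-of-sample actuals. All numbers are made up.

```python
import math


def rmse(actual, forecast):
    """Square root of the average squared forecast error."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 1.9, 3.2, 3.8]
model_b = [1.5, 2.5, 2.5, 4.5]
scores = {"A": rmse(actual, model_a), "B": rmse(actual, model_b)}
best = min(scores, key=scores.get)  # smallest RMSE wins
print(best)  # A
```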

45

random walk

  • value of the series in one period is the value of the series in the previous period plus an unpredictable random error

  • error term, εt, has constant variance and is uncorrelated with the error term in previous periods

  • b0 = 0 and b1 = 1

  • the expected value of εt is zero

  • best forecast of xt that can be made in period t − 1 is xt−1

  • currency exchange rates

  • undefined mean-reverting level

  • for any period t, the variance of xt = (t − 1)σ²

  • not a covariance-stationary time series, because a covariance-stationary time series must have a finite variance

46

first-differencing

  • subtracts the value of the time series in the first prior period from the current value of the time series

  • mean-reverting level of the first-differenced model is b0/(1 − b1) = 0/1 = 0

  • variance of yt in each period is var(εt) = σ²

  • variance and the mean of yt are constant and finite in each period, yt is a covariance-stationary time series
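First-differencing as a one-liner, yt = xt − xt−1. The input series is arbitrary; the output has one fewer observation than the input.

```python
def first_difference(x):
    """Return y_t = x_t - x_{t-1} for t = 1, ..., len(x) - 1."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


print(first_difference([10, 12, 11, 15]))  # [2, -1, 4]
```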

47

random walk with drift

  • random walk with drift has b0 ≠ 0, compared to a simple random walk, which has b0 = 0

48

unit root

  • lag coefficient is equal to 1.0

  • all random walks, with or without a drift term, have unit roots

  • not covariance stationary

49

Dickey and Fuller test

  • unit root test

  • xt − xt−1 = b0 + (b1 − 1)xt−1 + εt = b0 + g1xt−1 + εt, where g1 = b1 − 1

  • a test of g1 = 0 is a test of b1 = 1

  • H0: g1 = 0; Ha: g1 < 0

50

n-period moving average

  • to remove short-term fluctuations or noise by smoothing out the time series of sales

  • moving average of the current and past n − 1 values
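A plain-Python sketch of an n-period moving average; the first value is available only once n observations exist. The input is a toy series, not real sales data.

```python
def moving_average(x, n):
    """n-period moving average: mean of the current and past n - 1 values."""
    if not 1 <= n <= len(x):
        raise ValueError("n must be between 1 and len(x)")
    return [sum(x[t - n + 1 : t + 1]) / n for t in range(n - 1, len(x))]


print(moving_average([1, 2, 3, 4, 5], n=3))  # [2.0, 3.0, 4.0]
```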

51

MA(1) - moving-average model of order 1

  • moving average of εt and εt−1

  • First: examine the variance of xt and its first two autocorrelations

  • first autocorrelation is not equal to 0, but the second and higher autocorrelations are equal to 0

  • MA(1) model has a memory of one period

52

AR vs MA

autocorrelations of most autoregressive time series start large and decline gradually, whereas the autocorrelations of an MA(q) time series suddenly drop to 0 after the first q autocorrelations

53

autoregressive moving-average (ARMA) model

  • p autoregressive terms and q moving-average terms, denoted ARMA(p, q)

  • parameters in ARMA models can be very unstable

  • criteria for deciding on p and q for a particular time series are far from perfect

54

Autoregressive Conditional Heteroskedasticity Models (ARCH)

  • regress the squared residuals on their first lag (ε̂t² = a0 + a1ε̂t−1² + ut); if the estimate of a1 is statistically significantly different from zero, we conclude that the time series is ARCH(1)

55

ARCH - predict variance of errors in period t+1

σ̂t+1² = â0 + â1ε̂t², where ε̂t is the residual in period t
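Assuming an ARCH(1) model has been estimated as ε̂t² = â0 + â1ε̂t−1² + ut, the one-step-ahead variance forecast is σ̂t+1² = â0 + â1ε̂t². The coefficients and last residual are hypothetical.

```python
def arch1_variance_forecast(a0, a1, last_residual):
    """Predicted error variance for t + 1 from an estimated ARCH(1) model."""
    return a0 + a1 * last_residual ** 2


var_next = arch1_variance_forecast(a0=0.0002, a1=0.3, last_residual=0.02)
print(round(var_next, 6))  # 0.00032
```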
56

2 time series - one dependent, one independent variable

  • test for unit root - DF test

  • one of them has a unit root - not covariance stationary; one or more linear regression assumptions violated; coefficients and standard errors inconsistent; a coefficient may appear significant when it is not

  • both have a unit root - establish if cointegrated

57

cointegrated

long-term financial or economic relationship exists between them such that they do not diverge from each other

58

cointegrated vs not

  • Not cointegrated: error term is not covariance stationary; some regression assumptions will be violated; regression coefficients and standard errors will not be consistent, and we cannot use them for hypothesis tests

  • Cointegrated: error term is covariance stationary; regression coefficients and standard errors will be consistent, and we can use them for hypothesis tests; may not be the best model of the short-term relation

59

cointegration test

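  • apply a Dickey-Fuller test to the regression residuals, using the critical values computed by Engle and Granger

  • fail to reject the null (residuals have a unit root) - not cointegrated

  • reject the null - cointegrated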

60

expected total holding period cost

trading costs = round-trip commission + bid-ask spread

management fees = annual management fee × holding period
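A sketch of the card's arithmetic, assuming all costs are expressed as fractions of the amount invested, the management fee is an annual rate, and the holding period is measured in years. The figures are hypothetical.

```python
def expected_total_holding_period_cost(round_trip_commission, bid_ask_spread,
                                       annual_fee, holding_years):
    """Trading costs plus management fees, all as fractions of the amount invested."""
    trading_costs = round_trip_commission + bid_ask_spread
    management_fees = annual_fee * holding_years
    return trading_costs + management_fees


cost = expected_total_holding_period_cost(0.0020, 0.0010, 0.0050, holding_years=0.5)
print(f"{cost:.2%}")  # 0.55%
```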