Adv Econometrics III

Description and Tags

Key concepts & common STATA commands.

72 Terms

1

random variable

a numerical summary of a random outcome

2

outcome

one of the mutually exclusive potential results of a random process

3

variable

a measurable characteristic of a population

4

sample space

the set of all possible outcomes of a random process

5

event

a subset of the sample space

6

estimator

a function of the sample data used to estimate the estimand

7

estimand

the true value in the observable population which is to be estimated

8

target/structural parameter

the specific unknown population parameter that is to be estimated

9

central limit theorem

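when the sample size N is sufficiently large, the sampling distribution of the sample mean is approximately normal, whatever the distribution of the underlying data (provided its variance is finite). The variance of the sample mean is σ²/N, so it shrinks as N grows.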
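
A minimal simulation sketch of the central limit theorem, assuming uniform(0, 1) data (mean 0.5, variance 1/12). The helper name `sample_means`, the seed, and the sample sizes are illustrative choices, not part of the card.

```python
import random
import statistics

def sample_means(n, draws, seed=0):
    """Return `draws` sample means, each computed from n uniform(0, 1) values."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(draws)
    ]

means = sample_means(n=50, draws=4000)

# Theory: the sample mean has mean 0.5 and variance (1/12) / n.
theoretical_var = (1 / 12) / 50
simulated_var = statistics.pvariance(means)
```

With these settings the simulated variance of the sample means should land close to the theoretical value, and a histogram of `means` would look roughly bell-shaped even though the underlying data are uniform.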

10

Gauss-Markov OLS Assumptions

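  1. Exogeneity

  2. No perfect multicollinearity

  3. Linearity in parameters (a linear relationship between the dependent var & the independent vars)

  4. Homoskedasticity

  5. No autocorrelation

  6. Independent error terms (normality of the error term is not required for BLUE; it is an extra assumption used for exact small-sample inference)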

11

properties of OLS under Gauss-Markov assumptions

B-L-U-E

Best linear unbiased estimator

12

exogeneity

the error term has a conditional mean of zero given the x-variables, E(εi | X) = 0, so the x-variables and the error term are not correlated. The explanatory variables (X) are determined outside of the model: neither the error term nor the dependent variable (Y) influences them.

13

multicollinearity

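strong correlation between ≥2 explanatory variables, probably because they measure a similar trait. Perfect multicollinearity breaches the Gauss-Markov assumptions; imperfect multicollinearity does not, but it inflates the variance of the affected coefficients.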

14

homoskedasticity

the variance (σ²) of the error term (ε) is constant throughout the sample. Therefore, the dispersion of residuals is similar for all X. This can be visually detected through a rectangle-shaped mass of residuals in a scatter plot of residuals (the absence of change in the spread of the residuals as X changes)

15

heteroskedasticity

The variance (σ²) of the error term (ε) is not constant throughout the sample. Therefore, the dispersion of residuals changes with X. This violates the Gauss-Markov assumptions of OLS: the OLS estimator is still unbiased, but the usual standard errors are no longer valid. In terms of BLUE, OLS is no longer the best (most efficient) estimator. This can be visually detected through a conical or trumpet-shaped pattern in a scatter plot of the residuals.

16

autocorrelation / serial correlation

correlation between the error term and its own past values (more generally, between a variable and its own lagged values). This violation of OLS assumptions often occurs in time series data.

17

endogeneity

correlation between explanatory vars and the error term, violating OLS assumptions: E(εi | X) ≠ 0. Common sources are omitted variables, measurement error, and simultaneity (a bilateral causal relationship between the X and Y variables). This is best detected through the Hausman test in STATA w/ the command

18

residuals

the differences between observed (actual) values and the estimated values predicted by the model. Their squares are added up in the sum of squared residuals (SSR)

19

grounds to reject the null hypothesis (Ho) and propose the alternative hypothesis (Ha) / statistical significance

p-value < significance level (α), or equivalently |test statistic| > critical value

20

insufficient grounds to reject the null hypothesis / statistical insignificance

p-value > significance level (α), or equivalently |test statistic| < critical value

21

p-value

the probability, assuming the null hypothesis is true, of observing a z-stat, t-stat, F-stat, etc. at least as extreme as the one actually observed (for a two-sided test, with an absolute value ≥ the observed value)
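
A minimal sketch of a two-sided p-value for a z-statistic, assuming the statistic is standard normal under the null hypothesis. The function name `two_sided_p_value` and the 5% significance level are illustrative assumptions.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-sided p-value."""
    # erfc(a / sqrt(2)) equals the probability that |Z| exceeds a.
    return math.erfc(abs(z) / math.sqrt(2))

p = two_sided_p_value(1.96)

# Reject the null hypothesis when the p-value is below the significance level.
reject_at_5_percent = p < 0.05
```

A z-statistic of 1.96 gives a p-value just under 0.05, which is why 1.96 is the familiar two-sided critical value at the 5% level.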

22

Triple S method of analysing variable coefficients

sign, size & significance of a variable’s estimated coefficient

23

dummy variable

a numerical var expressed as 0 or 1 to represent categorical data, often gender, race, union membership, etc.

24

elasticity

the % change of the dependent var due to 1% change in the independent var

25

linear-linear model (Y = f[X])

ΔY = β × ΔX

26

linear-log model (Y=f[logX])

ΔY ≈ (β/100) × %ΔX

27

log-linear model (logY = f[X])

%ΔY ≈ (100 × β) × ΔX

28

log-log model (logY = f[logX])

%ΔY ≈ β × %ΔX
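
The four functional forms above can be collected in one small helper. This is a sketch only: the function name `interpret_coefficient` and the model labels are hypothetical, and the log-based rules are approximations that work best for small changes.

```python
def interpret_coefficient(model, beta, change_in_x):
    """Approximate change in Y implied by beta for a given change in X.

    For "linear-log" and "log-log", change_in_x is a percentage change in X;
    otherwise it is a change in the units of X. For "log-linear" and
    "log-log", the result is a percentage change in Y; otherwise it is in
    the units of Y.
    """
    if model == "linear-linear":
        return beta * change_in_x
    if model == "linear-log":
        return (beta / 100) * change_in_x
    if model == "log-linear":
        return 100 * beta * change_in_x
    if model == "log-log":
        return beta * change_in_x
    raise ValueError(f"unknown model: {model}")

# Log-linear example: beta = 0.05 and X rises by one unit.
pct_change_y = interpret_coefficient("log-linear", 0.05, 1)
```

In the example, a coefficient of 0.05 in a log-linear model corresponds to roughly a 5% change in Y per one-unit change in X.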

29

internal validity

a regression that successfully yields inferences applicable to the chosen population

30

external validity

a regression whose inferences made from a sample can also be applied to other populations

31

Variance Inflation Factor (VIF)

a method of identifying multicollinearity by quantifying how much correlation between predictor variables inflates the variance of a regression coefficient. For each predictor, VIF = 1/(1-R²), where R² comes from regressing that predictor on the other predictors. Multicollinearity is generally of concern if VIF > 10. To run this in STATA, use the command ESTAT VIF after a regression.
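
A minimal sketch of the VIF arithmetic, assuming the R² from regressing one predictor on the other predictors is already known. The function name `vif` is hypothetical, and the R² values are made up for illustration.

```python
def vif(r_squared):
    """Variance inflation factor 1 / (1 - R^2) for one predictor."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must satisfy 0 <= R^2 < 1")
    return 1.0 / (1.0 - r_squared)

# A predictor unrelated to the others has no variance inflation.
no_inflation = vif(0.0)

# R^2 = 0.9 sits at the common rule-of-thumb threshold of 10.
at_threshold = vif(0.9)
```

The rule of thumb of 10 therefore corresponds to the other predictors explaining about 90% of the variation in the predictor being checked.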

32

Breusch-Pagan test for heteroskedasticity

Ho: constant variance/homosked (σ1 = σ2, etc.)

Ha: inconstant variance/hetsked (σ1 ≠ σ2, etc.).

To run this test in Stata, use the command ESTAT HETTEST

33

robust standard errors

standard errors adjusted for heteroskedasticity. In Stata, add the option , ROBUST after the last independent variable in a regression command line

34

standard errors

= standard deviation / square root(# of observations), for the standard error of a sample mean
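
A minimal sketch of the standard error of a sample mean, computed as the sample standard deviation divided by the square root of the number of observations. The function name and the data are illustrative assumptions.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of eight observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
```

Note that the numerator is the standard deviation, not the variance; dividing the variance by √n would give a quantity in the wrong units.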

35

Type I neoclassical measurement error

the error is uncorrelated with the true value of the variable (eg: independent inaccuracies in reporting one’s weight)

36

Type II neoclassical measurement error

the error is correlated with the true-value or with other variables (eg: many observations - often self-reported - intentionally misrepresent a characteristic like income, education)

37

conditions for instrumented regression

  1. relevance: the instrument must correlate with the problematic endogenous variable.

  2. exclusion restriction: the instrument affects the outcome only through the endogenous x-variable
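
The two conditions above can be illustrated with simulated data. This is a sketch under stated assumptions: the data-generating process, the true slope of 2, the seed, and all variable names are invented for illustration, and the simple ratio cov(z, y) / cov(z, x) is the IV estimator for the one-instrument, one-regressor case.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved error term

# x depends on z (relevance) and on u (which makes x endogenous).
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]

# z enters y only through x (the second condition); the true slope is 2.
y = [2 * xi + ui for xi, ui in zip(x, u)]

ols_slope = covariance(x, y) / covariance(x, x)  # biased upward by u
iv_slope = covariance(z, y) / covariance(z, x)   # close to the true slope
```

In this setup OLS overstates the slope because x and the error term move together, while the instrument recovers a value near 2.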

38

rule of thumb for weak instrument identification

F-stat < 10 for a significance test for

Use the STATA command ESTAT FIRSTSTAGE

39

Hausman test

A test to determine if the estimator is consistent & efficient (adheres to BLUE)

Ho: the regressor is exogenous (E(Xiεi) = 0)

Ha: the regressor is endogenous (E(Xiεi) ≠ 0)

40

linear probability model

an OLS model with a binary (0/1) dependent variable, so that the fitted values are interpreted as probabilities and each coefficient as a change in the probability that Y = 1. These models can suffer from issues like predicted probabilities outside the 0-1 range and heteroskedasticity.

41

probit model

a model for a binary dependent variable in which the probability of an event’s occurrence is the standard normal cumulative distribution function evaluated at a linear combination of the independent variables (the Probit Index). Its coefficients are therefore interpreted as changes in the z-score of the Probit Index, not as changes in probability. Use STATA command PROBIT.

42

logit model

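a model for a binary dependent variable in which the log odds of an event’s occurrence are a linear function of the independent variables. This model follows the logistic distribution, and its coefficients are interpreted as changes in the log “odds” of the event happening. Use STATA command LOGIT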
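
A minimal sketch of how probit and logit turn a linear index into a probability, assuming the index (the fitted linear combination of the explanatory variables) has already been computed. The function names are hypothetical.

```python
import math

def probit_probability(index):
    """Standard normal CDF evaluated at the index."""
    return 0.5 * (1 + math.erf(index / math.sqrt(2)))

def logit_probability(index):
    """Logistic CDF evaluated at the index."""
    return 1 / (1 + math.exp(-index))

def log_odds(probability):
    """Log of the odds p / (1 - p); inverts logit_probability."""
    return math.log(probability / (1 - probability))

p_probit = probit_probability(1.0)
p_logit = logit_probability(1.0)
```

Both functions keep predicted probabilities strictly between 0 and 1, which is the main practical advantage over the linear probability model.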

43

latent variable

a variable that cannot be observed, but can be inferred from other observable variables (eg: intelligence as measured through a test score). This type of variable often appears in logit or probit models

44

maximum likelihood estimation

estimating the parameters of an assumed probability distribution based on some observed data to maximise a likelihood function so that, under the assumed statistical model, the observed data is most probable.

45

MARGINS command

finds marginal effects

46

multinomial regressions

regressions for categorical data with no order/ranking that are calculated according to maximum likelihood estimation. This method assumes the independence of irrelevant alternatives and predicts the log odds of an observation being classified as a respective category.

47

cross-sectional data

data that provides a ‘snapshot’ of multiple observations at a given point in time (time is constant)

48

time-series data

data for a single entity collected at successive, recurring intervals to capture change over time

49

panel data

a combination of cross-sectional data and time series data

50

Chow Test for structural change

A statistical test to determine if the coefficients in two separate regression models are equal, often used in DiD regressions to examine changes between two groups and/or changes before/after an intervention.

Ho: coefficients are the same in both regressions (no structural break exists)

Ha: coefficients differ between the two regressions (a structural break exists)

51

Difference-in-Differences

A causal estimation method that compares the change in outcomes of a treatment group and a control group pre- and post-intervention. This method addresses biases from pre-existing differences between the two groups and from time trends that would have occurred regardless of the intervention.

This method assumes:

  • parallel trend

  • exogeneity

  • conditional independence
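
A minimal sketch of the difference-in-differences arithmetic, assuming the four group means (treatment and control, pre and post) are already known. The function name and the numbers are invented for illustration.

```python
def difference_in_differences(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the control group."""
    treatment_change = treat_post - treat_pre
    control_change = control_post - control_pre
    return treatment_change - control_change

# Hypothetical group means: treatment rises by 5, control rises by 2.
effect = difference_in_differences(
    treat_pre=10.0, treat_post=15.0, control_pre=9.0, control_post=11.0
)
```

Under the parallel trend assumption, the control group's change of 2 stands in for what would have happened to the treatment group without the intervention, leaving an estimated effect of 3.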

52

balanced panel data

panel data with an equal number of observations in each cross-section and time period

53

unbalanced panel data

panel data with an unequal number of observations in each cross-section and time period

54

fixed effects (FE) Model

a model for panel data that assumes the unobserved characteristics of each entity to be fixed over time and allows them to be correlated with the observed explanatory variables
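
One common way to estimate a fixed effects model is the within transformation, which subtracts each entity's own mean before running OLS. The sketch below uses invented data for two entities whose intercepts differ (10 and 0) around a true slope of 2; all names are hypothetical.

```python
from statistics import fmean

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return numerator / sum((a - mean_x) ** 2 for a in x)

def demean_by_entity(values, entities):
    """Subtract each entity's own mean from its observations."""
    groups = {}
    for value, entity in zip(values, entities):
        groups.setdefault(entity, []).append(value)
    means = {entity: fmean(vals) for entity, vals in groups.items()}
    return [value - means[entity] for value, entity in zip(values, entities)]

entities = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 8, 10, 12]  # A: y = 10 + 2x, B: y = 0 + 2x

pooled_slope = ols_slope(x, y)  # ignores the entity effects
within_slope = ols_slope(
    demean_by_entity(x, entities), demean_by_entity(y, entities)
)
```

Pooled OLS gets the sign of the slope wrong here because the entity with larger X values has the lower intercept, while the within estimator recovers the slope of 2.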

55

random effects (RE) model

a model for panel data that assumes the unmeasurable characteristics of each entity to be random and uncorrelated with any of the model’s explanatory variables

56

STATA command XTSET

declares the panel (entity) variable and the time variable so that Stata treats the dataset as panel data

57

error term (ε)

the unexplained portion of a dependent variable's variance that's not accounted for by the independent variables in a model

58

conditions for proper instrumented regression

  1. instrument correlates with the endogenous variable

  2. the instrument does not correlate with the error term

If these conditions are met, the instrument will affect the outcome only through the endogenous variable

59

1st stage instrumentation

60

2nd stage instrumentation

61

methods of addressing Heteroskedasticity

  • drop the hetsked variable

  • cluster the standard error

    • subdivide the hetsked variable into several new variables based on common traits and their residuals

  • use the log form of the hetsked variable

  • use hetsked robust standard errors (STATA option , ROBUST at the end of a regression command)

62

taking the log of an explanatory variable (ln[X])

a strategy to address heteroskedasticity in an explanatory variable by stabilising its variance (σ²). If this is executed, the variable must now be interpreted as % change

63

F-test / joint F-test

A statistical test assessing the joint significance of a group of coefficients (B) to determine whether the corresponding variables should be included in a final regression model. This test can also compare the goodness-of-fit of nested models. To run this test in STATA, use the command TEST X1 X2 after a regression output (TEST X1 = X2 instead tests whether the two coefficients are equal).

Ho: B2 = B3 etc. = 0 (the included coefficients are jointly zero and therefore the variables do not statistically explain change in Y)

Ha: at least one of B2, B3 etc. ≠ 0 (at least one included variable does statistically explain change in Y)
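
A minimal sketch of the F-statistic arithmetic, assuming homoskedastic errors and that the sums of squared residuals from the restricted model (Ho imposed) and the unrestricted model are already known. The function name, argument names, and numbers are illustrative assumptions.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """Homoskedasticity-only F-statistic for q restrictions.

    n is the number of observations and k the number of regressors
    (excluding the intercept) in the unrestricted model.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values: dropping 2 variables raises the SSR from 100 to 120.
f_stat = f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0, q=2, n=103, k=2)
```

A large F-statistic means that imposing Ho worsens the fit by more than chance alone would explain, which is grounds to reject Ho.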


64

order condition for instrumented regression

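there are at least as many instruments as endogenous variables (the model is exactly identified when there is exactly one instrument for every endogenous variable)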

65

over-identification

there exist more instruments than endogenous variables

66

under-identification

there are insufficient instruments compared to endogenous variables

67

2-stage least squares (2sls)

68

long & narrow panel data

panel data w/ long time dimension & narrow range of subjects

69

short & wide panel data

panel data w/ short time dimension & wide range of subjects

70

long & wide panel data

panel data w/ long time dimension & wide range of subjects

71

short & narrow panel data

panel data w/ short time dimension & narrow range of subjects

72

heterogeneity bias

bias resulting from the omission of the unobserved fixed effect