random variable
a numerical summary of a random outcome
outcome
the mutually exclusive result of a random process
variable
a measurable characteristic of a population
sample space
the set of all possible outcomes of a random process
event
a subset of the sample space
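A minimal sketch of sample space and event, assuming one roll of a fair six-sided die (the die is an illustrative choice, not part of the cards):

```python
# Sample space for one roll of a fair six-sided die (illustrative example).
sample_space = {1, 2, 3, 4, 5, 6}

# An event is any subset of the sample space, e.g. "the roll is even".
even_roll = {outcome for outcome in sample_space if outcome % 2 == 0}

# With equally likely outcomes, P(event) = |event| / |sample space|.
p_even = len(even_roll) / len(sample_space)
print(even_roll <= sample_space, p_even)
```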
estimator
the function of the data in the sample derived to infer the estimand
estimand
the true value in the observable population which is to be estimated
target/structural parameter
the specific unknown population parameter that is to be estimated
central limit theorem
when N is sufficiently large, the sampling distribution of the sample mean is approximately normal, whatever the shape of the population distribution (provided its variance is finite)
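The theorem can be illustrated with a small simulation. This is a sketch under assumptions: the population is exponential (strongly skewed), the sample sizes 2 and 200 are arbitrary, and skewness near 0 is used as a rough stand-in for "looks normal".

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed (exponential, mean 1) population."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

def skewness(values):
    """Sample skewness; close to 0 for a symmetric, normal-like shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# 5000 sample means for a tiny sample size and for a larger one.
means_small = [sample_mean(2) for _ in range(5000)]
means_large = [sample_mean(200) for _ in range(5000)]

skew_small = skewness(means_small)
skew_large = skewness(means_large)
print(skew_small, skew_large)  # the second value should be much nearer 0
```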
Gauss-Markov OLS Assumptions
Exogeneity
No perfect multicollinearity
Linear relationship between dependent var & independent var
Homoskedasticity
No autocorrelation
Normally distributed error term (not needed for OLS to be BLUE; an additional assumption used for exact small-sample inference)
properties of OLS under Gauss-Markov assumptions
B-L-U-E
Best linear unbiased estimator
exogeneity
the x-variables and the error term are not correlated: E(εi | X) = 0. The explanatory variables are determined outside of the model, so neither the error term nor the dependent variable influences them.
multicollinearity
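strong correlation between ≥2 explanatory variables, usually because they measure a similar trait. Perfect multicollinearity (one x-variable is an exact linear combination of the others) breaches the Gauss-Markov assumptions.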
homoskedasticity
the variance (σ²) of the error term (ε) is constant throughout the sample. Therefore, the dispersion of residuals is similar for all values of X
heteroskedasticity
the variance (σ²) of the error term (ε) is not constant throughout the sample. Therefore, the dispersion of residuals changes with the value of X. This violates the Gauss-Markov assumptions of OLS.
autocorrelation / serial correlation
correlation between a variable and its own past values; in regression, correlation between the error term in one period and the error terms in earlier periods. This violation of OLS assumptions often occurs in time series data.
endogeneity
correlation between explanatory vars and the error term, E(Xiεi) ≠ 0. Common causes are omitted variables, measurement error, and a bilateral (two-way) causal relationship between the X and Y variables.
residuals
the differences between observed (actual) values and the estimated values predicted by the model
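A minimal sketch of residuals from a one-regressor OLS fit. The data are made up for illustration; the check at the end relies on the fact that, with an intercept, OLS residuals sum to zero.

```python
import statistics

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)

# OLS slope and intercept for y = intercept + slope * x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residual = observed value minus the value predicted by the model.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(slope, intercept, sum(residuals))
```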
grounds to reject the null hypothesis (Ho) and propose the alternative hypothesis (Ha) / statistical significance
p-value < significance level α (equivalently, |test statistic| > critical value)
insufficient grounds to reject the null hypothesis / statistical insignificance
p-value > significance level α (equivalently, |test statistic| < critical value)
p-value
the probability, assuming the null hypothesis is true, of observing a test statistic (z-stat, t-stat, F-stat, etc.) at least as extreme as the one actually observed
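A sketch of a two-sided p-value for a z-statistic, compared with a chosen significance level. The z-statistic of 2.5 and α = 0.05 are assumed values for illustration.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-sided p-value."""
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05                  # assumed significance level
p = two_sided_p_value(2.5)    # hypothetical observed z-statistic
reject_null = p < alpha       # reject Ho when the p-value is below alpha
print(p, reject_null)
```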
Triple S method of analysing variable coefficients
sign, size & significance of a variable’s estimated coefficient
dummy variable
a numerical var expressed as 0 or 1 to represent categorical data, often gender, race, union membership, etc.
elasticity
the % change of the dependent var due to 1% change in the independent var
linear-linear model (Y = f[X])
ΔY = β × ΔX
linear-log model (Y = f[logX])
ΔY = (β/100) × %ΔX
log-linear model (logY = f[X])
%ΔY = (100 × β) × ΔX
log-log model (logY = f[logX])
%ΔY = β × %ΔX
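The log interpretations can be checked numerically. This is a sketch with invented numbers: an exact relationship Y = 3·X^0.5 for the log-log case and a coefficient of 0.05 for the log-linear case. Note that the 100 × beta rule is an approximation that works well for small beta.

```python
import math
import statistics

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )

# Hypothetical exact relationship Y = 3 * X**0.5, so the elasticity is 0.5.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Log-log model: the slope of log(Y) on log(X) is the elasticity.
elasticity = ols_slope([math.log(v) for v in x], [math.log(v) for v in y])

# Log-linear model: a one-unit rise in X changes Y by about 100*b percent;
# the exact change is 100*(exp(b) - 1) percent.
b = 0.05
approx_pct = 100 * b
exact_pct = 100 * (math.exp(b) - 1)
print(elasticity, approx_pct, exact_pct)
```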
internal validity
a regression that successfully yields inferences applicable to the chosen population
external validity
a regression whose inferences made from a sample can also be applied to other populations
Variance Inflation Factor (VIF)
a method of identifying multicollinearity by quantifying how much correlation between predictor variables inflates the variance of a regression coefficient. For predictor j, this index = 1/(1-Rj²), where Rj² is the R² from regressing predictor j on all the other predictors. To run this in STATA, use command
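Independently of any statistics package, the index 1/(1-R²) can be checked by hand. This sketch assumes a model with exactly two predictors, where the R² from regressing one on the other equals their squared correlation; the data and the function names are hypothetical.

```python
import math
import statistics

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    s_ab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    s_aa = sum((u - a_bar) ** 2 for u in a)
    s_bb = sum((v - b_bar) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors.

    Regressing x1 on x2 alone gives R^2 = corr(x1, x2)^2. With more
    predictors, R^2 must come from regressing x1 on all the others.
    """
    r_squared = correlation(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]    # correlated with x1
x3 = [2.0, -1.0, -2.0, -1.0, 2.0]  # uncorrelated with x1

vif_correlated = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors(x1, x3)
print(vif_correlated, vif_uncorrelated)
```

A VIF of 1 means no inflation; larger values mean the coefficient's variance is inflated by correlation with the other predictor.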
Breusch-Pagan test for heteroskedasticity
Ho: constant variance/homosked (σ1² = σ2², etc.)
Ha: inconstant variance/hetsked (σ1² ≠ σ2², etc.)
To run this test in Stata, use the command ESTAT HETTEST
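The idea behind the test can be sketched by hand: regress the squared OLS residuals on the regressor and check whether that auxiliary regression explains anything. The version below is the studentized (Koenker) form, LM = n × R², with one regressor; this is an assumption of the sketch and may differ from the statistic a given package reports by default. The data are simulated so that the error spread grows with x.

```python
import math
import random
import statistics

def ols(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R^2 of a one-regressor OLS fit of y on x."""
    a, b = ols(x, y)
    y_bar = statistics.fmean(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan(x, y):
    """Studentized LM statistic n*R^2 and its chi-square(1) p-value."""
    a, b = ols(x, y)
    squared_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, squared_resid), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2))  # chi-square(1) upper tail
    return lm, p_value

random.seed(1)
x = [random.uniform(1, 10) for _ in range(500)]
# Error standard deviation equals x, so the data are heteroskedastic.
y_het = [1 + 2 * xi + random.gauss(0, xi) for xi in x]

lm, p_value = breusch_pagan(x, y_het)
print(lm, p_value)  # a small p-value rejects Ho of constant variance
```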
robust standard errors
standard errors adjusted for heteroskedasticity. HOW TO CALC IN STATA
standard errors
= standard deviation / square root(# of observations) (for the standard error of a sample mean)
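For the standard error of a sample mean, the usual calculation divides the sample standard deviation (not the variance) by the square root of the number of observations. A sketch with made-up data:

```python
import math
import statistics

sample = [4.0, 7.0, 6.0, 5.0, 8.0]  # hypothetical observations

std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
standard_error = std_dev / math.sqrt(len(sample))
print(std_dev, standard_error)
```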
Type I neoclassical measurement error
the error is uncorrelated with the true value of the variable (eg: independent inaccuracies in reporting one’s weight)
Type II neoclassical measurement error
the error is correlated with the true-value or with other variables (eg: many observations intentionally misrepresent a characteristic like income)
conditions for instrumented regression
relevance: the instrument must correlate with the problematic endogenous variable.
exclusion restriction: the instrument only affects the outcome through the endogenous x-variable
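The two conditions can be seen at work in a simulation. This sketch invents a data-generating process in which an unobserved confounder u drives both x and y, while the instrument z affects y only through x. All names and coefficients are assumptions for illustration; the simple IV estimator used is cov(z, y) / cov(z, x).

```python
import random
import statistics

random.seed(42)
n = 5000
true_beta = 2.0  # assumed true effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
# Relevance: x depends on z. Endogeneity: x also depends on u.
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
# Exclusion: z enters y only through x; u enters y directly.
y = [true_beta * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

def cov(a, b):
    """Population covariance of two equal-length lists."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean([(p - a_bar) * (q - b_bar) for p, q in zip(a, b)])

ols_slope = cov(x, y) / cov(x, x)  # biased because x is correlated with u
iv_slope = cov(z, y) / cov(z, x)   # uses only the variation in x due to z
print(ols_slope, iv_slope)
```

In this setup the OLS slope should land well above 2, while the IV slope should be close to it.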
rule of thumb for weak instrument identification
F-stat < 10 for a significance test for
Hausman test
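A test that compares two estimators, one consistent whether or not the regressor is endogenous (e.g. IV) and one that is efficient but consistent only under exogeneity (e.g. OLS), to determine whether the regressor can be treated as exogenous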
Ho: the regressor is exogenous (E(Xiεi) = 0)
Ha: the regressor is endogenous (E(Xiεi) ≠ 0)
linear probability model
an OLS model with a binary (0/1) dependent variable, whose fitted values are interpreted as the probability that Y = 1. These models can suffer from issues like predicted probabilities outside the 0-1 range and heteroskedasticity.
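The out-of-range problem is easy to reproduce. This sketch fits OLS to a made-up 0/1 outcome and evaluates the fitted line at the smallest and largest x:

```python
import statistics

# Hypothetical data: the outcome switches from 0 to 1 as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

def predicted_probability(value):
    """Fitted value of the linear probability model at x = value."""
    return intercept + slope * value

low, high = predicted_probability(1.0), predicted_probability(8.0)
print(low, high)  # one value falls below 0, the other above 1
```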
probit model
models the probability of an event’s occurrence as the standard normal cumulative distribution function evaluated at a linear combination of the independent variables. Use STATA command PROBIT.
logit model
models the log of the odds of an event’s occurrence, log(p/(1-p)), as a linear function of the independent variables. This model follows the logistic distribution and its coefficients are interpreted in terms of the “odds” of an event happening. Use STATA command LOGIT
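A sketch of how the two models turn a linear index (a + b × X) into a probability, and of the log-odds that the logit model treats as linear. The function names and the index value 1.2 are hypothetical.

```python
import math
from statistics import NormalDist

def logit_probability(index):
    """Logistic CDF: probability implied by a logit model's linear index."""
    return 1.0 / (1.0 + math.exp(-index))

def probit_probability(index):
    """Standard normal CDF: probability implied by a probit model's index."""
    return NormalDist().cdf(index)

def log_odds(p):
    """Log of the odds p / (1 - p); the inverse of logit_probability."""
    return math.log(p / (1.0 - p))

index = 1.2  # hypothetical value of a + b * X
p_logit = logit_probability(index)
p_probit = probit_probability(index)
print(p_logit, p_probit, log_odds(p_logit))
```

Both functions always return a value strictly between 0 and 1, which is what the linear probability model cannot guarantee.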
latent variable
a variable that cannot be observed, but can be inferred from other observable variables
maximum likelihood estimation
estimating the parameters of an assumed probability distribution based on some observed data to maximise a likelihood function so that, under the assumed statistical model, the observed data is most probable.
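A minimal sketch of the idea, assuming a Bernoulli model for made-up 0/1 data and a simple grid search in place of a proper optimiser. For this model the maximiser is known to be the sample proportion, which gives a check on the result.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical 0/1 outcomes

def log_likelihood(p, observations):
    """Log-likelihood of Bernoulli(p) for the observed 0/1 data."""
    return sum(
        math.log(p) if obs == 1 else math.log(1.0 - p) for obs in observations
    )

# Evaluate the log-likelihood on a grid of candidate p values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))

sample_proportion = sum(data) / len(data)
print(p_hat, sample_proportion)
```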
MARGINS command
finds marginal effects
multinomial regressions
regressions for categorical data with no order/ranking
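cross-sectional data
observations on many units (individuals, firms, countries, etc.) at a single point in time
time-series data
observations on a single unit over many time periods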
panel data
a combination of cross-sectional data and time series data
Chow Test for structural change
Ho: coefficients are the same in every group or period (no structural change)
Ha: at least one coefficient differs between groups or periods (structural change)
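A sketch of the usual Chow F-statistic for two subsamples, computed from sums of squared residuals (SSR). The SSR values and sample sizes are invented; k is the number of coefficients, including the intercept, in each regression.

```python
def chow_f_statistic(ssr_pooled, ssr_1, ssr_2, n_1, n_2, k):
    """Chow F-statistic for a structural break between two subsamples.

    ssr_pooled: SSR from one regression on all observations (restricted).
    ssr_1, ssr_2: SSRs from separate regressions on each subsample.
    k: number of coefficients (including the intercept) per regression.
    """
    ssr_unrestricted = ssr_1 + ssr_2
    numerator = (ssr_pooled - ssr_unrestricted) / k
    denominator = ssr_unrestricted / (n_1 + n_2 - 2 * k)
    return numerator / denominator

# Hypothetical inputs.
f_stat = chow_f_statistic(
    ssr_pooled=120.0, ssr_1=40.0, ssr_2=50.0, n_1=30, n_2=34, k=2
)
print(f_stat)
```

The statistic is compared with an F distribution with k and n_1 + n_2 - 2k degrees of freedom; a large value is evidence against Ho.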
Difference-in-Differences
Causal estimation method that compares the change in outcomes of a treatment group with the change in outcomes of a control group, before and after an intervention. This method addresses biases from pre-existing differences between the two groups and from time trends that would have occurred regardless of the intervention.
This method assumes:
parallel trend
exogeneity
conditional independence
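In its simplest two-group, two-period form, the estimate is a difference of two differences in group means. A sketch with invented means:

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the control group."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical group means of the outcome before and after the intervention.
effect = diff_in_diff(
    treat_pre=10.0, treat_post=18.0, control_pre=9.0, control_post=12.0
)
print(effect)
```

The control group's change (3 here) stands in for the trend the treatment group would have followed without the intervention, which is why the parallel trend assumption matters.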
balanced panel data
panel data with an equal number of observations in each cross-section and time period
unbalanced panel data
panel data with an unequal number of observations in each cross-section and time period
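A sketch of a balance check, assuming the panel is represented as one (unit, period) pair per observation. The function name and the data are hypothetical.

```python
def is_balanced(observations):
    """True if every unit is observed in every time period.

    observations: iterable of (unit, period) pairs, one per row of the panel.
    """
    pairs = set(observations)
    units = {unit for unit, _ in pairs}
    periods = {period for _, period in pairs}
    return len(pairs) == len(units) * len(periods)

balanced = [("A", 2020), ("A", 2021), ("B", 2020), ("B", 2021)]
unbalanced = [("A", 2020), ("A", 2021), ("B", 2020)]  # B is missing 2021
print(is_balanced(balanced), is_balanced(unbalanced))
```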
Fixed Effect Model
STATA command XTSET
declares the dataset to be panel data by specifying the panel (unit) variable and the time variable