causality
specific action leads to specific outcome
cross sectional data
data collected across sample units in a particular time period
time series data
data from a single entity that changes over time
pooled cross sections
mix of cross section and time series data
panel data
data that combines cross-sectional and time series data, tracking multiple entities over time.
balanced panel
A type of panel data where each entity is observed the same number of times over the same time periods.
outcomes
potential results
probability
proportion of time that outcome occurs in the long run
sample space
set of all outcomes
probability distribution
list of all the different outcomes and their probabilities
cumulative probability distribution
probability that a random variable is less than or equal to a particular value
bernoulli random variable
binary variable
probability density function
its area between two points gives the probability that the random variable falls between those points
variance
measures spread of a probability distribution
standardization
transforming a random variable to have mean 0 and variance 1 (subtract the mean, divide by the standard deviation)
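A quick NumPy check of the definition (the sample values are arbitrary):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # any sample works

# Standardize: subtract the mean, then divide by the standard deviation.
z = (x - x.mean()) / x.std()

print(z.mean(), z.var())   # 0.0 and 1.0 (up to floating point)
```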
law of iterated expectations
the mean of Y is the weighted average of the conditional means: E[Y] = E[E[Y|X]]
multivariate normal distribution
joint normal distribution of two or more random variables
marginal distribution
the probability distribution of one variable, obtained from the joint distribution by summing (or integrating) over the others
bivariate normal distribution
the joint normal distribution of 2 random variables
random sampling
selecting sample randomly from population
identically distributed
all of the variables have the same marginal distribution
independently sampled
each observation is drawn independently of the others
IID
independently and identically distributed
the law of large numbers
as the size of a sample increases, the sample mean will approach the population mean (provided the draws are IID and large outliers are unlikely)
central limit theorem
the distribution of the sample mean will be well approximated by a normal distribution when n is large
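Both results can be checked with a short simulation; this sketch assumes NumPy and uses an exponential distribution (population mean 1) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Law of large numbers: the mean of many IID draws settles near the
# population mean. Exponential(1) has population mean 1.
sample_mean = rng.exponential(1.0, size=200_000).mean()

# Central limit theorem: across many repeated samples, the sample mean
# is approximately normally distributed even though the data are skewed,
# with mean ~1 and standard deviation ~1/sqrt(n).
means = rng.exponential(1.0, size=(5_000, 500)).mean(axis=1)

print(sample_mean)               # close to 1
print(means.mean(), means.std()) # ~1 and ~1/sqrt(500)
```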
estimators
functions of sample data used as guesses for unknown population parameters
p-value
the probability of drawing a statistic at least as adverse to the null hypothesis as the one observed, assuming the null hypothesis is correct
type I error
reject true H0
type II error
fail to reject a false H0
OLS estimators
chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data
R²
the ratio of the sample variance of Ŷ to the sample variance of Y
ESS
the sum of squared deviations of the predicted values from their average
TSS
the sum of squared deviations of Yi from its average
SER
an estimator of the standard deviation of the regression error ui
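These fit statistics can be computed by hand; a sketch with NumPy on simulated data (the intercept 2 and slope 3 are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: Y = 2 + 3*X + u, with standard normal error u.
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# OLS slope and intercept (single regressor, closed form).
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
resid = y - y_hat

# Sums of squares and fit statistics from the definitions above.
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum(resid ** 2)                # sum of squared residuals
r2 = ess / tss                          # equals 1 - SSR/TSS
ser = np.sqrt(ssr / (n - 2))            # standard error of the regression
```

With an intercept included, ESS + SSR = TSS, so the two forms of R² agree.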
in sample prediction
the observation for which the prediction is made is also used to estimate the regression coefficients
out of sample prediction
prediction for observations not in the estimation sample
Least squares assumptions
mathematical assumptions under which OLS estimates the causal effects
omitted variable bias
bias in the OLS estimator that arises when the regressor is correlated with an omitted variable
2 conditions for omitted variable bias
X is correlated with the omitted variable
the omitted variable is a determinant of the dependent variable Y
OLS regression line
straight line constructed using OLS
perfect multicollinearity
if one of the regressors is a perfect linear function of the other regressors
imperfect multicollinearity
2(+) regressors are highly correlated
solutions for multicollinearity
obtain more information
introduce nonsample information
Principal Component Analysis
dummy variable trap
when dummy variables are perfectly collinear, typically from including all categories, leading to multicollinearity in regression
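A quick NumPy demonstration of the trap (the three-category grouping is invented for the example):

```python
import numpy as np

# Three exhaustive categories, encoded as one dummy column each.
group = np.array([0, 1, 2, 0, 1, 2, 1, 0])
d = np.eye(3)[group]

# Including ALL dummies plus a constant is perfect multicollinearity:
# the dummy columns sum to the constant column.
X_trap = np.column_stack([np.ones(len(group)), d])
print(np.linalg.matrix_rank(X_trap))   # 3, not 4: rank deficient

# Fix: drop one dummy; the omitted category becomes the baseline.
X_ok = np.column_stack([np.ones(len(group)), d[:, 1:]])
print(np.linalg.matrix_rank(X_ok))     # 3: full column rank
```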
control variable
a regressor that is not the object of interest but is included so that the coefficient of interest does not suffer from omitted variable bias
joint hypotheses
imposes two or more restrictions on the regression coefficients
F statistic
used to test a joint hypothesis about regression coefficients
restricted regression
H0 is forced to be true
unrestricted regression
alternative hypothesis is allowed to be true
base specification
a core or base set of regressors that contains the variables of interest
alternative specification
a specification with an alternative set of control regressors, etc.
Nonlinear Least Squares
nonlinear functions cannot be estimated by OLS, but can be by nonlinear least squares (consistent, normally distributed in large samples)
Chow test
checks if two groups have different regression lines by testing if their coefficients are the same.
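A sketch of the test with NumPy on simulated data; the group slopes (2 vs 4) are chosen so that H0 is false:

```python
import numpy as np

rng = np.random.default_rng(2)

def ssr_ols(x, y):
    """Sum of squared residuals from an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Two groups with genuinely different slopes.
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y1 = 1.0 + 2.0 * x1 + rng.normal(size=n)
y2 = 1.0 + 4.0 * x2 + rng.normal(size=n)

# Restricted regression: pool the groups, force one common line (H0 true).
ssr_r = ssr_ols(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
# Unrestricted regression: a separate line for each group.
ssr_u = ssr_ols(x1, y1) + ssr_ols(x2, y2)

k = 2                        # coefficients per group (intercept, slope)
q = k                        # restrictions imposed by H0
dof = 2 * n - 2 * k          # residual degrees of freedom, unrestricted
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
# A large F statistic rejects H0 that both groups share one line.
```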
internal validity
the statistical inferences about causal effects are valid for the population being studied
external validity
whether the regression findings can be generalized to other populations and settings
when are studies internally valid
if the estimated regression coefficients are unbiased and consistent
solution omitted variable bias
include the variable
include control variables if they lead to conditional mean independence of the error term
best guess model
predicts outcomes using the most likely values based on available data, often the mean or most common value; the best guess variable is uncorrelated with the error term
what happens with intentional misreporting
the coefficient will be biased
sample selection bias
when certain individuals or groups have a higher chance of being included in the sample due to factors related to the outcome of interest
simultaneous causality
when X and Y influence each other
sources of inconsistency of OLS errors
improperly handled heteroskedasticity
correlation of the error term across observations (autocorrelation)
solution for heteroskedasticity
heteroskedasticity-robust standard errors
3 requirements of reliable prediction
the data used to estimate the prediction and the observation for which the prediction is made are drawn from the same distribution (LSA 1)
the list of predictors does not need to estimate causal effects
with many predictors, the model should prove accurate out of sample
before and after panel data comparisons
if things like cultural impact remain the same, you can see the influence of the other variables; doesn't apply when T > 2
entity fixed effects regression
method for controlling omitted variables in panel data when the omitted variables vary across entities but don't change over time
time fixed effects regression
controls for omitted variables that are constant across entities but vary over time
what is the expected value of LPM
P(Y=1|X), the probability that Y equals 1 given X
coefficient in probit model B1
the change in the z-value associated with a one-unit change in X
nonlinear least squares
the analog of OLS used to estimate nonlinear models such as probit and logit
likelihood function
joint probability function of the data, treated as a function of unknown coefficients
maximum likelihood estimator
consists of the values of the coefficients that maximize the likelihood function
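A minimal sketch with NumPy, using hypothetical Bernoulli data and maximizing the likelihood by grid search:

```python
import numpy as np

# Bernoulli sample (hypothetical): 7 successes in 10 trials.
y = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])

def log_likelihood(p):
    """Log of the joint probability of the data, as a function of p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Maximize over a grid of candidate values of p.
grid = np.linspace(0.01, 0.99, 981)
p_mle = grid[np.argmax([log_likelihood(p) for p in grid])]

print(round(p_mle, 3))   # 0.7: for Bernoulli data the MLE is the sample mean
```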
McFadden-R²
goodness-of-fit measure for probit and logit
marginal effects of changes in variables
used to interpret the coefficients of probit and logit models
odds ratios
the ratio of the probability of success to the probability of failure; in the logit model, a one-unit increase in X multiplies the odds by exp(β1)
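A small check of this relationship in plain Python; the coefficients are made up for the example:

```python
import math

# Logit model: P(Y=1|X) = 1 / (1 + exp(-(b0 + b1 * X))).
b0, b1 = -1.0, 0.5   # illustrative coefficients, not from real data

def odds(x):
    """Odds of success: P / (1 - P), which equals exp(b0 + b1*x)."""
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))
    return p / (1 - p)

# A one-unit increase in X multiplies the odds by exp(b1): the odds ratio.
odds_ratio = odds(1.0) / odds(0.0)
print(round(odds_ratio, 4))   # 1.6487, i.e. exp(0.5)
```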