econometrics
using economic theory and statistical techniques to analyze economic data
examines the relationship between two or more variables
Examples of use:
testing of economic theories
forecasting
fitting economic models to real-world data
policy recommendations
Causality
action → effect
causal effects best estimated using randomized controlled experiments
Forecast
knowledge of causal relationships not necessary
Sources of data
experimental data
observational data
Types of data
cross-sectional data
time series data
panel data
Linear regression model with a single regressor
Yi = β0 + β1Xi + ui
Interpretation of Yi = β0 + β1Xi + ui :
β1 . . . slope: by how much Y changes when X changes by one unit
β0 . . . intercept: what is the value of Y when X = 0
An intercept does not always have a real-world meaning (like in the class size example)
The intercept has a mathematical meaning
ui… error term: incorporates factors that influence Y but are not included in the model
ordinary least squares (OLS) estimators
values of b0 and b1 that minimize the sum of squared prediction mistakes (residuals)
minimizing this sum using calculus gives explicit formulas:
β̂1 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)² = sXY/sX²
β̂0 = Ȳ − β̂1X̄
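A minimal numerical sketch of these formulas (illustrative only; the data values are made up):

```python
import numpy as np

# hypothetical data: X = class size, Y = test score (made-up numbers)
X = np.array([18.0, 22.0, 25.0, 20.0, 28.0, 15.0])
Y = np.array([680.0, 660.0, 650.0, 665.0, 640.0, 690.0])

# OLS formulas: beta1_hat = s_XY / s_X^2, beta0_hat = Ybar - beta1_hat * Xbar
beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()

print(beta0_hat, beta1_hat)
# np.polyfit(X, Y, 1) gives the same slope and intercept
```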
Why use OLS estimators?
widely used
software available
good theoretical properties
Two Measures of Fit
R²- fraction of variation in Y explained by X
SER- standard error of regression- how far Yi typically is from its predicted value
R²
Write: Yi = Ŷi + ûi
Variation of Yi: TSS = ∑(Yi − Ȳ)²
Variation of Ŷi: ESS = ∑(Ŷi − Ȳ)²
R² is the ratio of explained variation to total variation: R² = ESS/TSS
alternatively, we can use the variation of the residuals: SSR = ∑ûi²
since TSS = ESS + SSR, we also have R² = 1 − SSR/TSS
Properties of R²
0<=R²<=1
small R²: poor fit
large R²: good fit - X is good at predicting Y
R² = 0 when β̂1 = 0 (then ESS = 0 and the fitted line is horizontal)
R² = 1 when Xi explains all variation in Yi (i.e. all datapoints lie on the fitted line)
SER
estimator of the standard deviation of ui
measure of the spread of the observations around the regression line
definition:
sû² = (1/(n−2)) ∑ûi² = SSR/(n−2)
SER = sû = √(SSR/(n−2))
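A short sketch computing TSS, ESS, SSR, R², and the SER for the same made-up data as above (illustrative, not from the notes):

```python
import numpy as np

X = np.array([18.0, 22.0, 25.0, 20.0, 28.0, 15.0])
Y = np.array([680.0, 660.0, 650.0, 665.0, 640.0, 690.0])

beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()

Y_hat = beta0_hat + beta1_hat * X      # fitted values
u_hat = Y - Y_hat                      # residuals

TSS = np.sum((Y - Y.mean()) ** 2)      # total variation in Y
ESS = np.sum((Y_hat - Y.mean()) ** 2)  # explained variation
SSR = np.sum(u_hat ** 2)               # residual variation

R2 = ESS / TSS                         # equivalently 1 - SSR/TSS
n = len(Y)
SER = np.sqrt(SSR / (n - 2))           # standard error of the regression

print(R2, SER)
```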
Assumption 1
the conditional distribution of ui given Xi has mean zero
E(ui|Xi) = 0
"regression line correct on average"
assumption satisfied in many cases, not satisfied in other cases
E(ui|Xi) = 0 implies cov(Xi, ui) = 0
therefore cov(Xi, ui) ≠ 0 implies E(ui|Xi) ≠ 0
if ui is correlated with Xi , assumption 1 is violated
Assumption 2
(Xi , Yi) are independent and identically distributed (i.i.d)
satisfied if observations drawn randomly from a population (in a survey)
example of non-independent data: time series
Assumption 3
Large outliers are unlikely
values outside the usual range are not very likely
outliers make OLS estimators misleading
finite fourth moments: 0 < E(Xi⁴) < ∞ and 0 < E(Yi⁴) < ∞ (finite kurtosis)
most distributions have finite fourth moments
source of outliers: data entry errors (typos etc.)
under homoskedasticity
β̂0 and β̂1 are efficient (have the smallest variance in the class of all linear unbiased estimators)
Two sided test (T-statistic)
t = (estimator - hypothesized value)/standard error of the estimator
p-value
probability of obtaining a test statistic at least as different from the null hypothesis value as the statistic actually observed
the smallest significance level at which the null hypothesis could be rejected
When to use a one-sided test?
only use one-sided test when there is a good reason
economic theory
empirical evidence
when unsure, use two-sided test
95% confidence interval for β1
an interval that contains the true value of β1 with a 95% probability
the set of values of β1 that cannot be rejected by a 5% two-sided hypothesis test
β̂1 ± 1.96 SE(β̂1)
for the predicted change in Y associated with a change ΔX in X, multiply the estimate and the confidence interval endpoints by ΔX
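A sketch of the two-sided test, p-value, and confidence interval calculations, using the large-sample normal approximation; the estimate, standard error, and ΔX below are hypothetical numbers:

```python
from scipy import stats

# hypothetical estimate and standard error
beta1_hat = -2.28
se_beta1 = 0.52
beta1_null = 0.0          # hypothesized value under H0

# two-sided test
t = (beta1_hat - beta1_null) / se_beta1
p_value = 2 * stats.norm.sf(abs(t))   # large-sample normal approximation

# 95% confidence interval
ci = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)

# predicted change in Y for a change of dX units in X: scale by dX
dX = 2.0
effect = beta1_hat * dX
effect_ci = (ci[0] * dX, ci[1] * dX)

print(t, p_value, ci, effect, effect_ci)
```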
Binary variable
takes on only two values, 0 or 1
test statistic: t = β̂1/SE(β̂1)
95% confidence interval for β1: β̂1 ± 1.96 SE(β̂1)
homoskedastic
if the variance of the conditional distribution of ui is constant for i = 1, . . . , n and in particular does not depend on Xi . Otherwise, the error term is heteroskedastic.
Properties of OLS estimators:
if OLS assumptions 1-3 hold, then OLS estimators are
unbiased
consistent
asymptotically normal
if in addition the errors ui are homoskedastic, the OLS estimators are also efficient (have smallest variance among all unbiased linear estimators)
Omitted Variable Bias
occurs when two conditions are true:
the omitted variable is correlated with the included regressor
corr(Xi, ui) ≠ 0
therefore E(ui|Xi) ≠ 0 and the first least squares assumption is violated
OLS estimator is thus biased and inconsistent
the omitted variable is a determinant of the dependent variable
Formula:
ρXu = corr(Xi, ui)
suppose the second and third least squares assumptions hold
β̂1 →p β1 + ρXu·(σu/σX)
if ρXu ≠ 0, then even in large samples β̂1 does not converge in probability to β1
the size of the bias depends on the size of ρXu
the direction of the bias depends on the sign of ρXu
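A small simulation sketch of omitted variable bias (all parameter values are made up): when a variable W that is correlated with X and determines Y is left out, the OLS slope on X converges to something other than the true coefficient, consistent with the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# true model: Y = 1 + 2*X + 3*W + e, where W is correlated with X
X = rng.normal(size=n)
W = 0.5 * X + rng.normal(size=n)      # omitted variable, corr(X, W) != 0
e = rng.normal(size=n)
Y = 1.0 + 2.0 * X + 3.0 * W + e

# regress Y on X only: the error term u = 3*W + e is correlated with X
beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
print(beta1_hat)   # converges to about 3.5 here, not the true slope 2.0
```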
Multiple Regression Model
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1, …, n
β1 … coefficient (slope) on X1: by how much Y is expected to change when X1 changes by one unit, holding X2, …, Xk constant
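A sketch of estimating a multiple regression; the data are simulated, and the use of statsmodels (with heteroskedasticity-robust standard errors) is an assumed choice, not prescribed by the notes:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1_000
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([X1, X2]))
res = sm.OLS(Y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
print(res.params)                        # roughly [1.0, 2.0, -0.5]
print(res.rsquared, res.rsquared_adj)
```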
Standard Error of Regression for Multiple regression
SER = sû
where sû² = SSR/(n − k − 1)
k is the number of regressors excluding the constant
division by n-k-1 is the degrees of freedom adjustment
Adjusted R-Squared
adjusted R² = 1 − ((n−1)/(n−k−1))(SSR/TSS)
adding a regressor → two opposite effects
SSR decreases (better fit)
(n−1)/(n−k−1) increases (because k increases)
Properties
adjusted R²<=R²
adjusted R² can increase or decrease with k
adjusted R² can be negative
adjusted R² = 1 − sû²/sY²
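A minimal helper illustrating the adjusted R² formula (the SSR/TSS numbers below are hypothetical):

```python
def adjusted_r2(SSR, TSS, n, k):
    """Adjusted R² = 1 - ((n-1)/(n-k-1)) * (SSR/TSS); k counts regressors excluding the constant."""
    return 1.0 - ((n - 1) / (n - k - 1)) * (SSR / TSS)

# adding a regressor lowers SSR but raises (n-1)/(n-k-1)
print(adjusted_r2(SSR=120.0, TSS=200.0, n=50, k=3))
print(adjusted_r2(SSR=119.5, TSS=200.0, n=50, k=4))  # can fall despite the slightly better fit
```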
Least Squares Assumptions for Multiple Regressions
conditional distribution of ui given X1i, X2i, …, Xki has zero mean: E(ui|X1i, X2i,…, Xki) =0
(X1i, X2i, …, Xki, Yi), i=1,…, n are iid
large outliers are unlikely: 0 < E(X1i⁴) < ∞, …, 0 < E(Xki⁴) < ∞, and 0 < E(Yi⁴) < ∞
there is no perfect multicollinearity
Perfect Multicollinearity
Regressors are said to be perfectly multicollinear if one of the regressors is a perfect linear function of other regressors
Dummy Variable Trap
occurs when there are G binary variables, each observation falls into one and only one category, there is an intercept in the regression, and all G binary variables are included as regressors
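A small sketch of the trap (the category assignments are made up): with an intercept and all G dummies, the dummies sum to the intercept column, so the regressor matrix is rank deficient; dropping one dummy (the base category) avoids it:

```python
import numpy as np

# G = 3 mutually exclusive categories, one per observation
group = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
n = len(group)

D = np.zeros((n, 3))
D[np.arange(n), group] = 1.0                    # one binary variable per category

X_trap = np.column_stack([np.ones(n), D])       # intercept + all 3 dummies
print(np.linalg.matrix_rank(X_trap))            # 3, not 4: perfect multicollinearity

X_ok = np.column_stack([np.ones(n), D[:, 1:]])  # drop one dummy (base category)
print(np.linalg.matrix_rank(X_ok))              # 3: full column rank, trap avoided
```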
Imperfect Multicollinearity
two or more regressors are highly correlated (there is a linear function of regressors that is highly correlated with another regressor)
different from perfect multicollinearity
under imperfect multicollinearity, one or more coefficients will be imprecisely estimated
ρX1,X2 = corr(X1i, X2i)
in a linear regression with regressors X1 and X2: σ²β̂1 = (1/n)·(1/(1 − ρ²X1,X2))·(σ²u/σ²X1)
Properties of OLS Estimators
β̂0, β̂1, …, β̂k are random variables
under least squares assumptions 1–4, β̂0, β̂1, …, β̂k are unbiased and consistent
in large samples, β̂0, β̂1, …, β̂k are jointly normally distributed and β̂j ~ N(βj, σ²β̂j), j = 0, …, k
β̂0, β̂1, …, β̂k are usually correlated
F statistic
hypothesis:
H0: β1 = 0, β2 = 0, …, βk = 0
H1: βj ≠ 0 for at least one j, j = 1, …, k
under the null, none of the regressors explain Y
also called test for significance of regression
k restrictions (q=k)
F ~ F(k, ∞)

Homoskedasticity-only F-statistic
F = [(SSRr − SSRur)/q] / [SSRur/(n − kur − 1)] = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − kur − 1)]
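A minimal sketch of this formula (the SSR values, q, n, and kur below are hypothetical numbers):

```python
from scipy import stats

def f_stat(SSR_r, SSR_ur, q, n, k_ur):
    """Homoskedasticity-only F-statistic comparing restricted and unrestricted models."""
    return ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k_ur - 1))

# hypothetical numbers: q = 2 restrictions, n = 420 observations, k_ur = 3 regressors
F = f_stat(SSR_r=900.0, SSR_ur=850.0, q=2, n=420, k_ur=3)
p_value = stats.f.sf(F, 2, 420 - 3 - 1)   # p-value from the F(q, n - k_ur - 1) distribution
print(F, p_value)
```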
Confidence Set
a 95% confidence set for two or more coefficients is a set that contains the true population values of these coefficients in 95% of randomly drawn samples
Quadratic Regression Model
TestScorei= B0 + B1 Incomei + B2 Incomei² + ui
model is nonlinear in variables but linear in parameters
we can test for the presence of nonlinearity formally:
H0: B2=0 (regression is linear)
H1: B2=/ 0 (regression is quadratic)
Population Regression Function
E(TestScorei|Incomei)=B0+B1Incomei+B2Incomei²
population coefficients are unknown→ need to be estimated
Effect on Y of a change in a regressor
Expected change in Y associated with a change in X1, holding X2,…, Xk constant
Linear model: ∆ Y= B1∆X1
nonlinear model: ΔY = f(X1+ΔX1, X2, …, Xk) − f(X1, X2, …, Xk)
in nonlinear model, expected change depends on the value of X1, X2, …, Xk (whereas in linear models it does not)
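A sketch with hypothetical quadratic-model coefficients showing that the expected change depends on the starting value of the regressor:

```python
def quad_fit(income, b0, b1, b2):
    """Fitted value from the quadratic model b0 + b1*Income + b2*Income^2."""
    return b0 + b1 * income + b2 * income ** 2

# hypothetical estimated coefficients
b0, b1, b2 = 607.3, 3.85, -0.042

# effect of raising Income from 10 to 11 vs. from 40 to 41
dY_10_to_11 = quad_fit(11, b0, b1, b2) - quad_fit(10, b0, b1, b2)   # = b1 + 21*b2
dY_40_to_41 = quad_fit(41, b0, b1, b2) - quad_fit(40, b0, b1, b2)   # = b1 + 81*b2
print(dY_10_to_11, dY_40_to_41)   # different effects for the same one-unit change
```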
Standard errors of estimated effects
Approach 1:
compute the F-statistic for testing H0: β1 + 21β2 = 0 (in the test score example, a change in Income from 10 to 11 gives ΔY = β1(11 − 10) + β2(11² − 10²) = β1 + 21β2)
because q = 1:
F = t² = ((β̂1 + 21β̂2)/SE(β̂1 + 21β̂2))² = (ΔŶ/SE(ΔŶ))²
therefore: SE(ΔŶ) = |ΔŶ|/√F
Approach 2:
transform the regression model in such a way that one of the coefficients in the transformed regression is β1 + 21β2
denote γ = β1 + 21β2
then SE(β̂1 + 21β̂2) = SE(γ̂)
sequential hypothesis testing
Step 1: pick a maximum value of r and estimate the polynomial regression of this r
Step 2: test H0: Br = 0; if this is rejected, X^r belongs to the regression, so use polynomial of degree r
Step 3: if H0: Br=0 is not rejected, eliminate X^r from the regression and estimate a polynomial regression of degree r-1; test whether Br-1=0; if this is rejected, use polynomial of degree r-1
Step 4: continue this procedure until the coefficient on the highest power is statistically significant
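A sketch of one pass of this procedure; the data are simulated so that the true relationship is quadratic, and the use of statsmodels (with robust standard errors) is an assumed choice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(5, 55, size=500)                              # hypothetical regressor
y = 600 + 4 * x - 0.05 * x**2 + rng.normal(0, 10, size=500)   # truly quadratic relationship

# Step 1: start with a cubic (r = 3) and test the coefficient on x^3
X3 = sm.add_constant(np.column_stack([x, x**2, x**3]))
res3 = sm.OLS(y, X3).fit(cov_type="HC1")
print(res3.pvalues[-1])        # p-value on x^3

# Steps 2-3: if the cubic term is not significant, drop it and test the quadratic term
X2 = sm.add_constant(np.column_stack([x, x**2]))
res2 = sm.OLS(y, X2).fit(cov_type="HC1")
print(res2.pvalues[-1])        # p-value on x^2
```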
Important property of logarithm:
ln(x + Δx) − ln(x) ≈ Δx/x (when Δx/x is small)
Logarithmic Regression Models
linear-log model: Yi = β0 + β1 ln(Xi) + ui
log-linear model: ln(Yi) = β0 + β1Xi + ui
log-log model: ln(Yi) = β0 + β1 ln(Xi) + ui
all three models linear in parameters→ OLS methods can be used to estimate unknown values of parameters
Linear-Log model
a 1% change in X is associated with a change in Y of 0.01·β1
Log-linear model
a one-unit change in X is associated with a 100·β1% change in Y
Log-log model
a 1% change in X is associated with a β1% change in Y
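A simulated log-log sketch (made-up data) showing that the estimated slope recovers the elasticity, i.e. the percentage change in Y associated with a 1% change in X:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(1, 100, size=5_000)
# simulate a constant-elasticity relationship: ln(Y) = 1 + 0.4*ln(X) + u
Y = np.exp(1.0 + 0.4 * np.log(X) + rng.normal(0, 0.1, size=5_000))

lx, ly = np.log(X), np.log(Y)
b1_hat = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
print(b1_hat)   # close to 0.4: a 1% increase in X goes with about a 0.4% increase in Y
```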
Comparing Logarithmic Specifications
adjusted R² can be used to compare the log-linear and log-log models
adjusted R² can be used to compare the linear-log and linear models
adjusted R² cannot be used to compare the linear-log and log-log models
Internal Validity
A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied
estimator of causal effect should be unbiased and consistent
hypothesis tests should have the desired significance level
confidence intervals should have the desired confidence level
Threats:
unbiasedness and consistency of B^ coefficients
omitted variable bias
misspecification of functional form
errors-in-variables
sample selection
simultaneous causality
correct estimation of SE (B^)
heteroskedastic errors
errors correlated across observations
External Validity
A statistical analysis is externally valid if its inferences and conclusion can be generalized from the population and settings studied to other populations and settings
Threats:
differences in population
differences in settings
Threat: Omitted Variable Bias
Omitted variable bias arises when omitted variable
determines Y
is correlated with one or more included regressors
OLS estimator is biased and inconsistent
Solutions to omitted variable bias:
omitted variable is observed
including omitted variable reduces possible bias
however if the variable does not belong (its true coefficient is zero), variance of other estimated coefficients increases
variance-bias trade-off
omitted variable is not observed
use instrumental variables (IV) regression (discussed later)
use randomized controlled experiments (discussed later)
use panel data (not discussed in this course)
Threat: Misspecification of Functional Form
Functional form is not specified correctly
e.g. true population function is nonlinear but the estimated regression is linear
OLS estimator biased and inconsistent
a type of omitted variable bias
Threats: Errors-In-Variables
Data on regressors may be recorded with error:
respondents in survey give wrong answers
typos
wrong data downloaded
this is called measurement error
leads to error-in-variables bias
the error term is correlated with the regressor → β̂1 is biased and inconsistent
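A simulation sketch of one specific case, classical measurement error, with made-up parameters: when the regressor is observed with noise, the OLS slope is biased toward zero:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

X_true = rng.normal(0, 1, size=n)
Y = 1.0 + 2.0 * X_true + rng.normal(0, 1, size=n)   # true slope = 2

# regressor recorded with classical measurement error
X_obs = X_true + rng.normal(0, 1, size=n)

b_true = np.sum((X_true - X_true.mean()) * (Y - Y.mean())) / np.sum((X_true - X_true.mean()) ** 2)
b_obs = np.sum((X_obs - X_obs.mean()) * (Y - Y.mean())) / np.sum((X_obs - X_obs.mean()) ** 2)
print(b_true, b_obs)   # b_obs is attenuated toward zero (about 1 here instead of 2)
```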
Solutions:
get an accurate measure of X
use instrumental variables (correlated with Xi but uncorrelated with measurement error) - discussed later
develop a model of the measurement error and estimate parameters
requires knowledge of type of measurement error
ad hoc
not discussed here
Threats: Sample Selection
when availability of data is influenced by a selection process that is related to the value of the dependent variable
introduces correlation between error term and regressor → selection bias
Threats Simultaneous Causality
in our models so far, X was causing Y
but the causality can also run the other way round (Y causes X)
simultaneous causality
OLS estimators biased and inconsistent
simultaneous causality bias
Solutions:
use instrumental variables regression - discussed later
use randomized controlled experiments - discussed later
Threats: Heteroskedasticity
solution: use heteroskedasticity-robust standard error formula
Threats: Serially Correlated Errors
e.g. if a school performs better than average one year, it will probably do so also next year
arises mostly in time series data and panel data
second least squares assumption violated
solution: use heteroskedasticity- and serial correlation- robust formula for standard error
Average Causal Effect
Causal effect can be different for each individual
the effects of a drug can depend on
age
whether smoking
other health conditions
solution: estimate mean causal effect in population
often sufficient
Ideal Randomized Control Experiment
select individuals at random from population
the potential outcomes and causal effects in the sample are drawn from the same distribution as in the population
so the expected value of the causal effect in the sample is the same as the average causal effect in the population
assign individuals randomly to treatment or control group
an individual’s treatment status is independent of their potential outcomes
so the expected value of the outcome for those treated minus the expected value of the outcome for those not treated equals the expected value of the causal effect
Checking for Balance
Characteristics of people in the treatment group similar to those in control group
if the differences are not significant, randomization appears to have worked
experiment well designed
Notation
Xi… treatment indicator variable
Xi=1…treatment
Xi=0… no treatment
Yi…observed outcome
Y1i…potential outcome when treatment received
Y0i…potential outcome when no treatment received
Y1i-Y0i…causal effect of treatment
E(Y1i-Y0i)…average causal effect
if observations on Yi and Xi come from an ideal randomized control trial
E(Y1i-Y0i)=E(Yi|Xi=1)-E(Yi|Xi=0)
Differences Estimator
the difference in sample averages for treatment and control groups
can be computed by regressing the outcome variable Yi on binary treatment indicator Xi:
Yi=B0+B1Xi+ui, i=1,…,n
if Xi is randomly assigned then E(ui|Xi)=0 and the OLS estimator of causal effect B1 is unbiased and consistent
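A simulation sketch (made-up data) showing that the difference in group means and the OLS slope on the treatment indicator coincide:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
X = rng.integers(0, 2, size=n)                 # random treatment assignment (0/1)
Y = 50 + 5 * X + rng.normal(0, 10, size=n)     # true average causal effect = 5

# differences estimator: difference in sample averages...
diff_means = Y[X == 1].mean() - Y[X == 0].mean()

# ...equals the OLS slope from regressing Y on the treatment indicator
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
print(diff_means, b1_hat)                      # identical up to floating-point error
```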
Differences Estimator with Additional Regressors
adding regressors→ improved efficiency
including control variables W:
Yi = β0 + β1Xi + β2W1i + … + β1+rWri + ui, i = 1, …, n
variables W must be such that ui satisfies
E(ui|Xi, Wi) = E(ui|Wi)
conditional mean independence
satisfied if Wi are pretreatment characteristics and Xi is randomly assigned
variables Wi do not have causal interpretation
Threats to Internal Validity: Failure to Randomize
treatment not assigned randomly
based on the characteristics or preferences of subject
nonrandom assignment leads to correlation of Xi and ui → biased estimator of the treatment effect
Threats to Internal Validity: Failure to Follow the Treatment Protocol
people do not always follow the treatment
failure to follow the protocol completely: partial compliance
element of choice in whether the subject receives a treatment
Xi can be correlated with ui even with initial random assignment
bias in OLS estimator
Threats to Internal Validity: Attrition
Subjects dropping out after their assignment
dropping out may be unrelated to the treatment program
e.g. leave to care for a sick relative
this does not cause a bias
but if dropping out is related to the treatment, Xi becomes correlated with ui and the estimator is biased
Threats to Internal Validity: Experimental Effects
merely being in the experiment can change behavior of subjects
Hawthorne effect
double blind protocol can mitigate this effect
neither the subject nor the experimenter knows whether the subject receives the treatment or not
economics: double blind experiments often infeasible
both experimenter and subject know in which group the subject is
Threats to Internal Validity: Small Sample Sizes
experiments with human subjects are expensive
sample sizes sometimes small
small size does not cause bias
but causal effects are estimated imprecisely
inference can be misleading
Threats to External Validity: Nonrepresentative Sample
population studied and population of interest must be similar
e.g. training program with former prison inmates does not generalise to workers who have never committed a crime
Threats to External Validity: Nonrepresentative Program or Policy
the policy or program of interest must be similar to program studied
example:
program studied: small scale, tightly monitored experiment
program implemented: scaled-up, not the same quality control, less well funded→ not as effective
another difference in programs: duration
Threats to External Validity: General Equilibrium Effects
turning a small, temporary experimental program into a widespread, permanent program might change economic environment
the results cannot be generalized
small program: internally valid, measure causal effect holding constant the market or policy environment
large program: these factors are not held constant