*
comment line
use command
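loads a Stata data file into memory
summarize command
reports summary statistics (number of observations, mean, standard deviation, minimum, maximum) for the listed variables, e.g. summarize X1 X2 Y
drop command
deletes a variable (or observations) from the data in memory
regress command
runs an OLS regression; the dependent variable is listed first, followed by the independent variable(s)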
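A minimal do-file sketch of the commands above. It assumes Stata's bundled example dataset (auto), loaded with sysuse, rather than the course data; the file name in the commented use line is hypothetical.

```stata
* comment lines start with an asterisk
* use "mydata.dta", clear    (hypothetical file name; use loads a .dta file)
sysuse auto, clear

* summary statistics for three variables
summarize price mpg weight

* delete a variable from the data in memory
drop trunk

* regress Y on X: dependent variable first, then the independent variable
regress price mpg
```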
independent variable
variable that possibly influences the value of the dependent variable, X
dependent variable
the outcome of interest dependent on the independent variable, Y
slope coefficient
coefficient that reflects how much the dependent variable changes when the independent variable increases by one unit, B1
constant/intercept
the point at which a regression line crosses the Y-axis, B0
error term
term associated with unmeasured factors in a regression model, E (epsilon)
i
the observation number
population regression equation
Yi = B0 + B1Xi + Ei
endogenous
when changes in the independent variable are related to factors in the error term (outside factors)
exogenous
when changes in the independent variable are NOT related to factors in the error term
why exogenous > endogenous?
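when the independent variable is exogenous, the estimated coefficient can reasonably be read as its effect on Y; when it is endogenous, the estimate also picks up the influence of factors in the error term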
correlation
measures the extent to which two variables are linearly related to each other
positive correlation
high values of one variable are associated with high values of the other, normally upward trends
negative correlation
high values of one variable are associated with low values of the other, normally downward trends
two challenges of econometrics
randomness and endogeneity
randomization
the process where the value of the independent variable is determined by a random process, like chance, making the independent variable uncorrelated with everything else, including the factors in the error term
randomized controlled trial
experiment where the treatment of interest is randomized; good because randomization creates exogeneity
treatment group
group that receives treatment of interest
control group
group that does not receive treatment of interest
internal validity
when research findings are not biased
external validity
when research findings can be applied outside of the context in which the experiment was conducted
standard deviation
measures how widely dispersed the values of the observations are
replication
research that can be duplicated based on the information provided at the time of publication using replication files
how to open a Do file
click on Window, Do-file Editor, New Do-file Editor, then save as "assignment.do"
how to load Stata data files
file, open, select file, save command in syntax file "C
tabulate command
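produces a frequency table for the variable named after the command
list command
lists the values of each observation; naming variables after the command limits the listing to those variables
equal to operator
==
not equal to operator
!=
if qualifier
limits the data used in an analysis to observations that meet a condition
scatter y x command
plots a scatterplot of two variables; the first variable goes on the vertical axis, the second on the horizontal axis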
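A short sketch of these commands used together. It assumes Stata's bundled auto dataset, which is not part of the original notes; foreign is a 0/1 variable in that dataset.

```stata
sysuse auto, clear

* frequency table of a single variable
tabulate foreign

* list selected variables, only for observations where foreign equals 1
list make price mpg if foreign == 1

* summary statistics, only for observations where foreign is not 1
summarize price if foreign != 1

* scatterplot with price on the vertical axis and mpg on the horizontal axis
scatter price mpg
```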
fitted value
value of Y predicted by the estimated equation, Yi hat = B0 hat + B1 hat Xi
what do hats mean in equations
hats are estimates
regression line
fitted line from regression
residual
distance between the fitted value and the actual observed value, Ei hat = Yi - B0 hat - B1 hat Xi
OLS estimation strategy
minimizing the sum of the squared distances of each data point from the regression line, using calculus to find the solution
OLS formula
$\sum_{i=1}^{N} \hat{\epsilon}_i^{2} = \sum_{i=1}^{N} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^{2}$
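The fitted values, residuals, and sum of squared residuals can be inspected after a regression. This is a sketch that assumes Stata's bundled auto dataset; the new variable names (price_hat, e_hat, e_hat_sq) are made up for illustration.

```stata
sysuse auto, clear
regress price mpg

* fitted values: Yi hat = B0 hat + B1 hat Xi
predict price_hat

* residuals: Yi minus the fitted value
predict e_hat, residuals

* sum of squared residuals, the quantity OLS minimizes
generate e_hat_sq = e_hat^2
summarize e_hat_sq
display r(sum)

* regress stores the same quantity as e(rss)
display e(rss)
```

The two displayed numbers should agree up to rounding.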
Sampling randomness
variations in estimates seen in a subset of an entire population
Modeled randomness
variation that exists even when observing an entire population
central limit theorem
the average of any random variable follows an approximately normal distribution when the sample is large enough; a histogram of such averages should look like a normal distribution
variance
a measure of how much a random variable varies
standard error
the square root of the variance of an estimate
variance of the regression
the variance of the regression measures how well the model explains variation in the dependent variable
homoscedastic
when a random variable has the same variance for all observations
heteroscedastic
when some observations are on average closer to the predicted value than others
goodness of fit
how well a model fits the data; commonly summarized by R^2
standard error of regression/goodness of fit
a measure of goodness of fit as the square root of the variance of the regression; σ hat
outliers
observations that are extremely different from the rest of the sample
sample size and outliers
when sample sizes are small, a single outlier can exert considerable influence on OLS coefficient estimates
hypothesis testing
process assessing whether the observed data is or is not consistent with a claim of interest
null hypothesis
a hypothesis of no effect; H0: B1 = 0
alternative hypothesis
the outcome that is accepted if null hypothesis is rejected
reject null hypothesis
the data provide sufficient evidence against the null hypothesis, so it is rejected in favor of the alternative hypothesis
fail to reject null hypothesis
there is not enough evidence to prove the null hypothesis is false
type 1 error
rejecting a null hypothesis that is really true
type 2 error
failing to reject a null hypothesis that is actually false
significance level
probability of committing a type 1 error; common is α=0.05
trade off between type 1 and 2 errors
lowering the significance level decreases the probability of making a type 1 error while increasing the probability of making a type 2 error
steps to do a hypothesis test
choose a one-sided or two-sided alternative hypothesis; set a significance level α; find a critical value based on the t distribution; use OLS to estimate parameters; reject the null hypothesis if the t statistic is more extreme than the critical value
confidence interval
range of true values that are most consistent with the observed coefficient estimate
stata commands for hypothesis test
1. display invttail(degrees of freedom, significance level (/2 if two-tailed)), where degrees of freedom = number of observations - number of parameters 2. display invnormal(1 - significance level (/2 if two-tailed)) 3. display 2*ttail(degrees of freedom, absolute value of the observed t statistic) 4. set obs (number of observations) 5. graph twoway
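A sketch of a two-sided test of H0: B1 = 0 at α = 0.05 using these functions. It assumes Stata's bundled auto dataset; the scalar names dof and tstat are made up for illustration.

```stata
sysuse auto, clear
regress price mpg

* degrees of freedom = number of observations - number of parameters
scalar dof = e(df_r)

* t statistic for the coefficient on mpg
scalar tstat = _b[mpg] / _se[mpg]

* critical value from the t distribution (two-sided, alpha = 0.05)
display invttail(dof, 0.05/2)

* large-sample critical value from the normal distribution, roughly 1.96
display invnormal(1 - 0.05/2)

* two-sided p-value for the observed t statistic
display 2*ttail(dof, abs(tstat))
```

The null hypothesis is rejected when the absolute value of tstat exceeds the critical value, or equivalently when the p-value is below 0.05.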
multivariate OLS
OLS with multiple independent variables
multivariate OLS and endogeneity
fights endogeneity by pulling variables from the error term into the estimated equation
ceteris paribus
all else being equal
auxiliary regression
a regression that is not directly the one of interest but yields information helpful in analyzing the equation we really care about
omitted variable bias
bias that results from leaving out a variable that affects the dependent variable and is correlated with the independent variable
measurement error
when a variable is measured inaccurately; more consequential in an independent variable, where it biases the coefficient estimate, than in the dependent variable
attenuation bias
bias toward zero in a coefficient estimate caused by measurement error in the independent variable; grows larger with a larger amount of measurement error
multicollinearity
when there are strong linear relationships between independent variables
factors that influence the variance of multivariate estimates
model fit, sample size, variation in the independent variable, multicollinearity
standardized coefficients
the coefficient of an independent variable that has been standardized
standardizing variable formula
$\text{Variable}^{\text{Standardized}} = \dfrac{\text{Variable} - \overline{\text{Variable}}}{\mathrm{sd}(\text{Variable})}$
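One way to standardize a variable in Stata, sketched with the bundled auto dataset (an assumption, not the course data). The new variable names mpg_std and mpg_std2 are made up for illustration.

```stata
sysuse auto, clear

* by hand: subtract the mean and divide by the standard deviation
summarize mpg
generate mpg_std = (mpg - r(mean)) / r(sd)

* the same result with egen's std() function
egen mpg_std2 = std(mpg)

* both versions should have mean 0 and standard deviation 1
summarize mpg_std mpg_std2

* the beta option reports standardized coefficients directly
regress price mpg weight, beta
```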
difference of means test
comparing the mean of Y for one group against the mean of Y for another group
dummy variable
a variable that takes only the values 0 or 1
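A difference of means test can be run directly or as a regression on a dummy variable. This sketch assumes Stata's bundled auto dataset, where foreign is a 0/1 dummy.

```stata
sysuse auto, clear

* compare the mean of price across the two groups of foreign
ttest price, by(foreign)

* the same comparison as a regression: the coefficient on foreign is the
* difference between the two group means, and the constant is the mean
* of price for the group where foreign equals 0
regress price foreign
```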
categorical variables
have two or more categories with no intrinsic ordering
ordinal variables
express rank but not necessarily relative size