selection effect/bias
selection of a sample that is not randomized, so the sample is biased
treatment groups
those who get some treatment of interest in an experiment
control group
those who do not get the treatment of interest
observational study
research where you don't get to randomize who gets the treatment. just observing some relationship in the world
experimental study/randomized control
research designs in which you can randomize who gets the treatment
quasi-experimental research
research in which you have observational data, but you find ways to ensure that the treatment was effectively randomly distributed
internal validity
is the experiment well designed? free from confounders or bias?
external validity
is the finding applicable to other populations, situations, or cases? does it apply outside the context of the research?
Yi
dependent variable, the thing we want to predict
Xi
independent variable, the thing that predicts the DV
Ei
error term, the part of the DV that the IV doesn't explain; everything NOT in our model. not directly observable
endogeneity
the IV is correlated with the error term
confounder
an unmeasured variable that affects both the IV and the DV
randomness
noise in the data, could go away with larger sample sizes
randomization
randomizing to create treatment and control groups, creates exogeneity
distribution of outcome
the idea that you could observe different outcomes with different probabilities, even when you only observe one outcome
population
overall collection of individuals, beyond just the sample
sample
collection of individuals on which statistical analyses are performed, and from which general trends for the population are inferred
individual
object or unit, single data point contributing to the sample
random variable
the thing being measured in any random experiment
expectation
the best guess about what number will be drawn from the distribution, loosely thought of as the "average" of the distribution
variance
how far the numbers you draw tend to be from that best guess
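In symbols (a notational sketch; the sum form assumes a discrete random variable):

$$E[X] = \sum_x x \, P(X = x), \qquad \mathrm{Var}(X) = E\big[(X - E[X])^2\big]$$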
central limit theorem (CLT)
the distribution of the sample mean tends toward a normal distribution as the sample size grows, regardless of the shape of the underlying distribution
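A minimal simulation sketch of this idea in Python (illustrative numbers, not from the cards): repeated sample means from a skewed distribution still end up looking roughly normal.

```python
# CLT sketch: sample means from a skewed (exponential) distribution
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of size n = 50; take the mean of each sample
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# The means cluster around the true mean (1.0) with spread near 1/sqrt(50),
# and a histogram of them looks approximately normal.
print(sample_means.mean(), sample_means.std())
```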
regression model
Yi = B0 + B1Xi + Ei
B1
slope coefficient, relationship between X and Y
B0
constant, value of Y when X is zero (intercept)
covariance
measures how much two random variables vary together
positive correlation
when X is higher, we expect Y to be higher
negative correlation
when X is higher, we expect Y to be lower
not associated
when X is higher, it tells us nothing about Y
correlation
measures the extent to which two variables are linearly related to each other
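Written out (a notational sketch consistent with the two cards above):

$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big], \qquad \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$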
bivariate regression
technique to estimate a model with two variables (DV and IV), allows us to quantify the degree to which X and Y move together
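A minimal sketch of a bivariate regression in Python (made-up data; the variable names and true coefficients are illustrative assumptions):

```python
# Bivariate OLS sketch: estimate Yi = B0 + B1*Xi + Ei on simulated data
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)   # true B0 = 2.0, B1 = 0.5

X = sm.add_constant(x)                     # adds the intercept (B0) column
results = sm.OLS(y, X).fit()
print(results.params)                      # estimates of B0 and B1
print(results.pvalues)                     # p-values for H0: coefficient = 0
```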
omitted variable bias
specific form of endogeneity, often why estimates change when the model changes: X is correlated with an omitted variable that also influences Y (so the error term is correlated with X)
unbiased estimate
on average, our estimate is equal to the true parameter
biased estimate
our coefficient is systematically wrong, either too high or too low relative to the true parameter
robust
whether the estimate stays roughly the same when the model specification changes
homoskedasticity
when the error term has the same variance for all observations (constant error variance). not a problem
heteroskedasticity
when the error term DOES NOT have the same variance for all observations. fixable problem. this means non-constant variance in our errors
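A minimal sketch of one common fix (heteroskedasticity-robust standard errors); the data-generating process here is a made-up assumption where the error variance grows with X:

```python
# Robust standard errors sketch: error variance depends on x (heteroskedasticity)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=np.abs(x) + 0.1, size=200)  # non-constant error variance

X = sm.add_constant(x)
robust = sm.OLS(y, X).fit(cov_type="HC1")  # coefficients unchanged, SEs made robust
print(robust.bse)                          # robust standard errors for B0, B1
```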
outlier
observation that is extremely different from the rest of the observations in the sample. this drags the estimate of the mean/slope towards it.
null distribution
the distribution of the test statistic we would expect if there really is no difference; used to judge how unusual our result is
p-value
probability of observing a difference in mean or a coefficient as big as what we observed, if the null hypothesis were true
critical value
a point on a distribution that defines the boundary between rejecting and not rejecting the null hypothesis in a hypothesis test
significantly different
a difference large enough that the test statistic falls beyond the critical value
null hypothesis
no effect, no difference in means, no relationship between X and Y, B1 = 0
alternative hypothesis
likely an effect, there is likely a difference in means and relationship between X and Y, B1 DOES NOT = 0
type 1 error
false positive, when we reject a null hypothesis that is actually true. saying there is a relationship when there isn't.
type 2 error
false negative, when we fail to reject a null hypothesis that is actually false, saying there isn't a relationship when there is. (often due to small sample size, low-power study)
substantive significance
the relationship needs to be large enough to matter
power limitations
larger sample = more power (and vice versa); higher variance = harder to detect relationships, so high-variance outcomes need a larger sample size
statistically significant
reject the null and accept the alternative, based on the critical value cutoff; likely a difference in means and a relationship between X and Y, B1 DOES NOT = 0
irrelevant variable
adding a variable to regression that doesn't actually explain Y, will not cause bias but will eat up degrees of freedom
model specification
choosing what variables to include in the model
binary/dummy variables
useful, used in experiments to identify the treated (1) and control (0) units, make difference in means across two groups easy to calculate
discrete data
comes in 'bins' or groups. ex. on a scale of 1 to 5, how much do you like this class?
continuous data
can take any value within a range. ex. annual income, votes for each candidate
categorical data
descriptive, describes how the world is, comes from qualitative research, can be ordinal (ordered: low, medium, high) or nominal (cannot be ordered: majors)
cross-sectional data
sample of a population in a given period of time
repeated cross-sectional data
taking different samples of a population over time
panel (time-series) data
observing the same units (the same individuals) repeatedly over time
fixed effects model
controls for unit-specific effects and time-period effects. benefit: gives more leverage in identifying causal relationships by looking at changes within a single unit over time instead of comparing across units
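A minimal sketch of a two-way fixed effects model in Python (synthetic panel; the column names unit, year, x, y are illustrative assumptions):

```python
# Fixed effects sketch: unit and year dummies absorb unit- and time-specific effects
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame([(u, t) for u in range(10) for t in range(5)], columns=["unit", "year"])
df["x"] = rng.normal(size=len(df))
df["y"] = 0.5 * df["x"] + 0.3 * df["unit"] + 0.2 * df["year"] + rng.normal(size=len(df))

fe = smf.ols("y ~ x + C(unit) + C(year)", data=df).fit()
print(fe.params["x"])   # within-unit estimate of the effect of x on y
```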
case study
intensive study of a single spatial and temporal phenomenon
cross-case study
study of several cases to compare a phenomenon across space and time
process tracing
attempts to identify the intervening causal process between an IV (or several IVs) and the outcome of the DV; qualitative method, how X becomes Y
elite interviews
asking people who were involved in the political event or issue about what happened when and why, usually semi-structured and open-ended questions
focus groups
asking a group of people what they think about a given issue
A: attrition
experiment problem 1: units drop out of the experiment, so we never observe their outcome variable
B: balance
experiment problem 2: do the covariates (control variables) have the same mean across the two groups?
C: compliance
experiment problem 3: whether units actually receive the treatment they were assigned to
natural experiments
when a researcher identifies a situation in which values of the independent variable have been determined by a random process
goodness of fit
how much of Y does X explain, related to r-squared
residual
the part of Y that X doesn't explain; tells us what the model misses
qualitative research
asks what mechanisms/processes lead to this outcome rather than another; well suited to studying outliers
quantitative research
effects, population-level relationship, what is the relationship on average between two variables? (ex. does poverty predict)
difference of means test
comparing the mean of Y for one group against the mean of Y for another
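A minimal sketch of a difference-of-means test in Python (made-up treated and control outcomes):

```python
# Difference-of-means sketch: compare mean outcomes across two groups
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treated = rng.normal(loc=1.2, scale=1.0, size=100)   # hypothetical treated outcomes
control = rng.normal(loc=1.0, scale=1.0, size=100)   # hypothetical control outcomes

t_stat, p_value = stats.ttest_ind(treated, control)  # two-sample t-test
print(t_stat, p_value)   # small p-value -> evidence against equal means
```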
categorical variable
has two or more categories but no ordering
ordinal variable
expresses rank but not relative size
reference category
the omitted category; coefficients on all the included dummy variables indicate how much higher or lower the DV is relative to this
blocking
grouping units by their covariates before randomizing so that treatment and control groups are balanced on those covariates
intention to treat analysis
analyzes units according to the treatment they were assigned, not the treatment they actually received; addresses potential endogeneity that arises from non-compliance
balance table
a method to compare the characteristics of a treatment group and a control group, typically used in experimental or matching studies
P-value
If the p-value is less than the significance level (typically 0.05), the evidence against the null hypothesis is considered strong, and the null hypothesis is often rejected in favor of the alternative hypothesis.
T-Test
analyzes whether the difference between the means of two sample groups is significant (or compares a sample mean against a known reference mean)
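And a minimal sketch of the reference-mean version (a one-sample t-test against a known value; the sample and the reference mean 5.0 are made up):

```python
# One-sample t-test sketch: is the sample mean different from a known reference mean?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=2.0, size=50)

t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)  # H0: true mean = 5.0
print(t_stat, p_value)
```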