selection effect/bias
selection of a sample that is not randomized, so it is biased
treatment groups
those who get some treatment of interest in an experiment
control group
those who do not get the treatment of interest
observational study
research where you don't get to randomize who gets the treatment. just observing some relationship in the world
experimental study/randomized control
research designs in which you can randomize who gets the treatment
quasi-experimental research
research in which you have observational data, but you find ways to ensure that the treatment was effectively randomly distributed
internal validity
is the experiment well designed? free from confounders or bias?
external validity
is the finding applicable to other populations, situations, or cases? does it apply outside the context of the research?
Yi
dependent variable, the thing we want to predict
Xi
independent variable, the thing that predicts the DV
Ei
error term, the part the DV or IV doesn't explain, everything NOT in our model. not directly observable
endogeneity
the IV is correlated with the error term
confounder
another unmeasured variable that affects the IV and then also affects the DV
randomness
noise in the data, could go away with larger sample sizes
randomization
randomizing to create treatment and control groups, creates exogeneity
distribution of outcome
the idea that you could observe different outcomes with different probabilities, even when you only observe one outcome
population
overall collection of individuals, beyond just the sample
sample
collection of individuals on which statistical analyses are performed, and from which general trends for the population are inferred
individual
object or unit, single data point contributing to the sample
random variable
a variable whose value depends on the outcome of a random process; the thing being measured in a random experiment
expectation
the best guess about what number will be drawn from the distribution, loosely thought of as the "average" of the distribution
variance
how far the numbers you draw tend to be from that best guess
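A minimal Python sketch of these two ideas, using only the standard library and a fair die as the (hypothetical) distribution:

```python
import random
import statistics

# Expectation and variance sketch: simulate many rolls of a fair die.
random.seed(0)
draws = [random.randint(1, 6) for _ in range(100_000)]

# Expectation: the best guess / long-run average. For a fair die,
# (1 + 2 + ... + 6) / 6 = 3.5.
print(statistics.mean(draws))       # close to 3.5

# Variance: how far draws tend to fall from that best guess.
# For a fair die, 35/12 ≈ 2.92.
print(statistics.pvariance(draws))  # close to 2.92
```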
central limit theorem (CLT)
the distribution of the sample mean tends toward a normal distribution as the sample size grows
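A standard-library simulation sketch of the CLT: even when individual draws come from a skewed distribution (here, exponential with mean 1), the means of repeated samples pile up symmetrically around the population mean.

```python
import random
import statistics

# CLT sketch: take 5000 samples of size 200 from a skewed distribution
# and look at the distribution of the sample means.
random.seed(1)
n, reps = 200, 5000
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# Sample means center on the population mean of 1.0 ...
print(statistics.mean(means))
# ... with spread close to sigma / sqrt(n) = 1 / sqrt(200) ≈ 0.071,
# and a roughly bell-shaped histogram.
print(statistics.stdev(means))
```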
regression model
Yi = B0 + B1Xi + Ei
B1
slope coefficient, relationship between X and Y
B0
constant, value of Y when X is zero (intercept)
covariance
measures how much two random variables vary together
positive correlation
when X is higher, expect Y is higher
negative correlation
when X is higher, we expect Y is lower
not associated
when X is higher, doesn't tell us anything about Y
correlation
measures the extent to which two variables are linearly related to each other
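A small sketch of both definitions, computed by hand on hypothetical paired samples (numbers chosen so the relationship is clearly positive):

```python
import statistics

# Hypothetical paired observations with a positive relationship.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mx, my = statistics.mean(x), statistics.mean(y)

# Covariance: average co-movement of X and Y around their means.
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

# Correlation rescales covariance to the range [-1, 1].
corr = cov / (statistics.stdev(x) * statistics.stdev(y))
print(cov, corr)  # positive covariance, correlation near +0.85
```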
bivariate regression
technique to estimate a model with two variables (DV and IV), allows us to quantify the degree to which X and Y move together
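A simulation sketch of bivariate regression (standard library only): generate data from the model Yi = B0 + B1*Xi + Ei with hypothetical values B0 = 2 and B1 = 0.5, then recover both coefficients with the usual least-squares formulas.

```python
import random
import statistics

# Simulate Yi = 2 + 0.5 * Xi + Ei with normal noise.
random.seed(2)
x = [random.uniform(0, 10) for _ in range(2000)]
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]

mx, my = statistics.mean(x), statistics.mean(y)

# Slope: covariance of X and Y over the variance of X.
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
# Intercept: the fitted line passes through the point of means.
b0 = my - b1 * mx

print(b0, b1)  # close to 2 and 0.5
```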
omitted variable bias
specific form of endogeneity, and often why estimates change when the model specification changes: X is correlated with an omitted variable that also influences Y, so X is correlated with the error term
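A simulation sketch of omitted variable bias (hypothetical setup): a confounder Z raises both X and Y, so regressing Y on X alone folds Z's effect into the slope.

```python
import random
import statistics

# Z affects both X and Y. The true effect of X on Y is 1.0.
random.seed(4)
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1)
     for xi, zi in zip(x, z)]

# Bivariate OLS of Y on X, omitting Z.
mx, my = statistics.mean(x), statistics.mean(y)
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))

# Omitting Z biases the estimate toward
# 1 + 2 * cov(X, Z) / var(X) = 2.0, double the true effect.
print(b1)
```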
unbiased estimate
on average, our estimate is equal to the true parameter
biased estimate
our coefficient is systematically wrong, either too high or too low relative to the true parameter
robust
whether the estimate stays roughly the same when the model specification changes
outlier
observation that is extremely different from the rest of the observations in the sample. this drags the estimate of the mean/slope towards it.
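A quick sketch of the drag an outlier exerts (hypothetical numbers): one extreme value pulls the mean toward it, while the median barely moves.

```python
import statistics

data = [10, 11, 9, 10, 12, 10, 11]
print(statistics.mean(data))            # ≈ 10.43

# Add a single extreme observation.
print(statistics.mean(data + [100]))    # mean dragged up to 21.625
print(statistics.median(data + [100]))  # median stays at 10.5
```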
null distribution
the distribution of results we would expect if there really were no difference; tells us how unusual our observed result is under the null
p-value
probability of observing a difference in mean or a coefficient as big as what we observed, if the null hypothesis were true
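A permutation-test sketch tying the two cards together (hypothetical group scores): shuffling group labels builds a null distribution by construction, and the p-value is the share of shuffled differences at least as extreme as the observed one.

```python
import random
import statistics

# Hypothetical outcomes for a treatment and a control group.
random.seed(3)
treat   = [5.1, 6.0, 5.8, 6.3, 5.9, 6.1, 5.7, 6.4]
control = [5.0, 5.2, 4.9, 5.5, 5.3, 5.1, 5.4, 5.2]
observed = statistics.mean(treat) - statistics.mean(control)

# Null distribution: shuffle the labels so "no difference" holds.
pooled = treat + control
null_diffs = []
for _ in range(10_000):
    random.shuffle(pooled)
    null_diffs.append(statistics.mean(pooled[:8])
                      - statistics.mean(pooled[8:]))

# p-value: share of null draws at least as extreme as observed.
p = sum(abs(d) >= abs(observed) for d in null_diffs) / len(null_diffs)
print(observed, p)  # a small p suggests the difference is not noise
```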
critical value
how different two distributions need to be before you conclude they aren't from the same distribution
significantly different
a difference large enough to exceed the critical-value cutoff, so we reject the idea that the two samples come from the same distribution
null hypothesis
no effect, no difference in means, no relationship between X and Y, B1 = 0
alternative hypothesis
likely an effect, there is likely a difference in means and relationship between X and Y, B1 DOES NOT = 0
type 1 error
false positive, when we reject a null hypothesis that is actually true. saying there is a relationship when there isn't.
type 2 error
false negative, when we fail to reject a null hypothesis that is actually false, saying there isn't a relationship when there is (often due to small sample size / low statistical power)
substantive significance
the relationship needs to be large enough to matter
statistically significant
reject the null based on the critical-value cutoff: likely a difference in means and a relationship between X and Y, B1 ≠ 0
discrete data
comes in 'bins' or groups. ex. on a scale of 1 to 5, how much do you like this class?
continuous data
can take any value in a sequence. ex. annual income, votes for each candidate
categorical data
descriptive, describes how the world is, comes from qualitative research; can be ordinal (ordered: low, medium, high) or nominal (cannot be ordered: majors)
case study
intensive study of a single spatial and temporal phenomenon
elite interviews
asking people who were involved in the political event or issue about what happened when and why, usually semi-structured and open-ended questions
focus groups
asking a group of people what they think about a given issue
natural experiments
when a researcher identifies a situation in which values of the independent variable have been determined by a random process
goodness of fit
how much of Y does X explain, related to r-squared
residual
part of Y that X doesn't explain, tells us what isn't there
qualitative research
what mechanisms/processes lead to this outcome vs. another, outliers
quantitative research
effects, population-level relationships: what is the relationship on average between two variables? (ex. does poverty predict …)
difference of means test
comparing the mean of Y for one group against the mean of Y for another
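A by-hand sketch of a difference-of-means (Welch) t-test on hypothetical scores for two groups, using only the standard library:

```python
import math
import statistics

# Hypothetical scores for two groups.
a = [72, 75, 78, 71, 74, 77, 73, 76]
b = [68, 70, 69, 72, 71, 67, 70, 69]

# Difference in group means.
diff = statistics.mean(a) - statistics.mean(b)

# Standard error of that difference (Welch: unequal variances).
se = math.sqrt(statistics.variance(a) / len(a)
               + statistics.variance(b) / len(b))

t = diff / se
print(diff, t)  # |t| well above ~2 suggests a significant difference
```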
categorical variable
has two or more categories but no ordering