t-statistic represents
how many standard errors the estimate is from zero
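A minimal sketch of this definition, assuming the estimate and its standard error are already known. The function name `t_statistic` and the example numbers are illustrative, not taken from any particular regression output.

```python
def t_statistic(estimate, standard_error, hypothesized_value=0.0):
    """Return how many standard errors the estimate lies from the hypothesized value."""
    return (estimate - hypothesized_value) / standard_error


# Hypothetical coefficient of 1.5 with a standard error of 0.5
t = t_statistic(1.5, 0.5)
print(t)
```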
omitted variable bias
arises when a relevant variable is left out
a correctly specified model should reflect a plausible set of assumptions and should be complete
very hard to detect OVB
A model with a higher R² means
the model explains more of the variance in the dependent variable
data mining
selectively reporting significant results
scouring through datasets for (partial) correlations from which to develop a thesis
how can outliers and violations of normality affect statistical inference
outliers and non-normal data can distort estimates and invalidate hypothesis tests. Check residual plots, use transformations, or robust methods
consider a regression output where R² = 0.85 and TSS = 1000, what is the ESS?
ESS = R² × TSS = 0.85 × 1000 = 850
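The same arithmetic as a short sketch. It assumes the usual decomposition TSS = ESS + RSS, so the residual sum of squares follows as well. The function name is illustrative.

```python
def ess_and_rss(r_squared, tss):
    """Split TSS into explained (ESS) and residual (RSS) parts, assuming TSS = ESS + RSS."""
    ess = r_squared * tss
    rss = tss - ess
    return ess, rss


ess, rss = ess_and_rss(0.85, 1000)
print(ess, rss)
```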
three main types of distributional tests
seeing where a value fits within a distribution (determines how unusual or typical a single value is compared to the rest of the data)
estimating a population value from a sample with a stated level of certainty (estimating population parameters from a sample; the larger the sample, the better our estimates)
comparing two sample means (involves testing whether two groups are significantly different from each other)
z-score
number of standard deviations away from the mean
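As a quick sketch, the z-score is the distance from the mean divided by the standard deviation. The numbers below are made up for illustration.

```python
def z_score(value, mean, std_dev):
    """Return the number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Hypothetical example: a score of 130 on a scale with mean 100 and standard deviation 15
z = z_score(130, 100, 15)
print(z)
```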
confidence intervals
error bars; for example, a 95% confidence interval is a range of values constructed so that, across repeated samples, about 95% of such intervals contain the true value of the measure you are interested in
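A rough sketch of a 95% interval for a sample mean. It assumes the normal approximation (critical value 1.96), which is only reasonable for larger samples; small samples would normally use a t critical value instead. The sample values are made up.

```python
import math
import statistics


def confidence_interval_95(sample):
    """Return (low, high) for the mean using the normal approximation (z = 1.96)."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * standard_error
    return mean - margin, mean + margin


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
low, high = confidence_interval_95(sample)
print(low, high)
```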
standard error formula
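SE = s / √n: the sample standard deviation divided by the square root of the sample size (this is the standard deviation of the sample mean)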
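A small sketch of the standard error of the mean, computed as the sample standard deviation divided by the square root of the sample size. The data are made up for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Return the standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(standard_error(sample))
```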
what question do we answer with a t test
is the difference between the two sample means statistically significant?
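One way to sketch the test statistic for comparing two sample means is Welch's version, which does not assume equal variances. This is one common variant, chosen here for illustration; the groups below are made up. The resulting value would still need to be compared with a critical value or turned into a p-value.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Return the difference in means divided by its estimated standard error."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se_diff = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / se_diff


group_a = [1, 2, 3, 4, 5]  # hypothetical data
group_b = [2, 3, 4, 5, 6]  # hypothetical data
print(welch_t_statistic(group_a, group_b))
```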
what are all the explanations of correlation between A and B?
A causes B (possibly through C)
B causes A (possibly through C)
C causes both A and B
C causes B, and A is only spuriously correlated
What other misspecifications could there be to a set of data?
omitted variable bias
included variable bias
normality and outliers
data mining
interaction effects
causal endogeneity
ecological fallacy
included variable bias
the addition of multiple (usually irrelevant) variables to obtain the desired result
normality and outliers
OLS inference assumes the errors (residuals) follow a (close to) normal distribution; skewed data can still be transformed
check for outliers and deal with them (correct or remove them, or use robust methods)
How is the margin of error affected by sample size
The margin of error decreases as the sample size increases.
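A short sketch of why this happens, assuming the margin of error for a mean is z × (standard deviation / √n) with z = 1.96 for 95% confidence. The standard deviation of 10 is a made-up value.

```python
import math


def margin_of_error(std_dev, sample_size, z=1.96):
    """Return z * std_dev / sqrt(n), the half-width of the confidence interval."""
    return z * std_dev / math.sqrt(sample_size)


# Quadrupling the sample size halves the margin of error
for n in (100, 400, 1600):
    print(n, margin_of_error(10, n))
```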
inferential statistics
using sample data to make inferences about a larger population
distributional tests
statistical tools that help determine how data points relate to theoretical probability distributions
OLS (ordinary least squares) multivariate analysis
statistical technique used to estimate the relationship between one dependent variable and two or more independent variables, extension of a simple linear regression
R² formula
1 - (residual sum of squares / total sum of squares)
TSS (total sum of squares)
measures the total variation in the dependent variable
quantifies how spread out the observed data are before considering any model
ESS (explained sum of squares)
measures the variation in the dependent variable that the model explains; used to describe how well a model, often a regression model, represents the data being modelled
RSS (residual sum of squares)
measures the unexplained variation
the sum of the squared differences between each observed point and the line of best fit
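To tie TSS, ESS, RSS and R² together, here is a minimal sketch that fits a simple one-variable least-squares line by hand and computes each quantity. The data points are made up, and the helper name `sums_of_squares` is illustrative.

```python
def sums_of_squares(x, y):
    """Fit y = intercept + slope * x by least squares and return (TSS, ESS, RSS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept for a single explanatory variable
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]

    tss = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
    ess = sum((fi - mean_y) ** 2 for fi in fitted)  # variation explained by the line
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    return tss, ess, rss


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]  # hypothetical data
tss, ess, rss = sums_of_squares(x, y)
r_squared = 1 - rss / tss
print(tss, ess, rss, r_squared)
```

For an OLS fit with an intercept, ESS + RSS should equal TSS up to rounding, so 1 - RSS/TSS and ESS/TSS give the same R².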