invalid post-selection inference
occurs when classical inference is performed on the same data that were used to select the statistical model
data analysis in the textbook
Step 1: select the statistical model
Step 2: collect the data set
Step 3: model fitting (fit the model via OLS)
Step 4: hypothesis test
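The textbook workflow can be sketched in Python as a minimal illustration, assuming NumPy is available. The data are simulated with θ1 = 0, and the model, sample size and helper name `fit_ols` are hypothetical choices. The test uses a normal approximation to the t distribution, which is reasonable for n = 200.

```python
import numpy as np
from statistics import NormalDist


def fit_ols(X, y):
    """Fit y = X @ beta by OLS; return estimates and their standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # estimated covariance of beta-hat
    return beta, np.sqrt(np.diag(cov))


# Step 1: the model y = theta0 + theta1*x1 + theta2*x2 + error is fixed in advance.
# Step 2: collect the data set (simulated here, with theta1 = 0 so that H0 holds).
rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Step 3: fit the pre-specified model via OLS.
beta, se = fit_ols(X, y)

# Step 4: two-sided test of H0: theta1 = 0 at level alpha = 0.05
# (normal approximation to the t test).
z = beta[1] / se[1]
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"theta1_hat = {beta[1]:.3f}, SE = {se[1]:.3f}, p = {p_value:.3f}")
```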
type I error rate
α (alpha)
probability of falsely rejecting H0, i.e., rejecting H0 when it is true
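A Monte Carlo sketch of what α means for the textbook workflow: when H0 is true and the model is fixed before seeing the data, the share of simulated data sets in which H0 is rejected should land close to α. The simple-regression model, sample size and number of replications are illustrative assumptions, and a normal critical value stands in for the t critical value, so the estimate can sit slightly above 0.05.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
alpha, n, n_sim = 0.05, 100, 2000
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value, about 1.96

rejections = 0
for _ in range(n_sim):
    x = rng.normal(size=n)
    y = 1.0 + rng.normal(size=n)                # H0 is true: theta1 = 0
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    se1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(beta[1] / se1) > z_crit:             # a false rejection of H0
        rejections += 1

type1_rate = rejections / n_sim
print(f"estimated type I error rate: {type1_rate:.3f} (nominal alpha = {alpha})")
```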
data analysis in practice
Step 1: model fitting (fit models across all possible subsets of independent variables via OLS)
Step 2: perform model selection (select the model that minimizes AIC)
Step 3: hypothesis test (for the regression coefficients in the selected model, using the same data)
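The three practice steps can be sketched as follows, assuming NumPy, a hypothetical data set with three candidate predictors, an intercept in every model, and the Gaussian form of AIC up to an additive constant (n·log(RSS/n) + 2p, with p the number of coefficients). The final loop is the naive step this deck calls invalid: its p-values treat the selected model as if it had been fixed in advance.

```python
import itertools
import numpy as np
from statistics import NormalDist


def ols(X, y):
    """OLS estimates, standard errors and residual sum of squares."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    se = np.sqrt(rss / (n - p) * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, rss


def gaussian_aic(rss, n, p):
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * np.log(rss / n) + 2 * p


rng = np.random.default_rng(3)
n, k = 100, 3
X_all = rng.normal(size=(n, k))
y = 0.3 * X_all[:, 1] + rng.normal(size=n)   # hypothetical truth: only X2 matters

# Step 1: fit every subset of the k predictors via OLS (intercept always included).
fits = {}
for r in range(k + 1):
    for subset in itertools.combinations(range(k), r):
        X = np.column_stack([np.ones(n)] + [X_all[:, j] for j in subset])
        beta, se, rss = ols(X, y)
        fits[subset] = (gaussian_aic(rss, n, X.shape[1]), beta, se)

# Step 2: select the subset with the smallest AIC.
selected = min(fits, key=lambda s: fits[s][0])
aic, beta, se = fits[selected]

# Step 3: naive tests on the SAME data that drove the selection.
for pos, j in enumerate(selected, start=1):
    p = 2 * (1 - NormalDist().cdf(abs(beta[pos] / se[pos])))
    print(f"X{j + 1}: estimate {beta[pos]:.3f}, naive p = {p:.3f}")
```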
central limit theorem
ensures that, for a fixed (pre-specified) model, the OLS estimator of θ1 is approximately normally distributed around the true value θ1, provided the sample size is sufficiently large
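A small simulation illustrates the claim for a fixed model: even with skewed, non-normal errors, the OLS slope estimates cluster around the true θ1 and about 95% of them fall within ±1.96 theoretical standard deviations. The error distribution (a centered exponential with variance 1), the design and the sample sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, theta1 = 1.0, 2.0
n, n_sim = 200, 5000

x = rng.uniform(0, 1, size=n)                    # fixed design across replications
X = np.column_stack([np.ones(n), x])
estimates = np.empty(n_sim)
for i in range(n_sim):
    errors = rng.exponential(1.0, size=n) - 1.0  # skewed, mean-zero, variance 1
    y = theta0 + theta1 * x + errors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates[i] = beta[1]

# Exact SD of the slope estimator for a fixed design: sigma / sqrt(Sxx), sigma = 1.
sd_theory = 1.0 / np.sqrt(np.sum((x - x.mean()) ** 2))
standardized = (estimates - theta1) / sd_theory
share_within = np.mean(np.abs(standardized) < 1.96)
print(f"mean estimate {estimates.mean():.3f} (true {theta1}), "
      f"share within 1.96 SD: {share_within:.3f} (normal: 0.950)")
```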
consequences of invalid post-selection inference
- The distribution of the estimator of θ1 after model selection is a mixture of distributions, each conditional on selecting a specific model and weighted by the probability of selecting that model
- Statistical inference after model selection does not account for the ‘true’ uncertainty in estimating θ1: the standard error of the OLS estimator of θ1 is systematically underestimated
- The Type I error rate inflation does not vanish asymptotically (as N → ∞)
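A toy simulation makes these consequences concrete under hypothetical settings: one candidate predictor X1 with θ1 = 0, AIC choosing between the intercept-only model and the model with X1, and a naive z test whenever X1 is selected. Among the data sets where X1 is selected, the naive test rejects H0 far more often than α, and the spread of the selected estimates exceeds the average reported standard error. Increasing n does not fix this in this setting, because under H0 the selection event keeps roughly the same probability.

```python
import numpy as np
from statistics import NormalDist


def ols(X, y):
    """OLS estimates, standard errors and residual sum of squares."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    se = np.sqrt(rss / (n - p) * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, rss


rng = np.random.default_rng(5)
n, n_sim, alpha = 100, 4000, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

selected_est, selected_se, rejected = [], [], []
for _ in range(n_sim):
    x1 = rng.normal(size=n)
    y = rng.normal(size=n)                       # H0 true: theta1 = 0
    _, _, rss0 = ols(np.ones((n, 1)), y)         # candidate model without X1
    beta, se, rss1 = ols(np.column_stack([np.ones(n), x1]), y)
    aic0 = n * np.log(rss0 / n) + 2 * 1          # Gaussian AIC up to a constant
    aic1 = n * np.log(rss1 / n) + 2 * 2
    if aic1 < aic0:                              # X1 selected: test theta1 naively
        selected_est.append(beta[1])
        selected_se.append(se[1])
        rejected.append(abs(beta[1] / se[1]) > z_crit)

share_selected = len(rejected) / n_sim
cond_type1 = np.mean(rejected)                   # rejection rate given selection
sd_ratio = np.std(selected_est) / np.mean(selected_se)
print(f"X1 selected in {share_selected:.1%} of data sets; "
      f"type I error given selection: {cond_type1:.3f}; "
      f"empirical SD / reported SE: {sd_ratio:.2f}")
```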
solutions for invalid post-selection inference
data splitting
simultaneous inference
conditional selective inference
data splitting
simple solution
The dataset is randomly divided into a training set, used for model selection, and a test set, used for inference on the selected model
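A minimal data-splitting sketch, assuming NumPy, a hypothetical data set, a random 50/50 split and AIC-based best-subset selection: AIC is computed on the training half only, and the selected model is refitted and tested on the untouched test half, so the p-values are not computed from the data that drove the selection. The split ratio and helper names are illustrative choices.

```python
import itertools
import numpy as np
from statistics import NormalDist


def ols(X, y):
    """OLS estimates, standard errors and residual sum of squares."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    se = np.sqrt(rss / (n - p) * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, rss


rng = np.random.default_rng(6)
n, k = 200, 3
X_all = rng.normal(size=(n, k))
y = 0.3 * X_all[:, 1] + rng.normal(size=n)       # hypothetical truth: only X2 matters

idx = rng.permutation(n)                         # random 50/50 split
train, test = idx[: n // 2], idx[n // 2:]


def design(rows, subset):
    """Intercept plus the chosen predictors, restricted to the given rows."""
    return np.column_stack([np.ones(len(rows))] + [X_all[rows, j] for j in subset])


def train_aic(subset):
    """Gaussian AIC (up to a constant) of a subset model, fitted on the training set."""
    X = design(train, subset)
    _, _, rss = ols(X, y[train])
    m = len(train)
    return m * np.log(rss / m) + 2 * X.shape[1]


# Model selection uses the training set only.
subsets = [s for r in range(k + 1) for s in itertools.combinations(range(k), r)]
selected = min(subsets, key=train_aic)

# Inference on the selected model uses the test set only.
beta, se, _ = ols(design(test, selected), y[test])
p_values = {
    f"X{j + 1}": 2 * (1 - NormalDist().cdf(abs(beta[i] / se[i])))
    for i, j in enumerate(selected, start=1)
}
print("selected:", selected, "test-set p-values:", p_values)
```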
data splitting: limitations
- Loss of efficiency in model selection due to the smaller sample size of the training set (e.g., simpler models tend to be favored)
- Loss of statistical power due to the smaller sample size of the test set
simultaneous inference
- It treats all models explored during selection as relevant for inference, aiming for overall error control (e.g., controlling the probability of making any error at all)
- It constructs simultaneous (1 − α) confidence intervals, based on least-squares estimates, for the parameters of all linear regression models that were considered
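One way to compute the simultaneous multiplier is by Monte Carlo, in the spirit of the PoSI (post-selection inference) approach: simulate pure noise, compute the |z|-statistic of every coefficient in every submodel, and take the (1 − α) quantile of the maximum. This sketch assumes a known error standard deviation, that all non-empty subsets of four hypothetical predictors were considered, and an intercept in every submodel; the helper `posi_constant` is hypothetical. Because K is a quantile of a maximum over many statistics, it exceeds the single-test value of about 1.96.

```python
import itertools
import numpy as np
from statistics import NormalDist


def posi_constant(X, alpha=0.05, n_draws=5000, seed=0):
    """Monte Carlo approximation of a simultaneous (PoSI-style) constant K.

    Assumes a known error SD (sigma = 1 after scaling), that every non-empty
    subset of the columns of X was considered, and an intercept in every submodel.
    """
    n, k = X.shape
    directions = []
    for r in range(1, k + 1):
        for subset in itertools.combinations(range(k), r):
            X_m = np.column_stack([np.ones(n), X[:, list(subset)]])
            # Rows of pinv(X_m) map y to the OLS estimates of submodel m.
            for row in np.linalg.pinv(X_m)[1:]:          # skip the intercept row
                directions.append(row / np.linalg.norm(row))
    D = np.array(directions)                 # one unit vector per (submodel, coefficient)
    Z = np.random.default_rng(seed).normal(size=(n_draws, n))   # pure-noise draws
    max_abs_z = np.max(np.abs(Z @ D.T), axis=1)  # largest |z|-statistic per draw
    return float(np.quantile(max_abs_z, 1 - alpha))


X = np.random.default_rng(7).normal(size=(50, 4))   # hypothetical design: n = 50, k = 4
K = posi_constant(X)
z_975 = NormalDist().inv_cdf(0.975)
print(f"simultaneous constant K = {K:.2f} vs. per-test z = {z_975:.2f}")
# A simultaneous interval for coefficient j in submodel m would then be
# estimate +/- K * sigma * ||row j of pinv(X_m)||, instead of +/- 1.96 * SE.
```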
simultaneous inference: problem
can lead to very wide CIs, because the intervals must hold simultaneously for every model that was considered
conditional selective inference
- It focuses on the selected model and conditions the inference on the selection event (i.e., on the fact that this model, and hence this set of hypotheses to test, was selected)
- For example, the confidence interval for θ1 is constructed conditional on the event that only X1 was selected
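A toy version of conditional selective inference, under stated assumptions: a single candidate predictor X1, a known error standard deviation σ, and AIC choosing between the models with and without X1. With σ known, AIC reduces to RSS/σ² + 2 × (number of coefficients) up to a constant, so X1 is selected exactly when its z-statistic satisfies |z| > √2. Conditioning on that selection event turns the null distribution of z into a truncated normal, and the p-value is computed from it. The observed z = 2.2 and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_tail(t):
    """P(|Z| >= t) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(t))


def selective_p_value(z_obs, c=math.sqrt(2)):
    """p-value for H0: theta1 = 0, conditional on the selection event |Z| > c.

    Under H0, Z given |Z| > c is a truncated standard normal, so
    P(|Z| >= |z_obs| given |Z| > c) = P(|Z| >= |z_obs|) / P(|Z| > c).
    """
    if abs(z_obs) <= c:
        raise ValueError("X1 was not selected, so there is no test to condition on")
    return two_sided_tail(abs(z_obs)) / two_sided_tail(c)


z_obs = 2.2                        # hypothetical observed z-statistic for X1
naive = two_sided_tail(z_obs)      # ignores the selection step
selective = selective_p_value(z_obs)
print(f"naive p = {naive:.3f}, selective p = {selective:.3f}")
```

The selective p-value is several times larger than the naive one, because the naive test ignores that only a large |z| could have led to X1 being selected in the first place.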
conditional selective inference: problem
The inference depends on the model selection tool (such as AIC) that is used, since the selection event, and hence the conditioning, differs from tool to tool
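The dependence on the selection tool shows up directly in a toy setting with one candidate predictor X1 and a known σ: AIC admits X1 when |z| > √2, while BIC (penalty log n per coefficient) admits it when |z| > √(log n), so the same observed z yields different selective p-values. The sample size, observed z and helper names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_tail(t):
    """P(|Z| >= t) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(t))


def selection_threshold(criterion, n):
    """|z| cutoff above which one extra predictor enters (known-sigma criteria).

    Adding the predictor lowers RSS / sigma^2 by z^2 and raises the penalty by
    2 (AIC) or log(n) (BIC), so it enters iff z^2 exceeds that penalty.
    """
    penalty = {"AIC": 2.0, "BIC": math.log(n)}[criterion]
    return math.sqrt(penalty)


def selective_p_value(z_obs, c):
    """P(|Z| >= |z_obs| given |Z| > c) under H0: theta1 = 0."""
    return two_sided_tail(abs(z_obs)) / two_sided_tail(c)


n, z_obs = 100, 2.5                # hypothetical sample size and observed z for X1
p_by_criterion = {
    crit: selective_p_value(z_obs, selection_threshold(crit, n))
    for crit in ("AIC", "BIC")
}
print(p_by_criterion)
```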