lesson 3: invalid post-selection inference - why should we care?

Last updated 2:38 PM on 5/28/26

13 Terms

invalid post-selection inference

Occurs when classical inference is conducted on the same data that were used to select the statistical model

data analysis in the textbook

Step 1: select the statistical model

Step 2: obtain the data set

Step 3: model fitting (fit the model via OLS)

Step 4: hypothesis test

type I error rate

α (alpha): the probability of falsely rejecting H0, i.e., rejecting H0 when it is true

data analysis in practice

Step 1: model fitting (fit models across all possible subsets of independent variables via OLS)

Step 2: perform model selection (select the model that minimizes AIC)

Step 3: hypothesis test (for the regression coefficients in the selected model)
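
The three steps above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the lesson's own code: two candidate predictors X1 and X2, no intercept, an outcome that is pure noise (so H0 is true for every coefficient), and a normal-approximation critical value of 1.96. The names `ols_fit` and `simulate` are hypothetical helpers.

```python
import math
import random

def ols_fit(columns, y):
    """OLS without intercept for 0, 1, or 2 predictor columns.
    Returns (coefficients, standard errors, AIC)."""
    n, k = len(y), len(columns)
    if k == 0:
        beta, inv_diag = [], []
    elif k == 1:
        sxx = sum(x * x for x in columns[0])
        beta = [sum(x * v for x, v in zip(columns[0], y)) / sxx]
        inv_diag = [1.0 / sxx]
    else:
        a, b = columns
        saa = sum(x * x for x in a)
        sbb = sum(x * x for x in b)
        sab = sum(p * q for p, q in zip(a, b))
        say = sum(p * v for p, v in zip(a, y))
        sby = sum(q * v for q, v in zip(b, y))
        det = saa * sbb - sab * sab
        # Closed-form solution of the 2x2 normal equations.
        beta = [(sbb * say - sab * sby) / det, (saa * sby - sab * say) / det]
        inv_diag = [sbb / det, saa / det]
    rss = 0.0
    for i, v in enumerate(y):
        fitted = sum(bj * col[i] for bj, col in zip(beta, columns))
        rss += (v - fitted) ** 2
    sigma2 = rss / (n - k)
    se = [math.sqrt(sigma2 * d) for d in inv_diag]
    aic = n * math.log(rss / n) + 2 * k  # Gaussian AIC up to a constant
    return beta, se, aic

def simulate(n_reps=2000, n=50, seed=1):
    """Returns (times X1 was selected, times its coefficient was then rejected)."""
    rng = random.Random(seed)
    subsets = [(), (0,), (1,), (0, 1)]  # all subsets of {X1, X2}
    selected = rejected = 0
    for _ in range(n_reps):
        X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # H0 true: y is unrelated to X
        # Step 1: fit every subset; Step 2: keep the model with minimal AIC.
        fits = {s: ols_fit([X[j] for j in s], y) for s in subsets}
        best = min(subsets, key=lambda s: fits[s][2])
        # Step 3: test the coefficient of X1 if X1 is in the selected model.
        if 0 in best:
            selected += 1
            beta, se, _ = fits[best]
            idx = best.index(0)
            if abs(beta[idx] / se[idx]) > 1.96:
                rejected += 1
    return selected, rejected

if __name__ == "__main__":
    selected, rejected = simulate()
    print("rejection rate given selection:", rejected / selected)
```

In this setup, the rejection rate among the runs where X1 was selected should come out well above the nominal 5%, because AIC only keeps X1 when its estimated effect already looks large.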

central limit theorem

ensures that the OLS estimator of θ₁ is approximately normally distributed around the true value θ₁, provided the sample size is sufficiently large

consequences of invalid post-selection inference

- The distribution of the parameter estimate after model selection is a mixture of multiple distributions, each conditional on selecting a specific model and weighted by the probability of selecting that model

- Statistical inference after model selection does not take into account the ‘true’ uncertainty when estimating θ₁: the standard error of the OLS estimator of θ₁ is consistently underestimated

- Type I error rate inflation does not vanish asymptotically (as N → ∞)
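
The mixture in the first point can be written out as follows. The notation is an assumption introduced here for illustration: \hat{M} is the selected model, \mathcal{M} is the set of candidate models, and f denotes a density of the estimator \hat{\theta}_1.

```latex
f(\hat{\theta}_1) \;=\; \sum_{M \in \mathcal{M}} P(\hat{M} = M)\, f(\hat{\theta}_1 \mid \hat{M} = M)
```

Classical inference treats \hat{\theta}_1 as if it came from a single normal distribution, whereas each conditional component above is generally not normal.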

solutions for invalid post-selection inference

data splitting
simultaneous inference
conditional selective inference

data splitting

An easy solution: the data set is divided into a training set, used for model selection, and a test set, used for inference on the selected model

data splitting: limitations

- Loss of efficiency in model selection due to the small sample size of the training set (e.g., simpler models are favored)

- Loss of statistical power due to small sample size in the testing set
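
Data splitting can be sketched as below. This is a minimal sketch under stated assumptions: a single candidate predictor without intercept, an outcome that is pure noise (H0 true), AIC choosing between the empty model and the model with X1 on the first half of the data, and a normal-approximation test on the second half. The function names are hypothetical.

```python
import math
import random

def fit_slope(x, y):
    """No-intercept simple regression: returns slope, its standard error, RSS."""
    sxx = sum(v * v for v in x)
    b = sum(p * q for p, q in zip(x, y)) / sxx
    rss = sum((q - b * p) ** 2 for p, q in zip(x, y))
    se = math.sqrt(rss / (len(y) - 1) / sxx)
    return b, se, rss

def aic_selects_x(x, y):
    """True if the model with X1 has a lower AIC than the empty model."""
    n = len(y)
    rss0 = sum(v * v for v in y)
    _, _, rss1 = fit_slope(x, y)
    return n * math.log(rss1 / n) + 2 < n * math.log(rss0 / n)

def split_rejection_counts(n_reps=3000, n=100, seed=7):
    """Returns (times X1 was selected on the training half,
    times H0 was then rejected on the test half)."""
    rng = random.Random(seed)
    selected = rejected = 0
    half = n // 2
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # H0 true
        # Model selection uses only the training half.
        if aic_selects_x(x[:half], y[:half]):
            selected += 1
            # Inference uses only the held-out test half.
            b, se, _ = fit_slope(x[half:], y[half:])
            if abs(b / se) > 1.96:
                rejected += 1
    return selected, rejected

if __name__ == "__main__":
    selected, rejected = split_rejection_counts()
    print("rejection rate given selection:", rejected / selected)
```

Because the test half played no role in selection, the rejection rate should stay close to the nominal level, at the price of fitting and testing on only half of the observations each.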

simultaneous inference

- It treats all models explored during selection as relevant for inference, aiming for overall error control (e.g., controlling the probability of making any error)

- Constructs simultaneous (1 − α) confidence intervals, based on the least-squares estimates, for the parameters of all linear regression models that were ever considered
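
To get a feel for the cost of covering every model that was considered, the sketch below uses a Bonferroni correction as a simple, conservative stand-in. This is an assumption made for illustration; the lesson does not say which simultaneous procedure is meant, and dedicated methods can give narrower intervals. The sketch counts every (coefficient, model) pair across all subsets of p predictors and returns the normal critical value that replaces 1.96.

```python
from statistics import NormalDist

def n_coefficient_model_pairs(p):
    """Number of (coefficient, model) pairs over all subsets of p predictors.
    Each predictor appears in 2**(p - 1) of the 2**p subsets."""
    return p * 2 ** (p - 1)

def bonferroni_critical_value(p, alpha=0.05):
    """Two-sided normal critical value after splitting alpha evenly
    over all coefficient-model pairs (Bonferroni, a conservative bound)."""
    m = n_coefficient_model_pairs(p)
    return NormalDist().inv_cdf(1 - alpha / (2 * m))

if __name__ == "__main__":
    for p in (1, 3, 5, 10):
        print(p, n_coefficient_model_pairs(p), round(bonferroni_critical_value(p), 2))
```

Each interval has the form estimate ± critical value × standard error, so a larger critical value translates directly into a wider interval as the number of explored models grows.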

simultaneous inference: problem

Can lead to very wide confidence intervals (CIs)

conditional selective inference

- It focuses on the selected model and conditions the inference on the selection event (the selection of which hypotheses to test)

- For example, the confidence interval for θ₁ is constructed only if X1 is selected, conditional on that selection
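
A minimal sketch of conditioning on the selection event, under strong simplifying assumptions that are not from the lesson: a single candidate predictor X1, a known error variance, and a z-statistic for θ₁. In that special case AIC keeps X1 exactly when |z| > √2, so under H0 the z-statistic follows a standard normal truncated to that region. The function names are hypothetical.

```python
import math

# With known variance, adding one parameter lowers AIC only if z**2 > 2.
AIC_THRESHOLD = math.sqrt(2.0)

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z and z >= 0."""
    return math.erfc(z / math.sqrt(2.0))

def naive_p_value(z):
    """Classical two-sided p-value that ignores the selection step."""
    return two_sided_tail(abs(z))

def selective_p_value(z, threshold=AIC_THRESHOLD):
    """Two-sided p-value conditional on the event |Z| > threshold,
    i.e. conditional on X1 having been selected."""
    if abs(z) <= threshold:
        raise ValueError("X1 would not have been selected for this z")
    return two_sided_tail(abs(z)) / two_sided_tail(threshold)

if __name__ == "__main__":
    for z in (1.5, 1.96, 2.5, 3.5):
        print(z, naive_p_value(z), selective_p_value(z))
```

The conditional p-value is always larger than the naive one here, because a large |z| is less surprising once we know that X1 was only tested after passing the selection threshold.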

conditional selective inference: problem

The procedure depends on the model selection tool (such as AIC) that is used