lesson 3: invalid post-selection inference - why should we care?

13 Terms

1

invalid post-selection inference

Occurs when classical inference (hypothesis tests, confidence intervals) is conducted on the same data that were used to select the statistical model, as if the model had been fixed in advance

2

data analysis in the textbook

Step 1: select the statistical model

Step 2: obtain the data set

Step 3: model fitting (fit the model via OLS)

Step 4: hypothesis test

3

type I error rate

alpha

probability of falsely rejecting H0

4

data analysis in practice

Step 1: model fitting (fit models across all possible subsets of independent variables via OLS)

Step 2: perform model selection (select the model that minimizes AIC)

Step 3: hypothesis test (for the regression coefficients in the selected model)
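
The first two steps of this workflow can be sketched in a few lines. This is only an illustration, assuming NumPy, a Gaussian AIC written up to an additive constant, and made-up data; the function names `fit_ols`, `gaussian_aic` and `select_by_aic` are hypothetical and do not come from the lesson.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Return the OLS coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def gaussian_aic(rss, n, k):
    """AIC of a Gaussian linear model with k coefficients, up to a constant."""
    return n * np.log(rss / n) + 2 * k


def select_by_aic(X, y):
    """Fit every subset of the columns of X (plus an intercept) and
    return the column indices of the subset with the smallest AIC."""
    n, p = X.shape
    best = None
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            _, rss = fit_ols(design, y)
            aic = gaussian_aic(rss, n, design.shape[1])
            if best is None or aic < best[0]:
                best = (aic, subset)
    return best[1]


# Made-up data: only the first predictor has a nonzero coefficient.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)

selected = select_by_aic(X, y)
print("selected predictor indices:", selected)
```

Step 3 would then run the usual t-tests on the coefficients of the selected subset, using the same `X` and `y`, which is exactly the reuse of data that makes the inference invalid.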

5

central limit theorem

ensures that the OLS estimator of θ1 is approximately normally distributed around the true value θ1, provided the sample size is sufficiently large

6

consequences of invalid post-selection inference

- The distribution of the parameter estimate after model selection is a mixture of multiple distributions, each conditional on selecting a specific model and weighted by the probability of selecting that model

- Statistical inference after model selection does not take into account the ‘true’ uncertainty when estimating θ1: the standard error of the OLS estimator of θ1 is consistently underestimated

- Type I error rate inflation does not vanish asymptotically (as N → ∞)
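
A small simulation can make the inflation concrete. The sketch below is an assumed toy setup, not the lesson's own example: the outcome is pure noise (so θ1 = 0), AIC chooses between an intercept-only model and a model with X1, and the classical t-test is run only when X1 was selected. The critical value 1.96 is the normal approximation, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 100, 5000
selected_count = 0
rejected_count = 0

for _ in range(n_sims):
    x1 = rng.normal(size=n)
    y = rng.normal(size=n)  # H0 is true: theta1 = 0

    # Model A: intercept only.
    rss_a = float(np.sum((y - y.mean()) ** 2))

    # Model B: intercept + X1, fitted by OLS.
    X = np.column_stack([np.ones(n), x1])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss_b = float(resid @ resid)

    # Gaussian AIC up to a constant; k counts regression coefficients.
    aic_a = n * np.log(rss_a / n) + 2 * 1
    aic_b = n * np.log(rss_b / n) + 2 * 2

    if aic_b < aic_a:  # X1 is selected, so its coefficient gets tested
        selected_count += 1
        sigma2 = rss_b / (n - 2)
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        if abs(beta[1] / se) > 1.96:
            rejected_count += 1

# Share of reported tests that falsely reject H0 at nominal alpha = 0.05.
conditional_rate = rejected_count / selected_count
print("false rejection rate among selected models:", conditional_rate)
```

Under these assumptions the rejection rate among the tests that are actually carried out comes out well above the nominal 0.05, because AIC only selects X1 when its estimated coefficient already looks large.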

7

solutions for invalid post-selection inference

  • data splitting

  • simultaneous inference

  • conditional selective inference

8

data splitting

easy solution

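The dataset is divided into a training set and a test set: the model is selected on the training set, and inference is carried out on the test set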
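
As a rough illustration of why splitting helps, the sketch below uses an assumed toy setup (pure-noise outcome, one candidate predictor, AIC choosing whether to include it). Selection uses one random half of the data and the t-test uses the other half. The helper names `slope_t_stat` and `aic_selects_x` are hypothetical, and 1.96 is the normal approximation to the critical value.

```python
import numpy as np


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS fit of y on an intercept and x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def aic_selects_x(x, y):
    """True if the model with x has a lower Gaussian AIC than intercept only."""
    n = len(y)
    rss_null = float(np.sum((y - y.mean()) ** 2))
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss_full = float(resid @ resid)
    return n * np.log(rss_full / n) + 2 * 2 < n * np.log(rss_null / n) + 2 * 1


rng = np.random.default_rng(2)
n, n_sims = 200, 4000
selected = 0
rejected = 0

for _ in range(n_sims):
    x = rng.normal(size=n)
    y = rng.normal(size=n)  # H0 is true: the slope is zero
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2:]

    # Select on the training half only.
    if aic_selects_x(x[train], y[train]):
        selected += 1
        # Test on the held-out half, which played no role in selection.
        if abs(slope_t_stat(x[test], y[test])) > 1.96:
            rejected += 1

split_rate = rejected / selected
print("false rejection rate with data splitting:", split_rate)
```

Because the test half is independent of the selection step, the rejection rate among selected models should stay close to the nominal level in this setup, at the cost of using only half the observations for each step.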

9

data splitting: limitations

- Loss of efficiency in model selection due to the smaller sample size of the training set (e.g., simpler models tend to be favored)

- Loss of statistical power due to the smaller sample size of the test set

10

simultaneous inference

- Treats all models explored during selection as relevant for inference, aiming for overall error control (e.g., controlling the probability of making any error)

- Constructs simultaneous (1 − α) confidence intervals based on the least-squares estimates of the parameters of all linear regression models that were ever considered
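
One way to see what "simultaneous" costs is to simulate the critical value that covers every coefficient in every candidate submodel at once. The sketch below is a simplified illustration under loud assumptions: a made-up design with three predictors, no intercept, and a known error standard deviation of 1 (so z-statistics replace t-statistics). The name `K` for the simultaneous critical value is illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))  # made-up design matrix

# One unit-length contrast per (submodel, coefficient) pair: the row of the
# OLS operator (X_M' X_M)^{-1} X_M' that produces that coefficient estimate.
contrasts = []
for size in range(1, p + 1):
    for subset in itertools.combinations(range(p), size):
        XM = X[:, list(subset)]
        rows = np.linalg.solve(XM.T @ XM, XM.T)  # shape (len(subset), n)
        for row in rows:
            contrasts.append(row / np.linalg.norm(row))
L = np.array(contrasts)

# Under the null with sigma = 1, each contrast applied to the noise is a
# standard normal z-statistic; take the maximum over all of them per draw.
eps = rng.normal(size=(20000, n))
max_abs_z = np.max(np.abs(eps @ L.T), axis=1)

# Simultaneous 95% critical value, to compare with the usual 1.96.
K = float(np.quantile(max_abs_z, 0.95))
print("number of (model, coefficient) pairs:", len(contrasts))
print("simultaneous critical value K:", K)
```

Intervals of the form estimate ± K × standard error then hold for whichever model ends up being selected, but K exceeds 1.96 and grows with the number of models considered.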

11

simultaneous inference: problem

Can lead to very wide confidence intervals (CIs)

12

conditional selective inference

- Focuses on the selected model and conditions the inference on the selection event (the selection of which hypotheses to test)

- For example, the confidence interval for θ1 is constructed only if X1 is selected, conditional on that selection

13

conditional selective inference: problem

The procedure depends on the model selection tool (such as AIC) that is used