Chapter 5: Resampling Methods

13 Terms

1
Resampling Methods
[Resampling] These techniques refit a model to samples derived from the training set to gather more information about the fitted model. For instance, they offer estimations of test-set prediction error, standard deviation, and bias of parameter estimates.
2
Training Error vs. Test Error
[Resampling] The test error is the error that results from using a statistical learning method to predict the response on a new observation that was not used in training. The training error can be easily calculated by applying the statistical learning method to the observations used in training. The training error rate often differs significantly from the test error rate and can dramatically underestimate it.
3
Validation Set Approach
[Resampling] This approach randomly divides the available data into two parts: a training set and a validation (or hold-out) set. The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set. The resulting validation-set error provides an estimate of the test error.
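A minimal sketch of the approach in pure Python. The data here are synthetic (a toy linear model y = 2x + noise, a 50/50 split, and a through-the-origin least-squares fit are all assumptions for illustration):

```python
import random

random.seed(0)

# Toy data (an assumption for illustration): y = 2x + Gaussian noise
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]

# Randomly divide the observations into a training half and a validation half
random.shuffle(data)
train, valid = data[:50], data[50:]

# Fit a least-squares line through the origin on the training part only
beta = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Validation-set estimate of the test MSE
val_mse = sum((y - beta * x) ** 2 for x, y in valid) / len(valid)
```

Because the noise has variance 1, `val_mse` should land near 1; a different random split would give a noticeably different estimate, which is exactly the variability drawback described in the next card.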
4
Drawbacks of the Validation Set Approach
[Resampling] The validation estimate of the test error can be highly variable, depending on which observations are included in the training set versus the validation set. Also, because only a subset of the observations are used to fit the model, the validation set error may overestimate the test error for the model fit on the entire dataset.
5
K-Fold Cross-Validation
[Resampling] This widely used method estimates test error. The data is randomly divided into K equal-sized parts. For each k = 1, …, K, part k is left out, the model is fit to the remaining K − 1 parts, and predictions are made for the left-out part. The K resulting error estimates are then combined, typically by averaging.
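The loop above can be sketched in pure Python. As before, the toy linear data and the through-the-origin fit are assumptions for illustration:

```python
import random

random.seed(1)
# Toy data (assumed for illustration): y = 2x + Gaussian noise
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)

K = 5
fold_size = len(data) // K
fold_mses = []
for k in range(K):
    # part k is held out; the model is fit on the other K - 1 parts
    held_out = data[k * fold_size:(k + 1) * fold_size]
    rest = data[:k * fold_size] + data[(k + 1) * fold_size:]
    beta = sum(x * y for x, y in rest) / sum(x * x for x, y in rest)
    fold_mses.append(sum((y - beta * x) ** 2 for x, y in held_out) / len(held_out))

# combine the K per-fold results into one estimate of the test error
cv_mse = sum(fold_mses) / K
```

Every observation is used for validation exactly once, so the estimate is much less dependent on a single lucky or unlucky split than the validation-set approach.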
6
LOOCV
[Resampling] Setting K = n yields n-fold or leave-one-out cross-validation.
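With K = n, each fold contains a single observation, so n models are fit. A small sketch on assumed toy data:

```python
import random

random.seed(2)
# Toy data (assumed for illustration): y = 2x + noise, n = 30
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(1, 31)]

# K = n: each observation is held out once, so n models are fit
sq_errors = []
for i in range(len(data)):
    x0, y0 = data[i]                      # the single held-out observation
    rest = data[:i] + data[i + 1:]        # the other n - 1 observations
    beta = sum(x * y for x, y in rest) / sum(x * x for x, y in rest)
    sq_errors.append((y0 - beta * x0) ** 2)

loocv_mse = sum(sq_errors) / len(sq_errors)
```

LOOCV has no randomness (there is only one way to leave each point out), but refitting n times can be expensive for large n or slow-to-fit models.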
7
Cross-Validation for Classification Problems
[Resampling] The data is divided into K roughly equal-sized parts, and a cross-validation statistic is computed based on the error rate.
8
Right Approach
[Resampling] Cross-validation must be applied to every step of a procedure. If earlier steps, such as feature selection, use the labels of the training data, those steps must be repeated inside each cross-validation fold.
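To make this concrete, here is a small pure-Python sketch. The pure-noise data, the mean-gap feature selector, and the 1-nearest-neighbour classifier are all assumptions for illustration. Because the labels are independent of every feature, an honest error estimate should hover near 50%; re-running the label-using selection step inside every fold gives that honest answer, whereas selecting the feature once on the full dataset would report a misleadingly low error:

```python
import random

random.seed(3)
n, p = 40, 200
# Pure-noise data (assumed): labels are independent of all features
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [random.randrange(2) for _ in range(n)]

def top_feature(rows, labels):
    """Pick the feature whose class means differ most (uses the labels)."""
    best_j, best_gap = 0, -1.0
    for j in range(p):
        ones = [r[j] for r, l in zip(rows, labels) if l == 1]
        zeros = [r[j] for r, l in zip(rows, labels) if l == 0]
        if not ones or not zeros:
            continue
        gap = abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        if gap > best_gap:
            best_j, best_gap = j, gap
    return best_j

# Right approach: repeat the label-using selection step inside every fold
errors = 0
for i in range(n):
    train_X = X[:i] + X[i + 1:]
    train_y = y[:i] + y[i + 1:]
    j = top_feature(train_X, train_y)   # selection sees training labels only
    # classify the held-out point with 1-nearest-neighbour on feature j
    nearest = min(range(n - 1), key=lambda k: abs(train_X[k][j] - X[i][j]))
    errors += train_y[nearest] != y[i]

cv_error = errors / n   # hovers near 0.5, as it should for pure noise
```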

9
The Bootstrap
[Resampling] The bootstrap is a tool for quantifying the uncertainty associated with an estimator or statistical learning method; for example, it can estimate the standard error of a coefficient or provide a confidence interval for that coefficient. It lets a computer mimic the process of obtaining new datasets by repeatedly sampling observations from the original dataset with replacement, so the variability of an estimate can be assessed without generating additional samples from the population.
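A minimal sketch, estimating the standard error of a sample median (an estimator with no simple closed-form standard error). The underlying sample, B = 1000 resamples, and the percentile interval are assumptions for illustration:

```python
import random
import statistics

random.seed(4)
# Original sample (assumed for illustration): 50 draws from Normal(0, 2)
sample = [random.gauss(0, 2) for _ in range(50)]

B = 1000
boot_medians = []
for _ in range(B):
    # draw n observations from the original data WITH replacement
    resample = random.choices(sample, k=len(sample))
    boot_medians.append(statistics.median(resample))

# bootstrap estimate of the standard error of the sample median
se_median = statistics.stdev(boot_medians)

# approximate 95% percentile confidence interval for the population median
srt = sorted(boot_medians)
lo, hi = srt[24], srt[974]
```

The spread of the B recomputed medians stands in for the spread we would see if we could repeatedly draw fresh samples from the population.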

10
Other uses of the Bootstrap
[Resampling] The bootstrap is primarily used to obtain standard errors of an estimate, and it can also provide approximate confidence intervals for a population parameter.
11
Bootstrap and Prediction Error
[Resampling] Using the bootstrap to estimate prediction error can lead to underestimation of the true prediction error due to the overlap between bootstrap samples and the original data. Cross-validation is a simpler and more attractive approach for estimating prediction error.
12
Pre-validation
[Resampling] Pre-validation is designed for comparing adaptively derived predictors with fixed, pre-defined predictors. It forms a "pre-validated" version of the adaptive predictor that has not "seen" the response.

13
Bootstrap versus Permutation Tests
[Resampling] The bootstrap samples from the estimated population to estimate standard errors and confidence intervals, while permutation methods sample from an estimated null distribution to estimate p-values and False Discovery Rates for hypothesis tests.
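The permutation side of this contrast can be sketched as a two-sample test. The two toy groups, the mean-difference statistic, and the 2000 shuffles are assumptions for illustration; shuffling the pooled observations samples from the estimated null distribution in which group labels carry no information:

```python
import random

random.seed(5)
# Two toy groups (assumed): group B is shifted up by 1, so the null is false
group_a = [random.gauss(0.0, 1.0) for _ in range(30)]
group_b = [random.gauss(1.0, 1.0) for _ in range(30)]

observed = abs(sum(group_a) / 30 - sum(group_b) / 30)

# permutation: shuffle the pooled data to sample from the estimated null
pooled = group_a + group_b
extreme = 0
n_perm = 2000
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:30]) / 30 - sum(pooled[30:]) / 30)
    extreme += diff >= observed

# fraction of null statistics at least as extreme as the observed one
p_value = (extreme + 1) / (n_perm + 1)
```

Note the contrast with the bootstrap sketch above: there we resampled with replacement to measure the variability of an estimate; here we shuffle to build a null reference distribution and read off a p-value.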