Chapter 5: Resampling Methods

13 Terms

1
Resampling Methods
[Resampling] These techniques refit a model to samples derived from the training set to gather more information about the fitted model. For instance, they offer estimations of test-set prediction error, standard deviation, and bias of parameter estimates.
2
Training Error vs. Test Error
[Resampling] The test error is the error that results from using a statistical learning method to predict the response on a new observation that was not used in training. The training error can be easily calculated by applying the statistical learning method to the observations used in training. The training error rate often differs significantly from the test error rate and can dramatically underestimate it.
3
Validation Set Approach
[Resampling] This approach randomly divides the available data into two parts: a training set and a validation (or hold-out) set. The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set. The resulting validation-set error provides an estimate of the test error.
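A minimal sketch of the approach in pure Python. The data here are synthetic (a toy linear model y = 2x + noise, a 50/50 split, and a through-the-origin least-squares fit are all assumptions for illustration):

```python
import random

random.seed(0)

# Toy data (an assumption for illustration): y = 2x + Gaussian noise
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]

# Randomly divide the observations into a training half and a validation half
random.shuffle(data)
train, valid = data[:50], data[50:]

# Fit a least-squares line through the origin on the training part only
beta = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Validation-set estimate of the test MSE
val_mse = sum((y - beta * x) ** 2 for x, y in valid) / len(valid)
```

Because the noise has variance 1, `val_mse` should land near 1; a different random split would give a noticeably different estimate, which is exactly the variability drawback described in the next card.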
4
Drawbacks of the Validation Set Approach
[Resampling] The validation estimate of the test error can be highly variable, depending on which observations are included in the training set versus the validation set. Also, because only a subset of the observations are used to fit the model, the validation set error may overestimate the test error for the model fit on the entire dataset.
5
K-Fold Cross-Validation
[Resampling] This widely used method estimates test error. The data is randomly divided into K equal-sized parts. For each k = 1, …, K, part k is left out, the model is fit to the remaining K − 1 parts, and predictions are made for the left-out part. The K resulting error estimates are then combined, typically by averaging.
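The loop above can be sketched in pure Python. As before, the toy linear data and the through-the-origin fit are assumptions for illustration:

```python
import random

random.seed(1)
# Toy data (assumed for illustration): y = 2x + Gaussian noise
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)

K = 5
fold_size = len(data) // K
fold_mses = []
for k in range(K):
    # part k is held out; the model is fit on the other K - 1 parts
    held_out = data[k * fold_size:(k + 1) * fold_size]
    rest = data[:k * fold_size] + data[(k + 1) * fold_size:]
    beta = sum(x * y for x, y in rest) / sum(x * x for x, y in rest)
    fold_mses.append(sum((y - beta * x) ** 2 for x, y in held_out) / len(held_out))

# combine the K per-fold results into one estimate of the test error
cv_mse = sum(fold_mses) / K
```

Every observation is used for validation exactly once, so the estimate is much less dependent on a single lucky or unlucky split than the validation-set approach.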
6
LOOCV
[Resampling] Setting K = n yields n-fold or leave-one-out cross-validation.
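With K = n, each fold contains a single observation, so n models are fit. A small sketch on assumed toy data:

```python
import random

random.seed(2)
# Toy data (assumed for illustration): y = 2x + noise, n = 30
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(1, 31)]

# K = n: each observation is held out once, so n models are fit
sq_errors = []
for i in range(len(data)):
    x0, y0 = data[i]                      # the single held-out observation
    rest = data[:i] + data[i + 1:]        # the other n - 1 observations
    beta = sum(x * y for x, y in rest) / sum(x * x for x, y in rest)
    sq_errors.append((y0 - beta * x0) ** 2)

loocv_mse = sum(sq_errors) / len(sq_errors)
```

LOOCV has no randomness (there is only one way to leave each point out), but refitting n times can be expensive for large n or slow-to-fit models.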
7
Cross-Validation for Classification Problems
[Resampling] The data is divided into K roughly equal-sized parts, and a cross-validation statistic is computed based on the error rate.
8
Right Approach
[Resampling] Cross-validation must be applied to every step of a procedure. If earlier steps, such as feature selection, use the labels of the training data, those steps must be repeated inside each cross-validation fold.
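To make this concrete, here is a small pure-Python sketch. The pure-noise data, the mean-gap feature selector, and the 1-nearest-neighbour classifier are all assumptions for illustration. Because the labels are independent of every feature, an honest error estimate should hover near 50%; re-running the label-using selection step inside every fold gives that honest answer, whereas selecting the feature once on the full dataset would report a misleadingly low error:

```python
import random

random.seed(3)
n, p = 40, 200
# Pure-noise data (assumed): labels are independent of all features
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [random.randrange(2) for _ in range(n)]

def top_feature(rows, labels):
    """Pick the feature whose class means differ most (uses the labels)."""
    best_j, best_gap = 0, -1.0
    for j in range(p):
        ones = [r[j] for r, l in zip(rows, labels) if l == 1]
        zeros = [r[j] for r, l in zip(rows, labels) if l == 0]
        if not ones or not zeros:
            continue
        gap = abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        if gap > best_gap:
            best_j, best_gap = j, gap
    return best_j

# Right approach: repeat the label-using selection step inside every fold
errors = 0
for i in range(n):
    train_X = X[:i] + X[i + 1:]
    train_y = y[:i] + y[i + 1:]
    j = top_feature(train_X, train_y)   # selection sees training labels only
    # classify the held-out point with 1-nearest-neighbour on feature j
    nearest = min(range(n - 1), key=lambda k: abs(train_X[k][j] - X[i][j]))
    errors += train_y[nearest] != y[i]

cv_error = errors / n   # hovers near 0.5, as it should for pure noise
```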

9
The Bootstrap
[Resampling] The bootstrap is a tool for quantifying the uncertainty associated with an estimator or statistical learning method; for example, it can estimate the standard error of a coefficient or provide a confidence interval for that coefficient. It lets a computer mimic the process of obtaining new datasets by repeatedly sampling observations from the original dataset with replacement, so the variability of an estimate can be assessed without generating additional samples from the population.
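A minimal sketch, estimating the standard error of a sample median (an estimator with no simple closed-form standard error). The underlying sample, B = 1000 resamples, and the percentile interval are assumptions for illustration:

```python
import random
import statistics

random.seed(4)
# Original sample (assumed for illustration): 50 draws from Normal(0, 2)
sample = [random.gauss(0, 2) for _ in range(50)]

B = 1000
boot_medians = []
for _ in range(B):
    # draw n observations from the original data WITH replacement
    resample = random.choices(sample, k=len(sample))
    boot_medians.append(statistics.median(resample))

# bootstrap estimate of the standard error of the sample median
se_median = statistics.stdev(boot_medians)

# approximate 95% percentile confidence interval for the population median
srt = sorted(boot_medians)
lo, hi = srt[24], srt[974]
```

The spread of the B recomputed medians stands in for the spread we would see if we could repeatedly draw fresh samples from the population.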

10
Other uses of the Bootstrap
[Resampling] The bootstrap is primarily used to obtain standard errors of an estimate, and it can also provide approximate confidence intervals for a population parameter.
11
Bootstrap and Prediction Error
[Resampling] Using the bootstrap to estimate prediction error can lead to underestimation of the true prediction error due to the overlap between bootstrap samples and the original data. Cross-validation is a simpler and more attractive approach for estimating prediction error.
12
Pre-validation
[Resampling] Pre-validation is designed for comparing adaptively derived predictors with fixed, pre-defined predictors. It forms a "pre-validated" version of the adaptive predictor that has not "seen" the response.

13
Bootstrap versus Permutation Tests
[Resampling] The bootstrap samples from the estimated population to estimate standard errors and confidence intervals, while permutation methods sample from an estimated null distribution to estimate p-values and False Discovery Rates for hypothesis tests.
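The permutation side of this contrast can be sketched as a two-sample test. The two toy groups, the mean-difference statistic, and the 2000 shuffles are assumptions for illustration; shuffling the pooled observations samples from the estimated null distribution in which group labels carry no information:

```python
import random

random.seed(5)
# Two toy groups (assumed): group B is shifted up by 1, so the null is false
group_a = [random.gauss(0.0, 1.0) for _ in range(30)]
group_b = [random.gauss(1.0, 1.0) for _ in range(30)]

observed = abs(sum(group_a) / 30 - sum(group_b) / 30)

# permutation: shuffle the pooled data to sample from the estimated null
pooled = group_a + group_b
extreme = 0
n_perm = 2000
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:30]) / 30 - sum(pooled[30:]) / 30)
    extreme += diff >= observed

# fraction of null statistics at least as extreme as the observed one
p_value = (extreme + 1) / (n_perm + 1)
```

Note the contrast with the bootstrap sketch above: there we resampled with replacement to measure the variability of an estimate; here we shuffle to build a null reference distribution and read off a p-value.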