MAS-I ISLR Conceptual Questions


Last updated 10:59 PM on 6/19/26


157 Terms

New cards

Indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method

The sample size n is extremely large, and the number of predictors p is small

better - with a very large sample and few predictors, a flexible method can fit the data more closely without overfitting, so it would typically outperform an inflexible method

New cards

Indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method

The number of predictors p is extremely large, and the number of observations n is small

worse - a flexible method would overfit the small number of observations

New cards

Indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method

The relationship between the predictors and response is highly non-linear

better - with more degrees of freedom, a flexible model would obtain a better fit

New cards

Indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method

The variance of the error terms (σ^2) is extremely high

worse - a flexible method would fit the noise in the error terms, which increases variance

New cards

Problems with a quantitative response

Regression problems

New cards

Problems with a qualitative response

Classification problems

logistic regression is typically used with a qualitative (two-class, or binary) response, so it is a classification method; because it estimates class probabilities, it can also be thought of as a regression method

New cards

As model flexibility increases, what happens to the training MSE and test MSE?

Training MSE will decrease

Test MSE may not

New cards

Small training MSE & large test MSE means

we are overfitting the data; may be picking up some patterns that are just caused by random chance

regardless of overfitting, we almost always expect the training MSE to be smaller than the test MSE
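The pattern on this card can be checked with a small simulation. The sketch below is only illustrative: the "true" function, noise level, sample sizes, and polynomial degrees are all assumptions made up for the example, and polynomial degree stands in for flexibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def true_f(x):
    # Hypothetical "true" relationship, chosen only for illustration
    return np.sin(2 * x)

# Small training set and a larger test set with the same noise level
x_train = rng.uniform(-1, 1, 30)
y_train = true_f(x_train) + rng.normal(0, 0.3, 30)
x_test = rng.uniform(-1, 1, 1000)
y_test = true_f(x_test) + rng.normal(0, 0.3, 1000)

def mse(coefs, x, y):
    # Mean squared error of a fitted polynomial on the points (x, y)
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 6, 9):  # higher degree = more flexible fit
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(coefs, x_train, y_train)
    test_mse[degree] = mse(coefs, x_test, y_test)
    print(degree, round(train_mse[degree], 4), round(test_mse[degree], 4))
```

Because each polynomial contains the lower-degree ones as special cases, the training MSE cannot go up as the degree grows; the test MSE carries no such guarantee.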

New cards

reducible error

we can potentially improve the accuracy of the estimate of f by using the most appropriate statistical learning technique to estimate f

[f(X) - fhat(X)]^2

New cards

irreducible error

no matter how well we estimate f, we cannot reduce the error introduced by ɛ

var(ɛ)
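The two error cards fit together in one identity. The block below writes it out, assuming (as in the standard treatment) that fhat and X are held fixed, Yhat = fhat(X), and ɛ has mean zero and is independent of X:

```latex
E\big(Y - \hat{Y}\big)^2
  = E\big[f(X) + \varepsilon - \hat{f}(X)\big]^2
  = \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible}}
  + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{irreducible}}
```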

New cards

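parametric

an assumption is made about the functional form of f (for example, that it is linear), which reduces the problem of estimating f to estimating a small number of parameters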

New cards

non-parametric

no assumption about the form of f is made; estimate f that gets as close to the data points as possible without being too rough or wiggly

requires a very large number of observations to accurately estimate f

New cards

parametric potential disadvantage

the model we choose will usually not match the true unknown form of f

New cards

non-parametric disadvantage

since they do not reduce the problem of estimating f to a small number of parameters, a very large number of observations is required in order to obtain an accurate estimate for f

New cards

non-parametric advantage

potential to accurately fit a wider range of possible shapes for f

New cards

From high to low interpretability & low to high flexibility

subset selection

lasso

least squares

generalized additive models

least squares (linear regression) comes after the lasso, but before GAMs

New cards

why would we ever choose to use a more restrictive method instead of a very flexible approach?

if we are mainly interested in inference, then restrictive models are much more interpretable

(linear model)

New cards

inference - what kind of model do we want?

inflexible because easier to interpret

want to better understand the relationship between the response and the predictors

New cards

prediction - what kind of model do we want?

more flexible, but not always most flexible; interpretability not of interest

aim is to accurately predict the response for future observations

New cards

supervised learning

for each observation of the predictor measurement(s), there is an associated response measurement

New cards

unsupervised learning

no associated response

cluster analysis

New cards

The expected test MSE

the sum of the variance of fhat(x0), the squared bias of fhat(x0), and the variance of the error terms ɛ

can never lie below var(ɛ)
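Written as an equation, with x0 a test point, y0 its response, and fhat the fitted model (the expectation averages over training sets and over ɛ):

```latex
E\big[(y_0 - \hat{f}(x_0))^2\big]
  = \mathrm{Var}\big(\hat{f}(x_0)\big)
  + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \mathrm{Var}(\varepsilon)
```

The first two terms are non-negative, which is why the expected test MSE can never lie below var(ɛ).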

New cards

More flexible methods have higher or lower variance? higher or lower bias?

More flexible methods have higher variance and lower bias

As flexibility increases, bias tends to initially decrease faster than the variance increases (test MSE decreases), but then bias evens out, and variance significantly increases (test MSE increases)

New cards

Bias-variance trade-off

the relationship between bias, variance, and test set MSE

challenge lies in finding a method for which both the variance and the squared bias are low

New cards

advantages for a flexible approach

obtaining a better fit for non-linear models, decreasing bias

New cards

disadvantages for a flexible approach

requires estimating a greater number of parameters; may follow the noise too closely (overfit), increasing variance

New cards

Residual Sum of Squares

the sum of each residual squared for all the observations in the sample. This reflects the amount of variation in the dependent variable not explained by the regression equation

New cards

least squares coefficient estimates for simple linear regression

Beta hat sub one

Beta hat sub zero

they characterize the least squares line
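The card only names the two estimates, so the sketch below fills in the standard closed-form expressions for simple linear regression and computes the RSS from the previous card. The data values are invented for illustration.

```python
import numpy as np

# Toy data, invented purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: forces the fitted line through the point (x_bar, y_bar)
beta0_hat = y_bar - beta1_hat * x_bar

def rss(b0, b1):
    # Residual sum of squares for the line y = b0 + b1 * x
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

print(beta0_hat, beta1_hat, rss(beta0_hat, beta1_hat))
```

Any other choice of intercept and slope gives a larger RSS on the same data, which is what "least squares" means.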

New cards

True or False: since the coefficient for an interaction term is very small, there is very little evidence of an interaction effect

False. We must examine the p-value of the regression coefficient to determine if the interaction term is statistically significant or not

New cards

polynomial regression vs. linear regression; underlying true relationship is linear; would you expect one training RSS to be lower/higher than the other?

I would expect the polynomial regression to have a lower (or at worst equal) training RSS than the linear regression, because its extra flexibility lets it fit the noise in the training data (the irreducible error, var(ɛ)) more closely

New cards

polynomial regression vs. linear regression; underlying true relationship is linear; would you expect one test RSS to be lower/higher than the other?

I would expect the polynomial regression to have a higher test RSS, because overfitting the training data adds error that the linear regression, which matches the true relationship, does not incur

New cards

polynomial regression vs. linear regression; underlying true relationship is not linear, it's unknown; would you expect one training RSS to be lower/higher than the other?

polynomial regression would still have a lower training RSS than the linear fit because of its higher flexibility, no matter what the underlying true relationship is; the more flexible model follows the points more closely and reduces the training RSS

New cards

polynomial regression vs. linear regression; underlying true relationship is not linear, it's unknown; would you expect one test RSS to be lower/higher than the other?

Not enough info to tell; whichever model the underlying relationship is closer to could have the lower RSS

New cards

logistic regression

p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

log(p(X) / (1 - p(X))) = β0 + β1X
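The two forms on this card are inverses of each other, which a short sketch can confirm. It assumes a single predictor, and the coefficient values used below are arbitrary placeholders.

```python
import math

def logistic(beta0, beta1, x):
    # p(X): maps the linear predictor beta0 + beta1 * x into (0, 1)
    z = beta0 + beta1 * x
    return math.exp(z) / (1 + math.exp(z))

def log_odds(p):
    # Logit: recovers the linear predictor from a probability
    return math.log(p / (1 - p))

p = logistic(-1.0, 0.5, 3.0)   # linear predictor is -1.0 + 0.5 * 3.0 = 0.5
print(p, log_odds(p))
```

Applying log_odds to the output of logistic returns the linear predictor, which is why the model is linear on the log-odds scale even though p(X) is not linear in X.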

New cards

odds --> probability

odds = prob / (1 - prob)

prob = odds / (1 + odds)
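A quick worked conversion in both directions; the function names are made up for this example.

```python
def odds_from_prob(prob):
    # Odds in favor of an event with probability prob
    return prob / (1 - prob)

def prob_from_odds(odds):
    # Inverse conversion: probability implied by the odds
    return odds / (1 + odds)

# A probability of 0.2 corresponds to odds of 0.25, i.e. 1 to 4
print(odds_from_prob(0.2), prob_from_odds(0.25))
```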

New cards

What is the probability that the first bootstrap observation is not the jth observation from the original sample?

the probability that the jth observation is selected as the first bootstrap observation is 1/n

therefore the probability that the jth observation is not the first bootstrap observation is 1-(1/n)

New cards

what is the probability that the 2nd bootstrap observation is not the jth observation from the original sample

1-(1/n)

bootstrap sampling is sampling with replacement

New cards

probability that the jth observation is not in the bootstrap sample

(1-1/n)^n

New cards

probability that the jth observation is in the bootstrap sample

1-(1-1/n)^n

New cards

probability that the jth observation is in the bootstrap sample tends to...

(limit)

1-1/e ≈ 0.632 (as n → ∞)
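The bootstrap cards above build up to this limit. The sketch below evaluates the inclusion probability 1-(1-1/n)^n for a few sample sizes (chosen arbitrarily) and compares it with 1-1/e.

```python
import math

def prob_in_bootstrap(n):
    # Each of the n draws misses observation j with probability 1 - 1/n,
    # and the draws are independent because sampling is with replacement
    return 1 - (1 - 1 / n) ** n

limit = 1 - 1 / math.e

for n in (5, 100, 10_000):
    print(n, round(prob_in_bootstrap(n), 4))
print("limit", round(limit, 4))
```

The probability starts near 0.67 for n = 5 and falls toward the limit of about 0.632 as n grows.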

New cards

Explain how k-fold cross-validation is implemented

The data is segmented into k distinct, (usually) equal-sized 'folds'. A model is trained on k−1 of the folds and tested on the remaining fold. This process is repeated k times, such that each of the k folds acts as the test data once. The test performance is recorded and averaged, giving the 'cross-validation' or 'out-of-sample' metric.
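The procedure described on this card can be sketched in a few lines. This is a minimal version under stated assumptions: the model is a polynomial fit via NumPy, the metric is MSE, and the function name `k_fold_mse` and the simulated data are hypothetical.

```python
import numpy as np

def k_fold_mse(x, y, k, degree=1, seed=0):
    """Estimate the test MSE of a polynomial fit by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))     # shuffle before splitting
    folds = np.array_split(indices, k)    # k roughly equal-sized folds
    fold_mse = []
    for i in range(k):
        test_idx = folds[i]               # fold i is held out once
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        preds = np.polyval(coefs, x[test_idx])
        fold_mse.append(np.mean((y[test_idx] - preds) ** 2))
    return float(np.mean(fold_mse))       # average over the k held-out folds

# Simulated data: linear signal plus noise with variance 1
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 1, 100)

cv_estimate = k_fold_mse(x, y, k=5)
print(cv_estimate)
```

Setting k equal to the number of observations turns the same function into LOOCV, at the cost of fitting the model n times.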

New cards

What are the advantages of k-fold cross-validation relative to the validation set approach?

k-fold CV has much lower variability, and all the data is used to both train and test model performance

the validation set approach can over-estimate the test error

New cards

What are the disadvantages of k-fold cross-validation relative to the validation set approach?

The validation set approach is conceptually easier to grasp and has a computational advantage - a model is trained once and tested once (less time consuming)

New cards

What are the advantages of k-fold cross-validation relative to LOOCV?

k-fold CV is less computationally demanding

k-fold CV often gives a more accurate test error estimate because of the bias-variance trade-off (LOOCV has lower bias but higher variance)

New cards

What are the disadvantages of k-fold cross-validation relative to LOOCV?

k-fold cv has an element of randomness

loocv can require less computational power in some cases (least squares regression)
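The least squares case refers to a standard shortcut, stated here for reference: for linear or polynomial regression fit by least squares, the LOOCV estimate can be computed from a single fit on the full data, where y-hat_i is the ith fitted value and h_i is the leverage of observation i.

```latex
\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n}
  \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2
```

This identity does not hold for statistical learning methods in general, in which case the model must be refit n times.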

New cards

Suppose that we use some statistical learning method to make a prediction for the response Y for a particular value of the predictor X. Carefully describe how we might estimate the standard deviation of our prediction.

The bootstrap approach would be appropriate here. If the original data contains n observations, we create B bootstrap samples from the data (sampling n observations with replacement, repeated B times). On each of these datasets, we would then train a supervised learning method and use it to make our estimate for the 'particular value of X'. Once we have these B estimates, we can calculate the standard deviation of them. Doing so provides the bootstrap estimate for the standard error of our estimate.
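The steps in this answer can be sketched as follows. The learning method here is assumed to be a simple linear fit, and the function name `bootstrap_prediction_sd`, the value of B, and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np

def bootstrap_prediction_sd(x, y, x0, B=1000, seed=0):
    """Bootstrap estimate of the standard error of the prediction at x0."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # n indices drawn with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], 1)  # refit on resample
        preds[b] = intercept + slope * x0  # prediction from this refit
    return float(np.std(preds, ddof=1))    # spread of the B predictions

# Simulated data: linear signal plus noise with standard deviation 1
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 1, 100)

sd_at_5 = bootstrap_prediction_sd(x, y, x0=5.0)
print(sd_at_5)
```

Any other supervised method could replace the linear fit inside the loop; only the refit-and-predict step changes.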

New cards