Chapter 2: Statistical Learning

9 Terms

1. Statistical Learning

Involves predicting a response variable (Y) from input features (X). The model can be written Y = f(X) + ε, where ε is a random error term capturing measurement error and other discrepancies. A good estimate of f allows us to make predictions, identify the important variables, and understand how each component of X affects Y.

2. Regression Function

The ideal predictor of Y is the regression function f(x) = E(Y | X = x), which minimizes the mean-squared prediction error over all functions of x. Even if f(x) were known, prediction errors would remain because of the irreducible error ε = Y − f(x). To estimate f, local averaging can be used: f̂(x) = Ave(Y | X ∈ N(x)), where N(x) is a neighborhood of x.
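Local averaging can be sketched in a few lines of NumPy. This is a minimal, illustrative k-nearest-neighbor version; the function name `knn_regress` and the toy data are assumptions for the example:

```python
import numpy as np

def knn_regress(x0, X, y, k=5):
    """Estimate f(x0) by averaging y over the k training points nearest to x0."""
    dist = np.abs(X - x0)            # 1-D predictor, so distance is |X - x0|
    nearest = np.argsort(dist)[:k]   # indices of the k closest observations
    return y[nearest].mean()

# Toy data from Y = sin(X) + eps
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 200)
y = np.sin(X) + rng.normal(0, 0.1, 200)

print(knn_regress(3.0, X, y, k=15))  # close to sin(3.0) ≈ 0.141
```

With k = 15 neighbors the noise largely averages out, so the estimate lands near the true regression function at that point.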

3. Curse of Dimensionality

Nearest-neighbor methods can perform poorly when p is large, because nearest neighbors tend to be far away in high dimensions. To keep the variance down, we need to average over a reasonable fraction of the N training observations, but in high dimensions a 10% neighborhood may no longer be local, so the estimate is no longer a good approximation of f(x).
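The 10% point can be made concrete: for points uniform on the unit hypercube [0,1]^p, a cubical neighborhood capturing a fraction r of the data needs edge length r^(1/p). A small sketch (the dimensions chosen are illustrative):

```python
# Edge length of a hypercube neighborhood in [0,1]^p that captures
# 10% of uniformly distributed data: 0.10 ** (1/p).
for p in (1, 2, 10, 100):
    edge = 0.10 ** (1 / p)
    print(f"p = {p:3d}: edge length for a 10% neighborhood = {edge:.2f}")
```

In 100 dimensions the "neighborhood" spans about 98% of each axis, so it is not local at all.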

4. Parametric and Structured Models

A linear model, fL(X) = β0 + β1X1 + β2X2 + · · · + βpXp, is a simple parametric model specified by p + 1 parameters. Although almost never correct, linear models often provide good, interpretable approximations. More flexible parametric models, such as quadratic models or thin-plate splines, can also be used.
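Fitting fL by least squares is a one-liner with NumPy. The simulated data and the coefficient values below are illustrative assumptions:

```python
import numpy as np

# Simulate data from a true linear model with p = 2 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, 100)

# Least-squares fit of the p + 1 = 3 parameters (beta0, beta1, beta2).
A = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [2.0, 1.5, -0.5]
```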

5. Trade-offs

There are trade-offs between prediction accuracy and interpretability, good fit and over/under-fitting, and parsimony and black-box models.

6. Assessing Model Accuracy

Model accuracy is assessed by computing the mean-squared error on the training data (MSE_Tr) and on fresh test data (MSE_Te). MSE_Tr tends to be biased toward more flexible, overfit models, so MSE_Te is the better measure of how well the model predicts.
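The bias of MSE_Tr is easy to demonstrate by simulation. This sketch fits polynomials of increasing degree to noisy sin data (all details are illustrative choices): training error falls with flexibility, while test error is measured on fresh data.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.3, n)

x_tr, y_tr = make_data(50)    # training set
x_te, y_te = make_data(200)   # fresh test set

mse = {}
for degree in (1, 3, 12):
    coefs = np.polyfit(x_tr, y_tr, degree)  # least-squares fit on training data
    mse_tr = np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coefs, x_te)) ** 2)
    mse[degree] = (mse_tr, mse_te)
    print(f"degree {degree:2d}: MSE_Tr = {mse_tr:.3f}  MSE_Te = {mse_te:.3f}")
```

Training MSE can only decrease as the polynomial degree grows, which is exactly why it cannot be trusted to pick the model.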

7. Bias-Variance Trade-off

The expected test error at a point x0 can be decomposed as E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε), i.e. variance plus squared bias plus the irreducible error variance. As model flexibility increases, variance typically increases and bias decreases.
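The decomposition can be checked empirically by refitting on many simulated training sets and examining the spread of f̂(x0). Everything below (the target function, degrees, and sample sizes) is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x)   # true regression function
x0 = 0.5                      # the point at which we study f_hat

def fit_at_x0(degree):
    """Draw one training set, fit a degree-d polynomial, return f_hat(x0)."""
    x = rng.uniform(-1, 1, 50)
    y = f(x) + rng.normal(0, 0.3, 50)
    return np.polyval(np.polyfit(x, y, degree), x0)

stats = {}
for degree in (1, 10):
    fits = np.array([fit_at_x0(degree) for _ in range(300)])
    bias2 = (fits.mean() - f(x0)) ** 2   # squared bias of f_hat(x0)
    var = fits.var()                     # variance of f_hat(x0)
    stats[degree] = (bias2, var)
    print(f"degree {degree:2d}: squared bias = {bias2:.4f}  variance = {var:.4f}")
```

The rigid linear fit shows large squared bias and small variance; the flexible degree-10 fit shows the reverse.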

8. Classification Problems

In classification, the response variable Y is qualitative. The goal is to build a classifier C(X) that assigns a class label to a future observation X, to assess the uncertainty in each classification, and to understand the roles of the different predictors. The Bayes-optimal classifier assigns an observation to the most probable class given x: C(x) = j if Pr(Y = j | X = x) is the largest of the conditional class probabilities.
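When the class-conditional densities are known, the Bayes classifier can be written down directly. Here is a sketch for two 1-D Gaussian classes with equal priors; the means, variance, and function name are assumptions for the example:

```python
import numpy as np

def bayes_classify(x, mu0=0.0, mu1=2.0, sigma=1.0):
    """Bayes rule for two Gaussian classes with equal priors:
    assign x to the class whose density at x is larger."""
    p0 = np.exp(-((x - mu0) ** 2) / (2 * sigma**2))  # proportional to Pr(X = x | Y = 0)
    p1 = np.exp(-((x - mu1) ** 2) / (2 * sigma**2))  # proportional to Pr(X = x | Y = 1)
    return 0 if p0 > p1 else 1

print(bayes_classify(0.3))  # → 0  (the Bayes decision boundary is x = 1.0 here)
print(bayes_classify(1.7))  # → 1
```

With equal priors and equal variances the Bayes boundary sits at the midpoint of the two means; no classifier can beat this rule's error rate on average.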

9. Classification Details

The performance of a classifier Ĉ(x) is measured by its misclassification error rate. Nearest-neighbor averaging of the conditional probabilities can be used, as in regression, but it breaks down as the dimension grows. K-nearest neighbors (KNN) classifies by majority vote among the K nearest training points; the choice of K controls the flexibility of the decision boundary (small K gives a flexible boundary, large K a smoother one).
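A minimal KNN classifier in NumPy makes the majority-vote idea concrete; the function name and the two-Gaussian toy data are illustrative:

```python
import numpy as np

def knn_classify(x0, X, y, k=5):
    """Predict the class at x0 by majority vote among the k nearest neighbors."""
    dist = np.linalg.norm(X - x0, axis=1)     # Euclidean distance to each point
    nearest = y[np.argsort(dist)[:k]]         # labels of the k closest points
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]          # most common label wins

# Two Gaussian classes centered at (0, 0) and (2, 2).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

print(knn_classify(np.array([0.0, 0.0]), X, y, k=15))  # → 0
print(knn_classify(np.array([2.0, 2.0]), X, y, k=15))  # → 1
```

Shrinking k toward 1 lets the boundary follow individual training points (low bias, high variance); growing k smooths it out.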