Data Mining Quiz - Ch.7

8 Terms

1

Basis Functions

Very simple extensions of linear models

  • Polynomial Regression

  • Step Functions (Piecewise-Constant Regression)

  • Splines

Basis functions b1(X), b2(X), …, bK(X) are fixed, known, and hand-selected

  • Transforming X into something else

  • As with linear regression, all of the standard statistical tools still apply:

    • Standard Errors

    • Coefficient estimates

    • F-statistics
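
A minimal sketch of the idea in Python, assuming statsmodels and a hand-picked set of basis functions (x, x², log(1+x)); the choice of basis and the synthetic data are illustrative, not part of the card. The point is that once the basis columns are built, ordinary least squares and its usual inference output apply unchanged.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = 1.0 + 0.5 * x - 0.3 * x**2 + rng.normal(0, 0.5, 200)

# Fixed, hand-selected basis functions b1(x), b2(x), b3(x)
B = np.column_stack([x, x**2, np.log1p(x)])
B = sm.add_constant(B)            # add an intercept column

fit = sm.OLS(y, B).fit()          # ordinary least squares on the transformed X
print(fit.summary())              # coefficient estimates, standard errors, F-statistic
```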

2

Polynomial Regression

The standard way to extend linear and logistic regression

  • Add polynomial terms (X^d)

    • Typically d ≤ 4
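
As an illustration, a degree-4 polynomial fit with scikit-learn (an assumed tooling choice, not mentioned in the card): PolynomialFeatures generates the X^2, X^3, X^4 columns, and LinearRegression fits them like any other linear model.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (200, 1))
y = 2 - X[:, 0] + 0.5 * X[:, 0]**3 + rng.normal(0, 1.0, 200)

# Polynomial regression with d = 4: adds X^2, X^3, X^4 as extra predictors
poly_model = make_pipeline(PolynomialFeatures(degree=4, include_bias=False),
                           LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[1.5]]))   # prediction at X = 1.5
```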

3

Step Functions

  • Uses step functions to avoid imposing a global structure

  • Break the range of X into bins and convert the bins into ordered categorical/dummy variables

  • Good for variables that have natural break points

    • Ex: 5-year age bins

    • Can be poor predictors at the break points
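
A hedged sketch of piecewise-constant regression using pandas and scikit-learn (assumed tools, with made-up data): X is cut into 5-year bins, the bins become dummy variables, and the fitted value is constant within each bin.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
age = rng.uniform(20, 70, 300)
wage = 30 + 0.8 * age + rng.normal(0, 5, 300)

# Break age into 5-year bins, then turn the bins into dummy variables
bins = np.arange(20, 75, 5)
age_bin = pd.cut(age, bins=bins, include_lowest=True)
X = pd.get_dummies(age_bin, drop_first=True).astype(float)

step_fit = LinearRegression().fit(X, wage)   # one constant level per bin
```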

4

Regression Splines

  • Type of basis function that is a combination of polynomial regression and step functions

  • Locations where the coefficients/functions change are called knots

    • More knots, more flexible method

  • Adding a constraint (e.g., continuity at each knot) removes a degree of freedom, reducing complexity and smoothing the fit
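
A brief sketch, assuming scikit-learn ≥ 1.0 (its SplineTransformer builds a B-spline basis): piecewise cubic polynomials joined at the knots, fit with a plain linear model; raising n_knots makes the fit more flexible.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 10, (200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)

# Cubic regression spline: piecewise degree-3 polynomials joined at the knots
spline_model = make_pipeline(SplineTransformer(degree=3, n_knots=6),
                             LinearRegression())
spline_model.fit(X, y)
```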

5

Natural Splines

  • Splines can have high variance at the outer range of X

  • Natural spline - a regression spline with added boundary constraints: the function must be linear at the boundaries

    • Boundaries - the region below the smallest knot and the region above the largest knot
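
For illustration, the patsy library (an assumed choice) provides cr(), a natural cubic spline basis that is constrained to be linear beyond the boundary knots, which tames the variance at the outer range of X.

```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

# Natural cubic spline basis: linear outside the boundary knots
basis = dmatrix("cr(x, df=5)", {"x": x}, return_type="dataframe")
natural_fit = sm.OLS(y, basis).fit()
```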

6

Smoothing Splines

  • Different approach, still produces a spline

  • Places a knot at every value of X

  • Uses penalty to determine smoothness

  • λ is the smoothing parameter controlling the trade-off:

    • Small λ ≈ 0 → very flexible, almost interpolates the data → high variance.

    • Large λ → ∞ → heavily penalizes wiggles → approaches a straight line → low variance, high bias.
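
The penalty being referred to is the standard smoothing-spline criterion: choose the function g that minimizes the residual sum of squares plus λ times the integrated squared second derivative (the "wiggliness").

```latex
\min_{g} \; \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 \;+\; \lambda \int g''(t)^2 \, dt
```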

7

Local Regression

  • Instead of fitting one global regression to all the data, this fits a regression only around the target point x0

  • Nearby observations have more influence on the fit at x0, while distant points have little or no effect.

  • Conceptually, this is similar to K-nearest neighbors (KNN), except:

    • KNN predicts by averaging nearby y-values.

    • Local regression predicts by fitting a weighted regression locally.
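
A minimal NumPy sketch of local (weighted) linear regression at a single target point x0, assuming a tricube weight function and a hand-chosen span; both are illustrative defaults, not the only option.

```python
import numpy as np

def local_linear_fit(x0, x, y, span=0.3):
    """Predict at x0 by fitting a weighted linear regression to the
    fraction `span` of observations nearest to x0."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))        # number of neighbours in the window
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                   # the k nearest observations
    dmax = max(d[idx].max(), 1e-12)
    w = (1 - (d[idx] / dmax) ** 3) ** 3       # tricube weights: nearby points count more
    W = np.diag(w)
    X = np.column_stack([np.ones(k), x[idx]]) # intercept + slope
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0             # weighted local fit, not a plain average (KNN)

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.2, 100)
y_hat = np.array([local_linear_fit(x0, x, y) for x0 in x])
```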

8

Generalized Additive Models (GAMs)

  • The model is additive:

    • The effect of each predictor is added together.

    • There are no interaction terms by default.

  • This keeps the model interpretable:

    • You can examine how each variable individually affects the response.

  • Allow you to use splines, natural splines, smoothing splines, or local regression for each predictor.

  • The only restriction: the contributions of predictors are added together, not multiplied or combined in complex ways (unless you specifically include interaction terms)
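
A sketch of the additive idea using scikit-learn (an assumed tooling choice, with synthetic data): each predictor gets its own spline basis, and a single linear model then adds the fitted functions together; no interaction terms are created.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, (300, 2))                     # two predictors
y = np.sin(X[:, 0]) + 0.05 * X[:, 1]**2 + rng.normal(0, 0.3, 300)

# One spline basis per predictor; the linear model simply adds
# the fitted functions f1(X1) + f2(X2) -> additive and interpretable
per_feature = ColumnTransformer([
    ("f1", SplineTransformer(degree=3, n_knots=6), [0]),
    ("f2", SplineTransformer(degree=3, n_knots=6), [1]),
])
gam_like = make_pipeline(per_feature, LinearRegression())
gam_like.fit(X, y)
```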