Data Mining Quiz - Ch.7

8 Terms

1

Basis Functions

Very simple extensions of linear models

  • Polynomial Regression

  • Step Functions (Piecewise-Constant Regression)

  • Splines

Basis functions b1(X), b2(X), …, bK(X) are fixed, known, and hand-selected

  • Transforming X into something else

  • As with linear regression, all of the standard statistical tools still apply:

    • Standard Errors

    • Coefficient estimates

    • F-statistics
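
A minimal sketch of the idea in Python, assuming statsmodels and a hand-picked set of basis functions (x, x², log(1+x)); the choice of basis and the synthetic data are illustrative, not part of the card. The point is that once the basis columns are built, ordinary least squares and its usual inference output apply unchanged.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = 1.0 + 0.5 * x - 0.3 * x**2 + rng.normal(0, 0.5, 200)

# Fixed, hand-selected basis functions b1(x), b2(x), b3(x)
B = np.column_stack([x, x**2, np.log1p(x)])
B = sm.add_constant(B)            # add an intercept column

fit = sm.OLS(y, B).fit()          # ordinary least squares on the transformed X
print(fit.summary())              # coefficient estimates, standard errors, F-statistic
```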

2

Polynomial Regression

The standard way to extend linear and logistic regression

  • Add polynomial terms (X^d)

    • Typically d ≤ 4
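
As an illustration, a degree-4 polynomial fit with scikit-learn (an assumed tooling choice, not mentioned in the card): PolynomialFeatures generates the X^2, X^3, X^4 columns, and LinearRegression fits them like any other linear model.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (200, 1))
y = 2 - X[:, 0] + 0.5 * X[:, 0]**3 + rng.normal(0, 1.0, 200)

# Polynomial regression with d = 4: adds X^2, X^3, X^4 as extra predictors
poly_model = make_pipeline(PolynomialFeatures(degree=4, include_bias=False),
                           LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[1.5]]))   # prediction at X = 1.5
```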

3

Step Functions

  • Uses step functions to avoid imposing a global structure

  • Break the range of X into bins and convert the bins into ordered categorical/dummy variables

  • Good for variables that have natural break points

    • Ex: 5-year age bins

    • Can be poor predictors at the break points
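
A hedged sketch of piecewise-constant regression using pandas and scikit-learn (assumed tools, with made-up data): X is cut into 5-year bins, the bins become dummy variables, and the fitted value is constant within each bin.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
age = rng.uniform(20, 70, 300)
wage = 30 + 0.8 * age + rng.normal(0, 5, 300)

# Break age into 5-year bins, then turn the bins into dummy variables
bins = np.arange(20, 75, 5)
age_bin = pd.cut(age, bins=bins, include_lowest=True)
X = pd.get_dummies(age_bin, drop_first=True).astype(float)

step_fit = LinearRegression().fit(X, wage)   # one constant level per bin
```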

4

Regression Splines

  • Type of basis function that is a combination of polynomial regression and step functions

  • Locations where the coefficients/functions change are called knots

    • More knots, more flexible method

  • Adding a constraint (e.g., continuity at each knot) removes a degree of freedom, reducing complexity and smoothing the fit
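
A brief sketch, assuming scikit-learn ≥ 1.0 (its SplineTransformer builds a B-spline basis): piecewise cubic polynomials joined at the knots, fit with a plain linear model; raising n_knots makes the fit more flexible.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 10, (200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)

# Cubic regression spline: piecewise degree-3 polynomials joined at the knots
spline_model = make_pipeline(SplineTransformer(degree=3, n_knots=6),
                             LinearRegression())
spline_model.fit(X, y)
```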

5

Natural Splines

  • Splines can have high variance at the outer range of X

  • Natural spline - a regression spline with added boundary constraints: the function must be linear at the boundaries

    • Boundaries - the region below the smallest knot and the region above the largest knot
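
For illustration, the patsy library (an assumed choice) provides cr(), a natural cubic spline basis that is constrained to be linear beyond the boundary knots, which tames the variance at the outer range of X.

```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

# Natural cubic spline basis: linear outside the boundary knots
basis = dmatrix("cr(x, df=5)", {"x": x}, return_type="dataframe")
natural_fit = sm.OLS(y, basis).fit()
```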

6

Smoothing Splines

  • Different approach, still produces a spline

  • Places a knot at every value of X

  • Uses penalty to determine smoothness

  • λ is the smoothing parameter controlling the trade-off:

    • Small λ ≈ 0 → very flexible, almost interpolates the data → high variance.

    • Large λ → ∞ → heavily penalizes wiggles → approaches a straight line → low variance, high bias.
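
The penalty being referred to is the standard smoothing-spline criterion: choose the function g that minimizes the residual sum of squares plus λ times the integrated squared second derivative (the "wiggliness").

```latex
\min_{g} \; \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 \;+\; \lambda \int g''(t)^2 \, dt
```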

7

Local Regression

  • Instead of fitting one global regression to all the data, this fits a regression only around the target point x0

  • Nearby observations have more influence on the fit at x0, while distant points have little or no effect.

  • Conceptually, this is similar to K-nearest neighbors (KNN), except:

    • KNN predicts by averaging nearby y-values.

    • Local regression predicts by fitting a weighted regression locally.
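
A minimal NumPy sketch of local (weighted) linear regression at a single target point x0, assuming a tricube weight function and a hand-chosen span; both are illustrative defaults, not the only option.

```python
import numpy as np

def local_linear_fit(x0, x, y, span=0.3):
    """Predict at x0 by fitting a weighted linear regression to the
    fraction `span` of observations nearest to x0."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))        # number of neighbours in the window
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                   # the k nearest observations
    dmax = max(d[idx].max(), 1e-12)
    w = (1 - (d[idx] / dmax) ** 3) ** 3       # tricube weights: nearby points count more
    W = np.diag(w)
    X = np.column_stack([np.ones(k), x[idx]]) # intercept + slope
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0             # weighted local fit, not a plain average (KNN)

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.2, 100)
y_hat = np.array([local_linear_fit(x0, x, y) for x0 in x])
```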

8

Generalized Additive Models (GAMs)

  • The model is additive:

    • The effect of each predictor is added together.

    • There are no interaction terms by default.

  • This keeps the model interpretable:

    • You can examine how each variable individually affects the response.

  • Allow you to use splines, natural splines, smoothing splines, or local regression for each predictor.

  • The only restriction: the contributions of predictors are added together, not multiplied or combined in complex ways (unless you specifically include interaction terms)
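
A sketch of the additive idea using scikit-learn (an assumed tooling choice, with synthetic data): each predictor gets its own spline basis, and a single linear model then adds the fitted functions together; no interaction terms are created.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, (300, 2))                     # two predictors
y = np.sin(X[:, 0]) + 0.05 * X[:, 1]**2 + rng.normal(0, 0.3, 300)

# One spline basis per predictor; the linear model simply adds
# the fitted functions f1(X1) + f2(X2) -> additive and interpretable
per_feature = ColumnTransformer([
    ("f1", SplineTransformer(degree=3, n_knots=6), [0]),
    ("f2", SplineTransformer(degree=3, n_knots=6), [1]),
])
gam_like = make_pipeline(per_feature, LinearRegression())
gam_like.fit(X, y)
```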