DATA MINING MIDTERM

40 Terms

1

Sum of squared residuals

We estimate the linear regression coefficients by minimizing the _____
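
As a minimal sketch of what "minimizing the sum of squared residuals" means for one predictor, the closed-form least squares estimates can be computed directly. The function names `fit_simple_ols` and `rss` and the sample data are illustrative assumptions, not part of the card set.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimize the residual sum of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Closed-form least squares estimates for simple linear regression.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def rss(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(xs, ys)
print("intercept:", b0, "slope:", b1)
print("RSS at the fit:", rss(xs, ys, b0, b1))
print("RSS after nudging the slope:", rss(xs, ys, b0, b1 + 0.1))
```

Nudging either coefficient away from the fitted values can only increase the RSS, which is the sense in which the estimates are "least squares".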

2

RSE (Residual Standard Error)

The standard deviation of the error term, and with it the model's lack of fit, is estimated using the ____
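
A short sketch of how the RSE could be computed from a model's predictions, assuming the usual definition sqrt(RSS / (n - p - 1)) with p predictors. The function name and the sample numbers are hypothetical.

```python
import math


def residual_standard_error(y_true, y_pred, n_predictors):
    """Estimate the standard deviation of the error term.

    Uses RSE = sqrt(RSS / (n - p - 1)), where p is the number of predictors.
    """
    n = len(y_true)
    rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return math.sqrt(rss / (n - n_predictors - 1))


# Hypothetical observed and fitted values from a one-predictor model.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 2.0, 3.0, 6.0]
print(residual_standard_error(observed, fitted, n_predictors=1))
```

Because of the square root, the result is on the same scale as the response Y.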

3

P-Value

The _____ can be used to reject the null hypothesis when it falls below the significance level (commonly 0.05)

4

MSE

The ____ is reported in squared units of Y (its square root, the RMSE, is in units of Y)

5

K-Nearest Neighbor

The _____ approach is a non-parametric method that makes a prediction based on the K closest training observations
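
The idea can be sketched as a small classifier that takes a majority vote among the K nearest training points. Euclidean distance, the function name `knn_predict`, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster training set.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))
```

No model is fitted ahead of time; all the work happens at prediction time, which is what makes the method non-parametric.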

6

Cross-validation (either LOOCV or K-fold)

Performing _____ ensures that every observation is selected for the testing data at least once
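
A minimal sketch of how K-fold splitting covers every observation: the indices are cut into K contiguous folds, and each fold serves as the test set once. The function name `k_fold_indices` is hypothetical, and shuffling is left out for brevity (in practice the data is usually shuffled first).

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal contiguous folds."""
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


for train_idx, test_idx in k_fold_indices(n=10, k=3):
    print("test:", test_idx, "train:", train_idx)
```

Setting `k` equal to `n` gives LOOCV, where each test fold holds a single observation.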

7

Decision Boundary (Discriminant function)

Linear discriminant analysis uses a _____ to separate observations into distinct classes

8

Prior Probability

The ______ measures the probability that a randomly chosen observation belongs to class k

9

Posterior Probability

Refers to updated beliefs or probabilities after new data has been incorporated through Bayes' Theorem
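
To make the prior-to-posterior update concrete, here is a small sketch of Bayes' Theorem for a discrete set of classes. The function name `posterior`, the class labels, and the probabilities are made up for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(class k | x) for each class k.

    priors[k] is P(class k); likelihoods[k] is P(x | class k).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())  # P(x), the normalizing constant
    return {k: joint[k] / evidence for k in joint}


# Hypothetical two-class problem with equal priors.
priors = {"a": 0.5, "b": 0.5}
likelihoods = {"a": 0.2, "b": 0.6}
print(posterior(priors, likelihoods))
```

Both classes start out equally likely, but because the observed data is three times as likely under class "b", the posterior shifts toward "b".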

10

Best Subset Selection

Performing ______ to sub-select predictors requires the user to check every possible combination of predictors (2^p models).
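
The size of that search can be seen by enumerating the candidate predictor sets. This sketch only lists the subsets (fitting and scoring each model is omitted), and the predictor names are hypothetical.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every subset of the predictors, from the empty set to the full set."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


predictors = ["x1", "x2", "x3"]
candidates = list(all_subsets(predictors))
print(len(candidates), "candidate models")
for subset in candidates:
    print(subset)
```

With p predictors there are 2^p subsets, so the count doubles with each added predictor; this is why the exhaustive search becomes impractical for large p.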

11

Principal Component Analysis (PCA)

______ is an unsupervised method that transforms the p predictors into M linear combinations of those predictors (M ≤ p).

12

Knot

A _____ is a location where our coefficients and functions change.

13

Regression spline

The _______ is a combination of step functions and polynomial regression.

14

Random Forest

Decision tree models can be improved by bagging and by considering only a random subset of predictors at each split, an approach typically called _______.

15

Pure Nodes

The goal of splits in trees is to produce homogeneous child nodes, often called ______.
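
Node purity is usually quantified with an impurity measure. The sketch below uses the Gini index, which is one common choice and an assumption here, since the card does not name a specific measure; the function name is also illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a node: 0.0 when every observation shares one class."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["a", "a", "a"]))  # pure node
print(gini_impurity(["a", "b"]))       # evenly mixed two-class node
```

A split is attractive when the child nodes have lower impurity than the parent, with a value of zero marking a perfectly homogeneous node.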

16

We can relax the additive assumption of linear regression by adding interaction terms.

True

17

Linear regression is applicable to datasets where p is larger than n.

False

18

Naive Bayes classifiers assume that all predictors are independent within each class.

True

19

Classifiers typically return a probability that a given observation belongs to class k.

True

20

It is expected that the training error rate is lower than the testing error rate.

True

21

A confusion matrix is used to assess accuracy for classification and regression models.

False

22

It is good practice to prevent data leakage by reusing the same sample in both training and testing.

False

23

Both Ridge Regression and Lasso use a shrinkage penalty to regularize the coefficients, shrinking them toward zero to reduce each predictor's impact on the model.

True
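
The difference between the two penalties is easiest to see in a simplified special case, assumed here purely for illustration: one coefficient per observation with an identity design, so each least squares estimate can be shrunk on its own. Under that assumption, ridge minimizes (y - b)^2 + lam * b^2 and lasso minimizes (y - b)^2 + lam * |b|. The function names are hypothetical.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge solution in the simplified case: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso solution in the simplified case: soft-thresholding at lam / 2."""
    half = lam / 2.0
    if beta_ols > half:
        return beta_ols - half
    if beta_ols < -half:
        return beta_ols + half
    return 0.0


for beta in [2.0, 0.3, -2.0]:
    print(beta, "ridge:", ridge_shrink(beta, 1.0), "lasso:", lasso_shrink(beta, 1.0))
```

Ridge scales every coefficient down but never to exactly zero, while lasso sets small coefficients to exactly zero, which is why lasso can also act as a variable selection method.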

24

Forward and Backward Stepwise Selection are guaranteed to find the best possible combinations of predictors.

False

25

Cross-validation is often the best method for choosing optimal tuning parameters.

True

26

Basis functions are fixed, known functions b_k(X) that transform X so that we can still use standard linear-model tools such as standard errors and coefficient estimates.

True

27

For splines, it is best practice to use fewer knots to increase flexibility in regions where it may be necessary.

False

28

Generalized Additive Models allow us to use more than one predictor in our model.

True

29
30

Ridge Regression

knowt flashcard image
31

Smoothing Splines

knowt flashcard image
32

Linear Regression

knowt flashcard image
33

Lasso Regression

knowt flashcard image
34

Linear Regression

knowt flashcard image
35

Logistic Regression

knowt flashcard image
36

Ridge Regression

knowt flashcard image
37

Polynomial Regression

knowt flashcard image
38

Step Functions

knowt flashcard image
39

Lasso Regression

knowt flashcard image
40

Regression Splines

knowt flashcard image