Dimension Reduction - Predictive Analytics Final Exam


1

What are the specific traits of forward selection?

1. less complex
2. better with large data sets
3. higher risk of missing useful predictors
4. lower computational cost

2

Why is a model with too many predictors not useful?

1. Predictors can be correlated with one another
2. Too many predictors can lead to overfitting

3

What is the goal of subset selection?

To find a simple model that performs sufficiently well

4

(True/False) The predictive accuracy of candidate subsets should be assessed only on test data.

True

5

What are the traits of an exhaustive search?

1. All possible subsets of predictors are assessed
2. Computationally intensive for large data sets
3. Guaranteed to find the best subset to use

6

(True/False) The best use of subsetting will balance model fit and complexity.

True

7

What is the adjusted R^2?

Adjusts the R^2 statistic (the coefficient of determination) for the number of predictors (higher the better)
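
For reference, a standard textbook form of the adjustment (not from the original cards), where n is the number of observations and p the number of predictors:

```latex
\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```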

8

What is the Akaike Information Criterion (AIC)?

Balances model fit and complexity (lower the better)
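
A standard general form, stated for reference (k is the number of estimated parameters, L-hat the maximized likelihood):

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```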

9

What is the Bayesian Information Criterion (BIC)?

Imposes a larger penalty for models with more predictors (strict) (lower the better)
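
The corresponding standard form, for reference (n is the sample size); since ln(n) exceeds 2 once n is 8 or more, the per-parameter penalty is larger than AIC's for all but tiny samples:

```latex
\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})
```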

10

What is forward selection?

Starts with no predictors, adds them one by one, and stops when the addition doesn't improve the performance

11

What is backward elimination?

Starts with all predictors, eliminates the least useful ones one by one, and stops when all remaining predictors are significant

12

What is stepwise selection?

Like forward selection, but at each step considers dropping non-significant predictors
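
A minimal R sketch of the three strategies using stats::step() on the built-in mtcars data (the data set and models are illustrative, not from the exam material):

```r
null <- lm(mpg ~ 1, data = mtcars)   # intercept-only model
full <- lm(mpg ~ ., data = mtcars)   # model with all predictors

# Forward selection: start empty, add predictors one at a time
fwd <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Backward elimination: start full, drop the least useful predictors
bwd <- step(full, direction = "backward", trace = 0)

# Stepwise: forward moves, but dropping terms is also considered
both <- step(null, scope = formula(full), direction = "both", trace = 0)
```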

13

What are the specific traits of backward elimination?

1. more complex
2. worse with large data sets
3. lower risk
4. higher computational cost

14

What are the specific traits of stepwise selection?

1. Intermediate complexity
2. better with large data sets
3. balances risk
4. moderate to high computational cost

15

What does the R function regsubsets() work with?

It works with numerical (quantitative) outcome variables
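
A minimal sketch, assuming the leaps package (which provides regsubsets()) and using mtcars as an illustrative data set with a numeric outcome:

```r
library(leaps)

# Best-subset search for the numeric outcome mpg
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5)
summary(fit)$adjr2   # adjusted R^2 of the best model at each size
```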

16

What does the R function stepAIC() work with?

It works with categorical outcome variables
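
A minimal sketch, assuming the MASS package (which provides stepAIC()) and an illustrative logistic regression on the binary outcome am in mtcars:

```r
library(MASS)

# Logistic regression for a categorical (0/1) outcome
fit <- glm(am ~ mpg + wt + hp, data = mtcars, family = binomial)

# Stepwise selection driven by AIC
best <- stepAIC(fit, direction = "both", trace = 0)
summary(best)
```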

17

When do you use PCAs?

When you want to reduce the number of features while retaining the most information

18

How is the total information measured in Principal Component Analysis?

By the sum of the variances of the variables; the PCs themselves are weighted averages of the original variables

19

What do PCAs create?

They create new variables that are linear combinations of the original variables

20

In a PCA, are the linear combinations dependent or independent?

Independent of one another

21

Can PCs be used for categorical variables?

No, only quantitative variables

22

How do PCs rank variance?

The first PC explains the most variance, the next explains the most of what remains, and so on, with each PC independent of the others

23

What are loadings?

The weight that tells how much each original variable contributes to a PC

24

What does the magnitude of a loading indicate?

The variable's influence on the PC (the stronger the influence, the bigger the magnitude)

25

What does the sign of a loading show?

Shows whether a variable moves with a PC or against it
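
A minimal R sketch tying variance ranking and loadings together, using prcomp() on the built-in USArrests data (illustrative, not from the exam material):

```r
# Standardize the variables so no single one dominates the variance
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)   # proportion of total variance explained by each PC
pca$rotation   # loadings: the sign and magnitude of each variable's
               # contribution to each PC
```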

26

What is true regarding the variables retained from subsetting?

They have a clearer meaning

27

What is true regarding the variables retained from PCAs?

They are not directly interpretable

28

Should the adjusted R^2 be high or low for a well-performing model?

High

29

How does variable selection differ from Principal Component Analysis (PCA)?

Variable selection identifies and retains the most relevant variables whereas PCAs reduce dimensionality while retaining most of the variance

30

What approach do we use for variable selection?

Select a subset of the original variables based on criteria

31

What approach do we use for PCAs?

Transform original variables into new uncorrelated variables (PCs)

32

How can we describe the nature of variables in variable selection?

Original features are retained

33

How can we describe the nature of variables in PCAs?

New variables are created from linear combinations of original variables

34

How can we describe the interpretability in variable selection?

Retained variables have clear meaning and context

35

How can we describe the interpretability in PCAs?

PCs are not directly interpretable

36

How can we describe the data structure in variable selection?

The data structure remains the same with fewer variables

37

How can we describe the data structure in PCAs?

Data structure is altered with new PCs

38

What is an example of variable selection?

Selecting 10 relevant variables from 100

39

What is an example of PCAs?

Transforming 100 variables into only 10 PCs
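
A minimal sketch of that reduction on simulated data (the dimensions are illustrative):

```r
set.seed(1)
X <- matrix(rnorm(500 * 100), nrow = 500)   # 500 rows, 100 variables

pca <- prcomp(X, scale. = TRUE)
Z <- pca$x[, 1:10]   # keep only the first 10 principal components
dim(Z)               # 500 x 10
```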
