What are the specific traits of forward selection?
1. less complex
2. better with large data sets
3. higher risk of missing the best subset
4. lower computational cost
Why is a model with too many predictors not useful?
1. Predictors can be correlated
2. Too many predictors lead to overfitting
What is the goal of subset selection?
To find a simple model that performs sufficiently well
(True/False) Predictive accuracy should be assessed only on held-out test data.
True
What are the traits of an exhaustive search?
1. All predictors are assessed
2. Computationally intensive for big sets of data
3. Guaranteed to find the best subset
(True/False) The best use of subsetting will balance model fit and complexity.
True
What is the adjusted R^2?
A version of R² (the coefficient of determination) that adjusts for the number of predictors (higher is better)
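The adjustment can be written out; with n observations and p predictors, a standard form is:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
```

The penalty term grows with p, so adding a predictor that barely improves R² can lower the adjusted R².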
What is the Akaike Information Criterion (AIC)?
Balances model fit and complexity (lower the better)
What is the Bayesian Information Criterion (BIC)?
Imposes a larger penalty for models with more predictors (strict) (lower the better)
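The standard definitions make the comparison concrete; with k estimated parameters, n observations, and maximized likelihood L̂:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Since ln n > 2 once n > 7, BIC penalizes each extra parameter more heavily than AIC, which is why it is the stricter criterion.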
What is forward selection?
Starts with no predictors, adds them one by one, and stops when the addition doesn't improve the performance
What is backward elimination?
Starts with all predictors, eliminates the least useful ones one by one, and stops when all remaining predictors are significant
What is stepwise selection?
Like forward selection, but at each step considers dropping non-significant predictors
What are the specific traits of backward elimination?
1. more complex
2. worse with large data sets
3. lower risk
4. higher computational cost
What are the specific traits of stepwise selection?
1. Intermediate complexity
2. better with large data sets
3. balances risk
4. moderate to high computational cost
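The forward-selection loop described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the implementation behind any particular R function: it scores each candidate predictor by adjusted R² (AIC or BIC would slot in the same way) and stops when no addition improves the score.

```python
import numpy as np

def adjusted_r2(y, y_hat, p):
    """Adjusted R^2 for a model with p predictors (plus intercept)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def fit_predict(X, y, cols):
    """OLS fit on the chosen columns (with intercept); return fitted values."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

def forward_select(X, y):
    """Add predictors one by one; stop when adjusted R^2 stops improving."""
    selected, best_score = [], -np.inf
    while len(selected) < X.shape[1]:
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        scores = {c: adjusted_r2(y, fit_predict(X, y, selected + [c]),
                                 len(selected) + 1)
                  for c in candidates}
        best_c = max(scores, key=scores.get)
        if scores[best_c] <= best_score:
            break  # no candidate improves the score: stop
        selected.append(best_c)
        best_score = scores[best_c]
    return selected
```

Backward elimination is the mirror image (start with all columns, drop the one whose removal improves the score most), and stepwise selection alternates the two moves.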
What does the R function regsubsets() work with?
Works with numerical (quantitative) variables; it comes from the leaps package
What does the R function stepAIC() work with?
Works with categorical outcome variables; it comes from the MASS package
When do you use PCAs?
When you want to reduce the number of features while retaining the most information
How is the total information in Principal Component Analysis measured?
By the sum of the variances of the variables; the PCs themselves are weighted averages (linear combinations) of the original variables
What do PCAs create?
Create new variables that are linear combinations of the original variables
In a PCA, are the linear combinations dependent or independent?
Independent of one another (the PCs are uncorrelated)
Can PCs be used for categorical variables?
No, only quantitative variables
How do PCs rank variance?
The first PC explains the most variance, the second explains the most of the remaining variance, and so on; each PC is uncorrelated with the others
What are loadings?
The weight that tells how much each original variable contributes to a PC
What is the magnitude of the loadings?
The variable's influence on the PC (the stronger the influence, the bigger the magnitude)
What does the sign of a loading show?
Whether a variable moves with the PC or against it
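The PC, variance-ranking, and loading definitions above can be sketched with numpy. This is an illustrative eigendecomposition of the covariance matrix, not any particular library's API:

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the covariance matrix.

    Returns (scores, loadings, explained_variance), with components
    sorted so the first PC explains the most variance.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # covariance of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort descending by variance
    loadings = eigvecs[:, order]            # columns = loadings of each PC
    scores = Xc @ loadings                  # PCs = linear combos of originals
    return scores, loadings, eigvals[order]
```

Each column of `loadings` holds the weights of one PC, so a large magnitude means strong influence and the sign shows whether the variable moves with or against that PC; the explained variances sum to the total variance of the original variables.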
Which of the following is true regarding the retained variables from subsetting?
Have a clearer meaning
Which of the following is true regarding the retained variables from PCAs?
Not directly interpretable
Should adjusted R² be high or low for a well-performing model?
High
How does variable selection differ from Principal Component Analysis (PCA)?
Variable selection identifies and retains the most relevant variables whereas PCAs reduce dimensionality while retaining most of the variance
What approach do we use for variable selection?
Select a subset of original variables based on criteria
What approach do we use for PCAs?
Transform original variables into new uncorrelated variables (PCs)
How can we describe the nature of variables in variable selection?
Original features are retained
How can we describe the nature of variables in PCAs?
New variables are created from linear combinations of original variables
How can we describe the interpretability in variable selection?
Retained variables have clear meaning and context
How can we describe the interpretability in PCAs?
PCs are not directly interpretable
How can we describe the data structure in variable selection?
The data structure remains the same with fewer variables
How can we describe the data structure in PCAs?
Data structure is altered with new PCs
What is an example of variable selection?
Selecting 10 relevant variables from 100
What is an example of PCAs?
Transforming 100 variables into only 10 PCs