Data Science - Model Selection Part 2

12 Terms

1

Advantages of Decision Trees (3)

  • Highly Interpretable

  • Can work with both categorical and numeric data

  • No feature scaling necessary

2

Disadvantages of Decision Trees (2)

  • Prone to overfitting

  • Sensitive to small variations in the training data

3

Gini Impurity Index (What is it and value meanings)

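Shows how likely an observation in a node would be misclassified if it were labeled at random according to the node's class distribution

  • 0 = Completely pure = All elements are in the same class

  • 0.5 = Maximally impure for two classes = Elements are split evenly between the classes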
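
A minimal sketch of the calculation, assuming the standard definition of Gini impurity as one minus the sum of squared class proportions. The function name `gini_impurity` and the sample labels are illustrative, not from the card.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k^2), where p_k is the share of class k in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty node (an assumption)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical nodes: one with a single class, one split evenly in two.
pure_node = gini_impurity(["spam", "spam", "spam", "spam"])
mixed_node = gini_impurity(["spam", "spam", "ham", "ham"])
print(pure_node, mixed_node)
```

The single-class node scores 0.0 and the evenly split two-class node scores 0.5, which is the largest value possible with two classes.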

4

Accuracy

How often is the model correct?

5

Precision

Of the instances predicted as positive, how many are actually positive?

6

Recall

Out of all of the actual positives, how many did the model correctly identify?
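
Accuracy, precision, and recall can all be read off the same four counts (true/false positives and negatives). The sketch below assumes binary labels with `1` as the positive class; the function name and the toy predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)                  # correct / all predictions
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # correct / predicted positive
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # correct / actual positive
    return accuracy, precision, recall


# Hypothetical labels: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Here the model is right on 6 of 8 rows (accuracy 0.75), 2 of its 3 positive predictions are correct (precision 2/3), and it finds 2 of the 3 actual positives (recall 2/3). Returning 0.0 when a denominator is zero is a choice made for this sketch.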

7

Pro and con of Random Forests

Pro: Reduces overfitting compared with a single decision tree

Con: Gets slower with larger datasets

8

Bagging vs Boosting

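Bagging: Trains models independently on bootstrap samples and combines (averages or votes on) the predictions from all models

Boosting: Trains models in sequence; each model tries to fix the errors of the previous models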
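
The contrast can be sketched in plain Python. Everything here is an assumption for illustration: the base learner is a one-split regression stump on 1-D data, bagging averages stumps fit on bootstrap resamples, and boosting is shown in its residual-fitting form (each stump is fit to what the ensemble still gets wrong), which is only one of several boosting variants.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data; return a predict function."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # no valid split (all x identical): predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_models=25, seed=0):
    """Independent models on bootstrap samples; predictions are averaged."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def boosting(xs, ys, n_models=25, lr=0.5):
    """Sequential models; each one is fit to the current residual errors."""
    models = []
    residuals = list(ys)
    for _ in range(n_models):
        model = fit_stump(xs, residuals)
        models.append(model)
        residuals = [r - lr * model(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(lr * m(x) for m in models)


# Hypothetical data: a step from 0 to 1 at x = 5.
xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [1.0] * 5
bag_predict = bagging(xs, ys)
boost_predict = boosting(xs, ys)
print(bag_predict(0.0), bag_predict(9.0), boost_predict(0.0), boost_predict(9.0))
```

In `bagging` the loop iterations never look at each other, so the models could be trained in parallel. In `boosting` each iteration depends on the residuals left by the earlier ones, so training is inherently sequential.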

9

Con of K Means

Influenced by outliers
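
A small sketch of why this happens, assuming 1-D data: a K-Means centroid is the mean of its cluster, and a mean moves a long way when one extreme point is added. The numbers are made up for illustration.

```python
def centroid(points):
    """K-Means represents each cluster by the mean of its points."""
    return sum(points) / len(points)


cluster = [1.0, 2.0, 3.0]
cluster_with_outlier = cluster + [100.0]  # one extreme value joins the cluster

center_before = centroid(cluster)
center_after = centroid(cluster_with_outlier)
print(center_before, center_after)
```

The centroid moves from 2.0 to 26.5, far from the three points that make up most of the cluster.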

10

Ways to pick K

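Elbow method: Plot the clustering error (e.g. within-cluster sum of squared distances) against K and look for a “bend” where adding more clusters stops reducing the error much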

Silhouette Score: Ranges from -1 to 1, with higher being better
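
Both criteria can be computed by hand on a toy dataset. The sketch below is an illustration under assumptions, not a reference implementation: it uses a plain Lloyd's-algorithm K-Means on 1-D points with random initialization, takes "error" to mean within-cluster sum of squares (inertia), and defines the silhouette of a point as (b - a) / max(a, b), where a is its mean distance to its own cluster and b its mean distance to the nearest other cluster. The data has no duplicate points, so max(a, b) is never zero.

```python
import random


def kmeans_1d(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data

    def nearest(p):
        return min(range(k), key=lambda j: (p - centers[j]) ** 2)

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = sum(members) / len(members)
    labels = [nearest(p) for p in points]
    inertia = sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette_score(points, labels):
    """Mean silhouette coefficient, using absolute distance on 1-D data."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [abs(p - q) for j, (q, m) in enumerate(zip(points, labels))
               if m == lab and j != i]
        if not own:  # singleton cluster: score taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(abs(p - q) for q, m in zip(points, labels) if m == c)
            / labels.count(c)
            for c in clusters if c != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data with three visible groups.
points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
for k in range(1, 5):
    labels, inertia = kmeans_1d(points, k)
    has_two = len(set(labels)) > 1
    sil = silhouette_score(points, labels) if has_two else float("nan")
    print(k, round(inertia, 3), round(sil, 3))
```

Reading the printed table, the elbow is the K after which inertia stops dropping sharply, and the preferred K by silhouette is the one with the highest score. The silhouette is undefined for K = 1, so it is reported as `nan` there.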

11

Covariance in GMM

Determines the shape of the clusters (their spread and orientation)
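
To make the link between covariance and shape concrete, here is a sketch that computes the 2x2 sample covariance matrix of two made-up point clouds. The helper name `covariance_2d` and the data are assumptions. The diagonal entries give the spread along each axis, and the off-diagonal entry shows whether the cloud is tilted.

```python
def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


# Hypothetical clusters: one stretched along the x-axis, one tilted diagonally.
stretched = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0)]
tilted = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]

cov_stretched = covariance_2d(stretched)
cov_tilted = covariance_2d(tilted)
print(cov_stretched)
print(cov_tilted)
```

The stretched cluster has a much larger x-variance than y-variance and an off-diagonal entry near zero, while the tilted cluster has a clearly positive off-diagonal entry. A GMM component with such a covariance matrix would model an ellipse with the matching elongation and tilt.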

12

One way to estimate params in GMM

Maximum Likelihood Estimation (MLE)

  • Estimates the params that maximize the probability (likelihood) of the observed data
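
The card does not say how the maximization is carried out. A common approach for mixture models is the Expectation-Maximization (EM) algorithm, so the sketch below assumes EM, a two-component mixture, and 1-D data (so each component has a variance rather than a full covariance matrix). The initialization, the variance floor, and all names are choices made for this illustration.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, n_iter=50, var_floor=1e-6):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances, log_likelihood_history).
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # simple initialization (an assumption)
    variances = [overall_var, overall_var]
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_lik = 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            log_lik += math.log(total)
            resp.append([d / total for d in dens])
        history.append(log_lik)
        # M-step: weighted maximum-likelihood updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_floor, spread)  # guard against collapse
    return weights, means, variances, history


# Hypothetical data with two groups, around 1 and around 5.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
weights, means, variances, history = fit_gmm_1d(data)
print(weights, means, variances)
```

Each pass recomputes how strongly every point belongs to each component, then re-estimates the weights, means, and variances from those soft assignments. The recorded log-likelihood should not decrease from one pass to the next, which is what "maximizing the probability of the data" looks like in practice.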