Applied Machine Learning - Ch6: Model Selection

24 Terms

1

What are the three key steps in the basic machine learning process?

1. Data Input: Collecting and feeding data to the system.

2. Abstraction: Extracting patterns and features from data.

3. Generalization: Applying learned knowledge to unseen data.

2

What is a model in machine learning?

A model is a structured representation of data patterns used to make predictions or decisions. It can be a mathematical equation, graph, tree, or computational block.

3

What are the main types of machine learning models?

Supervised Learning: Classification, Regression. Unsupervised Learning: Clustering, Association. Reinforcement Learning.

4

What is the difference between classification and regression models?

Classification: Predicts categorical outcomes. Regression: Predicts continuous values.

5

What are descriptive models used for?

They describe datasets and find patterns without a target variable, typically using clustering.

6

What methods are used to train machine learning models?

Holdout Method, K-Fold Cross-Validation, Bootstrap Sampling, Lazy vs. Eager Learning.

7

What is the Holdout Method?

It splits data into training and testing sets, usually 70-80% for training and 20-30% for testing. Stratified random sampling can handle imbalanced data.
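
A minimal holdout split sketch (assuming scikit-learn; the 80/20 ratio, the Iris data, and the stratify option are illustrative choices, not part of the card):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # 80% for training, 20% for testing; stratify keeps class proportions similar
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )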

8

What is K-Fold Cross-Validation?

The dataset is split into 'k' parts (folds). The model is trained on k-1 folds and tested on the remaining fold; this is repeated k times and the results are averaged. Popular variants: 10-Fold CV, Leave-One-Out CV (LOOCV).
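
A 10-fold cross-validation sketch (assuming scikit-learn; the decision tree and the Iris data are arbitrary examples):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Train on 9 folds, test on the held-out fold, repeat 10 times
    cv = KFold(n_splits=10, shuffle=True, random_state=42)
    scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)
    print(scores.mean(), scores.std())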

9

What is Bootstrap Sampling?

Random sampling with replacement to create training/test datasets. Useful for small datasets.
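
A small bootstrap sampling sketch (assuming scikit-learn and NumPy; the toy array is just for illustration):

    import numpy as np
    from sklearn.utils import resample

    X = np.arange(10)

    # Sample with replacement: some points repeat, others are left out entirely
    boot = resample(X, replace=True, n_samples=len(X), random_state=42)
    oob = np.setdiff1d(X, boot)  # left-out ("out-of-bag") points can serve as a test set
    print(boot, oob)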

10

What's the difference between lazy and eager learners?

Eager Learners: Build a model in advance during training (e.g., decision trees). Lazy Learners: Store the training data and generalize only at prediction time (e.g., k-Nearest Neighbors).
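
A quick illustration of the contrast (assuming scikit-learn; k-NN as the lazy learner and a decision tree as the eager learner are standard examples):

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier   # lazy: mostly memorizes the data
    from sklearn.tree import DecisionTreeClassifier      # eager: builds the model up front

    X, y = load_iris(return_X_y=True)

    lazy = KNeighborsClassifier().fit(X, y)     # real work happens at predict time
    eager = DecisionTreeClassifier().fit(X, y)  # tree is fully built before any prediction
    print(lazy.predict(X[:3]), eager.predict(X[:3]))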

11

What is underfitting in machine learning?

A model that is too simple and cannot capture data patterns well. It performs poorly on both training and test data.

12

What causes underfitting, and how can it be avoided?

Caused by insufficient data or oversimplified models. Avoided by using more data and effective feature selection.

13

What is overfitting in machine learning?

A model that fits the training data too well, including noise and outliers, leading to poor generalization to new data.

14

How can overfitting be prevented?

Using cross-validation, holding out validation data, or simplifying the model by removing unimportant components (e.g., pruning or regularization).
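
A sketch of spotting under- and overfitting by comparing training and validation scores (assuming scikit-learn; the dataset and tree depths are arbitrary):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

    for depth in (1, 3, None):  # very shallow, moderate, fully grown
        model = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
        # Low scores on both sets suggest underfitting;
        # a large train/validation gap suggests overfitting
        print(depth, model.score(X_train, y_train), model.score(X_val, y_val))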

15

What is a confusion matrix?

A table showing correct and incorrect predictions: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN).
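
A minimal confusion-matrix sketch (assuming scikit-learn; the labels are made up):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # For binary labels 0/1, rows are actual classes and columns are predicted classes:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))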

16

What are the key performance metrics for evaluating classification models?

Accuracy, Error Rate, Sensitivity (Recall), Specificity, Precision, F-measure, ROC Curve and AUC.

17

How is accuracy calculated?

Accuracy = (TP + TN) / (TP + TN + FP + FN)

18

What does the error rate represent?

Error Rate = 1 - Accuracy

19

What is sensitivity (recall)?

Proportion of actual positives correctly identified: Sensitivity = TP / (TP + FN)

20

What is specificity?

Proportion of actual negatives correctly identified: Specificity = TN / (TN + FP)

21

What is precision?

Proportion of predicted positives that are true positives: Precision = TP / (TP + FP)

22

What is the F-measure?

Harmonic mean of precision and recall: F1 = (2 × Precision × Recall) / (Precision + Recall)
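
A worked sketch computing the metrics above (accuracy through F-measure) from hypothetical confusion-matrix counts (the numbers are invented for illustration):

    TP, FP, TN, FN = 40, 10, 45, 5  # hypothetical counts

    accuracy = (TP + TN) / (TP + TN + FP + FN)                     # 0.85
    error_rate = 1 - accuracy                                      # 0.15
    sensitivity = TP / (TP + FN)                                   # recall, approx. 0.889
    specificity = TN / (TN + FP)                                   # approx. 0.818
    precision = TP / (TP + FP)                                     # 0.80
    f1 = 2 * precision * sensitivity / (precision + sensitivity)   # approx. 0.842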

23

What is an ROC curve?

A plot of True Positive Rate vs. False Positive Rate at different thresholds, showing classification performance.

24

What does the AUC represent?

Area Under the ROC Curve. Higher AUC indicates better model performance; AUC = 0.5 corresponds to random guessing, and AUC < 0.5 indicates worse-than-random predictions.
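
A short ROC/AUC sketch (assuming scikit-learn; the labels and scores are made up):

    from sklearn.metrics import roc_auc_score, roc_curve

    y_true = [0, 0, 1, 1, 0, 1, 1, 0]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]  # predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
    print(roc_auc_score(y_true, y_score))               # area under that curve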