What are the three key steps in the basic machine learning process?
1. Data Input: Collecting and feeding data to the system.
2. Abstraction: Extracting patterns and features from data.
3. Generalization: Applying learned knowledge to unseen data.
What is a model in machine learning?
A model is a structured representation of data patterns used to make predictions or decisions. It can be a mathematical equation, graph, tree, or computational block.
What are the main types of machine learning models?
Supervised Learning: Classification, Regression. Unsupervised Learning: Clustering, Association. Reinforcement Learning.
What is the difference between classification and regression models?
Classification: Predicts categorical outcomes. Regression: Predicts continuous values.
What are descriptive models used for?
They describe datasets and find patterns without a target variable, typically using clustering.
What methods are used to train machine learning models?
Holdout Method, K-Fold Cross-Validation, Bootstrap Sampling, Lazy vs. Eager Learning.
What is the Holdout Method?
It splits the data into a training set and a test set, usually 70-80% for training and 20-30% for testing. Stratified sampling keeps the class proportions similar in both sets, which helps with imbalanced data.
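A minimal sketch of a holdout split that keeps class proportions roughly equal in both sets (commonly called stratified sampling). The function name `stratified_holdout`, the 25% test fraction, and the toy labels are assumptions made for illustration, not part of the original cards.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: keep at least one test example per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced dataset: 16 negatives, 4 positives.
labels = ["neg"] * 16 + ["pos"] * 4
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.25)
```

With these toy labels the test set gets 4 negatives and 1 positive, so the 4:1 class ratio of the full dataset carries over to both splits.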
What is K-Fold Cross-Validation?
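The dataset is split into k parts (folds). The model is trained on k-1 folds and tested on the remaining one, and this is repeated k times so that each fold serves as the test set exactly once. Popular variants: 10-Fold CV and Leave-One-Out CV (LOOCV), where k equals the number of samples.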
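A minimal sketch of how the k train/test splits can be generated. The helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle the data first, which real use usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 5))   # 5-fold CV on 10 samples
loocv = list(k_fold_indices(4, 4))    # LOOCV is the special case k == n_samples
```

In practice the model is trained and scored once per pair, and the k scores are averaged.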
What is Bootstrap Sampling?
Random sampling with replacement to create training/test datasets. Useful for small datasets.
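A minimal sketch of one bootstrap draw. Using the samples that were never drawn (the "out-of-bag" samples) as the test set is a common convention and an assumption here; the card itself does not say how the test set is formed. The name `bootstrap_split` is hypothetical.

```python
import random

def bootstrap_split(n_samples, seed=0):
    """Draw n_samples indices with replacement; unseen indices form the test set."""
    rng = random.Random(seed)
    # Sampling with replacement: the same index can be drawn more than once.
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train_idx)
    # Assumption: out-of-bag samples are used for testing.
    test_idx = [i for i in range(n_samples) if i not in chosen]
    return train_idx, test_idx

train_idx, test_idx = bootstrap_split(10)
```

The training sample has the same size as the original dataset but usually contains duplicates, which is why some records are left over for testing.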
What's the difference between lazy and eager learners?
Eager Learners: Train a model in advance. Lazy Learners: Store training data and generalize at prediction time.
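A minimal sketch contrasting the two styles on one-dimensional data. The nearest-centroid classifier (eager) and the 1-nearest-neighbor classifier (lazy) are chosen here purely for illustration, and the class names and toy data are assumptions.

```python
class NearestCentroid:
    """Eager: all the work happens in fit(); predict() only uses the summary."""
    def fit(self, xs, ys):
        sums, counts = {}, {}
        for x, y in zip(xs, ys):
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        # The learned model is just one mean value per class.
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))

class OneNearestNeighbor:
    """Lazy: fit() only stores the data; the comparison happens in predict()."""
    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
        return self

    def predict(self, x):
        # Scan every stored example and return the label of the closest one.
        return min(self.data, key=lambda pair: abs(x - pair[0]))[1]

xs = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
ys = ["low", "low", "low", "high", "high", "high"]
eager = NearestCentroid().fit(xs, ys)
lazy = OneNearestNeighbor().fit(xs, ys)
```

The eager model could discard the training data after `fit`, while the lazy one must keep all of it and pays the cost at every prediction.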
What is underfitting in machine learning?
A model that is too simple and cannot capture data patterns well. It performs poorly on both training and test data.
What causes underfitting, and how can it be avoided?
Caused by insufficient training data or an oversimplified model. Avoided by using more training data and effective feature selection.
What is overfitting in machine learning?
A model that fits the training data too well, including noise and outliers, leading to poor generalization to new data.
How can overfitting be prevented?
Using cross-validation, holding out validation data, or removing unimportant model components (for example, pruning branches of a decision tree).
What is a confusion matrix?
A table showing correct and incorrect predictions: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN).
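A minimal sketch of how the four cells can be counted for a binary problem. The function name `confusion_counts`, the choice of `1` as the positive class, and the toy labels are assumptions for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classification task."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical labels: 4 actual positives, 6 actual negatives.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(actual, predicted)
```

For these labels the counts are TP = 3, FP = 1, TN = 5, FN = 1.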
What are the key performance metrics for evaluating classification models?
Accuracy, Error Rate, Sensitivity (Recall), Specificity, Precision, F-measure, ROC Curve and AUC.
How is accuracy calculated?
Accuracy = (TP + TN) / (TP + FP + TN + FN)
What does the error rate represent?
The proportion of predictions that are incorrect: Error Rate = (FP + FN) / (TP + FP + TN + FN) = 1 - Accuracy
What is sensitivity (recall)?
Proportion of actual positives correctly identified: Sensitivity = TP / (TP + FN)
What is specificity?
Proportion of actual negatives correctly identified: Specificity = TN / (TN + FP)
What is precision?
Proportion of predicted positives that are true positives: Precision = TP / (TP + FP)
What is the F-measure?
Harmonic mean of precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall)
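A minimal sketch that computes the metrics above from the four confusion-matrix counts. The function name `classification_metrics` and the example counts are assumptions, and the sketch does not guard against division by zero (for example, when there are no predicted positives).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)      # recall, true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: TP = 3, FP = 1, TN = 5, FN = 1.
metrics = classification_metrics(tp=3, fp=1, tn=5, fn=1)
```

With these counts accuracy is 8/10 = 0.8, sensitivity and precision are both 3/4 = 0.75, specificity is 5/6, and F1 is 0.75.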
What is an ROC curve?
A plot of True Positive Rate vs. False Positive Rate at different thresholds, showing classification performance.
What does the AUC represent?
Area Under the ROC Curve. Higher AUC indicates better model performance: 1.0 is a perfect classifier, 0.5 is no better than random guessing, and AUC < 0.5 means the model performs worse than random.
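A minimal sketch of how ROC points and the AUC can be computed from predicted scores. The function names `roc_points` and `auc_trapezoid` and the toy scores are assumptions, and the sketch assumes that all scores are distinct and that both classes are present.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) points, lowering the threshold one score at a time."""
    n_pos = sum(1 for a in actual if a == 1)
    n_neg = len(actual) - n_pos
    # Highest score first: each step classifies one more sample as positive.
    ranked = sorted(zip(scores, actual), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, a in ranked:
        if a == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels and scores (higher score = more likely positive).
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
auc = auc_trapezoid(points)
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9, about 0.89.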