Machine Learning

20 Terms

1

Machine Learning Definition (Mitchell)

Machine Learning (per Tom Mitchell): a computer program is said to learn from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with experience E. The program is not explicitly programmed for each step.

2

Supervised Learning: Training Data

In Supervised Learning, the training data consists of labeled examples: a set of pairs {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)}. Each xᵢ is a feature vector (input), and each yᵢ is its corresponding label (output). The goal is to learn a function mapping features to labels.
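
A minimal sketch of how such a labeled training set is commonly represented in code, assuming a NumPy-style setup (the numbers and labels below are invented for illustration):

```python
import numpy as np

# Labeled training data: row i of X is the feature vector x_i,
# and y[i] is its corresponding label y_i.
X = np.array([
    [5.1, 3.5],   # x_1
    [4.9, 3.0],   # x_2
    [6.7, 3.1],   # x_3
])
y = np.array([0, 0, 1])  # y_1, y_2, y_3

# Supervised learning fits a function h so that h(x_i) ≈ y_i, ideally on unseen data too.
```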

3

Feature Vector Example

A feature vector is a structured representation of an input instance. Example for weather: x₁ = [sun, hot, high, weak]^T, where each element is a value for a feature (Outlook, Temperature, Humidity, Wind). This vector is the input for a classifier.
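
As a rough illustration (the extra attribute values and the numeric codes below are assumptions, not from the card), the weather instance could be encoded like this before being fed to a classifier:

```python
# The weather instance from the card, as a raw feature vector
# in attribute order (Outlook, Temperature, Humidity, Wind):
x1 = ["sun", "hot", "high", "weak"]

# Most learners need numbers, so categorical values are typically encoded,
# e.g. with an (assumed) value-to-integer mapping per attribute:
encoding = {
    "Outlook":     {"sun": 0, "overcast": 1, "rain": 2},
    "Temperature": {"cool": 0, "mild": 1, "hot": 2},
    "Humidity":    {"normal": 0, "high": 1},
    "Wind":        {"weak": 0, "strong": 1},
}
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
x1_encoded = [encoding[a][v] for a, v in zip(attributes, x1)]
print(x1_encoded)  # [0, 2, 1, 0]
```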

4

Unsupervised Learning

In Unsupervised Learning, the input data is unlabeled: {x₁, x₂, …, xₙ}, containing only feature vectors. The goal is to find inherent patterns, such as grouping similar data points (clustering), anomaly detection, or knowledge discovery, without predefined categories.
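
A minimal clustering sketch, assuming scikit-learn's KMeans as the grouping method (the data points are invented):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: feature vectors only, no y.
X = np.array([[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]])

# K-Means groups similar points into k clusters without any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0 0 1 1] — cluster indices, not class labels
```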

5

Example ML Problem: Handwriting Recognition

Task T: Recognize and classify handwritten words. Performance P: Percentage of words correctly classified. Experience E: A database of handwritten words with known classifications (labeled data). This is a supervised learning problem.

6

Example ML Problem: Self-Driving Car

Task T: Drive on a public four-lane highway using vision sensors. Performance P: Average distance traveled before an error (as judged by a human supervisor). Experience E: A sequence of images and corresponding steering commands recorded while observing a human driver. This is a supervised learning problem.

7

Decision Tree Learning

Decision Tree Learning approximates discrete-valued target functions; the learned hypothesis h: x → y is represented as a tree. Internal nodes test a feature/attribute, branches correspond to attribute values, and leaf nodes give the final classification decision. Any function in Disjunctive Normal Form (DNF) can be expressed as a decision tree.
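
A small sketch using scikit-learn's DecisionTreeClassifier on an invented DNF-style target, just to show the structure of internal tests, branches, and leaf decisions:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy target: (x0 AND x1) OR (NOT x0) — a DNF-style Boolean function.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [1, 1, 0, 1]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["x0", "x1"]))
# Internal nodes test a feature, branches are its values, leaves give the class.
```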

8

Building a Decision Tree: Key Principle

Build a decision tree using a divide-and-conquer, greedy strategy: Always test the most important attribute first (the one that provides the greatest information gain or reduction in entropy/impurity for classification). Then, recursively build subtrees for each resulting subset of data.

9

Information Gain

Information Gain measures the effectiveness of an attribute for classifying data. It is the reduction in entropy achieved by splitting the dataset on that attribute: Gain(S, A) = Entropy(S) − Σ_{v∈Values(A)} (|Sᵥ|/|S|) · Entropy(Sᵥ), where Sᵥ is the subset of S in which attribute A has value v and Entropy(S) = −Σᵢ pᵢ log₂ pᵢ over the class proportions pᵢ. The attribute with the highest gain is chosen for splitting.
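
A from-scratch sketch of these formulas (the toy attribute/label columns are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v),
    where `values` holds attribute A's value for each example in S."""
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    remainder = sum((len(sub) / n) * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder

# Toy check: a perfectly informative attribute vs. an uninformative one.
print(information_gain(["sun", "sun", "rain", "rain"], ["yes", "yes", "no", "no"]))      # 1.0
print(information_gain(["weak", "strong", "weak", "strong"], ["yes", "yes", "no", "no"]))  # 0.0
```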

10

K-Nearest Neighbors (KNN) Algorithm

K-Nearest Neighbors (KNN) predicts the label for a new instance x_new by: 1) Finding the k training examples closest to x_new (using a distance metric, e.g., Euclidean). 2) For classification: taking a majority vote among their labels. For k = 1, it simply uses the label of the single nearest neighbor.
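
A minimal from-scratch sketch of the algorithm, assuming Euclidean distance and a NumPy setup (the training points are invented):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training examples."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances to all training points
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # "A"
```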

11

KNN: Need for Normalization

KNN requires feature normalization (or standardization). Because it uses distance metrics, if features have different scales (e.g., age 0-100 vs. salary 0-100,000), the feature with the larger range will disproportionately dominate the distance calculation, skewing results. Normalization puts all features on a comparable scale.
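
A small numeric sketch of the scaling problem, using the age/salary example and an assumed min-max normalization onto [0, 1]:

```python
import numpy as np

# Two instances described by (age, salary): salary dominates the raw Euclidean distance.
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])
print(np.linalg.norm(a - b))  # ~1000.6 — driven almost entirely by the salary difference

# Min-max normalization onto [0, 1], assuming age ranges over [0, 100] and salary over [0, 100_000].
lo = np.array([0.0, 0.0])
hi = np.array([100.0, 100_000.0])
a_n, b_n = (a - lo) / (hi - lo), (b - lo) / (hi - lo)
print(np.linalg.norm(a_n - b_n))  # ~0.35 — age and salary now contribute comparably
```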

12

Generalization in ML

Generalization refers to a model's ability to perform well on unseen data (the test set), not just the training data. Desired properties: 1) Stability: Small changes in training data cause minimal changes in predictions. 2) Consistent performance: Performance metric P remains similar between training and testing. KNN with k > 1 is generally stable due to its voting mechanism.

13

Bagging (Bootstrap Aggregating)

Bagging reduces variance in unstable models (like deep decision trees). Process: 1) Create K bootstrap samples (datasets of size n) by sampling with replacement from the original training set. 2) Train a separate model (e.g., a decision tree) on each sample. 3) Aggregate predictions: average for regression, majority vote for classification. In the averaging case the final model is h_bag(x) = (1/K) Σᵢ hᵢ(x).
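
A minimal bagging sketch with decision trees and majority voting, assuming scikit-learn and an invented dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rng = np.random.default_rng(0)
K, n = 25, len(X)

models = []
for _ in range(K):
    idx = rng.integers(0, n, size=n)          # bootstrap sample: draw n rows with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregate by majority vote (classification); for regression you would average instead.
votes = np.stack([m.predict(X) for m in models])    # shape (K, n)
h_bag = (votes.mean(axis=0) >= 0.5).astype(int)     # majority vote over binary labels
print((h_bag == y).mean())                          # training accuracy of the ensemble
```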

14

Problem with Bagging Standard Decision Trees

A problem arises when bagging standard decision trees: If certain features have consistently high information gain, most trees will greedily select those same features at the root. This makes the trees highly correlated, limiting the variance reduction benefit of aggregation.

15

Random Forests: Core Idea

Random Forests improve upon bagging by decorrelating the trees. They introduce two sources of randomness: 1) Row (Data) Sampling: Bootstrap sampling (bagging). 2) Feature Sampling: At each split in a tree, only a random subset of m features (typically m ≈ √p, where p is the total features) is considered for splitting. This forces trees to differ.
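
A minimal sketch using scikit-learn's RandomForestClassifier, where bootstrap=True gives the row sampling and max_features="sqrt" gives the per-split feature sampling (the dataset is invented):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# Row sampling via bootstrap=True; feature sampling via max_features="sqrt" (m ≈ √p per split).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
print(forest.feature_importances_[:4])  # per-feature importance estimates
```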

16

Random Forests: Advantages

Advantages of Random Forests: 1) Increased robustness and accuracy by averaging many decorrelated trees. 2) Very good stability without needing heavy pruning. 3) Computational efficiency and ease of parallelization (trees are independent). 4) Provides estimates of feature importance.

17

Cross-Validation (CV) Process

K-Fold Cross-Validation process: 1) Randomly split the training data into K non-overlapping folds (subsets). 2) For i = 1 to K: Train the model on the other K-1 folds and validate its performance on the i-th fold. 3) Average the performance (e.g., accuracy) across the K validation folds to get a more reliable estimate of model performance.
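
A manual K-fold sketch mirroring these three steps, assuming scikit-learn's KFold and an invented dataset and model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=150, n_features=8, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)    # 1) split into K non-overlapping folds

scores = []
for train_idx, val_idx in kf.split(X):                  # 2) for each fold i...
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_idx], y[train_idx])               #    ...train on the other K-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))  #    ...validate on fold i
print(np.mean(scores))                                  # 3) average the K validation scores
```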

18

Purpose of Cross-Validation

The primary purpose of cross-validation is model evaluation and selection without touching the final test set. It provides a low-bias estimate of a model's generalization performance by using all data for both training and validation in a structured way. It is commonly used for hyperparameter tuning.

19

Leave-One-Out Cross-Validation (LOOCV)

Leave-One-Out Cross-Validation (LOOCV) is an extreme case where K = n (number of training examples). Each iteration uses n-1 examples for training and the single remaining example for validation. It is computationally expensive but provides an almost unbiased performance estimate, as it maximizes the training data used each time.
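
A short sketch, assuming scikit-learn's LeaveOneOut with a KNN classifier on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# LOOCV = K-fold CV with K = n: each example is the validation set exactly once.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print(len(scores), scores.mean())  # n individual 0/1 scores and their average
```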

20

Two Layers of Model Evaluation with CV

Proper ML practice involves two layers of evaluation: 1) Model Selection/Validation: Use cross-validation on the training set to compare different models or tune hyperparameters. 2) Final Evaluation: After selecting the best model, evaluate its performance once on the held-out test set to estimate real-world performance. The test set is used only once.
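
A sketch of the two layers, assuming scikit-learn's GridSearchCV for the cross-validation layer and a single final scoring on the held-out test set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Layer 1: cross-validation on the training set to compare hyperparameter settings.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)

# Layer 2: evaluate the selected model exactly once on the held-out test set.
print(search.best_params_, search.score(X_test, y_test))
```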