Machine Learning
Subset of AI that deals with learning agents
Doesn’t require us to program its behavior directly
Based on past experiences or data to arrive at an output
Deep learning
Part of ML involving artificial neural networks
Called “deep” when the network has more than 3 layers
How can you say an agent is learning?
An agent is learning if it improves its performance after making observations about the world
Machine Learning when an agent is a computer
Observes data
Builds a model based on that data
Model: hypothesis about the world and software that can solve problems
Types of Learning
Supervised learning
Learn a function from labeled data
EX. There’s an answer key
Unsupervised Learning
Learn patterns from unlabeled data
Output is not labeled but you can still make sense of it
Reinforcement Learning
Learn best actions from experience of rewards and punishments
Learning by itself
Supervised Learning
Labeled data
Input-output pairs where label is the output
Agent is taught by examples of labeled data

What does Supervised Learning do with labeled data?
Observes the labeled data and learns a function or builds a model based on that data
Uses the function or model to process input data and give an output

Types of Supervised Learning
Classification
Regression
Classification
Output:
Finite set of values called classes or labels
EX. true/false, sunny/rainy/cloudy
Agent learns from observed values to determine which label new observations belong to
Regression
Output:
Number
EX. temperature, which can be an integer or a real number
Agent estimates and understands the relationship among variables
Useful for prediction and forecasting
Supervised Learning Algorithms
Nearest neighbors
Decision trees
Neural networks
Support vector machines
Linear regression
Unsupervised Learning
Agents learn patterns from input without feedback (unlabeled data)
Example:
Input: Images of animals
Output: Groups of similar images

Types of Unsupervised Learning
Clustering
Association Rule Mining
Clustering
Input
Unlabeled dataset
Output
Sets of similar data (based on defined criteria)
Useful for discovering segments in data and applying different business strategies for each segment
Association Rule Mining
Output
Correlations and associations
EX. Which items shoppers tend to purchase together (frequently bought together or market basket analysis)
Unsupervised Learning Algorithms
K Means Clustering
Hierarchical Clustering
Gaussian Mixture Models
Apriori Algorithm (Association rule mining)
Reinforcement Learning
Agent learns from rewards and punishments
Decides on actions towards more rewards
Agent needs to balance exploration and exploitation
Exploration VS Exploitation
Exploitation: stay with what has given most reward
Exploration: try other options to get additional information
EX:
Gambling agent that:
Chooses a slot machine that gave the most returns (reward)
Avoids slot machines that have not (punishment)
Reinforcement Learning Algorithms
Q-Learning
State-Action-Reward-State-Action (SARSA)
Deep Q Network
Input of Classification
Labeled Dataset
Instances with labels
Instances = examples
Classification: What would an instance need to have?
A set of features/attributes
A label
Instances = ______
Features = ______
Labels = ______
Rows, Columns, Last Column (usually)
What is the goal of Classification?
Derive a function (also called a model) based on a dataset
Predict the label of an instance with unknown label
Steps to Training and Testing a Machine Learning Model (Supervised Learning)
Model Training
Model Testing
Steps to Training and Testing a Machine Learning Model (Supervised Learning): Model Training
Start with labeled dataset
Features X is the input
Labels Y is the target used by the model to make predictions
Model learns from labeled data
Goal: learn the relationship between features and labels, so it can later make accurate predictions
Steps to Training and Testing a Machine Learning Model (Supervised Learning): Model Testing
Use test features (data that model never saw) to evaluate model
Model uses the test features to make predicted labels (output classifications made by the model)
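A minimal sketch of the two steps above, assuming scikit-learn is available; the toy dataset, the 1/3 test split, and the choice of k = 3 are illustrative assumptions, not from the notes.

```python
# Minimal train/test sketch for supervised learning (assumes scikit-learn).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy labeled dataset: features X (input) and labels y (target).
X = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5], [5.0, 3.4], [6.4, 3.2]]
y = ["setosa", "setosa", "virginica", "virginica", "setosa", "virginica"]

# Model training: learn the relationship between features and labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Model testing: predict labels for features the model never saw, then evaluate.
predicted = model.predict(X_test)
print(accuracy_score(y_test, predicted))
```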
Classification Models: Nearest Neighbors or K Nearest Neighbors
Instances as labeled datapoints in a graph
Features = “coordinates”
For an unlabeled instance
Get the K nearest points
Get the label that represents most of these points
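A from-scratch sketch of the procedure above; Euclidean distance, k = 3, and the sample points are assumptions for illustration.

```python
# k-nearest-neighbors prediction for one unlabeled instance (Euclidean distance assumed).
from collections import Counter
import math

def knn_predict(labeled_points, new_point, k=3):
    # labeled_points: list of (features, label) pairs; features act as "coordinates".
    nearest = sorted(labeled_points, key=lambda item: math.dist(item[0], new_point))[:k]
    # Predict the label held by most of the k nearest points.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

data = [((1, 1), "blue"), ((1, 2), "blue"), ((5, 5), "red"), ((6, 5), "red"), ((2, 1), "blue")]
print(knn_predict(data, (1.5, 1.5)))  # -> "blue"
```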

Classification Models: Decision Trees
A sequence of tests (decisions) induced from dataset
Each test is based on 1 feature
Eventually leads to a predicted label
Goal: A tree that consistently leads to the correct labels
Split first on the feature that best distinguishes examples by their labels

What’s the problem with a decision tree?
Overfitting
Fits well with training dataset, but does not do well with new instances
Solution: Random Forest
Classification Models: Random Forest
Predict labels based on multiple decision trees
Each decision tree is from a random sample of the main dataset
“ensemble method”
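A minimal random-forest sketch, assuming scikit-learn; the toy data and the 100-tree setting are illustrative.

```python
# Ensemble of decision trees, each trained on a random (bootstrap) sample of the dataset.
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [2, 1]]
y = ["no", "yes", "no", "no", "yes", "yes"]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# The forest combines the trees' votes into one predicted label.
print(forest.predict([[1.5, 1.5]]))
```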

Classification Models: Support Vector Machines (SVM)
Instances as datapoints, and features as dimensions of the space; a hyperplane separates the labels
Goal: Linearly divide the labeled datapoints in the dataset
Make new dimensions if cannot separate
“Support Vectors”: points closest to boundary
Good in practice; popular in the early 2000s

Classification Models: Artificial Neural Networks (ANN)
ANN: layers of neurons connected to each other
Input layer: takes in input signals (like features)
Output layer: provides the output (like labels)
Hidden layers to facilitate computations
Each layer influences the neuron activations of succeeding layers
Most common method in the past few years!
Deep learning = multiple hidden layers
Uses back propagation to learn weights and thresholds
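A minimal sketch of a small feed-forward network, assuming scikit-learn's MLPClassifier; the two hidden layers of 8 neurons and the toy data are arbitrary illustrative choices.

```python
# Small neural network classifier: input layer -> hidden layers -> output layer.
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # toy labels

# Weights and biases are learned with backpropagation during fit().
net = MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict([[1, 0]]))
```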
In an ANN, a “neuron” is activated based on what?
Input signals
Weights
Thresholds
Activation function
Among the classification models, which one is the most recently popular?
Artificial Neural Networks (ANN)
Because of deep learning (multiple layers)
How do we evaluate a classification model
Split the dataset:
Training set: used to train the model
Test set: used to evaluate the model
Model Evaluation of a Classification Model: Computing for Accuracy
Accuracy = # of correct predictions / # of total predictions
Model Evaluation of a Classification Model: Confusion Matrix
Shows actual results against predicted results for each class (i.e. the possible values of the label)
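A small worked example of accuracy and a confusion matrix computed by hand; the actual and predicted label lists are hypothetical.

```python
# Accuracy = correct predictions / total predictions; confusion matrix = counts per
# (actual class, predicted class) pair.
from collections import Counter

actual    = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 4 correct out of 6 -> 0.666...

confusion = Counter(zip(actual, predicted))
for (a, p), count in sorted(confusion.items()):
    print(f"actual={a:>3}  predicted={p:>3}  count={count}")
```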


Do the Model Evaluation for this example
Accuracy:
Number of test instances: 12
Number of correct predictions: 9
9/12 = 0.75 or 75%

K Nearest Neighbors (KNN) Goal
Goal: Given a new unlabeled instance, predict its label based on nearest neighbors
KNN For an unlabeled instance
Get the k nearest points
What is the basis of what’s considered “nearest”?
What do we do with non-numeric values?
Get the label that represents most of these points
What do we do with ties?
Conclude that the instance belongs to the representative label
Distance Metrics to Choose from for KNN
Euclidean distance
Manhattan distance
Hamming Distance
For binary/categorical data
Data Transformation Options for KNN
Non-numeric values
Scale issues
Ways to Determine Majority Vote KNN
Dealing with Ties and Appropriate k
Euclidean Distance
Assumption: different dimensions are comparable
For a 2D plane: √((x₂ − x₁)² + (y₂ − y₁)²), where (x₁, y₁) and (x₂, y₂) are the two points
For multiple features: √((Δx₁)² + (Δx₂)² + … + (Δxₘ)²), where each Δxⱼ is the difference in feature (column) j
Manhattan Distance
Best for datasets where additive differences of features are more appropriate
Add absolute values of column differences
Formula: |Δx₁| + |Δx₂| + … + |Δxₘ|
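A sketch of both metrics over plain Python lists of numeric features (no library beyond the standard math module).

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared per-feature differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute per-feature differences.
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```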
Other Metrics for Distance
Minkowski Distance
Cosine Distance
Minkowski Distance
Generalization based on value p
Includes Manhattan distance (p = 1) and Euclidean distance (p = 2)

Cosine Distance
1 – cosine similarity
Inspects the angle between vectors

What’s the problem that can arise with these metrics?
Scaling
Categorical Features
Problem: Scaling
Some metrics work when values are of the same scale
Features may represent the same kind of information, but the scales of their values differ
Features with much larger values tend to overshadow features with smaller values
Solution: normalize data
Problem: Categorical Features
Measuring distance between non-numerical features
Patrons: Some, None, Full
Type: French, Italian, Thai, Burger
Possible Solutions:
Convert to numbers
Count attribute matches
Hamming Distance
Used for categorical features
Counts number of mismatches among features
Closest point is when all features match
Works for KNN since the smallest value still identifies the closest instance
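A sketch of Hamming distance over tuples of categorical features; the Patrons/Type values follow the example above.

```python
# Hamming distance: count the features whose values do not match.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

print(hamming(("Some", "French"), ("Some", "Thai")))    # 1 mismatch
print(hamming(("Full", "Burger"), ("Full", "Burger")))  # 0 -> all features match (closest)
```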

How do we know which Distance metric to use?
Depends on dataset (usually default euclidean)
Important to consider scale and categorical data
Crucial to transform data before choosing and applying a metric
Try to reduce the variance
Why transform data?
“Format” of data incompatible with distance metric
Can’t apply same distance metric if inconsistent format among features
Inconsistent scaling can skew results to favor certain features
Data Transformation Types
Categorical to numerical
Numerical to categorical
Convert to levels
Bins
Consistent scaling: normalization
Data Transformation: Categorical to Numerical
Assign a number to each value type

Data Transformation: Numerical to categorical
Usually do this if using Hamming distance
No need to convert numbers if there are only a few values
If there are many possible values (or even infinite), we can divide values and assign them to bins
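A sketch of both directions of transformation described above; the Patrons mapping and the temperature bin edges are illustrative assumptions.

```python
# Categorical -> numerical: assign a number to each value type.
patrons_map = {"None": 0, "Some": 1, "Full": 2}
print(patrons_map["Some"])  # 1

# Numerical -> categorical: assign values to bins (useful before Hamming distance).
def temperature_bin(value):
    if value < 15:
        return "cold"
    elif value < 30:
        return "warm"
    return "hot"

print(temperature_bin(22))  # "warm"
```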

Data Transformation: Consistent scaling: normalization
Definition: scale down dataset so that all values fall between 0 and 1
Reduces bias on data
Standard preprocessing step in machine learning
Improves generalization, enabling better predictions on new data
Types of Scaling under Normalization
Min-Max Scaling
Standard Scaling
Min-Max Scaling Formula
Formula: x_scaled = (x − min) / (max − min)
x: value to be scaled
min: minimum value of the feature
max: maximum value of the feature
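A sketch of min-max scaling applied to one feature column; the age values are hypothetical.

```python
# Scale a feature so all values fall between 0 and 1.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

ages = [18, 22, 30, 60]
print(min_max_scale(ages))  # [0.0, 0.095..., 0.285..., 1.0]
```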

KNN: Majority Vote
After getting nearest points, label of majority is predicted label of new instance
EX. 2 Blue, 1 Red
Majority Label: Blue
Why do we usually choose an odd value for k in KNN?
To reduce the chances of a tie when determining the majority vote among nearest neighbors
True or False: Ties still occur in KNN even if k is odd?
True — Ties can still happen if two or more points are equally distant from the new instance, increasing the number of nearest points considered.
Give an example of when a tie might still occur in KNN even with an odd k.
If k = 3 and two points share the same distance for the 3rd nearest neighbor (e.g., both 1.50 units away), you’d effectively have 4 nearest points — causing a tie.
What can we do if ties cannot be completely avoided in KNN?
Apply tie-breaking methods
Random label choice
Consulting a domain expert
Comparing the next nearest instance
Weighting closer instances more heavily
How do we know which k to use?
Train multiple kNN models with different values of k
Apply evaluation methods (e.g. accuracy) on each model and pick the one that performs best
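A sketch of picking k by test accuracy, assuming scikit-learn; the candidate k values and the toy data are illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X = [[1, 1], [1, 2], [2, 1], [2, 2], [5, 5], [6, 5], [5, 6], [6, 6]]
y = ["blue", "blue", "blue", "blue", "red", "red", "red", "red"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train one kNN model per candidate k and keep the best-performing one.
best_k, best_acc = None, -1.0
for k in (1, 3, 5):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)
```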
What factors affect how long KNN takes to predict a label for one instance?
Each prediction requires multiple distance computations
The number of instances in the training set
Each distance computation requires multiple operations
Proportional to the number of features
In KNN, what do the “operations” in distance computation depend on?
The number of features — each feature requires operations like squaring, adding, etc.

kNN’s time complexity for one new instance
O(mn)
m: number of features
n: number of instances (training set)
Questions to Ask/Answer when making a decision tree
How do we branch and determine terminal node?
How do we choose which features to use for branching?
GINI Index: How do we choose which feature to use?
For each feature, compute Gini index for each of its categories
Compute for the weighted average of the feature’s Gini indices
Select the feature with smallest (weighted average) Gini index
GINI Index Steps
Identify target label (outcome you’re trying to predict)
List all features (attributes)
Compute GINI for each category of a feature
Compute Weighted Average GINI for the feature
Repeat for other features
Choose feature with lowest Weighted Gini
Repeat process for each branch
What to do with new groups?
Leaf (terminal node)
All instances in the group have the same label
E.g. all yes or all no
Branch
Instances have mixed labels
e.g. mix of yes and no
Considered as a new group to split
GINI Index Formula
GINI(t) = 1 − (# of class j₁ in t / total in t)² − (# of class j₂ in t / total in t)², i.e. 1 − Σⱼ p(j | t)²
t : node (e.g. the category like none/some/full for Patrons)
j : class (e.g. the label like Yes/No for Will Wait)
p ( j | t ) : relative frequency of the class in the group

How do we compute for the weighted average of the feature’s Gini indices?
multiply GINI(category) by:
total # in category / total # in all categories
Then sum all values to get the weighted average
Aka GINI split
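A sketch of the Gini computations above; the Patrons categories and their Yes/No labels are illustrative.

```python
from collections import Counter

def gini(labels):
    # GINI(t) = 1 - sum over classes j of p(j | t)^2
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

def weighted_gini(groups):
    # groups: each category of the feature mapped to the labels of its instances.
    total = sum(len(labels) for labels in groups.values())
    return sum(len(labels) / total * gini(labels) for labels in groups.values())

patrons = {
    "None": ["No", "No"],
    "Some": ["Yes", "Yes", "Yes", "Yes"],
    "Full": ["Yes", "No", "No", "No", "No", "Yes"],
}
print(gini(patrons["Full"]))   # 1 - (2/6)^2 - (4/6)^2 ≈ 0.444
print(weighted_gini(patrons))  # pick the feature with the smallest weighted Gini
```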

In a decision tree, the number of branches equals what?
number of categories
Clustering
Type of unsupervised learning (unlabeled data)
Learn patterns and derive groups (clusters) of similar instances
Each unlabeled instance is assigned to a cluster
Examples of clustering methods
K-Means clustering
Hierarchical clustering
Gaussian mixture models
Spectral clustering
How do we evaluate clustering results?
Inertia (elbow method)
Silhouette score
K-Means Clustering
Intuition: instances in the same cluster should be close to each other
Specify k: target number of clusters
Select k random instances as centroids of their clusters
Repeat
Assign each instance to the cluster of the closest centroid
For each cluster, get the mean for each feature and set the new centroid
Stop when cluster memberships stop changing
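A from-scratch sketch of the loop above; the 2-D toy points and the fixed random seed are illustrative.

```python
import math
import random

def k_means(points, k, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)  # k random instances as initial centroids
    while True:
        # Assign each instance to the cluster of the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = per-feature mean of each cluster's instances.
        new_centroids = [
            tuple(sum(axis) / len(axis) for axis in zip(*cluster)) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # memberships (and centroids) stopped changing
            return clusters, centroids
        centroids = new_centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, centroids = k_means(points, k=2)
print(centroids)
```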
Hierarchical Clustering
Dendrogram
Instances = terminal nodes
Branches connect nodes/subgroups at different levels
Clusters can be derived from a dendrogram
Dendrogram
Tree diagram depicting closeness through its branches
Branches show which instances/groups are close to each other
Derive clusters from generating a dendrogram
2 ways to generate a dendrogram
Agglomerative clustering: bottom up
Start with treating each instance as one cluster
Repeatedly merge 2 closest clusters
Divisive clustering: top down
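A sketch of agglomerative (bottom-up) clustering, assuming SciPy is available; cutting the dendrogram into two clusters is an illustrative choice.

```python
from scipy.cluster.hierarchy import linkage, fcluster

points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]

# linkage repeatedly merges the two closest clusters; the merge history is what a
# dendrogram draws.
merges = linkage(points, method="single")

# Derive clusters by cutting the dendrogram into (at most) 2 groups.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]
```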
What can other Clustering Methods handle that K-Means cannot?
non-spherical boundaries
Other Clustering Methods
Gaussian Mixture Models
Spectral Clustering
Gaussian Mixture Models
Soft assignment: each instance gets a probability of belonging to each cluster
Models the dataset as a mixture of Gaussian distributions learned from the data, one per cluster
Covariances, not just means, determine the cluster shapes
Spectral Clustering
Groups points based on how connected they are rather than distance to a centroid
Uses pairwise affinity (similarity) and the degree of connection between points
Graph-based machine learning technique that uses the eigenvectors of a graph's Laplacian matrix to find clusters within data, especially for non-convex shapes
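Minimal sketches of both methods, assuming scikit-learn; the toy points and the two-cluster setting are illustrative.

```python
from sklearn.mixture import GaussianMixture
from sklearn.cluster import SpectralClustering

X = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]

# Gaussian mixture: soft assignment, i.e. a probability of belonging to each component.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba([[1.5, 1.5]]))

# Spectral clustering: groups points by connectivity in an affinity graph.
spectral = SpectralClustering(n_clusters=2, affinity="rbf", random_state=0)
print(spectral.fit_predict(X))
```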
Clustering Evaluation: How do we know if a clustering result is good?
Base it on
Homogeneity
Heterogeneity
Homogeneity
How similar the instances are within a cluster
More homogeneous = more similar
Points in a cluster should ideally be similar to each other
Heterogeneity
How different the instances are across different clusters
More heterogeneous = more different
The further points are from one cluster to another, the better
How do we ensure best quality of clustering?
Try clustering methods with different k values and get the clustering with the best quality
Inertia
Measures homogeneity (within cluster sum of squares)
Elbow method to estimate best k
within-cluster sum of squares of distances
You want to keep inertia low
Silhouette score
Measures and weighs both homogeneity and heterogeneity
Compute a score for every instance, and then get the average
Homogeneity score
Heterogeneity score
Combining the homogeneity and heterogeneity scores
Steps to Compute Inertia
For each cluster, determine the cluster centroid
For each instance, square its distance from its cluster centroid
Sum all these squares
Inertia Formula
Inertia = Σ over clusters C of Σ over instances i in C of d(i, μ(C))²
C: cluster
i: instance
d(i, μ(C)): distance between i and cluster C’s centroid
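A sketch of the inertia computation following the steps and formula above; the two clusters of points are hypothetical.

```python
import math

def inertia(clusters):
    # clusters: list of clusters, each a list of points; centroid = per-feature mean.
    total = 0.0
    for cluster in clusters:
        centroid = tuple(sum(axis) / len(axis) for axis in zip(*cluster))
        total += sum(math.dist(p, centroid) ** 2 for p in cluster)
    return total

clusters = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
print(inertia(clusters))  # lower inertia -> more homogeneous clusters
```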

Inertia: Elbow Method
Used when picking between multiple k-means clusterings (different values of k)
Provides a balance between low k and low inertia
Process involves calculating a metric called inertia (AKA Within-Cluster Sum of Squares, or WCSS) and plotting it.
Inflection point
Tangent line forms an angle closest to 45 degrees
Determines the best k
In practice, we often just estimate from the shape of the graph
Inertia: Elbow Method Steps
Perform K-Means with different k values
Get the inertia score for each final clustering
Plot the k values and inertias
x-axis: k values
y-axis: inertia
Silhouette Score: a(i): homogeneity score
Get the average distance between the instance and other instances within the same cluster
The smaller, the better

Silhouette Score: b(i) heterogeneity score
Get the average distance between an instance and instances from its nearest neighboring cluster
The larger, the better
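A sketch of the per-instance silhouette value using the standard combination s(i) = (b(i) - a(i)) / max(a(i), b(i)); the clusters and the instance are hypothetical.

```python
import math

def silhouette_of_instance(i, own_cluster, nearest_other_cluster):
    # a(i): average distance to the other instances in i's own cluster (smaller is better).
    a = sum(math.dist(i, p) for p in own_cluster if p != i) / (len(own_cluster) - 1)
    # b(i): average distance to instances in the nearest other cluster (larger is better).
    b = sum(math.dist(i, p) for p in nearest_other_cluster) / len(nearest_other_cluster)
    return (b - a) / max(a, b)

own = [(1, 1), (1, 2), (2, 1)]
other = [(8, 8), (9, 8), (8, 9)]
print(silhouette_of_instance((1, 1), own, other))  # close to 1 -> well clustered
```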
