Week 6 - Unsupervised learning

24 Terms

1

What are 2 types of clustering?

- K-Means Clustering
- Hierarchical Clustering

2

What is K-Means Clustering?

An algorithm that organises data points into K groups (clusters)

3

What is within-cluster variance V(Ck)?

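For a cluster Ck, a measure of how spread out its points are: the average distance (typically the squared distance) of the points in Ck from the cluster centre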

4

What does K-Means Clustering aim to do?

Minimise the sum of the within-cluster variances V(Ck) over all K clusters, so that the points in each cluster are as close together as possible, i.e. the points within a cluster are more strongly related
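
To make the objective concrete, here is a minimal sketch that compares two ways of splitting the same four points. It assumes V(Ck) is the average squared Euclidean distance from each point to the cluster centre (one common definition; a course may define it slightly differently), and the function names are illustrative only.

```python
def centroid(points):
    """Mean of a list of equal-length points (tuples of floats)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def within_cluster_variance(points):
    """Assumed definition: average squared Euclidean distance to the cluster centre."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points) / len(points)

def total_within_cluster_variance(clusters):
    """The quantity K-Means tries to minimise: the sum of V(Ck) over all clusters."""
    return sum(within_cluster_variance(c) for c in clusters)

# The same four points, grouped in two different ways.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(total_within_cluster_variance(tight))  # 2.0
print(total_within_cluster_variance(loose))  # 50.0
```

The grouping with the smaller total is the one K-Means would prefer, because its points sit closer to their cluster centres.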

5

K-Means Clustering algorithm

1. Select k different data points to be the initial cluster centres μ_i at t=0
2. Move to t=1
3. To form cluster C_i at time t, assign to it every point in the dataset whose closest cluster centre from time t-1 is μ_i
4. Recalculate the cluster centres for time t
5. If the clusters have changed, set t=t+1 and repeat from step 3; otherwise, return the resulting clusters
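
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes squared Euclidean distance, takes the first k points as the initial centres (a simplification of "select k different data points"), and the names `k_means`, `squared_distance` and `mean_point` are made up for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Average of all points in a cluster (the recalculated cluster centre)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(points, k, max_iterations=100):
    # Step 1: use the first k points as initial centres (assumes they are distinct).
    centres = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Step 3: assign each point to the closest centre from the previous iteration.
        new_assignment = [
            min(range(k), key=lambda i: squared_distance(p, centres[i]))
            for p in points
        ]
        # Step 5: stop once the clusters no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: recalculate each centre as the mean of its cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = mean_point(members)
    return assignment, centres

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
labels, centres = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The result depends on the initial centres, so practical implementations usually pick them at random and may run the algorithm several times.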

6

How do you recalculate cluster centres?

Find the average of all points in the cluster

7

Formula for recalculating cluster centre

(sum of points) / (number of points in the cluster)
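
Written in symbols, using μ_i for the centre of cluster C_i as in the algorithm card and |C_i| (notation assumed here) for the number of points in the cluster, the same formula reads roughly as follows, with a small worked example:

```latex
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
% Worked example for C_i = \{(1,2),\ (3,4),\ (5,0)\}:
\mu_i = \frac{(1,2) + (3,4) + (5,0)}{3} = \left(\frac{9}{3}, \frac{6}{3}\right) = (3, 2)
```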

8

How do we define distance between two points?

With a distance metric (a function that gives the distance between any two points)

9

What are 2 types of distance metrics?

- Euclidean
- Manhattan

10

How do you calculate the Euclidean distance metric?

Given two points p1 and p2 in D dimensions, the square root of the sum, over every dimension i = 1..D, of (p_1i - p_2i)^2
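
As a quick check of the definition, a minimal sketch in Python (the function name `euclidean_distance` is illustrative):

```python
import math

def euclidean_distance(p1, p2):
    """Square root of the sum of squared per-dimension differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# The classic 3-4-5 right triangle: sqrt(3^2 + 4^2) = 5
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```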

11

How do you calculate the Manhattan distance metric?

Given two points p1 and p2 in D dimensions, the sum, over every dimension i = 1..D, of the absolute value |p_1i - p_2i|
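
A minimal Python sketch of the same definition (the function name `manhattan_distance` is illustrative). For the points (0, 0) and (3, 4) it gives 3 + 4 = 7, whereas the straight-line (Euclidean) distance between them is 5:

```python
def manhattan_distance(p1, p2):
    """Sum of absolute per-dimension differences."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

print(manhattan_distance((0.0, 0.0), (3.0, 4.0)))  # 7.0
```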

12

What is Hierarchical Clustering?

Algorithm that progressively reduces the number of clusters until we reach the desired number of clusters

13

Hierarchical Clustering algorithm

1. Start with K = n clusters, where n = number of data points (each point is its own cluster)
2. At time t, select the closest pair of clusters and merge them into one
3. If all points are in one cluster (or the desired number of clusters has been reached), stop; otherwise, repeat step 2
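
The merging loop above can be sketched as follows. This is an illustrative version only: it assumes Euclidean distance between points, uses the shortest point-to-point distance as the default way of measuring distance between clusters, and adds a hypothetical `target_k` parameter so the loop can stop early at a desired number of clusters.

```python
import math
from itertools import combinations

def closest_points_distance(a, b):
    """Distance between two clusters as the shortest point-to-point distance."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, target_k=1, linkage=closest_points_distance):
    # Step 1: every point starts as its own cluster.
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, in order
    # Step 3: stop once the desired number of clusters remains.
    while len(clusters) > target_k:
        # Step 2: find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
clusters, merges = agglomerative(points, target_k=2)
print(sorted(len(c) for c in clusters))  # [1, 4]
```

The distance recorded for each merge is the height at which that merge would be drawn on a dendrogram.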

14

How do we define distance between two clusters?

Linkage criterion

15

What are the 3 types of linkage criteria?

- Single-Linkage Criterion
- Complete-Linkage Criterion
- Average-Linkage Criterion

16

Single-Linkage Criterion

Defines the distance between two clusters as the shortest distance between any point in one cluster and any point in the other

17

Complete-Linkage Criterion

Defines the distance between two clusters as the largest distance between any point in one cluster and any point in the other

18

Average-Linkage Criterion

Defines the distance between two clusters as the average distance over all pairs of points, one taken from each cluster
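
The three linkage criteria differ only in how they summarise the point-to-point distances between two clusters. A minimal sketch, assuming Euclidean distance between points (the function names are illustrative):

```python
import math

def pairwise_distances(a, b):
    """All distances between a point in cluster a and a point in cluster b."""
    return [math.dist(p, q) for p in a for q in b]

def single_linkage(a, b):
    return min(pairwise_distances(a, b))      # shortest pair

def complete_linkage(a, b):
    return max(pairwise_distances(a, b))      # largest pair

def average_linkage(a, b):
    d = pairwise_distances(a, b)
    return sum(d) / len(d)                    # mean over all pairs

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
# Pairwise distances are 3, 5, 2 and 4.
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))  # 2.0 5.0 3.5
```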

19

What are dendrograms?

Visualisation of the entire hierarchical clustering process

20

What 2 things does a dendrogram show?

- Which points (or clusters) are merged together
- The resulting clusters at each stage of the clustering process

21

What does the x-axis of a dendrogram show?

Initial clusters/points

22

What does the y-axis of a dendrogram show?

The distance at which clusters are merged (or split)

23

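How do we visualise the joining of clusters?

Draw a vertical line up from each cluster to the distance at which they merge, and connect the two lines at that height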

24

What does a longer vertical line on a dendrogram represent?

The two clusters being joined are well separated (they only merge at a large distance)