What are 2 types of clustering?
- K-Means Clustering
- Hierarchical Clustering
What is K-Means Clustering?
Algorithm that organises data points into K groups (clusters)
What is within-cluster variance V(Ck)?
For a cluster Ck, the average distance of the points in Ck from the cluster centre (usually measured as squared distance), i.e. how spread out the cluster is
What does K-Means Clustering aim to do?
Minimise the sum of the within-cluster variances V(Ck) over all K clusters; this means the points in each cluster are closer together, i.e. more similar to one another
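The objective can be checked numerically with a short sketch. It assumes V(Ck) is the average squared Euclidean distance from each point to the cluster centre (some courses define it via pairwise distances instead), and the function names are hypothetical. It compares a compact grouping with a spread-out grouping of the same four points.

```python
from math import fsum

Point = tuple[float, ...]

def centre(cluster: list[Point]) -> Point:
    """Mean of the points in the cluster, coordinate by coordinate."""
    n = len(cluster)
    return tuple(fsum(coords) / n for coords in zip(*cluster))

def within_cluster_variance(cluster: list[Point]) -> float:
    """Assumed V(C_k): average squared Euclidean distance to the cluster centre."""
    mu = centre(cluster)
    total = fsum(fsum((x - m) ** 2 for x, m in zip(p, mu)) for p in cluster)
    return total / len(cluster)

def k_means_objective(clusters: list[list[Point]]) -> float:
    """Sum of V(C_k) over all clusters: the quantity K-Means tries to minimise."""
    return fsum(within_cluster_variance(c) for c in clusters)

# Same four points, grouped two different ways
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(k_means_objective(tight), k_means_objective(loose))  # → 2.0 50.0
```

The compact grouping scores far lower, which is why K-Means prefers it.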
K-Means Clustering algorithm
1. Select K different data points to be the initial cluster centres μ_i at t=0
2. Move to t=1
3. Form each cluster C_i at time t from all points in the dataset whose closest cluster centre (from time t-1) is μ_i
4. Recalculate the cluster centres for time t
5. If any cluster has changed, return to step 3 with t=t+1; otherwise, return the resulting clusters
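The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, picks the initial centres at random with a fixed seed, and adds a hypothetical `max_iter` guard; all names are illustrative.

```python
import random

Point = tuple[float, ...]

def squared_distance(p: Point, q: Point) -> float:
    # Squared Euclidean distance; it has the same nearest centre as Euclidean
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster: list[Point]) -> Point:
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def k_means(points: list[Point], k: int, seed: int = 0,
            max_iter: int = 100) -> list[list[Point]]:
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centres (t = 0)
    centres = rng.sample(points, k)
    previous = None
    for _ in range(max_iter):
        # Step 3: assign each point to the cluster with the closest centre from t-1
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: squared_distance(p, centres[j]))
            clusters[i].append(p)
        # Step 5: stop once the clusters no longer change
        if clusters == previous:
            break
        previous = clusters
        # Step 4: recalculate centres (keep the old centre if a cluster is empty)
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
print(sorted(sorted(c) for c in k_means(points, k=2)))
```

Random initialisation means different seeds can give different results on harder data, which is why K-Means is often run several times and the lowest-variance result kept.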
How do you recalculate cluster centres?
Take the average (mean) of all points in the cluster, coordinate by coordinate
Formula for recalculating cluster centre
(sum of points) / (number of points in the cluster)
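Written as a formula, with a small worked example for a cluster of three 2-D points (the points are chosen purely for illustration):

```latex
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x,
\qquad
C_i = \{(1,2),\ (3,4),\ (5,0)\}
\;\Rightarrow\;
\mu_i = \left(\tfrac{1+3+5}{3},\ \tfrac{2+4+0}{3}\right) = (3,\ 2)
```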
How do we define distance between two points?
Metric
What are 2 types of distance metrics?
- Euclidean
- Manhattan
How do you calculate the Euclidean distance metric?
Given two points p1 and p2 in D dimensions, take the square root of the sum, over every dimension i = 1..D, of (p_1i - p_2i)^2
How do you calculate the Manhattan distance metric?
Given two points p1 and p2 in D dimensions, take the sum, over every dimension i = 1..D, of the magnitude |p_1i - p_2i|
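Both metrics translate directly into code. A minimal sketch with hypothetical function names, checked on the classic 3-4-5 triangle:

```python
from math import sqrt

Point = tuple[float, ...]

def euclidean(p1: Point, p2: Point) -> float:
    """Square root of the sum over every dimension i of (p1_i - p2_i)^2."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of dimensions")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def manhattan(p1: Point, p2: Point) -> float:
    """Sum over every dimension i of |p1_i - p2_i|."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p1, p2))

print(euclidean((0.0, 0.0), (3.0, 4.0)))  # → 5.0
print(manhattan((0.0, 0.0), (3.0, 4.0)))  # → 7.0
```

Euclidean is the straight-line distance; Manhattan is the distance travelled moving only along the axes, so it is never smaller than Euclidean.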
What is Hierarchical Clustering?
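Algorithm that starts with each point in its own cluster and progressively merges the closest clusters, reducing the number of clusters until the desired number is reached (or all points are in one cluster)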
Hierarchical Clustering algorithm
1. Start with K = n clusters, where n = number of data points (each point is its own cluster)
2. At time t, select the closest pair of clusters and merge them into one
3. If all points are in one cluster, stop, otherwise, repeat step 2
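The merging loop above can be sketched as follows. This assumes single linkage with Euclidean distance as the cluster distance, adds a hypothetical `k` parameter for stopping early, and records each merge with its distance (the heights a dendrogram plots). It is a naive O(n³) illustration, not an efficient implementation.

```python
from itertools import combinations
from math import dist

Point = tuple[float, ...]

def single_linkage(a: list[Point], b: list[Point]) -> float:
    # Assumed criterion for this sketch: shortest point-to-point distance
    return min(dist(p, q) for p in a for q in b)

def hierarchical(points: list[Point], k: int = 1):
    """Merge the closest pair of clusters until k clusters remain.

    Returns the final clusters and the merges as (cluster_a, cluster_b, distance)
    in the order they happened.
    """
    # Step 1: every point starts as its own cluster (K = n)
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > k:
        # Step 2: find the closest pair of clusters (i < j) and merge them
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        d = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return clusters, merges

pts = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
_, merges = hierarchical(pts)
print([d for _, _, d in merges])  # → [1.0, 1.0, 4.0, 14.0]
```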
How do we define distance between two clusters?
Linkage criterion
What are the 3 types of linkage criterion?
- Single-Linkage Criterion
- Complete-Linkage Criterion
- Average-Linkage Criterion
Single-Linkage Criterion
Defines the distance between two clusters as the shortest distance between a point in one cluster and a point in the other
Complete-Linkage Criterion
Defines the distance between two clusters as the largest distance between a point in one cluster and a point in the other
Average-Linkage Criterion
Defines the distance between two clusters as the average of the distances between every pair of points, one from each cluster
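The three criteria differ only in how they summarise the same set of pairwise distances. A minimal sketch, assuming Euclidean distance between points and using hypothetical function names:

```python
from math import dist
from statistics import fmean

Point = tuple[float, ...]

def pairwise(a: list[Point], b: list[Point]) -> list[float]:
    # Distance for every pair (p, q) with p from cluster a and q from cluster b
    return [dist(p, q) for p in a for q in b]

def single_linkage(a: list[Point], b: list[Point]) -> float:
    return min(pairwise(a, b))

def complete_linkage(a: list[Point], b: list[Point]) -> float:
    return max(pairwise(a, b))

def average_linkage(a: list[Point], b: list[Point]) -> float:
    return fmean(pairwise(a, b))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 0.0), (6.0, 0.0)]
# Pairwise distances are 4, 6, 3 and 5
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))  # → 3.0 6.0 4.5
```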
What are dendrograms?
Visualisation of the entire hierarchical clustering process
What 2 things does a dendrogram show?
- Which points/clusters are merged together
- The clusters present at each stage of the clustering
What does the x-axis of a dendrogram show?
Initial clusters/points
What does the y-axis of a dendrogram show?
Distance at split/merge of clusters
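How do we visualise the joining of clusters?
Draw a vertical line up from each of the two clusters to the merge distance, then connect them with a horizontal line at that height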
What does a longer vertical line on a dendrogram represent?
The clusters are well separated: a large increase in distance is needed before they merge
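One common way to use this idea is to cut the dendrogram horizontally inside the largest gap between consecutive merge distances, where the vertical lines are longest. The sketch below is a heuristic, not a rule from these notes; it assumes the merge heights come from a linkage that never produces decreasing merge distances (single, complete and average linkage behave this way), and `suggest_cut` is a hypothetical name.

```python
def suggest_cut(merge_heights: list[float]) -> tuple[float, int]:
    """Return a cut height in the middle of the largest gap between consecutive
    merge distances, and the number of clusters left below that cut."""
    if len(merge_heights) < 2:
        raise ValueError("need at least two merges to compare gaps")
    heights = sorted(merge_heights)
    n_points = len(heights) + 1  # n points are joined by n - 1 merges
    # (gap size, index of the merge just below the gap)
    gaps = [(heights[i + 1] - heights[i], i) for i in range(len(heights) - 1)]
    _, i = max(gaps)
    cut = (heights[i] + heights[i + 1]) / 2
    clusters_left = n_points - (i + 1)  # each merge below the cut removes one cluster
    return cut, clusters_left

# Single-linkage merge distances for the 1-D points 0, 1, 5, 6 and 20
print(suggest_cut([1.0, 1.0, 4.0, 14.0]))  # → (9.0, 2)
```

Here the long vertical lines between heights 4 and 14 suggest two well-separated clusters: {0, 1, 5, 6} and {20}.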