Week 6 - Unsupervised learning

13 Terms

1

Within-cluster variance

A measure of how tightly packed a cluster C_k of points p ∈ R^D is, defined using the Euclidean distances between the points in the cluster

2

How to calculate the within-cluster variance?

(1 / size of the cluster) × the sum of the squared Euclidean distances between every distinct pair of points in the cluster, i.e. (1 / |C_k|) Σ ‖p − q‖², summed over the distinct pairs p, q ∈ C_k
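
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the four example points are assumptions made up for this sketch. It also checks a standard identity, namely that the pairwise form equals the sum of squared distances from each point to the cluster mean.

```python
from itertools import combinations


def within_cluster_variance(cluster):
    """(1 / |C_k|) * sum of squared Euclidean distances over distinct pairs."""
    total = 0.0
    for p, q in combinations(cluster, 2):  # each unordered pair exactly once
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(cluster)


def squared_distance_to_centroid(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    n = len(cluster)
    dim = len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / n for i in range(dim)]
    return sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(dim)) for p in cluster
    )


# Hypothetical example cluster: the corners of a 2 x 2 square.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
w_pairs = within_cluster_variance(cluster)          # 32 / 4 = 8.0
w_centroid = squared_distance_to_centroid(cluster)  # 4 * 2 = 8.0
print(w_pairs, w_centroid)
```

Both numbers agree here, which is why the quantity can be read either as "spread between pairs" or as "spread around the centre".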

3

Why do we use the within-cluster variance?

It gives clustering an objective to minimise: a good cluster assignment is one where the within-cluster variance is as low as possible

4

K-means clustering algorithm steps

1. Select K different data points to be the initial cluster centres at time t = 0
2. At time t, assign each data point to the cluster C_k^(t) with the closest cluster centre μ_k^(t - 1)
3. For each cluster, recalculate its cluster centre μ_k^(t) as the average of all points in the cluster
4. If the clusters have changed, go back to step 2, moving time to t + 1. Otherwise, stop and return the final clusters
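
The four steps above could be sketched roughly as follows. This is one possible reading, under some assumptions that are not on the card: the initial centres are picked at random with a fixed seed, an empty cluster keeps its old centre, and `max_iter` caps the loop. The function names and the toy data are also made up for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick K different data points as the initial centres
    # (random choice is an assumption; the card does not say how to pick).
    centres = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest centre.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        # Step 4: stop as soon as the clusters no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recalculate each centre as the mean of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centres


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centres = k_means(points, k=2)
print(assignment, centres)
```

The cluster labels (0 or 1) depend on which points were drawn as initial centres, so only the grouping is meaningful, not the label values.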

5

Hierarchical clustering

A set of nested clusters organized as a hierarchical tree

6

Linkage criterion

The rule that defines the distance between two clusters

7

Metric

The function that defines the distance between two points

8

Manhattan distance metric

The sum of the absolute differences between the coordinates of two points: d(p, q) = Σ_i |p_i − q_i|
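
As a small illustration, the Manhattan distance can be computed by adding up the absolute coordinate differences. The function name and the example points are assumptions chosen for this sketch.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


# |1 - 4| + |2 - 6| = 3 + 4 = 7
print(manhattan_distance((1, 2), (4, 6)))
```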

9

Euclidean distance metric

The straight-line (shortest-path) distance between two points: d(p, q) = sqrt(Σ_i (p_i − q_i)²)
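
A minimal sketch of the Euclidean distance, for comparison with other metrics. The function name and the example points are assumptions for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = 5.0
print(euclidean_distance((1, 2), (4, 6)))
```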

10

Single-linkage criterion

The minimum (best case) of the distances between a point in one cluster and a point in the other

11

Complete-linkage criterion

The maximum (worst case) of the distances between a point in one cluster and a point in the other

12

Average-linkage criterion

The average of the distances over all pairs of points, taking one point from each cluster
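
The single-, complete- and average-linkage criteria differ only in how they summarise the same list of point-to-point distances. The sketch below assumes the Euclidean metric (via `math.dist`, Python 3.8+); the function names and the two example clusters are made up for illustration.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))  # closest pair


def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))  # farthest pair


def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)                # mean over all pairs


# Hypothetical clusters on a line; the pairwise distances are 4, 6, 3 and 5.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
# 3.0 6.0 4.5
```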

13

Dendrograms

Diagrams that quickly visualise the entire hierarchical clustering process
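
To make concrete what such a diagram records, here is a rough sketch of bottom-up (agglomerative) clustering that logs each merge and the distance at which it happened. That list of merges and heights is the information a dendrogram draws. The sketch assumes single linkage with the Euclidean metric; the function names and the example points are hypothetical.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Smallest Euclidean distance between points of the two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative_merges(points, linkage=single_linkage):
    """Repeatedly merge the two closest clusters, recording every merge."""
    clusters = [[p] for p in points]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        height = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [
            c for idx, c in enumerate(clusters) if idx not in (i, j)
        ] + [merged]
    return merges


# Hypothetical one-dimensional points.
points = [(0.0,), (1.0,), (5.0,), (7.0,)]
merges = agglomerative_merges(points)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

In this example the merge heights are 1.0, 2.0 and 4.0; a dendrogram would show them as the heights of the three joins in the tree.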