Unsupervised Learning


Description and Tags

Complete for the lecture slides; does not include the reading material on association rule mining


58 Terms

1

what is machine learning

the ability of computers to simulate human learning, i.e. learning from past behaviours and historical data

2

what are the types of machine learning

supervised learning
unsupervised learning
reinforcement learning

3

what is unsupervised learning

data is unlabelled and the algorithm tries to find patterns and relationships in the data

4

what is reinforcement learning

the algorithm interacts with the environment by taking actions to maximise reward and minimise punishment signals (no labelled output)

5

what are the types of unsupervised learning methods

clustering
dimensionality reduction
association rules

6

what is dimensionality reduction

reducing the number of inputs to a system down to a smaller number of dimensions (e.g. PCA)

7

what is clustering

to partition a data set into subsets (clusters), so that the data in each cluster share some common trait - often similarity or proximity under some defined distance measure.

most commonly used unsupervised learning method

8

what is a cluster

a collection of objects that are similar to each other and dissimilar to objects belonging to other clusters

9

what are patterns

a physical object, or a data point

10

how are patterns represented

as feature vectors: a d-dimensional array where each dimension/element in the array is a feature

11

what are features

an attribute of an object

12

what are the uses of clustering

  • social networks - e.g. clustering Facebook friends by who communicates the most (for marketing), mapping terror networks, allocating company resources

  • customer segmentation - clustering supermarket customers to target ads

  • gene networks - understanding gene interactions, identifying genes linked to disease (the Human Genome Project in the 90s caused a clustering boom)

13

how does clustering work

an algorithm is used to group data points together based on pattern similarity

14

how can we calculate pattern similarity

distance metrics (most common)

15

what are the different distance metrics

Euclidean
Manhattan
Correlation
Minkowski
Mahalanobis
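
all five metrics are available in scipy; a minimal sketch (the sample vectors and data below are made up for illustration):

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])

print(distance.euclidean(u, v))       # straight-line distance
print(distance.cityblock(u, v))       # Manhattan (city-block) distance
print(distance.correlation(u, v))     # 1 - Pearson correlation
print(distance.minkowski(u, v, p=3))  # p=2 gives Euclidean, p=1 Manhattan

# Mahalanobis needs the inverse covariance matrix of the data set
X = np.random.default_rng(0).normal(size=(100, 3))
VI = np.linalg.inv(np.cov(X.T))
print(distance.mahalanobis(u, v, VI))
```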

16

what is Euclidean distance

the shortest (straight-line) distance between two points. all features carry the same weight, but in real applications one feature may be more relevant than another

17

what is Manhattan distance

the sum of the absolute differences between points across all dimensions

18

what is correlation

a metric for categorical data, not relying on numeric values (e.g. if X bought this, will they buy that?)

19

what is Minkowski distance

a generalisation of Euclidean distance that takes into account different weightings of features

20

what is Mahalanobis distance

accounts for outliers in the data; used in finance for fraud and anomaly detection

21

how do you calculate Euclidean distance

Pythagoras' theorem - when given a matrix, each row is a pattern and each column is an attribute. to get the distance between 2 points, only 2 rows need to be compared at a time.
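
a minimal numpy sketch of this calculation (the two-row matrix is made up for illustration):

```python
import numpy as np

# each row is a pattern, each column an attribute
X = np.array([[1.0, 2.0],
              [4.0, 6.0]])

# Pythagoras in d dimensions: square root of summed squared differences
d = np.sqrt(np.sum((X[0] - X[1]) ** 2))
print(d)  # 5.0
```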

22

how do you calculate Manhattan distance

calculate the absolute difference between each pair of feature values and sum them together
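
a minimal numpy sketch, reusing the made-up matrix from the Euclidean example:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [4.0, 6.0]])

# sum of absolute per-feature differences between the two rows
d = np.sum(np.abs(X[0] - X[1]))
print(d)  # |1-4| + |2-6| = 7.0
```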

23

how do we choose a similarity metric

  • application dependent

  • determine whether shape, distance, or scale is the more important property

24

what are the different clustering algorithms

  • k means clustering

  • hierarchical clustering

  • fuzzy clustering

  • DBSCAN

25

how does k-means clustering work

  1. Place K points into the feature space. These points represent initial cluster centroids.

  2. Assign each pattern to the closest cluster centroid.

  3. When all objects have been assigned, recalculate the positions of the K centroids.

  4. Repeat Steps 2 and 3 until the assignments do not change.
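
a minimal numpy sketch of these four steps (the function name and the random initialisation scheme are illustrative choices, not from the slides):

```python
import numpy as np

def k_means(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: place K initial centroids (here: K randomly chosen patterns)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    while True:
        # step 2: assign each pattern to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recalculate the K centroid positions
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j]  # an empty cluster keeps its centroid
                        for j in range(k)])
        # step 4: repeat until the centroids (and hence assignments) stop changing
        if np.allclose(new, centroids):
            return labels, centroids
        centroids = new
```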

26

what are the pros of k-means clustering

  • if K is small, computationally faster than hierarchical clustering

  • may produce tighter clusters than hierarchical, especially if clusters are globular (naturally round globs in the data space)

27

what are the cons of k-means clustering

  • requires k to be known - hard to determine what K should be

  • the randomly selected starting centroids have an impact, meaning different initial partitions can result in different final clusters

    • Does the algorithm converge to the same result with different selections of initial cluster centroids? If not, what should we do in practice?

      • for real-world datasets - usually not

  • potential to create empty clusters (not always bad)

  • hard assignment - data points can only be assigned to 1 cluster

    • in real life applications might not always have sharp boundary between clusters

  • does not work well with non-globular clusters, e.g. elongated clusters or clusters with different variances

28

what is hierarchical clustering (Agglomerative)

Agglomerative = bottom-up approach

creates a hierarchy of clusters where each pattern is originally in its own cluster, and intermediate clusters are created before finally merging into 1 cluster - producing a dendrogram as the output. It doesn’t need k to be defined.

29

what is a dendrogram

a tree structure that shows the hierarchical relationship between objects

30

how is a dendrogram used

shows us all the possible clusters as opposed to a specific set of clusters

  • cutting the tree creates clusters (any number of clusters - just cut at the level that gives however many clusters are needed, no need to rerun the algorithm)

    • cutting higher up - fewer clusters

    • cutting lower - more clusters

31

how does the hierarchical clustering algorithm work?

*requires computing the distance between every cluster and every other cluster

  1. Each item is assigned to its own cluster (n clusters of size one)

  2. Let the distances between the clusters equal the distances between the objects they contain

  3. Find the closest pair of clusters and merge them into a single cluster (one less cluster)

  4. Re-compute the distances between the new cluster and each of the old clusters

  5. Repeat steps 3 and 4 until there is only one cluster left
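
in practice this loop is usually delegated to a library; a minimal sketch with scipy (the data is made up, and 'single' is just one choice of linkage - see the next cards):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))

# bottom-up merging of the 20 single-point clusters
Z = linkage(X, method='single', metric='euclidean')

dendrogram(Z)  # the tree of all merges (needs matplotlib to draw)
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into 3 clusters
```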

32

how do we re-compute distances between clusters

  • when a cluster contains only 1 initial data point

    • use Euclidean distance between the initial data points

  • when a cluster contains more than 1 data point

    • single / complete / average linkage

    • Ward, McQuitty, Median, Centroid metrics

33

what is single linkage

the smallest distance between any pair of points from the two clusters (one from each) being compared

34

what is complete linkage

the largest distance between any pair of points from the two clusters (one from each) being compared

35

what is average linkage

the average distance between all pairs of points (one from each cluster)
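
a minimal numpy sketch showing all three linkages between two small made-up clusters:

```python
import numpy as np

A = np.array([[0.0, 0.0], [1.0, 0.0]])  # cluster A
B = np.array([[4.0, 0.0], [6.0, 0.0]])  # cluster B

# distance between every pair of points, one from each cluster
pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

print(pairwise.min())   # single linkage:   3.0 (closest pair)
print(pairwise.max())   # complete linkage: 6.0 (furthest pair)
print(pairwise.mean())  # average linkage:  4.5
```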

36

what are the pros of HC

  • Can produce an ordering of the objects, which may be informative for data display.

  • Smaller clusters are generated, which may be helpful for discovery.

37

what are the cons of HC

  • No provision can be made for relocating objects that may have been 'incorrectly' grouped at an early stage; unlike k-means, clusters do not change once merged (a local greedy search)

  • Use of different distance metrics for measuring distances between clusters may generate different results.

38

what is fuzzy c-means clustering

fuzzification of k-means where data points do not have a hard assignment / sharp boundary between clusters

39

how does fuzzy clustering work

  • an item’s cluster membership is a weight between 0 and 1 (instead of 1 or 0 to show membership or non-member)

  • distance to a centroid is multiplied by membership weight
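
a minimal numpy sketch of the membership-weight calculation (this is the standard fuzzy c-means formula with fuzzifier m; the function name is illustrative):

```python
import numpy as np

def fuzzy_memberships(X, centroids, m=2.0):
    # distance from every point to every centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero at a centroid
    # membership of point j in cluster i: 1 / sum_k (d_ij / d_kj)^(2/(m-1))
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
    return u  # each row sums to 1: a soft assignment across all clusters
```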

40

what is DBSCAN

a density based clustering algorithm

  • divides the data set into dense regions (considered clusters) separated by sparse regions

41

what parameters are needed for DBSCAN

  • MinPts - specifies the minimum number of points required to form a dense region

  • Epsilon - defines the radius of the neighbourhood around a given core point: the maximum distance between two points for them to be considered part of the same neighbourhood (it sets the size of the circle used when searching for points)

  • density - number of points within a specific radius

  • core point - a point in the interior of a cluster

42

what is DBSCAN used for

clustering regions in an image

43

how does DBSCAN work

  1. randomly pick a point in the space as the centre

  2. draw a circle (radius epsilon) around the centre point and count the number of points within it (the density)

  3. if data points were found, use a border point as the new centre and redraw the radius around its neighbouring points

  4. when the cluster can no longer expand, choose a new starting point from the unclustered data points

  5. clusters are only created if they contain > MinPts points
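
in practice this is usually called from a library; a minimal sketch with scikit-learn's DBSCAN (the data and parameter values are made up):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).normal(size=(100, 2))

# eps = epsilon (neighbourhood radius), min_samples = MinPts
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# points in sparse regions are labelled -1 (noise, not in any cluster)
print(np.unique(labels))
```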

44

how can we evaluate cluster quality

  • using different metrics for different methods

    • cohesion

    • separation

  • using a supervised approach

45

what is cohesion

similarity within a cluster

46

what is separation

distance between clusters

47

how can we use a supervised approach

  • when we know something about the ‘true’ clusters, we compare the clustering arrangement to the true clustering arrangement to determine how good the output clusters are.

  • used when developing and testing new clustering algorithms

48

how can we compare clustering arrangements

  • human verification of clusters for low-dimensional data, using educated guesses or expert knowledge

  • using metrics like weighted-kappa

49

what is weighted-kappa

running 2 different clustering algorithms and measuring how similar their arrangements are; a good kappa score means the arrangements are more similar

50

how does weighted-kappa work

  • uses a 2x2 contingency table where all unique pairs of genes are compared (n(n-1)/2 comparisons)

  • for each pair, check whether they were clustered together in one arrangement and whether the same pair was clustered together in the 2nd arrangement (we want pairs clustered in the expert arrangement to also be clustered in the new arrangement, and vice versa)

the contingency table can then be converted to a single number - the agreement strength
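
a minimal sketch of the pair-counting idea, scoring the 2x2 table with plain Cohen's kappa (an assumption here - the lecture's exact weighting scheme may differ):

```python
import numpy as np
from itertools import combinations

def pair_agreement_kappa(labels_a, labels_b):
    # 2x2 contingency table over all n(n-1)/2 unique pairs of items
    table = np.zeros((2, 2))
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]  # together in arrangement 1?
        same_b = labels_b[i] == labels_b[j]  # together in arrangement 2?
        table[int(same_a), int(same_b)] += 1
    n = table.sum()
    po = np.trace(table) / n                     # observed agreement
    pe = (table.sum(0) @ table.sum(1)) / n ** 2  # agreement expected by chance
    return (po - pe) / (1 - pe)                  # 1 = identical arrangements
```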

51

what is the outcome of comparing clusters

similar clustering arrangements = more consistent clustering method

52

what is association rule mining

a methodology to discover relationships between items in large data sets, expressed as association rules; originally developed for shopping-basket data

53

what are association rules

if-then statements that show the probability of a relationship between items in a large data set, e.g. the relationship between different items bought in a single transaction

54

what are the association rule mining parameters

  • support

  • confidence

  • lift

don’t need to know formula - just understand how it works

55

what is support

how often a rule is applicable to a given data set

support = freq(X,Y) / N

56

what is confidence

  • determines how frequently item Y appears in transactions that contain X

    • measures the reliability of the inference made by a rule; an estimate of the conditional probability

confidence = freq(X,Y) / freq(X)

(freq here is the same frequency count used for support)

57

what is lift

how much more frequently Y is found in transactions containing X, compared to how frequently Y is found in transactions overall

lift = confidence(X,Y) / support(Y)

E.g. people that buy X are Z times more likely to buy Y
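
a minimal sketch computing all three parameters for a rule X → Y on a made-up basket data set:

```python
# five made-up transactions; rule considered: nappies -> beer
transactions = [
    {"bread", "milk"},
    {"bread", "nappies", "beer"},
    {"milk", "nappies", "beer"},
    {"bread", "milk", "nappies", "beer"},
    {"bread", "milk", "nappies"},
]
N = len(transactions)
x, y = "nappies", "beer"

freq_x  = sum(x in t for t in transactions)             # 4
freq_y  = sum(y in t for t in transactions)             # 3
freq_xy = sum(x in t and y in t for t in transactions)  # 3

support    = freq_xy / N                # 0.6
confidence = freq_xy / freq_x           # 0.75
lift       = confidence / (freq_y / N)  # 1.25: buyers of X are 1.25x likelier to buy Y
print(support, confidence, lift)
```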

58

what are the applications of association rule mining

  • produces easily interpreted, actionable insights

    • e.g. Spotify - users that listened to X songs might also enjoy Y songs

    • e.g. store layout - placing associated items closer together based on the rules

    • surprising 'middle' rules, like the famous nappies + beer association

    • recommender systems for Netflix or Amazon