SECT 5

Last updated 12:41 PM on 3/13/26
70 Terms

1
New cards

Unsupervised Learning

  • Process of analyzing data without pre-classified labels in order to uncover hidden meaning and structure.

  • The algorithm creates labels based on the similarity of the data

2
New cards

Purpose of Unsupervised learning

  • serves as a basis for exploration, interpretation, and supervised learning

  • discovers the meaning and structure of the data

3
New cards

Clustering

process of grouping unlabeled data so that similar points fall in the same cluster and dissimilar points in different clusters

4
New cards

Purpose of Clustering

  • Data Understanding

  • Class Identification

  • Outlier and Noise detection

5
New cards

Clustering Methods

  • Partition-based

  • Hierarchical-based

  • Density-based

6
New cards

Partition Based

Divide the data into k groups based on similarity

7
New cards

K means Clustering Idea

  • Each point is assigned to the cluster with the nearest centroid

  • Each cluster is represented by a centroid

  • Centroid values are updated until the assignments stabilize

8
New cards

K means Setup

  • The value k, i.e. the number of clusters, must be defined

  • Centroids must be initialized, e.g. as random points; different initializations give different results

9
New cards

K means Algo

  1. Initialize k centroids

  2. Assign each point to the nearest centroid

  3. Recompute each centroid as the mean of its assigned points

  4. Repeat steps 2–3 until the assignments stabilize

works best for compact, well-defined clusters
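The assign/update loop above can be sketched in a few lines of NumPy; the two-blob data, the seed, and k = 2 are made up for illustration, not part of the card.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initialization: k distinct data points (different seeds
    # can give different results, as the setup card notes).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (kept unchanged if a cluster goes empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # assignments have stabilized
            break
        centroids = new
    return labels, centroids

# Two compact, well-separated blobs -- the setting where k-means works best.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = kmeans(X, k=2)
```

On less separated data, the random initialization can make this converge to a local optimum, which is one of the cons listed below.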
10
New cards

K means pros

  • efficient: O(tkn), where t = iterations, k = clusters, n = points

  • easy

  • widely used

11
New cards

K means cons

  • sensitive to outliers and noise

  • must define k

  • assumes convex and spherical clusters

  • may converge to local optimum

  • requires a well-defined mean

12
New cards

k medoids idea

  • each cluster is centered around a medoid (most central point)

  • points are assigned to nearest medoid

  • each iteration chooses medoid as most representative point

13
New cards

k medoids over k means

  • medoids are actual data points

  • robust to noise and outliers

14
New cards

k medoid algo

useful when robustness and interpretability are more important than speed
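A minimal PAM-style sketch of the k-medoids swap idea (NumPy only; the toy data and the deterministic initialization are illustrative assumptions, not part of the cards):

```python
import numpy as np

def total_cost(D, medoids):
    # Cost = sum of distances from every point to its nearest medoid.
    return D[:, medoids].min(axis=1).sum()

def k_medoids(X, k):
    # Works with any distance measure; Euclidean is used here.
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    medoids = list(range(k))          # simple deterministic initialization
    improved = True
    while improved:                   # systematic swaps, PAM-style
        improved = False
        for m in list(medoids):
            for o in range(len(X)):
                if o in medoids:
                    continue
                cand = [o if x == m else x for x in medoids]
                if total_cost(D, cand) < total_cost(D, medoids):
                    medoids, improved = cand, True
    # Each point is assigned to its nearest medoid (an actual data point).
    return D[:, medoids].argmin(axis=1), medoids

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, medoids = k_medoids(X, k=2)
```

The nested swap loop is what makes PAM more robust but more expensive than k-means, as the pros/cons cards note.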
15
New cards

k medoids pros

  • easy to implement

  • medoids are real datapoints

  • works with any distance measure

  • robust to noise and outliers

16
New cards

k medoids cons

  • may converge to local optimum

  • must predefine k

  • assumes convex and spherical clusters

  • computationally more expensive

17
New cards

PAM (Partitioning Around Medoids)

  • classic k medoids algo

  • selects medoids by systematically swapping medoids with non-medoids

  • more robust but costly

18
New cards

CLARA (Clustering LARge Applications)

  • runs PAM on a sample

  • reduces runtime, but quality depends on the sample

19
New cards

CLARANS (Clustering Large Applications based upon RANdomized Search)

  • Randomized version of PAM

  • evaluates only a subset of the possible swaps

  • balances efficiency and quality

20
New cards

Commonalities of k means and k medoids

Cluster membership depends on the distance to the center element, so the partition corresponds to a Voronoi diagram

21
New cards

Limitations of k means and k medoids

  • assume a convex partitioning

  • must predefine k

22
New cards

Expectation Maximization

With incomplete data, the expectation step estimates the missing values and the maximization step refines the model parameters

23
New cards

Need for EM

  • provides a principled, probabilistic process

  • general use in clustering, handling missing values, and HMMs

24
New cards

EM For Clustering Idea

  • Each cluster is a Gaussian distribution (mean, covariance, weight)

  • points are assigned probabilistically

  • best when clusters are non-spherical and probabilistic assignment is needed

25
New cards

EM for clustering pros

  • captures elliptical/non-spherical clusters

  • more flexible

  • soft clustering

  • grounded in a probabilistic framework

26
New cards

EM Clustering Steps

  1. E-step: assign points to the cluster distributions (probabilistically)

  2. M-step: re-estimate the mean and variance of each cluster
27
New cards

EM Clustering cons

  • computationally heavy

  • must predefine k

  • sensitive to initialization and can converge to local optimum

  • assumes all clusters follow the same type of distribution
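The E/M loop can be sketched for a 1-D, two-component Gaussian mixture; the data and the crude starting values below are invented for the example.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    # Crude initialization: means at the extremes, shared variance, equal weights.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: soft (probabilistic) assignment of every point to each cluster.
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)      # responsibilities
        # M-step: re-estimate weights, means, and variances from the soft counts.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var, w

# Two well-separated 1-D groups around 0 and 8 (illustrative data).
x = np.concatenate([np.linspace(-0.5, 0.5, 50), np.linspace(7.5, 8.5, 50)])
mu, var, w = em_gmm_1d(x)
```

The responsibilities `r` are the soft clustering the pros card mentions; hardening them with `argmax` would recover a k-means-style assignment.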

28
New cards

Silhouette coefficient

measures how well a point fits in its own cluster vs other clusters

used for evaluating clusters

29
New cards

Silhouette Coefficient Formula

s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other points of its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster
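A direct NumPy translation of the silhouette formula, evaluated on toy data with one good and one bad labelling (both labellings are illustrative):

```python
import numpy as np

def silhouette(X, labels):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    n = len(X)
    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        # a(i): mean distance to the other points of i's own cluster.
        a = D[i, own & (np.arange(n) != i)].mean()
        # b(i): smallest mean distance from i to the points of another cluster.
        b = min(D[i, labels == c].mean() for c in set(labels) - {labels[i]})
        s[i] = (b - a) / max(a, b)
    return s.mean()

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
good = silhouette(X, np.array([0, 0, 0, 1, 1, 1]))   # matches the two blobs
bad = silhouette(X, np.array([0, 1, 0, 1, 0, 1]))    # mixes the blobs
```

Values near 1 mean points sit well inside their cluster; the mixed labelling scores much lower, which is exactly how the coefficient is used to evaluate clusterings.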
30
New cards

Hierarchical Clustering

builds a hierarchy of nested clusters without predefining the number of clusters

31
New cards

Hierarchical Clustering idea

  • in the beginning, each point is its own cluster

  • clusters are successively merged or split

  • produces a dendrogram to show the different clusters

32
New cards

Types of Hierarchical Clustering

  • Single linkage

  • complete linkage

  • centroid linkage

33
New cards

Dendrogram

  • a tree diagram that shows the arrangement of clusters produced by hierarchical clustering

  • a cut through the dendrogram shows how to partition the data into clusters

34
New cards

Hierarchical clustering Algo

  1. Start with each point as its own cluster

  2. Compute the distances between all pairs of clusters

  3. Merge the two closest clusters

  4. Repeat steps 2–3 until one cluster (or the desired number) remains
35
New cards

Single Linkage

  • Distance is defined as the minimum distance between any pair of points

  • O(n²)

  • sensitive to noise and outliers

  • good for detecting arbitrarily shaped clusters
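A naive agglomerative sketch of single linkage: repeatedly merge the two clusters with the smallest minimum pairwise distance until the desired count remains. The toy data and target cluster count are illustrative.

```python
import numpy as np

def single_linkage(X, num_clusters):
    # Start with every point as its own cluster.
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    while len(clusters) > num_clusters:
        best = None
        # Single linkage: cluster distance = minimum distance over all pairs.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)   # merge the two closest clusters
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
clusters = single_linkage(X, num_clusters=2)
```

Recording the distance at each merge would give the heights of the dendrogram; swapping `min` for `max` in the inner loop turns this into complete linkage.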
36
New cards

Complete Linkage

  • Distance is defined as the maximum distance between any pair of points

  • O(n²)

  • favours compact, spherical clusters

  • still sensitive to outliers
37
New cards

Centroid Linkage

  • Distance is defined as the distance between the centroids of the two clusters

  • O(n)

  • can produce inversions

  • considers all points
38
New cards

Density Based Clustering

Clusters are defined as dense regions of points separated by areas of low density

39
New cards

density based clustering idea

  • points inside a cluster are densely connected

  • sparse regions act as separators

  • noise and outliers remain unassigned

40
New cards

Density based clustering advantages

  • handles noise and outliers

  • no need to predefine the number of clusters

  • works for arbitrarily shaped clusters

41
New cards

Core object

  • An object with at least MinPts neighbours within ε

  • forms the heart of a dense region
42
New cards

border object

lies within the ε-neighbourhood of a core object but itself has fewer than MinPts neighbours

43
New cards

Directly density reachable

a point p is directly density-reachable from a point q if p lies within ε of q and q is a core object
44
New cards

density reachable

  • a point p is density-reachable from a point q if there is a chain of points from q to p in which each point is directly density-reachable from the previous one

  • q is a core object; p can be a border object

45
New cards

Density connected

  • two points p and q are density-connected if both are density-reachable from a common core object

  • p and q can be border objects
46
New cards

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

  • identify core objects, i.e. points with at least MinPts neighbours within ε

  • grow clusters by connecting density-reachable points

  • points that are not density-reachable from any core object are labelled as noise

47
New cards

DBSCAN Algo

  1. For each unvisited point, find its ε-neighbourhood

  2. If it has at least MinPts neighbours, start a new cluster from it

  3. Expand the cluster by adding all density-reachable points

  4. Label points not reachable from any core object as noise
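A compact sketch of the DBSCAN steps (NumPy only; ε, MinPts, and the toy data with one far-away outlier are made up for the example; MinPts here counts the point itself):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    # eps-neighbourhoods (each point counts itself here).
    neighbors = [np.where(D[i] <= eps)[0] for i in range(len(X))]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = np.full(len(X), -1)          # -1 = noise / unassigned
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster from this core object.
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if core[j]:               # only core objects expand further
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

# Two dense blobs plus one far-away point that should be labelled noise.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
              [10.0, 0.0]])
labels = dbscan(X, eps=0.5, min_pts=2)
```

Border objects get absorbed into a cluster when reached but never expand it, which is exactly the core/border distinction from the earlier cards.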
48
New cards

DBSCAN Pros

  • detects noise and outliers

  • detects arbitrarily shaped clusters

  • works well for evenly dense clusters

49
New cards

DBSCAN Cons

  • requires two parameters (ε and MinPts)

  • sensitive to parameter changes

  • works only for uniformly dense clusters

  • degrades in performance in high dimensions

50
New cards

OPTICS(Ordering Points to Identify Clustering Structure)

  • orders the points based on density

  • produces a reachability plot

  • works for varying densities

  • generalizes DBSCAN; more flexible

51
New cards

Reachability plot

shows valleys for clusters and peaks for sparse regions

52
New cards

core distance

minimum ε such that point becomes a core object

53
New cards

Reachability distance

measures how far a point is from the previous point in the ordering; it is never smaller than that point's core distance
54
New cards

Why clustering in the deep learning era

  • unsupervised learning is essential: the first step of exploration; reveals structure

  • complements deep learning: used for pretraining and representation learning

  • practically applicable: faster, cheaper, and simpler

55
New cards

Drawbacks of clustering

  • curse of dimensionality: distances become less meaningful, clusters lose separation

  • partition-based: the Voronoi partition breaks down in high dimensions

  • hierarchical: overpowered by noise

  • density-based: data becomes sparse

56
New cards

Clustering in High dimension

  • Adaptations: dimensionality reduction; subset clustering (search for clusters in subsets of the dimensions); feature selection or weighting (reduce noise dimensions)

  • However, distance is still unreliable, and much depends on the preprocessing choices

  • Solution: instead of grouping points into clusters, find patterns, as in association rule mining

57
New cards

Transaction Data

  • Transaction DB: a set of transactions

  • each transaction contains a set of items from I

  • an itemset is a collection of items

58
New cards

Support

Fraction of Transactions that contain an itemset

59
New cards

Frequent Itemset

an itemset whose support >= the minsup threshold

60
New cards

Association rule

defines a relationship between two itemsets in a database

61
New cards

Confidence

conf(X ⇒ Y) = support(X ∪ Y) / support(X), i.e. the fraction of transactions containing X that also contain Y
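The support and confidence definitions can be checked on a toy transaction database (the item names are invented for the example):

```python
# Toy transaction database (item names are invented for the example).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions that contain the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # conf(X => Y) = support(X u Y) / support(X)
    return support(lhs | rhs) / support(lhs)

s = support({"bread", "milk"})        # appears in 2 of 4 transactions
c = confidence({"bread"}, {"milk"})   # 2 of the 3 bread transactions
```

Support is a property of an itemset alone; confidence additionally conditions on the left-hand side, which is why the two thresholds play different roles in rule mining.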
62
New cards

Types of Associations

  • Single dim

  • multi dim

  • binary/boolean

  • quantitative

63
New cards

Association Rule Mining Steps

  1. Find all frequent itemsets (support >= minsup)

  2. Generate association rules from these (satisfying minsup and minconf)

64
New cards

Problem with no. of itemset

The possible number of frequent itemsets is exponential: if an itemset is frequent, all of its subsets are also frequent, so a frequent itemset with n elements yields 2^n − 1 frequent (non-empty) subsets

65
New cards

Brute Force Approach

  • every possible itemset is a candidate frequent itemset

  • count the support of each candidate by scanning the database

  • complexity O(NMW), where N = no. of transactions, M = no. of candidates, W = width of the itemsets

66
New cards

Improvement on Brute force

  • Reduce the number of candidates: pruning

  • Reduce the number of transactions: skip transactions that can no longer contain frequent itemsets as the itemset size increases

  • Reduce the number of NM comparisons: use efficient data structures for the candidates or the transactions

67
New cards

Apriori Principle Idea

  • If an itemset is frequent, then all of its subsets are frequent

  • If an itemset is infrequent, then its supersets need not be tested

68
New cards

Apriori Algo

  1. Generate frequent 1-itemsets (L1)

  2. Generate candidate k-itemsets by joining frequent (k−1)-itemsets

  3. Prune candidates that have an infrequent subset

  4. Scan the database to count support and keep the frequent candidates

  5. Repeat until no new frequent itemsets are found
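A level-wise sketch of Apriori with subset-based pruning (pure Python; the toy transactions and the minsup value are illustrative):

```python
from itertools import combinations

def apriori(transactions, minsup):
    n = len(transactions)
    def sup(s):
        # Fraction of transactions containing itemset s.
        return sum(s <= t for t in transactions) / n
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent 1-itemsets.
    levels = [{frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}]
    while levels[-1]:
        prev = levels[-1]
        k = len(next(iter(prev))) + 1
        # Candidates: unions of frequent (k-1)-itemsets ...
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... pruned by the Apriori principle: every (k-1)-subset must be frequent.
        cands = {c for c in cands
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        levels.append({c for c in cands if sup(c) >= minsup})
    return [lvl for lvl in levels if lvl]

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
levels = apriori(transactions, minsup=0.5)
```

The pruning line is the Apriori principle in action: a candidate with any infrequent subset is discarded before the database is scanned again.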
69
New cards

Association Rules generation

For each frequent itemset L, generate all non-empty proper subsets S and output the rule S ⇒ (L − S) if its confidence >= minconf
70
New cards

Uses of Assoc Rule Mining

  • text document clustering

  • microarray clustering

  • market basket analysis

  • recommendation systems
