Unsupervised Learning

18 Terms

1
Labeled observations
Each observation is a tuple (x,y) of feature vector x and output label y which are related according to an unknown function f(x) = y
2
Supervised Learning
* Labeled observations: Each observation is a tuple (x,y) of feature vector x and output label y which are related according to an unknown function f(x) = y
* During training: Learn the relationship between x and y i.e., find a function (or model) ℎ(x) that best fits the observations
* Goal: Learned model accurately predicts the output label of a previously unseen, test feature input (generalization)
* Labels: ‘teachers’ during training and ‘validators’ of results during testing
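A minimal sketch of the train/test idea above. It assumes a hypothetical one-feature data set and a linear model h(x) = a·x + b fitted by ordinary least squares. The card itself does not specify a model family.

```python
# Hypothetical labeled observations (x, y); here y = 2x + 1 exactly.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def fit_linear(data):
    """Fit h(x) = a*x + b by ordinary least squares (closed form)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

h = fit_linear(train)  # training: learn the relationship between x and y

# Generalization: predict the label of a previously unseen test input;
# the true label acts as the "validator" of the prediction.
x_test, y_test = 10.0, 21.0
print(h(x_test), y_test)
```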
3
Unsupervised Learning
Unlabeled data set of feature vectors: observations x with no output label y
4
What can unsupervised learning deduce from classification data?
find sub-groups (or clusters) among observations with similar traits (clustering)
5
What can unsupervised learning deduce from regression data?
find patterns within feature vector to identify a lower dimensional representation (dimensionality reduction)
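The card does not name a technique. Principal component analysis (PCA) is one common choice for dimensionality reduction. This is a pure-Python sketch that assumes hypothetical 2-D feature vectors. It finds the direction of greatest variance by power iteration on the sample covariance matrix, then represents each observation by a single coordinate along that direction, reducing two features to one.

```python
import math

def first_principal_component(X, iters=200):
    """Return the mean and unit direction of greatest variance (power iteration)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # initial guess for the direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return mean, v

def project(X, mean, v):
    """Lower-dimensional (1-D) coordinate of each observation along direction v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in X]

# Hypothetical observations lying close to the line y = x.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
mean, v = first_principal_component(X)
print(v)                    # both components near 0.7
print(project(X, mean, v))  # one coordinate per observation
```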
6
Challenges of unsupervised learning
* No simple goal (such as prediction accuracy on labeled test data) as in supervised learning
* Validation of results is subjective
* Often used as part of exploratory data analysis
7
Advantages of unsupervised learning
* Labeled data expensive and difficult to collect; unlabeled data cheap and abundant
* Compressed representation saves on storage and computation
* Reduces noise and irrelevant attributes in high-dimensional data
* Pre-processing step for supervised learning
8
What is clustering?
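Grouping unlabeled observations into sub-groups (clusters) so that observations in the same cluster are similar to each other and dissimilar to observations in other clusters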
9
What are clusters formed based on?
Clustering is subjective: clusters are formed based on a user-specified measure of similarity that depends on domain knowledge.
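The following sketch shows why the choice of similarity measure matters. It uses hypothetical points and two standard measures. Under Euclidean distance, the query's nearest neighbour is A. Under cosine dissimilarity (1 minus cosine similarity, which compares direction and ignores magnitude), the nearest neighbour is B.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.dist(a, b)

def cosine_dissimilarity(a, b):
    """1 - cosine similarity: depends only on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

query = (1.0, 1.0)
candidates = {"A": (2.0, 0.0), "B": (5.0, 5.0)}  # hypothetical points

nearest_by_measure = {}
for name, measure in [("euclidean", euclidean), ("cosine", cosine_dissimilarity)]:
    nearest_by_measure[name] = min(candidates, key=lambda k: measure(query, candidates[k]))
print(nearest_by_measure)  # {'euclidean': 'A', 'cosine': 'B'}
```

The same observations can therefore fall into different clusters depending on the measure the user specifies.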
10
What type of classification is clustering?
Unsupervised learning, since the cluster labels are derived only from the observations themselves (no labels are given in advance)
11
Normalization of Feature Vectors
Rescales each feature attribute to lie between 0 and 1 (min-max normalization) so that no attribute carries more weight than the others. Drawback: it is sensitive to outliers.
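Here is a minimal sketch of min-max normalization, applied separately to each feature attribute (column) of a hypothetical data set. The last income value is an outlier. Because that outlier sets the maximum, it squeezes the other incomes toward 0, which illustrates the drawback above.

```python
def min_max_normalize(X):
    """Rescale each column of X to [0, 1] via (x - min) / (max - min)."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # A constant column (max == min) is mapped to 0.0 to avoid dividing by zero.
    return [[(v - mn) / (mx - mn) if mx > mn else 0.0
             for v, mn, mx in zip(row, lo, hi)]
            for row in X]

# Hypothetical observations: (height in cm, income in $1000s).
X = [[150.0, 30.0], [165.0, 40.0], [180.0, 50.0], [170.0, 1000.0]]
N = min_max_normalize(X)
for row in N:
    print(row)
```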
12
Z-score standardization
Rescales each feature attribute to have mean 0 and standard deviation 1. Drawback: the resulting values do not lie in a bounded range.
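Here is a minimal sketch of z-score standardization for each column. It uses the same kind of hypothetical data as the normalization example. The sketch assumes the population standard deviation and no constant columns.

```python
import statistics

def z_score_standardize(X):
    """Rescale each column of X to mean 0 and standard deviation 1."""
    cols = list(zip(*X))
    mu = [statistics.mean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]  # population std; assumes no constant column
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]

# Hypothetical observations: (height in cm, income in $1000s).
X = [[150.0, 30.0], [165.0, 40.0], [180.0, 50.0], [170.0, 1000.0]]
Z = z_score_standardize(X)
for row in Z:
    print(row)
```

Unlike min-max output, the standardized values are not confined to a fixed interval. They can be negative or exceed 1.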
13
Types of Clustering Algorithms
* Partitional
* Hierarchical
* Model-Based
14
Partitional Clustering
* Generates a single partition of the data to recover natural clusters
* Input: Feature vectors
* Examples: K-means, K-medoids
* Goal: assign N observations into K clusters (K < N)
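K-means is one of the examples listed above. This is a minimal pure-Python sketch of Lloyd's algorithm for K-means, assuming Euclidean distance. For simplicity, it initializes the centroids with the first K observations. Common practice uses random or k-means++ initialization instead.

```python
import math

def kmeans(X, K, max_iters=100):
    """Partition observations X into K clusters with Lloyd's algorithm."""
    centroids = [list(x) for x in X[:K]]  # naive initialization: first K points
    labels = [None] * len(X)
    for _ in range(max_iters):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [min(range(K), key=lambda k: math.dist(x, centroids[k]))
                      for x in X]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(K):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical observations forming two well-separated groups.
X = [[1.0, 1.0], [8.0, 8.0], [1.5, 2.0], [9.0, 8.5], [1.0, 0.5], [8.5, 9.0]]
labels, centroids = kmeans(X, K=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)
```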
15
Hierarchical Clustering
* Generates a sequence of nested partitions
* Input: Distance Matrix
* Examples: agglomerative clustering, divisive clustering
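Hierarchical clustering takes a distance matrix as input. This is a minimal sketch of agglomerative clustering. It assumes single linkage, where the distance between two clusters is their smallest pairwise distance, and a hypothetical 4-by-4 distance matrix. Each recorded merge gives the next partition in the nested sequence.

```python
def agglomerative_single_linkage(D):
    """Merge clusters bottom-up from distance matrix D; return the merge history."""
    clusters = [{i} for i in range(len(D))]  # start: every observation alone
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((d, sorted(merged)))
    return history

# Hypothetical symmetric distance matrix for 4 observations.
D = [[0.0, 2.0, 6.0, 10.0],
     [2.0, 0.0, 5.0, 9.0],
     [6.0, 5.0, 0.0, 4.0],
     [10.0, 9.0, 4.0, 0.0]]
history = agglomerative_single_linkage(D)
for d, members in history:
    print(f"merge at distance {d}: {members}")
```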
16
Model-Based Clustering
* Assumes that data is generated i.i.d. from a mixture of distributions, each of which determines a different cluster
* Example: Expectation-Maximization (EM)
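This is a minimal sketch of EM for the simplest model-based case. It assumes a mixture of two 1-D Gaussians, hypothetical data, and a crude initialization that places the means at the minimum and maximum of the data. The E-step computes each observation's responsibility, which is its soft membership in each cluster. The M-step re-estimates the mixing weights, means, and variances from those responsibilities.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture by Expectation-Maximization."""
    w = [0.5, 0.5]           # mixing weights
    mu = [min(xs), max(xs)]  # crude initial means
    var = [1.0, 1.0]         # initial variances
    for _ in range(iters):
        # E-step: responsibility r[i][k] = P(component k | x_i).
        r = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p)
            r.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / len(xs)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, xs)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            var[k] = max(var[k], 1e-6)  # guard against a collapsing variance
    # Hard cluster label = component with the larger responsibility.
    labels = [0 if ri[0] >= ri[1] else 1 for ri in r]
    return w, mu, var, labels

xs = [0.8, 1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1, 4.9]  # hypothetical data
w, mu, var, labels = em_two_gaussians(xs)
print(mu, labels)
```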
17
Measure of intra-cluster similarity
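The card's own formula is not given here, so the following is an assumption. One widely used measure of intra-cluster similarity is the within-cluster sum of squared Euclidean distances to the cluster centroid. A smaller value means a tighter cluster whose members are more similar to one another. This may not be the exact formula the card used.

```python
def within_cluster_ss(points):
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    n = len(points)
    centroid = [sum(col) / n for col in zip(*points)]
    return sum(sum((p - c) ** 2 for p, c in zip(pt, centroid)) for pt in points)

# Hypothetical clusters: same shape, different spread.
tight = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]]
loose = [[0.0, 0.0], [0.0, 4.0], [4.0, 0.0], [4.0, 4.0]]
print(within_cluster_ss(tight), within_cluster_ss(loose))  # 2.0 32.0
```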
18
Dissimilarity within a clustering structure C
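The card's own formula is not given here, so the following is an assumption. A common definition of the dissimilarity within a clustering structure C is the within-cluster point scatter, $W(C) = \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(i')=k} d(x_i, x_{i'})$. Here C(i) is the cluster label of observation i, and d is a dissimilarity measure. The sketch uses Euclidean distance for d and hypothetical data. It shows that grouping nearby points gives a smaller W(C) than mixing distant ones.

```python
import math

def within_cluster_scatter(X, labels, d=math.dist):
    """W(C): half the sum of d(x_i, x_j) over all ordered pairs in the same cluster."""
    total = 0.0
    for i, xi in enumerate(X):
        for j, xj in enumerate(X):
            if labels[i] == labels[j]:
                total += d(xi, xj)
    return total / 2  # each unordered pair was counted twice

X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes distant points
print(within_cluster_scatter(X, good), within_cluster_scatter(X, bad))
```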