Unsupervised Learning

18 Terms

Labeled observations

Each observation is a tuple (x, y) of a feature vector x and an output label y, which are related by an unknown function f(x) = y.

Supervised Learning

* Labeled observations: each observation is a tuple (x, y) of a feature vector x and an output label y, related by an unknown function f(x) = y
* During training: learn the relationship between x and y, i.e., find a function (or model) h(x) that best fits the observations
* Goal: the learned model accurately predicts the output label of a previously unseen test input (generalization)
* Labels: ‘teachers’ during training and ‘validators’ of results during testing
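
As a minimal sketch of the training and testing steps above, the Python below learns h(x) from a few hypothetical labeled observations by ordinary least squares. It assumes a linear model class h(x) = w*x + b with one feature; the data and the name `fit_linear_1d` are illustrative, not from the course material.

```python
def fit_linear_1d(xs, ys):
    """Learn h(x) = w*x + b from labeled observations (x, y) by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = sxy / sxx
    b = y_bar - w * x_bar
    return lambda x: w * x + b

# Hypothetical training observations generated by f(x) = 2x + 1 (unknown to the learner).
xs_train = [0.0, 1.0, 2.0, 3.0]
ys_train = [1.0, 3.0, 5.0, 7.0]
h = fit_linear_1d(xs_train, ys_train)

# Generalization: predict the label of an input never seen during training.
print(h(10.0))  # 21.0
```

During testing, the true labels of held-out observations act as the ‘validator’: comparing h(x) against y on unseen data measures generalization.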

Unsupervised Learning

Unlabeled data set of feature vectors x (no output labels y)

What can unsupervised learning deduce from classification data?

Find sub-groups (or clusters) among observations with similar traits (clustering)

What can unsupervised learning deduce from regression data?

Find patterns within the feature vectors to identify a lower-dimensional representation (dimensionality reduction)
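
One common dimensionality-reduction technique is principal component analysis (PCA). The sketch below finds the leading principal component by power iteration on the sample covariance matrix and projects each point onto it, turning 2 features into 1. It is written in plain Python. The names `principal_direction` and `reduce_to_1d` are hypothetical, and the sketch assumes the data is not constant and the all-ones starting vector is not orthogonal to the top component.

```python
import math

def principal_direction(X, iters=100):
    """Leading principal component of X (a list of d-dimensional points) via power iteration."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]  # mean-centre the data
    # Sample covariance matrix C (d x d).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # assumed starting vector
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize so v stays a unit vector
    return mu, v

def reduce_to_1d(X):
    """Project each point onto the leading component: d features become 1 feature."""
    mu, v = principal_direction(X)
    return [sum((row[j] - mu[j]) * v[j] for j in range(len(v))) for row in X]

# Hypothetical 2-D points lying on the line y = 2x: one coordinate describes them fully.
X = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(principal_direction(X)[1])  # direction close to (1, 2) / sqrt(5)
print(reduce_to_1d(X))
```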

Challenges of unsupervised learning

* No simple goal (such as prediction accuracy) as in supervised learning
* Validation of results is subjective, since there are no labels to check against
* More often used in exploratory data analysis

Advantages of unsupervised learning

* Labeled data is expensive and difficult to collect; unlabeled data is cheap and abundant
* Compressed representations save storage and computation
* Reduces noise and irrelevant attributes in high-dimensional data
* Serves as a pre-processing step for supervised learning

What is clustering?

What are clusters formed based on?

Clustering is subjective: clusters are formed based on a user-specified measure of similarity that depends on domain knowledge.
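
To see why the choice of similarity measure matters, the sketch below compares Euclidean distance with cosine distance on three hypothetical points. The two measures disagree on which point is most similar to `a`. The point names and the helper `nearest` are illustrative.

```python
import math

def euclidean(u, v):
    return math.dist(u, v)

def cosine_distance(u, v):
    # 1 - cos(angle): ignores vector length and compares direction only.
    dot = sum(a * b for a, b in zip(u, v))
    return 1 - dot / (math.hypot(*u) * math.hypot(*v))

points = {"a": (1.0, 0.0), "b": (10.0, 0.0), "c": (1.0, 1.0)}

def nearest(name, metric):
    """Most similar other point to `name` under the given dissimilarity."""
    others = [o for o in points if o != name]
    return min(others, key=lambda o: metric(points[name], points[o]))

print(nearest("a", euclidean))        # c  (closest in position)
print(nearest("a", cosine_distance))  # b  (same direction)
```

Which answer is "right" depends on the domain. For example, if only the direction of a feature vector is meaningful, cosine distance may be preferred.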

What type of classification is clustering?

Unsupervised learning, since the labels (cluster assignments) are derived only from the observations themselves

Normalization of Feature Vectors

Rescales each feature attribute to the range [0, 1], e.g. x' = (x - min) / (max - min), so that no attribute is weighted more heavily than the others. Drawback: it is sensitive to outliers.
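
A minimal sketch of min-max normalization applied per feature column. The function name and example data are hypothetical. Mapping a constant column to 0.0 is an assumed convention to avoid division by zero.

```python
def min_max_normalize(X):
    """Rescale each feature column of X to [0, 1]: x' = (x - min) / (max - min)."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        # A constant column has max == min; map it to 0.0 instead of dividing by zero.
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in X
    ]

# Two features on very different scales end up on the same [0, 1] scale.
print(min_max_normalize([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]

# One outlier (100) squeezes the remaining values into a narrow band near 0.
print(min_max_normalize([[1.0], [2.0], [3.0], [100.0]]))
```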

Z-score standardization

Rescales each feature attribute to have mean 0 and standard deviation 1, i.e. z = (x - mean) / std. Drawback: the resulting range is not bounded.
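
A minimal sketch of z-score standardization per feature column, using the standard-library `statistics` module. Whether to use the population (`pstdev`) or sample (`stdev`) standard deviation is an assumption here; this sketch uses the population version. Mapping a constant column to 0.0 is also an assumed convention.

```python
import statistics

def z_score(X):
    """Standardize each feature column of X: z = (x - mean) / std."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # population standard deviation
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in X
    ]

# Hypothetical data; each output column has mean 0 and standard deviation 1,
# but unlike min-max normalization the values are not confined to a fixed range.
Z = z_score([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [100.0, 2500.0]])
print(Z)
```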

Types of Clustering Algorithms

* Partitional
* Hierarchical
* Model-Based

Partitional Clustering

* Generates a single partition of the data to recover natural clusters
* Input: Feature vectors
* Examples: K-means, K-medoids
* Goal: assign N observations into K clusters (K < N)
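
A minimal sketch of K-means (Lloyd's algorithm) in plain Python. It assumes Euclidean distance and initializes the centroids from k randomly chosen observations. The function `kmeans` and its parameters are illustrative. Library implementations typically add smarter seeding and multiple restarts.

```python
import math
import random

def kmeans(X, k, max_iters=100, seed=0):
    """Lloyd's algorithm: partition the points in X into k clusters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]  # k random observations as initial centroids
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(x, centroids[c])) for x in X
        ]
        if new_labels == labels:  # no point changed cluster, so the partition is stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical observations forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The result is a single partition of the data, as in the partitional definition above. Different seeds can give different partitions on less well-separated data.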

Hierarchical Clustering

* Generates a sequence of nested partitions
* Input: Distance Matrix
* Examples: agglomerative clustering, divisive clustering
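
A minimal sketch of agglomerative clustering that works directly from a distance matrix, as the card's input suggests. Each merge produces the next partition in the nested sequence. The linkage rule is a parameter: `min` gives single linkage and `max` gives complete linkage. The function name and the example positions are hypothetical. This naive version recomputes distances at every step and is meant for small inputs only.

```python
def agglomerative(D, linkage=min):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(D))]  # start with every observation alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster-to-cluster distance from the pairwise distances in D.
                d = linkage(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))  # (members of the new cluster, merge height)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges

positions = [0, 1, 5, 6]  # hypothetical 1-D observations
D = [[abs(p - q) for q in positions] for p in positions]
for members, height in agglomerative(D):
    print(members, height)
```

Cutting the merge history at a chosen height yields a flat clustering. For example, stopping before the last merge leaves the two groups {0, 1} and {2, 3}.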

Model-Based Clustering

* Assumes that data is generated i.i.d. from a mixture of distributions, each of which determines a different cluster
* Example: Expectation-Maximization (EM)
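
A minimal sketch of EM for a mixture of one-dimensional Gaussians, one common instance of model-based clustering. The following choices are assumptions for illustration: the hypothetical name `em_gmm_1d`, user-supplied initial means, unit starting variances, equal starting weights, a fixed iteration count, and a small variance floor. The sketch also assumes every point keeps non-negligible density under at least one component.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, init_means, iters=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM; return means, variances, weights, hard labels."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k    # assumed starting variances
    weights = [1.0 / k] * k  # assumed equal mixing weights
    for _ in range(iters):
        # E-step: resp[i][c] = P(component c | x_i) under the current parameters.
        resp = []
        for x in xs:
            p = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: re-estimate each component from responsibility-weighted data.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / n_c
            var_c = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / n_c
            variances[c] = max(var_c, var_floor)  # stop a component collapsing to a point
    # Hard assignment: the component with the highest responsibility from the last E-step.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return means, variances, weights, labels

data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]  # hypothetical observations
means, variances, weights, labels = em_gmm_1d(data, init_means=[0.0, 1.0])
print(means, labels)
```

Unlike K-means, each observation gets a soft responsibility for every component. The hard labels are only taken at the end.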

Measure of intra-cluster similarity

Dissimilarity within a clustering structure C
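
One widely used way to make within-cluster dissimilarity concrete is the within-cluster scatter. This formula is an assumption here, since it is not stated on the card: W(C) = 1/2 Σ_k Σ_{C(i)=k} Σ_{C(j)=k} d(x_i, x_j), where C(i) is the cluster of observation i. With squared Euclidean distance this equals Σ_k N_k Σ_{C(i)=k} ||x_i - x̄_k||², where N_k is the size of cluster k and x̄_k its mean. The sketch computes both forms. The function names and example data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def within_scatter_pairwise(X, labels):
    """W(C) = 1/2 * sum over all ordered pairs (i, j) in the same cluster of d(x_i, x_j)."""
    total = 0.0
    for xi, li in zip(X, labels):
        for xj, lj in zip(X, labels):
            if li == lj:
                total += sq_dist(xi, xj)
    return total / 2

def within_scatter_centroid(X, labels):
    """Equivalent form: sum over clusters k of N_k * (sum of squared distances to the cluster mean)."""
    total = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(X, labels) if lab == k]
        n_k = len(members)
        centroid = [sum(col) / n_k for col in zip(*members)]
        total += n_k * sum(sq_dist(x, centroid) for x in members)
    return total

X = [(0, 0), (0, 2), (10, 0), (10, 4)]  # hypothetical observations
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes distant points
print(within_scatter_pairwise(X, good), within_scatter_centroid(X, good))  # 20.0 20.0
print(within_scatter_pairwise(X, bad))  # 204.0
```

A lower W(C) means observations within each cluster are more similar, so clustering algorithms such as K-means can be read as searching for a partition C that makes this quantity small.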