Unsupervised Learning and Clustering Techniques

90 Terms

Unsupervised Learning

Analyzing data without labeled response variables.

Supervised Learning

Predicting outcomes using features and response variables.

Clustering

Grouping similar observations into distinct subgroups.

Hidden Structures

Unseen patterns within the data set.

Distance Measure

Metric to quantify similarity between data points.

Euclidean Distance

Straight-line distance between two points in space.

Manhattan Distance

Distance measured along axes at right angles.
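
The Euclidean and Manhattan definitions above can be compared side by side. The following is a minimal sketch using only the standard library; the function names and the sample points are illustrative assumptions, not part of the original cards.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (movement along the axes only)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Illustrative points (assumed for this example).
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean distance, since it cannot cut across the diagonal.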

Cosine Similarity

Cosine of the angle between two vectors.

High-dimensional Space

Data represented by vectors with many dimensions (features).

Subgroup Discovery

Identifying distinct groups within a dataset.

Data Visualization

Representing data to reveal patterns or insights.

Subjectivity in Analysis

Interpretation varies based on data context.

Gene Expression Measurements

Data used to group breast cancer patients.

Shopper Characterization

Grouping shoppers by browsing and purchase history.

Movie Ratings Clustering

Grouping movies based on viewer ratings.

Clustering Problem Setup

Given a set of points and a notion of distance between them, group the points so that nearby points share a cluster.

Similarity Definition

Criteria for determining observation closeness.

Sky Objects Catalog

Catalog of 2 billion sky objects, each characterized by its radiation in 7 dimensions.

Cluster Members

Observations within a cluster are similar.

Dissimilar Clusters

Members of different clusters are not alike.

Domain-specific Consideration

Knowledge-based criteria for similarity assessment.

Broad Class of Methods

Various techniques for subgroup discovery in data.

Cosine Similarity

Ranges from -1 (opposite directions) to 1 (same direction).

Orthogonality

Vectors at right angles, indicating zero cosine similarity.

Cosine Distance

Calculated as 1 minus cosine similarity
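
The cosine cards above (similarity, orthogonality, distance) can be illustrated with a short sketch. The function names and sample vectors are assumptions made for this example; the similarity is computed as the dot product divided by the product of the vector lengths.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Cosine distance as defined on the card: 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

# Illustrative vectors (assumed for this example).
print(cosine_similarity((1, 0), (0, 1)))   # orthogonal vectors: 0.0
print(cosine_similarity((1, 0), (-1, 0)))  # opposite directions: -1.0
print(cosine_distance((1, 0), (0, 1)))     # 1.0
```

Because only the angle matters, scaling a vector by a positive constant leaves its cosine similarity to any other vector unchanged.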

Sparse Data

Data with many zero values, often binary

Jaccard Similarity

Measures similarity between finite sets as the size of their intersection divided by the size of their union.
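
A minimal sketch of Jaccard similarity, computed as the size of the intersection divided by the size of the union. The function name, the shopping-basket data, and the convention of returning 1.0 for two empty sets are assumptions made for this example.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical baskets: 2 shared items out of 4 distinct items overall.
basket_1 = {"milk", "bread", "eggs"}
basket_2 = {"bread", "eggs", "tea"}
print(jaccard_similarity(basket_1, basket_2))  # 0.5
```

This measure suits sparse binary data because items that are absent from both sets do not affect the result.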

High Distance

Indicates low similarity between points

Low Distance

Indicates high similarity between points

Hierarchical Clustering

Clusters formed through agglomerative or divisive methods

Agglomerative Clustering

Repeatedly merges the two nearest clusters into one.

Divisive Clustering

Starts with one cluster and splits recursively

K-means Clustering

Partitions data into K distinct, non-overlapping clusters

Centroid

Average of all data points in a cluster

Euclidean Distance

Assumed distance metric in K-means clustering

Initial Cluster Assignment

Randomly assign points to clusters, or randomly select K initial centroids.

Convergence in K-means

No points move between clusters and the centroids stabilize.

Cluster Assignment Process

Assign points to nearest centroid iteratively

Random Initialization Effect

Different random initializations can lead to different clustering results.

Compact Clusters

Clusters whose members lie at small distances from one another.

Selecting K

Determining the optimal number of clusters

Iterative Process

Reassign points and update centroids until stable

K-means Algorithm

Iteratively assigns points to clusters based on centroids
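
The K-means cards above (initialization, assignment to the nearest centroid, centroid update, convergence) can be tied together in one sketch. This is a plain-Python illustration under stated assumptions, not a reference implementation: the function names, the `seed` and `max_iter` parameters, and the choice to initialize by sampling K data points as centroids are all assumptions made for this example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance (same ordering as Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise average of the points in a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k data points as centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # convergence: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the average of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = centroid(members)
    return labels, centroids

# Hypothetical data: two well-separated groups of three points.
data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = kmeans(data, k=2)
print(labels, centroids)
```

Changing `seed` changes the initial centroids, which is how the random initialization effect described above can be explored; on less clearly separated data, different seeds may yield different final clusters.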

Distance Metric

Measures similarity or dissimilarity between data points

K-means clustering

Requires pre-specifying the number of clusters K.

Silhouette score

Measures cluster cohesion versus separation.

Silhouette range

Values range from -1 to +1.

High silhouette value

Indicates a point is well matched to its own cluster and poorly matched to neighboring clusters.
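
The silhouette cards above can be made concrete with a small sketch. For each point it uses the usual definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the point's own cluster (cohesion) and b is the smallest mean distance to the members of any other cluster (separation). The function name, the sample data, and the convention of scoring singleton clusters as 0 are assumptions made for this example; it also assumes at least two clusters and no duplicate points.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Separation: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two tight pairs far apart.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # natural grouping: close to +1
print(silhouette_score(pts, [0, 1, 0, 1]))  # mismatched grouping: negative
```

One common way to pick K is to run the clustering for several values of K and prefer the value with the highest mean silhouette.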

Hierarchical clustering

Does not require a predefined number of clusters.

Agglomerative clustering

Bottom-up approach to cluster merging.

Dendrogram

Visual representation of hierarchical clustering.

Centroid

Average location of points in a cluster.

Cluster merging

Repeatedly combine the nearest clusters until a stopping criterion is met.

Euclidean distance

Distance measure for determining cluster nearness.

Key operation

Combine two nearest clusters iteratively.

Stopping criterion

Condition to end the clustering process.

Cluster representation

Location of clusters determined by centroids.

Distance measurement

Assessing cluster proximity using centroid distances.
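
The agglomerative cards above (represent each cluster by its centroid, measure nearness with Euclidean distance between centroids, merge the two nearest clusters, stop at a criterion) can be sketched as follows. The function names and the sample points are assumptions made for this example, and the stopping criterion used here, a target number of clusters, is just one possible choice.

```python
import math

def centroid(points):
    """Coordinate-wise average of the points in a cluster."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def agglomerative(points, target_clusters):
    """Bottom-up merging until `target_clusters` clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > target_clusters:
        best = None
        # Key operation: find the pair of clusters with the nearest centroids.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

# Hypothetical data: three points near the origin, two far away.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
for cluster in agglomerative(pts, target_clusters=2):
    print(cluster)
```

Recording the distance `d` at each merge would give the fusion heights that a dendrogram plots.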

Data point

Individual observation in a clustering dataset.

Centroid example

Average of data points in a cluster.

Fusion height

Height in the dendrogram at which observations or clusters merge; a lower fusion height indicates greater similarity.

Bottom of dendrogram

Observations that fuse near the bottom are highly similar.

Top of dendrogram

Observations that fuse only near the top have low similarity.

Preventing feature dominance

Scaling or standardizing features so that no large-valued feature dominates the distance computation and skews the clustering.

Cluster analysis

Examining data points grouped into clusters.

Picking k

Choosing the optimal number of clusters.

Cohesion

Similarity of an object to its own cluster.

Feature Scaling

Rescaling features to a common range, e.g., [0,1].
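
A minimal sketch of rescaling a feature to [0, 1] (min-max scaling). The function name, the sample values, and the choice to map a constant feature to all zeros are assumptions made for this example.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed convention for a constant feature
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical customer features on very different scales.
ages = [20, 35, 50]
incomes = [20000, 50000, 80000]
print(min_max_scale(ages))     # [0.0, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
```

After scaling, both features contribute on the same scale to a Euclidean distance, whereas the raw income values would otherwise dominate it.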

Euclidean Distance

A common measure of similarity in clustering.

Manhattan Distance

Distance calculated as the sum of absolute differences.

Cosine Similarity

Measure of similarity based on angle between vectors.

Jaccard Index

Similarity measure for comparing sets of data.

Pearson Correlation

Statistical measure of linear correlation between variables.
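
A minimal sketch of the Pearson correlation coefficient, computed as the covariance of the two variables divided by the product of their standard deviations. The function name and the sample data are assumptions made for this example; the result is undefined when either variable is constant.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical data: y rises (or falls) exactly linearly with x.
x = [1, 2, 3, 4]
print(pearson(x, [2, 4, 6, 8]))  # perfect positive linear relationship
print(pearson(x, [8, 6, 4, 2]))  # perfect negative linear relationship
```

Like cosine similarity, the coefficient lies between -1 and 1; it is equivalent to the cosine similarity of the two variables after each has had its mean subtracted.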

Clustering

Grouping data points based on similarity.

Cluster Centroid

Average point representing a cluster's members.

Exploratory Analysis

Analyzing data to discover patterns without prior hypotheses.

Cluster Profiling

Describing and understanding characteristics of clusters.

Fresh Food Lovers

Cluster of customers favoring organic and fresh foods.

Distance Measure Choice

Selecting an appropriate metric for the data being clustered.

Data Preparation

Preprocessing data before applying clustering algorithms.

Customer Segmentation

Dividing customers into groups for targeted marketing.

Transaction History

Record of customer purchases used for clustering.

Dendrogram

Tree-like diagram representing data clustering hierarchy.

Unsupervised Learning

Learning patterns from data without labeled responses.

Customer Features

Attributes like age and income used for clustering.

Marketing Strategies

Tailored approaches based on customer segment characteristics.

Cluster Analysis

Technique to identify groups in data without predefined labels.

Buying Behavior

Patterns in customer purchases used for segmentation.

Data Mining

Extracting useful information from large datasets.

Numerical Features

Quantitative attributes used for analysis and clustering.