A set of flashcards covering key terms and concepts related to Unsupervised Machine Learning.
Unsupervised Learning
A type of machine learning where the model works without labeled data to discover patterns or structures in the data.
Clustering
The process of grouping a set of objects such that objects in the same group are more similar to each other than to those in other groups.
Dimensionality Reduction
The process of reducing the number of features (random variables) under consideration by selecting or deriving a smaller set of principal variables that retains most of the relevant information.
Association Rule Learning
A method used to discover interesting relations between variables in large databases, commonly used in market basket analysis.
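The two basic quantities behind association rules, support and confidence, can be sketched as follows. This is a minimal illustration: the transactions and the function names are invented for the example and are not taken from any particular library.

```python
# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

bread_milk_support = support({"bread", "milk"}, transactions)       # 2 of 4 baskets
bread_to_milk_conf = confidence({"bread"}, {"milk"}, transactions)  # 2 of 3 bread baskets
```

A rule such as "bread implies milk" is usually kept only if both values clear thresholds chosen by the analyst.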
Anomaly Detection
The identification of rare items or events which raise suspicions by differing significantly from the majority of the data.
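One simple way to flag such items is a z-score rule, sketched below. This is only one of many possible detectors; the threshold of 2.0 and the sample readings are assumptions chosen for the example.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation; must be non-zero
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(readings)
```

Because the outlier itself inflates the mean and standard deviation, robust variants (for example, ones based on the median) are often preferred on small samples.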
K-Means Clustering
A clustering method that partitions data into K clusters by assigning each point to its nearest centroid and minimizing the sum of squared distances between points and their cluster centroids.
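The usual iterative procedure (Lloyd's algorithm) alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. A minimal sketch, assuming the initial centroids are supplied by the caller and that the sample points are invented for illustration:

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and a centroid update step."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, [points[0], points[3]])
```

The result depends on the starting centroids, which is why initialization schemes matter in practice.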
PCA (Principal Component Analysis)
A linear technique that reduces the dimensionality of data by projecting it onto orthogonal directions (principal components) chosen to preserve as much variance as possible.
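A minimal sketch of PCA via an eigendecomposition of the covariance matrix, assuming NumPy is available. The function name `pca` and the sample data are illustrative; library implementations typically use an SVD instead.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]   # columns are principal directions
    projected = Xc @ components
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained_ratio

# Hypothetical data lying close to the line y = 2x.
X = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]
projected, components, ratio = pca(X, n_components=1)
```

Here a single component captures nearly all the variance, so the two-dimensional data can be summarized by one coordinate with little loss.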
Silhouette Score
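A measure of how well each object fits within its cluster. For a point with mean distance a to the other members of its own cluster and mean distance b to the members of the nearest other cluster, the score is (b - a) / max(a, b). It ranges from -1 to 1, and higher values indicate better-separated clusters.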
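The per-point silhouette value is commonly defined as (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the mean distance to the points of the nearest other cluster. A minimal sketch with invented one-dimensional data:

```python
import math

def silhouette(i, points, labels):
    """Silhouette value of points[i] given a cluster label for every point."""
    own = labels[i]
    same = [
        math.dist(points[i], p)
        for j, p in enumerate(points)
        if labels[j] == own and j != i
    ]
    if not same:
        return 0.0  # common convention for a cluster with a single member
    a = sum(same) / len(same)
    # b: smallest mean distance to the members of any other cluster.
    b = min(
        sum(math.dist(points[i], p) for j, p in enumerate(points) if labels[j] == other)
        / labels.count(other)
        for other in set(labels)
        if other != own
    )
    return (b - a) / max(a, b)

# Hypothetical 1-D points in two clear groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
s0 = silhouette(0, points, labels)
mean_score = sum(silhouette(i, points, labels) for i in range(len(points))) / len(points)
```

The overall Silhouette Score is the mean of the per-point values.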
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
A clustering algorithm that groups closely packed points together and marks points in low-density regions as outliers.
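A compact sketch of the DBSCAN idea follows. The parameter names `eps` and `min_samples` mirror common usage but are assumptions here, as is the convention that a point counts as its own neighbor. The brute-force neighbor search is for clarity only.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or NOISE for points in low-density regions."""
    def neighbors(i):
        # All points within eps of points[i], including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

Unlike K-Means, the number of clusters is not specified in advance; it follows from the density parameters.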
Eigenvectors and Eigenvalues
Concepts used in PCA: the eigenvectors of the covariance matrix give the principal component directions, and each eigenvalue gives the variance explained along its eigenvector.
Covariance Matrix
A matrix that captures how several variables vary together and serves as a key component in PCA.
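Entry (i, j) of the covariance matrix is the covariance between features i and j, and the diagonal holds each feature's variance. A minimal pure-Python sketch, assuming the sample (n - 1) convention and invented data:

```python
def mean_vector(rows):
    """Per-feature mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    d = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

# Hypothetical data in which the second feature is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(data)
```

The matrix is symmetric, and the large off-diagonal value here reflects that the two features move together.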
Elbow Method
A technique for choosing the number of clusters (K) in K-Means by plotting inertia against K and picking the point (the "elbow") beyond which increasing K yields only diminishing reductions in inertia.
Mean Vector
The vector whose entries are the means of each dimension (feature) of a dataset, calculated by averaging the data points.
Inertia
Also known as the Within-Cluster Sum of Squares (WCSS): the sum of squared distances from each point to its cluster centroid. Lower values mean the members of each cluster are more tightly grouped.
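Inertia can be computed directly from a labelling, as in this minimal sketch (the points and labels are invented for illustration):

```python
def inertia(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total

# Hypothetical points forming two tight pairs.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
one_cluster = inertia(points, [0, 0, 0, 0])   # everything in a single cluster
two_clusters = inertia(points, [0, 0, 1, 1])  # one cluster per pair
```

Inertia never increases as clusters are added, so a low value alone does not show that a given K is the right choice.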
Feature Scaling
Rescaling features to a comparable range (for example by standardization or min-max normalization) so that each feature contributes comparably to distance calculations.
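Standardization (subtract the mean, divide by the standard deviation) is one common form of scaling. A minimal sketch with invented columns on very different scales:

```python
import statistics

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)  # assumes the column is not constant
    return [(x - mu) / sigma for x in column]

# Hypothetical features: income in currency units, age in years.
incomes = [30000, 50000, 70000]
ages = [20, 40, 60]
scaled_incomes = standardize(incomes)
scaled_ages = standardize(ages)
```

Before scaling, income differences would dominate any Euclidean distance. After scaling, both features are on the same footing.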
K-Means++
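An improved initialization technique for K-Means that chooses initial centroids spread far apart (each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen), which helps avoid poor clusterings and typically speeds up convergence.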
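The seeding step can be sketched as follows: the first centroid is picked uniformly at random, and each later one is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. The function name, the `seed` parameter, and the data are assumptions made for the example, and the sketch assumes at least k distinct points.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centroids using K-Means++ style seeding."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [
            min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
            for p in points
        ]
        # Sample the next centroid with probability proportional to d2.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Hypothetical data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
initial = kmeans_pp_init(points, k=2)
```

A point that is already a centroid has weight zero, so it cannot be drawn twice. Far-away points are strongly favored but not guaranteed.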
Curse of Dimensionality
A phenomenon where the feature space becomes increasingly sparse due to the exponential increase in volume associated with adding dimensions.
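One way to see the effect numerically is to compare the volume of a unit-radius ball with the cube that encloses it: as the dimension grows, almost all of the cube's volume ends up in its corners. A small sketch using the standard formula for the volume of a d-dimensional ball:

```python
import math

def ball_to_cube_ratio(d):
    """Volume of the unit-radius d-ball divided by the volume of the cube [-1, 1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube = 2.0 ** d
    return ball / cube

ratios = {d: ball_to_cube_ratio(d) for d in (2, 5, 10, 20)}
```

In two dimensions the ratio is pi / 4 (about 0.785). By ten dimensions it has fallen below 0.003, so points drawn uniformly from the cube are almost never near its center.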
t-SNE (t-Distributed Stochastic Neighbor Embedding)
A nonlinear dimensionality reduction technique particularly suited for visualizing high-dimensional data.
UMAP (Uniform Manifold Approximation and Projection)
A nonlinear technique for dimensionality reduction and visualization that aims to preserve the local neighborhood structure of complex data and, to a degree, its global structure.
Gaussian Mixture Models (GMM)
A probabilistic model that assumes data points are generated from a mixture of several Gaussian distributions.
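Under this model each point has a posterior probability ("responsibility") of having come from each component, which is the quantity computed in the E-step when fitting a GMM with EM. A minimal one-dimensional sketch, where the weights, means, and standard deviations are invented and taken as given rather than fitted:

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability that x was generated by each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # the mixture density at x
    return [j / total for j in joint]

# Hypothetical two-component mixture with equal weights.
weights, means, stds = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
r_mid = responsibilities(5.0, weights, means, stds)   # halfway between the means
r_left = responsibilities(0.0, weights, means, stds)  # at the first component's mean
```

Unlike the hard assignments of K-Means, these are soft assignments: a point midway between two equal components belongs half to each.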