Vocabulary flashcards covering key terms and definitions related to bin packing, the First-Fit Decreasing algorithm, fundamental data-clustering concepts, distance metrics, cluster evaluation, K-Means, the Kappa metric, and common clustering applications.
Bin Packing Problem
Combinatorial problem of packing n items of sizes x1…xn into the minimum number of bins of fixed capacity c.
Bin Capacity (c)
Maximum size or weight that a single bin can hold in the bin-packing problem.
First-Fit Decreasing (FFD)
Greedy bin-packing algorithm that sorts items in decreasing order and inserts each item into the first bin in which it fits.
FFD Key Steps
1) Create empty bins, 2) sort items descending, 3) place each item into the first suitable bin, 4) discard empty bins.
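The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `first_fit_decreasing` is hypothetical, and the sketch opens a new bin only when an item fits nowhere else, so it never has empty bins to discard at the end.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack item sizes into bins of fixed capacity using First-Fit Decreasing."""
    bins = []  # each bin is a list of the item sizes placed in it
    # Sort items in decreasing order of size
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        # Place the item into the first bin in which it fits
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No existing bin has room, so open a new one
            bins.append([size])
    return bins


print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```

FFD is a greedy heuristic, so the number of bins it returns is not guaranteed to be the minimum for every input.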
Bin Packing Applications
Scheduling adverts, creating music compilations, filling recycling bins, suitcase packing, etc.
Pattern
An individual object or data point described by a set of features (attributes).
Feature (Variable)
Measurable attribute of a pattern, e.g., colour, weight, speed.
Feature Space
m-dimensional space formed by m features in which patterns are represented as points.
Cluster
Set of patterns that are similar to each other and dissimilar to patterns in other sets.
Data Clustering
Process of grouping objects into k mutually exclusive clusters based on a similarity or distance measure.
Mutually Exclusive Clustering
Constraint where each object belongs to exactly one cluster.
Cluster Representation Vector (C)
Vector where ci = j indicates that object i belongs to cluster j.
Distance Metric
Mathematical function that quantifies similarity or dissimilarity between two data points.
Euclidean Distance
Straight-line distance between two n-dimensional points: sqrt(Σ (xi − yi)²).
Manhattan Distance
Sum of absolute differences between coordinates of two points.
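The Euclidean and Manhattan definitions above translate directly into code. A minimal sketch, assuming points are given as equal-length sequences of numbers; the function names are illustrative.

```python
import math


def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q))
```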
Correlation-Based Distance
Dissimilarity measure derived from a statistical correlation coefficient such as Pearson, Spearman, or Kendall; strongly correlated points are treated as close together.
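The card does not give a formula. One common convention, assumed here, is to take the distance as 1 − r, where r is the Pearson correlation between the two feature vectors. The function name `pearson_distance` is illustrative.

```python
import math


def pearson_distance(x, y):
    """Correlation-based distance, assuming the convention d = 1 - r (Pearson)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    r = cov / math.sqrt(var_x * var_y)
    return 1.0 - r
```

Under this convention the distance lies between 0 (perfectly correlated) and 2 (perfectly anti-correlated).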
Cluster Worth
Quantitative assessment of clustering quality using metrics like sum of squares, homogeneity, or separation.
Sum of Squares Within Cluster
Metric equal to Σ ||xi − c||² over all points xi in a cluster with centre c; the quantity that K-Means minimises.
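As a small worked example of this formula, the sketch below computes the sum of squares for one cluster. It assumes the centre c is the mean of the cluster's points, as in K-Means; the function name is illustrative.

```python
def within_cluster_sum_of_squares(points):
    """Sum of squared Euclidean distances from each point to the cluster mean."""
    m = len(points[0])  # number of features
    centre = [sum(p[d] for p in points) / len(points) for d in range(m)]
    return sum(
        sum((p[d] - centre[d]) ** 2 for d in range(m))
        for p in points
    )


# Two points at (0, 0) and (2, 0): the centre is (1, 0), each point contributes 1
print(within_cluster_sum_of_squares([(0, 0), (2, 0)]))
```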
Homogeneity (H)
Measure of how tightly packed the members of a cluster are; lower intra-cluster distances imply higher homogeneity.
Separation (S)
Measure of distance between cluster centres; higher inter-cluster distances imply better separation.
Number of Clusters (k)
Parameter specifying how many clusters a clustering solution should contain; often difficult to determine.
Centroid-Based Clustering
Family of algorithms that represent each cluster by its centre (centroid); K-Means is the classic example.
Hierarchical Clustering
Approach that builds a tree of clusters either by agglomerating or dividing clusters iteratively.
Density-Based Clustering
Method that groups points that are closely packed together, marking sparse regions as noise.
Distribution-Based Clustering
Technique that assumes data are generated by mixtures of underlying probability distributions.
Optimisation-Based Clustering
View of clustering as a search problem solvable by heuristics like hill climbing or simulated annealing.
K-Means Clustering
Iterative algorithm that partitions data into k clusters by minimising within-cluster sum of squares.
K-Means Centre (Centroid)
Mean vector of all points currently assigned to a cluster; updated each iteration.
K-Means Termination Criteria
Stop when centroids no longer change or after a fixed number of iterations.
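The K-Means cards above (objective, centroid update, termination) can be combined into a short sketch. It is a plain-Python illustration under stated assumptions: the initial centres are passed in by the caller rather than chosen at random, an empty cluster keeps its previous centre, and the name `k_means` and the `max_iter` parameter are hypothetical.

```python
def k_means(points, centres, max_iter=100):
    """Return (assignment, centres); assignment[i] is the cluster index of points[i]."""
    centres = [tuple(c) for c in centres]
    assignment = []
    for _ in range(max_iter):  # stop after a fixed number of iterations at most
        # Assignment step: each point joins the cluster with the nearest centre
        assignment = [
            min(
                range(len(centres)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])),
            )
            for p in points
        ]
        # Update step: each centre becomes the mean of its assigned points
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(old)  # assumption: an empty cluster keeps its centre
        if new_centres == centres:  # stop when centroids no longer change
            break
        centres = new_centres
    return assignment, centres


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centres = k_means(points, centres=[(0, 0), (10, 10)])
print(labels, centres)
```

The returned `labels` list plays the role of a cluster representation vector, with 0-based cluster indices.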
Kappa Metric (κ)
Agreement measure adapted from medical statistics to quantify how closely two clustering arrangements of the same objects agree.
Kappa Guideline
Ranges: ≤0 Very Poor, 0–0.2 Poor, 0.2–0.4 Fair, 0.4–0.6 Moderate, 0.6–0.8 Good, 0.8–1.0 Very Good.
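The cards do not spell out how κ is computed for clusterings. One possible approach, assumed here for illustration, is to look at every pair of objects, record whether each clustering puts the pair in the same cluster or in different clusters, and apply Cohen's kappa to the resulting 2×2 agreement table. The function name `pairwise_kappa` is hypothetical.

```python
from itertools import combinations


def pairwise_kappa(c1, c2):
    """Cohen's kappa on pairwise same/different-cluster decisions (an assumed formulation)."""
    if len(c1) != len(c2) or len(c1) < 2:
        raise ValueError("need two label vectors of equal length with at least 2 objects")
    same_same = same_diff = diff_same = diff_diff = 0
    for i, j in combinations(range(len(c1)), 2):
        together_1 = c1[i] == c1[j]
        together_2 = c2[i] == c2[j]
        if together_1 and together_2:
            same_same += 1
        elif together_1:
            same_diff += 1
        elif together_2:
            diff_same += 1
        else:
            diff_diff += 1
    n = same_same + same_diff + diff_same + diff_diff
    observed = (same_same + diff_diff) / n
    expected = (
        (same_same + same_diff) * (same_same + diff_same)
        + (diff_same + diff_diff) * (same_diff + diff_diff)
    ) / n ** 2
    if expected == 1:
        return 1.0  # degenerate case: both clusterings make the same decision for every pair
    return (observed - expected) / (1 - expected)
```

Because only pairwise co-membership is compared, relabelling the clusters (for example swapping labels 0 and 1) does not change the score.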
Clustering Applications – Retail Marketing
Segment households by income, size, occupation, proximity to urban areas for targeted promotion.
Clustering Applications – Health Insurance
Identify household risk groups using doctor visits, chronic conditions, household size, and age.
Data Visualisation in Clustering
Plotting clustered points in 2-D or 3-D, which is feasible when the number of features is small, to inspect how well the clusters separate.