WEEK 12 CS2004 – Bin Packing and Data Clustering

0.0(0)
studied byStudied by 0 people
GameKnowt Play
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/33

flashcard set

Earn XP

Description and Tags

Vocabulary flashcards covering key terms and definitions related to bin packing, the First-Fit Decreasing algorithm, fundamental data-clustering concepts, distance metrics, cluster evaluation, K-Means, the Kappa metric, and common clustering applications.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

34 Terms

1
New cards

Bin Packing Problem

Combinatorial problem of packing n items of sizes x1…xn into the minimum number of bins of fixed capacity c.

2
New cards

Bin Capacity (c)

Maximum size or weight that a single bin can hold in the bin-packing problem.

3
New cards

First-Fit Decreasing (FFD)

Greedy bin-packing algorithm that sorts items in decreasing order and inserts each item into the first bin in which it fits.

4
New cards

FFD Key Steps

1) Create empty bins, 2) sort items descending, 3) place each item into the first suitable bin, 4) discard empty bins.

5
New cards

Bin Packing Applications

Scheduling adverts, creating music compilations, filling recycling bins, suitcase packing, etc.

6
New cards

Pattern

An individual object or data point described by a set of features (attributes).

7
New cards

Feature (Variable)

Measurable attribute of a pattern, e.g., colour, weight, speed.

8
New cards

Feature Space

m-dimensional space formed by m features in which patterns are represented as points.

9
New cards

Cluster

Set of patterns that are similar to each other and dissimilar to patterns in other sets.

10
New cards

Data Clustering

Process of grouping objects into k mutually exclusive clusters based on a similarity or distance measure.

11
New cards

Mutually Exclusive Clustering

Constraint where each object belongs to exactly one cluster.

12
New cards

Cluster Representation Vector (C)

Vector where ci = j indicates that object i belongs to cluster j.

13
New cards

Distance Metric

Mathematical function that quantifies similarity or dissimilarity between two data points.

14
New cards

Euclidean Distance

Straight-line distance between two n-dimensional points: sqrt(Σ (xi − yi)²).

15
New cards

Manhattan Distance

Sum of absolute differences between coordinates of two points.

16
New cards

Correlation-Based Distance

Similarity measure derived from statistical correlations such as Pearson, Spearman, or Kendall.

17
New cards

Cluster Worth

Quantitative assessment of clustering quality using metrics like sum of squares, homogeneity, or separation.

18
New cards

Sum of Squares Within Cluster

Metric equal to Σ ||xi − c||² for all points xi in a cluster with centre c; used by K-Means.

19
New cards

Homogeneity (H)

Measure of cluster density; lower intra-cluster distances imply higher homogeneity.

20
New cards

Separation (S)

Measure of distance between cluster centres; higher inter-cluster distances imply better separation.

21
New cards

Number of Clusters (k)

Parameter specifying how many clusters a clustering solution should contain; often difficult to determine.

22
New cards

Centroid-Based Clustering

Family of algorithms that represent each cluster by its centre (centroid); K-Means is the classic example.

23
New cards

Hierarchical Clustering

Approach that builds a tree of clusters either by agglomerating or dividing clusters iteratively.

24
New cards

Density-Based Clustering

Method that groups points that are closely packed together, marking sparse regions as noise.

25
New cards

Distribution-Based Clustering

Technique that assumes data are generated by mixtures of underlying probability distributions.

26
New cards

Optimisation-Based Clustering

View of clustering as a search problem solvable by heuristics like hill climbing or simulated annealing.

27
New cards

K-Means Clustering

Iterative algorithm that partitions data into k clusters by minimising within-cluster sum of squares.

28
New cards

K-Means Centre (Centroid)

Mean vector of all points currently assigned to a cluster; updated each iteration.

29
New cards

K-Means Termination Criteria

Stop when centroids no longer change or after a fixed number of iterations.

30
New cards

Kappa Metric (κ)

Agreement measure adapted from medical statistics to compare similarity of two clustering arrangements.

31
New cards

Kappa Guideline

Ranges: ≤0 Very Poor, 0–0.2 Poor, 0.2–0.4 Fair, 0.4–0.6 Moderate, 0.6–0.8 Good, 0.8–1.0 Very Good.

32
New cards

Clustering Applications – Retail Marketing

Segment households by income, size, occupation, proximity to urban areas for targeted promotion.

33
New cards

Clustering Applications – Health Insurance

Identify household risk groups using doctor visits, chronic conditions, household size, and age.

34
New cards

Data Visualisation in Clustering

Plotting clusters in 2-D or 3-D when the number of features is small to observe separations.