WEEK 12 CS2004 – Bin Packing and Data Clustering
Algorithms and Their Applications - CS2004 (2024-2025)
1. Introduction to Topics Covered
- Overview of topics studied so far:
- Concepts of Computation and Algorithms
- Comparative analysis of algorithms
- Mathematical foundations including Big-Oh notation
- Computational Complexity
- Data structures
- Sorting Algorithms
- Graph traversal algorithms
- Heuristic Search techniques
- Hill Climbing and Simulated Annealing methods
- Parameter Optimization Applications
- Evolutionary Computation
- Swarm Intelligence
2. Focus of Current Lecture
- We will delve into specific algorithms:
- Bin Packing (brief overview)
- Data Clustering (in-depth analysis)
3. Bin Packing Introduction
- Definition: The bin packing problem involves the task of fitting a number of items into the smallest number of bins (containers), each with a specified capacity.
- Example: Given items of sizes x1, x2, ..., xn that need to fit into bins of capacity c.
- If items are sized 5, 1, 4, 2, 3 and capacity is 6, the goal is to minimize the number of bins used.
4. Applications of Bin Packing
- Numerous applications exist, including but not limited to:
- TV/radio advertisements
- CD or tape compilations
- Storing files on disks or other fixed-capacity media.
5. Bin Packing Algorithms
- Combinatorial Problem: Many methods can address bin packing challenges.
- First-Fit Decreasing (FFD) Algorithm:
- Sorts items in descending order before packing.
- Uses empty bins, numbered sequentially.
- Each item is placed in the first bin that can accommodate it, starting with the largest item first.
- Empty bins at the end of packing are ignored.
Example of First-Fit Decreasing
- Items: 3, 1, 5, 6, 7, 8, 3, 2, 9, 1
- Bin capacity: 10
- Sorted order for packing: 9, 8, 7, 6, 5, 3, 3, 2, 1, 1
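The First-Fit Decreasing procedure above can be sketched in Python (the function name and list-of-lists bin representation are illustrative choices, not from the lecture). Run on the example items with capacity 10, it packs them into 5 bins, which is optimal here since the sizes sum to 45:

```python
def first_fit_decreasing(items, capacity):
    """Pack items into bins of the given capacity using First-Fit Decreasing."""
    bins = []  # each bin is a list of the item sizes placed in it
    for item in sorted(items, reverse=True):  # largest items first
        for b in bins:
            if sum(b) + item <= capacity:  # first bin with enough room
                b.append(item)
                break
        else:
            bins.append([item])  # no existing bin fits: open a new one
    return bins

packed = first_fit_decreasing([3, 1, 5, 6, 7, 8, 3, 2, 9, 1], 10)
print(len(packed))  # 5
print(packed)       # [[9, 1], [8, 2], [7, 3], [6, 3, 1], [5]]
```

Note that FFD is a heuristic: it is fast and usually close to optimal, but it does not guarantee the minimum number of bins in general.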
6. Introduction to Data Clustering
- Definition: Data clustering is the process of grouping objects into sets (clusters) based on their characteristics/features.
- Goal: Objects within the same cluster share common traits, typically based on similarity or proximity, while objects in different clusters are dissimilar.
- Assumption: Clusters are mutually exclusive (hard clustering); an object cannot belong to more than one cluster.
7. How Clustering Works
- The approach involves:
- Measuring similarities among various data rows based on their feature values (e.g. represented in a feature vector).
- Clusters can be represented visually if only a few features are considered.
8. Importance of Clustering
- Utility: Understanding relationships among objects aids in data analysis:
- Simplifies complex data modeling.
- Serves as a pre-processing step to further data exploration.
- Can provide insights into unknown properties of objects.
9. Real-World Applications of Data Clustering
- Retail Marketing: Clustering assists in identifying household segments based on factors such as:
- Income levels
- Household size
- Occupation
- Geography
- Sports Science: Used to analyze players based on performance metrics (points, rebounds, assists, etc.).
- Health Insurance: Groups households based on medical needs and demographics (doctor visits, chronic conditions, etc.).
10. Clustering Representation
- Each cluster can be represented by a vector (e.g., C = {1, 2, 3, 1, 2}) denoting membership of objects (rows) to clusters.
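For example, such a membership vector can be unpacked into explicit groups of object (row) indices; the dictionary representation below is an illustrative choice:

```python
# Membership vector from the notes: object i belongs to cluster C[i]
C = [1, 2, 3, 1, 2]

clusters = {}
for row, label in enumerate(C):
    clusters.setdefault(label, []).append(row)

print(clusters)  # {1: [0, 3], 2: [1, 4], 3: [2]}
```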
11. Measuring Data Similarity
- Clustering methods rely on distance metrics to determine how closely related data points are:
- Examples include Euclidean and Manhattan distances, and correlation-based measures such as Pearson and Spearman.
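Two of the simplest metrics, Euclidean and Manhattan distance between feature vectors, can be sketched as follows (points are represented as tuples for illustration):

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute differences per feature."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```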
12. Evaluating Cluster Worth
- Assessing the effectiveness of clustering results involves:
- Selecting appropriate metrics (e.g., density and separation of clusters).
- Using objective functions such as the within-cluster sum of squared distances, the quantity that the K-Means algorithm minimises.
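As a sketch, the within-cluster sum of squared distances can be computed as below; the function name `wcss` and the tuple-based point representation are illustrative assumptions:

```python
import math

def wcss(points, labels, centres):
    """Within-cluster sum of squares: each point's squared distance
    to the centre of the cluster it is assigned to. Lower is denser."""
    return sum(math.dist(p, centres[l]) ** 2 for p, l in zip(points, labels))

# Two points in one cluster whose centre is (1, 0):
print(wcss([(0, 0), (2, 0)], [0, 0], [(1, 0)]))  # 2.0
```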
13. Challenges in Clustering
- Determining the correct number of clusters is often difficult:
- Some methods require the number of clusters to be specified in advance, while others determine it dynamically.
14. Clustering Methods Overview
- Various approaches exist:
- Centroid-based (e.g., K-Means)
- Hierarchical
- Density-based
- Distribution-based
- Optimization algorithms (like Hill Climbing).
15. K-Means Clustering Method
- Requires prior knowledge of the number of clusters (k).
- Steps:
- Assign objects to clusters at random.
- Compute the centre (mean) of each cluster.
- Reassign each object to its nearest centre, reducing the within-cluster error.
- Repeat until the assignments stabilise or a maximum number of iterations is reached.
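The steps above can be sketched in Python. This is a minimal illustration, not the lecture's reference implementation; the random re-seeding of empty clusters and the iteration cap are implementation choices:

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Minimal K-Means sketch on points represented as tuples."""
    rng = random.Random(seed)
    # Step 1: assign objects to clusters at random
    labels = [rng.randrange(k) for _ in points]
    for _ in range(iterations):
        # Step 2: compute the centre (mean) of each cluster
        centres = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centres.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                centres.append(rng.choice(points))  # re-seed an empty cluster
        # Step 3: reassign each object to its nearest centre
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        # Step 4: stop once the assignments are stable
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centres
```

Because the initial assignment is random, K-Means can converge to a local optimum; in practice it is often run several times with different seeds and the result with the lowest within-cluster error is kept.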
16. Validating Clustering Results
- After clustering, verification involves:
- Checking if the chosen method is effective.
- Confirming results by comparing with established knowledge.
17. Kappa Metric for Comparing Clusters
- Kappa metric evaluates agreements between different clustering results:
- It ranges from -1 to 1: values at or below 0 indicate agreement no better than chance, and a value of 1 indicates perfect agreement.
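A sketch of Cohen's kappa for two label sequences is below. Note two assumptions: cluster labels are arbitrary, so in practice the two labellings are first aligned (relabelled to best match) before kappa is computed, and the formula divides by zero in the degenerate case where chance agreement is exactly 1:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    # observed agreement: fraction of positions where the labels match
    observed = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement: expected matches if labels were assigned independently
    ca, cb = Counter(a), Counter(b)
    chance = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (observed - chance) / (1 - chance)

print(cohens_kappa([1, 1, 2, 2], [1, 1, 2, 2]))  # 1.0 (perfect agreement)
print(cohens_kappa([1, 1, 2, 2], [2, 2, 1, 1]))  # -1.0 (complete disagreement)
```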
18. Next Steps in Learning
- Upcoming topics include Traveling Salesperson Problem and practical lab sessions focusing on K-Means clustering.