Bin Packing and Data Clustering

studied byStudied by 3 people
5.0(1)
Get a hint
Hint

what is bin packing?

1 / 35

flashcard set

Earn XP

Description and Tags

36 Terms

1

what is bin packing?

fitting items effectively into a container

New cards
2

what is the bin packing problem?

a combinatorial problem

a number of items n of size x1…xn need to be put into the smalles number of bins of capacity c

<p>a combinatorial problem</p><p>a number of items n of size x1…xn need to be put into the smalles number of bins of capacity c</p>
New cards
3

what are bin packing algorithms?

algorithms that solve the bin packing problem

New cards
4

what are the applications of bin packing algorithms?

filling recycle bins

tape compilations

tv advertisement time

New cards
5

what is an example of a bin packing algorithm?

first-fit decreasing algorithm

many other algorithms exist

New cards
6

what is the first-fit decreasing algorithm?

  1. n empty bin are created and numbered 1..n

  2. items to be packed are sorted in descending order

  3. starting from largest item, pack item into the first bin it fits into

  4. any empty bins at the end are discarded/ignored

<ol><li><p>n empty bin are created and numbered 1..n</p></li><li><p>items to be packed are sorted in descending order</p></li><li><p>starting from largest item, pack item into the first bin it fits into</p></li><li><p>any empty bins at the end are discarded/ignored</p></li></ol>
New cards
7

what is data clustering?

the process of arranging objects (as data points) into a number of sets [k] according to a metric [e.g., distance]

New cards
8

what is a pattern?

physical objects

New cards
9

what is a cluster?

a set/ category of objects

New cards
10

what is a feature?

an attribute of an object

objects with similar features are grouped together

data can be grouped based on 1 feature or multiple (eg. group by age vs. group by age and weight)

<p>an attribute of an object</p><p>objects with similar features are grouped together</p><p>data can be grouped based on 1 feature or multiple (eg. group by age vs. group by age and weight)</p>
New cards
11

how do objects relate in data clustering?

  • each set ideally shares a common trait (eg. similarity or proximity for some defined distance measure)

  • each set is called a cluster/ group

  • each set is mutually exclusive (item cannot be in multiple clusters)

  • objects in 1 cluster are dissimilar to the objects in other clusters

New cards
12

given a set of data, what do we already know about it?

the features of each object in the data set - features represented using a feature vector

New cards
13

what is a feature vector?

in a feature vector there is an object x, and the [ ] shows the values of its features e.g., first 1 has 2 features, 2nd x has d features. compare the feature vectors of objects to cluster the date

<p>in a feature vector there is an object x, and the [ ] shows the values of its features e.g., first 1 has 2 features, 2nd x has d features. compare the feature vectors of objects to cluster the date</p>
New cards
14

what type of data can we use for clustering?

data in a table/matrix X, where xi is the ith row of the table and xij is the jth variable (feature) of row i

New cards
15

how do we interpret data in a matrix to be used for clustering?

  • columns = features = objects with similar features grouped together. (header shows the actual feature, the y axis shows that object’s value in a certain feature)

  • rows = the objects

rows need to be clustered together based on how similar the values of their feature are

<ul><li><p>columns = features = objects with similar features grouped together. (header shows the actual feature, the y axis shows that object’s value in a certain feature)</p></li><li><p>rows = the objects</p></li></ul><p>rows need to be clustered together based on how similar the values of their feature are</p>
New cards
16

how can data clustering be visualised?

is using 2 features we can plot data as a graph

cannot do this with hundreds of features

<p>is using 2 features we can plot data as a graph</p><p>cannot do this with hundreds of features</p>
New cards
17

why do we need to clustering?

it is a data analysis technique which helps us understand how objects are related

  • less complex to model

  • clustering is used as a pre-processing tool

  • can give us insight into an object’s unknown properties

New cards
18

what are the applications of data clustering?

retail - identify households via: household income, household size, distance from urban area

sports science - identify players via: points per game, assists per game, steals per game

health science - health companies cluster via: household size, number of doctor visits, average household age

New cards
19

when implementing clustering after data has been collected, what do we need to consider?

  • cluster representation

  • data similarity - how can we determine if one data object is similar to another data object - eg. using distance or a different metric

  • cluster worth - how to evaluate our cluster results and know if they are good or not

  • clustering method - what clustering method do we use

  • number of clusters - do we need to define the number of clusters to make ourselves (kmeans = yes vs. hierachal = no)

New cards
20

how are clusters represented?

index 0 is in cluster 1, since there are two 1s in the set C that means there are 2 items in cluster 1, eg. index 1 is in cluster 2

<p>index 0 is in cluster 1, since there are two 1s in the set C that means there are 2 items in cluster 1, eg. index 1 is in cluster 2</p>
New cards
21

what is data similarity?

the distance between the various data points from the data. different metrics can be used to measure similarity e.g., distance, and different methods are used to measure the different metrics.

New cards
22

how does data similarity affect clustering?

  • rows are compared to each other

  • clustering methods use the measure data similarity to place similar rows into the same cluster

  • a better distance function means better cluster algorithm performance

New cards
23

what methods are there to measure data similarity?

  • Euclidean distance - used by k means method

  • correlations

    • pearson

    • spearman

    • kenda

  • manhattan distance

choose simplest one for given data

New cards
24

what is Euclidean distance

the shortest distance between 2 points

  • 2D - length of the right angle constructed between the 2 points aka Pythagoras's theorem. needed if we have a graph or don’t directly have th exact distances

<p>the shortest distance between 2 points</p><ul><li><p>2D - length of the right angle constructed between the 2 points aka Pythagoras&apos;s theorem. needed if we have a graph or don’t directly have th exact distances</p></li></ul>
New cards
25

what is cluster worth?

used to evaluate the goodness of clustering algorithm results - how good the actual clusters created were

  • different ways to judge the worth of a clustering arrangement so need to choose the right worth metric for success. each metric has different methods to measure it

New cards
26

what type of metrics can be used to measure cluster worth?

knowt flashcard image
New cards
27

what is the sum of squares by cluster metric?

for one cluster, calculate each elements distance from the cluster’s centre, square each distance and sum them

<p>for one cluster, calculate each elements distance from the  cluster’s centre, square each distance and sum them</p>
New cards
28

what is the impact of the number of clusters on a clustering method?

  • Some applications need the number of clusters a solution creates specified and some applications do not eg. hierarchal clustering will cluster into a tree format and not K number of clusters, K-means groups into K clusters

  • Determining the number of clusters is very difficult

  • Need a method to find the number clusters and their contents

New cards
29

what is a clustering method/ algorithms?

sequence of events that carries out the data clustering

New cards
30

what are the different types of clustering algorithms?

  • Centroid-based clustering

    • K-Means

  • Hierarchical clustering - categorise data into a tree

  • Density-based clustering - determine if dense or sparse items

  • Distribution-based clustering

  • Optimisation / Search AI

    • Hill Climbing and Simulated Annealing

New cards
31

what is K-means clustering?

  1. requires the number of clusters K to be defined and maintains each cluster’s means - centre points

  2. judges the worth of a clustering arrangement based on the square of how far each item in the cluster is from the centre

New cards
32

how does the k-means algorithm work?

  1. create k amount of centres (means) and place them in random locations on the matrix. each centre will form a cluster a cluster

  2. for each data point/ object/ row, assign it to the closest centre. (for each row, calc the distance between object and each centre and pick centre with shortest distance)

  3. for each cluster, calculate the mean of all the data points in that cluster (not including the centre). the centre positions become the means

  4. the algorithm terminates when the centres (means) do not change or a fixed number of iterations have been reached

New cards
33

what is the pseudocode for the k-means algorithm?

knowt flashcard image
New cards
34

how can we evaluate the clustering method we used?

obtain insight into how the cluster method performed based on

  • cluster worth

  • expert knowledge

  • run method many times and compare the clustering arrangements

this can helps us see if we selected right method, select correct metric for cluster worth, ensure results are consistent

New cards
35

what metrics can we use to compare 2 clustering arrangements?

Kappa metric

New cards
36

what can we determine about clustering arrangements based on the kappa score?

if the clustering arrangements have a high kappa score, then the clustering arrangements are similar.

if the method produces 2 or more similar clustering arrangements then the method itself is consistent

<p>if the clustering arrangements have a high kappa score, then the clustering arrangements are similar.</p><p></p><p>if the method produces 2 or more similar clustering arrangements then the method itself is consistent</p>
New cards

Explore top notes

note Note
studied byStudied by 8 people
Updated ... ago
4.0 Stars(1)
note Note
studied byStudied by 9 people
Updated ... ago
5.0 Stars(1)
note Note
studied byStudied by 5 people
Updated ... ago
5.0 Stars(1)
note Note
studied byStudied by 13 people
Updated ... ago
4.0 Stars(1)
note Note
studied byStudied by 9 people
Updated ... ago
5.0 Stars(1)
note Note
studied byStudied by 26 people
Updated ... ago
5.0 Stars(1)
note Note
studied byStudied by 11 people
Updated ... ago
5.0 Stars(1)
note Note
studied byStudied by 13 people
Updated ... ago
5.0 Stars(1)

Explore top flashcards

flashcards Flashcard29 terms
studied byStudied by 10 people
Updated ... ago
5.0 Stars(1)
flashcards Flashcard152 terms
studied byStudied by 1 person
Updated ... ago
5.0 Stars(1)
flashcards Flashcard83 terms
studied byStudied by 1 person
Updated ... ago
5.0 Stars(1)
flashcards Flashcard100 terms
studied byStudied by 15 people
Updated ... ago
5.0 Stars(1)
flashcards Flashcard20 terms
studied byStudied by 6 people
Updated ... ago
4.0 Stars(1)
flashcards Flashcard55 terms
studied byStudied by 8 people
Updated ... ago
5.0 Stars(1)
flashcards Flashcard20 terms
studied byStudied by 98 people
Updated ... ago
5.0 Stars(1)
flashcards Flashcard26 terms
studied byStudied by 4 people
Updated ... ago
5.0 Stars(1)