Clustering

INFS QUIZ 2

10 Terms

1

Business Analytics

The analysis of data using tools and techniques to gain insight, and the use of that insight to make decisions

2

Data mining

The process of discovering patterns and knowledge from large datasets

  • Includes clustering, classification, etc.

  • Can be descriptive or predictive

3

Descriptive or predictive? A retailer is trying to group products that are purchased together.

Descriptive

4

Descriptive or predictive? A bank loan officer wants to analyze data in order to predict whether a potential customer (loan applicant) is risky or safe.

Predictive

5

Clustering

Takes unlabeled data and creates subgroups of similar data

Identifies subgroups such that items in the same subgroup are similar and items in different subgroups are different

6

Clustering example: Suppose we have a list of cars with information on various characteristics such as price, wheelbase, transmission, and horsepower. We explore these characteristics to form several clusters such that

– the characteristics of cars in the same cluster are similar

– the characteristics of cars in different clusters are different
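
One way to make "similar" concrete is to treat each car as a point and measure the distance between points. The sketch below is only an illustration: the car names and values are made up, Euclidean distance is an assumed choice of similarity measure, and transmission is left out because it is not numeric.

```python
import math

# Hypothetical cars: (price in $1000s, wheelbase in inches, horsepower).
# The names and values are invented for illustration only.
cars = {
    "compact_a": (18.0, 102.0, 130.0),
    "compact_b": (19.5, 103.0, 140.0),
    "sports_c": (62.0, 107.0, 450.0),
}

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_similar = euclidean(cars["compact_a"], cars["compact_b"])
d_different = euclidean(cars["compact_a"], cars["sports_c"])

# A smaller distance suggests the two cars belong in the same cluster.
print(d_similar < d_different)
```

Here the two compact cars are much closer to each other than either is to the sports car, so a clustering method would be expected to group them together.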

7

Which algorithm?

K-means is one clustering algorithm (c-means and hierarchical clustering are others)

8

How to run k-means

Use set.seed at the top of the chunk so that the same random numbers, and therefore the same initial centroids, are generated on every run. Do not use a sequence or the same seed value.

Step 0: Decide k, the number of clusters needed.

Given k, the k-means algorithm is implemented as follows:

– Step 1: Randomly choose k points as the initial centroids of the k clusters (the centroid is the center, i.e. mean point of the cluster)

– Step 2: Assign each object to the nearest cluster centroid

– Step 3: Re-compute (update) the centroids (center of the assigned objects) using the current cluster membership

– Step 4: Repeat Steps 2 and 3; stop when the assignment of the members (objects) no longer changes
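
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name kmeans, the sample points, and the seed value are all assumptions, and Python's random.Random(seed) stands in for the role that set.seed plays in the card above.

```python
import random

def kmeans(points, k, seed):
    """Plain k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)          # fixed seed -> same initial centroids each run
    centroids = rng.sample(points, k)  # Step 1: choose k random points as centroids
    assignment = None
    while True:
        # Step 2: assign each point to its nearest centroid (squared distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: stop when the assignment no longer changes.
        if new_assignment == assignment:
            return centroids, assignment
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))

# Step 0: decide k. The points below are made-up data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2, seed=42)
print(labels)
```

Calling kmeans twice with the same seed gives the same result, which is the point of fixing the seed before the random initial centroids are drawn.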

9

What does k-means show?

K is the number of clusters I want

Centroids are picked from the data points; there are as many centroids as k

Points closest to a centroid are assigned to that centroid's cluster

The centroids are recomputed based on the mean of each cluster

The process repeats until the centroids stop moving; those are the final clusters
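
To see the centroids move and then settle, it can help to trace a tiny one-dimensional case. The sketch below uses made-up values and deliberately starts from two fixed data points (1 and 2) instead of random ones, so the trace is easy to follow by hand; the function name trace_centroids is hypothetical.

```python
def trace_centroids(values, centroids):
    """Return the centroid positions after each k-means pass on 1-D data."""
    history = [list(centroids)]
    while True:
        # Assign each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        updated = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
        history.append(updated)
        if updated == centroids:  # centroids stopped moving: final clusters
            return history
        centroids = updated

values = [1, 2, 3, 10, 11, 12]
history = trace_centroids(values, centroids=[1, 2])  # start from two data points
for step, c in enumerate(history):
    print(step, c)
# 0 [1, 2]
# 1 [1.0, 7.6]
# 2 [2.0, 11.0]
# 3 [2.0, 11.0]
```

The last two lines are identical, which is the "centroids stop moving" condition: the final clusters are {1, 2, 3} and {10, 11, 12}.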

10

K-means is for which columns?

Numeric columns
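
The reason is that each centroid is a mean, and a mean only exists for numbers. As a rough sketch, the numeric columns can be picked out before clustering; the records, field names, and the helper numeric_columns below are all made up for illustration.

```python
# Hypothetical car records; the field names and values are invented.
rows = [
    {"price": 18.0, "wheelbase": 102.0, "transmission": "manual", "horsepower": 130},
    {"price": 62.0, "wheelbase": 107.0, "transmission": "automatic", "horsepower": 450},
]

def numeric_columns(records):
    """Names of columns whose values are all int or float (bools excluded)."""
    return [
        name for name in records[0]
        if all(isinstance(r[name], (int, float)) and not isinstance(r[name], bool)
               for r in records)
    ]

cols = numeric_columns(rows)
# Keep only the numeric columns, as tuples ready for a distance calculation.
points = [tuple(float(r[c]) for c in cols) for r in rows]
print(cols)
```

The transmission column is dropped because "manual" and "automatic" cannot be averaged into a centroid.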