INFS QUIZ 2
Business Analytics
The analysis of data using tools and techniques to gain insight, and the use of that insight to make decisions
Data mining
The process of discovering patterns and knowledge from large datasets
Clustering, classification, etc.
Data mining tasks, each either descriptive or predictive
Descriptive or predictive? A retailer is trying to group products that are purchased together.
Descriptive
Descriptive or predictive? A bank loan officer wants to analyze the data in order to predict whether a potential customer (loan applicant) is risky or safe.
Predictive
Clustering
Takes unlabeled data and creates subgroups of similar data
Identifying subgroups whose members are similar to each other and different from the members of other subgroups
Clustering example: Suppose we have a list of cars with information on various characteristics such as price, wheelbase, transmission, and horsepower. We explore these characteristics to form several clusters such that
– the characteristics of cars in the same cluster are similar
– the characteristics of cars in different clusters are different
Which algorithm?
k-means, a clustering algorithm (c-means and hierarchical clustering are others)
How to run k-means
Use set.seed() at the top of the chunk so that the same random numbers, and therefore the same initial centroids, are generated on every run; don't use a sequence or the same set.seed value
Step 0: Decide k, the number of clusters needed. Given k, the k-means algorithm is implemented as follows:
– Step 1: Randomly choose k points as the initial centroids of the k clusters (the centroid is the center, i.e. mean point of the cluster)
– Step 2: Assign each object to the nearest cluster centroid
– Step 3: Re-compute (update) the centroids (center of the assigned objects) using the current cluster membership
– Step 4: Repeat Steps 2 and 3, stop when the assignment of the members (objects) does not change
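Steps 1 to 4 above can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation: the card mentions R's set.seed, whereas this sketch uses Python, where a seeded random.Random plays the same role. The function name kmeans, its parameters, and the sample points are all assumptions made up for the example.

```python
import math
import random

def kmeans(points, k, seed=42, max_iter=100):
    """Basic k-means on a list of numeric tuples; returns (labels, centroids)."""
    rng = random.Random(seed)  # fixed seed -> same initial centroids on every run
    # Step 1: randomly choose k data points as the initial centroids
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop when the assignment of the members does not change
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its current members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separate groups of 2-D points
sample_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
                 (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(sample_points, k=2, seed=42)
print(labels)
print(centroids)
```

Calling kmeans again with the same seed returns the same labels and centroids, which is the point of fixing the seed before clustering.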
What k-means does
k is the number of clusters I want
The initial centroids are picked from the data points, one centroid for each of the k clusters
Each point is assigned to the cluster of its closest centroid
The centroids are recomputed as the mean of each cluster
The process repeats until the centroids stop moving, which gives the final clusters
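A small worked trace may help show the centroids moving and then settling. The four 1-D points and the two starting centroids below are made-up values, chosen so the arithmetic is easy to check by hand. The starting centroids are fixed rather than random to keep the trace predictable.

```python
points = [1.0, 2.0, 9.0, 10.0]  # hypothetical 1-D data
centroids = [1.0, 2.0]          # k = 2, deliberately poor starting centroids

history = []
while True:
    # assign each point to the index of its closest centroid
    labels = [
        min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
        for p in points
    ]
    # recompute each centroid as the mean of the points assigned to it
    updated = []
    for j in range(len(centroids)):
        members = [p for p, lab in zip(points, labels) if lab == j]
        updated.append(sum(members) / len(members) if members else centroids[j])
    history.append((labels, updated))
    if updated == centroids:  # centroids stopped moving: final clusters
        break
    centroids = updated

for round_number, (lab, cen) in enumerate(history, start=1):
    print(round_number, lab, cen)
# 1 [0, 1, 1, 1] [1.0, 7.0]
# 2 [0, 0, 1, 1] [1.5, 9.5]
# 3 [0, 0, 1, 1] [1.5, 9.5]
```

In round 1 the second centroid is pulled toward the large values; in round 2 the point 2.0 switches cluster; in round 3 nothing changes, so the loop stops.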
What columns is k-means for?
Numeric columns
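The reason is that k-means computes means and distances, which only make sense for numbers. A rough sketch of keeping only the numeric columns before clustering is shown below. The car records, column names, and values are invented for illustration.

```python
# Hypothetical car records mixing numeric and text columns
cars = [
    {"price": 21000, "horsepower": 150, "transmission": "manual"},
    {"price": 35000, "horsepower": 240, "transmission": "automatic"},
    {"price": 22500, "horsepower": 160, "transmission": "manual"},
]

# Keep only the columns whose values are numbers (bool is excluded on purpose)
numeric_cols = [
    col for col, value in cars[0].items()
    if isinstance(value, (int, float)) and not isinstance(value, bool)
]

# Build the numeric feature tuples that a k-means routine would work on
features = [tuple(float(row[col]) for col in numeric_cols) for row in cars]

print(numeric_cols)  # ['price', 'horsepower']
print(features[0])   # (21000.0, 150.0)
```

The text column transmission is left out because a mean of "manual" and "automatic" is not defined.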