1. Which of the following is not true about PCA?
A. It searches for the directions in which the data have the largest variance.
B. PCA can be used for visualizing data in lower dimensions.
C. Not all principal components are orthogonal to each other.
D. PCA is an unsupervised method.
E. Maximum number of principal components is less than or equal to the number of features.
C. Not all principal components are orthogonal to each other.
2. What would we expect when we get features in lower dimensions using PCA?
A. The features may not carry all information present in data.
B. The features will have better interpretability.
C. The features must carry all information present in data.
D. None of the above.
A. The features may not carry all information present in data.
3. PCA is performed on a data set with 23 observations and 10 features, with output:
Standard deviations (1, ..., p=10):
[1] 2.0308, 1.3559, 1.1132 ...
How many PCs do we expect to obtain?
A. 10
B. 22
C. 1
D. 5
E. 23
A. 10
the number of PCs is min(n-1, p) = min(23-1, 10) = 10
4. What is the variance of the first principal component?
A. 2.0308
B. 5
C. 0.4124
D. 4.1241
E. 1
D. 4.1241
the variance of a PC is its standard deviation squared: 2.0308^2 = 4.1241
5. What proportion of variance can the first two principal components explain together?
A. 0.1839
B. 0.5963
C. 0.4124
D. 0.7202
E. 1
B. 0.5963
(2.0308^2 + 1.3559^2)/10 = 0.5963; the total variance equals p = 10 here (standardized features)
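A quick check in R (a minimal sketch; the standard deviations are taken from the output above):

```r
sdev <- c(2.0308, 1.3559)   # standard deviations of PC1 and PC2
vars <- sdev^2              # variances: 4.1241 and 1.8385
sum(vars) / 10              # proportion of total variance (p = 10): ~0.5963
```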
6. Which two PCs are highly correlated?
A. PC2 and PC3
B. Any two PCs are highly correlated
C. PC1 and PC2
D. PC1 and PC4
E. None of the PCs are highly correlated
E. None of the PCs are highly correlated
principal components are orthogonal by construction, so they are uncorrelated with each other
7. Which of the following can be the first 2 principal components loading vectors after applying PCA?
A. (0.5,0.5,0.5,0.5) and (0.5,0.5,0.5,-0.5)
B. (0.5,0.5,0.5,0.5) and (0.5,0.5,-0.5,-0.5)
C. (0.5,0.5,0.5,0.5) and (0.71,0,0,0.71)
D. (0.5,0.5,0.5,0.5) and (0.71,0.71,0,0)
B. (0.5,0.5,0.5,0.5) and (0.5,0.5,-0.5,-0.5)
first check that each candidate loading vector has unit length, i.e. the sum of its squared entries is 1 (rounding is fine)
A. (0.5)^2+(0.5)^2+(0.5)^2+(-0.5)^2 = 1 GOOD
B. (0.5)^2+(0.5)^2+(-0.5)^2+(-0.5)^2 = 1 GOOD
C. (0.71)^2+(0)^2+(0)^2+(0.71)^2 = 1.0082 ≈ 1 GOOD
D. (0.71)^2+(0.71)^2+(0)^2+(0)^2 = 1.0082 ≈ 1 GOOD
all pass, so next check orthogonality: the dot product v1 · v2 must equal 0
A. 0.25+0.25+0.25-0.25 = 0.5, not orthogonal
B. 0.25+0.25-0.25-0.25 = 0, ORTHOGONAL GOOD
C. 0.5(0.71)+0+0+0.5(0.71) = 0.71, not orthogonal
D. 0.5(0.71)+0.5(0.71)+0+0 = 0.71, not orthogonal
only B passes both checks
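A minimal R check for option B (the vectors are from the question):

```r
v1 <- c(0.5, 0.5, 0.5, 0.5)
v2 <- c(0.5, 0.5, -0.5, -0.5)
sum(v1^2)     # squared length of v1: 1
sum(v2^2)     # squared length of v2: 1
sum(v1 * v2)  # dot product: 0, so the two loading vectors are orthogonal
```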
8. Which statement about support vector machines (SVM) is true?
A. SVM is an unsupervised learning algorithm.
B. We cannot perform SVM using R for more than two classes classification.
C. Using the 1-vs-all approach, the classifier will use K SVMs (K is the number of classes in your data set).
D. Using the 1-vs-1 approach, the classifier will use K SVMs (K is the number of classes in your data set).
C. Using the 1-vs-all approach, the classifier will use K SVMs (K is the number of classes in your data set).
9. Which of the following is required by K-means clustering?
A. number of clusters
B. initial guess of cluster assignments for each observation
C. defined distance metric
D. all of the above
D. all of the above
10. Which statement about K-means and K-nearest neighbors (KNN) is true?
A. K-means is a clustering algorithm.
B. KNN can do feature reduction.
C. K-means and KNN are essentially the same.
D. K-means is a supervised learning method; KNN is an unsupervised learning method.
A. K-means is a clustering algorithm.
11. Which of the following methods can be used for choosing the optimal number of clusters in the K-means algorithm?
A. All of the above
B. Cross-validation
C. Manhattan method
D. Elbow method
E. None of the above
D. Elbow method
12. Which of the following statements about the K-means algorithm is correct?
A. For different initializations, the K-means algorithm will definitely give the same clustering results.
B. The centroids in the K-means algorithm may not be any observed data points.
C. The K-means algorithm is not sensitive to outliers.
D. The K-means algorithm can always reach the global optima.
B. The centroids in the K-means algorithm may not be any observed data points.
13. Considering the K-means algorithm: after the current iteration, we have two centroids, (-1,2) and (2,1). Will points (2,3) and (2,0.6) be assigned to the same cluster in the next iteration?
A. Yes
B. No
A. Yes
Euclidean distance: d = sqrt((px-cx)^2 + (py-cy)^2)
distances from P1 = (2,3):
to centroid (-1,2): d = sqrt((2-(-1))^2+(3-2)^2) = sqrt(3^2+1^2) = sqrt(10) = 3.16
to centroid (2,1): d = sqrt((2-2)^2+(3-1)^2) = sqrt(0+2^2) = 2
the smaller distance wins, so P1 is closer to centroid 2
doing the same for P2 = (2,0.6): distance to (-1,2) is 3.31 and to (2,1) is 0.4
so P2 is also closer to centroid 2
because both points are closest to the same centroid,
YES, they will be assigned to the same cluster
(if their closest centroids differed, the answer would have been NO)
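A small R sketch of the same check (points and centroids from the question):

```r
euclid <- function(a, b) sqrt(sum((a - b)^2))
c1 <- c(-1, 2); c2 <- c(2, 1)        # current centroids
p1 <- c(2, 3);  p2 <- c(2, 0.6)      # points to assign
c(euclid(p1, c1), euclid(p1, c2))    # 3.16 vs 2.00 -> P1 joins centroid 2
c(euclid(p2, c1), euclid(p2, c2))    # 3.31 vs 0.40 -> P2 joins centroid 2
```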
14. Considering the K-means algorithm: if points (3,0,1), (2,2,3), and (-2,1,2) are the only points assigned to the first cluster, what is the new centroid for this cluster?
A. (0,1,2)
B. (2,1,3)
C. (1,1,2)
D. (2,0,1)
C. (1,1,2)
add all x's = 3 + 2 + (-2) = 3
add all y's = 0 + 2 + 1 = 3
add all z's = 1 + 3 + 2 = 6
then divide each sum by the number of points, 3:
3/3, 3/3, 6/3 = (1,1,2)
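The centroid is just the coordinate-wise mean; in R, for example:

```r
pts <- rbind(c(3, 0, 1), c(2, 2, 3), c(-2, 1, 2))  # points in the first cluster
colMeans(pts)                                      # new centroid: (1, 1, 2)
```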
15. In which situation should we consider clustering analysis?
A. Suppose we have a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry, and the CEO salary. We want to understand which factors affect CEO salary.
B. A company employs 5 salespeople and we want to assign customers to them such that the customers assigned to each salesperson are as similar as possible.
C. Working for a loan company, we would like to predict whether a customer will default on his/her loan, based on the customer's income, education level, credit score, and so on.
D. We are working on weather prediction, and we would like to use a learning algorithm to predict tomorrow's temperature.
B. A company employs 5 salespeople and we want to assign customers to them such that the customers assigned to each salesperson are as similar as possible.
16. Given a distance matrix over four points, which dendrogram will we obtain from hierarchical clustering?
first list all the pairwise distances in the matrix (read from left to top):
(1,4) = 0.3, (1,3) = 0.4, (1,2) = 0.7
(2,3) = 0.5, (2,4) = 0.8
(3,4) = 0.45
now merge from the smallest distance upward; at each step a point joins via its smallest remaining distance (this is single linkage)
smallest overall = (1,4) = 0.3, so we currently have {1,4}; points 2 and 3 still need to join
candidate pairs with 3: (1,3) = 0.4, (2,3) = 0.5, (3,4) = 0.45; smallest is (1,3) = 0.4
now we have {1,3,4}, with merges at (1,4) = 0.3 and (1,3) = 0.4
lastly we need 2; candidate pairs with 2: (1,2) = 0.7, (2,3) = 0.5, (2,4) = 0.8; smallest is (2,3) = 0.5
we now have all of {1,2,3,4}, with merge heights
(1,4) = 0.3, (1,3) = 0.4, and (2,3) = 0.5
so the dendrogram joins points 1 & 4 at height 0.3,
then point 3 at height 0.4,
and finally point 2 at height 0.5
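A sketch in R that reproduces this (the distances are from the matrix above; single linkage is assumed):

```r
# fill the lower triangle of a 4x4 matrix, column-wise:
# d(2,1), d(3,1), d(4,1), d(3,2), d(4,2), d(4,3)
m <- matrix(0, 4, 4)
m[lower.tri(m)] <- c(0.7, 0.4, 0.3, 0.5, 0.8, 0.45)
hc <- hclust(as.dist(m), method = "single")
plot(hc)  # 1 & 4 merge at 0.3, then 3 joins at 0.4, then 2 at 0.5
```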
17. Regarding question 16, if you would like to obtain two clusters, what would be your clustering assignment?
A. {2}, {1,3,4}
B. {1,3}, {2,4}
C. {3}, {1,2,4}
D. {1,4}, {2,3}
E. {1,2}, {3,4}
A. {2}, {1,3,4}
to get 2 clusters, cut the dendrogram just below the last merge (between heights 0.4 and 0.5); that splits off point 2 from {1,3,4}
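Continuing the R sketch from question 16, the cut can be checked with cutree:

```r
cutree(hc, k = 2)  # points 1, 3, 4 in one cluster; point 2 alone in the other
```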
18. In this problem, perform K-means clustering manually, with K=2, on a small example with n=5 observations and p=2 features.
The observations are as follows.
Obs. | X1 | X2
1 | 1 | 4
2 | 6 | 2
3 | 1 | 3
4 | 0 | 4
5 | 5 | 1
Presume that the initial cluster assignment is:
Cluster 1={#1,#3}, Cluster 2={#2,#4,#5}
Proceed to perform the K-means algorithm by hand. What is the final cluster assignment?
A. {#1,#2} and {#3,#4,#5}
B. {#1,#4} and {#2,#3,#5}
C. {#1,#3} and {#2,#4,#5}
D. {#1,#3,#4} and {#2,#5}
E. None of the above
D. {#1,#3,#4} and {#2,#5}
first find centroids 1 and 2 from the initial assignment
cluster 1 = observations {1, 3}
cluster 2 = observations {2, 4, 5}
centroid 1 (from obs 1, 3):
((1+1)/2, (4+3)/2) = (1, 3.5)
centroid 2 (from obs 2, 4, 5):
((6+0+5)/3, (2+4+1)/3) = (3.67, 2.33)
next, for each observation, compute its distance to each centroid and assign it to the closer one
Obs 1 (1,4)
c1 = sqrt((1-1)^2+(4-3.5)^2) = 0.5
c2 = sqrt((1-3.67)^2+(4-2.33)^2) = 3.14
obs 1 = cluster 1
Obs 2 (6,2)
c1 = 5.22
c2 = 2.35
obs 2 = cluster 2
Obs 3 (1,3)
c1 = 0.5
c2 = 2.75
obs 3 = cluster 1
Obs 4 (0,4)
c1 = 1.11
c2 = 4.03
obs 4 = cluster 1
Obs 5 (5,1)
c1 = 4.71
c2 = 1.88
obs 5 = cluster 2
so this makes obs 1, 3, 4 all in cluster 1 and obs 2, 5 in cluster 2
recomputing the centroids with this assignment and reassigning changes nothing, so the algorithm has converged
D. {#1,#3,#4} and {#2,#5}
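A minimal R sketch of the whole iteration (data and initial assignment from the question):

```r
X  <- rbind(c(1,4), c(6,2), c(1,3), c(0,4), c(5,1))
cl <- c(1, 2, 1, 2, 2)  # initial assignment: {#1,#3} and {#2,#4,#5}
repeat {
  cents <- rbind(colMeans(X[cl == 1, , drop = FALSE]),   # centroid of cluster 1
                 colMeans(X[cl == 2, , drop = FALSE]))   # centroid of cluster 2
  new <- apply(X, 1, function(p)
    which.min(c(sqrt(sum((p - cents[1, ])^2)),
                sqrt(sum((p - cents[2, ])^2)))))         # nearest centroid
  if (all(new == cl)) break  # assignments stable -> converged
  cl <- new
}
cl  # 1 2 1 1 2  ->  {#1,#3,#4} and {#2,#5}
```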
19. Which of the following metrics can be used to measure the dissimilarity between two clusters in hierarchical clustering?
A. Single linkage
B. Complete linkage
C. Average linkage
D. All of the above
D. All of the above
20. Which of the following is finally produced by hierarchical clustering?
A. tree showing how close things are to each other
B. final estimate of cluster centroids
C. assignment of each point to clusters
D. all of the mentioned
A. tree showing how close things are to each other
21. The purpose of clustering analysis is to
A. classify variables as dependent or independent
B. find natural groupings among observations
C. make predictions
D. dimension reduction
B. find natural groupings among observations
22. A ______ or tree graph is a graphical representation for displaying clustering results. Vertical lines represent clusters that are joined together. The position of the line on the scale indicates the distances at which clusters were joined.
A. scattergram
B. scree plot
C. gap statistics
D. dendrogram
D. dendrogram
23. The ____ method uses information on all pairs of distances, not merely the minimum or maximum distances.
A. single linkage
B. medium linkage
C. complete linkage
D. average linkage
D. average linkage
24. Which statement below about the silhouette coefficient is not true?
A. Observations with a large silhouette coefficient value (e.g. around 1) are very well clustered.
B. Silhouette coefficient value is between -1 and 1, inclusive.
C. Silhouette coefficient only measures how well-separated a cluster is from other clusters.
D. Observations with a negative silhouette coefficient value are probably placed in the wrong cluster.
C. Silhouette coefficient only measures how well-separated a cluster is from other clusters.
the silhouette coefficient measures both cohesion (closeness to the point's own cluster) and separation (distance to other clusters), not separation alone
25. We would like to use the silhouette coefficient to determine the optimal number of clusters for an unsupervised learning technique. From the results below, what would be the optimal number of clusters?
K | Average silhouette coefficient
2 | 0.55
3 | 0.89
5 | 0.37
A. 5
B. 2
C. 3
D. 6
C. 3
choose the K with the highest average silhouette coefficient: 0.89 at K = 3
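In R, picking the best K from such a table (values taken from the question):

```r
K   <- c(2, 3, 5)
sil <- c(0.55, 0.89, 0.37)  # average silhouette coefficient for each K
K[which.max(sil)]           # optimal number of clusters: 3
```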