Hierarchical Clustering

Last updated 2:34 PM on 4/27/26

90 Terms

1

An unsupervised learning algorithm that groups objects into clusters according to their similarities and does not require the number of clusters to be defined prior to implementation. a) K-Means Clustering b) Gaussian Mixture Model c) Hierarchical Clustering d) DBSCAN

c

2

The tree-like diagram that visualizes the hierarchical clustering process, displaying all steps followed from individual data points to the complete merged dataset. a) Scatter plot b) Heatmap c) Dendrogram d) Histogram

c

3

The type of hierarchical clustering that uses a bottom-up approach, where each data point starts as its own cluster before iteratively merging with the closest cluster. a) Divisive Clustering b) Agglomerative Clustering c) Spectral Clustering d) Density-Based Clustering

b

4

The type of hierarchical clustering that uses a top-down approach, starting with a single cluster containing all data points before recursively splitting into smaller clusters. a) Agglomerative Clustering b) Divisive Clustering c) K-Means Clustering d) Mean-Shift Clustering

b

5

In a dendrogram, each leaf node represents: a) A merged cluster b) An individual data point c) The distance between clusters d) The optimal number of clusters

b

6

In a dendrogram, the height of the branches indicates: a) The size of each cluster b) The number of data points c) The distance between clusters d) The optimal cluster count

c

7

The method of determining a specific number of clusters from a dendrogram by drawing a horizontal line across it is called: a) Pruning b) Splitting c) Cutting d) Trimming

c
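
As a rough illustration of cutting, the sketch below builds a Ward linkage with scipy.cluster.hierarchy and cuts it at a fixed height using fcluster. The nine points and the threshold of 5 are made up for this example; they are not the case study's data or its cut line.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: three tight groups of three points each
X = np.array([
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
], dtype=float)

# Build the merge tree (the structure a dendrogram draws)
Z = linkage(X, method="ward")

# "Cutting" at a height: merges above the line are ignored, so every
# branch that crosses the horizontal line becomes one cluster
cut_height = 5.0  # assumed threshold chosen for this toy data
labels = fcluster(Z, t=cut_height, criterion="distance")
n_clusters = len(set(labels))

print(n_clusters, labels)
```

Raising or lowering cut_height changes how many branches the line crosses, and therefore how many clusters come out.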

8

The distance metric defined as the straight-line distance between two points in a multidimensional space, expressed in two dimensions as √((x₂ − x₁)² + (y₂ − y₁)²). a) Manhattan Distance b) Euclidean Distance c) Minkowski Distance d) Mahalanobis Distance

b
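
The formula in the card above can be checked with a few lines of Python. This is a minimal sketch; the function name euclidean_distance and the sample points are illustrative only.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

# Two-dimensional case: sqrt((4 - 1)^2 + (6 - 2)^2) = sqrt(9 + 16)
d_2d = euclidean_distance((1, 2), (4, 6))

# The same sum of squared differences extends to three dimensions
d_3d = euclidean_distance((0, 0, 0), (2, 3, 6))

print(d_2d, d_3d)  # 5.0 7.0
```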

9

The distance metric that measures the straight-line distance between two points, where smaller values indicate higher similarity. a) Cosine Distance b) Euclidean Distance c) Hamming Distance d) Chebyshev Distance

b

10

The linkage criterion that defines the distance between two clusters as the minimum distance between any pair of points from the two clusters. a) Complete Linkage b) Average Linkage c) Single Linkage d) Ward's Method

c

11

The linkage criterion that defines the distance between two clusters as the maximum distance between any pair of points from the two clusters. a) Single Linkage b) Complete Linkage c) Average Linkage d) Ward's Method

b

12

The linkage criterion that computes the mean distance between all pairs of points across two clusters, providing a balance between single and complete linkage. a) Single Linkage b) Complete Linkage c) Average Linkage d) Centroid Linkage

c

13

The linkage method that minimizes total within-cluster variance by merging clusters that result in the smallest increase in the sum of squared distances. a) Single Linkage b) Complete Linkage c) Average Linkage d) Ward's Method

d
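
To make the four linkage criteria concrete, the sketch below computes each one by hand for two tiny clusters. The helper names and the points are hypothetical. The Ward value is computed as the increase in the within-cluster sum of squared distances caused by the merge, which is the quantity Ward's Method tries to keep small; library implementations may report a rescaled version of it.

```python
import math
from itertools import product

def pairwise_distances(a, b):
    """All Euclidean distances between a point of cluster a and a point of cluster b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    return min(pairwise_distances(a, b))   # closest pair

def complete_linkage(a, b):
    return max(pairwise_distances(a, b))   # farthest pair

def average_linkage(a, b):
    d = pairwise_distances(a, b)
    return sum(d) / len(d)                 # mean over all pairs

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    dim = len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]
    return sum((p[i] - centroid[i]) ** 2 for p in cluster for i in range(dim))

def ward_increase(a, b):
    """Increase in total within-cluster SSE if a and b were merged."""
    return sse(a + b) - sse(a) - sse(b)

# Hypothetical clusters (points are tuples)
A = [(0, 0), (0, 1)]
B = [(3, 0), (4, 0)]

print(single_linkage(A, B))    # 3.0
print(complete_linkage(A, B))  # sqrt(17), about 4.12
print(average_linkage(A, B))
print(ward_increase(A, B))     # 12.5
```

Single linkage is always the smallest of the three pairwise criteria and complete linkage the largest, with average linkage in between.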

14

In the case study, the dataset used for hierarchical clustering contains membership data for how many individuals? a) 100 b) 150 c) 200 d) 500

c

15

In the case study, the two primary features selected for clustering analysis are: a) Age and Annual Income b) Annual Income and Spending Score c) Spending Score and Gender d) Age and Spending Score

b

16

According to the case study, the Spending Score ranges from: a) 0 to 50 b) 1 to 100 c) 1 to 10 d) 0 to 200

b

17

The case study uses which Python library for generating the dendrogram? a) sklearn.cluster b) scipy.cluster.hierarchy c) matplotlib.pyplot d) seaborn

b

18

The case study uses which scikit-learn class for training the agglomerative clustering model? a) KMeans b) AgglomerativeClustering c) Birch d) MeanShift

b

19

In the dendrogram, the imaginary cut line is drawn at a y-value of: a) 100 b) 150 c) 200 d) 250

b

20

Based on the dendrogram analysis, the optimal number of clusters identified is: a) 3 b) 4 c) 5 d) 6

c

21

The evaluation metric used in the case study to assess clustering quality is: a) Adjusted Rand Index b) Davies-Bouldin Index c) Silhouette Score d) Calinski-Harabasz Index

c

22

The Silhouette Score formula is defined as s = (b − a) / max(a, b), where 'a' represents: a) Mean nearest-cluster distance b) Mean intra-cluster distance c) Maximum inter-cluster distance d) Minimum intra-cluster distance

b

23

In the Silhouette Score formula, 'b' represents: a) Mean intra-cluster distance b) Mean nearest-cluster distance c) Maximum intra-cluster distance d) Minimum inter-cluster distance

b
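
The formula s = (b − a) / max(a, b) can be worked through for a single point as follows. This is a hand-rolled sketch with made-up points, not the case study's code; returning 0 for a point that is alone in its cluster is the usual convention and is treated here as an assumption.

```python
import math

def mean_distance(point, cluster):
    """Mean Euclidean distance from point to every point in cluster."""
    return sum(math.dist(point, q) for q in cluster) / len(cluster)

def silhouette_of_point(point, own_cluster, other_clusters):
    # a: mean intra-cluster distance (the point itself is excluded)
    others = list(own_cluster)
    others.remove(point)
    if not others:
        return 0.0  # assumed convention for a single-point cluster
    a = mean_distance(point, others)
    # b: mean distance to the nearest cluster the point does not belong to
    b = min(mean_distance(point, c) for c in other_clusters)
    return (b - a) / max(a, b)

# Hypothetical clusters
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(5, 0), (5, 2)]

s = silhouette_of_point((0, 0), cluster_a, [cluster_b])
print(round(s, 3))
```

Values near 1 mean the point sits much closer to its own cluster than to the nearest other cluster; values near −1 suggest it may be in the wrong cluster. An overall score averages s over all points.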

24

For predicting new data points, the case study calculates the ______ of each cluster to assign the nearest one. a) Median b) Mode c) Centroids d) Standard deviation

c

25

In the case study prediction, the new customer data point with Annual Income = 70k and Spending Score = 60 is assigned using: a) np.mean() b) np.argmin() c) np.argmax() d) np.min()

b
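
A minimal sketch of the centroid-based assignment described in the two cards above, using NumPy. The six training points and their labels are invented for illustration; only the new point (70, 60) and the use of np.argmin come from the cards.

```python
import numpy as np

# Hypothetical clustered data: [annual income, spending score] and cluster labels
X = np.array([
    [15, 20], [20, 15],   # cluster 0
    [85, 90], [90, 85],   # cluster 1
    [50, 50], [55, 45],   # cluster 2
], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])

# Centroid of each cluster = mean of its member points
centroids = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])

# New customer from the card: Annual Income = 70, Spending Score = 60
new_point = np.array([70.0, 60.0])

# Euclidean distance from the new point to every centroid
distances = np.linalg.norm(centroids - new_point, axis=1)

# argmin returns the index of the smallest distance, i.e. the nearest centroid
nearest_cluster = int(np.argmin(distances))
print(nearest_cluster)  # 2 for this toy data
```

np.argmax would instead pick the farthest centroid, which is why it is the wrong choice here.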

26

The cluster label assigned to customers with "High Income, Low Spending" is: a) Standard b) Target c) Careful d) Sensible

c

27

The cluster label assigned to customers with "High Income, High Spending" (Primary Marketing Target) is: a) Careful b) Standard c) Target d) Spendthrift

c

28

The cluster label assigned to customers with "Low Income, High Spending" is: a) Sensible b) Careful c) Spendthrift d) Standard

c

29

The cluster label assigned to customers with "Low Income, Low Spending" is: a) Standard b) Sensible c) Careful d) Target

b

30

The cluster label assigned to customers with "Average Income, Average Spending" is: a) Sensible b) Careful c) Target d) Standard

d

31

The document states that hierarchical clustering is specifically designed for: a) Real-time prediction b) Big Data processing c) Exploratory Data Analysis (EDA) d) Supervised classification

c

32

According to the limitations, the time complexity of hierarchical clustering is: a) O(n) b) O(n log n) c) O(n²) d) O(2ⁿ)

c

33

The document describes the algorithm as "greedy" because it will not: a) Handle missing values b) Backtrack once two points have been clustered together c) Process categorical data d) Converge properly

b

34

The case study's dendrogram figure size is set to: a) (10, 7) b) (15, 8) c) (12, 6) d) (20, 10)

b

35

In the case study, the model uses the linkage method: a) 'single' b) 'complete' c) 'average' d) 'ward'

d

36

In the case study model training, the distance metric used is: a) 'manhattan' b) 'euclidean' c) 'cosine' d) 'minkowski'

b
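
The training step might look roughly like the sketch below, which fits scikit-learn's AgglomerativeClustering with Ward linkage on invented data. Ward linkage only supports Euclidean distance, which is also the default, so this sketch leaves the distance argument out; that keeps it runnable on older scikit-learn releases where the parameter was called affinity rather than metric.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical data: five well-separated groups of three points each
centers = [(20, 20), (20, 80), (50, 50), (80, 20), (80, 80)]
offsets = [(-1, 0), (1, 0), (0, 1)]
X = np.array(
    [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets],
    dtype=float,
)

# Ward linkage with the default (Euclidean) distance
hc = AgglomerativeClustering(n_clusters=5, linkage="ward")

# fit_predict builds the hierarchy and returns one cluster label per row of X
y_hc = hc.fit_predict(X)
print(y_hc)
```

The label numbers themselves are arbitrary; what matters is which points share a label.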

37

Hierarchical clustering requires the number of clusters to be defined prior to its implementation.

False

38

In a dendrogram, each horizontal line represents a merge of two clusters.

True

39

In agglomerative clustering, all data points start as a single cluster before being split.

False

40

The first step in the agglomerative clustering process is to compute the distance matrix.

False

41

In the step-by-step process for divisive clustering, the cluster with the highest heterogeneity is chosen for splitting.

True

42

Euclidean distance is the only distance metric that can be used in hierarchical clustering.

False

43

Single linkage calculates the distance between two clusters as the maximum distance between any pair of points.

False

44

Complete linkage tends to produce more compact and evenly shaped clusters.

True

45

Ward's Method is particularly effective for categorical data.

False

46

One advantage of hierarchical clustering is that prior knowledge of the number of clusters is unnecessary.

True

47

Hierarchical clustering is deterministic, meaning it produces the same result upon each execution using the same dataset.

True

48

The model is ideal for Big Data applications due to its high processing efficiency.

False

49

In the case study, features are extracted using dataset.iloc[:, [3, 4]].values.

True
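
For readers unfamiliar with positional indexing, here is a small sketch of what dataset.iloc[:, [3, 4]].values does. The rows are invented, and the column layout (ID, gender, age, then the two features at positions 3 and 4) is an assumption made for this example.

```python
import pandas as pd

# Made-up rows; the column order is an assumption for illustration
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [25, 40, 33],
    "Annual Income (k$)": [30, 72, 55],
    "Spending Score (1-100)": [80, 20, 50],
})

# iloc selects by position: all rows (:), columns at positions 3 and 4
# .values converts the selection to a NumPy array for the clustering code
X = dataset.iloc[:, [3, 4]].values

print(X.shape)  # (3, 2): one row per customer, two features
```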

50

The case study's dendrogram is created using sch.dendrogram(sch.linkage(X, method='ward')).

True
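
The call in the card above has two parts: sch.linkage builds the merge tree, and sch.dendrogram lays it out. The sketch below runs both on six made-up points and passes no_plot=True so that it returns the layout without needing a plotting backend; the case study itself presumably draws the figure.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical feature matrix with six observations
X = np.array(
    [[15, 39], [16, 81], [17, 6], [60, 50], [62, 48], [90, 88]],
    dtype=float,
)

# Each row of Z records one merge:
# [index of cluster 1, index of cluster 2, merge height, size of new cluster]
Z = sch.linkage(X, method="ward")
merge_heights = Z[:, 2]   # these become the branch heights in the dendrogram

# no_plot=True computes the tree layout only and returns it as a dict
tree = sch.dendrogram(Z, no_plot=True)
leaf_order = tree["leaves"]  # left-to-right order of the data points

print(Z.shape, leaf_order)
```

With n observations there are always n − 1 merges, so Z has n − 1 rows and the last row describes the single cluster holding every point.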

51

The AgglomerativeClustering model in the case study is trained with n_clusters=3.

False

52

Hierarchical clustering has a native .predict() method that can be used directly for new data points.

False

53

The case study saves the cluster visualization as 'customer_clusters.png'.

True

54

In healthcare applications, hierarchical clustering can be used to group patients based on symptoms, medical history, or genetic data.

True

55

One limitation listed is the model's sensitivity to outliers, where one noisy data point can spoil the whole tree.

True

56

The case study uses np.argmax(distances) to assign a new data point to the nearest centroid.

False

57

In the biological data analysis application, hierarchical clustering is often visualized using dendrograms to show evolutionary relationships.

True

58

The tree-like structure that visualizes the hierarchy of clusters is called a ______. a) histogram b) scatter plot c) dendrogram d) heatmap

c

59

Drawing a horizontal line across a dendrogram to partition it into a specific number of clusters is called ______. a) pruning b) cutting c) splitting d) trimming

b

60

The type of hierarchical clustering that uses a bottom-up approach is ______ Clustering. a) Divisive b) Agglomerative c) Spectral d) Density-Based

b

61

The type of hierarchical clustering that uses a top-down approach is ______ Clustering. a) Agglomerative b) Divisive c) K-Means d) Mean-Shift

b

62

The first step in the agglomerative clustering process is to ______ Clusters. a) Compute Distance b) Initialize c) Merge d) Split

b

63

In agglomerative clustering, after initializing clusters, the next step is to compute the ______ Matrix. a) Confusion b) Distance c) Covariance d) Correlation

b
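
The agglomerative steps (initialize, compute distances, merge the closest pair, repeat) can be sketched in plain Python as below. This is a deliberately naive version for study purposes: it recomputes cluster distances on every pass instead of maintaining a distance matrix, and the function name and sample points are illustrative.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters, linkage=min):
    """Naive bottom-up clustering; linkage=min is single, linkage=max is complete."""
    # Step 1: initialize, every point starts as its own cluster
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > n_clusters:
        # Step 2: compute the distance between every pair of clusters
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(math.dist(p, q) for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        # Step 3: merge the closest pair; the merge is never undone (greedy)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, heights

# Hypothetical points (tuples)
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, heights = agglomerative(points, n_clusters=3)
print(clusters)
print(heights)
```

Setting n_clusters=1 runs the loop until everything is merged, which yields the full sequence of merge heights a dendrogram would display.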

64

In divisive clustering, the process stops when each data point forms its own cluster or a desired number of clusters is ______. a) initialized b) achieved c) predicted d) visualized

b

65

The straight-line distance between two points in multidimensional space is the ______ distance. a) Manhattan b) Euclidean c) Minkowski d) Mahalanobis

b

66

The linkage criterion using the minimum distance between any pair of points from two clusters is ______ Linkage. a) Complete b) Average c) Single d) Ward's

c

67

The linkage criterion using the maximum distance between any pair of points from two clusters is ______ Linkage. a) Single b) Complete c) Average d) Ward's

b

68

The linkage method that minimizes total within-cluster variance is ______ Method. a) Single b) Complete c) Average d) Ward's

d

69

The dataset used in the case study is the ______ Customers Dataset. a) Supermarket b) Mall c) Retail d) Wholesale

b

70

The case study focuses on Annual Income and ______ Score. a) Credit b) Spending c) Loyalty d) Risk

b

71

The dendrogram in the case study is created using the library ______.cluster.hierarchy. a) sklearn b) scipy c) pandas d) numpy

b

72

The imaginary cut line on the dendrogram is drawn at a y-value of ______. a) 100 b) 150 c) 200 d) 250

b

73

Based on the dendrogram analysis, the optimal number of clusters is ______. a) 3 b) 4 c) 5 d) 6

c

74

The clustering model in the case study is evaluated using the ______ Score. a) F1 b) Silhouette c) Accuracy d) Precision

b

75

To predict new data, the case study calculates the ______ of each cluster. a) medians b) centroids c) modes d) ranges

b

76

The new customer data point with Annual Income = 70 and Spending Score = 60 is assigned to the nearest centroid using np.______(distances). a) argmax b) argmin c) max d) min

b

77

The cluster with "High Income, Low Spending" is labeled as ______. a) Standard b) Target c) Careful d) Sensible

c

78

The cluster with "High Income, High Spending" (Primary Marketing Target) is labeled as ______. a) Careful b) Target c) Spendthrift d) Standard

b

79

The cluster with "Low Income, High Spending" is labeled as ______. a) Sensible b) Spendthrift c) Careful d) Standard

b

80

The cluster with "Low Income, Low Spending" is labeled as ______. a) Standard b) Sensible c) Careful d) Target

b

81

The cluster with "Average Income, Average Spending" is labeled as ______. a) Sensible b) Careful c) Target d) Standard

d

82

Hierarchical clustering is specifically designed for ______ Data Analysis (EDA). a) Explanatory b) Exploratory c) Extended d) Experimental

b

83

According to the document, the time complexity of hierarchical clustering is O(______). a) n b) n² c) n log n d) log n

b

84

The algorithm is described as "______" because it will not backtrack once two points have been clustered together. a) stable b) greedy c) deterministic d) efficient

b

85

One advantage is that the model is ______, producing the same result upon each execution using the same dataset. a) stochastic b) probabilistic c) deterministic d) random

c

86

In healthcare, hierarchical clustering helps identify patterns in diseases and classify patients into ______ groups. a) age b) risk c) gender d) geographic

b

87

The case study's dendrogram x-axis is labeled '______ (Data Points)'. a) Clusters b) Customers c) Samples d) Observations

b

88

The case study's dendrogram y-axis is labeled '______ Distances'. a) Manhattan b) Euclidean c) Minkowski d) Cosine

b

89

The document states that the model is suitable for high-quality, high-______ analysis for small to mid-sized data. a) speed b) detail c) throughput d) volume

b

90

The model training line in the case study is: hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', ______='ward'). a) method b) linkage c) criterion d) algorithm

b