An unsupervised learning algorithm that groups objects into a nested hierarchy of clusters according to their similarities and does not require the number of clusters to be defined prior to implementation. a) K-Means Clustering b) Gaussian Mixture Model c) Hierarchical Clustering d) DBSCAN
c
The tree-like diagram that visualizes the hierarchical clustering process, displaying all steps followed from individual data points to the complete merged dataset. a) Scatter plot b) Heatmap c) Dendrogram d) Histogram
c
The type of hierarchical clustering that uses a bottom-up approach, where each data point starts as its own cluster before iteratively merging with the closest cluster. a) Divisive Clustering b) Agglomerative Clustering c) Spectral Clustering d) Density-Based Clustering
b
The type of hierarchical clustering that uses a top-down approach, starting with a single cluster containing all data points before recursively splitting into smaller clusters. a) Agglomerative Clustering b) Divisive Clustering c) K-Means Clustering d) Mean-Shift Clustering
b
In a dendrogram, each leaf node represents: a) A merged cluster b) An individual data point c) The distance between clusters d) The optimal number of clusters
b
In a dendrogram, the height of the branches indicates: a) The size of each cluster b) The number of data points c) The distance between clusters d) The optimal cluster count
c
The method of determining a specific number of clusters from a dendrogram by drawing a horizontal line across it is called: a) Pruning b) Splitting c) Cutting d) Trimming
c
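The "cut" described above can be sketched with SciPy's `fcluster`, which returns the cluster labels that a horizontal line at a given height would produce. The data here is a hypothetical toy set, not the case study's dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 1-D points forming two obvious groups
X = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])
Z = linkage(X, method='ward')

# "Cutting" the tree: a horizontal line at height 3 keeps only
# the merges below it, yielding the clusters at that level
labels = fcluster(Z, t=3.0, criterion='distance')
print(len(set(labels)))  # 2
```

Lowering the threshold `t` moves the cut line down the dendrogram and produces more, smaller clusters.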
The distance metric defined as the straight-line distance between two points, expressed in two dimensions as √((x₂ − x₁)² + (y₂ − y₁)²) and generalizing to any number of dimensions. a) Manhattan Distance b) Euclidean Distance c) Minkowski Distance d) Mahalanobis Distance
b
The distance metric that measures the straight-line distance between two points, where smaller values indicate higher similarity. a) Cosine Distance b) Euclidean Distance c) Hamming Distance d) Chebyshev Distance
b
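The Euclidean distance formula from the question above can be checked in a few lines of NumPy; the two points here are hypothetical:

```python
import numpy as np

# Two hypothetical points in 2-D feature space
p1 = np.array([1.0, 2.0])
p2 = np.array([4.0, 6.0])

# Euclidean distance: sqrt((x2 - x1)^2 + (y2 - y1)^2)
dist = np.sqrt(np.sum((p2 - p1) ** 2))
print(dist)  # 5.0
```

The same expression works unchanged for points with more than two coordinates, since the sum runs over all dimensions.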
The linkage criterion that defines the distance between two clusters as the minimum distance between any pair of points from the two clusters. a) Complete Linkage b) Average Linkage c) Single Linkage d) Ward's Method
c
The linkage criterion that defines the distance between two clusters as the maximum distance between any pair of points from the two clusters. a) Single Linkage b) Complete Linkage c) Average Linkage d) Ward's Method
b
The linkage criterion that computes the mean distance between all pairs of points across two clusters, providing a balance between single and complete linkage. a) Single Linkage b) Complete Linkage c) Average Linkage d) Centroid Linkage
c
The linkage method that minimizes total within-cluster variance by merging clusters that result in the smallest increase in the sum of squared distances. a) Single Linkage b) Complete Linkage c) Average Linkage d) Ward's Method
d
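The four linkage criteria above can be compared directly with SciPy: the last row of the matrix returned by `linkage` records the final merge and its distance, which differs by criterion. The dataset is a hypothetical four-point example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Tiny hypothetical dataset: two loose pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Each criterion measures the distance between the two pairs differently:
# single = closest points, complete = farthest points, average = mean,
# ward = increase in within-cluster variance
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)
    print(method, round(Z[-1, 2], 3))
```

On this data, single linkage reports the smallest final-merge distance and complete linkage the largest, with average linkage in between.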
In the case study, the dataset used for hierarchical clustering contains membership data for how many individuals? a) 100 b) 150 c) 200 d) 500
c
In the case study, the two primary features selected for clustering analysis are: a) Age and Annual Income b) Annual Income and Spending Score c) Spending Score and Gender d) Age and Spending Score
b
According to the case study, the Spending Score ranges from: a) 0 to 50 b) 1 to 100 c) 1 to 10 d) 0 to 200
b
The case study uses which Python library for generating the dendrogram? a) sklearn.cluster b) scipy.cluster.hierarchy c) matplotlib.pyplot d) seaborn
b
The case study uses which scikit-learn class for training the agglomerative clustering model? a) KMeans b) AgglomerativeClustering c) Birch d) MeanShift
b
In the dendrogram, the imaginary cut line is drawn at a y-value of: a) 100 b) 150 c) 200 d) 250
b
Based on the dendrogram analysis, the optimal number of clusters identified is: a) 3 b) 4 c) 5 d) 6
c
The evaluation metric used in the case study to assess clustering quality is: a) Adjusted Rand Index b) Davies-Bouldin Index c) Silhouette Score d) Calinski-Harabasz Index
c
The Silhouette Score formula is defined as s = (b − a) / max(a, b), where 'a' represents: a) Mean nearest-cluster distance b) Mean intra-cluster distance c) Maximum inter-cluster distance d) Minimum intra-cluster distance
b
In the Silhouette Score formula, 'b' represents: a) Mean intra-cluster distance b) Mean nearest-cluster distance c) Maximum intra-cluster distance d) Minimum inter-cluster distance
b
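The Silhouette Score defined in the two questions above is available in scikit-learn. A quick sketch on hypothetical, well-separated blobs shows the score approaching 1:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Two well-separated hypothetical blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# s = (b - a) / max(a, b), averaged over all points, where
# a = mean intra-cluster distance, b = mean nearest-cluster distance
score = silhouette_score(X, labels)
print(score)  # close to 1 for well-separated clusters
```

Scores near 1 indicate tight, well-separated clusters; values near 0 indicate overlapping clusters, and negative values suggest misassigned points.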
For predicting new data points, the case study calculates the ______ of each cluster to assign the nearest one. a) Median b) Mode c) Centroids d) Standard deviation
c
In the case study prediction, the new customer data point with Annual Income = 70k and Spending Score = 60 is assigned using: a) np.mean() b) np.argmin() c) np.argmax() d) np.min()
b
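The centroid-and-`np.argmin` prediction step from the two questions above can be sketched as follows. The centroid coordinates here are hypothetical placeholders, not values from the case study:

```python
import numpy as np

# Hypothetical cluster centroids in [annual income (k$), spending score] space
centroids = np.array([[80.0, 20.0],   # high income, low spending
                      [85.0, 80.0],   # high income, high spending
                      [25.0, 80.0],   # low income, high spending
                      [25.0, 20.0],   # low income, low spending
                      [55.0, 50.0]])  # average income, average spending

# New customer from the case study: income 70k, spending score 60
new_point = np.array([70.0, 60.0])

# Euclidean distance to every centroid, then the index of the smallest
distances = np.linalg.norm(centroids - new_point, axis=1)
cluster = np.argmin(distances)
print(cluster)  # index of the nearest hypothetical centroid
```

`np.argmin` returns the index of the smallest distance, which is why it (and not `np.argmax`) selects the nearest cluster.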
The cluster label assigned to customers with "High Income, Low Spending" is: a) Standard b) Target c) Careful d) Sensible
c
The cluster label assigned to customers with "High Income, High Spending" (Primary Marketing Target) is: a) Careful b) Standard c) Target d) Spendthrift
c
The cluster label assigned to customers with "Low Income, High Spending" is: a) Sensible b) Careful c) Spendthrift d) Standard
c
The cluster label assigned to customers with "Low Income, Low Spending" is: a) Standard b) Sensible c) Careful d) Target
b
The cluster label assigned to customers with "Average Income, Average Spending" is: a) Sensible b) Careful c) Target d) Standard
d
The document states that hierarchical clustering is specifically designed for: a) Real-time prediction b) Big Data processing c) Exploratory Data Analysis (EDA) d) Supervised classification
c
According to the limitations, the time complexity of hierarchical clustering is: a) O(n) b) O(n log n) c) O(n²) d) O(2ⁿ)
c
The document describes the algorithm as "greedy" because it will not: a) Handle missing values b) Backtrack once two points have been clustered together c) Process categorical data d) Converge properly
b
The case study's dendrogram figure size is set to: a) (10, 7) b) (15, 8) c) (12, 6) d) (20, 10)
b
In the case study, the model uses the linkage method: a) 'single' b) 'complete' c) 'average' d) 'ward'
d
In the case study model training, the distance metric used is: a) 'manhattan' b) 'euclidean' c) 'cosine' d) 'minkowski'
b
Hierarchical clustering requires the number of clusters to be defined prior to its implementation.
False
A dendrogram's horizontal line represents the merge of clusters.
True
In agglomerative clustering, all data points start as a single cluster before being split.
False
The first step in the agglomerative clustering process is to compute the distance matrix.
False
In the step-by-step process for divisive clustering, the cluster with the highest heterogeneity is chosen for splitting.
True
Euclidean distance is the only distance metric that can be used in hierarchical clustering.
False
Single linkage calculates the distance between two clusters as the maximum distance between any pair of points.
False
Complete linkage tends to produce more compact and evenly shaped clusters.
True
Ward's Method is particularly effective for categorical data.
False
One advantage of hierarchical clustering is that prior knowledge of the number of clusters is unnecessary.
True
Hierarchical clustering is deterministic, meaning it produces the same result upon each execution using the same dataset.
True
The model is ideal for Big Data applications due to its high processing efficiency.
False
In the case study, features are extracted using dataset.iloc[:, [3, 4]].values.
True
The case study's dendrogram is created using sch.dendrogram(sch.linkage(X, method='ward')).
True
The AgglomerativeClustering model in the case study is trained with n_clusters=3.
False
Hierarchical clustering has a native .predict() method that can be used directly for new data points.
False
The case study saves the cluster visualization as 'customer_clusters.png'.
True
In healthcare applications, hierarchical clustering can be used to group patients based on symptoms, medical history, or genetic data.
True
One limitation listed is the model's sensitivity to outliers, where one noisy data point can spoil the whole tree.
True
The case study uses np.argmax(distances) to assign a new data point to the nearest centroid.
False
In the biological data analysis application, hierarchical clustering is often visualized using dendrograms to show evolutionary relationships.
True
The tree-like structure that visualizes the hierarchy of clusters is called a ______. a) histogram b) scatter plot c) dendrogram d) heatmap
c
Drawing a horizontal line across a dendrogram to partition it into a specific number of clusters is called ______. a) pruning b) cutting c) splitting d) trimming
b
The type of hierarchical clustering that uses a bottom-up approach is ______ Clustering. a) Divisive b) Agglomerative c) Spectral d) Density-Based
b
The type of hierarchical clustering that uses a top-down approach is ______ Clustering. a) Agglomerative b) Divisive c) K-Means d) Mean-Shift
b
The first step in the agglomerative clustering process is to ______ Clusters. a) Compute Distance b) Initialize c) Merge d) Split
b
In agglomerative clustering, after initializing clusters, the next step is to compute the ______ Matrix. a) Confusion b) Distance c) Covariance d) Correlation
b
In divisive clustering, the process stops when each data point forms its own cluster or a desired number of clusters is ______. a) initialized b) achieved c) predicted d) visualized
b
The straight-line distance between two points in multidimensional space is the ______ distance. a) Manhattan b) Euclidean c) Minkowski d) Mahalanobis
b
The linkage criterion using the minimum distance between any pair of points from two clusters is ______ Linkage. a) Complete b) Average c) Single d) Ward's
c
The linkage criterion using the maximum distance between any pair of points from two clusters is ______ Linkage. a) Single b) Complete c) Average d) Ward's
b
The linkage method that minimizes total within-cluster variance is ______ Method. a) Single b) Complete c) Average d) Ward's
d
The dataset used in the case study is the ______ Customers Dataset. a) Supermarket b) Mall c) Retail d) Wholesale
b
The case study focuses on Annual Income and ______ Score. a) Credit b) Spending c) Loyalty d) Risk
b
The dendrogram in the case study is created using the library ______.cluster.hierarchy. a) sklearn b) scipy c) pandas d) numpy
b
The imaginary cut line on the dendrogram is drawn at a y-value of ______. a) 100 b) 150 c) 200 d) 250
b
Based on the dendrogram analysis, the optimal number of clusters is ______. a) 3 b) 4 c) 5 d) 6
c
The clustering model in the case study is evaluated using the ______ Score. a) F1 b) Silhouette c) Accuracy d) Precision
b
To predict new data, the case study calculates the ______ of each cluster. a) medians b) centroids c) modes d) ranges
b
The new customer data point with Annual Income = 70 and Spending Score = 60 is assigned to the nearest centroid using np.______(distances). a) argmax b) argmin c) max d) min
b
The cluster with "High Income, Low Spending" is labeled as ______. a) Standard b) Target c) Careful d) Sensible
c
The cluster with "High Income, High Spending" (Primary Marketing Target) is labeled as ______. a) Careful b) Target c) Spendthrift d) Standard
b
The cluster with "Low Income, High Spending" is labeled as ______. a) Sensible b) Spendthrift c) Careful d) Standard
b
The cluster with "Low Income, Low Spending" is labeled as ______. a) Standard b) Sensible c) Careful d) Target
b
The cluster with "Average Income, Average Spending" is labeled as ______. a) Sensible b) Careful c) Target d) Standard
d
Hierarchical clustering is specifically designed for ______ Data Analysis (EDA). a) Explanatory b) Exploratory c) Extended d) Experimental
b
The time complexity of hierarchical clustering is O(______). a) n b) n² c) n log n d) log n
b
The algorithm is described as "______" because it will not backtrack once two points have been clustered together. a) stable b) greedy c) deterministic d) efficient
b
One advantage is that the model is ______, producing the same result upon each execution using the same dataset. a) stochastic b) probabilistic c) deterministic d) random
c
In healthcare, hierarchical clustering helps identify patterns in diseases and classify patients into ______ groups. a) age b) risk c) gender d) geographic
b
The case study's dendrogram x-axis is labeled '______ (Data Points)'. a) Clusters b) Customers c) Samples d) Observations
b
The case study's dendrogram y-axis is labeled '______ Distances'. a) Manhattan b) Euclidean c) Minkowski d) Cosine
b
The document states that the model is suitable for high-quality, high-______ analysis for small to mid-sized data. a) speed b) detail c) throughput d) volume
b
The model training line in the case study is: hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', ______='ward'). a) method b) linkage c) criterion d) algorithm
b