CAP4770 Exam 2



45 Terms

1

T/F - Hierarchical clustering requires a predetermined number of clusters.

False

2

T/F - Normalization is essential in clustering to ensure all variables contribute equally to the distance measures.

True
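
A quick illustration of the point above, as a minimal Python sketch (assuming scikit-learn and a made-up age/income table): without scaling, the income column would swamp age in any Euclidean distance.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical records: [age in years, income in dollars]
X = np.array([[25, 40000],
              [30, 42000],
              [55, 41000]], dtype=float)

# z-score normalization: each column now has mean 0 and unit variance,
# so both variables contribute comparably to the distance measures
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)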

3

T/F - K-means clustering assigns records to clusters based on probabilities.

False

4

T/F - Binary similarity measures are preferred when working with continuous data.

False

5

Which of the following is an unsupervised learning task?

  • Clustering

  • Regression

  • Classification

  • Reinforcement learning

Clustering

6

Which distance measure is most commonly used in clustering but has limitations such as sensitivity to outliers?

  • Cosine similarity

  • Manhattan distance

  • Euclidean distance

  • Jaccard similarity

Euclidean distance

7

In clustering, normalization of numerical variables ensures that:

  • Larger variables dominate the clustering process

  • All variables contribute equally to the distance measures

  • Only important variables contribute to distance measures

  • Clustering becomes faster

All variables contribute equally to the distance measures

8

In k-means clustering, the number of clusters is:

  • Randomly chosen after clustering

  • Automatically determined by the algorithm

  • Predefined before running the algorithm

Predefined before running the algorithm
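
A minimal scikit-learn sketch of this, on assumed random 2-D data: k is fixed before fitting, and multiple restarts (n_init) hedge against the local optima noted in card 10 below.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(100, 2)  # toy data, assumed

# n_clusters must be chosen up front; n_init reruns the algorithm with
# different random centroid seeds and keeps the lowest-SSE result
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)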

9

What does hierarchical clustering require that can make it computationally expensive?

  • An n x n distance matrix

  • A small dataset

  • A random seed

  • A predetermined number of clusters

An n x n distance matrix

10

T/F - The k-means algorithm always guarantees the globally optimal clustering solution.

False

11

T/F - K-means clustering is best suited for datasets with spherical clusters.

True

12

T/F - K-means is more sensitive to outliers compared to K-medoids clustering.

True

13

T/F - In centroid linkage clustering, the distance between clusters is calculated by averaging the distances between all points in the two clusters.

False

14

T/F - The sum of squared errors (SSE) decreases as the number of clusters (k) increases in k-means clustering.

True

15

T/F - The “elbow method” is used to determine the most appropriate number of clusters in k-means clustering.

True

16

T/F - K-means clustering can be applied to both continuous and categorical data without modification.

False

17

What is the main goal of k-means clustering?

  • Minimize between-cluster variance

  • Maximize between-cluster variance

  • Minimize within-cluster variance

  • Maximize within-cluster variance

Minimize within-cluster variance

18

Which of the following statements about K-medoids is true?

  • It only uses squared Euclidean distance to calculate distances

  • It is less sensitive to outliers compared to k-means

  • It uses centroids to define clusters

  • It assumes clusters are spherical

It is less sensitive to outliers compared to k-means
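
A small NumPy sketch of why (toy 1-D data, assumed for illustration): the mean that k-means uses is dragged toward an outlier, while the medoid, which must be an actual data point, barely moves.

import numpy as np

pts = np.array([1.0, 2.0, 3.0, 100.0])  # one extreme outlier

mean = pts.mean()  # 26.5 -- pulled far from the bulk of the data

# medoid: the data point with the smallest total distance to the rest
total_dist = np.abs(pts[:, None] - pts[None, :]).sum(axis=1)
medoid = pts[total_dist.argmin()]  # 2.0 -- essentially unaffected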

19

In k-means clustering, what is the purpose of the “elbow method”?

  • To determine the optimal number of clusters

  • To minimize outliers

  • To increase the number of centroids

  • To reduce the within-cluster variance

To determine the optimal number of clusters
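
As a concrete sketch (scikit-learn, assumed random data): print or plot SSE (inertia_) over a range of k values; SSE always falls as k grows (card 14), and the "elbow" is the k where the improvement levels off.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(200, 2)  # toy data, assumed

for k in range(1, 9):
    sse = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(sse, 2))  # look for the bend in this curve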

20

Maximum Coordinate Distance is the same as _________. Both refer to the distance between two points where only the maximum absolute difference across any of the dimensions is considered.

  • Euclidean distance

  • Manhattan distance

  • Chebyshev distance

  • City block distance

Chebyshev distance
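
For concreteness, a SciPy sketch comparing the three measures on an assumed pair of points:

from scipy.spatial import distance

a, b = [1, 5], [4, 1]  # per-coordinate differences: 3 and 4

print(distance.euclidean(a, b))  # sqrt(3**2 + 4**2) = 5.0
print(distance.cityblock(a, b))  # 3 + 4 = 7  (Manhattan / city block)
print(distance.chebyshev(a, b))  # max(3, 4) = 4  (maximum coordinate)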

21

T/F - In agglomerative hierarchical clustering, the clustering process starts with each instance as an individual cluster.

True

22

T/F - Divisive clustering is a bottom-up approach to hierarchical clustering.

False

23

T/F - Single linkage clustering calculates the maximum distance between points in two clusters.

False

24

T/F - Ward’s method minimizes the loss of information at each step by using Error Sum of Squares (ESS).

True

25

T/F - Dendrograms are tree-like diagrams used to show the order of clustering and the distance between clusters.

True
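
A minimal SciPy sketch (Matplotlib is needed for the actual plot), assuming random toy data: the method argument switches among the linkage rules from the surrounding cards, and cutting the tree at a distance threshold yields flat clusters, as in card 29 below.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.RandomState(0).rand(10, 2)  # toy data, assumed

Z = linkage(X, method='single')  # or 'complete', 'average', 'ward'
dendrogram(Z)  # tree-like diagram of merge order and merge distances

# cut the dendrogram at a height threshold to get flat cluster labels
labels = fcluster(Z, t=0.5, criterion='distance')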

26

In complete linkage clustering, how is the distance between two clusters measured?

  • By the average distance between all pairs of points

  • By the distance between the farthest points

  • By the distance between centroids

  • By the distance between the closest points

By the distance between the farthest points

27

What characteristic is commonly associated with single linkage clustering?

  • Spherical clusters

  • Large compact clusters

  • Globular clusters

  • Elongated, chain-like clusters

Elongated, chain-like clusters

28

Which of the following instances is most similar to A?

  • B

  • C

  • D

  • E

B

29

In the figure, two thresholds are represented by a solid red line and a dotted blue line. How many clusters would be formed at the solid red line threshold and at the dotted blue line threshold, respectively?

  • 5, 5

  • 5, 7

  • 6, 11

  • 7, 7

5, 7

30

Which of the following is NOT a feature of DBSCAN?

  • Handles noise and outliers

  • Discovers clusters of arbitrary shapes

  • Sensitive to initial cluster centroids

  • Does not require you to pre-specify the number of clusters

Sensitive to initial cluster centroids

31

In DBSCAN, a point is classified as a core point if:

  • It has at least a specified number of points within the epsilon radius

  • It is on the boundary of the cluster

  • It does not belong to any cluster

  • It has a large distance to its nearest neighbors

It has at least a specified number of points within the epsilon radius

32

What is the main parameter that defines the neighborhood of a point in DBSCAN?

  • Variance

  • Eps (epsilon)

  • k-nearest neighbors

  • Number of clusters

Eps (epsilon)
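
A minimal scikit-learn DBSCAN sketch with assumed eps and min_samples values: no cluster count is given, noise points come back labeled -1, and core points are exactly those with at least min_samples neighbors within the eps radius.

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.RandomState(0).rand(200, 2)  # toy data, assumed

db = DBSCAN(eps=0.1, min_samples=5).fit(X)  # Eps radius and MinPts
labels = db.labels_  # cluster ids; -1 marks noise/outlier points
core_idx = db.core_sample_indices_  # indices of the core points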

33

Which of the following methods does DBSCAN use to form clusters?

  • Identifying regions of high data density

  • Maximizing intra-cluster similarity

  • Minimizing distances to centroids

  • Dividing data into equal partitions

Identifying regions of high data density

34

T/F - In DBSCAN, all border points have more than MinPts neighbors within the Eps radius.

False

35

T/F - DBSCAN can find clusters of arbitrary shapes, unlike k-means, which assumes spherical clusters.

True

36

T/F - DBSCAN requires the number of clusters to be specified before clustering begins.

False

37

T/F - The elbow method helps in determining the optimal number of clusters for k-means clustering.

True

38

T/F - External indices require ground truth labels to evaluate clustering results.

True

39

T/F - A higher Dunn Index indicates poor clustering quality.

False

40

In a GMM, the “soft clustering” property means that:

  • Each data point is assigned a probability of belonging to each cluster

  • Each data point is assigned to exactly one cluster

Each data point is assigned a probability of belonging to each cluster

41

T/F - The Silhouette Score ranges from 0 to 1, where 1 indicates the worst clustering quality.

False

42

What does the Dunn Index measure?

  • Cohesion and dispersion

  • Silhouette coefficient

  • Intra-cluster compactness and inter-cluster separation

  • Cluster variance

Intra-cluster compactness and inter-cluster separation

43

Which of the following is an internal index for cluster evaluation?

  • Precision

  • Purity

  • Accuracy

  • Silhouette Index

Silhouette Index

44

What does a high Silhouette Score indicate about clusters?

  • Clusters overlap significantly

  • Clusters are well-separated and cohesive

  • Clusters are poorly defined

  • Clusters are compact but not well-separated

Clusters are well-separated and cohesive
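
A scikit-learn sketch of computing the score on assumed k-means labels; note the score actually ranges from -1 to 1, which is also what makes card 41 false.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.RandomState(0).rand(200, 2)  # toy data, assumed
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# internal index: needs only the data and the labels, no ground truth;
# values near 1 mean cohesive, well-separated clusters
print(silhouette_score(X, labels))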

45

Which of the following best describes the purpose of a Gaussian Mixture Model?

  • To perform hierarchical clustering by combining clusters in a tree-like structure

  • To separate data into clusters using a distance-based metric

  • To model data with a single Gaussian distribution

  • To model data with overlapping clusters by representing each cluster as a Gaussian distribution

To model data with overlapping clusters by representing each cluster as a Gaussian distribution
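
Finally, a scikit-learn sketch of a GMM's soft assignments (an assumed 3-component model on random data), tying together cards 40 and 45:

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).rand(200, 2)  # toy data, assumed

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # soft clustering: each row sums to 1,
                              # one membership probability per cluster
hard = probs.argmax(axis=1)   # collapse to hard labels if needed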