Clustering Methods
Clustering
Clustering aims to find natural groupings within data.
Samples within a group are more similar to each other than samples from different groups.
Results are often shown visually as dendrograms (tree diagrams).
Clustering imposes structure, which may not always be present in the data.
Applications of Clustering
Classification of species (taxonomy).
Classification of vegetation communities.
Classification of soil types.
Used to classify areas for sampling.
Less suitable for ecological communities where there are intermediate cases.
Clustering vs. Discriminant Function Analysis
Clustering:
Identifies groups without predefined categories.
An unsupervised method, using the data to define the groups.
Discriminant Function Analysis:
Predefines groups and determines the differences between them.
A supervised method.
Not covered here, but it resembles principal components analysis applied to predefined groups.
Steps in Clustering Analysis
Generate distance or dissimilarity matrices.
Choose a clustering approach.
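The first step above can be sketched in plain Python. This is only an illustration: the function names (`euclidean`, `bray_curtis`, `dissimilarity_matrix`) and the small abundance table are assumptions made for the example, not part of the notes.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical, 1 = no shared species."""
    total = sum(x + y for x, y in zip(a, b))
    if total == 0:
        return 0.0  # convention chosen here for two empty samples
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def dissimilarity_matrix(samples, metric):
    """Symmetric matrix of pairwise dissimilarities (zeros on the diagonal)."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(samples[i], samples[j])
    return matrix

# Hypothetical abundances: rows are samples, columns are species.
abundances = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 9, 1],
]
bc = dissimilarity_matrix(abundances, bray_curtis)
for row in bc:
    print([round(value, 3) for value in row])
```

Passing the metric as an argument keeps the second step (choosing a clustering approach) independent of how dissimilarity was measured.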
Types of Clustering Approaches
Agglomerative vs. Divisive.
Agglomerative: Builds up clusters by adding samples.
Divisive: Starts with one big group and divides it.
Hierarchical vs. Non-Hierarchical.
Hierarchical: Once a sample is in a group, it stays there.
Non-Hierarchical: Samples can switch groups based on an iterative measure.
Weighted: Emphasizes certain groupings.
Hierarchical Agglomerative Cluster Analysis
Starts with a pairwise similarity or dissimilarity matrix.
The most similar samples join first, and clusters build up from there.
Groups are combined until there is one large group.
Represented in a dendrogram.
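The procedure above can be sketched as a short loop. This is a minimal illustration, assuming a precomputed dissimilarity matrix; the function name `agglomerate`, its `linkage` parameter, and the four-sample matrix are hypothetical. It recomputes every between-cluster dissimilarity at each step, which is simple but slow for large data sets.

```python
def agglomerate(dist, linkage=min):
    """Merge clusters until one remains; return the merge history.

    dist    : symmetric matrix (list of lists) of pairwise dissimilarities
    linkage : function reducing the pairwise dissimilarities between two
              clusters to a single number (min gives single linkage)
    """
    clusters = [(i,) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = linkage(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = tuple(sorted(clusters[a] + clusters[b]))
        history.append((clusters[a], clusters[b], d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history

# Hypothetical dissimilarities between four samples.
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
history = agglomerate(dist)
for left, right, height in history:
    print(left, "+", right, "joined at", height)
```

Each recorded height is the level at which two branches would join in the dendrogram.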
Types of Linkage in Hierarchical Clustering
Single Linkage:
Uses the smallest dissimilarity.
Also known as the nearest neighbor method.
Produces cluster chains and elongated dendrograms.
Complete Linkage:
Uses the largest dissimilarity.
Sensitive to outliers.
Average Linkage:
Uses the mean of all pairwise dissimilarities between the members of two groups.
The most common variant is UPGMA (Unweighted Pair Group Method with Arithmetic Averages).
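To make the three linkage rules concrete, the sketch below applies each one to the same pair of groups. The helper names and the dissimilarity values are assumptions for illustration only.

```python
def pairwise(dist, group_a, group_b):
    """All dissimilarities between a member of group_a and a member of group_b."""
    return [dist[i][j] for i in group_a for j in group_b]

def single_linkage(values):
    return min(values)                 # nearest neighbours

def complete_linkage(values):
    return max(values)                 # furthest neighbours

def average_linkage(values):
    return sum(values) / len(values)   # mean of all pairs

# Hypothetical dissimilarities between four samples (0-3).
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
between = pairwise(dist, (0, 1), (2, 3))
for rule in (single_linkage, complete_linkage, average_linkage):
    print(rule.__name__, round(rule(between), 3))
```

The same two groups are "closer" under single linkage than under complete linkage, which is why the choice of rule changes the shape of the dendrogram.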
UPGMA (Unweighted Pair Group Method with Arithmetic Averages)
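Defines the dissimilarity between two clusters as the average of all pairwise dissimilarities between their members.
Best understood by working through the merges step by step.
Weighted pair group methods (WPGMA) can also be used; they weight the two merging clusters equally regardless of their size.
The unweighted pair group method using centroids (UPGMC) instead measures the distance between cluster centroids.
Ward's minimum variance: Forms clusters by minimizing the within-cluster sum of squares; tends to produce similar-sized clusters.
Ward's method uses Euclidean distance.
Single, complete and average linkage can use any dissimilarity measure.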
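As a sketch of Ward's criterion described above: the cost of merging two clusters is the increase in the total within-cluster sum of squares, and the pair with the smallest cost is merged next. The function names and coordinates are hypothetical.

```python
def centroid(points):
    """Mean position of a list of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sum_of_squares(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum((x - m) ** 2 for p in points for x, m in zip(p, c))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    return sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)

# Hypothetical clusters in two dimensions.
a = [[0, 0], [0, 2]]
b = [[4, 1]]
cost = ward_cost(a, b)

# The same cost written in terms of cluster sizes and centroids.
ca, cb = centroid(a), centroid(b)
shortcut = (len(a) * len(b)) / (len(a) + len(b)) * sum(
    (x - y) ** 2 for x, y in zip(ca, cb)
)
print(round(cost, 4), round(shortcut, 4))
```

The shortcut form shows why Ward's method relies on Euclidean distance: the cost depends on the squared distance between centroids.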
Example of UPGMA
Using a matrix with samples and species.
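Bray-Curtis similarities are calculated between samples.
Identify the highest similarity to form the first cluster.
Calculate the similarity between the new cluster and each remaining sample by averaging the similarities of its members.
Use proportional averaging (weighting each cluster by its number of samples) for later merges, until all samples are joined.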
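The steps above can be sketched as follows. This is a minimal illustration working directly on similarities; the function name `upgma`, the sample labels A to D, and the similarity values are made up for the example and are not taken from the notes.

```python
def upgma(names, sim):
    """UPGMA on similarities; return a list of (merged label, similarity).

    names : sample labels (strings)
    sim   : dict mapping frozenset({a, b}) to the similarity of a and b
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of samples
    sim = dict(sim)                        # work on a copy
    merges = []
    while len(sizes) > 1:
        pair = max(sim, key=sim.get)       # most similar pair joins next
        i, j = sorted(pair)
        merged = i + j
        merges.append((merged, sim[pair]))
        n_i, n_j = sizes.pop(i), sizes.pop(j)
        for k in sizes:
            s_ki = sim.pop(frozenset({k, i}))
            s_kj = sim.pop(frozenset({k, j}))
            # Proportional averaging: weight each side by its cluster size.
            sim[frozenset({k, merged})] = (n_i * s_ki + n_j * s_kj) / (n_i + n_j)
        del sim[pair]
        sizes[merged] = n_i + n_j
    return merges

# Hypothetical Bray-Curtis similarities between four samples.
similarities = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.6,
    frozenset({"B", "C"}): 0.7,
    frozenset({"A", "D"}): 0.2,
    frozenset({"B", "D"}): 0.3,
    frozenset({"C", "D"}): 0.4,
}
merges = upgma(["A", "B", "C", "D"], similarities)
for label, value in merges:
    print(label, round(value, 3))
```

In the last merge, the similarity of D to the two-sample cluster AB counts twice as much as its similarity to C, which gives the same result as averaging the three original similarities of D to A, B and C.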
Non-Hierarchical Clustering
Uses iterative processes and randomization.
Samples are reassigned between clusters until the clustering criterion stops improving.
K-means clustering is a common method, where K is the number of clusters.
Cluster membership is evaluated against a defined criterion, such as the distance to each cluster centre.
K-means assumes metric data measured with Euclidean distance.
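A minimal sketch of the K-means iteration, assuming the usual assign-then-update scheme (Lloyd's algorithm) with randomly chosen starting centres. The function name `kmeans`, its parameters, and the points are hypothetical.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centres) after iterating until assignments stop changing."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # random starting centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # no sample switched groups, so the solution is stable
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centres = kmeans(points, k=2)
print(labels)
print(centres)
```

Because the starting centres are random, a run can settle on a poor local solution; a common remedy is to repeat the run with several seeds and keep the result with the smallest within-cluster sum of squares.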
Determining the Number of Clusters (K)
Elbow Plot (Scree Plot):
Plots the within-cluster sum of squares against the number of clusters.
Look for the elbow to determine the optimal number of clusters.
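The elbow is usually judged by eye, but the idea can be sketched numerically. The example below takes the elbow to be the K with the largest second difference in the curve, which is only one possible heuristic; the function name `elbow` and the sum-of-squares values are made up for illustration and do not come from any data set.

```python
def elbow(wss):
    """Return the K at the sharpest bend of a within-cluster sum of squares curve.

    wss : within-cluster sums of squares for K = 1, 2, 3, ...
    """
    if len(wss) < 3:
        raise ValueError("need values for at least three K")
    # Second difference at K: how much the drop in WSS slows down after K.
    bends = [wss[i - 1] - 2 * wss[i] + wss[i + 1] for i in range(1, len(wss) - 1)]
    return bends.index(max(bends)) + 2  # bends[0] belongs to K = 2

# Made-up values for K = 1..6, for illustration only.
wss = [100.0, 60.0, 20.0, 17.0, 15.0, 14.0]
best_k = elbow(wss)
print(best_k)
```

With these values the curve falls steeply up to K = 3 and flattens afterwards, so the heuristic picks 3.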
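Calinski-Harabasz Criterion:
CH = \frac{\text{between-cluster variance}}{\text{within-cluster variance}} = \frac{B / (K - 1)}{W / (n - K)}
Uses the ratio of between-cluster variance to within-cluster variance, where B and W are the between- and within-cluster sums of squares, n is the number of samples and K is the number of clusters.
Higher values mean distinct and well-separated clusters.
Optimal K is the peak on the Calinski-Harabasz plot.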
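The ratio above can be computed directly from a set of points and their cluster labels. This sketch assumes the standard definition, in which each sum of squares is divided by its degrees of freedom (K - 1 between clusters, n - K within); the function name and data are hypothetical.

```python
def calinski_harabasz(points, labels):
    """Between-cluster variance divided by within-cluster variance."""
    n = len(points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need between 2 and n - 1 clusters")

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    # Between: cluster centres around the overall centre, weighted by size.
    between = sum(len(g) * sq_dist(mean(g), overall) for g in groups.values())
    # Within: points around their own cluster centre.
    within = sum(sq_dist(p, mean(g)) for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical points with one sensible and one poor grouping.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
good = calinski_harabasz(points, [0, 0, 0, 1, 1, 1])
poor = calinski_harabasz(points, [0, 1, 0, 1, 0, 1])
print(round(good, 2), round(poor, 2))
```

The sensible grouping scores far higher than the poor one. The function divides by the within-cluster sum of squares, so it fails if every cluster consists of identical points.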
Silhouette Width:
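Compares how similar each object is to its own cluster with how similar it is to the nearest other cluster; values range from -1 to 1, and higher values indicate better-defined clusters.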
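A minimal sketch of the silhouette calculation, assuming Euclidean distance and at least two clusters. For each sample, a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster; the width is (b - a) / max(a, b). The function name and data are hypothetical.

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width of every sample (needs at least two clusters)."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    def mean_dist(p, cluster, skip=None):
        ds = [dist(p, q) for j, q in enumerate(points)
              if labels[j] == cluster and j != skip]
        return sum(ds) / len(ds) if ds else None

    widths = []
    for i, p in enumerate(points):
        a = mean_dist(p, labels[i], skip=i)      # cohesion within own cluster
        if a is None:
            widths.append(0.0)                   # convention for a singleton cluster
            continue
        # Separation: mean distance to the nearest other cluster.
        b = min(mean_dist(p, c) for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b))
    return widths

# Hypothetical points with one sensible and one poor grouping.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
good = silhouette_widths(points, [0, 0, 0, 1, 1, 1])
poor = silhouette_widths(points, [0, 1, 0, 1, 0, 1])
print(round(sum(good) / len(good), 3), round(sum(poor) / len(poor), 3))
```

The average width over all samples can be compared across different values of K, with the largest average suggesting the best-supported number of clusters.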
Gap Statistic:
Compares the total within-cluster variation for different values of K with the variation expected under a null reference distribution that has no clustering.
Applications and Limitations of Cluster Analysis
Useful for classifying things like soil samples, species, vegetation communities.
Less useful to environmental scientists and ecologists when data vary continuously, because it forces samples into rigid classes.
Ordination methods are better for clinal data and gradients.