Clustering Methods

Clustering

Clustering aims to find natural groupings within data.
Samples within a group are more similar to each other than samples from different groups.
Results are commonly shown visually as dendrograms (tree diagrams).
Clustering imposes structure, which may not always be present in the data.

Applications of Clustering

Classification of species (taxonomy).
Classification of vegetation communities.
Classification of soil types.
Used to classify areas for sampling.
Less suitable for ecological communities where there are intermediate cases.

Clustering vs. Discriminant Function Analysis

Clustering:
- Identifies groups without predefined categories.
- An unsupervised method, using the data to define the groups.
Discriminant Function Analysis:
- Predefines groups and determines the differences between them.
- A supervised method.
- Not covered here, but similar to principal components analysis with predefined groups.

Steps in Clustering Analysis

Generate distance or dissimilarity matrices.
Choose a clustering approach.
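The first step can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Bray-Curtis dissimilarity and a small made-up abundance table; the function names `bray_curtis` and `dissimilarity_matrix` and the data are hypothetical, not part of the original notes.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two completely empty samples are treated as identical (an assumption).
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(samples):
    """Square, symmetric matrix of pairwise dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical data: rows are samples, columns are species abundances.
samples = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 9, 1],
]
D = dissimilarity_matrix(samples)
for row in D:
    print([round(value, 3) for value in row])
```

The resulting matrix is the input to whichever clustering approach is chosen next. In practice a library routine such as SciPy's `pdist` with `metric="braycurtis"` would normally be used instead of hand-written loops.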

Types of Clustering Approaches

Agglomerative vs. Divisive.
- Agglomerative: Builds up clusters by adding samples.
- Divisive: Starts with one big group and divides it.
Hierarchical vs. Non-Hierarchical.
- Hierarchical: Once a sample is in a group, it stays there.
- Non-Hierarchical: Samples can switch groups based on an iterative measure.
Weighted: Emphasizes certain groupings.

Hierarchical Agglomerative Cluster Analysis

Starts with a pairwise similarity or dissimilarity matrix.
The most similar samples join first, and clusters build up from there.
Groups are combined until there is one large group.
Represented in a dendrogram.
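The merging loop described above can be sketched as follows. This is an illustration only: the function name `agglomerate` and the example matrix are hypothetical, and the rule for measuring dissimilarity between two groups (here the smallest pairwise value, via `min`) is an assumption that can be swapped for another linkage rule.

```python
def agglomerate(D, linkage=min):
    """Merge clusters until one remains and return the merge history.

    D is a symmetric dissimilarity matrix (list of lists).
    linkage reduces the pairwise dissimilarities between two clusters
    to one number; min is an assumption made for this sketch.
    """
    clusters = [(i,) for i in range(len(D))]
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest dissimilarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [D[i][j] for i in clusters[a] for j in clusters[b]]
                d = linkage(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return history


# Hypothetical dissimilarities between four samples (0 to 3).
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
for left, right, height in agglomerate(D):
    print(left, "+", right, "joined at dissimilarity", height)
```

Each entry in the merge history corresponds to one join in the dendrogram, and the recorded dissimilarity is the height at which that join is drawn.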

Types of Linkage in Hierarchical Clustering

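Single Linkage:
- Uses the smallest dissimilarity between any member of one group and any member of the other.
- Also known as the nearest neighbor method.
- Tends to produce chained clusters and elongated dendrograms.
Complete Linkage:
- Uses the largest dissimilarity between any member of one group and any member of the other.
- Sensitive to outliers.
Average Linkage:
- Uses the mean of all pairwise dissimilarities between the two groups.
- The most common version is UPGMA (Unweighted Pair Group Method with Arithmetic Averages).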
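To make the difference between the three linkage rules concrete, the sketch below computes the dissimilarity between the same two groups under each rule. The function name `group_dissimilarity` and the matrix values are hypothetical.

```python
def group_dissimilarity(D, group_a, group_b, linkage):
    """Dissimilarity between two groups of sample indices under a linkage rule."""
    pairs = [D[i][j] for i in group_a for j in group_b]
    if linkage == "single":      # nearest neighbor
        return min(pairs)
    if linkage == "complete":    # furthest neighbor
        return max(pairs)
    if linkage == "average":     # mean of all pairwise values
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {linkage}")


# Hypothetical dissimilarities between four samples (0 to 3).
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
for rule in ("single", "complete", "average"):
    print(rule, group_dissimilarity(D, [0, 1], [2, 3], rule))
# single 5
# complete 10
# average 7.5
```

The same pair of groups looks closest under single linkage and furthest under complete linkage, which is why the choice of linkage can change the shape of the dendrogram.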

UPGMA (Unweighted Pair Group Method with Arithmetic Averages)

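Joins clusters using the arithmetic average of all pairwise dissimilarities between their members.
Easiest to understand by working through the merges step by step.
Weighted pair group methods (WPGMA) can also be used; they give the two clusters being merged equal weight regardless of how many samples each contains.
The unweighted pair group method using centroids (UPGMC) measures the distance between cluster centroids instead.
Ward's minimum variance: Forms clusters by minimizing the increase in the within-cluster sum of squares at each merge; tends to give similar-sized clusters.
- Uses Euclidean distance.
- The other methods can use any type of distance or dissimilarity.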
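For reference, the quantity Ward's method works with can be written out as below. The notation is an assumption introduced here: C_k is cluster k, \bar{x}_k is its centroid, and n_a and n_b are the sizes of the two clusters being considered for merging.

```latex
W = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2
\qquad
\Delta W(a, b) = \frac{n_a n_b}{n_a + n_b} \, \lVert \bar{x}_a - \bar{x}_b \rVert^2
```

At each step the pair of clusters with the smallest \Delta W is merged, so W grows as slowly as possible. Because both expressions are built from squared Euclidean distances to centroids, the method is tied to Euclidean distance.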

Example of UPGMA

Using a matrix with samples and species.
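Bray-Curtis similarities are calculated between every pair of samples.
Identify the highest similarity to form the first cluster.
Calculate the similarity between the new cluster and each remaining sample by averaging the similarities of its members.
When later merges involve clusters of different sizes, weight the average by the number of samples in each cluster (proportional averaging), and repeat until all samples are joined.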
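The steps above can be sketched as a short function that works directly on a similarity matrix. The function name `upgma`, the sample labels, and the similarity values (percent) are all hypothetical; the update rule is the size-weighted average that UPGMA uses.

```python
def upgma(S, labels):
    """UPGMA on a similarity matrix.

    Returns the merges in order as (members_a, members_b, similarity).
    """
    # Active clusters, keyed by id; keys of sim are (smaller_id, larger_id).
    members = {i: (labels[i],) for i in range(len(labels))}
    sim = {(i, j): S[i][j]
           for i in range(len(S)) for j in range(i + 1, len(S))}
    merges = []
    next_id = len(labels)
    while len(members) > 1:
        # The most similar pair of clusters joins next.
        (a, b), s = max(sim.items(), key=lambda item: item[1])
        na, nb = len(members[a]), len(members[b])
        merges.append((members[a], members[b], s))
        # Proportional averaging: weight each old similarity by cluster size.
        for c in members:
            if c in (a, b):
                continue
            s_ac = sim.pop((min(a, c), max(a, c)))
            s_bc = sim.pop((min(b, c), max(b, c)))
            sim[(c, next_id)] = (na * s_ac + nb * s_bc) / (na + nb)
        del sim[(a, b)]
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


# Hypothetical similarities (percent) between samples A to D.
S = [
    [100, 80, 60, 20],
    [80, 100, 50, 30],
    [60, 50, 100, 10],
    [20, 30, 10, 100],
]
for left, right, similarity in upgma(S, ["A", "B", "C", "D"]):
    print(left, "+", right, "at similarity", similarity)
```

In this made-up example A and B join first at 80. C then joins them at (60 + 50) / 2 = 55, and D joins last at (1 × 10 + 2 × 25) / 3 = 20, which equals the plain average of D's similarities to A, B and C. An unweighted average of 10 and 25 would give 17.5 instead, which is why the cluster sizes matter.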

Non-Hierarchical Clustering

Uses iterative processes and randomization.
Samples are reassigned between clusters until the clustering criterion stops improving.
K-means clustering is a common method, where K is the number of clusters.
Membership is evaluated against a defined criterion, such as distance to the nearest cluster centroid.
K-means is based on Euclidean distance, so it requires metric data.
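A minimal sketch of the K-means loop is given below, assuming squared Euclidean distance and random starting centroids drawn from the data. The function name `kmeans`, its `init`, `seed` and `max_iter` parameters, and the example points are hypothetical choices made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Assign points to k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Random start unless explicit starting centroids are supplied.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no sample switched groups, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            group = [p for p, lab in zip(points, labels) if lab == c]
            if group:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(group)
                                     for col in zip(*group))
    return labels, centroids


# Hypothetical two-dimensional data with two visible groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, different seeds can end in different solutions, and some starts settle on a poor grouping. Running the procedure from several random starts and keeping the solution with the lowest within-cluster sum of squares is a common safeguard.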

Determining the Number of Clusters (K)

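Elbow Plot (Scree Plot):
- Plots the within-cluster sum of squares against the number of clusters K.
- Look for the elbow, the point where adding more clusters stops giving a large reduction, to determine the optimal number of clusters.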
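Calinski-Harabasz Criterion:
- CH = \frac{\text{between-cluster variance}}{\text{within-cluster variance}} = \frac{SS_B / (K - 1)}{SS_W / (n - K)}, where SS_B and SS_W are the between- and within-cluster sums of squares and n is the number of samples.
- Uses the ratio of between-cluster variance to within-cluster variance.
- Higher values mean distinct and well-separated clusters.
- Optimal K is the peak on the Calinski-Harabasz plot.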
Silhouette Width:
- Compares how similar each object is to its own cluster with how similar it is to the nearest other cluster; widths range from -1 to 1, and higher values indicate a better fit.
Gap Statistic:
- Compares the total within-cluster variation for each K with the value expected under a null reference distribution that has no clustering.
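Three of these criteria can be computed directly from a set of points and their cluster labels. The sketch below is an illustration under stated assumptions: Euclidean distance, at least two clusters, and hypothetical function names (`within_ss`, `calinski_harabasz`, `mean_silhouette`) and data. The gap statistic is left out because it also needs simulated reference data.

```python
import math


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def group_by_label(points, labels):
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def within_ss(points, labels):
    """Within-cluster sum of squares, the value plotted on an elbow plot."""
    total = 0.0
    for group in group_by_label(points, labels).values():
        center = centroid(group)
        total += sum(sq_dist(p, center) for p in group)
    return total


def calinski_harabasz(points, labels):
    """Between-cluster variance over within-cluster variance (needs 2 <= K < n)."""
    groups = group_by_label(points, labels)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(len(g) * sq_dist(centroid(g), overall)
                  for g in groups.values())
    return (between / (k - 1)) / (within_ss(points, labels) / (n - k))


def mean_silhouette(points, labels):
    """Average silhouette width over all points."""
    groups = group_by_label(points, labels)
    widths = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Hypothetical points with one sensible and one poor two-cluster labeling.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]
poor = [0, 1, 0, 1]
for name, labels in (("good", good), ("poor", poor)):
    print(name, within_ss(points, labels),
          calinski_harabasz(points, labels),
          round(mean_silhouette(points, labels), 3))
```

To choose K, the same functions would be applied to the labels produced for each candidate K, and the results plotted against K.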

Applications and Limitations of Cluster Analysis

Useful for classifying things like soil samples, species, vegetation communities.
Less useful to environmental scientists and ecologists when data vary continuously, because it forces samples into rigid, discrete classes.
Ordination methods are better for clinal data and gradients.
