Multivariate Analysis and Cluster Analysis Notes

Multivariate Analysis: Distance and Dissimilarity Measures

Introduction to Distance and Dissimilarity Measures

Multivariate analysis focuses on distance and dissimilarity measures.
Principal components analysis (PCA) indirectly uses Euclidean distance measures.
These measures are crucial for analyzing data and forming groupings, which leads to cluster analysis.
Distance matrices can also be called dissimilarity matrices; sometimes similarity matrices are used.

Overview of the Next Few Lectures

Concentrate on distance/dissimilarity matrices and similarity matrices.
Discuss transformations and standardizations of data.
Cover a range of cluster analysis methods, focusing on hierarchical and non-hierarchical clustering.
Emphasis on identifying and describing patterns, similar to PCA.

Key Concepts and Hypothesis Testing

Focus on describing patterns rather than hypothesis testing in PCA and clustering.
Hypothesis testing will be covered in more detail next week.

Resources and Practice

Recommended readings: Quinn & Keough (both old and new editions).
Practice examples and interpret data using available online resources.
Utilize datasets from both old and new editions for lectures and labs.
Examples in the lab build upon the background provided in the lecture.

Why Use Distance/Dissimilarity Matrices?

Analyzing multivariate data, like invertebrate species at different sites, necessitates considering interrelationships among variables.
Analyzing the variables together avoids the inflated Type I error rate that comes from running many separate univariate tests, and accounts for species interactions.
Simplify data into a more manageable format.

Applications of Distance and Dissimilarity Matrices

Clustering: Form groups from distance/dissimilarity matrices to classify data.
Ordination: Rearrange data in multi-dimensional space to produce a map (non-metric multi-dimensional scaling).
Statistical Testing: Use ANOSIM or PERMANOVA to statistically test differences.
Applicable to both natural experiments and manipulative experiments.

Defining Distance and Dissimilarity

Evaluate species composition or morphology of organisms.
Determine how different or alike samples are using distance and dissimilarity measures.
Dissimilarity is often used for non-metric data, while distance is used for metric data.
Matrices are often visualized in two or three-dimensional space.

Similarity vs. Dissimilarity Matrices

Dissimilarity matrices expressed as percentages are bounded by 0 and 100: 0 means identical, 100 means nothing in common.
Similarity matrices reverse the scale: 100 means identical, 0 means nothing in common.
Confusing similarity and dissimilarity can lead to incorrect results.
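
On the 0 to 100 scale, each matrix is the complement of the other, so converting between them is a single subtraction. A minimal Python sketch, assuming percentage-scaled values; the function name and the numbers are illustrative and not taken from the lecture:

```python
def similarity_to_dissimilarity(similarity_matrix):
    """Convert a percentage similarity matrix (0-100) to dissimilarity."""
    return [[100.0 - value for value in row] for row in similarity_matrix]

# Hypothetical similarity matrix for three samples (percent).
similarity = [
    [100.0, 80.0, 25.0],
    [80.0, 100.0, 40.0],
    [25.0, 40.0, 100.0],
]
dissimilarity = similarity_to_dissimilarity(similarity)
print(dissimilarity[0])  # [0.0, 20.0, 75.0]
```

Checking that the diagonal is 0 (dissimilarity) rather than 100 (similarity) is a quick way to confirm which kind of matrix is in hand.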

Mutual Absences

Mutual absences: a species is absent from both of the samples being compared.
Mutual absences can cause statistical and biological problems.
Linking sites based on mutual absences may not be biologically meaningful in ecological studies.
Example: Antarctica and the tropics both lacking emus doesn't make them similar.
In habitat data, mutual zeros may be important (e.g., 0% litter cover).

Simple Example and Dissimilarity Matrix

Link sites together based on species composition.
A dissimilarity matrix indicates how close sites are in multivariate space.
Lower numbers indicate closer proximity in multivariate space.
Example numbers: Sites 2 and 4 are most similar; sites 1 and 3 are most dissimilar.
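
Reading a dissimilarity matrix comes down to finding the smallest and largest off-diagonal values. The sketch below uses invented percentages, chosen only so that the pattern matches the example (sites 2 and 4 closest, sites 1 and 3 furthest apart); the lecture's actual numbers are not reproduced here.

```python
from itertools import combinations

# Hypothetical dissimilarities (percent) between four sites.
sites = [1, 2, 3, 4]
dissimilarity = {
    (1, 2): 55.0, (1, 3): 90.0, (1, 4): 60.0,
    (2, 3): 70.0, (2, 4): 10.0, (3, 4): 65.0,
}

pairs = list(combinations(sites, 2))
# Lowest value = closest in multivariate space; highest = furthest apart.
most_similar = min(pairs, key=lambda pair: dissimilarity[pair])
most_dissimilar = max(pairs, key=lambda pair: dissimilarity[pair])
print(most_similar, most_dissimilar)  # (2, 4) (1, 3)
```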

Calculating Distances and Dissimilarities

Use metric measurements for continuous or ratio data.
Use non-metric measurements for ordinal or nominal count data.

Euclidean Distance

Metric distance used in principal components analysis.
Based on Pythagoras' theorem, that is, Euclidean geometry.
d = \sqrt{\sum_i (x_i - y_i)^2}
Where x_i and y_i are the values of the ith variable for the two points being compared.
Samples with exactly the same values have a distance of zero; the distance is bounded below by zero and has no upper limit.
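
The formula can be sketched directly in Python. This is an illustrative implementation using only the standard library; the function name and sample values are assumptions made for the example.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two samples measured on the same variables."""
    if len(x) != len(y):
        raise ValueError("samples must have the same number of variables")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

sample_a = [3.0, 4.0]
sample_b = [0.0, 0.0]
print(euclidean_distance(sample_a, sample_b))  # 5.0 (the 3-4-5 triangle)
print(euclidean_distance(sample_a, sample_a))  # 0.0 (identical samples)
```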

Metric vs. Non-Metric and Bray-Curtis

In two-dimensional space, Euclidean distance can be derived geometrically.
- Non-metric measures cannot be derived geometrically in this way.
Metric measures use absolute derived distances; non-metric measures use ranks.

Bray-Curtis

A commonly used non-metric measurement in ecology.
Derived by botanists to avoid linking samples based on joint absences.
Also known as percentage dissimilarity.
Ignores joint absences; the result is driven mainly by the variables with high values.
Suited for species abundance data.
BC_{ij} = \frac{\sum_k |x_{ik} - x_{jk}|}{\sum_k (x_{ik} + x_{jk})}
Where:
- BC_{ij} is the Bray-Curtis dissimilarity between samples i and j
- x_{ik} is the abundance of species k in sample i
- x_{jk} is the abundance of species k in sample j
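
A minimal Python sketch of the Bray-Curtis calculation and of building a full pairwise matrix from it. The function names and abundance values are hypothetical; the result is on a 0 to 1 scale and can be multiplied by 100 for a percentage.

```python
def bray_curtis(sample_i, sample_j):
    """Bray-Curtis dissimilarity (0 to 1) between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(sample_i, sample_j))
    denominator = sum(a + b for a, b in zip(sample_i, sample_j))
    if denominator == 0:
        raise ValueError("both samples are empty; dissimilarity is undefined")
    return numerator / denominator

def dissimilarity_matrix(samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]

# Hypothetical abundances: rows are sites, columns are species.
abundances = [
    [10, 0, 5],
    [10, 0, 5],
    [0, 8, 0],
]
matrix = dissimilarity_matrix(abundances)
print(matrix[0][1])  # 0.0 (identical composition)
print(matrix[0][2])  # 1.0 (no species in common)
```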

Bray-Curtis Example

Sum the absolute differences in abundance of species A-E between site 1 and site 2.
Divide by the sum of the abundances across both sites. A species absent from both sites adds zero to both sums, so it does not affect the result.
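
A worked version of this calculation, using made-up abundances since the lecture's table is not reproduced in these notes. Assume site 1 has A = 10, B = 0, C = 5, D = 0, E = 3 and site 2 has A = 6, B = 2, C = 5, D = 0, E = 0:

```latex
\sum_k |x_{1k} - x_{2k}| = |10-6| + |0-2| + |5-5| + |0-0| + |3-0| = 4 + 2 + 0 + 0 + 3 = 9
\sum_k (x_{1k} + x_{2k}) = 16 + 2 + 10 + 0 + 3 = 31
BC_{12} = \frac{9}{31} \approx 0.29 \quad (\text{about } 29\%)
```

Species D is absent from both sites and contributes 0 to the numerator and 0 to the denominator, so dropping it leaves the result unchanged.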

Implementation in R

Use the vegan package to calculate a Bray-Curtis dissimilarity (or similarity) matrix, for example with its vegdist function, whose default method is Bray-Curtis.

Choosing Metric vs. Non-Metric

PCA uses Euclidean distance, so principal component scores can be used in linear regressions or ANOVAs because they have the same sort of metric properties.
Presence of zeros in species data often means you don't want things linked.
In measurement data, joint absences may be important.
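
The effect of zeros on the two measures can be seen with a small comparison. The sketch below uses invented abundances for three sites; it is an illustration of the general point, not an example from the lecture.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

# Hypothetical abundances of three species at three sites.
site_1 = [0, 1, 1]
site_2 = [1, 0, 0]  # shares no species with site 1
site_3 = [0, 4, 8]  # same species as site 1, at higher abundances

euclid_12 = euclidean(site_1, site_2)  # about 1.73
euclid_13 = euclidean(site_1, site_3)  # about 7.62
bc_12 = bray_curtis(site_1, site_2)    # 1.0, nothing in common
bc_13 = bray_curtis(site_1, site_3)    # about 0.71

print(euclid_12 < euclid_13)  # True: Euclidean puts sites 1 and 2 closer
print(bc_13 < bc_12)          # True: Bray-Curtis puts sites 1 and 3 closer
```

With Euclidean distance, two sites with no species in common end up closer than two sites that share all their species, which is the kind of linkage the notes warn about for species data.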

Standardization and Transformation

In univariate statistics, data are transformed to bring them closer to normality or to satisfy homogeneity of variance.
In multivariate data analysis, transformations are used for different reasons.
Non-metric analyses do not assume normality.

Transformations

Transformations can down-weight very common species by changing the scale of measurement.
Examples include square root, log (adding one to avoid log of zero), fourth-root, and presence/absence transformations.
log(x+1)
Where x is the original value. One is added because the log of zero is undefined (it tends to negative infinity).

Examples of Transformations

Square root transformation makes the differences between numbers smaller.
Fourth-root transformation further reduces the emphasis on common species.
Presence/absence transformation converts all values to 1 (present) or 0 (absent).
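
The transformations listed above can be compared side by side on one set of counts. This is a minimal sketch: the function name, the method labels, and the abundances are assumptions for illustration, and the log option uses the natural log since the notes do not specify a base.

```python
import math

def transform(values, method):
    """Apply a named transformation to a list of non-negative abundances."""
    if method == "sqrt":
        return [math.sqrt(v) for v in values]
    if method == "fourth_root":
        # Fourth root taken as the square root of the square root.
        return [math.sqrt(math.sqrt(v)) for v in values]
    if method == "log":
        return [math.log(v + 1) for v in values]  # log(x + 1), natural log
    if method == "presence_absence":
        return [1 if v > 0 else 0 for v in values]
    raise ValueError(f"unknown method: {method}")

abundances = [0, 1, 16, 10000]  # hypothetical counts, one very common species
print(transform(abundances, "sqrt"))              # [0.0, 1.0, 4.0, 100.0]
print(transform(abundances, "fourth_root"))       # [0.0, 1.0, 2.0, 10.0]
print(transform(abundances, "presence_absence"))  # [0, 1, 1, 1]
```

The most abundant species is 10000 times the rarest present species on the raw scale, 100 times after a square root, 10 times after a fourth root, and equal under presence/absence.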

Transformation Considerations

Consider the context of the data and analysis.
Do not blindly apply transformations without justification.
Determine if you want dominant variables to dominate the analysis.
In exam questions, consider if the transformation is to increase linearity or to address the dominance of certain species.

Standardization

Standardize to make each species equally important or to make each sample equally important.
Useful for comparing samples of different sizes or with different sampling efforts.
Express values as proportions or relative to the maximum value.

Calculating Proportions

Divide each species number by the total.
Useful when comparing the relative importance of species or samples.
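
Both standardizations mentioned above (proportions of a sample total, and values relative to a maximum) can be sketched as follows. The function names and counts are hypothetical, and the layout (rows as samples, columns as species) is an assumption.

```python
def standardize_by_sample_total(samples):
    """Express each value as a proportion of its sample (row) total."""
    result = []
    for row in samples:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result

def standardize_by_species_max(samples):
    """Express each value relative to the maximum of its species (column)."""
    maxima = [max(column) for column in zip(*samples)]
    return [
        [value / m if m else 0.0 for value, m in zip(row, maxima)]
        for row in samples
    ]

# Hypothetical counts: the second sample had one tenth of the sampling effort.
counts = [
    [50, 30, 20],
    [5, 3, 2],
]
by_total = standardize_by_sample_total(counts)
by_max = standardize_by_species_max(counts)
print(by_total[0] == by_total[1])  # True: same relative composition
print(by_max[0])                   # [1.0, 1.0, 1.0]
```

Standardizing by sample total makes the two samples identical here, which shows how the choice can change the interpretation: any difference due to total abundance is removed.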

Important Considerations

Standardizations can change the interpretation of results.
Compare raw data analysis to standardized data analysis.
Transformation is usually better than standardization.

Summary

Understand dissimilarity and distance measures.
Know the difference between metric (Euclidean) and non-metric (Bray-Curtis) measures.
Understand the role of transformations in both metric and non-metric analyses.
Apply distance and dissimilarity matrices in clustering and ordination.

Clustering Analysis

Introduction to Cluster Analysis

Cluster analysis groups samples according to how similar they are to one another.
Methods use similarity coefficients between samples (Euclidean or Bray-Curtis).
Groups can be clustered or mapped in two- or three-dimensional space.
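
As a rough illustration of how groups can be formed from a dissimilarity matrix, the sketch below repeatedly merges the two closest clusters. It uses average linkage, which is one common hierarchical rule chosen here for the example; the notes do not say which method the lecture uses, and the function name and matrix values are hypothetical.

```python
def average_linkage_clusters(dissimilarity, n_clusters):
    """Merge the closest clusters until n_clusters remain.

    dissimilarity: square symmetric matrix (list of lists).
    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    if not 1 <= n_clusters <= len(dissimilarity):
        raise ValueError("n_clusters must be between 1 and the number of samples")
    clusters = [[i] for i in range(len(dissimilarity))]

    def between(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(dissimilarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average dissimilarity.
        candidates = [
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        ]
        p, q = min(candidates, key=lambda pq: between(clusters[pq[0]], clusters[pq[1]]))
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return sorted(clusters)

# Hypothetical dissimilarities (percent) between four samples, indexed 0-3.
example_matrix = [
    [0, 10, 80, 85],
    [10, 0, 75, 90],
    [80, 75, 0, 15],
    [85, 90, 15, 0],
]
print(average_linkage_clusters(example_matrix, 2))  # [[0, 1], [2, 3]]
```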

Main Question in Clustering

Clustering is descriptive; it is not a statistical test.
Do samples form natural groupings?
May be used in taxonomy, genetics, ecology, soil science, etc.

Genetic Clusters

Genetic data often carry different assumptions, for example about mutation rates.
Cavalli-Sforza distance is one measure of genetic distance.
Genetic distances can be analyzed with clustering and ordination techniques.
Tree diagrams can be based on genetic distance.
