BC_{ij} is the Bray-Curtis dissimilarity between samples i and j
x_{ik} is the abundance of species k in sample i
x_{jk} is the abundance of species k in sample j
Bray-Curtis Example
Calculate the absolute differences in abundance for species A-E between site 1 and site 2.
Sum the absolute differences. A species absent from both sites adds zero to both the sum and the total abundance it is divided by, so joint absences do not affect the result.
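The calculation above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R workflow the notes go on to mention; the function name `bray_curtis` and the abundances for species A-E are hypothetical, chosen only to show the arithmetic.

```python
def bray_curtis(sample_i, sample_j):
    """Bray-Curtis dissimilarity between two equal-length abundance lists.

    Assumes at least one non-zero abundance, otherwise the denominator is zero.
    """
    # Numerator: sum of absolute differences, species by species
    numerator = sum(abs(a - b) for a, b in zip(sample_i, sample_j))
    # Denominator: total abundance across both samples
    denominator = sum(a + b for a, b in zip(sample_i, sample_j))
    return numerator / denominator


# Hypothetical counts for species A-E (not the lecture data)
site1 = [10, 0, 5, 3, 0]
site2 = [4, 2, 5, 0, 0]

bc = bray_curtis(site1, site2)

# Species E is absent from both sites; dropping it leaves the value unchanged
bc_without_e = bray_curtis(site1[:4], site2[:4])

print(bc, bc_without_e)
```

With these made-up counts the absolute differences sum to 11 and the total abundance is 29, so the dissimilarity is 11/29, about 0.379, with or without the jointly absent species E.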
Implementation in R
Use the vegan package to calculate a Bray-Curtis dissimilarity matrix, for example with vegdist() and method = "bray" (similarity is 1 minus dissimilarity).
Choosing Metric vs. Non-Metric
Principal components analysis is based on Euclidean distance, so the component scores can be used in linear regressions or ANOVAs because they have the properties those methods expect.
Species data often contain many zeros, and two samples should usually not be treated as similar just because a species is absent from both.
In measurement data, joint absences may be important.
Standardization and Transformation
In univariate statistics, data are transformed to make them more normal or to satisfy homogeneity of variance.
In multivariate data analysis, transformations are used for different reasons.
Non-metric analysis does not assume normality.
Transformations
Transformations can down-weight very common species by changing the scale of measurement.
Examples include square root, log (adding one to avoid log of zero), fourth-root, and presence/absence transformations.
log(x+1)
Where x is the original value. One is added because the log of zero is undefined (it tends to negative infinity), and log(0 + 1) = 0, so zeros stay zero.
Examples of Transformations
Square root transformation compresses large values more than small ones, so the differences between abundances become smaller.
Fourth-root transformation further reduces the emphasis on common species.
Presence/absence transformation converts all values to 1 (present) or 0 (absent).
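To see how each transformation down-weights a dominant species, the sketch below applies all four to one sample. The abundances and the helper `dominance_ratio` are hypothetical, introduced only for illustration, and the natural log is an assumption since the notes do not specify a base.

```python
import math

# Hypothetical abundances for one sample: several rare species, one dominant
counts = [0, 1, 16, 81, 10000]

square_root = [math.sqrt(x) for x in counts]
fourth_root = [x ** 0.25 for x in counts]
# Natural log of (x + 1); a zero count stays zero because log(1) = 0
log_plus_one = [math.log(x + 1) for x in counts]
presence_absence = [1 if x > 0 else 0 for x in counts]


def dominance_ratio(values):
    """Largest value divided by the smallest non-zero value."""
    nonzero = [v for v in values if v > 0]
    return max(nonzero) / min(nonzero)


for name, values in [
    ("raw", counts),
    ("square root", square_root),
    ("fourth root", fourth_root),
    ("log(x+1)", log_plus_one),
    ("presence/absence", presence_absence),
]:
    print(name, dominance_ratio(values))
```

The ratio of the most abundant to the least abundant present species falls from 10000 in the raw data to 100 after the square root, to about 10 after the fourth root, and to exactly 1 under presence/absence, where every present species counts equally.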
Transformation Considerations
Consider the context of the data and analysis.
Do not blindly apply transformations without justification.
Determine if you want dominant variables to dominate the analysis.
In exam questions, consider if the transformation is to increase linearity or to address the dominance of certain species.
Standardization
Standardize to make each species equally important or to make each sample equally important.
Useful for comparing samples of different sizes or with different sampling efforts.
Express values as proportions or relative to the maximum value.
Calculating Proportions
Divide each species number by the total.
Useful when comparing the relative importance of species or samples.
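A short sketch of both standardizations follows, assuming each sample is a plain list of species counts. The function names `to_proportions` and `relative_to_max` and the two samples are hypothetical; the samples were made up so that the community composition is identical while the sampling effort differs tenfold.

```python
def to_proportions(sample):
    """Divide each species count by the sample total."""
    total = sum(sample)
    return [x / total for x in sample]


def relative_to_max(sample):
    """Express each species count relative to the largest count in the sample."""
    max_value = max(sample)
    return [x / max_value for x in sample]


# Hypothetical samples: same composition, ten times the sampling effort
small_sample = [2, 6, 12]
large_sample = [20, 60, 120]

print(to_proportions(small_sample))
print(to_proportions(large_sample))
print(relative_to_max(small_sample))
```

After standardization the two samples give the same proportions, so a dissimilarity computed on them would reflect composition rather than the difference in sampling effort. Whether that is desirable depends on whether total abundance is itself of interest.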
Important Considerations
Standardizations can change the interpretation of results.
Compare raw data analysis to standardized data analysis.
Transformation is usually better than standardization.
Summary
Understand dissimilarity and distance measures.
Know the difference between metric (Euclidean) and non-metric (Bray-Curtis) measures.
Understand the role of transformations in both metric and non-metric analyses.
Apply distance and dissimilarity matrices in clustering and ordination.
Clustering Analysis
Introduction to Cluster Analysis
Cluster analysis groups samples based on the extent to which they are similar.
Methods use similarity coefficients between samples (Euclidean or Bray-Curtis).
Can cluster samples into groups or map them in two- or three-dimensional space.
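As a rough illustration of how pairwise coefficients feed into grouping, the sketch below builds a Bray-Curtis dissimilarity for every pair of samples and picks out the most similar pair, which is where an agglomerative method would begin merging. The sample names and counts are hypothetical, and this is not a full clustering algorithm.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Hypothetical samples (keys) with counts for three species (values)
samples = {
    "S1": [10, 0, 5],
    "S2": [9, 1, 5],
    "S3": [0, 8, 1],
    "S4": [1, 9, 0],
}

# Dissimilarity for every unordered pair of samples
dissimilarity = {
    (p, q): bray_curtis(samples[p], samples[q])
    for p, q in combinations(samples, 2)
}

# The pair with the smallest dissimilarity is the most similar pair
closest_pair = min(dissimilarity, key=dissimilarity.get)

for pair, value in sorted(dissimilarity.items(), key=lambda item: item[1]):
    print(pair, round(value, 3))
```

With these made-up counts, S1 and S2 are the closest pair (2/30) and S3 and S4 the next closest (3/19), while every cross-pair is far more dissimilar, which is the kind of pattern that suggests two natural groups.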
Main Question in Clustering
Clustering is a descriptive technique, not a statistical test.
Do samples form natural groupings?
May be used in taxonomy, genetics, ecology, soil science, etc.
Genetic Clusters
Genetic data often carry different assumptions, for example about mutation rates.