Multivariate Analysis: Study Notes
Multivariate Data Analysis: Eurovision Example
The Eurovision Song Contest serves as a case study for multivariate data analysis.
Objective: Determine if Eurovision judging is fair or if underlying patterns exist related to geopolitical relationships.
Data: Matrix of countries scoring other countries' acts.
Technique: Cluster analysis used to generate a dendrogram, revealing non-random patterns.
Findings:
Nordic/Scandinavian block.
Greece and Cyprus cluster together.
UK and Ireland cluster together.
Bosnia and Turkey cluster together.
Conclusion: Voting is biased towards friendly nations rather than based purely on the quality of the acts.
Representation: The results of a multivariate analysis can be displayed as a 2D or 3D map in which clusters appear as groups of nearby points (e.g., Eastern, Nordic/Baltic, Western European blocks).
Australia's chances: Australia is unlikely to win due to these geopolitical biases.
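The clustering step can be sketched in plain Python. This is a minimal illustration, not the analysis from the lecture: the four countries A to D and their points are hypothetical (not real Eurovision data), the dissimilarity is the Euclidean distance between voting profiles, and single linkage is one assumed choice among several possible linkage rules.

```python
from itertools import combinations
from math import dist

# Hypothetical voting profiles (NOT real Eurovision data): the points each
# voting country awarded to the same four acts.
votes = {
    "A": [12, 10, 1, 0],
    "B": [10, 12, 0, 2],
    "C": [0, 1, 12, 10],
    "D": [2, 0, 10, 11],
}


def cluster_gap(a, b, profiles):
    """Single linkage: smallest distance between any member of a and of b."""
    return min(dist(profiles[x], profiles[y]) for x in a for y in b)


def single_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters and join them.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_gap(pair[0], pair[1], profiles),
        )
        merges.append((a, b, cluster_gap(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = single_linkage(votes)
for left, right, height in merges:
    print(left, "+", right, f"joined at distance {height:.2f}")
```

Each merge corresponds to one join in a dendrogram, and the distance is the height of that join. Countries with similar voting profiles join low in the tree, which is how voting blocks show up.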
Course Overview: Techniques and Data
First lecture: Focuses on the definition, collection, analysis, and exploration of multivariate data.
Second lecture: Covers Principal Components Analysis and Factor Analysis.
Other techniques:
Cluster Analysis.
Non-metric multi-dimensional scaling.
MANOVA (multivariate analysis of variance): similar to ANOVA but with multiple response variables.
References
Quinn and Keough (old or new edition).
Chapters accessible online through the library.
Examples using soil samples and biodiversity.
Data available for download.
The data chapter is available for free.
Course Goals
Understand major concepts such as ordination, principal components, factor analysis, non-metric multidimensional scaling, cluster analysis, MANOVA, and PerMANOVA.
Learn to make decisions about data and its interpretation.
Understand and interpret scientific papers that use these techniques.
Multivariate Techniques: Key Themes
Linear combinations of variables: A recurring concept.
Distance/dissimilarity/similarity measures: Used repeatedly.
Transformation of data: Note that its use differs from univariate transformation.
Standardization.
Types of Multivariate Techniques
Focus on ordination and clustering.
Brief coverage of regression (multiple regression).
Classification (some overlap with clustering).
Multivariate Data: Definition
Multiple response variables.
Variables are not necessarily independent; they interrelate.
Examples:
Biodiversity of a park.
Physical properties of an environment (soil/water chemistry).
Specialized approaches are needed because data often do not conform to traditional statistical assumptions.
Why Use Multivariate Analysis
Avoid conducting multiple ANOVAs on interrelated variables.
Problem: The probability of at least one Type I error (false positive) increases with the number of tests performed.
Multivariate statistics consider the interactions of variables together.
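The growth in false positives can be made concrete with the familywise error rate, 1 - (1 - alpha)^k for k tests. The sketch below assumes the tests are independent, which interrelated variables are not, so treat the numbers as a rough guide rather than an exact figure. The function name `familywise_error` is introduced here for illustration only.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Assumed significance level of 0.05 for each individual test.
for k in (1, 5, 10, 20):
    print(f"{k:2d} tests -> {familywise_error(0.05, k):.3f}")
```

With ten separate tests at alpha = 0.05, the chance of at least one false positive is already about 0.40 under the independence assumption.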
Types of Questions Addressed
Change in community composition.
Differences in water quality.
Changes in habitat characteristics.
Organismal traits (phenotypic traits, diet effects).
Examples of Outputs
Ordinations from principal components.
Clustering diagrams.
Non-metric multidimensional scaling plots.
Habitat Complexity: An Example
Habitat complexity is a multivariate term, comprising multiple interrelated variables.
Measurement:
Canopy cover.
Shrub cover.
Number of stems.
Diversity of trees/shrubs.
Number of logs.
Percentage of dirt cover.
Soil nutrients.
Multivariate analysis can compare sites based on these variables.
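One way to compare sites is to standardize each variable and then compute a distance between sites. The sketch below uses hypothetical values for two of the variables listed above (canopy cover and number of logs). The z-score standardization and the Euclidean distance are assumed choices, not necessarily the ones used in the course.

```python
from math import dist
from statistics import mean, stdev

# Hypothetical site measurements (illustrative values only).
# Columns: canopy cover (%), number of logs.
sites = {
    "site1": [80.0, 2.0],
    "site2": [60.0, 8.0],
    "site3": [20.0, 3.0],
}


def standardize(table):
    """Convert each column to z-scores: (value - column mean) / column sd."""
    columns = list(zip(*table.values()))
    centres = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return {
        name: [(v - m) / s for v, m, s in zip(row, centres, spreads)]
        for name, row in table.items()
    }


z = standardize(sites)

for a, b in (("site1", "site2"), ("site1", "site3")):
    raw = dist(sites[a], sites[b])
    std = dist(z[a], z[b])
    print(f"{a} vs {b}: raw = {raw:.2f}, standardized = {std:.2f}")
```

On the raw scale the distance is dominated by canopy cover, simply because percentages span a wider numeric range than log counts. After standardization both variables contribute on a comparable scale.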
Data Analysis
Multivariate analysis summarizes data sets in which many variables have been measured.
Simplifies data into fewer derived variables (e.g., principal components).
Reduces the risk of Type I errors (fewer separate tests are needed) and reveals patterns.
Applicable to soil characteristics, habitat, organismal traits, water quality, etc.
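The idea of a derived variable can be sketched by computing the first principal component as a linear combination of the original variables. The data below are hypothetical. Power iteration on the covariance matrix is used here only to keep the example free of external libraries; in practice a statistics package would normally be used, and variables on very different scales would usually be standardized first.

```python
from statistics import mean

# Hypothetical measurements (illustrative only): rows are sites, columns are
# canopy cover (%), shrub cover (%), number of logs.
data = [
    [80.0, 60.0, 9.0],
    [70.0, 55.0, 7.0],
    [50.0, 40.0, 6.0],
    [30.0, 20.0, 3.0],
    [20.0, 15.0, 2.0],
]


def centre(rows):
    """Subtract each column's mean so every variable has mean zero."""
    means = [mean(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, p = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and rescale to length 1."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec


centred = centre(data)
cov = covariance(centred)
pc1 = first_component(cov)  # weights of the linear combination

# Each site's score on the first component is a weighted sum of its variables.
scores = [sum(w * v for w, v in zip(pc1, row)) for row in centred]

# Share of the total variance captured by this single derived variable.
variance_pc1 = sum(s * s for s in scores) / (len(scores) - 1)
total_variance = sum(cov[i][i] for i in range(len(cov)))
share = variance_pc1 / total_variance

print("weights:", [round(w, 3) for w in pc1])
print("scores:", [round(s, 2) for s in scores])
print("share of variance:", round(share, 3))
```

Because the three hypothetical variables rise and fall together, one component carries most of the variance, so the three measured variables can be summarized by a single score per site.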
Good Practices
Just because you can measure a variable does not mean you should.
Brazil Football Example
Skills measured:
Passing accuracy.
Kicking strength (ball speed).
Speed on angled tracks.
Overall strength.
Principal components analysis results:
Skill level: accuracy, technique, ball control, etc.
Athleticism level: physical strength and speed.