Eurovision Song Contest as a case study for multivariate data analysis.
Objective: Determine if Eurovision judging is fair or if underlying patterns exist related to geopolitical relationships.
Data: Matrix of countries scoring other countries' acts.
Technique: Cluster analysis used to generate a dendrogram, revealing non-random patterns.
Findings:
Nordic/Scandinavian block.
Greece and Cyprus cluster together.
UK and Ireland cluster together.
Bosnia and Turkey cluster together.
Conclusion: Voting is biased towards friendly nations rather than based purely on the quality of the acts.
Representation: Multivariate data analysis can be represented in 2D or 3D mapping, showing clusters (e.g., Eastern, Nordic/Baltic, Western European blocks).
Australia's chances: Australia is unlikely to win due to these geopolitical biases.
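Example: A minimal sketch of how a clustering like the one above could be produced. The points matrix is invented for illustration (it is not real Eurovision data), and the dissimilarity measure (12 minus the mean points two countries exchange) and the use of average linkage are assumptions, not necessarily what the lecture's analysis used. The merge history is the information a dendrogram draws.

```python
from itertools import combinations

# Hypothetical points awarded (row = voter, column = recipient).
# Invented values for illustration only; not real Eurovision results.
countries = ["Sweden", "Norway", "Greece", "Cyprus", "UK", "Ireland"]
points = [
    [0, 12, 2, 1, 4, 3],   # Sweden
    [12, 0, 1, 2, 3, 4],   # Norway
    [2, 1, 0, 12, 3, 2],   # Greece
    [1, 3, 12, 0, 2, 4],   # Cyprus
    [4, 3, 2, 1, 0, 10],   # UK
    [3, 4, 1, 2, 12, 0],   # Ireland
]

def dissimilarity(i, j):
    """Assumed measure: 12 (the top award) minus the mean points exchanged."""
    return 12 - (points[i][j] + points[j][i]) / 2

def average_linkage(a, b):
    """Mean pairwise dissimilarity between two clusters (tuples of indices)."""
    return sum(dissimilarity(i, j) for i in a for j in b) / (len(a) * len(b))

def agglomerate(n):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        merges.append((a, b, average_linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

def label(cluster):
    return "+".join(countries[i] for i in cluster)

merges = agglomerate(len(countries))
for a, b, height in merges:
    print(f"{label(a)} joins {label(b)} at height {height:.2f}")
```

With these invented scores, the first three merges pair up the countries that exchange the most points, which is the kind of non-random structure the dendrogram reveals.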
Course Overview: Techniques and Data
First lecture: Focuses on the definition, collection, analysis, and exploration of multivariate data.
Second lecture: Covers Principal Components Analysis and Factor Analysis.
Other techniques:
Cluster Analysis.
Non-metric multi-dimensional scaling.
MANOVA (multivariate analysis of variance) - similar to ANOVA but with multiple response variables.
References
Quinn and Keough (old or new edition).
Chapters accessible online through the library.
Examples using soil samples and biodiversity.
Data available for download.
The data chapter is available for free.
Course goals
Understand major concepts such as ordination, principal components, factor analysis, non-metric multidimensional scaling, cluster analysis, MANOVA, and PerMANOVA.
Learn to make decisions about data and its interpretation.
Understand and interpret scientific papers that use these techniques.
Multivariate Techniques: Key Themes
Linear combinations of variables: A recurring concept.
Distance/dissimilarity/similarity measures: Used repeatedly.
Transformation of data: Note that its use differs from univariate transformation.
Standardization.
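Example: A minimal sketch of three of these themes together: standardization, a distance measure, and a linear combination of variables. The soil values, site names, and weights are hypothetical, and Euclidean distance is just one of many possible measures.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical soil samples: (pH, nitrogen in mg/kg). Illustrative values only.
samples = {
    "site_A": (5.0, 100.0),
    "site_B": (6.0, 300.0),
    "site_C": (7.0, 200.0),
}

def standardize(rows):
    """Convert each column to z-scores: (value - column mean) / column sd."""
    columns = list(zip(*rows))
    centres = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, centres, spreads))
        for row in rows
    ]

def euclidean(a, b):
    """Straight-line distance between two samples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linear_combination(weights, row):
    """Weighted sum of the variables for one sample."""
    return sum(w * v for w, v in zip(weights, row))

raw = list(samples.values())
z = standardize(raw)

# On the raw scale nitrogen dominates because its numbers are larger.
raw_ab = euclidean(raw[0], raw[1])
raw_ac = euclidean(raw[0], raw[2])

# After standardization both variables contribute on the same scale.
z_ab = euclidean(z[0], z[1])
z_ac = euclidean(z[0], z[2])

# Hypothetical equal weights, combining both variables into one score per site.
scores = [linear_combination((0.5, 0.5), row) for row in z]

print(f"raw distances:          A-B {raw_ab:.2f}, A-C {raw_ac:.2f}")
print(f"standardized distances: A-B {z_ab:.2f}, A-C {z_ac:.2f}")
print("combined scores:", [round(s, 2) for s in scores])
```

On the raw scale site_B looks about twice as far from site_A as site_C does, purely because nitrogen is measured in larger units. After standardization the two distances are equal.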
Types of Multivariate Techniques
Focus on ordination and clustering.
Brief coverage of regression (multiple regression).
Classification (some overlap with clustering).
Multivariate Data: Definition
Multiple response variables.
Variables are not necessarily independent; they interrelate.
Examples:
Biodiversity of a park.
Physical properties of an environment (soil/water chemistry).
Specialized approaches are needed because data often do not conform to traditional statistical assumptions.
Why use Multivariate Analysis
Avoid conducting multiple ANOVAs on interrelated variables.
Problem: The chance of at least one Type I error (false positive) increases with each additional test.
Multivariate statistics consider the interactions of variables together.
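Example: A minimal sketch of how quickly the false-positive risk grows across multiple tests. It assumes the tests are independent and each uses a significance level of 0.05. Interrelated variables violate the independence assumption, so treat the numbers as a rough guide rather than an exact rate.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

# Assumed significance level of 0.05 for every test.
for k in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:2d} tests: {rate:.3f}")
```

Under these assumptions, ten separate ANOVAs carry roughly a 40% chance of at least one false positive, compared with 5% for a single test.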